
The CSV Format, or Where to Put the Commas?

Among the tasks that an advanced user, if not a programmer, runs into, importing data from one application into another is not uncommon. Quite often the data is tabular, which is exactly what the CSV format was invented for. Several alternatives have since been developed (XML is the leading one), but in some cases creating a CSV file remains the most convenient solution, and in others the most problematic.

The name of the format stands for comma-separated values. Opened in Notepad, such a file looks like this:

00,01,02,03
01,15,34,11
16,27,33,06

(Each row sits on its own line; no terminating character is needed at the end of a row or of the file.)

And in an Excel spreadsheet or similar application, it's like this:

00 01 02 03
01 15 34 11
16 27 33 06

That, however, was the simplest case. The problems begin because CSV, thanks to its very compact data representation, is used to transfer large amounts of information. Databases usually hold values of different kinds: numeric, alphabetic, alphanumeric, with spaces, and so on. Large data sets often contain errors, and if, for example, an extra comma appears inside one of the fields, every following cell in that row shifts by one.
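The shift caused by a stray comma can be reproduced and detected with a few lines of Python. This is only a minimal sketch: the sample data is made up, and comparing each row's field count with the header is just one simple check, which assumes the file has a header row.

```python
import csv
import io

# Hypothetical data: the third line has a stray comma inside the name field.
text = "id,name,city\n1,Anna,Berlin\n2,Smith, John,Paris\n"

rows = list(csv.reader(io.StringIO(text)))
header, records = rows[0], rows[1:]

# Rows whose field count differs from the header are the shifted ones.
# Line numbers start at 2 because line 1 is the header.
bad_rows = [
    (line_no, row)
    for line_no, row in enumerate(records, start=2)
    if len(row) != len(header)
]

for line_no, row in bad_rows:
    print(f"line {line_no}: expected {len(header)} fields, got {len(row)}: {row}")
```

A check like this finds the damaged rows but cannot repair them; the extra comma still has to be fixed or quoted at the source.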

In general, the problems that arise when importing data through CSV files fall into three categories:

1 - Fields filled in incorrectly.

2 - Data converted incorrectly when the file is saved as CSV.

3 - The format recognized incorrectly by the importing program.

We have already partly covered the first case. To combat it, you need some means of validating the input data. For example, if you are building an address database for a mailing list, there are scripts that check whether a person has entered an e-mail address in the expected format. Wherever a field contains spaces, commas, or anything else that could be misread, its contents must be enclosed in quotes (straight double quotes, "", not typographic ones).
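As an illustration of the quoting rule, Python's standard csv module can add the quotes automatically when writing. The rows below are invented for the example, and csv.QUOTE_MINIMAL (quote only the fields that need it) is one possible policy, not the only one.

```python
import csv
import io

# Hypothetical mailing-list rows; the values are made up for illustration.
rows = [
    ["name", "email", "note"],
    ["Smith, John", "john@example.com", 'says "hello"'],
]

buffer = io.StringIO()
# QUOTE_MINIMAL quotes only fields containing the delimiter, quotes or newlines.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back should restore the original values unchanged.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```

Note that a quote character inside a quoted field is written as two quotes in a row, which is the usual CSV convention and the csv module's default.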

The second case is related to the choice of encoding. You have to find out, often by trial and error, which encoding the importing program prefers. On Windows systems set up for Cyrillic languages the Windows-1251 code page is widely used, so if the source file contains Unicode text, the receiving application may display it as a string of question marks. Excel can save to CSV with different encodings and delimiters (besides the comma, the tab and the semicolon are also used), but it is best to create the file in Notepad++ or OpenOffice Calc.
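The question marks can be reproduced in Python, along with one way to avoid them. This is a sketch under assumptions: the sample values and the file name example.csv are hypothetical, and writing UTF-8 with a byte order mark (utf-8-sig) is a common choice that many importers, Excel among them, tend to recognize, but it is not guaranteed to suit every program.

```python
import csv
import os
import tempfile

# Hypothetical rows: "Москва" fits into Windows-1251, "日本語" does not.
rows = [["city", "note"], ["Москва", "日本語"]]

# Forcing text into a code page that lacks its characters replaces them with "?".
lossy = "日本語".encode("cp1251", errors="replace").decode("cp1251")
print(lossy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")

    # Write as UTF-8 with a BOM; newline="" lets the csv module control line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)

    # Reading with the same encoding strips the BOM and restores the text.
    with open(path, encoding="utf-8-sig", newline="") as f:
        restored = list(csv.reader(f))

print(restored)
```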

The third case is a consequence of the second. Historically, the CSV format has had no single standard. Many programs accept comma-separated files as well as TSV (tab-separated) and SCSV (semicolon-separated) variants under the same name, in particular because the file extension does not say which delimiter is used. When the program guesses wrong, the data is misread and does not land in the cells in the required order. The best advice is to create the CSV file by hand. As already mentioned, you need a convenient editor for this.
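When the delimiter is unknown, a program can try to guess it from the content. The sketch below uses csv.Sniffer from Python's standard library on three invented samples; the sniffer is a heuristic and can fail or guess wrong on irregular data, so its result is best treated as a suggestion.

```python
import csv

# Hypothetical samples of the same table with three different delimiters.
samples = {
    "comma": "id,name,city\n1,Anna,Berlin\n2,Ivan,Paris\n",
    "semicolon": "id;name;city\n1;Anna;Berlin\n2;Ivan;Paris\n",
    "tab": "id\tname\tcity\n1\tAnna\tBerlin\n2\tIvan\tParis\n",
}

detected = {}
for label, text in samples.items():
    # Restrict the guess to the three delimiters mentioned in the article.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
    detected[label] = dialect.delimiter

for label, delimiter in detected.items():
    print(f"{label}: {delimiter!r}")
```

The detected dialect can then be passed to csv.reader(file, dialect) so the rest of the import uses the guessed delimiter.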

Programmers know the tricks for taming the CSV format. For an ordinary user it is enough to follow the recommendations above when creating a file. The fewest problems occur when the file is created for import into one specific application (though that is not always how it happens), because then it is easier to account for all the problems that can arise along the way.


Copyright © 2018 en.atomiyme.com. Theme powered by WordPress.