What is CSV?
.csv · text/csv
CSV (Comma-Separated Values) is a plain-text tabular data format universally supported by spreadsheets, databases, and programming languages.
Overview
CSV (Comma-Separated Values) is one of the oldest and most widely used data exchange formats. Each record typically occupies one line, with individual values separated by commas (or other delimiters such as tabs or semicolons). Despite its simplicity, CSV remains the go-to format for data import and export across spreadsheets, databases, analytics tools, and ETL pipelines. Its plain-text nature makes it readable in any text editor and easy to generate programmatically.
History
Comma-separated data predates personal computers, with early support in FORTRAN list-directed input in the 1970s. The format became widely adopted with the rise of spreadsheet software in the 1980s. RFC 4180, published in 2005, documented the common format and registered the text/csv MIME type; it is an informational RFC rather than a formal standard, and many implementations predate and deviate from it. Despite numerous attempts to replace it with more structured formats, CSV remains ubiquitous due to its simplicity.
File Structure
A CSV file consists of lines of text where each line is normally one record. Fields within a record are separated by a delimiter (typically a comma). Fields containing the delimiter, line breaks, or double quotes must be enclosed in double quotes, and a double quote inside a quoted field is escaped by doubling it, so the value She said "hi" is written as "She said ""hi""". Because of this, a single quoted field can span several physical lines. An optional header row can define column names. RFC 4180 specifies CRLF line endings, but most parsers also accept LF.
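The quoting rules above can be seen with Python's standard csv module, whose default "excel" dialect uses a comma delimiter, quotes only fields that need it, doubles embedded quotes, and ends records with CRLF. This is a minimal sketch; the sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows: one field contains the delimiter, one contains
# double quotes, and one contains a line break.
rows = [
    ["id", "name", "comment"],
    [1, "Smith, Jane", 'She said "hi"'],
    [2, "Lee", "line one\nline two"],
]

buf = io.StringIO()
# The default "excel" dialect quotes only fields that need it
# and terminates each record with CRLF.
writer = csv.writer(buf)
writer.writerows(rows)
text = buf.getvalue()
print(repr(text))
# 'id,name,comment\r\n1,"Smith, Jane","She said ""hi"""\r\n2,Lee,"line one\nline two"\r\n'

# Reading it back restores the original values, including the embedded
# line break, but every field is now a string (1 becomes "1").
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed)
```

Note that the quoted field "line one\nline two" spans two physical lines in the output, which is why naive line-by-line splitting is not a safe way to parse CSV.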
Common Use Cases
- Data export from databases and applications
- Spreadsheet data interchange
- ETL pipeline data transfer
- Machine learning dataset distribution
- Financial data reporting
- Log file analysis
- Bulk data import into databases
- Data migration between systems
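As an illustration of bulk import, the sketch below loads CSV rows into an in-memory SQLite database using only Python's standard library (csv and sqlite3). The people table, its columns, and the sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; a real loader would use open(path, newline="").
CSV_TEXT = "name,age\r\nJane,34\r\nLee,29\r\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")

# DictReader maps each record to {"name": ..., "age": ...} using the header row.
reader = csv.DictReader(io.StringIO(CSV_TEXT, newline=""))

# executemany accepts any iterable of mappings for named placeholders,
# so rows are streamed from the reader without building a list first.
conn.executemany("INSERT INTO people (name, age) VALUES (:name, :age)", reader)
conn.commit()

# The CSV values arrive as strings; SQLite's INTEGER column affinity
# converts "34" and "29" to integers on insert.
rows = conn.execute("SELECT name, age FROM people ORDER BY age").fetchall()
print(rows)  # [('Lee', 29), ('Jane', 34)]
```

For very large files, native bulk loaders such as PostgreSQL's COPY or the sqlite3 shell's .import command are usually faster than row-by-row inserts from application code.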
Advantages
- Near-universal support across spreadsheets, databases, and programming languages
- Human-readable plain text
- Simple to generate, and easy to parse with a standard CSV library
- Small file size for tabular data
- No special software required
- Easy to version control with git
Disadvantages
- No data type information (everything is text)
- No standard way to declare the character encoding inside the file
- Delimiter conflicts require quoting rules
- No support for hierarchical/nested data
- No metadata or schema definition
- Large files can be slow to process: there is no index, so rows must be parsed sequentially
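Because CSV carries no type information and large files are best handled as a stream, a common pattern is to read one record at a time and convert fields explicitly. The sketch below uses Python's csv.DictReader; the parse_orders function and the sku, quantity, and price columns are hypothetical.

```python
import csv
import io
from decimal import Decimal
from typing import Iterator, TextIO


def parse_orders(f: TextIO) -> Iterator[dict]:
    """Yield one typed record at a time, so memory use stays flat for large files."""
    for row in csv.DictReader(f):
        # Every value arrives as a string; the reader decides on types.
        yield {
            "sku": row["sku"],  # kept as text so leading zeros survive
            "quantity": int(row["quantity"]),
            "price": Decimal(row["price"]),  # Decimal avoids float rounding
        }


# Hypothetical sample; a real file would be opened with open(path, newline="").
sample = "sku,quantity,price\r\n00123,2,9.99\r\n00456,1,24.50\r\n"

total = sum(r["quantity"] * r["price"] for r in parse_orders(io.StringIO(sample, newline="")))
print(total)  # 44.48
```

Since parse_orders is a generator, the same code works unchanged on a multi-gigabyte file opened with open(), holding only one row in memory at a time.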
Frequently Asked Questions
What is a CSV file?
A CSV (Comma-Separated Values) file is a plain text file that stores tabular data (like a spreadsheet) where each line is a row and values are separated by commas. It is one of the most widely supported formats for data exchange.
How do I open a CSV file?
CSV files can be opened with Microsoft Excel, Google Sheets, LibreOffice Calc, any text editor, or programming languages like Python (pandas), R, and JavaScript. Most database tools also support CSV import.
What is the difference between CSV and Excel (XLSX)?
CSV is plain text with no formatting, formulas, or multiple sheets. XLSX is a ZIP-compressed package of XML files (Office Open XML) that supports formatting, formulas, charts, multiple sheets, and data types. CSV is more portable; XLSX is more feature-rich.
Why does my CSV look wrong in Excel?
Common issues include Excel auto-formatting values (for example, removing leading zeros from ZIP codes or IDs), incorrect delimiter detection (in locales that use a comma as the decimal separator, Excel typically expects semicolons), and encoding problems with special characters. Try importing via Data > From Text/CSV for more control.
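Outside Excel, the same delimiter problem can be handled in code. This is a minimal sketch using Python's csv.Sniffer, which guesses the dialect from a sample; it is a heuristic, so restricting the candidate delimiters helps. The sample data is hypothetical.

```python
import csv
import io

# Hypothetical export from a locale where the comma is the decimal
# separator, so the list separator is a semicolon.
sample = "name;zip;amount\r\nJane;01234;1,5\r\nLee;00987;2,25\r\n"

# Limit the candidates so the heuristic only chooses among plausible delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
print(repr(dialect.delimiter))  # ';'

rows = list(csv.reader(io.StringIO(sample, newline=""), dialect))
# The csv module never converts values, so "01234" keeps its leading zero.
print(rows[1])  # ['Jane', '01234', '1,5']
```

The decimal values such as "1,5" stay as text here; converting them to numbers requires locale-aware parsing chosen by the reader.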
Technical Details
- Extension: .csv
- MIME Type: text/csv
- Magic Bytes: None (text-based)
- Encoding: Typically UTF-8 or system default
- Compression: None (use gzip externally)
- Specification: RFC 4180
- Max Size: No specification limit
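Since a CSV file neither compresses itself nor declares its encoding, the reader has to supply both. The sketch below, using only Python's standard library, decompresses a gzip-wrapped CSV on the fly and decodes it with "utf-8-sig", which strips a leading byte-order mark (BOM) if one is present; some tools, notably Excel's "CSV UTF-8" option, write one. The sample content is hypothetical.

```python
import csv
import gzip
import io

# Build a hypothetical gzip-compressed CSV in memory, starting with a BOM.
raw = "\ufeffname,city\r\nJosé,Zürich\r\n".encode("utf-8")
compressed = gzip.compress(raw)

# gzip.open in text mode decompresses while reading; "utf-8-sig" drops the
# BOM, and newline="" leaves line-ending handling to the csv module.
with gzip.open(io.BytesIO(compressed), "rt", encoding="utf-8-sig", newline="") as f:
    rows = list(csv.reader(f))

print(rows)  # [['name', 'city'], ['José', 'Zürich']]
```

With plain "utf-8" instead, the first header would come back as "\ufeffname", a common source of "column not found" errors.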