Convert CSV to Parquet
Free online tool (in development)
Turn a .csv file into a typed, compressed .parquet — usually 5–10× smaller, with real column types and far faster analytics queries. Online tool in development; methods that work today are below.
Why convert CSV to Parquet?
CSV is fine for tiny files and email attachments, but it falls over fast. Three classic reasons to switch:
- Size. Real-world CSVs are usually 5–10× larger than their Parquet equivalents (Snappy compression). A 2 GB CSV often becomes ~250 MB Parquet.
- Types. CSV stores everything as text. Parquet preserves INT64, FLOAT, BOOLEAN, DATE, TIMESTAMP, DECIMAL — you don't have to re-infer types every time you load the file.
- Selective reads. Analytics engines (DuckDB, Spark, BigQuery, ClickHouse) can read only the columns they need from Parquet — the speedup is often 10–100×.
For more depth, see Parquet vs CSV — when to use each.
Convert CSV to Parquet today (without this tool)
DuckDB (recommended)
One command, no Python environment, automatic type inference:
duckdb -c "COPY (SELECT * FROM 'data.csv')
TO 'data.parquet'
(FORMAT 'parquet', COMPRESSION 'zstd')"

Python (pandas + pyarrow)
pip install pandas pyarrow
import pandas as pd
df = pd.read_csv("data.csv")
df.to_parquet("data.parquet", compression="snappy")

Polars (fast, lower memory)
pip install polars
import polars as pl
pl.scan_csv("data.csv").sink_parquet("data.parquet")

Tips for clean conversion
- Choose Snappy or Zstd. Snappy decodes faster; Zstd compresses roughly 20% better. Either is a good default; avoid Gzip for new files, since it is much slower to decompress.
- Watch out for inferred types. A column that's numeric in 99% of rows but contains "N/A" in the rest will be read as string. Clean those values before conversion or pass an explicit schema.
- Date formats. Pass parse_dates=[...] to pandas or use DuckDB's strptime() to coerce date columns correctly.
- Verify the result. Open the resulting Parquet in the Parqui online viewer to confirm types and row counts match the original CSV.