Free online tool
In development
Convert JSON to Parquet
Convert JSON arrays or NDJSON (newline-delimited JSON) into typed, compressed .parquet — preserving nested objects, lists, and inferred types. Online tool in development; reliable methods that work today are below.
Why convert JSON to Parquet?
JSON is everywhere — APIs, logs, MongoDB exports, webhook archives. It's a great wire format and a poor analytics format:
- Size. JSON is verbose; field names repeat on every row. Parquet stores names once and compresses values column-by-column.
- No schema. Different rows can have different keys — fine for ingestion, painful for analysis. Parquet locks in a schema.
- Slow scans. Reading 10 GB of JSON to count rows means parsing 10 GB. Reading 10 GB of Parquet to count rows reads the footer (~kB).
- Nested data is first-class in Parquet. Lists, maps, and structs stay structured — you don't have to flatten or stringify them.
Convert JSON to Parquet today (without this tool)
DuckDB — NDJSON / JSON Lines
duckdb -c "COPY (SELECT * FROM read_json_auto('data.ndjson'))
TO 'data.parquet'
(FORMAT 'parquet', COMPRESSION 'zstd')"

Python (pandas)
Works for both JSON arrays and NDJSON — pass lines=True for line-delimited:
import pandas as pd
# JSON array
df = pd.read_json("data.json")
# Or NDJSON
df = pd.read_json("data.ndjson", lines=True)
df.to_parquet("data.parquet", compression="snappy")

PyArrow — preserves nested structures
import pyarrow.json as pj
import pyarrow.parquet as pq
table = pj.read_json("data.ndjson")
pq.write_table(table, "data.parquet", compression="zstd")

Things to watch out for
- JSON array vs NDJSON. A regular JSON array ([{...}, {...}]) loads into memory all at once; NDJSON / JSON Lines streams line-by-line and scales to much larger files.
- Inconsistent keys. If different objects have different keys, the schema becomes the union of all keys — missing values become nulls. This is fine for analytics, but check your row counts after conversion.
- Nested data preserved. Arrays of objects, maps, and structs are kept as native Parquet nested types — they show up correctly when you open the file in the Parqui viewer.
- Type ambiguity. A field that's sometimes a number and sometimes a string will be coerced to string by most readers. Clean the data before conversion or pass an explicit schema.