Free online tool
In development
Convert JSON to Parquet
Convert JSON arrays or NDJSON (newline-delimited JSON) into typed, compressed .parquet — preserving nested objects, lists, and inferred types. Online tool in development; reliable methods that work today are below.
Why convert JSON to Parquet?
JSON is everywhere — APIs, logs, MongoDB exports, webhook archives. It's a great wire format and a poor analytics format:
- Size. JSON is verbose; field names repeat on every row. Parquet stores names once and compresses values column-by-column.
- No schema. Different rows can have different keys — fine for ingestion, painful for analysis. Parquet locks in a schema.
- Slow scans. Reading 10 GB of JSON to count rows means parsing 10 GB. Reading 10 GB of Parquet to count rows reads the footer (~kB).
- Nested data is first-class in Parquet. Lists, maps, and structs stay structured — you don't have to flatten or stringify them.
Convert JSON to Parquet today (without this tool)
DuckDB — NDJSON / JSON Lines
duckdb -c "COPY (SELECT * FROM read_json_auto('data.ndjson'))
TO 'data.parquet'
(FORMAT 'parquet', COMPRESSION 'zstd')"

Python (pandas)
Works for both JSON arrays and NDJSON — pass lines=True for line-delimited:
import pandas as pd
# JSON array
df = pd.read_json("data.json")
# Or NDJSON
df = pd.read_json("data.ndjson", lines=True)
df.to_parquet("data.parquet", compression="snappy")

PyArrow — preserves nested structures
import pyarrow.json as pj
import pyarrow.parquet as pq
table = pj.read_json("data.ndjson")
pq.write_table(table, "data.parquet", compression="zstd")

Things to watch out for
- JSON array vs NDJSON. A regular JSON array ([{...}, {...}]) loads into memory all at once; NDJSON / JSON Lines streams line-by-line and scales to much larger files.
- Inconsistent keys. If different objects have different keys, the schema becomes the union of all keys — missing values become nulls. This is fine for analytics, but check your row counts after conversion.
- Nested data preserved. Arrays of objects, maps, and structs are kept as native Parquet nested types — they show up correctly when you open the file in the Parqui viewer.
- Type ambiguity. A field that's sometimes a number and sometimes a string will be coerced to string by most readers. Clean the data before conversion or pass an explicit schema.