Bonus: Interactive Testing

This time, we can read a real Parquet file rather than one created by our own CLI.

Data

The data is the Titanic dataset, which you can download with:

curl -L -o titanic.parquet "https://huggingface.co/datasets/BIT/titanic-dataset/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet"
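Before running the CLI, you can sanity-check that the download really is a Parquet file and not, for example, an HTML error page saved by curl. Every Parquet file starts with the 4-byte magic PAR1 and ends with a 4-byte little-endian footer length followed by PAR1 again. The Rust sketch below checks only that framing, using just the standard library; parquet_footer_len and check_parquet_file are hypothetical helpers, not part of the parquet-parser CLI.

```rust
/// Parquet magic bytes: every file starts and ends with these 4 bytes.
const PARQUET_MAGIC: &[u8] = b"PAR1";

/// Sketch: validate the Parquet framing of an in-memory buffer and
/// return the footer (FileMetaData) length in bytes.
/// Assumed layout: "PAR1" | data ... | footer | footer_len (u32 LE) | "PAR1"
fn parquet_footer_len(bytes: &[u8]) -> Result<u32, String> {
    // Smallest possible framing: leading magic + 4-byte length + trailing magic.
    if bytes.len() < 12 {
        return Err("file too small to be Parquet".to_string());
    }
    if !bytes.starts_with(PARQUET_MAGIC) {
        return Err("missing leading PAR1 magic".to_string());
    }
    if !bytes.ends_with(PARQUET_MAGIC) {
        return Err("missing trailing PAR1 magic".to_string());
    }
    let n = bytes.len();
    // The 4 bytes just before the trailing magic hold the footer length.
    let t = &bytes[n - 8..n - 4];
    let len = u32::from_le_bytes([t[0], t[1], t[2], t[3]]);
    // The footer must fit between the leading magic and the length field.
    if len as usize > n - 12 {
        return Err("footer length exceeds file size".to_string());
    }
    Ok(len)
}

/// Hypothetical helper: read a file from disk and run the framing check.
fn check_parquet_file(path: &str) -> Result<u32, String> {
    let bytes = std::fs::read(path).map_err(|e| format!("could not read {}: {}", path, e))?;
    parquet_footer_len(&bytes)
}
```

For a correctly downloaded file, check_parquet_file("titanic.parquet") should return Ok with the footer size; an error page saved in its place fails the magic-byte check. This does not parse the footer itself, which is what the metadata command below does.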

Command

cargo run read titanic.parquet

Result

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/parquet-parser read titanic.parquet`
shape: (891, 15)
┌──────────┬────────┬────────┬──────┬───┬──────┬─────────────┬───────┬───────┐
│ survived ┆ pclass ┆ sex    ┆ age  ┆ … ┆ deck ┆ embark_town ┆ alive ┆ alone │
│ ---      ┆ ---    ┆ ---    ┆ ---  ┆   ┆ ---  ┆ ---         ┆ ---   ┆ ---   │
│ i64      ┆ i64    ┆ str    ┆ f64  ┆   ┆ str  ┆ str         ┆ str   ┆ bool  │
╞══════════╪════════╪════════╪══════╪═══╪══════╪═════════════╪═══════╪═══════╡
│ 0        ┆ 3      ┆ male   ┆ 22.0 ┆ … ┆ null ┆ Southampton ┆ no    ┆ false │
│ 1        ┆ 1      ┆ female ┆ 38.0 ┆ … ┆ C    ┆ Cherbourg   ┆ yes   ┆ false │
│ 1        ┆ 3      ┆ female ┆ 26.0 ┆ … ┆ null ┆ Southampton ┆ yes   ┆ true  │
│ 1        ┆ 1      ┆ female ┆ 35.0 ┆ … ┆ C    ┆ Southampton ┆ yes   ┆ false │
│ 0        ┆ 3      ┆ male   ┆ 35.0 ┆ … ┆ null ┆ Southampton ┆ no    ┆ true  │
│ …        ┆ …      ┆ …      ┆ …    ┆ … ┆ …    ┆ …           ┆ …     ┆ …     │
│ 0        ┆ 2      ┆ male   ┆ 27.0 ┆ … ┆ null ┆ Southampton ┆ no    ┆ true  │
│ 1        ┆ 1      ┆ female ┆ 19.0 ┆ … ┆ B    ┆ Southampton ┆ yes   ┆ true  │
│ 0        ┆ 3      ┆ female ┆ null ┆ … ┆ null ┆ Southampton ┆ no    ┆ false │
│ 1        ┆ 1      ┆ male   ┆ 26.0 ┆ … ┆ C    ┆ Cherbourg   ┆ yes   ┆ true  │
│ 0        ┆ 3      ┆ male   ┆ 32.0 ┆ … ┆ null ┆ Queenstown  ┆ no    ┆ true  │
└──────────┴────────┴────────┴──────┴───┴──────┴─────────────┴───────┴───────┘

Metadata

If you inspect the metadata, you can see that the compression codec is SNAPPY:

cargo run metadata titanic.parquet
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
     Running `target/debug/parquet-parser metadata titanic.parquet`
...
column 0:
--------------------------------------------------------------------------------
column type: INT64
column path: "survived"
encodings: PLAIN RLE RLE_DICTIONARY
file path: N/A
file offset: 228
num of values: 891
compression: SNAPPY
total compressed size (in bytes): 224
total uncompressed size (in bytes): 219
...
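The two size fields let you check how much Snappy actually helps for a given column chunk. A minimal Rust sketch working on the numbers the metadata command prints; compression_ratio and bytes_saved are hypothetical helpers, not functions of the CLI:

```rust
/// Ratio of compressed to uncompressed size for one column chunk.
/// Below 1.0: the codec saved space. Above 1.0: the chunk grew.
/// Returns None when the uncompressed size is zero (ratio undefined).
fn compression_ratio(compressed: u64, uncompressed: u64) -> Option<f64> {
    if uncompressed == 0 {
        return None;
    }
    Some(compressed as f64 / uncompressed as f64)
}

/// Bytes saved by compression (negative if the chunk got bigger).
fn bytes_saved(compressed: u64, uncompressed: u64) -> i64 {
    uncompressed as i64 - compressed as i64
}
```

Plugging in the values for the survived column, compression_ratio(224, 219) is about 1.02 and bytes_saved(224, 219) is -5, so this particular chunk is 5 bytes larger after compression. That is plausible for a tiny column that is already dictionary-encoded: there is little left for Snappy to remove, while its own framing still adds a few bytes.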