Bonus: Interactive Testing

The parser can now read data encoded with Dictionary Encoding. Let’s test it!

Data

The CSV data is in data/all.csv.

col_bool,col_integer,col_real,col_string
true,1,1.1,one
false,2,2.2,two
true,3,3.3,three
true,4,4.4,four
false,5,5.5,five
false,6,6.6,six
true,7,7.7,seven
true,8,8.8,eight

Command

To apply Dictionary Encoding, set the dictionary flag: --dictionary.

# write csv to a parquet file
cargo run write data/all.csv all.parquet --encodings col_bool=rle --dictionary

# read the parquet file
cargo run read all.parquet

Result

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
     Running `target/debug/parquet-parser read all.parquet`
shape: (8, 4)
┌──────────┬─────────────┬──────────┬────────────┐
│ col_bool ┆ col_integer ┆ col_real ┆ col_string │
│ ---      ┆ ---         ┆ ---      ┆ ---        │
│ bool     ┆ i64         ┆ f64      ┆ str        │
╞══════════╪═════════════╪══════════╪════════════╡
│ true     ┆ 1           ┆ 1.1      ┆ one        │
│ false    ┆ 2           ┆ 2.2      ┆ two        │
│ true     ┆ 3           ┆ 3.3      ┆ three      │
│ true     ┆ 4           ┆ 4.4      ┆ four       │
│ false    ┆ 5           ┆ 5.5      ┆ five       │
│ false    ┆ 6           ┆ 6.6      ┆ six        │
│ true     ┆ 7           ┆ 7.7      ┆ seven      │
│ true     ┆ 8           ┆ 8.8      ┆ eight      │
└──────────┴─────────────┴──────────┴────────────┘

Metadata

The metadata shows that a new RLE_DICTIONARY encoding has been added.

cargo run metadata all.parquet
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/parquet-parser metadata all.parquet`
...
column 1:
--------------------------------------------------------------------------------
column type: INT64
column path: "col_integer"
encodings: PLAIN RLE RLE_DICTIONARY
...
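
To make the RLE_DICTIONARY entry more concrete, here is a minimal sketch of the idea behind dictionary encoding: each distinct value is stored once in a dictionary, and the column itself becomes a list of indices into that dictionary. This is an illustration only, not the parser's actual implementation. The function names `dictionary_encode` and `dictionary_decode` and the sample column are hypothetical, and the sketch leaves out the RLE/bit-packing step that Parquet applies to the indices.

```rust
use std::collections::HashMap;

/// Hypothetical helper: dictionary-encode a column of strings.
/// Returns the distinct values in first-seen order, plus one index per input value.
fn dictionary_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dictionary: Vec<String> = Vec::new();
    let mut positions: HashMap<String, u32> = HashMap::new();
    let mut indices: Vec<u32> = Vec::with_capacity(values.len());

    for &value in values {
        let index = match positions.get(value) {
            // Value already in the dictionary: reuse its index.
            Some(&existing) => existing,
            // New value: append it and remember where it lives.
            None => {
                let next = dictionary.len() as u32;
                dictionary.push(value.to_string());
                positions.insert(value.to_string(), next);
                next
            }
        };
        indices.push(index);
    }

    (dictionary, indices)
}

/// Hypothetical helper: rebuild the original column from dictionary and indices.
fn dictionary_decode(dictionary: &[String], indices: &[u32]) -> Vec<String> {
    indices
        .iter()
        .map(|&i| dictionary[i as usize].clone())
        .collect()
}

fn main() {
    // Made-up column with repeated values (not taken from data/all.csv).
    let column = ["one", "two", "one", "three", "two", "one"];

    let (dictionary, indices) = dictionary_encode(&column);
    println!("dictionary: {:?}", dictionary);
    println!("indices:    {:?}", indices);

    // Decoding restores the original values.
    let decoded = dictionary_decode(&dictionary, &indices);
    println!("decoded:    {:?}", decoded);
}
```

The saving comes from repeated values: the more often a value recurs, the more a small index replaces a full copy. In data/all.csv every integer and string is distinct, so the file is useful for checking that the encoding round-trips, not for showing a size reduction.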