Runs Decoder

We have everything we need to decode the RLE Bit-packing Hybrid encoded data. Let’s apply it to our parser.

RLE bit-packed hybrid format

For now we only handle RLE-encoded boolean values. As the table below shows, this kind of data is always prefixed with its length.

Page kind      RLE-encoded data kind   Prepend length?
Data page v1   Definition levels       Y
               Repetition levels       Y
               Dictionary indices      N
               Boolean values          Y
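When the length is prepended, the Parquet format stores it as a 4-byte little-endian unsigned integer giving the size of the encoded data in bytes. The sketch below shows one way to peel that prefix off a byte slice. The helper name split_length_prefixed is hypothetical and works on plain slices rather than the Bytes type used in this book.

```rust
/// Splits a length-prefixed buffer into (RLE payload, remaining bytes).
/// Hypothetical helper for illustration; it is not part of the book's code.
pub fn split_length_prefixed(data: &[u8]) -> Option<(&[u8], &[u8])> {
    // The prefix is a 4-byte little-endian u32 holding the payload size in bytes.
    if data.len() < 4 {
        return None;
    }
    let len = u32::from_le_bytes([data[0], data[1], data[2], data[3]]) as usize;
    let rest = &data[4..];
    if rest.len() < len {
        // The declared length overruns the buffer.
        return None;
    }
    Some(rest.split_at(len))
}
```

For a buffer that starts with the bytes 02 00 00 00, the payload is the next two bytes and everything after them is returned as the remainder.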

Task

rle_bit_packing_hybrid_decode

Implement the rle_bit_packing_hybrid_decode function in src/decoder/rle_bit_packing_hybrid.rs. It takes the encoded page data and returns a decoded vector of Scalar.

pub fn rle_bit_packing_hybrid_decode(
    encoded_data: Bytes,
    parquet_type: Type,
    bit_width: u8,
    num_values: usize,
    prepend_length: bool,
) -> Result<Vec<Scalar>> {
    todo!("step10-05: decode all runs")
}

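Because a bit-packed run always holds a multiple of 8 values, its last group may be padded with garbage. The total number of values across all runs might therefore be larger than num_values, and the surplus must be dropped.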
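A minimal sketch of that trimming, assuming a bit width of 1 and plain bool values in place of the book's Scalar. The function name decode_bit_packed_bools is hypothetical. Each byte is unpacked from its least significant bit upward, and the padding is cut off at the end.

```rust
/// Unpacks 1-bit values from bit-packed bytes and drops trailing padding.
/// Hypothetical helper for illustration; it uses bool instead of Scalar.
pub fn decode_bit_packed_bools(bytes: &[u8], num_values: usize) -> Vec<bool> {
    let mut out: Vec<bool> = bytes
        .iter()
        // Each byte holds 8 values, least significant bit first.
        .flat_map(|&b| (0..8).map(move |i| (b >> i) & 1 == 1))
        .collect();
    // Anything past num_values is padding from the last group of 8.
    out.truncate(num_values);
    out
}
```

With the single byte 0b0000_0101 and num_values set to 3, the result is true, false, true; the other five unpacked bits are discarded.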

decode_page

Update the Encoding::RLE match arm in src/decoder/mod.rs. As before, the data type is always boolean, with a bit width of 1.

pub fn decode_page(page: &Page, parquet_type: Type, num_values: usize) -> Result<Vec<Scalar>> {
    match page.encoding() {
        // ...
        Encoding::RLE => todo!("step10-05: rle bit-packing hybrid decoder"),
        // ...
    }
}

Test

The test case for this step is step10_05_runs_decoder.

Hints and Solution

Hint (steps)
  • Extract all the runs
  • Decode each run and concatenate the result
  • Handle the number of values in the final result
Solution

rle_bit_packing_hybrid_decode:

pub fn rle_bit_packing_hybrid_decode(
    encoded_data: Bytes,
    parquet_type: Type,
    bit_width: u8,
    num_values: usize,
    prepend_length: bool,
) -> Result<Vec<Scalar>> {
    let runs = read_rle_bit_packed_runs(encoded_data, bit_width, prepend_length)?;
    let mut result = Vec::with_capacity(num_values);
    for run in runs {
        let scalars = rle_bit_packing_hybrid_run_decode(run, parquet_type)?;
        result.extend(scalars);
    }
    result.truncate(num_values);
    Ok(result)
}
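To see the same three steps without the book's helper types, here is a self-contained sketch specialised to a bit width of 1. It assumes the length prefix has already been removed and returns bool values instead of Scalar. The names read_uleb128 and decode_hybrid_bools are hypothetical and are not the book's API.

```rust
/// Reads an unsigned LEB128 varint, returning (value, bytes consumed).
fn read_uleb128(data: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in data.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Decodes RLE/bit-packed hybrid data with bit width 1 into bools.
/// Illustrative only: no length prefix, bool instead of Scalar.
pub fn decode_hybrid_bools(mut data: &[u8], num_values: usize) -> Option<Vec<bool>> {
    let mut out = Vec::with_capacity(num_values);
    while out.len() < num_values && !data.is_empty() {
        let (header, used) = read_uleb128(data)?;
        data = &data[used..];
        if header & 1 == 1 {
            // Bit-packed run: (header >> 1) groups of 8 values.
            // At bit width 1, each group occupies exactly one byte.
            let groups = (header >> 1) as usize;
            if data.len() < groups {
                return None;
            }
            let (packed, rest) = data.split_at(groups);
            for &byte in packed {
                out.extend((0..8).map(|i| (byte >> i) & 1 == 1));
            }
            data = rest;
        } else {
            // RLE run: (header >> 1) repetitions of a value stored in one byte.
            let count = (header >> 1) as usize;
            let (&value, rest) = data.split_first()?;
            let wanted = count.min(num_values - out.len());
            out.extend(std::iter::repeat(value & 1 == 1).take(wanted));
            data = rest;
        }
    }
    // Drop padding from the final bit-packed group.
    out.truncate(num_values);
    Some(out)
}
```

For example, the bytes 03 05 06 01 describe one bit-packed group (header 03, data 05) followed by an RLE run of three true values (header 06, value 01), giving 11 values in total.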

decode_page:

pub fn decode_page(page: &Page, parquet_type: Type, num_values: usize) -> Result<Vec<Scalar>> {
    match page.encoding() {
        // ...
        Encoding::RLE => {
            rle_bit_packing_hybrid_decode(page.encoded_values(), parquet_type, 1, num_values, true)
        }
        // ...
    }
}