Compression

Pages in a Parquet file may be compressed. The compression codec is stored in the codec field of the column metadata.

Figure: Page compression information is stored in the codec field.

The spec lists many codecs, but our parser will support only SNAPPY, alongside uncompressed pages.
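To make the codec list concrete, here is a minimal sketch that maps the numeric codec ids from the Parquet Thrift definition to their names. The helpers codec_name and is_supported are hypothetical and not part of this project; the sketch assumes the codec id is available as a plain i32, as codec.0 suggests.

```rust
/// Hypothetical helper: maps a Parquet CompressionCodec id to its name
/// in the Parquet Thrift definition. Returns None for unknown ids.
fn codec_name(id: i32) -> Option<&'static str> {
    match id {
        0 => Some("UNCOMPRESSED"),
        1 => Some("SNAPPY"),
        2 => Some("GZIP"),
        3 => Some("LZO"),
        4 => Some("BROTLI"),
        5 => Some("LZ4"),
        6 => Some("ZSTD"),
        7 => Some("LZ4_RAW"),
        _ => None,
    }
}

/// Hypothetical helper: true for the codecs this chapter's parser handles
/// (UNCOMPRESSED and SNAPPY).
fn is_supported(id: i32) -> bool {
    matches!(id, 0 | 1)
}
```

A helper like this could be used to print a codec name instead of a bare number when an unsupported codec is encountered.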

Task

decompress

Implement the decompress function in src/compression.rs.

pub fn decompress(compressed_data: Bytes, codec: CompressionCodec) -> Result<Bytes> {
    match codec {
        CompressionCodec::UNCOMPRESSED => todo!("step13: implement compression"),
        CompressionCodec::SNAPPY => todo!("step13: implement compression"),
        _ => unimplemented!("Unsupported codec: {}", codec.0),
    }
}

read_page

Update the read_page function to decompress the page data before decoding it.

pub fn read_page(data: Bytes, codec: CompressionCodec) -> Result<(Page, Bytes)> {
    // ...
}

For Snappy decompression, refer to the snap crate.
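Parquet pages use Snappy's raw block format, which begins with the uncompressed length encoded as a little-endian varint. The sketch below reads that preamble using only the standard library. The function name snappy_uncompressed_len is hypothetical; the snap crate offers snap::raw::decompress_len for the same purpose. One possible use, not required by this step, is to compare the result with the uncompressed_page_size field of the page header before decompressing.

```rust
/// Hypothetical helper: reads the uncompressed length stored at the start
/// of a raw Snappy block. Returns None if the input is empty, the varint is
/// unterminated after 5 bytes, or the value does not fit in a u32.
fn snappy_uncompressed_len(raw: &[u8]) -> Option<u32> {
    let mut value: u64 = 0;
    // A u32 varint occupies at most 5 bytes, 7 payload bits each.
    for (i, &byte) in raw.iter().take(5).enumerate() {
        value |= u64::from(byte & 0x7f) << (7 * i);
        // A clear high bit marks the last byte of the varint.
        if byte & 0x80 == 0 {
            return if value <= u64::from(u32::MAX) {
                Some(value as u32)
            } else {
                None
            };
        }
    }
    None
}
```

For example, the raw Snappy encoding of the five bytes "hello" starts with the byte 0x05, so the function returns Some(5) for it.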

Test

The test case for this step is step13_compression.

Hints and Solution

Hint (how to handle uncompressed data)

For uncompressed data, return the input unchanged from the decompress function.

Hint (how to handle snappy compression)

For Snappy-compressed data, create a snap::raw::Decoder and call its decompress_vec method.

Solution

decompress:

pub fn decompress(compressed_data: Bytes, codec: CompressionCodec) -> Result<Bytes> {
    match codec {
        // Uncompressed pages are returned as-is, without copying.
        CompressionCodec::UNCOMPRESSED => Ok(compressed_data),
        CompressionCodec::SNAPPY => {
            // Parquet uses the raw Snappy block format, not the framed stream format.
            let mut decompressor = snap::raw::Decoder::new();
            let buf = decompressor.decompress_vec(compressed_data.as_ref())?;
            Ok(Bytes::from(buf))
        }
        _ => unimplemented!("Unsupported codec: {}", codec.0),
    }
}

read_page:

pub fn read_page(data: Bytes, codec: CompressionCodec) -> Result<(Page, Bytes)> {
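    let (page_header, mut remaining) = read_thrift_metadata::<PageHeader>(data)?;
    // compressed_page_size is the size of the page body as stored in the file.
    let page_data = remaining.split_to(page_header.compressed_page_size as usize);
    // Decompress before decoding; uncompressed pages pass through unchanged.
    let mut page_data = decompress(page_data, codec)?;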
    let page = match page_header.type_ {
        // ...
    };
    Ok((page, remaining))
}
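
Note that Bytes::split_to panics if the requested size exceeds the buffer length, and a negative compressed_page_size cast with as usize becomes a very large number. The sketch below shows one way to validate the size first, using plain slices so that it is self-contained. The function split_page_body and its String error type are hypothetical, not part of this project.

```rust
/// Hypothetical helper: splits the bytes following a page header into the
/// page body (of `compressed_page_size` bytes) and the remaining bytes,
/// returning an error instead of panicking on a malformed size.
fn split_page_body(data: &[u8], compressed_page_size: i32) -> Result<(&[u8], &[u8]), String> {
    // Reject negative sizes before casting to usize.
    if compressed_page_size < 0 {
        return Err(format!("negative page size: {}", compressed_page_size));
    }
    let size = compressed_page_size as usize;
    // Reject sizes that run past the end of the available data.
    if size > data.len() {
        return Err(format!(
            "page size {} exceeds remaining {} bytes",
            size,
            data.len()
        ));
    }
    Ok(data.split_at(size))
}
```

The same two checks could be placed before the split_to call in read_page if the parser needs to handle corrupt files gracefully.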