December 28, 2009

About rle_unpack in libevecache

I have had several questions regarding the “rle_unpack” function in libevecache. In order to not repeat myself in e-mails, I decided to make a quick post describing the logic: The market rows are compressed with an odd variant of run-length-encoding. In this case, the extra “0” bytes are suppressed and encoded into one byte. The row starts with a opcode byte, which is split as follows (high bit to low bit). You can find this broken out in the struct packer_opcap.

x x x t y y y b
7 6 5 4 3 2 1 0

There “t” is a flag to say “unpack 0s” if the bit is set, and “copy from source” if the bit is not set. So if the bit is set, “XXX” bytes of “0” are copied to the output buffer, and if its not set, “XXX” bytes are copied from the input to the output. This repeats with B and YYY. After one opcode byte has been processed, the next opcode byte is read, or processing is terminated if the input buffer has been exhausted. This took a fair bit of time to figure out, with lots of sample data and market logs as the study set. How much space does it save per order row? About 10-30%, from my samples.

© 2020 Yann Ramin