Data compression - Prediction by Partial Matching (PPM)
PPM is a fast compression algorithm with a high compression ratio for text files (and a fairly average one for anything else). Log files are a special case of text: they usually contain even more redundancy, and PPMd compresses them extremely well. PPM works for all kinds of files, but it excels at log file compression.
Log files usually come out about 10% smaller than with XZ/LZMA compression, in a fraction of the time. A drawback is the decompression speed of PPM, which is in the same range as its compression speed, because the decompressor has to rebuild the same model the compressor used.
Theory on Wikipedia:
https://en.wikipedia.org/wiki/Prediction_by_partial_matching
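To give a feel for why redundant text such as log files suits PPM, here is a toy sketch of the core idea: predict the next character from the longest context seen so far, and fall back to shorter contexts when the long one is unknown. This is not PPMd. Real PPM assigns probabilities, encodes explicit escape symbols and drives an arithmetic coder. The names `ToyContextModel` and `hit_rate` are made up for this illustration.

```python
from collections import Counter, defaultdict


class ToyContextModel:
    """Counts which character followed each context of length 0..max_order."""

    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(Counter)  # context string -> Counter of next chars

    def update(self, history, symbol):
        # Record the symbol under every context length, from empty up to max_order.
        for k in range(min(self.max_order, len(history)) + 1):
            self.counts[history[len(history) - k:]][symbol] += 1

    def predict(self, history):
        # Try the longest context first, then fall back to shorter ones.
        for k in range(min(self.max_order, len(history)), -1, -1):
            seen = self.counts.get(history[len(history) - k:])
            if seen:
                return seen.most_common(1)[0][0], k
        return None, -1  # nothing has been seen yet


def hit_rate(text, max_order=3):
    """Fraction of characters guessed correctly while the model learns on the fly."""
    model = ToyContextModel(max_order)
    hits = 0
    for i, ch in enumerate(text):
        history = text[max(0, i - max_order):i]
        guess, _ = model.predict(history)
        hits += guess == ch
        model.update(history, ch)
    return hits / len(text)


if __name__ == "__main__":
    log_like = "GET /index.html 200\n" * 50
    print(f"repetitive log-like text: {hit_rate(log_like):.2f}")
    print(f"short varied text:        {hit_rate('the quick brown fox jumps'):.2f}")
```

The more often a guess is right, the fewer bits a real coder needs for that character, which is why highly repetitive input compresses so well.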
Original PPMd implementation by Dmitri Shkarin.
http://compression.ru/ds/
Many other implementations are derived from the above, e.g. 7-Zip and p7zip.
A stripped-down version of 7-Zip/p7zip (PPMd compression only) can be found on GitHub:
https://github.com/svpv/ppmd-mini
Benchmark
Performance on Linux (Ubuntu 18.04) with a 20 MB sample web server log file.
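A measurement of this kind can be sketched as follows. This is only a minimal harness under stated assumptions, not the setup used for the figures on this page: it covers only the codecs in the Python standard library (gzip, bz2, xz/lzma), so PPMd and flzp are not included and would have to be added through external tools. The function name `benchmark` and the synthetic log line are made up for the example.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data):
    """Compress `data` with each codec; return (name, compressed size, seconds)."""
    codecs = {
        "gzip": gzip.compress,
        "bz2": bz2.compress,
        "xz/lzma": lzma.compress,
    }
    results = []
    for name, compress in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        results.append((name, len(packed), elapsed))
    return results


if __name__ == "__main__":
    # Synthetic, highly repetitive stand-in for an access log (an assumption);
    # replace it with the bytes of a real log file for meaningful numbers.
    line = b'127.0.0.1 - - [01/Jan/2020:00:00:00 +0000] "GET /index.html HTTP/1.1" 200 512\n'
    sample = line * 5000
    for name, size, seconds in benchmark(sample):
        ratio = 100 * size / len(sample)
        print(f"{name:8s} {size:10d} bytes  {ratio:6.2f}%  {seconds:.3f}s")
```

Timings from a single run are noisy; repeating each measurement and taking the best or median value gives more stable numbers.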
Running flzp first and then gzip is a very interesting combination: it is faster and compresses better.