You are here: Foswiki>HomeInfra Web>PPMd (13 Sep 2019, AdminUser)Edit Attach

Data compression - Prediction by Partial Matching (PPM)

PPM is fast compression algorithm with a high compression ratio for text files (and pretty "average" for anything else). Log files are a special case of text, because usually log files contain even more redundancy in text and PPMd compresses extremely good in those cases. PPM works for all kinds of files, but it excels in log file compression.

Usually logfiles are compressed about 10% smaller compared to XZ/LZMA compression, in a fraction of the time. A drawback is the decompression speed of PPM, in the same range at the compression speed.

Theory at wikipedia

https://en.wikipedia.org/wiki/Prediction_by_partial_matching

Original PPMd implementation by Dmitri Shkarin.

http://compression.ru/ds/

Derived from the above, there are many other implementations, e.g. 7-Zip and p7zip.

A stripped down version of 7-zip/p7zip (only PPMd compression) can be found on github:

https://github.com/svpv/ppmd-mini

Benchmark

Performance on Linux/Ubuntu 18.04 with a 20Mb sample webserver logfile.

The combination of first flzp and then gzip is very interesting... the combination is faster and more compressed.

Program + options Compressed size Compression ratio (org/comp) Compression time (ms) Decompression time (ms) Remarks
cp 20617071 1.0 15 15 file copy on tmpfs
lz4 -1 2367153 8.7 40 30 v 1.92
lz4 -9 1572708 13.1 440 26 v 1.92
gzip -9 1337914 15.4 406 89 v 1.6
flzp 1311552 15.7 180 86  
flzp | gzip -9 901436 22.9 250 109  
xz -9 839228 24.5 2753 91 v 5.2.4
zstd -19 805203 25.6 12043 32 v 1.43
bzip2 -9 723415 28.4 2912 313 v 1.06
ppmd-mini -9 588886 35.0 385 426 v 0.2 ppmd-mini
7za -m0=ppmd:mem=2G:o=32 519061 39.7 932 959 p7zip 16.02
zpaq -m5 393798 52.4 43150 43644 v 7.15
fp8 -8 307808 67.0 116731 113250 v6
Note: Click on header name to sort values
Topic revision: r6 - 13 Sep 2019, AdminUser
This site is powered by FoswikiCopyright © by Eddy Vervest