Projects / grzip


grzip is a high-performance file compressor based on Burrows-Wheeler Transform, Schindler Transform, Move-To-Front, and Weighted Frequency Counting. It uses the Block-Sorting Lossless Data Compression Algorithm, which has received considerable attention in recent years for both its simplicity and effectiveness. This implementation has a compression rate of 2.234 bps on the Calgary Corpus (14 files) without preprocessing filters. This is essentially an adaptation/extension of GRZipII by Ilya Grebnov.
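Two of the stages named above, the Burrows-Wheeler Transform and Move-To-Front, can be illustrated with a minimal sketch. This is not grzip's code: it sorts full rotations naively, skips the Schindler Transform and Weighted Frequency Counting stages, and the function names (`bwt`, `inverse_bwt`, `mtf_encode`, `mtf_decode`) are chosen here purely for illustration.

```python
from typing import List, Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column."""
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation each one denotes.
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last_column = bytes(data[(i - 1) % n] for i in order)
    primary = order.index(0)  # row of the sorted matrix holding the original input
    return last_column, primary


def inverse_bwt(last_column: bytes, primary: int) -> bytes:
    """Rebuild the input from the last column and the primary row index."""
    n = len(last_column)
    if n == 0:
        return b""
    # A stable sort of positions by byte value maps each first-column row
    # to the last-column row holding the same occurrence of that byte.
    perm = sorted(range(n), key=lambda i: last_column[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = perm[row]
        out.append(last_column[row])
    return bytes(out)


def mtf_encode(data: bytes) -> List[int]:
    """Move-To-Front: emit each byte's current rank, then move it to rank 0."""
    table = list(range(256))
    ranks = []
    for byte in data:
        rank = table.index(byte)
        ranks.append(rank)
        table.insert(0, table.pop(rank))
    return ranks


def mtf_decode(ranks: List[int]) -> bytes:
    """Invert mtf_encode by replaying the same table updates."""
    table = list(range(256))
    out = bytearray()
    for rank in ranks:
        byte = table.pop(rank)
        out.append(byte)
        table.insert(0, byte)
    return bytes(out)
```

For the input `b"banana"`, `bwt` returns `(b"nnbaaa", 3)` and `mtf_encode(b"nnbaaa")` returns `[110, 0, 99, 99, 0, 0]`. The repeated bytes that the transform groups together turn into runs of small ranks, which is what a later entropy-coding stage exploits.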

Recent releases

  •  02 Feb 2007 09:55

    Release Notes: The grzip package now includes additional scripts such as grzcat, grzless, and grzdiff. Other minor corrections and enhancements have been made.

    •  06 Jan 2007 01:11

      Release Notes: The Makefile has been reworked, and all compilation warnings have been removed. This release should be 64-bit safe. New options have been added to work with stdin and stdout (allowing grzip to compress files larger than 2GB on 32-bit systems).

      •  03 Jan 2007 00:00

        Release Notes: The Makefile has been reworked, and all compilation warnings have been removed. A new -r option for removal of input files has been added. While this is still a work in progress (grzip is reported to be 64-bit unsafe), grzip provides very good compression while generally being not much slower than bzip2. For example, on a recent Linux kernel the result is almost 6 megabytes smaller than with bzip2, and compression is also faster than bzip2 in that case.

        •  02 Jan 2007 04:10

          No changes have been submitted for this release.

          Recent comments

          02 Jan 2007 13:35 klausman

          Re: Not 64-bit safe, and horrible coding...
          Thanks. That (and our discussion here) should provide enough hints to anyone who's willing to test drive.

          Unfortunately my C knowledge is so bad that I wouldn't improve a single thing if I laid hands on it. Otherwise, I'd be glad to do some polishing.

          02 Jan 2007 13:28 demailly

          Re: Not 64-bit safe, and horrible coding...

          > Thing is, the program's status is marked as "5 - Production/Stable". And that it definitely isn't.

          OK, that was a wrong interpretation of mine - my tests didn't show any problem on x86 (at least after I corrected one rather dull segfault problem) - and comments from testers on the Internet, mostly on Windows, didn't mention any issues. I have anyway recategorized 'grzip' as alpha since it is still a work in progress. It seems nevertheless more reliable than that on x86.

          02 Jan 2007 11:15 klausman

          Re: Not 64-bit safe, and horrible coding...

          Thing is, the program's status is marked as "5 - Production/Stable". And that it definitely isn't.

          I do see room for improvement in the area of file compressors. Both bzip2 and gzip are nice attempts at the problem -- but they aren't anywhere near the leading edge of compression research/information theory.

          My comment was mainly intended to keep users from compressing all the essays and texts they've written, only to find out they're gone because they unfortunately use a 64-bit machine and OS. Sure, one should test-drive programs before entrusting them with data (and they should have backups).

          02 Jan 2007 11:05 demailly

          Re: Not 64-bit safe, and horrible coding...
          Thanks to Tobias Klausmann for his detailed comments. First of all, I have to say that most of the code is not mine: Ilya Grebnov is the author of 99% of it. My early tests were made on a 32-bit Intel machine, without any attempt on my part to make the code more portable.

          The sole purpose of this FM announcement was to get Ilya Grebnov's efforts more widely known, as I think grzip is still a very promising program in spite of its obvious current limitations (that's why it's still only at version 0.2.5!).

          (1) As far as rate of compression is concerned, grzip *always performs better* than other similar open source compressors (at least in all the tests I made...).

          (2) As far as execution time is concerned, grzip is sometimes slower than bzip2 by maybe 50%, sometimes faster by a similar margin, depending on the nature of the files. An interesting case is tar archives of recent Linux kernels: while compressing better than bzip2 (the gain is almost 6 MBytes!), grzip also requires less time in that particular case.

          - Getting 64-bit safe code is probably only a matter of getting people working on it, which is one of the additional reasons I felt it useful to advertise this program.

          All in all, I still believe that grzip could be turned into a very useful general purpose compressor after a few iterations.
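Size comparisons like the ones debated in this thread are easy to reproduce for the compressors that ship with Python. The sketch below assumes "bps" is read as bits of compressed output per input byte, which the page does not spell out. It covers only zlib, bz2, and lzma from the standard library; grzip itself has no Python binding that this sketch relies on, so it would have to be measured separately. The names `bits_per_byte` and `compare_codecs` are hypothetical helpers.

```python
import bz2
import lzma
import zlib
from typing import Dict


def bits_per_byte(original_size: int, compressed_size: int) -> float:
    """Compressed bits spent per byte of input (lower is better)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 8.0 * compressed_size / original_size


def compare_codecs(data: bytes) -> Dict[str, float]:
    """Compress the same buffer with each stdlib codec and report its rate."""
    codecs = {
        "zlib": zlib.compress,
        "bz2": bz2.compress,
        "lzma": lzma.compress,
    }
    return {
        name: bits_per_byte(len(data), len(compress(data)))
        for name, compress in codecs.items()
    }
```

Running `compare_codecs` on the contents of a tar archive, and timing each call separately, gives figures that can be set against the size and speed claims made here. Results depend heavily on the input, as point (2) already notes.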

          02 Jan 2007 06:23 klausman

          Not 64-bit safe, and horrible coding...

          First off: use the FM download links, the package linked from the homepage doesn't include the same files.

          Both packages also have these troubles (which stem from identical or very similar source code):

          - decompression on 64-bit systems doesn't work at all (I tested a variety of files and none of them could be unpacked: CRC errors).

          - the source code is very hairy. Try compiling it with -Wall instead of the author's -Wno-error (sic!). No wonder it's not 64-bit safe.

          - the makefile is broken. Hard-coded CPU-specific optimization (-march=pentium) and over-the-top optimization (-O7) are both bad ideas. As is -ffast-math.

          In benchmarks (take those with a grain of salt, the binary is broken!), I found that compression is significantly slower than bzip2 (which isn't fast, either). And the resulting files weren't noticeably smaller than when compressed with gzip or bzip2.

          In short: if you're on a 64-bit machine, avoid.

          Word of caution: once you compress a file, the uncompressed file gets deleted (bzip2 and gzip do the same). Unfortunately, with that, your file is gone forever as you can't decompress it!
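One way to guard against the data loss described above is to delete the original only after the compressed file has been read back and decompressed to identical bytes. The sketch below is a generic wrapper, not part of grzip: `compress_then_verify` is a hypothetical helper, and Python's `bz2` functions stand in for the compressor because grzip's own interface is not assumed here.

```python
import bz2
import os


def compress_then_verify(path, compress=bz2.compress,
                         decompress=bz2.decompress, suffix=".bz2"):
    """Write path + suffix, then remove the original only if a round trip matches."""
    with open(path, "rb") as src:
        original = src.read()

    out_path = path + suffix
    with open(out_path, "wb") as dst:
        dst.write(compress(original))

    # Read the file back from disk so the check covers what was actually stored.
    with open(out_path, "rb") as check:
        restored = decompress(check.read())

    if restored != original:
        os.remove(out_path)  # discard the unusable archive, keep the original
        raise RuntimeError("round-trip mismatch; original file kept")

    os.remove(path)  # safe to delete: the archive is known to decompress correctly
    return out_path
```

If the decompressor itself raises (for instance on a CRC error), the exception propagates before the original is removed, so the input file survives in that case as well. The whole file is held in memory, so this form only suits inputs of moderate size.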

