|
The difference between PDF and JPEG is that, although both need a special recompression algorithm, PDF uses Deflate (the same compression as ZIP), so once its contents are unpacked they compress well with general-purpose codecs.
JPEG, on the other hand, has many variations, so the recompressor first has to be able to take apart the various kinds of JPEG, and then you have to build a completely custom codec to compress the picture data inside.
So basically it is double the work compared to PDF.
The advantage with PDF is that we can reuse a lot of the code to recompress other formats such as PNG, DOCX, ODT and SWF, since they all rely on Deflate for their (weak) compression.
For instance, a DOCX contains XML files that are compressed with Deflate. Because they are already compressed, compressing them again gains very little... but if you unravel the weak compression first and then apply a stronger one, big gains are possible.
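To make the "unravel, then compress" idea concrete, here is a minimal Python sketch. It is not the actual recompressor: it builds a made-up DOCX-like ZIP in memory (the word list, part names and helper functions are all hypothetical), uses LZMA from the standard library as a stand-in for the "stronger" codec, and only compares sizes. A real tool would also have to rebuild the original Deflate streams byte for byte when restoring the file, which this sketch does not attempt.

```python
import io
import lzma
import random
import zipfile

# Hypothetical vocabulary used to generate filler text for the sample file.
WORDS = ("party agreement hereby shall terms notice payment liability "
         "contract clause section date signature witness amount term").split()


def make_xml(seed: int, paragraphs: int) -> str:
    """Generate deterministic filler XML (a stand-in for real DOCX parts)."""
    rng = random.Random(seed)
    paras = (
        "<w:p><w:r><w:t>"
        + " ".join(rng.choice(WORDS) for _ in range(12))
        + "</w:t></w:r></w:p>"
        for _ in range(paragraphs)
    )
    return "<w:document>" + "".join(paras) + "</w:document>"


def build_sample_container() -> bytes:
    """Build a Deflate-compressed ZIP in memory, laid out loosely like a DOCX."""
    boilerplate = make_xml(seed=0, paragraphs=150)  # text shared by every part
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(1, 11):
            unique = make_xml(seed=i, paragraphs=20 + i)
            zf.writestr(f"word/part{i}.xml", unique + boilerplate)
    return buf.getvalue()


def unpack_members(container: bytes) -> bytes:
    """Undo the Deflate layer and return all members as one raw byte stream."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        for info in zf.infolist():
            name = info.filename.encode("utf-8")
            data = zf.read(info)  # zipfile inflates the member here
            # Length-prefixed name and data, so members could be split apart again.
            out.write(len(name).to_bytes(2, "big") + name)
            out.write(len(data).to_bytes(8, "big") + data)
    return out.getvalue()


container = build_sample_container()
# Case 1: compress the already-compressed file as-is.
naive = lzma.compress(container, preset=9)
# Case 2: unravel the Deflate layer first, then compress the raw contents.
recompressed = lzma.compress(unpack_members(container), preset=9)

print(f"deflate container : {len(container)} bytes")
print(f"LZMA on container : {len(naive)} bytes")
print(f"LZMA on raw parts : {len(recompressed)} bytes")
```

The exact numbers depend on the generated filler and say nothing about real documents, but the pattern should match the description above: compressing the container directly barely helps, while compressing the unpacked parts does much better, partly because the stronger codec can see repetition across all parts at once.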
Here is a test example.
1. Contract in DOCX format - 104 KB
2. DOCX compressed with 7-Zip - 95 KB
3. DOCX recompressed properly - 64 KB
So that's a gain of close to 40% (about 38%, to be exact) on a single DOCX file. Imagine if you have many of them on your computer, or if your company sends many via email or a backup service... The time and cost savings are quite significant here.
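For anyone who wants to check the arithmetic, the savings follow directly from the three sizes in the test above. A quick sketch (the function name is just for illustration):

```python
def saving_percent(original_kb: float, compressed_kb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (original_kb - compressed_kb) / original_kb * 100.0


original_kb = 104      # contract in DOCX format
seven_zip_kb = 95      # DOCX compressed with 7-Zip
recompressed_kb = 64   # DOCX recompressed properly

print(f"7-Zip on the DOCX: {saving_percent(original_kb, seven_zip_kb):.1f}% saved")
print(f"Recompressed     : {saving_percent(original_kb, recompressed_kb):.1f}% saved")
```

So compressing the already-compressed file saves under 9%, while proper recompression saves a bit over 38%.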
Now, the actual percentage gained differs between formats, and further optimizations are possible (for instance, detecting pictures and text inside a single file and compressing each with its own codec), but you can see how much potential this has.
The main thing here is that it has to be done seamlessly and it has to be fast, otherwise people will not use it. And that brings us to part #2 of the new format: multicore optimizations.
|