Reasons to compress
by Kay Ewbank
Data compression has taken on a new importance as we move to the cloud. Kay Ewbank evaluates the tools and the techniques available.
HardCopy Issue: 58 | Published: November 1, 2012
For a while, it looked as though the idea of saving space when storing data had gone out of fashion along with kipper ties and flared trousers. The price per gigabyte of storage continues to fall, and no-one needs to worry about how much space their data takes – right? Well, not quite right. It’s true that local storage continues to look inexpensive, but the increasing move to cloud-based storage has implications both in terms of cost and convenience. When you’re paying per gigabyte of storage, you want to be efficient in terms of how much space you’re using. There’s also the cost in terms of time and money of uploading and downloading that data.
This is because the way in which people interact with applications and data has changed radically in recent times. Not so long ago, no-one would have expected to get the most recent data on their laptop once they’d left the office; you copied the data files before you left and hoped things didn’t change that much while you were away. The idea of using a device such as a smartphone to work with large amounts of data wasn’t even in the laughable category. Time and hardware move on, and users now expect their data to be available at the click of a button. That usually means a lot of data being uploaded and downloaded across expensive connection routes.
Modern compression software does a lot more than simply squash data files into smaller spaces. Multiple files can be compressed and combined into a single ZIP package, and the more comprehensive packages include built-in encryption options that let you create packages that are both compressed and encrypted, so the data contained within the file is protected. Encryption can be particularly important if documents are stored in the cloud, as it provides an additional level of privacy for commercially sensitive or personal data.
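As a rough illustration of combining several files into one compressed ZIP package, here is a minimal sketch using Python’s standard zipfile module. The file names and the make_package helper are invented for the example, and encryption is not shown because the standard library module cannot write encrypted archives.

```python
import tempfile
import zipfile
from pathlib import Path


def make_package(paths, package_path):
    """Compress each file with DEFLATE and store them all in one ZIP archive."""
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # arcname keeps only the file name, not the full temporary path
            archive.write(path, arcname=Path(path).name)
    return package_path


with tempfile.TemporaryDirectory() as workdir:
    work = Path(workdir)
    sources = []
    for name in ("report.txt", "notes.txt"):  # hypothetical documents
        path = work / name
        path.write_text("the same sentence repeated many times\n" * 1000)
        sources.append(path)

    package = make_package(sources, work / "package.zip")

    original_size = sum(p.stat().st_size for p in sources)
    package_size = package.stat().st_size
    with zipfile.ZipFile(package) as archive:
        package_names = archive.namelist()
```

Highly repetitive text like this compresses very well; real documents will usually show more modest savings.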
Many companies routinely use compression software for documents that are being backed up to reduce both the amount of storage used and the time taken to copy the data, although this does bring the overhead of needing to decompress the documents if they are required for use outside the backup.
Backup tools such as CA ARCserve and Symantec’s Backup Exec include options for compressing and encrypting the data as it is backed up. They can also use sophisticated techniques to reduce the unnecessary duplication of data that is inevitable in a modern computing environment.
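The general idea behind de-duplication can be sketched in a few lines: store each distinct block of data once, keyed by a hash of its contents, and keep a list of hashes that says how to rebuild the original. This is only an illustration of the principle in Python, with made-up data and helper names; it is not a description of how ARCserve or Backup Exec work internally.

```python
import hashlib


def deduplicate(blocks):
    """Store each distinct block once and record the order needed to rebuild."""
    store = {}   # digest -> block contents, kept only once
    recipe = []  # ordered digests describing the original sequence
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original sequence of blocks from the store and the recipe."""
    return [store[digest] for digest in recipe]


# Hypothetical backup set in which the same attachment appears three times
blocks = [b"logo image bytes", b"report body", b"logo image bytes", b"logo image bytes"]
store, recipe = deduplicate(blocks)
rebuilt = restore(store, recipe)
```

Here four blocks go in but only two are actually stored, and the recipe is enough to restore all four.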
Another reason why compression and de-duplication are coming back into fashion is that the software is a lot easier to use, quicker and better integrated than it ever was in the past. On a Windows machine, compression software such as WinZip uses a shell extension to add itself to Windows Explorer so that it becomes available directly from My Computer, Windows Explorer and the desktop itself. Users of Microsoft Office software will also find compression software available within their applications, enabling them to send emails with compressed attachments, for example. Depending on the way you’ve configured the software, compression packages can work transparently in the background so you don’t even have to decide which files need to be compressed.
While the standard image of compression software is as a utility on the desktop, an increasingly important use is coming from software developers. There are components and libraries available that allow you to add your own compression options to the software you’re developing so that the amount of space required by data files is lowered both in storage and when it comes to data transfer.
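To give a flavour of what adding compression to your own software looks like, here is a minimal sketch using Python’s built-in zlib module rather than any of the commercial components discussed in this article. The pack and unpack helpers and the sample record are invented for the example.

```python
import zlib


def pack(data: bytes, level: int = 6) -> bytes:
    """Compress a block of bytes before it is stored or transmitted."""
    return zlib.compress(data, level)


def unpack(blob: bytes) -> bytes:
    """Recover the original bytes from a compressed block."""
    return zlib.decompress(blob)


# Hypothetical application data with a lot of repetition
record = b'{"customer": "example", "status": "active"}\n' * 2000
packed = pack(record)
restored = unpack(packed)
```

The application stores or sends the packed bytes and calls unpack when it needs the data again, so the saving applies both on disk and in transit.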
The need to take up the minimum amount of space when storing files has meant the market is always willing to try a new compression format. One of the earliest used on personal computers was ARC but this was quickly replaced in popularity by ZIP files. ZIP is a format that was invented by a developer called Phil Katz who created a utility called PKZIP (Katz had previously brought the ARC format to the PC through a utility called PKARC). PKZIP was a command line program, so WinZip was developed as a more user friendly front-end for working with ZIP format files on machines running Windows.
WinZip started out with very generous free trial conditions: while you were supposed to pay for it, the only consequence of not registering and paying after the trial version expired was an increase in the severity of the reminders you received when opening the software. This gave WinZip a dominant position in the market which it retained even when rival compression packages were developed that beat WinZip by offering better compression and new options such as encryption and splitting of compressed files. The more recent versions of WinZip support a more effective compression method and use the extension ZIPX to indicate files that use it.
One reason why ZIP has retained its popularity is that, from Windows XP onwards, it has been possible to create compressed versions of files and folders in ZIP format using Windows’ own Compressed Folder option. However, there have always been alternatives. RAR is one of the more popular. The name is short for Roshal Archive, Eugene Roshal being the software developer who designed the method. RAR gained popularity because it offered better compression than ZIP, and because at the time it was developed it could handle larger archives by splitting the compressed files and re-joining them on decompression. The built-in support for multi-volume compressed files meant that you could extract a file from a single part of the compressed set without needing all the pieces. RAR would also prompt the user for the next part of its file set if that wasn’t in the expected location.
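The basic split-and-rejoin idea can be sketched as follows. This is a deliberately simple Python illustration, not the RAR volume format: it cuts one compressed stream into fixed-size pieces, so unlike RAR it needs every piece before anything can be extracted. The helper names and the volume size are assumptions made for the example.

```python
import zlib


def split_volumes(data: bytes, volume_size: int) -> list[bytes]:
    """Cut a compressed stream into pieces no larger than volume_size bytes."""
    if volume_size <= 0:
        raise ValueError("volume_size must be positive")
    return [data[i:i + volume_size] for i in range(0, len(data), volume_size)]


def join_volumes(volumes: list[bytes]) -> bytes:
    """Re-join the pieces in order to recover the compressed stream."""
    return b"".join(volumes)


payload = bytes(range(256)) * 400      # hypothetical file contents
archive = zlib.compress(payload)       # compress first, then split
volumes = split_volumes(archive, 64)   # 64-byte volumes, purely for illustration
rejoined = join_volumes(volumes)
recovered = zlib.decompress(rejoined)
```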
Another benefit offered by RAR was support for recovery records which lower the risk of being unable to extract compressed files because of corruption in the compression process. The RAR format came with a stronger encryption method than ZIP, although newer versions use AES encryption. It also offers the ability to encrypt the filenames in the archive so that it isn’t possible to view the filenames and sizes of the files stored within the archive without supplying the correct password. The superior technical abilities of RAR meant it gained popularity even though the main application offering it, WinRAR, was only available as a commercial application.
LHA and LZH are related formats that were developed in Japan and are still used in some Japanese applications and in games such as Doom. Windows from XP onwards also provides LZH compression in its Japanese version because of the popularity of LZH in Japan. LHA offers the benefit of being able to create very small self-extracting EXE files that the recipient can extract without needing a compression program.
WinZip
Probably the best known of the packages providing ways to compress and extract files, WinZip has been available for Windows since the mid-1990s. It started life as a friendlier front-end for PKZIP, and originally was limited to working with ZIP format compressed files. Versions are now available for other operating systems including Mac, iOS and Android, and WinZip can be used to open files in any of the main compressed formats, including ZIP, ZIPX, RAR, 7Z, TAR and GZIP. For Windows users, WinZip offers the advantage of longstanding familiarity; most users will have encountered it at some point in the past. Indeed, many people think it is part of Windows, although this has never been the case.
In many ways, WinZip suffers from its popularity and the fact it’s been around for so long. A lot of people got used to using WinZip as though it were free, and then complained about the harshness of the warnings when the free trial period expired. They also complained that it only worked with ZIP files, and that the compressed files it created were larger than those from rival packages.
Another criticism of early versions of WinZip concerns the way its encryption used to work. For example, when used to encrypt a file, it encrypted the contents but left the filename and size unencrypted, leaving this information visible to anyone attempting to crack the encryption. WinZip before version 9 also used its own less-than-adequate encryption techniques that were relatively easy to crack, no matter what the strength of the password. More recent versions of WinZip offer the option of using AES (Advanced Encryption Standard), with a choice of strengths. This still leaves WinZip open to the criticism that it uses symmetric encryption, with a single password for both encrypting and decrypting the ZIP archive. This means the choice of password is vital, as anyone who acquires one of your encrypted files has unlimited attempts to guess your password.
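The point about unlimited guesses can be illustrated with a short sketch. In password-based symmetric encryption the key is derived from the password, so anyone holding the file can derive keys from candidate passwords and test them offline for as long as they like. The example below uses PBKDF2 from Python’s standard library; the iteration count, salt size and word list are illustrative assumptions rather than WinZip’s actual parameters, and the comparison against a known key stands in for whatever check a real archive format allows.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not taken from any product


def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch a password into a 256-bit key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )


def find_password(candidates, salt, target_key):
    """Try each candidate offline; nothing limits the number of attempts."""
    for guess in candidates:
        if hmac.compare_digest(derive_key(guess, salt), target_key):
            return guess
    return None


salt = os.urandom(16)
key = derive_key("sunshine", salt)  # a weak password chosen by the user
wordlist = ["password", "letmein", "sunshine", "qwerty"]
found = find_password(wordlist, salt, key)
```

A common password falls to a short word list however strong the cipher is, which is why a long, unpredictable password matters more than the choice of AES key size.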
WinZip has now been acquired by Corel Corporation and so benefits from having a larger software company behind it. Improvements in recent editions are aimed at areas such as working with data in the cloud. The most recent release lets you connect to cloud services such as Google Drive, SkyDrive and Dropbox, and zip or unzip files for uploading and sharing. WinZip also now has an online app called ZipShare that can be used to share files on Facebook, Twitter or LinkedIn.
NXPowerLite
Many users think of NXPowerLite purely as a PowerPoint compressor because this is what the original version of the software did, but in fact it works with JPEG images, PDF files and documents produced by any Microsoft Office application. It comes in versions for both desktop and servers.
Unlike straightforward compression software that looks for standard patterns of data that can be compressed, NXPowerLite uses knowledge about the file formats to remove unnecessary elements added by the software. In particular, NXPowerLite removes the extra information stored when PowerPoint carries out fast saves, which can leave a file containing multiple copies of the same data. It also converts any embedded graphics to formats and resolutions that require less space. Used on a PowerPoint presentation, NXPowerLite can reduce the space used by up to 95 per cent.
The other advantage of NXPowerLite is that the files remain in their original format which means they can be opened and used as they stand without the need to unzip or decompress them. This is a big advantage given that, by their very nature, many PowerPoint presentations are designed to be shared with other people.
While NXPowerLite remains a great tool for use with PowerPoint, it has moved on since the early days and now works with a wider range of file formats. The desktop edition can be used from within Microsoft Office to compress PowerPoint, Word, Excel, JPEG and PDF files. It can also be used from within Microsoft Outlook to compress files you send as email attachments. The server version of NXPowerLite also works on PowerPoint, Word, Excel and JPEG files, replacing documents with smaller optimised versions.
PowerTCP Zip Compression
If you’re developing applications, you can still minimise the amount of space your data files require by using components such as PowerTCP Zip Compression. There are versions for .NET (100 per cent managed code) and ActiveX. Either version can be used to compress and decompress data files.
As the name suggests, these create files in ZIP format, and the components can be used in their simplest form by adding a single line of code to your application. They also provide the option of compressing data in memory using the Stream interface, so if you needed to send data to a client app in a Web browser, for example, you could do so without having to create a temporary file to hold the compressed version. The compressed files can be spanned across multiple disks, and you can create split files so files that are to be transmitted don’t get too large.
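The in-memory approach looks much the same in any language. The sketch below shows the idea using Python’s standard zipfile and io modules rather than the PowerTCP API: the archive is written to a memory buffer and the resulting bytes could be sent straight to a client without a temporary file. The helper name and sample data are invented for the example.

```python
import io
import zipfile


def zip_to_memory(name: str, data: bytes) -> bytes:
    """Build a ZIP archive in a memory buffer and return its raw bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    return buffer.getvalue()


def unzip_from_memory(blob: bytes, name: str) -> bytes:
    """Read one entry back out of an archive held entirely in memory."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return archive.read(name)


report = b"date,region,total\n" + b"2012-11-01,north,1000\n" * 500  # hypothetical data
blob = zip_to_memory("report.csv", report)
round_trip = unzip_from_memory(blob, "report.csv")
```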
Earlier versions of PowerTCP Zip could cause problems when used on very large files as there was a limit of 4GB for the file to be archived, but this restriction was lifted a couple of years ago with the addition of support for 64-bit ZIP extensions. The components can be used to create self-extracting compressed files so recipients don’t need to have the extraction software installed. PowerTCP also has the option of letting you encrypt your compressed files using AES data encryption. PowerTCP Zip also benefits from good tutorials showing how to use the software, along with sample apps that come with the code in both C# and Visual Basic. Sample apps are included as both Windows and Web applications.
Xceed Zip
Xceed Zip comes in ActiveX and .NET libraries for adding compression features to your applications. The ActiveX component is fully self-contained and is written with ATL 3.0 to avoid having external dependencies, while the .NET library is written as 100 per cent managed code. Xceed Zip can be used to create compressed files with AES encryption, and to create self-extracting zips. The software is well thought out with options such as the ability to work in the background without blocking use of the files being encrypted, and the option of using wildcards to select both files and folders. In addition to ZIP and GZIP, Xceed Zip supports TAR format compressed files.
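Selecting files by wildcard and writing them to a gzip-compressed TAR archive can be sketched with Python’s standard library. This is not Xceed’s API, just an illustration of the same idea; the file names, the pattern and the archive_matching helper are assumptions made for the example.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_matching(folder: Path, pattern: str, archive_path: Path) -> list[str]:
    """Add every file in folder that matches the wildcard to a .tar.gz archive."""
    matched = sorted(folder.glob(pattern))
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in matched:
            archive.add(path, arcname=path.name)
    return [path.name for path in matched]


with tempfile.TemporaryDirectory() as workdir:
    folder = Path(workdir)
    for name in ("a.csv", "b.csv", "c.log"):  # hypothetical data files
        (folder / name).write_text("sample line\n" * 100)

    selected = archive_matching(folder, "*.csv", folder / "data.tar.gz")

    with tarfile.open(folder / "data.tar.gz", "r:gz") as archive:
        archived_names = archive.getnames()
```

Only the two CSV files end up in the archive; the log file is left out because it does not match the pattern.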
Xceed Zip is part of a library of applications, and this gives you access to operations such as working with remote files across FTP and SSL links, including the use of HTTP proxy servers. The software also comes in a number of versions including Xceed Real-Time Zip for .NET which can be used to create applications that can read and write Zip archives without needing intermediate disk storage or memory.
Another version provides support for Windows Phone mobile apps so you can create apps using Silverlight for Windows Phone 7 or XNA Framework 4.0, using the component to compress and decompress data files on the phone.