This article is good for general audiences and provides an introduction to data compression techniques and uses.
File compression is a technique for “squeezing” data files so that they take up less storage space, whether on a hard drive or other media. Many different kinds of software, including backup programs, operating systems, media apps, and file management utilities, use this technique. While the type of source file and the type of compression algorithm determine how well compression works, a compressed set of an average mix of files typically takes about 50 percent less space than the originals. This technology has applications ranging from archives and backups to media and software distribution.
Most compression techniques work by reducing the space that redundant information takes up in a file. The more redundancy the compression algorithm detects, the smaller the compressed file becomes. Text files, for example, may contain many repeated words or letter combinations, which can produce significant compression: as much as 80 percent in some cases.
Databases and spreadsheets also tend to compress well because they, too, typically contain repeated content. Conversely, files that have already been compressed, such as MP3s and JPEGs, have low redundancy. Compressing them further yields results only a few percent smaller than the originals. In some cases they may even become slightly larger, since the compression format adds a small amount of management data to the file.
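The effect of redundancy described above can be seen with a short experiment. The following is a minimal Python sketch, assuming the standard-library zlib module as the compressor. The sample strings and the compression_ratio helper are illustrative, and random bytes stand in for data that has already been compressed.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data)) / len(data)


# Highly redundant input: the same sentence repeated many times.
repetitive = b"the quick brown fox jumps over the lazy dog. " * 200

# Random bytes have almost no redundancy, much like an MP3 or JPEG payload.
random_like = os.urandom(len(repetitive))

print(f"repetitive text: {compression_ratio(repetitive):.1%} of original size")
print(f"random bytes:    {compression_ratio(random_like):.1%} of original size")
```

The repetitive input should shrink to a small fraction of its size, while the random input stays about the same size or grows by a few bytes of format overhead.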
Lossless vs. Lossy Compression
Compression comes in two basic types, lossless and lossy. A lossless compressed file retains all information so that decompressing it restores the original file in its entirety. Most lossless compression algorithms build upon the work of Abraham Lempel and Jacob Ziv, who in the late 1970s created the algorithms that came to be called LZ. Many later algorithms derive from this work, which is why their names begin with the same letters: LZO, LZW, LZWL, LZX, LZJB, and so on. An LZ algorithm uses an adaptive technique that analyzes the source file for strings of characters that repeat. The longer the string it can find, and the more often that string recurs through the file, the more it can compress the output file. Documents, spreadsheets, and similar files are often compressed with lossless techniques like these LZ-based algorithms.
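To make the idea of an adaptive, dictionary-building algorithm concrete, here is a simplified sketch in the style of LZ78, one of the original Lempel-Ziv schemes. It is a teaching illustration and not the implementation any particular tool uses. The function names lz78_compress and lz78_decompress are hypothetical, and the output is left as a list of pairs, not packed into bits.

```python
def lz78_compress(text: str) -> list[tuple[int, str]]:
    """Encode text as (index of longest known phrase, next character) pairs."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    output = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the match while the phrase has been seen before.
            phrase = candidate
        else:
            output.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)  # learn the new phrase
            phrase = ""
    if phrase:
        # Input ended in the middle of a known phrase: emit its prefix plus last character.
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output


def lz78_decompress(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    pieces = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)


sample = "to be or not to be, that is the question; " * 3
encoded = lz78_compress(sample)
print(len(sample), "characters ->", len(encoded), "pairs")
print("round trip OK:", lz78_decompress(encoded) == sample)
```

Because the decoder rebuilds the dictionary from the pairs themselves, nothing extra has to be stored alongside the data, and the round trip returns the original text exactly, which is what makes the scheme lossless.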
Lossy compression can often produce more compact results by discarding data that has little effect on the perceived quality of the result. Files that rely on human perception, such as images, audio, and video, often use lossy compression, since the source material may contain more detail than we can realistically perceive. For example, a photo in its raw form may take 5 MB, and using it unmodified on a web page would cause the page to load more slowly. Using an image editor and lossy compression, you might create a compressed version of that photo that is 200 KB. It may lose some of the clarity of the original but is still perfectly usable and is far quicker to download.
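The trade-off can be sketched in a few lines of Python. This is a toy illustration and not how JPEG or MP3 work: it assumes a made-up 8-bit signal and a hypothetical quantize helper that rounds each sample down to a coarser level before a lossless compressor (zlib) is applied.

```python
import math
import zlib

# Hypothetical 8-bit signal: a slow wave plus fine detail that is hard to notice.
samples = bytes(
    int(128 + 100 * math.sin(i / 50) + 5 * math.sin(i * 1.7))
    for i in range(10_000)
)


def quantize(data: bytes, step: int) -> bytes:
    """Round each sample down to a multiple of `step`, discarding fine detail."""
    return bytes((value // step) * step for value in data)


lossy = quantize(samples, 16)

lossless_size = len(zlib.compress(samples))
lossy_size = len(zlib.compress(lossy))
max_error = max(abs(a - b) for a, b in zip(samples, lossy))

print("lossless size:", lossless_size, "bytes")
print("lossy size:   ", lossy_size, "bytes")
print("largest per-sample error:", max_error)
```

The quantized version compresses further because it contains fewer distinct values, but the discarded detail cannot be recovered: decompressing it returns the coarse samples, not the originals.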
It is frequently convenient to package many files and folders into a single compressed file, for example when emailing a collection of files or distributing a complex software application. This packaged collection of files is called an archive. Some compression programs also let you combine multiple files together, providing the dual benefit of smaller size and archival packaging. Other programs, particularly in the Linux/Unix domain, only handle compression of one file at a time, so archiving usually requires a separate program.
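As an illustration of a program that archives and compresses in one step, the following sketch uses Python's standard zipfile module. The file names and contents are made up for the example, and everything is written to a temporary directory that is removed afterward.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical sample files to bundle together.
files = {
    "notes.txt": b"meeting notes\n" * 100,
    "data.csv": b"id,value\n" + b"1,42\n" * 100,
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, content in files.items():
        (root / name).write_bytes(content)

    # Create one archive that holds both files, compressing each with DEFLATE.
    archive_path = root / "bundle.zip"
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in files:
            archive.write(root / name, arcname=name)

    archive_size = archive_path.stat().st_size

    # Reopen the archive to list its members and read one back.
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        restored = archive.read("notes.txt")

print("members:", names)
print("original bytes:", sum(len(c) for c in files.values()), "archive bytes:", archive_size)
```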
Windows Compression Software
PKZIP, a commercially available utility program first introduced in the late 1980s, has become a de facto compression standard for the Microsoft Windows environment. PKZIP compresses, decompresses, and allows the creation of complex archives, saving them with the file extension .zip. Microsoft has since built ZIP support into Windows, allowing the operating system to recognize and open most .zip files automatically. Open-source compression utilities are also available, such as PeaZip, 7-Zip, and gzip. Windows also has its own built-in feature that lets you designate files, folders, and entire drives as compressed, extending the capacity of storage media.
Linux Compression Software
Linux has several useful utilities for file compression, such as bzip2, gzip, and xz. These utilities are single-purpose and compress single files only; they do not by themselves create archives. The tar utility (from “Tape ARchive”) usually handles the archiving, working in conjunction with one of these compressors. Linux, like Windows, uses the combination of compression and archiving to reduce the space some files (such as log files) take up.
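The division of labor between a single-stream compressor and an archiver can be mimicked with Python's standard gzip and tarfile modules. This is only a sketch of the idea, not a replacement for the command-line tools. The log file names and contents are invented, and the archive is built in memory so that nothing touches the disk.

```python
import gzip
import io
import tarfile

# Hypothetical log data.
logs = {
    "app.log": b"INFO request handled\n" * 500,
    "error.log": b"ERROR disk nearly full\n" * 50,
}

# gzip alone: one stream in, one compressed stream out, with no notion of multiple files.
single = gzip.compress(logs["app.log"])
print("app.log:", len(logs["app.log"]), "bytes ->", len(single), "bytes gzipped")

# tar bundles the files; the "w:gz" mode passes the tar stream through gzip.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    for name, data in logs.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Read the archive back to confirm the members survive intact.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    names = tar.getnames()
    restored = tar.extractfile("app.log").read()

print("archive members:", names)
```

The result corresponds to the familiar .tar.gz file: a tar archive of several files, compressed as a single gzip stream.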
File compression lets you pack more data into a given amount of storage space. In addition to saving space on hard drives and other media, compression can dramatically improve the speed of file downloads. The technology is available as an integral part of most modern operating systems or as stand-alone programs.