Read this article if you use CRC32, or if you know it is insecure but think it is good enough in your case.
CRC32 - Cyclic Redundancy Check with 32 bits of output - is a widely used checksum algorithm. It is designed to detect accidental alteration of data during transmission or storage. It is not meant to be used in security-related situations. Now, my crypto-friends are probably already bored and will tell me this has been known for ages. Sure, but if it is that obvious, why do people keep using CRC32 in the wrong situations? Have a look at the following bad ideas. All are taken from real situations, and some were widely deployed:
WEP uses CRC32 for data integrity of each packet. Consequently, WEP is vulnerable to practical chop-chop attacks. Do not use CRC32 for data integrity.
CRC32 has been used to generate unique IDs. This is a bad idea too, because CRC32 collisions are far too likely. For example, the two words a1sellers and advertees produce the same checksum. Note that, in addition, they have the same length. Do not use CRC32 for unique ID generation.
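To get a feel for how quickly collisions show up, here is a minimal sketch of a birthday-style search. It assumes the standard CRC-32 as implemented by Python's `zlib.crc32`; the function name `find_crc32_collision`, the seed and the 8-byte input size are my own choices for illustration, not taken from any real ID scheme.

```python
import random
import zlib


def find_crc32_collision(seed=0, max_tries=5_000_000):
    """Draw random 8-byte inputs until two distinct ones share a CRC32."""
    rng = random.Random(seed)   # seeded only so that runs are repeatable
    seen = {}                   # checksum -> first input that produced it
    for _ in range(max_tries):
        candidate = rng.getrandbits(64).to_bytes(8, "big")
        checksum = zlib.crc32(candidate)
        earlier = seen.get(checksum)
        if earlier is not None and earlier != candidate:
            return earlier, candidate, checksum
        seen[checksum] = candidate
    raise RuntimeError("no collision found within max_tries")


if __name__ == "__main__":
    first, second, checksum = find_crc32_collision()
    print(first.hex(), second.hex(), hex(checksum))
```

With a 32-bit output, a search like this should typically need on the order of a hundred thousand inputs, which a laptop handles in well under a second.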
Outlook PST passwords are 'protected' with CRC32. As a consequence, an attacker merely needs to find any word whose CRC32 matches the original password's CRC32. Have a look at this video to see how easy it is to break PSTs. Do not use CRC32 for authentication.
CRC32 has been used for anti-virus whitelisting. The checksums of known clean files (for example, legitimate files of your favorite operating system) are gathered in a so-called whitelist. Then, whenever a file is scanned, it is first checked against the whitelist, thus speeding up the process for clean files and reducing the chance of false positives. However, this is a bad idea, because an attacker can easily craft a modified virus, for example with a random overlay, so as to match the checksum of a whitelisted clean file. Such a virus would bypass anti-virus detection. Adding the file's size to each clean file checksum does not significantly complicate the attack: crafting malware with the same CRC32 and size as a clean file takes no more than a few dozen minutes of computing. Do not use CRC32 for data matching.
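The overlay trick can be sketched in a few lines. This is an illustration under assumptions, not the code of any real attack tool: it targets the standard CRC-32 computed by Python's `zlib.crc32`, and the names `forge_suffix`, `TABLE` and `REVERSE` are hypothetical. The idea is that appending just 4 well-chosen bytes is enough to steer the checksum of any data to any chosen value, because each CRC step can be run backwards.

```python
import zlib

POLY = 0xEDB88320  # reflected polynomial of the standard CRC-32


def _make_table():
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()
# The top byte of each table entry is unique, so it identifies the index.
REVERSE = {entry >> 24: index for index, entry in enumerate(TABLE)}


def forge_suffix(prefix: bytes, target_crc: int) -> bytes:
    """Return 4 bytes such that crc32(prefix + suffix) == target_crc."""
    state = zlib.crc32(prefix) ^ 0xFFFFFFFF   # internal register after prefix
    wanted = target_crc ^ 0xFFFFFFFF          # internal register we must reach
    # Undo four "zero byte" steps to find the register value that leads
    # to `wanted` after four more bytes have been processed.
    for _ in range(4):
        index = REVERSE[wanted >> 24]
        wanted = ((wanted ^ TABLE[index]) << 8) | index
    # XORing the current register with that value gives the bytes to append.
    return (state ^ wanted).to_bytes(4, "little")


if __name__ == "__main__":
    clean = b"contents of a whitelisted clean file"
    payload = b"contents of a modified virus"
    forged = payload + forge_suffix(payload, zlib.crc32(clean))
    print(zlib.crc32(forged) == zlib.crc32(clean))
```

Note that this sketch only matches the checksum. Matching the file size as well takes extra work, as described above.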
Why isn't CRC32 good for security? There are at least two reasons. The first is that its output is only 32 bits long. This is far too small for a low collision rate with today's computers: by the birthday bound, a collision becomes likely after only about 77,000 inputs. Lenstra and Verheul recommend at least 154 bits. The second reason is that checksums do not fulfill the three mathematical properties required of cryptographic hash functions, which are pre-image resistance (also known as one-wayness), second pre-image resistance (also known as weak collision resistance) and collision resistance.
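The effect of the output size can be estimated with the usual birthday approximation. The helper below is a sketch; the name `collision_probability` is mine, and the formula assumes checksums behave like uniformly random values.

```python
import math


def collision_probability(n_items: int, bits: int = 32) -> float:
    """Approximate chance that n_items random values of `bits` bits collide."""
    space = 2 ** bits
    # Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


if __name__ == "__main__":
    for n in (1_000, 77_163, 1_000_000):
        print(n, collision_probability(n, bits=32), collision_probability(n, bits=160))
```

Around 77,000 items the 32-bit probability crosses one half, while for a 160-bit output the same item counts give a probability that is negligible.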
If CRC32 detects accidental errors, why can't it detect malicious alteration? What is so different between the two? In practice, accidental errors often occur sporadically rather than at random. CRC32 typically produces a very different checksum for similar input with only a few errors. However, a clever attacker can intentionally make an error in one location and then craft the necessary modifications in the rest of the input to compensate for the initial error. CRC32 is not designed to detect intentional modifications.
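One reason compensation is so easy is that CRC32 is linear (strictly speaking, affine) with respect to XOR: the effect of flipping chosen bits on the checksum can be computed without looking at the rest of the message. The sketch below assumes the standard CRC-32 from `zlib`; the helper names and the sample messages are made up for illustration.

```python
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def predicted_crc_after_flip(original_crc: int, delta: bytes) -> int:
    """Predict crc32(message XOR delta) from crc32(message) and delta alone."""
    zeros = bytes(len(delta))
    return original_crc ^ zlib.crc32(delta) ^ zlib.crc32(zeros)


if __name__ == "__main__":
    message = b"PAY 0100 EUR TO ALICE"
    tampered = b"PAY 9100 EUR TO ALICE"
    delta = xor_bytes(message, tampered)   # which bits the attacker flips
    predicted = predicted_crc_after_flip(zlib.crc32(message), delta)
    print(predicted == zlib.crc32(tampered))
```

A cryptographic hash function offers no such shortcut: the only way to learn the new digest is to recompute it over the whole modified input, and a keyed MAC cannot be recomputed without the key.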
I need high performance, and hash functions are slow. While this statement might be true, people usually say it without conducting benchmarks for their specific use. Performance is a complicated matter: depending on the implementation, the underlying hardware, the machine load and so on, there are cases where CRC32 will be slower than SHA1. However, reliable benchmarks of software cryptographic libraries generally show that SHA1 is 20 to 40 percent slower than CRC32 (Crypto++, LibTomCrypt, Botan). With a SHA1 chip, it is quite likely you won't notice any difference at all...
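Measuring is cheap, so there is little excuse for guessing. Here is a minimal benchmark sketch using Python's standard `zlib`, `hashlib` and `timeit` modules. The function name `benchmark`, the 1 MiB buffer and the repeat count are arbitrary choices of mine; the numbers it prints depend entirely on your machine and library builds, so treat them as a starting point rather than as a reference result.

```python
import hashlib
import os
import timeit
import zlib


def benchmark(size: int = 1 << 20, repeats: int = 20) -> dict:
    """Return the average seconds per call for CRC32 and SHA1 over `size` bytes."""
    data = os.urandom(size)
    crc_total = timeit.timeit(lambda: zlib.crc32(data), number=repeats)
    sha_total = timeit.timeit(lambda: hashlib.sha1(data).digest(), number=repeats)
    return {"crc32": crc_total / repeats, "sha1": sha_total / repeats}


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds * 1e3:.3f} ms per MiB")
```

Run it with data sizes that resemble your real workload, since small inputs and large inputs can behave quite differently.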
CRC32, MD5, SHA1, etc. are all broken. What difference does it make if I use CRC32? Actually, there is a big difference. The meaning of "broken" is quite different for these algorithms. For SHA1, collisions are theoretical or, if McDonald et al.'s recent work is confirmed, at best require very high computational power. For MD5, a few researchers have demonstrated that collisions can be found in practice. For example, read the recent work of Sotirov et al. Anybody with a good understanding of their work (or related work) can find collisions in MD5. Finally, CRC32 is the worst case: collisions can be found very easily. It requires neither specific knowledge nor special equipment. A 100-line Perl script will find collisions.
Okay... but which hash function should I choose? If you need a widely supported algorithm, I recommend you select SHA1. Otherwise, you could look into RIPEMD-160 or SHA512. Check the current status of those algorithms on the Hash Function Lounge, and be sure to keep an eye on NIST's SHA3 contest.
All those situations would have required the use of a **cryptographic hash function** such as SHA1, SHA256 or RIPEMD-160, or a hash-based MAC (HMAC) for authentication. But some of you might not be convinced yet.
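For the authentication case, here is a minimal sketch of what using an HMAC looks like with Python's standard `hmac` and `hashlib` modules. The helper names `make_tag` and `verify_tag`, the choice of SHA256 and the sample key are assumptions for illustration; key generation, storage and distribution are out of scope here.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time to avoid leaking where a mismatch occurs."""
    return hmac.compare_digest(make_tag(key, message), tag)


if __name__ == "__main__":
    key = b"example shared secret, not for real use"
    tag = make_tag(key, b"packet payload")
    print(verify_tag(key, b"packet payload", tag))     # genuine message
    print(verify_tag(key, b"packet pay1oad", tag))     # altered message
```

Unlike a CRC32, the tag cannot be recomputed or adjusted by someone who modifies the message without knowing the key.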