How to Calculate CRC Checksum in Linux using Cksum Command

by Himanshu Arora on July 13, 2012

A checksum is used to verify the integrity of data. Suppose a file is being copied over a network or between disks, and an event such as a dropped connection or a sudden reboot leaves the copy incomplete.

How would you verify the integrity of the data afterwards? One way is to compare checksums. There are various ways to calculate a checksum, and a CRC is one of them. For example, in one of our articles (IP header checksum) we discussed how to find the checksum of an IP header. In this article, we will focus on the Linux ‘cksum’ command, which calculates a CRC checksum of files or of data provided on standard input.

What is CRC?

CRC stands for cyclic redundancy check.

A checksum can be calculated by applying a cyclic redundancy check (CRC) to the data being communicated. A CRC code, or checksum, is attached to each block of data that travels over the communication channel. When the block reaches its destination, the same calculation is applied again to generate a new checksum value. If the checksum generated at the destination matches the one attached to the block, the data is assumed to be uncorrupted and can be used; if the two values differ, the data was corrupted in transit.

The name CRC comes from the following:

This mechanism is based on the fundamentals of cyclic codes (hence cyclic).
The code attached to the data as a checksum is redundant, i.e. it adds no new information to the data being transferred (hence redundancy).
It's a check (hence check).
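
For readers who want to see what such a calculation looks like, here is a rough sketch, not the actual cksum source. It assumes the algorithm that the POSIX specification describes for cksum: a 32-bit CRC with generator polynomial 0x04C11DB7, processed most significant bit first, with the byte count of the input appended (least significant byte first) and the final value complemented. The function name posix_crc and its variables are made up for this illustration, and a bit-by-bit loop in shell is far too slow for real files.

```shell
# posix_crc: read data on stdin and print its CRC as a decimal number.
# Illustrative sketch only; variables are global to stay within plain sh.
posix_crc() {
    # od -An -v -tu1 prints every input byte as an unsigned decimal number.
    bytes=$(od -An -v -tu1)

    # Count the input bytes.
    len=0
    for b in $bytes; do
        len=$((len + 1))
    done

    # Append the byte count, least significant byte first.
    while [ "$len" -gt 0 ]; do
        bytes="$bytes $((len & 255))"
        len=$((len >> 8))
    done

    crc=0
    for b in $bytes; do
        # Feed the next byte into the top 8 bits of the register.
        crc=$(( crc ^ (b << 24) ))
        i=0
        while [ "$i" -lt 8 ]; do
            if [ $(( crc & 0x80000000 )) -ne 0 ]; then
                # Top bit set: shift left and subtract (XOR) the polynomial.
                crc=$(( ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFFFFFF ))
            fi
            i=$((i + 1))
        done
    done

    # The final CRC is the bitwise complement of the remainder.
    echo $(( ~crc & 0xFFFFFFFF ))
}

printf 'Hi\n' | posix_crc
```

On a system whose cksum follows POSIX, the number printed here should match the first field of the output of: printf 'Hi\n' | cksum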

The cksum command

The cksum command computes the cyclic redundancy check (CRC) of each file passed to it as an argument. A CRC is useful whenever data integrity needs to be verified. Using the cksum command, one can compare the checksum of the destination file with that of the source file to determine whether the data transfer was successful.

Besides the CRC value, the command also prints the file size in bytes and the file name. It exits with status zero on success; any other status indicates failure.

Detailed information on this command is available by typing the following at the command prompt:
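
The source/destination comparison described above could be scripted roughly as follows. This is only a sketch: the helper name same_cksum and the file paths are invented for the example. It reads each file through stdin so that cksum prints only the CRC and the byte count, which lets the two results be compared as plain strings.

```shell
# same_cksum SRC DST: succeed only if both files are readable and have
# the same CRC and byte count. (Hypothetical helper, not part of cksum.)
same_cksum() {
    [ -r "$1" ] && [ -r "$2" ] || return 1
    # With input on stdin, cksum prints "<crc> <bytes>" and no file name.
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Demonstration on throwaway files in a scratch directory.
tmp=$(mktemp -d)
printf 'payload\n' > "$tmp/source"
cp "$tmp/source" "$tmp/dest"

if same_cksum "$tmp/source" "$tmp/dest"; then
    echo "copy verified"
else
    echo "checksum mismatch"
fi

# cksum itself exits with a non-zero status when it cannot read a file.
cksum "$tmp/does-not-exist" 2>/dev/null || echo "cksum failed with status $?"
```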

$ info coreutils 'cksum invocation'

cksum command examples

1. A basic example

On a very basic level, the cksum command can be used to display the checksum for a file.

$ cksum testfile.txt
3000792507 3 testfile.txt

The first value (the large number) in the output above is the checksum of the file, followed by the size of the file in bytes and finally the name of the file.
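
In a script it is often handy to pick these three fields apart. The snippet below is a minimal sketch that recreates a three-byte testfile.txt in a scratch directory and splits the output on whitespace; the variable names are arbitrary, and the simple split assumes the file name contains no spaces.

```shell
# Recreate the example file in a scratch directory.
tmp=$(mktemp -d)
printf 'Hi\n' > "$tmp/testfile.txt"

# cksum prints "<crc> <bytes> <name>"; set -- splits that into $1 $2 $3.
set -- $(cd "$tmp" && cksum testfile.txt)
crc=$1
size=$2
name=$3

echo "checksum: $crc"
echo "size:     $size bytes"
echo "file:     $name"
```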

2. Checksum changes with change in content

The test file ‘testfile.txt’ has the following contents:

$ cat testfile.txt
Hi

To calculate the checksum of the test file, pass it as an argument to the cksum command:

$ cksum testfile.txt
3000792507 3 testfile.txt

Now modify the contents of the file:

$ cat testfile.txt
Hi everybody.

Pass the test file to the cksum command again:

$ cksum testfile.txt
2559130041 14 testfile.txt

So we see that with change in contents, the checksum changes.

3. Change in content does not always mean increase or decrease in size

A file's content can change without its size changing, and cksum detects such changes too. Let's see what that means:

Check the contents of the test file ‘testfile.txt’ :

$ cat testfile.txt
Hi everybody.

Note the checksum :

$ cksum testfile.txt
2559130041 14 testfile.txt

Now change the content without adding or deleting anything: replace one character with another so that the size of the file remains the same.

$ cat testfile.txt
Hi everybudy.

So as you can see, I replaced ‘o’ with ‘u’.

Compare the checksum now:

$ cksum testfile.txt
3252191934 14 testfile.txt

So we see that the checksum changed even though only one character was replaced by another and the size stayed at 14 bytes.
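
The same experiment can be repeated without editing a file by piping both versions of the text into cksum. This is just a sketch with arbitrary variable names; when cksum reads from a pipe it prints only the checksum and the byte count.

```shell
# Checksum both versions of the text; each result is "<crc> <bytes>".
original=$(printf 'Hi everybody.\n' | cksum)
modified=$(printf 'Hi everybudy.\n' | cksum)

echo "original: $original"
echo "modified: $modified"

# ${var#* } keeps the byte count, ${var%% *} keeps the CRC.
if [ "${original#* }" = "${modified#* }" ] &&
   [ "${original%% *}" != "${modified%% *}" ]; then
    echo "same size, different checksum"
fi
```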

4. An interrupted copy

Suppose you are copying a compressed archive containing various sub-folders and files from one location to another, and for some reason the copy is interrupted. How would you check whether everything was copied successfully? cksum makes this possible: after a partial copy, the checksum of the destination file differs from that of the source file.

You can simulate this scenario in the following way:

I created Linux.tar.gz and Linux_1.tar.gz from the same ‘Linux’ folder. The difference is that Linux_1.tar.gz was made when the ‘Linux’ folder contained an extra text file.

So Linux_1.tar.gz plays the role of the complete source, and Linux.tar.gz plays the role of a target whose copy was interrupted when just one text file was left to be copied.

Now when I compare the checksums of both these files, I see:

$ cksum Linux.tar.gz
756656601 1037079 Linux.tar.gz

$ cksum Linux_1.tar.gz
2598429125 1037184 Linux_1.tar.gz

The output shows different checksum values (and different sizes), which indicates that the copy is incomplete.
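
Another way to simulate an interrupted copy is to cut a file short on purpose. The sketch below uses made-up file names and random data as a stand-in for the archive, and it assumes a head that supports the -c option (GNU and BSD versions do).

```shell
tmp=$(mktemp -d)

# Stand-in for the source archive: 4096 bytes of arbitrary data.
head -c 4096 /dev/urandom > "$tmp/source.bin"

# Simulate an interrupted copy: only the first 3000 bytes arrive.
head -c 3000 "$tmp/source.bin" > "$tmp/partial.bin"

# A copy that ran to completion, for comparison.
cp "$tmp/source.bin" "$tmp/complete.bin"

for copy in partial.bin complete.bin; do
    # Compare "<crc> <bytes>" of the source with that of the copy.
    if [ "$(cksum < "$tmp/source.bin")" = "$(cksum < "$tmp/$copy")" ]; then
        echo "$copy: checksum matches the source"
    else
        echo "$copy: checksum differs, copy again"
    fi
done
```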

5. Checksum of standard input

The user can also type just ‘cksum’ or ‘cksum -’, enter data on stdin, and then press Ctrl+D to signal end of input (once on an empty line, or twice if the cursor is still on the line just typed). cksum then prints the checksum of the data that was entered.

$ cksum
Lets check the checksum1135634677 23

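In the example above, we calculated the checksum of the 23-byte string “Lets check the checksum”. Because Ctrl+D was pressed without a preceding Enter, no newline is part of the input and the result is printed on the same line, directly after the text.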
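
A non-interactive way to feed the same string to cksum is a pipe. The following is a small sketch with arbitrary variable names; printf is used because it does not add a trailing newline unless asked to, so the first command should correspond to the interactive example above.

```shell
# No trailing newline: the input is exactly the 23 characters of the string.
without_newline=$(printf 'Lets check the checksum' | cksum)

# With a trailing newline the input is one byte longer, so both the
# byte count and the CRC change.
with_newline=$(printf 'Lets check the checksum\n' | cksum)

echo "without newline: $without_newline"
echo "with newline:    $with_newline"
```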


Comments on this entry are closed.

Rajesh Kumar V July 13, 2012, 9:11 am

Really excellent article. Now I know cksum. Thanks a lot for this…

bob July 13, 2012, 10:28 am

Thanks!!! Great article…

HAK July 13, 2012, 11:40 am

Nice article! Now i am no more confused about CRC. Thanks.

Jalal Hajigholamali July 13, 2012, 9:42 pm

Hi,

Thanks a lot..

Very nice and useful article, i sent it to my students in university….

steve July 15, 2012, 8:49 pm

Hello,

Great article thanks.

However, I think md5sum is much better than the cksum as the output value is wider in length.

PB July 17, 2012, 3:14 am

Checksum and CRC are very different and distinct things. Checksums are used (among other places) in IP transfers while CRCs are often used in internal file integrity checks. A checksum is a simple addition of all the bytes (or words or longs etc.) of a file together and is not infallible. CRC is much more sophisticated and as a result, much more compute intensive and much more reliable at detecting transmission errors. So, which is it? Does cksum produce a checksum or a CRC? I suspect the former as CRC needs more input parameters in the versions I am familiar with. RFM.

Ehan Chang March 8, 2013, 1:08 am

Helpful, but you could explain CRC in more detail in a future article.

uzzal October 2, 2013, 6:42 am

Thank you.. It’s helpful…

TJa October 18, 2013, 3:46 am

@PB
No, CRC is just a way to calculate a checksum.

The most simple way to calculate a checksum is by exclusive-oring (adding without carry) all bytes. The next step is by adding them all up. Both methods sometimes use a preset value or a complementing algorithm for mathematical reliability reasons.

A more sophisticated calculation is a CRC (cyclic redundancy check). If done cleverly it is almost as fast as adding up the bytes. CRC is a well defined and CCITT standardised method so there can be no ambiguity. CRCs are available in 16, 32 and 64 bits. If we want to make things even more reliable we use longer values such as MD5 (128 bits) or SHA-1 (160 bits). We tend to not call these ‘checksums’ but ‘hash values’ but basically they are still checksums.

There is no difference between ‘on disk’ CRC checksums and CRC checksums used in transmission systems.

Nihar Ranjan Sahoo June 16, 2015, 3:24 am

How can I check for image with checksum from header??

annonymous coward February 5, 2016, 10:36 am

just my 2 cents… checksums, crc, md5, etc.. they’re all integrity checks.. some have more strength than others.

on CRC vs Checksums:
I tend to associate ‘summation’ with check-SUM. For example: common checksum implementations include one’s or two’s complement summations, e.g. add up all the numbers (including the checksum value) and you get 0x00 in the last 8 bits of the accumulator.

Compare that to CRCs where there is no summation. It’s all bit shifting and XOR’ing at specific bit values (determined by the polynomial and width of the CRC value).

anyway… that’s my thesis on why checksums are not CRCs and therefore why using a utility called ‘cksum’ to compute a CRC is confusing.

Furthermore, if cksum is really performing a CRC then you should indicate the width of the CRC and the polynomial being used.

annynomous caword February 10, 2016, 3:26 am

How is a XOR different from adding? How is shifting different from multiplying?
CRC is just a calculation, no different from other methods. It makes no sense to differentiate between calculation methods other than some are stronger than others. The 8-bit CRC8 and MOD11 are just polynomials as are MD5 SHA-x, CRC-x, and what not. The choice of polynomial depends on your needs. You need to balance performance (speed), reliability and security.
