# CRC32 Combine in Rust

2019-05-21## Background

I'm working on a Rust project where I need to check a CRC32 checksum of a file. It is somewhat easy to do so using the CRC crate. This crate provides multiple methods to compute a CRC32.

CRC, Cyclic Redundancy Code, computing enable one receiver to check that transmitted data is error free. To do so, the data sender compute a checksum that depends on the sent data and transmit it. The receiver also compute the checksum, this time from the received data and match it against the sender's one. If they are equal, data was transmitted without error.

My goal was to reconstruct a file split in multiple encoded chunks. yEnc anyone ?

Each chunk has it's own CRC32 checksum embedded as an hexadecimal string, that I can match against once I have decoded it's data.

Something like that :

fn check_crc32(input: &String, crc32: u32) -> bool { match u32::from_str_radix(input, 16) { Ok(parsed_crc32) => parsed_crc32 == crc32, Err(_) => false } } ... check_crc32("aca76043", my_computed_crc)

Once I have decoded all the file's chunks, I recreate the file by writing all of the chunks at their destination offset in my output file.

One of the chunks may have the overall CRC32 checksum of the file. If I have one, I would like to verify it against my reconstructed file's one.

## First try

When decoding chunks, I first thought about sharing a `crc32::Digest`

and update
it while writing decoded data to my `Vec<u8>`

destination buffer. Once all
chunks have been decoded, I would ask the digest for the overall CRC32 sum.

After a bit of struggling with the borrow checker, I succeeded in decoding a 175 parts, 100MB file. Noice !

However, when I checked my computed CRC32, it :

- was
**not**the same as the one provided in the file chunks and definitely - was
**not**the same one as computed by the`crc32`

unix utility.

But my data written to disk matched the provided CRC32. So, I was obviously doing something wrong.

## Not so fast, you fool

To speed up the decoding, I was reading one chunk at a time, decoding it to memory and updating the digest on the fly. Once in a while, I would flush to disk the decoded data, keeping in memory only metadata about chunks already read.

I noticed quite fast that reading data chunks *in a random order* was not such
a good idea...

Indeed, CRC32 checksum is computed as data is written to the Digest. So updating it by throwing decoded data chunks in random order wasn't the right way to do it.

I could have reopened the file, reading every bytes out of it and recomputed the
checksum. But I thought that if I was reconstructing a 4GB data file I wouldn't
want to spend my time recomputing the checksum by reading *again* all of the
data that I just had decoded.

Furthermore, if I wanted to multithread *fearlessly*
the decoding or checksuming I needed to find another path.

And... what if I could compute the CRC32 of the whole file by combining all of
the given chunks' checksums ? *THAT* would be nice.

## Standing on the shoulders of giants

Obviously, this is an issue that others already tackled. And *way* better than
you would ever conceive with your single mind.

In the famous zlib library Mark Adler wrote this *not so
well* documented function :
https://github.com/madler/zlib/blob/master/crc32.c#L372 .

I won't go into the nitty gritty details of how it works, he already explained it long and wide here.

What *I* know is that I needed it. And there was no such function in the CRC crate.

A quick web search about it reflected that most of the time people prefer to depend on the libz-sys crate. But I didn't want to depend on many numerous crates on my tiny side project. Rather than doing unsafe rust by calling the C library, I decided to take the bull by the horns and going pure rusty !

## The solution

If you are looking for a pure rust `crc32_combine`

function, here you go.
You'll guess that it is massively inspired by the C function of the Zlib.

Combines two CRC checksums into one. For two data sets, Data1 and Data2 with sizes of len1 and len2, with CRC checksums crc1 and crc2. This function returns the CRC checksum of [Data1,Data2], requiring only crc1, crc2, and len2.

/* zlib.h -- interface of the 'zlib' general purpose compression library version 1.2.11, January 15th, 2017 Copyright (C) 1995-2017 Jean-loup Gailly and Mark Adler This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software. Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions: 1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. 2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 3. This notice may not be removed or altered from any source distribution. Jean-loup Gailly Mark Adler jloup@gzip.org madler@alumni.caltech.edu */ const GF2_DIM: usize = 32; // dimension of GF(2) vectors (length of CRC) fn gf2_matrix_times(mat: &[u32], vec: u32) -> u32 { let mut sum = 0; let mut idx = 0; let mut vec = vec; while vec != 0 { if (vec & 1) != 0 { sum ^= mat[idx]; } vec >>= 1; idx += 1; } sum } fn gf2_matrix_square(square: &mut [u32; GF2_DIM], mat: &[u32; GF2_DIM]) { for n in 0..GF2_DIM { let d = mat[n]; square[n] = gf2_matrix_times(mat, d); } } fn crc32_combine(crc1: u32, crc2: u32, len2: u32) -> u32 { let mut even: [u32; GF2_DIM] = [0; GF2_DIM]; // even-power-of-two zeros operator let mut odd: [u32; GF2_DIM] = [0; GF2_DIM]; // odd-power-of-two zeros operator let mut crc1 = crc1; let mut len2 = len2; // degenerate case (also disallow negative lengths if len2 <= 0 { return crc1; } // put operator for one zero bit in odd odd[0] = 0xedb88320; // CRC-32 polynomial let mut row = 1; for i in 1..GF2_DIM { odd[i] = row; row <<= 1; } // put operator for two zero bits in even gf2_matrix_square(&mut even, &odd); // put operator for four zero bits in odd gf2_matrix_square(&mut odd, &even); // apply len2 zeros to crc1 (first square will put the operator for one // zero byte, eight zero bits, in even) loop { // apply zeros operator for this bit of len2 gf2_matrix_square(&mut even, &odd); if (len2 & 1) != 0 { crc1 = gf2_matrix_times(&even, crc1); } len2 >>= 1; // if no more bits set; then done if len2 == 0 { break; } // another iteration of the loop with odd and even swapped gf2_matrix_square(&mut odd, &mut even); if (len2 & 1) != 0 { crc1 = gf2_matrix_times(&odd, crc1); } len2 >>= 1; if len2 == 0 { break; } } // return combined crc crc1 ^= crc2; return crc1; }

Usage example :

let mut computed_crc32 = 0; for part in &self.parts { let part_crc32 = match u32::from_str_radix( &part.infos.trailer.part_crc32.as_ref().unwrap().clone(), 16, ) { Ok(parsed_crc32) => parsed_crc32, Err(error) => return Err(decoding_error("gnuh".to_string())), }; println!("parsed pcrc32 {:x}", part_crc32); computed_crc32 = crc32_combine(computed_crc32, part_crc32, part.infos.trailer.file_size); } println!("combined crc32 : {:x}", computed_crc32);