We have a very old, unsupported program which copies files across SMB shares. It has a checksum algorithm to determine whether the file contents have changed before copying. The algorithm seems easily fooled: we've just found an example where two files, identical except for a single '1' changing to a '2', return the same checksum. Here's the algorithm:
```cpp
unsigned long GetFileCheckSum(CString PathFilename)
{
    FILE* File;
    unsigned long CheckSum = 0;
    unsigned long Data = 0;
    unsigned long Count = 0;

    if ((File = fopen(PathFilename, "rb")) != NULL)
    {
        // Read up to sizeof(unsigned long) bytes into Data; fread returns
        // the number of bytes read, and FALSE is 0, so this stops at EOF.
        while (fread(&Data, 1, sizeof(unsigned long), File) != FALSE)
        {
            CheckSum ^= Data + ++Count;  // XOR in each chunk plus its 1-based index
            Data = 0;                    // reset, so a short final chunk is zero-padded
        }
        fclose(File);
    }
    return CheckSum;
}
```

I'm not much of a programmer (I'm a sysadmin), but I know an XOR-based checksum is going to be pretty crude. What are the chances of this algorithm returning the same checksum for two files of the same size with different contents? (I'm not expecting an exact answer; "remote" or "quite likely" is fine.)
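A minimal sketch for reasoning about the odds, assuming a 32-bit `unsigned long` (as on Windows) and a little-endian x86 CPU. The hypothetical function `legacy_checksum` re-applies the same loop to an in-memory buffer. Because each 4-byte chunk is only added to its index and then XORed into the total, a difference confined to one chunk always changes the result, but changes in two chunks can cancel exactly. For unrelated, effectively random files the odds of a match are roughly 1 in 2^32; for small, structured edits such as changed digits they are far higher.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of GetFileCheckSum's logic applied to an in-memory buffer.
// Assumptions (not from the original program): unsigned long is 32 bits,
// as on Windows, and the CPU is little-endian (x86), so fread() puts the
// first byte of each 4-byte chunk into the least significant byte of Data.
std::uint32_t legacy_checksum(const unsigned char* buf, std::size_t len)
{
    std::uint32_t checksum = 0;
    std::uint32_t count = 0;
    for (std::size_t off = 0; off < len; off += 4) {
        std::uint32_t data = 0;  // a short final chunk stays zero-padded
        for (std::size_t k = 0; k < 4 && off + k < len; ++k)
            data |= static_cast<std::uint32_t>(buf[off + k]) << (8 * k);
        checksum ^= data + ++count;  // same expression as the original loop
    }
    return checksum;
}
```

For example, `1AAA2AAA` and `2AAA3AAA` both give 6 under these assumptions: in each file the two chunks differ only in their low byte, and after adding the index (1 and 2) those low bytes XOR to the same value, while the identical `AAA` upper bytes cancel out.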
How could it be improved without a huge performance hit?
Lastly, what's going on with the fread()? I had a quick scan of the documentation, but I couldn't figure it out. Is Data being set to each byte of the file in turn?

Edit: okay, so it's reading the file in unsigned-long-sized chunks (let's assume a 32-bit OS here). What does each chunk contain? If the contents of the file are `abcd`, what is the value of Data on the first pass? Is it (in Perl):
```perl
(ord('a') << 24) & (ord('b') << 16) & (ord('c') << 8) & ord('d')
```

Solution
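MD5 is commonly used to verify the integrity of transferred files. Source code is readily available in C++. It is widely considered a fast and reliable algorithm for this purpose. It is no longer considered secure against deliberately engineered collisions, but that does not matter for detecting accidental changes between copies.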
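If MD5 feels heavy, a lighter-weight option (a different technique from MD5, not a cryptographic hash) is CRC-32. It is guaranteed to detect any single-bit error and any single burst of errors up to 32 bits long, so changes like the cancelling digit pairs that fool the XOR scheme are far less likely to slip through. The sketch below is a plain bitwise CRC-32 with the IEEE 802.3 / zip polynomial. `file_crc32` is a hypothetical drop-in for `GetFileCheckSum`; its name and return convention are illustrative. Table-driven versions, or zlib's `crc32()`, are considerably faster. For a tool copying over SMB, reading the file across the network will likely cost more than the CRC itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Running CRC-32 (IEEE 802.3 / zip polynomial, reflected form 0xEDB88320).
// Start with crc = 0 and feed successive buffers; the initial and final
// inversions happen inside, so calls can be chained chunk by chunk.
std::uint32_t crc32_update(std::uint32_t crc, const unsigned char* buf, std::size_t len)
{
    crc = ~crc;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return ~crc;
}

// Hypothetical replacement for GetFileCheckSum: returns true and stores the
// CRC in *out on success, false if the file cannot be opened or read
// (the original returns 0 in that case, which is also a valid checksum).
bool file_crc32(const char* path, std::uint32_t* out)
{
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr)
        return false;
    unsigned char buf[8192];
    std::uint32_t crc = 0;
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
        crc = crc32_update(crc, buf, n);
    bool ok = std::ferror(f) == 0;
    std::fclose(f);
    if (ok)
        *out = crc;
    return ok;
}
```

The standard check value is a quick way to validate any CRC-32 implementation: `crc32_update(0, "123456789", 9)` should return `0xCBF43926`.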
See also stackoverflow/questions/122982/robust-and-fast-checksum-algorithm
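On the fread() question: `fread(&Data, 1, sizeof(unsigned long), File)` reads up to `sizeof(unsigned long)` items of one byte each and copies them, in file order, into the memory occupied by `Data`. It returns the number of items read, so the loop ends only when it returns 0 at end of file (comparing with `FALSE` is just comparing with 0). The minimal sketch below shows what `Data` ends up holding. It assumes a 32-bit `unsigned long` on a little-endian CPU such as x86, and `chunk_value` is a hypothetical helper, not part of the program.

```cpp
#include <cstddef>
#include <cstdint>

// What Data holds after fread(&Data, 1, sizeof(unsigned long), File)
// returns n bytes, assuming a 32-bit unsigned long on a little-endian CPU.
// Byte 0 of the chunk lands in the low 8 bits and byte 3 in the high 8 bits;
// bytes that were not read stay 0 because the loop resets Data every pass.
std::uint32_t chunk_value(const unsigned char* bytes, std::size_t n)
{
    std::uint32_t data = 0;
    for (std::size_t k = 0; k < n && k < 4; ++k)
        data |= static_cast<std::uint32_t>(bytes[k]) << (8 * k);
    return data;
}
```

So for a file starting with `abcd`, Data on the first pass is `0x64636261`. In Perl, that is `ord('a') | (ord('b') << 8) | (ord('c') << 16) | (ord('d') << 24)`, the same value `unpack('V', 'abcd')` gives. The bytes are combined with `|`, not `&`: the shifted values in the expression in the question share no bits, so `&`-ing them gives 0. On a big-endian machine the byte order would be reversed.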