How to make sure a file has unique lines in VB.NET when the file is very large

Updated: 2024-10-24 10:21:59

Problem description

Language: VB.NET. File size: 1 GB or so.

Encoding of the text file: UTF-8 (so characters are represented by varying numbers of bytes).

Collation: UnicodeCI (when several characters are essentially the same, the most popular version is treated as the unique one). I think I know how to handle this one.

Because characters are represented by varying numbers of bytes and each line has a different number of characters, the number of bytes in each line also varies.

I suppose we have to compute a hash for each line. We also need to store the buffer location where each line sits. Then we have to compare the buffers to check whether the same line shows up more than once.

Are there any functions that are best suited for that?

Solution

Depending on how long the lines are, you may be able to compute an MD5 hash value for each line and store it in a HashSet:

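    ' Requires Imports System.IO, System.Linq, System.Security.Cryptography, System.Text
    Using sr As New StreamReader("myFile")
        Using hasher As MD5 = MD5.Create()
            Dim lines As New HashSet(Of String)
            ' Loop on ReadLine rather than BaseStream.Position: the reader buffers ahead,
            ' so the stream position can reach Length before every line has been returned.
            Dim l As String = sr.ReadLine()
            While l IsNot Nothing
                Dim hash As String = String.Join(String.Empty, hasher.ComputeHash(Encoding.UTF8.GetBytes(l)).Select(Function(x) x.ToString("x2")))
                ' HashSet.Add returns False when the hash is already present
                If Not lines.Add(hash) Then
                    'Lines are not unique
                    Exit While
                End If
                l = sr.ReadLine()
            End While
        End Using
    End Using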

Untested, but this may be fast enough for your needs. I can't think of something much faster that still maintains some semblance of conciseness :)
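
For a file of this size, the memory taken by the set of hashes may matter as much as speed. The following is only a sketch of one possible variation, not part of the original answer: it keeps each 16-byte MD5 digest as a Guid value instead of a 32-character hex string. The module name UniqueLineCheck and the function name AllLinesUnique are hypothetical, and the collation question is not handled here.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography
Imports System.Text

Module UniqueLineCheck
    ' Hypothetical helper: returns True when no two lines share an MD5 digest.
    Function AllLinesUnique(filePath As String) As Boolean
        Dim seen As New HashSet(Of Guid)
        Using reader As New StreamReader(filePath, Encoding.UTF8)
            Using hasher As MD5 = MD5.Create()
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    ' An MD5 digest is exactly 16 bytes, the length the Guid(Byte()) constructor requires.
                    Dim digest As New Guid(hasher.ComputeHash(Encoding.UTF8.GetBytes(line)))
                    ' Add returns False if this digest was already stored.
                    If Not seen.Add(digest) Then Return False
                    line = reader.ReadLine()
                End While
            End Using
        End Using
        Return True
    End Function

    Sub Main()
        ' Small demonstration file with a repeated line ("alpha").
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllLines(tempFile, New String() {"alpha", "beta", "alpha"})
        Console.WriteLine(AllLinesUnique(tempFile))
        File.Delete(tempFile)
    End Sub
End Module
```

A matching digest strictly means a probable duplicate. If that distinction matters, the two candidate lines would still need to be compared directly, which is where storing line positions, as the question suggests, could come in.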

Published: 2023-11-04 05:25:31
Link: https://www.elefans.com/category/jswz/34/1557064.html