Read from StreamReader in batches

I have been running into OutOfMemory exceptions while trying to load an 800MB text file into a DataTable via StreamReader. I was wondering if there is a way to load the DataTable from the stream in batches, i.e., read the first 10,000 lines of the text file from the StreamReader, create a DataTable, do something with the DataTable, then read the next 10,000 lines from the StreamReader, and so on.

My Google searches weren't very helpful here, but it seems like there should be an easy way to do this. Ultimately I will be writing the DataTables to an MS SQL database using SqlBulkCopy, so if there is an easier approach than what I have described, I would be thankful for a quick pointer in the right direction.
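Roughly, what I am picturing is something like the sketch below; the batch size, the tab-delimited split, and the processBatch callback are just placeholders for whatever each batch should actually do:

using System;
using System.Data;
using System.IO;

public static class BatchLoader
{
    // Read the file in chunks of batchSize lines, filling a DataTable per chunk
    // and handing each full chunk to the caller before reading any further.
    public static void ProcessInBatches(string path, DataTable schema, Action<DataTable> processBatch, int batchSize = 10000)
    {
        using (var sr = new StreamReader(path))
        {
            DataTable batch = schema.Clone();       // same columns as the target table, no rows
            string line;

            while ((line = sr.ReadLine()) != null)
            {
                // Assumes tab-delimited fields in the same order as the table's columns.
                batch.Rows.Add(line.Split('\t'));

                if (batch.Rows.Count >= batchSize)
                {
                    processBatch(batch);            // e.g. bulk copy this batch
                    batch = schema.Clone();         // start a fresh, empty batch
                }
            }

            if (batch.Rows.Count > 0)
            {
                processBatch(batch);                // flush the final partial batch
            }
        }
    }
}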

Edit - Here is the code that I am running:

public static DataTable PopulateDataTableFromText(DataTable dt, string txtSource)
{
    StreamReader sr = new StreamReader(txtSource);
    DataRow dr;
    int dtCount = dt.Columns.Count;
    string input;
    int i = 0;

    while ((input = sr.ReadLine()) != null)
    {
        try
        {
            string[] stringRows = input.Split(new char[] { '\t' });
            dr = dt.NewRow();

            for (int a = 0; a < dtCount; a++)
            {
                string dataType = dt.Columns[a].DataType.ToString();
                if (stringRows[a] == "" && (dataType == "System.Int32" || dataType == "System.Int64"))
                {
                    stringRows[a] = "0";
                }
                dr[a] = Convert.ChangeType(stringRows[a], dt.Columns[a].DataType);
            }

            dt.Rows.Add(dr);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.ToString());
        }
        i++;
    }

    return dt;
}

And here is the error that is returned:

"System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown. at System.String.Split(Char[] separator, Int32 count, StringSplitOptions options) at System.String.Split(Char[] separator} at Harvester.Config.PopulateDataTableFromText(DataTable dt, String txtSource) in C:...."

Regarding the suggestion to load the data directly into SQL - I'm a bit of a noob when it comes to C#, but I thought that is basically what I am doing? SqlBulkCopy.WriteToServer takes the DataTable that I create from the text file and imports it into SQL. Is there an even easier way to do this that I am missing?
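The call I have in mind is essentially the minimal pattern below (this needs System.Data.SqlClient; the connection string and destination table name are placeholders):

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.MyTable";
        bulkCopy.BulkCopyTimeout = 0;       // don't time out on a large load
        bulkCopy.WriteToServer(dt);         // dt is the DataTable built from the text file
    }
}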

Edit: Oh, I forgot to mention - this code will not be running on the same server as SQL Server. The data text file is on Server B and needs to be written to a table on Server A. Does that preclude using bcp?

Best answer

Do you actually need to process the data in batches of rows, or could you process it row by row? In the latter case, I think Linq could be very helpful here, because it makes it easy to stream data through a "pipeline" of methods. That way you don't need to load a lot of data at once, only one row at a time.

First, you need to make your StreamReader enumerable. This is easily done with an extension method:

public static class TextReaderExtensions
{
    public static IEnumerable<string> Lines(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

That way you can use the StreamReader as the source for a Linq query.

Then you need a method that takes a string and converts it to a DataRow:

DataRow ParseDataRow(string input)
{
    // Your parsing logic here
    ...
}
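For example, a ParseDataRow built from the splitting and conversion logic in your question could look roughly like this; it assumes a dt field that references the target DataTable, because a DataRow has to be created from some table's schema:

DataRow ParseDataRow(string input)
{
    string[] fields = input.Split('\t');
    DataRow row = dt.NewRow();              // dt supplies the schema; the row is not added to it here

    for (int i = 0; i < dt.Columns.Count; i++)
    {
        Type columnType = dt.Columns[i].DataType;

        // Treat empty numeric fields as zero, as your original code does.
        if (fields[i].Length == 0 && (columnType == typeof(int) || columnType == typeof(long)))
        {
            fields[i] = "0";
        }

        row[i] = Convert.ChangeType(fields[i], columnType);
    }

    return row;
}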

With those elements, you can easily project each line from the file to a DataRow, and do whatever you need with it:

using (var reader = new StreamReader(fileName))
{
    var rows = reader.Lines().Select(ParseDataRow);
    foreach (DataRow row in rows)
    {
        // Do something with the DataRow
    }
}

(note that you could do something similar with a simple loop, without using Linq, but I think Linq makes the code more readable...)
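To feed this into SqlBulkCopy without materializing the whole file, one option is to buffer the stream into fixed-size chunks and bulk copy each chunk as it fills. The Batch helper below is not a framework method, and schemaTable, connectionString, fileName and the table name are all placeholders:

// Groups any sequence into lists of up to "size" items (place this in a static class,
// e.g. alongside the Lines() extension above).
public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int size)
{
    var bucket = new List<T>(size);
    foreach (T item in source)
    {
        bucket.Add(item);
        if (bucket.Count == size)
        {
            yield return bucket;
            bucket = new List<T>(size);
        }
    }
    if (bucket.Count > 0)
    {
        yield return bucket;
    }
}

// Usage sketch: fill a 10,000-row DataTable per batch and write it to Server A.
using (var reader = new StreamReader(fileName))
using (var bulkCopy = new SqlBulkCopy(connectionString) { DestinationTableName = "dbo.MyTable" })
{
    foreach (var lines in reader.Lines().Batch(10000))
    {
        DataTable table = schemaTable.Clone();      // empty copy of the destination schema
        foreach (string line in lines)
        {
            // The DataTable converts the strings to the column types; empty numeric
            // fields would need the same special-casing as in the question.
            table.Rows.Add(line.Split('\t'));
        }
        bulkCopy.WriteToServer(table);
    }
}

Building a fresh 10,000-row table per batch keeps memory use roughly constant regardless of the file size, which is the point of streaming the reader in the first place.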
