I have to run a one-off C# calculation on millions of rows of data and save the results in another table. I haven't worked with threading in C# for a couple of years. I'm using .NET v4.5 and EF v5.
The original code is something along the lines of:

    public static void Main()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();

        Entities db = new Entities();
        DoCalc(db.Clients.ToList());

        sw.Stop();
        Console.WriteLine(sw.Elapsed);
    }

    private static void DoCalc(List<Client> clients)
    {
        Entities db = new Entities();
        foreach (var c in clients)
        {
            var transactions = db.GetTransactions(c);
            var result = calculate(transactions); // the actual calc
            db.Results.Add(result);
            db.SaveChanges();
        }
    }

Here is my attempt at multi-threading:

    private static int numberOfThreads = 15;

    public static void Main()
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();

        Entities db = new Entities();
        var splitUpClients = SplitUpClients(db.Clients.ToList());

        // There can be fewer groups than numberOfThreads, so size by the actual count.
        Task[] allTasks = new Task[splitUpClients.Count];
        for (int i = 0; i < splitUpClients.Count; i++)
        {
            // Copy to a local first: the lambda would otherwise capture the loop
            // variable i itself and could read it after it has changed.
            var clientGroup = splitUpClients[i];
            allTasks[i] = Task.Factory.StartNew(() => DoCalc(clientGroup));
        }
        Task.WaitAll(allTasks);

        sw.Stop();
        Console.WriteLine(sw.Elapsed);
    }

    private static void DoCalc(List<Client> clients)
    {
        // Each task creates its own context.
        Entities db = new Entities();
        foreach (var c in clients)
        {
            var transactions = db.GetTransactions(c);
            var result = calculate(transactions);
            db.Results.Add(result);
            db.SaveChanges();
        }
    }

    // Splits the list of clients into n subgroups.
    private static List<List<Client>> SplitUpClients(List<Client> clients)
    {
        int maxPerGroup = (int)Math.Ceiling((double)clients.Count / numberOfThreads);
        return clients
            .Select((c, i) => new { Client = c, Index = i })
            .GroupBy(o => o.Index / maxPerGroup, o => o.Client)
            .Select(coll => coll.ToList())
            .ToList();
    }

My question is:
Is this the safe and correct way to do it and are there any obvious shortcomings (especially with regard to EF)?
Also, how do I find the optimum number of threads? Is it the more the merrier?
Best answer
The Entity Framework DbContext and ObjectContext classes are NOT thread-safe, so you should not use an instance across multiple threads.
Although it looks as if you're only passing entities to other threads, this is easy to get wrong when lazy loading is involved: under the covers, an entity can call back into the context that loaded it to fetch more data.
So instead, I would advise converting the list of entities to a list of special immutable data structures that hold only the data needed for the calculation. Those immutable structures should not have to call back into the context and should not be able to change. When you do this, it is safe to pass them on to other threads to do the calculation.
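
As a rough illustration of that idea, here is a minimal sketch written in the C# 5 style that .NET 4.5 supports. The names `ClientSnapshot`, `CalcResult`, `Calculate` and `CalculateAll` are hypothetical and not taken from the question. The in-memory list in `Main` stands in for whatever a projecting EF query would return on the main thread, and the "calculation" is just a sum used as a placeholder. PLINQ (`AsParallel`) is only one possible way to fan the work out, not something the answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable snapshot: it holds only the values the calculation
// needs and keeps no reference to an EF context or entity.
public sealed class ClientSnapshot
{
    private readonly int clientId;
    private readonly IReadOnlyList<decimal> amounts;

    public ClientSnapshot(int clientId, IEnumerable<decimal> amounts)
    {
        this.clientId = clientId;
        // Defensive copy, so later changes to the source cannot leak in.
        this.amounts = amounts.ToList().AsReadOnly();
    }

    public int ClientId { get { return clientId; } }
    public IReadOnlyList<decimal> Amounts { get { return amounts; } }
}

// Hypothetical immutable result of the calculation for one client.
public sealed class CalcResult
{
    private readonly int clientId;
    private readonly decimal total;

    public CalcResult(int clientId, decimal total)
    {
        this.clientId = clientId;
        this.total = total;
    }

    public int ClientId { get { return clientId; } }
    public decimal Total { get { return total; } }
}

public static class Program
{
    // Placeholder for the real calculation (assumption: it is a pure function
    // of the snapshot). It reads only its argument and shares no state.
    public static CalcResult Calculate(ClientSnapshot snapshot)
    {
        return new CalcResult(snapshot.ClientId, snapshot.Amounts.Sum());
    }

    // Runs the calculation in parallel; no context is touched in here.
    public static List<CalcResult> CalculateAll(IEnumerable<ClientSnapshot> snapshots)
    {
        return snapshots
            .AsParallel()
            .Select(s => Calculate(s))
            .OrderBy(r => r.ClientId)
            .ToList();
    }

    public static void Main()
    {
        // Stand-in data. In the real program this list would be built on the
        // main thread from the database, before any worker thread starts.
        var snapshots = new List<ClientSnapshot>
        {
            new ClientSnapshot(1, new[] { 10m, 20m }),
            new ClientSnapshot(2, new[] { 5m }),
            new ClientSnapshot(3, new decimal[0])
        };

        List<CalcResult> results = CalculateAll(snapshots);

        foreach (var r in results)
        {
            Console.WriteLine("{0}: {1}", r.ClientId, r.Total);
        }
        // The results could then be saved from a single thread with one context.
    }
}
```

With this split, the context is only ever used on one thread (loading before the parallel step, saving after it), and the worker threads see nothing but read-only data.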