我们有一个SSIS程序包,开发团队显然将其称为慢程序。由于他们没有SSIS ETL人员,因此作为一名DBA,我尝试进行深入研究。以下是我找到的信息: SQL Server已于2014年版本升级-就地升级到2017年,因此具有两个版本的SSIS。
We have an SSIS package that is apparently termed as 'slow' by the development team. Since they do not have a person with SSIS ETL, as a DBA I tried digging into it. Below is the information I found: SQL Server was 2014 version upgraded -inplace to 2017 so it has SSIS of both versions.
SQL和SSIS在同一服务器上。已为SQL分配了最大的内存,可为SSIS和OS留出大约100 GB的空间。
Both SQL and SSIS are on same server. SQL has been allocated max memory leaving around 100 GB for SSIS and OS.
请与我分享有关如何强制SQL Server使用多个线程运行此select命令的所有想法,以便整个表
Kindly share any ideas on how can I force the SQL Server to run this select command using multiple threads so that entire table gets inside SSIS buffer pool faster.
Edit :我知道 bcp 可以比任何进程更快地读取数据并将其保存到平面文件,但此时必须保持对SSIS软件包的更改最小,并探索可以纳入SSIS软件包的选项。
Edit: I am aware that bcp can read data faster than any process and save it to flatfile but at this point changes to the SSIS package has to be kept minimum and exploring options that can be incorporated within SSIS package.
Edit2 :并行性非常适合我为许多其他查询验证的SQL Server。该表为200 GB。仅仅是SSIS的东西并没有尽我所能地重击数据库。
Edit2: Parallelism works perfectly for my SQL Server as I verified for a lot of other queries.The table in question is 200 GB. It is something with SSIS only which is not hammering my DB as hard as it should.
Edit3 :我已经取得了一些进展,将缓冲区值调整为100 MB和最大行数达到100000,现在该程序包似乎做得更好。当我直接使用dtexec实用程序在服务器上运行此程序包时,它每秒产生40-50 MB的良好负载,但是通过SQL作业,它永远不会产生超过10 MB的负载。因此,我试图找出这种现象。
Edit3: I have made some progress, adjusted the buffer value to 100 MB and max rows to 100000 and now the package seem to be doing better. when I run this package on the server directly using dtexec utility, it generates good load of 40- 50 MB per second but through SQL job it never generates lod more than 10 MB. so I am trying to figure out this behavior.
Edit4 :我发现当我直接从登录到服务器并调用dtexec实用程序运行程序包时,它运行良好,因为它在DB上产生了良好的负载,导致数据I\O保持稳定在30-50 MB\sec之间。 SQL作业中的同一件事永远不会超过10 MB\sec。
Edit4: I found that when I run the package directly from logging to the server and invoking dtexec utility, it runs good because it generates good load on the DB causing data I\O to remain steady between 30-50 MB\sec. The same thing from SQL job never exceeds the I\O more than 10 MB\sec.
我什至尝试使用代理运行程序包并选择cmdline操作但没有变化。代理人确实在这里很烂,在这里有什么问题的任何指针?
I even tried to run the package using agent and opting for cmdline operation but no changes. Agent literally sucks here, any pointers on what could be wrong here?
最终尝试:我被观察困扰了,我终于: 1)相同的包装通过调用dtexc实用程序从Windows节点的命令提示符运行时,运行速度快3倍2)完全相同的程序包被具有Windows和SQL Server上的sysadmin权限的SQL Agent调用时,运行速度比上述速度慢3倍
Final Try: I am stumped at the observation I have finally: 1)Same package runs 3x faster when run from command prompt from windows node by invoking dtexc utility 2) Exact same package runs 3 times slower than above when involked by SQL agent which has sysadmin permissions on windows as well as SQL Server
在两种情况下,我都试图查看它们调用的DTEXEC版本,并且它们都调用相同的版本。所以为什么我会这么慢是出于我的理解。
In both cases, I tried to see the version of DTEXEC they invoke, and they both invoke the same version. So why one would be so slow is out of my understanding.
推荐答案我不认为有一个通用的解决方案此问题,因为在特殊情况下您没有提供太多信息。由于您的数据流任务中包含两个组件(OLE DB源和平面文件目标),因此我将尝试给出与每个组件相关的一些建议。
I don't think that there is a general solution to this issue since it is a particular case that you didn't provide much information. Since there are two components in your data flow task (OLE DB Source and Flat File Destination), I will try to give some suggestions related to each component.
在为每个组件提供建议之前,最好提及以下内容:
Before giving suggestions for each component, it is good to mention the following:
OLE DB源
如前所述,您正在使用 Select * from view 查询,其中数据存储在表中,包含大量数据。 SQL Server查询优化器可能会发现,使用表扫描读取数据要比从索引读取更为有效,尤其是在您的表没有聚集索引(行存储或列存储)的情况下。
OLE DB Source
As you mentioned, you are using a Select * from view query where data is stored in a table that contains a considerable amount of data. The SQL Server query optimizer may find that reading data using Table Scan is more efficient than reading from indexes, especially if your table does not have a clustered index (row store or column store).
您可以尝试通过许多方法来改善数据负载:
There are many things you may try to improve data load:
平面文件目标
更多推荐
SSIS程序包全表加载缓慢
发布评论