Google BigQuery queries are very slow

Updated: 2024-10-28 04:30:11

Problem description

I am using Google BigQuery and running some simple queries from PHP (e.g. SELECT * FROM emails WHERE email='mail@test'). I am just checking whether the email exists in the table.

The table "emails" is empty for now, yet the PHP script still takes around 4 minutes to check 175 emails against it. In the future the table will be filled and will hold about 500,000 emails, so I expect the request time to get even longer.

Is that normal? Or are there any ideas or solutions to improve the checking time?

(P.S.: The table "emails" contains only 8 columns, all of string type.)

Thank you!

Solution

If you are just checking whether a value exists, consider using SELECT COUNT(*) FROM emails WHERE email='mail@test' instead. This only requires reading a single column, so it will cost less and be marginally faster on large tables.

As Pentium10 suggested, consider doing multiple lookups in a single query. You could do this like:

SELECT
  SUM(IF(email = 'mail1@test', 1, 0)) AS m1,
  SUM(IF(email = 'mail2@test', 1, 0)) AS m2,
  SUM(IF(email = 'mail3@test', 1, 0)) AS m3,
  ...
FROM emails

You're going to be limited to something like 64k of these in a single query, but it should be very fast to compute since it only requires a scan of a single column in one pass.
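
To illustrate the single-pass idea, here is a minimal Python sketch that builds one such query for a list of addresses and runs it against an in-memory SQLite table standing in for BigQuery. The helper names build_batched_query and check_emails, the sample data, and the use of SQLite are all assumptions made for illustration. The sketch uses CASE WHEN in place of BigQuery's IF() and binds the addresses as parameters instead of pasting them into the SQL text.

```python
import sqlite3


def build_batched_query(n):
    """Return SQL with one SUM(...) column per address, using ? placeholders."""
    columns = [
        f"SUM(CASE WHEN email = ? THEN 1 ELSE 0 END) AS m{i}"
        for i in range(1, n + 1)
    ]
    return "SELECT " + ", ".join(columns) + " FROM emails"


def check_emails(conn, candidates):
    """Map each candidate address to True/False using a single table scan."""
    row = conn.execute(build_batched_query(len(candidates)), candidates).fetchone()
    # SUM over an empty table yields NULL (None), which counts as "not found".
    return {email: bool(count) for email, count in zip(candidates, row)}


# Hypothetical sample data; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email TEXT)")
conn.executemany("INSERT INTO emails VALUES (?)", [("mail1@test",), ("mail2@test",)])

result = check_emails(conn, ["mail1@test", "mail2@test", "mail3@test"])
print(result)
```

The point of the design is that all candidates are answered by one query, so any fixed per-query overhead is paid once instead of once per address.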

Alternatively, if you wanted the e-mails one per row, you could do something a little bit fancier, like

SELECT email
FROM emails
WHERE email IN ('mail1@test', 'mail2@test', 'mail3@test'...)
GROUP BY email
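
With the IN form, only the addresses that exist come back, so the caller has to work out the missing ones itself. A minimal Python sketch of that step follows, again using an in-memory SQLite table as a stand-in for BigQuery; the helper name find_existing and the sample rows are assumptions for illustration.

```python
import sqlite3


def find_existing(conn, candidates):
    """Return the candidates present in emails, using one IN (...) query."""
    placeholders = ", ".join("?" for _ in candidates)
    sql = (
        f"SELECT email FROM emails WHERE email IN ({placeholders}) "
        "GROUP BY email"
    )
    # GROUP BY collapses duplicate rows; sorting is done client-side.
    return sorted(row[0] for row in conn.execute(sql, candidates))


# Hypothetical sample data, including a duplicate row for mail1@test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?)",
    [("mail1@test",), ("mail1@test",), ("mail2@test",)],
)

candidates = ["mail1@test", "mail2@test", "mail3@test"]
existing = find_existing(conn, candidates)
# Anything not returned by the query is absent from the table.
missing = sorted(set(candidates) - set(existing))
print(existing, missing)
```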

As a further optimization, you could do it as a LEFT JOIN:

SELECT t1.email AS email, IF(t2.email IS NOT NULL, true, false) AS found
FROM [interesting_emails] t1
LEFT OUTER JOIN [emails] t2
ON t1.email = t2.email

Here, interesting_emails is a table holding the list of emails you want to check, like

mail1@test
mail2@test
mail3@test

If the emails table contained only mail1@test and mail2@test, then you'd get back as results:

email          found
______________ _____
mail1@test     true
mail2@test     true
mail3@test     false

The advantage of doing it this way is that it will scale up to billions of e-mails if needed (when the number gets large, you might consider using JOIN EACH instead of JOIN).
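
The LEFT JOIN lookup can be tried out locally with a minimal Python sketch like the one below. It uses in-memory SQLite tables as a stand-in for BigQuery, so the bracketed table names and IF() are replaced by plain names and an IS NOT NULL expression, and JOIN EACH does not apply. The sample rows are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email TEXT)")
conn.execute("CREATE TABLE interesting_emails (email TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?)",
    [("mail1@test",), ("mail2@test",)],
)
conn.executemany(
    "INSERT INTO interesting_emails VALUES (?)",
    [("mail1@test",), ("mail2@test",), ("mail3@test",)],
)

# Every row of interesting_emails is kept; t2.email is NULL when no match exists.
# Duplicate addresses in emails would produce duplicate output rows.
sql = """
    SELECT t1.email AS email, t2.email IS NOT NULL AS found
    FROM interesting_emails t1
    LEFT OUTER JOIN emails t2 ON t1.email = t2.email
    ORDER BY t1.email
"""
found = {email: bool(flag) for email, flag in conn.execute(sql)}
print(found)
```

Compared with the IN form, this version reports missing addresses explicitly, and the list of addresses lives in a table instead of in the query text.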

Published: 2023-10-23 00:29:41