I am using Google BigQuery and executing some simple queries from PHP (e.g. SELECT * FROM emails WHERE email='mail@test'). I am just checking whether an email exists in the table.
The table "emails" is empty for now, yet the PHP script still takes around 4 minutes to check 175 emails against it. In the future the table will hold about 500,000 emails, so I expect the requests to take even longer.
Is that normal? Or are there any ideas/solutions to improve the checking time?
(P.S.: The table "emails" contains only 8 columns, all of type string.)
Thank you!
Solution
If you are just checking whether a value exists, consider using SELECT COUNT(*) FROM emails WHERE email='mail@test' instead. This only requires reading a single column (SELECT * reads all 8), so it will cost less and be marginally faster on large tables.
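A minimal sketch of the COUNT(*) existence check, using Python's built-in sqlite3 module as a local stand-in for BigQuery (the table and column names come from the question; the helper name email_exists is hypothetical). It binds the address as a query parameter rather than splicing it into the SQL text:

```python
import sqlite3


def email_exists(conn: sqlite3.Connection, email: str) -> bool:
    """Return True if `email` appears at least once in the emails table."""
    # On BigQuery's columnar storage this query scans only the email column;
    # sqlite3 is used here just to demonstrate the logic locally.
    # The driver binds the value, so no manual quoting is needed.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM emails WHERE email = ?", (email,)
    ).fetchone()
    return count > 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emails (email TEXT)")
    conn.execute("INSERT INTO emails VALUES ('mail@test')")
    print(email_exists(conn, "mail@test"))
    print(email_exists(conn, "other@test"))
```

This still issues one query per address, so it reduces the bytes scanned but not the number of round trips.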
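And as Pentium10 suggested, consider doing multiple lookups in a single query. Every query carries a fixed start-up overhead regardless of table size, which likely explains why 175 separate queries take about 4 minutes (roughly 1.4 seconds each) even against an empty table. Batching the lookups could look like this: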
    SELECT
      SUM(IF(email = 'mail1@test', 1, 0)) AS m1,
      SUM(IF(email = 'mail2@test', 1, 0)) AS m2,
      SUM(IF(email = 'mail3@test', 1, 0)) AS m3,
      ...
    FROM emails

You're going to be limited to something like 64k of these in a single query, but it should be very fast to compute, since it only requires scanning a single column in one pass.
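Writing 175 SUM(IF(...)) columns by hand is impractical, so the query is better generated from the list of addresses. A minimal sketch in Python (the same string building is straightforward in PHP); the function names are hypothetical, and the quoting helper assumes BigQuery's backslash escapes (\' and \\) inside single-quoted literals:

```python
from typing import Sequence


def quote_literal(value: str) -> str:
    """Quote a string as a single-quoted SQL literal.

    Assumes BigQuery-style backslash escaping for backslashes and quotes.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_sum_query(emails: Sequence[str], table: str = "emails") -> str:
    """Build one query with a match-count column (m1, m2, ...) per email.

    `table` is inserted verbatim, so it must be a trusted identifier.
    """
    if not emails:
        raise ValueError("need at least one email to check")
    columns = ", ".join(
        f"SUM(IF(email = {quote_literal(e)}, 1, 0)) AS m{i}"
        for i, e in enumerate(emails, start=1)
    )
    return f"SELECT {columns} FROM {table}"


if __name__ == "__main__":
    # In the single result row, column mN is > 0 exactly when the
    # N-th address exists in the table.
    print(build_sum_query(["mail1@test", "mail2@test", "mail3@test"]))
```

The result comes back as one wide row, so the caller maps column mN back to the N-th address in its own list.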
Alternatively, if you wanted the emails back as one per row, you could do something a little fancier, like:
    SELECT email FROM emails WHERE email IN ('mail1@test', 'mail2@test', 'mail3@test', ...) GROUP BY email

As a further optimization, you could do it as a LEFT JOIN:
    SELECT t1.email AS email, IF(t2.email IS NOT NULL, true, false) AS found
    FROM [interesting_emails] t1
    LEFT OUTER JOIN [emails] t2 ON t1.email = t2.email

If interesting_emails is a table holding the list of emails you want to check, like

    mail1@test
    mail2@test
    mail3@test

and the emails table contains only mail1@test and mail2@test, then you'd get back these results:

    email          found
    ____________   _____
    mail1@test     true
    mail2@test     true
    mail3@test     false

The advantage of doing it this way is that it will scale up to billions of emails if needed (when the number gets large, you might consider using JOIN EACH instead of JOIN).
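A runnable sketch of the two set-based forms, the IN + GROUP BY query and the LEFT JOIN, again using sqlite3 as a local stand-in for BigQuery. The IN form returns only addresses that exist, so the missing ones are computed client-side. SQLite has no IF() in older versions, so the join uses the equivalent CASE expression; it also wraps emails in SELECT DISTINCT (an addition not in the answer) so duplicate rows cannot repeat results. In BigQuery, interesting_emails would be a table you load your list into; here both tables are created in memory, and all function names are hypothetical:

```python
import sqlite3
from typing import Iterable, List, Set, Tuple


def existing_emails(conn: sqlite3.Connection, emails: Iterable[str]) -> Set[str]:
    """IN + GROUP BY form: return the subset of `emails` present in the table."""
    wanted = list(dict.fromkeys(emails))  # drop duplicates, keep order
    if not wanted:
        return set()
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"SELECT email FROM emails WHERE email IN ({placeholders}) GROUP BY email"
    return {row[0] for row in conn.execute(sql, wanted)}


def check_with_left_join(conn: sqlite3.Connection) -> List[Tuple[str, bool]]:
    """LEFT JOIN form: return (email, found) for every row of interesting_emails."""
    # Every address to check stays on the left side; rows with no match get
    # NULL from the right side, which the CASE turns into 0 ("not found").
    # SELECT DISTINCT keeps duplicate rows in emails from repeating results.
    sql = """
        SELECT t1.email,
               CASE WHEN t2.email IS NOT NULL THEN 1 ELSE 0 END AS found
        FROM interesting_emails AS t1
        LEFT OUTER JOIN (SELECT DISTINCT email FROM emails) AS t2
          ON t1.email = t2.email
        ORDER BY t1.email
    """
    return [(email, bool(found)) for email, found in conn.execute(sql)]


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in: emails holds mail1@test and mail2@test (twice)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emails (email TEXT)")
    conn.execute("CREATE TABLE interesting_emails (email TEXT)")
    conn.executemany(
        "INSERT INTO emails VALUES (?)",
        [("mail1@test",), ("mail2@test",), ("mail2@test",)],
    )
    conn.executemany(
        "INSERT INTO interesting_emails VALUES (?)",
        [("mail1@test",), ("mail2@test",), ("mail3@test",)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    wanted = ["mail1@test", "mail2@test", "mail3@test"]
    found = existing_emails(conn, wanted)
    print("missing:", [e for e in wanted if e not in found])
    for email, is_found in check_with_left_join(conn):
        print(email, is_found)
```

Both forms answer all addresses in one round trip; the LEFT JOIN additionally reports the misses explicitly, which is why it is the one that scales to very large lists.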