Google BigQuery queries are very slow

Updated: 2024-10-28 04:30:11

Problem description

I am using Google BigQuery and running some simple queries from PHP (e.g. SELECT * FROM emails WHERE email='mail@test'). I am just checking whether the email exists in the table.

The table "emails" is empty for now, but the PHP script still takes around 4 minutes to check 175 emails against it. In the future I expect the table to hold about 500,000 emails, so I guess the requests will take even longer.

Is that normal? Are there any ideas or solutions to improve the checking time?

(P.S.: The table "emails" contains only 8 columns, all of string type.)

Thank you!
Solution
If you are just checking whether a value exists in a field, consider using SELECT COUNT(*) FROM emails WHERE email='mail@test' instead. This only requires reading a single field, so it will cost less and be marginally faster on large tables.

And as Pentium10 suggested, consider doing multiple lookups in a single query. You could do this like:
SELECT SUM(IF(email = 'mail1@test', 1, 0)) AS m1, SUM(IF(email = 'mail2@test', 1, 0)) AS m2, SUM(IF(email = 'mail3@test', 1, 0)) AS m3, ... FROM emails
Each mN column is the number of rows matching that address, so a value greater than 0 means the email exists. You're going to be limited to something like 64k of these in a single query, but it should be very fast to compute since it only requires a scan of a single column in one pass.
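
The batched SUM(IF(...)) query is tedious to write by hand for 175 addresses, so it can be generated. The following is a minimal Python sketch, not the answerer's code: the helper names sql_quote and build_batch_lookup are hypothetical, and the backslash escaping is an assumption about BigQuery's string literal rules that should be checked against the documentation (query parameters are preferable where your client library supports them).

```python
from typing import List


def sql_quote(value: str) -> str:
    """Return value as a single-quoted SQL string literal.

    Assumption: backslash escaping is accepted by the SQL dialect in use.
    """
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_batch_lookup(emails: List[str], table: str = "emails") -> str:
    """Build one query with one SUM(IF(...)) column per address.

    Column m1 counts rows matching the first address, m2 the second, etc.
    """
    columns = [
        f"SUM(IF(email = {sql_quote(addr)}, 1, 0)) AS m{i}"
        for i, addr in enumerate(emails, start=1)
    ]
    return "SELECT " + ", ".join(columns) + f" FROM {table}"


# One query string for the whole batch instead of one query per address.
query = build_batch_lookup(["mail1@test", "mail2@test", "mail3@test"])
print(query)
```

Sending this one string replaces a round trip per address, which is likely where most of the 4 minutes goes on an empty table.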

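Alternatively, if you wanted the emails as one per row, you could do something a little bit fancier like
SELECT email FROM emails WHERE email IN ('mail1@test', 'mail2@test', 'mail3@test'...) GROUP BY email
Only the addresses that exist in the table come back; any address missing from the result is not in the table.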
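
Since the IN query returns only the addresses that exist, the caller still has to work out which ones are missing. Here is a rough Python sketch of both halves, under stated assumptions: build_in_lookup and mark_found are hypothetical helper names, the escaping is an assumed convention, and the returned rows are hard-coded stand-ins rather than real BigQuery output.

```python
from typing import Dict, Iterable, List


def build_in_lookup(emails: List[str], table: str = "emails") -> str:
    """Build a single IN (...) query that returns each matching address once."""
    quoted = ", ".join(
        # Assumed escaping convention; verify it for your SQL dialect.
        "'" + e.replace("\\", "\\\\").replace("'", "\\'") + "'"
        for e in emails
    )
    return f"SELECT email FROM {table} WHERE email IN ({quoted}) GROUP BY email"


def mark_found(wanted: List[str], returned: Iterable[str]) -> Dict[str, bool]:
    """Map every wanted address to True if the query returned it, else False."""
    present = set(returned)
    return {addr: addr in present for addr in wanted}


wanted = ["mail1@test", "mail2@test", "mail3@test"]
print(build_in_lookup(wanted))

# Stand-in for the rows the query would return (hypothetical data).
status = mark_found(wanted, ["mail1@test", "mail2@test"])
print(status)
```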
As a further optimization, you could do it as a LEFT JOIN:
SELECT t1.email AS email, IF(t2.email IS NOT NULL, true, false) AS found FROM [interesting_emails] t1 LEFT OUTER JOIN [emails] t2 ON t1.email = t2.email
Here interesting_emails is a table holding the list of emails you want to check, like
mail1@test
mail2@test
mail3@test
If the emails table contained only mail1@test and mail2@test, then you'd get back as results:
email       found
__________  _____
mail1@test  true
mail2@test  true
mail3@test  false
The advantage of doing it this way is that it will scale up to the billions of e-mails if needed (when the number gets large you might consider using a JOIN EACH instead of a JOIN).
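
To see what the LEFT JOIN lookup returns without touching BigQuery, the same join can be tried locally. This sketch uses Python's built-in sqlite3 module purely as a stand-in: SQLite's SQL differs from BigQuery's (no bracketed table names, no JOIN EACH, and IS NOT NULL yields 1 or 0 instead of IF(...) returning true or false), and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only to illustrate the join semantics.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE interesting_emails (email TEXT);
    CREATE TABLE emails (email TEXT);
    INSERT INTO interesting_emails VALUES
        ('mail1@test'), ('mail2@test'), ('mail3@test');
    INSERT INTO emails VALUES ('mail1@test'), ('mail2@test');
    """
)

# Every row of interesting_emails is kept; t2.email is NULL when no match exists.
rows = conn.execute(
    """
    SELECT t1.email AS email, t2.email IS NOT NULL AS found
    FROM interesting_emails t1
    LEFT OUTER JOIN emails t2 ON t1.email = t2.email
    ORDER BY t1.email
    """
).fetchall()
conn.close()

# SQLite reports the flag as 1 or 0, so convert it to a bool.
results = [(email, bool(found)) for email, found in rows]
for email, found in results:
    print(email, found)
```

Note that if the emails table held the same address more than once, the join would return one row per duplicate, so deduplicating first (or grouping by email) may be needed.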


Published: 2023-10-23 00:29:41

Article link: https://www.elefans.com/category/jswz/34/1519202.html
