我正在试图弄清楚如何压扁bigquery日志表(logs.cloudaudit_googleapis_com_data_access_20160404等),以便我基本上可以看到任何给定目标表的所有已完成的作业。
我理想情况下只需要下面的内容向我展示触及表[dataset_xyz.table_abc]的作业的所有条目,然后我就可以弄清楚如何理解根据作业类型填充的某些字段等。
SELECT * FROM [logs.cloudaudit_googleapis_com_data_access_20160404] where ( protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId='dataset_xyz' and protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId='table_abc' ) or ( protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.tableCopy.destinationTable.datasetId='dataset_xyz' and protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.tableCopy.destinationTable.tableId='table_abc' ) or ( protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.datasetId='dataset_xyz' and protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId='table_abc' )我试图做很多嵌套的扁平化但我无法弄明白,因为发现日志结构有点复杂。
我基本上希望能够查询日志导出,说“向我展示与编辑表[dataset_xyz.table_abc]有关的所有内容”,所以我猜主要是load,tableCopy和查询作业,它们附加或覆盖了任何数据table [dataset_xyz.table_abc]。
我唯一能想到的就是沿着它的嵌套记录逐字地分开表格然后以某种方式将它们全部重新组合在一起,但这似乎是一个疯狂的想法。 我确信有一种方法可以反复压扁它,但我不知道如何压扁这么复杂的结构。 即使我可以在protoPayload.serviceData.jobCompletedEvent下展平一切。*所以我可以做
select protoPayload.serviceData.jobCompletedEvent.* from flatten(...或者也许有一种更简单的方法可以解决这个问题,我错过了?
ps我认为这可能是导游的一个很好的例子,因为我认为这是人们想做的常见事情。
I'm trying to figure out how to flatten the bigquery log tables (logs.cloudaudit_googleapis_com_data_access_20160404 etc) so that i can basically see all completed jobs for any given destination table.
I ideally just want something like the below to show me all the entries for jobs that touched the table [dataset_xyz.table_abc] and then i can figure out how to make sense of some fields being populated based on type of job etc.
SELECT * FROM [logs.cloudaudit_googleapis_com_data_access_20160404] where ( protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId='dataset_xyz' and protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId='table_abc' ) or ( protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.tableCopy.destinationTable.datasetId='dataset_xyz' and protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.tableCopy.destinationTable.tableId='table_abc' ) or ( protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.datasetId='dataset_xyz' and protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId='table_abc' )I was trying to do lots of nested flattens but i could not really figure it out tbh as find the log structure a bit complex.
I basically want to be able to query the log exports to say "show me everything that had anything to do with editing table [dataset_xyz.table_abc]" so i guess mainly load, tableCopy, and query jobs that either appended or overwrote any data in table [dataset_xyz.table_abc].
The only next thing i can think of it to literally pick apart the table along its nested records and then somehow join them all back together separately but that seems like a crazy idea. I'm sure there is a way to flatten it repeatedly but i just cant figure out how to flatten such a complicated structure. Even if i could just flatten everything under protoPayload.serviceData.jobCompletedEvent.* so i could do
select protoPayload.serviceData.jobCompletedEvent.* from flatten(...Or maybe there is a much easier way to go about this that i'm missing?
p.s. i think this could be a good example for the guide as i'd imagine is a common enough thing people want to do.
最满意答案
你试过这个:
SELECT protoPayload.serviceData.jobCompletedEvent.job.jobName.jobId FROM [audit_logs.cloudaudit_googleapis_com_data_access_20160406]省略记录if(sum(sum)
(protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId ='table'和protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId ='ds')或(protoPayload.serviceData.jobCompletedEvent.job .jobConfiguration.load.destinationTable.datasetId ='ds'AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId ='table'
))= 0)
Have you tried this:
SELECT protoPayload.serviceData.jobCompletedEvent.job.jobName.jobId FROM [audit_logs.cloudaudit_googleapis_com_data_access_20160406] omit record if ( sum(
(protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId = 'table' and protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId = 'ds' ) or ( protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.datasetId = 'ds' AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId = 'table'
) ) = 0)
更多推荐
发布评论