我正尝试编写一个查询(PostgreSQL)以获取 2012年获奖次数最多的电影。
I am trying to write a query (PostgreSQL) to get "Movies with highest number of awards in year 2012."
我有以下表格:
CREATE TABLE Award( ID_AWARD bigserial CONSTRAINT Award_pk PRIMARY KEY, award_name VARCHAR(90), category VARCHAR(90), award_year integer, CONSTRAINT award_unique UNIQUE (award_name, category, award_year)); CREATE TABLE AwardWinner( ID_AWARD integer, ID_ACTOR integer, ID_MOVIE integer, CONSTRAINT AwardWinner_pk PRIMARY KEY (ID_AWARD));然后我写了以下查询,给出了正确的结果,但是我有很多代码重复
And I written following query, which gives correct results, but there's is quite a lot of code duplication I think.
select * from (select id_movie, count(id_movie) as awards from Award natural join awardwinner where award_year = 2012 group by id_movie) as SUB where awards = (select max(count) from (select id_movie, count(id_movie) from Award natural join awardwinner where award_year = 2012 group by id_movie) as SUB2);所以 SUB 和 SUB2 是完全相同的子查询。有更好的方法吗?
So SUB and SUB2 are exactly the same subquery. Is there a better way to do this?
推荐答案获取所有获奖影片
Get all winning movies
SELECT id_movie, awards FROM ( SELECT aw.id_movie, count(*) AS awards ,rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk FROM award a JOIN awardwinner aw USING (id_award) WHERE a.award_year = 2012 GROUP BY aw.id_movie ) sub WHERE rnk = 1;要点
-
这应该比到目前为止的建议更简单,更快捷。使用 EXPLAIN ANALYZE 进行测试。
在某些情况下,CTE有助于避免代码重复。但是现在还不行:子查询可以很好地完成工作,而且通常更快。
There are cases where CTEs are instrumental to avoid code duplication. But not in this time: a subquery does the job just fine and is usually faster.
您可以在同一查询上的聚合函数上运行窗口函数水平。这就是它起作用的原因:
You can run a window function OVER an aggregate function on the same query level. That's why this works:
rank() OVER (ORDER BY count(aw.id_movie) DESC) AS rnk -
我建议在JOIN条件下使用显式列名,而不要使用 NATURAL JOIN ,如果以后更改/向基础表中添加列,则很容易损坏。 使用使用
I'd suggest to use explicit column names in the JOIN condition instead of NATURAL JOIN, which is prone to breakage if you later change / add columns to the underlying tables. The JOIN condition with USING is almost as short, but doesn't break as easily.
因为几乎是短的,但是却不那么容易破解。因为 id_movie 不能为NULL (由JOIN条件以及pk的一部分决定),使用 count(*)较短,但速度稍快。结果相同。
Since id_movie cannot be NULL (ruled out by the JOIN condition and also part of the pk) it is shorter ans slightly faster to use count(*) instead. Same result.
更快,但是,如果您只需要一个赢家:
Shorter and faster, yet, if you only need one winner:
SELECT aw.id_movie, count(*) AS awards FROM award a JOIN awardwinner aw USING (id_award) WHERE a.award_year = 2012 GROUP BY 1 ORDER BY 2 DESC, 1 -- as tie breaker LIMIT 1使用位置引用( 1 , 2 )作为速记。 我向 id_movie 添加了 ORDER BY 作为决胜局,以防多部电影获胜。
Using positional references (1, 2) here as shorthand. I added id_movie to ORDER BY as tie breaker in case multiple movies should qualify for the win.
更多推荐
查找某年获奖次数最多的电影
发布评论