我在sql数据库中有一些数据,我想计算斜率.数据具有以下布局:
I have some data in a sql database and I'd like to calculate the slope. The data has this layout:
Date | Keyword | Score 2012-01-10 | ipad | 0.12 2012-01-11 | ipad | 0.17 2012-01-12 | ipad | 0.24 2012-01-10 | taco | 0.19 2012-01-11 | taco | 0.34 2012-01-12 | taco | 0.45我希望通过使用SQL创建新表来使最终输出看起来像这样:
I'd like the final output to look like this by creating a new table using SQL:
Date | Keyword | Score | Slope 2012-01-10 | ipad | 0.12 | 0.06 2012-01-11 | ipad | 0.17 | 0.06 2012-01-12 | ipad | 0.24 | 0.06 2012-01-10 | taco | 0.19 | 0.13 2012-01-11 | taco | 0.34 | 0.13 2012-01-12 | taco | 0.45 | 0.13要使事情复杂化,并非所有的关键字都具有3个日期的数据,例如,有些只有2个日期.
To complicate things, not all Keywords have 3 dates worth of data, some have only 2 for instance.
SQL越简单越好,因为我的数据库是专有的,我不太确定可用的公式,尽管我知道它可以执行OVER(PARTITION BY).谢谢!
The simpler the SQL the better since my database is proprietary and I'm not quite sure what formulas are available, although I know it can do OVER(PARTITION BY) if that helps. Thank you!
更新:我将斜率定义为excel中最适合的y = mx + p,即= slope()
UPDATE: I define the slope as best fit y=mx+p aka in excel it would be =slope()
这是我通常在excel中操作的另一个实际示例:
Here is another actual example that I usually manipulate in excel:
date keyword score slope 1/22/2012 water bottle 0.010885442 0.000334784 1/23/2012 water bottle 0.011203949 0.000334784 1/24/2012 water bottle 0.008460835 0.000334784 1/25/2012 water bottle 0.010363991 0.000334784 1/26/2012 water bottle 0.011800716 0.000334784 1/27/2012 water bottle 0.012948411 0.000334784 1/28/2012 water bottle 0.012732459 0.000334784 1/29/2012 water bottle 0.011682568 0.000334784推荐答案
我能做的最干净的一个:
The cleanest one I could make:
SELECT Scores.Date, Scores.Keyword, Scores.Score, (N * Sum_XY - Sum_X * Sum_Y)/(N * Sum_X2 - Sum_X * Sum_X) AS Slope FROM Scores INNER JOIN ( SELECT Keyword, COUNT(*) AS N, SUM(CAST(Date as float)) AS Sum_X, SUM(CAST(Date as float) * CAST(Date as float)) AS Sum_X2, SUM(Score) AS Sum_Y, SUM(Score*Score) AS Sum_Y2, SUM(CAST(Date as float) * Score) AS Sum_XY FROM Scores GROUP BY Keyword ) G ON G.Keyword = Scores.Keyword;它使用简单线性回归来计算斜率.
结果:
Date Keyword Score Slope 2012-01-22 water bottle 0,010885442 0,000334784345222076 2012-01-23 water bottle 0,011203949 0,000334784345222076 2012-01-24 water bottle 0,008460835 0,000334784345222076 2012-01-25 water bottle 0,010363991 0,000334784345222076 2012-01-26 water bottle 0,011800716 0,000334784345222076 2012-01-27 water bottle 0,012948411 0,000334784345222076 2012-01-28 water bottle 0,012732459 0,000334784345222076 2012-01-29 water bottle 0,011682568 0,000334784345222076每个数据库系统似乎都有不同的方法将日期转换为数字:
Every database system seems to have a different approach to converting dates to numbers:
- MySQL :TO_SECONDS(date)或TO_DAYS(date)
- Oracle: TO_NUMBER(TO_CHAR(date, 'J'))或date - TO_DATE('1','yyyy')
- MS SQL Server: CAST(date AS float)(或等效的CONVERT)
- MySQL: TO_SECONDS(date) or TO_DAYS(date)
- Oracle: TO_NUMBER(TO_CHAR(date, 'J')) or date - TO_DATE('1','yyyy')
- MS SQL Server: CAST(date AS float) (or equivalent CONVERT)
更多推荐
如何在SQL中计算斜率
发布评论