MySQL で相関係数を求める

MySQL相関係数を求める

その前に予習

平均: \overline{x} = \displaystyle \frac{1}n\sum_{i=1}^n x_i

母分散:\sigma^2 = \displaystyle \frac{1}n\sum_{i=1}^n (x_i-\overline{x})^2

標準偏差\sigma = \displaystyle \sqrt{ \frac{1}n\sum_{i=1}^n (x_i - \overline{x})^2 }

変動係数: CV = \displaystyle \frac{\sigma} {\overline{x}}

単回帰分析

相関係数 r_{xy} = \frac{\displaystyle \frac{1}n \sum_{i=1}^n(x_i-\overline{x})(y_i-\overline{y}) }
{\displaystyle{\sqrt{ \frac{1}{n}\sum_{i=1}^n (x_i-\overline{x})^2}} \cdot \sqrt{ \frac{1}{n}\sum_{i=1}^n(y_i-\overline{y})^2 }}

単回帰式: y = β_0 + β_1x

回帰係数: β_1= \frac{\displaystyle \sum_{i=1}^{n}(y_i-\overline{y})(x_i-\overline{x})}  {\displaystyle \sum_{i=1}^{n}(x_i-\overline{x})^2} ,
 β_0 = \overline{y} - β_1\cdot \overline{x}

MySQL

table

name x y

標準偏差を求める

SELECT
  name,
  ROUND(AVG(x),2) AS "平均(x)",
  ROUND(AVG(y),2) AS "平均(y)",
  ROUND(VAR_POP(STD(x),3) AS "母分散(x)",
  ROUND(VAR_POP(STD(y),3) AS "母分散(y)",
  ROUND(STD(x),3) AS "標準偏差(x)", 
  ROUND(STD(y),3) AS "標準偏差(y)",
  ROUND(STD(x)/AVG(x),3) AS "変動系数(x)",
  ROUND(STD(y)/AVG(y),3) AS "変動系数(y)"
FROM
  table
GROUP BY
  name WITH ROLLUP

相関係数を求める

SELECT
  name,
  ROUND(SUM((x-avgX)*(y-avgY))/CNT/(stdX*stdY),3) AS Rxy,      #相関係数
  ROUND(SUM((x-avgX)*(y-avgY))/(stdX*stdX),3) AS B1,           #回帰係数 B1
  ROUND(avgY-SUM((x-avgX)*(y-avgY))/(stdX*stdX)*avgX,3) AS B0  #回帰係数 B0
FROM
  table,
  (SELECT
    name,
    AVG(x) AS avgX, #平均(x)
    AVG(y) AS avgY, #平均(y)
    STD(x) AS stdX, #標準偏差(x) 
    STD(y) AS stdY, #標準偏差(y),
    COUNT(*) AS CNT #件数 
  FROM
    table
  GROUP BY
    name
  ) AS DMY USING(name)

GROUP BY
  name

試していませんが。