Clustering similar time series?

Updated: 2024-10-23 11:19:24
Problem description

I have somewhere between 10-20k different time-series (24 dimensional data -- a column for each hour of the day) and I'm interested in clustering time series that exhibit roughly the same patterns of activity.

I had originally started to implement Dynamic Time Warping (DTW) because:

  1. Not all of my time series are perfectly aligned
  2. For my purposes, two time series that are slightly offset should be considered similar
  3. Two time series with the same shape but different scales should be considered similar

    The only problem I had run into with DTW was that it did not appear to scale well -- fastdtw on a 500x500 distance matrix took ~30 minutes.
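For reference, the DTW computation discussed above can be sketched in a few lines. This is a minimal illustration, not the `fastdtw` implementation: the function name `dtw_distance` and its `window` parameter are assumptions made for this example. The optional window is a Sakoe-Chiba band, which restricts how far the alignment may drift from the diagonal and cuts the number of cells evaluated from about n*m to about n*(2w+1).

```python
import numpy as np

def dtw_distance(a, b, window=None):
    """DTW distance between two 1-D sequences, with an optional Sakoe-Chiba band.

    `window` is a hypothetical parameter for this sketch: the maximum allowed
    index offset between aligned points. None means unconstrained DTW.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, otherwise no warping path exists.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

hours = np.arange(24)
x = np.sin(2 * np.pi * hours / 24)
y = np.roll(x, 2)  # the same daily pattern, two hours later

print(dtw_distance(x, y, window=3))  # banded DTW
print(np.abs(x - y).sum())           # point-by-point distance, no warping
```

With `window=0` and equal-length inputs the function reduces to the point-by-point absolute difference, which makes it easy to see what the warping adds.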

    What other methods exist that would help me satisfy conditions 2 & 3?

    Recommended answer

    ARIMA can do the job if you decompose the time series into trend, seasonality, and residuals. After that, use a K-Nearest Neighbor algorithm. However, the computational cost may be high, mainly because of ARIMA.

    In ARIMA:

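        import numpy as np
        # Recent statsmodels releases removed statsmodels.tsa.arima_model;
        # the ARIMA class now lives in statsmodels.tsa.arima.model.
        from statsmodels.tsa.arima.model import ARIMA
        from statsmodels.tsa.seasonal import seasonal_decompose

        # X is your 1-D time series (insert your data here)
        model0 = ARIMA(X, order=(2, 1, 0))
        model1 = model0.fit()

        # Insert your data's seasonality in 'period' (named 'freq' in older statsmodels)
        decomposition = seasonal_decompose(np.asarray(X).reshape(len(X)), period=100)
        trend = decomposition.trend
        seasonal = decomposition.seasonal
        residual = decomposition.resid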

    As a complement to @Sushant's comment, you can decompose the time series and check for similarity in one or all of the four plots: data, seasonality, trend, and residuals.
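One way to picture comparing a single component is the sketch below. It uses a naive additive decomposition written in plain NumPy as a stand-in for `seasonal_decompose`, so the `decompose` helper and the synthetic series are assumptions made for illustration, not part of the original answer. It also assumes each series covers several full periods (here seven days of hourly data), since a seasonal profile cannot be estimated from a single period.

```python
import numpy as np

def decompose(series, period):
    """Naive additive decomposition (hypothetical helper for this sketch).

    Returns (trend, seasonal, resid) with trend + seasonal + resid == series.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    # Trend: moving average over one full period, edges padded with the end values.
    left = period // 2
    padded = np.pad(x, (left, period - 1 - left), mode="edge")
    trend = np.convolve(padded, np.ones(period) / period, mode="valid")
    # Seasonal: average detrended value at each phase of the period, centered on zero.
    detrended = x - trend
    profile = np.array([detrended[k::period].mean() for k in range(period)])
    profile -= profile.mean()
    seasonal = np.tile(profile, n // period + 1)[:n]
    resid = x - trend - seasonal
    return trend, seasonal, resid

t = np.arange(24 * 7)                       # seven days of hourly samples
a = np.sin(2 * np.pi * t / 24) + 0.01 * t   # daily cycle, gentle trend
b = np.sin(2 * np.pi * t / 24) + 0.03 * t   # same daily cycle, steeper trend
c = np.sin(2 * np.pi * t / 12) + 0.01 * t   # different cycle, same trend as a

seasonal_a = decompose(a, 24)[1]
seasonal_b = decompose(b, 24)[1]
seasonal_c = decompose(c, 24)[1]

dist_ab = np.linalg.norm(seasonal_a - seasonal_b)
dist_ac = np.linalg.norm(seasonal_a - seasonal_c)
print(dist_ab, dist_ac)
```

Comparing only the seasonal component treats `a` and `b` as close even though their trends differ, while `c` stays far from both. Which component to compare depends on which kind of similarity matters for the clustering.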

    Then, an example with data:

        import numpy as np
        import matplotlib.pyplot as plt
        from sklearn.neighbors import NearestNeighbors

        # Three synthetic series: sine waves with different frequencies and linear trends
        grid = np.linspace(0, 30 * 3, 14 * 2)  # 28 points on [0, 90]
        sin1 = [np.sin(x) + x / 7 for x in grid]
        sin2 = [np.sin(0.8 * x) + x / 5 for x in grid]
        sin3 = [np.sin(1.3 * x) + x / 5 for x in grid]

        plt.plot(sin1, label='sin1')
        plt.plot(sin2, label='sin2')
        plt.plot(sin3, label='sin3')
        plt.legend(loc=2)
        plt.show()

        # Each row of X is one series; find each series' nearest neighbor
        X = np.array([sin1, sin2, sin3])
        nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
        distances, indices = nbrs.kneighbors(X)
        distances

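    You get the distances, which measure similarity (the first column is each series' distance to itself, the second its distance to its nearest neighbor):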

        array([[ 0.        , 16.39833107],
               [ 0.        ,  5.2312092 ],
               [ 0.        ,  5.2312092 ]])
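Plain Euclidean distances like these are sensitive to both scale and small time offsets, which are conditions 2 and 3 in the question. One possible way to relax them, sketched here as an assumption rather than as part of the original answer, is to z-normalize each series and take the smallest distance over a few small shifts. The names `znorm`, `shift_tolerant_distance`, and `max_shift` are hypothetical, and the shift is circular, which assumes the 24 hourly values wrap around midnight.

```python
import numpy as np

def znorm(x):
    """Remove mean and scale so that only the shape of the series remains."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

def shift_tolerant_distance(a, b, max_shift=2):
    """Smallest Euclidean distance between z-normalized a and b over circular
    shifts of b by at most max_shift steps (hypothetical helper for this sketch)."""
    za, zb = znorm(a), znorm(b)
    return min(
        np.linalg.norm(za - np.roll(zb, s))
        for s in range(-max_shift, max_shift + 1)
    )

hours = np.arange(24)
base = np.sin(2 * np.pi * hours / 24)
scaled = 5.0 * base + 10.0                    # same shape, different scale and level
shifted = np.roll(base, 1)                    # same shape, one hour later
different = np.cos(4 * np.pi * hours / 24)    # two cycles per day

print(shift_tolerant_distance(base, scaled))
print(shift_tolerant_distance(base, shifted))
print(shift_tolerant_distance(base, different))
```

The resulting pairwise distances could then be passed to a nearest-neighbor search or a clustering routine that accepts precomputed distances. Because each pair costs only `2 * max_shift + 1` vector subtractions on 24 points, this is much cheaper per pair than DTW, at the price of allowing only rigid shifts rather than local warping.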

    Published: 2023-10-28 18:19:07
    Article link: https://www.elefans.com/category/jswz/34/1537404.html