How to truncate a timestamp from seconds to the nearest minute in PySpark 1.5


I am using PySpark. I have a column ('dt') in a dataframe ('canon_evt') that is a timestamp. I am trying to remove the seconds from a DateTime value. It is originally read in from parquet as a String. I then try to convert it to Timestamp via

canon_evt = canon_evt.withColumn('dt', to_date(canon_evt.dt))
canon_evt = canon_evt.withColumn('dt', canon_evt.dt.astype('Timestamp'))

Then I would like to remove the seconds. I tried 'trunc' and 'date_format', and even tried concatenating pieces together as below. I think it requires some sort of map and lambda combination, but I'm not certain whether Timestamp is an appropriate format, and whether it's possible to get rid of the seconds.

canon_evt = canon_evt.withColumn('dyt', year('dt') + '-' + month('dt') + '-' + dayofmonth('dt') + ' ' + hour('dt') + ':' + minute('dt'))

[Row(dt=datetime.datetime(2015, 9, 16, 0, 0), dyt=None)]
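As the Row output above shows, `to_date` had already collapsed `dt` to midnight, and `+` on columns is numeric addition rather than string concatenation, which is why `dyt` came back as None. For comparison, outside Spark the same "drop the seconds" operation on a single Python datetime is just a `replace`:

```python
from datetime import datetime

dt = datetime(2015, 9, 16, 5, 39, 46)

# Zero out the sub-minute fields; this truncates (floors), it does not round.
truncated = dt.replace(second=0, microsecond=0)
print(truncated)  # 2015-09-16 05:39:00
```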

Solution

Converting to Unix timestamps and basic arithmetic should do the trick:

from pyspark.sql import Row
from pyspark.sql.functions import col, unix_timestamp, round

df = sc.parallelize([
    Row(dt='1970-01-01 00:00:00'),
    Row(dt='2015-09-16 05:39:46'),
    Row(dt='2015-09-16 05:40:46'),
    Row(dt='2016-03-05 02:00:10'),
]).toDF()

## unix_timestamp converts string to Unix timestamp (bigint / long)
## in seconds. Divide by 60, round, multiply by 60 and cast
## should work just fine.
dt_truncated = ((round(unix_timestamp(col("dt")) / 60) * 60)
    .cast("timestamp"))

df.withColumn("dt_truncated", dt_truncated).show(10, False)
## +-------------------+---------------------+
## |dt                 |dt_truncated         |
## +-------------------+---------------------+
## |1970-01-01 00:00:00|1970-01-01 00:00:00.0|
## |2015-09-16 05:39:46|2015-09-16 05:40:00.0|
## |2015-09-16 05:40:46|2015-09-16 05:41:00.0|
## |2016-03-05 02:00:10|2016-03-05 02:00:00.0|
## +-------------------+---------------------+
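One caveat about this approach: `round` snaps to the nearest minute, as the output above shows (05:39:46 becomes 05:40:00). If truncating down is what's wanted, swap `round` for `floor` (also in `pyspark.sql.functions`). The underlying arithmetic, sketched in plain Python on epoch seconds:

```python
from datetime import datetime, timezone

def floor_to_minute(ts_seconds):
    # Drop the leftover seconds within the current minute.
    return ts_seconds - ts_seconds % 60

ts = int(datetime(2015, 9, 16, 5, 39, 46, tzinfo=timezone.utc).timestamp())
print(datetime.fromtimestamp(floor_to_minute(ts), tz=timezone.utc))
# 2015-09-16 05:39:00+00:00
```

Later Spark releases (2.3+) also ship `date_trunc('minute', col)`, which does this truncation directly, but it is not available in 1.5.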


Published: 2023-10-16 05:30:16
Link: https://www.elefans.com/category/jswz/34/1496639.html