Round a timestamp to the nearest 30 seconds

Updated: 2024-10-24 12:21:24
This article describes how to round a timestamp column to the nearest 30 seconds.

Problem Description

I have a column in a DF and it contains a timestamp in the format (yyyy-mm-dd HH:mm:ss). I need to round the timestamp to the nearest 30 seconds.

old column             desired column
2016-02-09 19:31:02    2016-02-09 19:31:00
2016-02-09 19:31:35    2016-02-09 19:31:30
2016-02-09 19:31:52    2016-02-09 19:32:00
2016-02-09 19:31:28    2016-02-09 19:31:30

Is it possible to do that in Pyspark?

Recommended Answer

If you're using Spark version 1.5+, you can use pyspark.sql.functions.second() to get the seconds from your timestamp column.

import pyspark.sql.functions as f

df.withColumn("second", f.second("old_timestamp")).show()
#+-------------------+------+
#|      old_timestamp|second|
#+-------------------+------+
#|2016-02-09 19:31:02|     2|
#|2016-02-09 19:31:35|    35|
#|2016-02-09 19:31:52|    52|
#|2016-02-09 19:31:28|    28|
#+-------------------+------+

Once you have the seconds part, you can take this number, divide it by 30, round it, and multiply by 30 to get the "new" second.
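The divide-round-multiply arithmetic can be checked in plain Python (a minimal sketch; the helper name is hypothetical, and Spark's round() uses HALF_UP rounding, so it is emulated here with floor(x + 0.5) rather than Python's built-in round(), which rounds halves to even):

```python
import math

def round_second(second):
    # Divide by 30, round (HALF_UP, like Spark's round()), multiply by 30.
    return int(math.floor(second / 30 + 0.5)) * 30

# Seconds from the example rows: 2, 35, 52, 28
print([round_second(s) for s in [2, 35, 52, 28]])  # [0, 30, 60, 30]
```

Note that a second of 60 is fine here: it becomes a +offset that rolls the timestamp into the next minute in the later step.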

df.withColumn("second", f.second("old_timestamp"))\
    .withColumn("new_second", f.round(f.col("second")/30)*30)\
    .show()
#+-------------------+------+----------+
#|      old_timestamp|second|new_second|
#+-------------------+------+----------+
#|2016-02-09 19:31:02|     2|       0.0|
#|2016-02-09 19:31:35|    35|      30.0|
#|2016-02-09 19:31:52|    52|      60.0|
#|2016-02-09 19:31:28|    28|      30.0|
#+-------------------+------+----------+

From the "new" second, we can compute an offset in seconds which, when added to the original timestamp, will produce the desired "rounded" timestamps.

df.withColumn("second", f.second("old_timestamp"))\
    .withColumn("new_second", f.round(f.col("second")/30)*30)\
    .withColumn("add_seconds", f.col("new_second") - f.col("second"))\
    .show()
#+-------------------+------+----------+-----------+
#|      old_timestamp|second|new_second|add_seconds|
#+-------------------+------+----------+-----------+
#|2016-02-09 19:31:02|     2|       0.0|       -2.0|
#|2016-02-09 19:31:35|    35|      30.0|       -5.0|
#|2016-02-09 19:31:52|    52|      60.0|        8.0|
#|2016-02-09 19:31:28|    28|      30.0|        2.0|
#+-------------------+------+----------+-----------+

As we can see, a negative number in this column means the original time has to be rounded down; a positive number will increase the time.

In order to add this time to the original timestamp, first convert it to a unix timestamp using pyspark.sql.functions.unix_timestamp(). After the addition, convert the result back to a timestamp using pyspark.sql.functions.from_unixtime().
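The convert-add-convert step can be sketched in plain Python with the stdlib datetime module (the helper name is hypothetical; UTC is pinned to keep the round-trip timezone-safe, which mirrors the idea of unix_timestamp() followed by from_unixtime() without needing a SparkSession):

```python
from datetime import datetime, timezone

def add_seconds_via_unix(ts_str, offset):
    # Parse, go through a unix timestamp, add the offset, format back.
    dt = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    shifted = datetime.fromtimestamp(dt.timestamp() + offset, tz=timezone.utc)
    return shifted.strftime("%Y-%m-%d %H:%M:%S")

print(add_seconds_via_unix("2016-02-09 19:31:52", 8))  # 2016-02-09 19:32:00
```

Working in epoch seconds is what lets an offset of +8 roll 19:31:52 cleanly over the minute boundary.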

Putting this all together (condensing the intermediate steps):

df.withColumn(
        "add_seconds",
        (f.round(f.second("old_timestamp")/30)*30) - f.second("old_timestamp")
    )\
    .withColumn(
        "new_timestamp",
        f.from_unixtime(f.unix_timestamp("old_timestamp") + f.col("add_seconds"))
    )\
    .drop("add_seconds")\
    .show()
#+-------------------+-------------------+
#|      old_timestamp|      new_timestamp|
#+-------------------+-------------------+
#|2016-02-09 19:31:02|2016-02-09 19:31:00|
#|2016-02-09 19:31:35|2016-02-09 19:31:30|
#|2016-02-09 19:31:52|2016-02-09 19:32:00|
#|2016-02-09 19:31:28|2016-02-09 19:31:30|
#+-------------------+-------------------+
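The whole recipe can be sanity-checked against the four example rows in plain Python (a sketch under stated assumptions: the function name is hypothetical, UTC is pinned, and Spark's HALF_UP round() is emulated with floor(x + 0.5), since Python's built-in round() rounds halves to even):

```python
import math
from datetime import datetime, timezone

def round_to_30s(ts_str):
    # Parse, compute the offset to the nearest 30-second mark, add it, format.
    dt = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    offset = int(math.floor(dt.second / 30 + 0.5)) * 30 - dt.second
    rounded = datetime.fromtimestamp(dt.timestamp() + offset, tz=timezone.utc)
    return rounded.strftime("%Y-%m-%d %H:%M:%S")

for old in ["2016-02-09 19:31:02", "2016-02-09 19:31:35",
            "2016-02-09 19:31:52", "2016-02-09 19:31:28"]:
    print(old, "->", round_to_30s(old))
```

This reproduces the old column / desired column pairs from the question, including 19:31:52 rounding up across the minute boundary to 19:32:00.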
