Grouping messages by time interval

Updated: 2024-10-25 07:22:18

Problem description

I'm currently trying to group outgoing messages into 1-second time intervals. I'm currently calculating the latency like this:

    import csv

    def time_deltas(infile):
        # Split each log line into whitespace-separated fields.
        entries = (line.split() for line in open(infile, "r"))
        ts = {}
        for e in entries:
            if " ".join(e[2:5]) == "T out: [O]":
                # Remember the timestamp of each outgoing message, keyed by its ID.
                ts[e[8]] = e[0]
            elif " ".join(e[2:5]) == "T in: [A]":
                in_ts, ref_id = e[0], e[7]
                out_ts = ts.pop(ref_id, None)
                # (out timestamp, ID without brackets, time difference scaled by 1000)
                yield (float(out_ts), ref_id[1:-1], (float(in_ts)*1000 - float(out_ts)*1000))

    INFILE = 'C:/Users/klee/Documents/test.txt'

    with open('test.csv', 'w') as f:
        csv.writer(f).writerows(time_deltas(INFILE))

However, I want to count the number of "T in: [A]" messages sent out per second, and have been trying to adapt the following to do so:

    import datetime
    import bisect
    import collections

    data = [
        (datetime.datetime(2010, 2, 26, 12, 8, 17), 5594813),
        (datetime.datetime(2010, 2, 26, 12, 7, 31), 5594810),
        (datetime.datetime(2010, 2, 26, 12, 6, 4), 5594807),
    ]
    interval = datetime.timedelta(seconds=50)
    start = datetime.datetime(2010, 2, 26, 12, 6, 4)
    # Bin edges every 50 seconds, beginning at the start time.
    grid = [start + n*interval for n in range(10)]
    bins = collections.defaultdict(list)
    for date, num in data:
        idx = bisect.bisect(grid, date)
        bins[idx].append(num)
    for idx, nums in bins.items():
        print('{0} --- {1}'.format(grid[idx], len(nums)))

Found here: Python: group results by time intervals

(I realize the units would be off for what I want, but I'm just looking into the general idea...)

I've been mostly unsuccessful so far and would appreciate any help. Thanks!

Also, the data appears as:

082438.577652 - T in: [A] accepted. ordID [F25Q6] timestamp [082438.575880] RefNumber [6018786] State [L]

Thanks again! I really appreciate it. :D

Recommended answer

Assuming you want to group your data into 1-second intervals aligned on whole seconds, we can make use of two facts: your data is ordered, and int(out_ts) truncates the timestamp to the second, which gives us a grouping key.

The simplest way to do the grouping is to use itertools.groupby:

    from itertools import groupby

    data = time_deltas(INFILE)
    get_key = lambda x: int(x[0])  # function to get group key from data
    bins = [(k, list(g)) for k, g in groupby(data, get_key)]

bins will be a list of tuples where the first value in each tuple is the key (an integer, e.g. 82438 for timestamps of the form 082438.*) and the second value is the list of data entries that were issued during that second.
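As a rough illustration of that structure, here is a minimal sketch run on a few made-up tuples shaped like the (out timestamp, ID, latency) entries from the question. The sample values and the names `sample` and `counts` are assumptions for this example only.

```python
from itertools import groupby

# Hypothetical sample in the shape (out timestamp, message ID, latency).
# The values are invented for illustration.
sample = [
    (82438.10, "A1", 1.5),
    (82438.90, "A2", 2.0),
    (82439.20, "A3", 1.0),
    (82441.00, "A4", 3.2),
]

def get_key(entry):
    """Truncate the timestamp to the whole second."""
    return int(entry[0])

# groupby only merges consecutive items with equal keys, so the input
# must already be ordered by timestamp.
bins = [(k, list(g)) for k, g in groupby(sample, get_key)]
counts = [(sec, len(entries)) for sec, entries in bins]
print(counts)  # [(82438, 2), (82439, 1), (82441, 1)]
```

Note that a second with no messages (82440 here) simply does not appear in the result; if zero counts matter, they would need to be filled in separately.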

Example usage:

    # print out the number of messages for each second
    for sec, data in bins:
        print('{0} --- {1}'.format(sec, len(data)))

    # write (sec, msg_per_sec) out to CSV file
    import csv
    with open("test.csv", "w") as f:
        csv.writer(f).writerows((s, len(d)) for s, d in bins)

    # get average message per second
    message_counts = [len(d) for s, d in bins]
    avg_msg_per_second = float(sum(message_counts)) / len(message_counts)

P.S. In this example, a list was used for bins so that the order of the data is maintained. If you need random access to the data, consider using an OrderedDict instead.
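One possible way to do that, sketched on made-up sample data (the `sample` values are assumptions, not taken from the question's log):

```python
from collections import OrderedDict
from itertools import groupby

# Hypothetical entries, already ordered by timestamp.
sample = [
    (82438.10, "A1", 1.5),
    (82438.90, "A2", 2.0),
    (82439.20, "A3", 1.0),
]

# Map each whole second to the list of entries issued during it.
bins = OrderedDict(
    (sec, list(group)) for sec, group in groupby(sample, lambda e: int(e[0]))
)

print(len(bins[82438]))  # 2 (direct lookup by second)
print(list(bins))        # [82438, 82439] (keys stay in insertion order)
```

This relies on the data being ordered: if the same second showed up in two separate runs, the later group would overwrite the earlier one. On Python 3.7 and later, a plain dict also preserves insertion order.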

Note that it is relatively straightforward to adapt the solution to group by multiples of seconds. For example, to group messages per minute (60 seconds), change the get_key function to:

    get_key = lambda x: int(x[0] / 60)  # truncate timestamp to the minute
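One caveat: dividing by 60 only lines up with clock minutes if the timestamp is a plain count of seconds. If the timestamps are in HHMMSS.ffffff form, as the sample log line suggests (this is an assumption about the format), it may be safer to convert to seconds since midnight first. A minimal sketch, using the hypothetical helpers `to_seconds` and `make_key`:

```python
def to_seconds(stamp):
    """Convert an HHMMSS.ffffff value (assumed format) to seconds since midnight."""
    whole = int(stamp)
    hours, rest = divmod(whole, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds + (stamp - whole)

def make_key(interval):
    """Return a key function that buckets entries into `interval`-second bins."""
    return lambda entry: int(to_seconds(entry[0]) // interval)

per_minute = make_key(60)

# Hypothetical entry with a timestamp of 08:24:38.577652
print(per_minute((82438.577652, "F25Q6", 1.772)))  # 504, i.e. 8*60 + 24
```

The resulting key function can be passed to groupby in place of get_key, and other interval lengths (for example 5 or 30 seconds) work the same way.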


Published: 2023-10-13 09:45:25

Article link: https://www.elefans.com/category/jswz/34/1487604.html