Creating a formatted DataFrame and then adding data line by line

I have a continuous stream of data coming in, so I want to define the DataFrame beforehand so that I don't have to tell pandas to format the data or set the index later.

So I want to create a DataFrame like

```python
df = pd.DataFrame(columns=["timestamp","stockname","price","volume"])
```

but I want to tell pandas that the index should be the timestamp and that its format is

"%Y-%m-%d %H:%M:%S:%f"

Once this is set, I would read through the file and pass the data to the initialized DataFrame.
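As far as I know, a pandas datetime index does not store a string format; the format only matters when a string is parsed. A minimal sketch under that reading: create the empty frame with a `DatetimeIndex` named `timestamp` up front, and apply the format with `pd.to_datetime` to each incoming timestamp string. The sample timestamp string below is made up for illustration.

```python
import pandas as pd

# Format from the question: note the colon (not a dot) before the microseconds.
TS_FORMAT = "%Y-%m-%d %H:%M:%S:%f"

# Empty DataFrame whose index is already a DatetimeIndex named "timestamp",
# so no set_index() call is needed later.
df = pd.DataFrame(
    columns=["stockname", "price", "volume"],
    index=pd.DatetimeIndex([], name="timestamp"),
)

# The format is applied when each incoming timestamp string is parsed.
ts = pd.to_datetime("2017-06-01 09:30:00:123456", format=TS_FORMAT)
print(type(df.index).__name__, ts)
```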

In each iteration of a loop, I get the data in variables like these:

```python
for line in filehandle:
    timestamp, stockname, price, volume = fetch(line)
    # here I want to update "df"
```
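The loop above can be filled in as follows. This is a sketch under assumptions: `fetch` here is a hypothetical parser that assumes comma-separated lines of the form `timestamp,stockname,price,volume`, and `io.StringIO` stands in for the real file handle. Each `df.loc[timestamp] = [...]` with a new label appends a row ("setting with enlargement").

```python
import io

import pandas as pd

TS_FORMAT = "%Y-%m-%d %H:%M:%S:%f"


def fetch(line):
    """Hypothetical parser: assumes 'timestamp,stockname,price,volume' per line."""
    raw_ts, stockname, price, volume = line.strip().split(",")
    return pd.to_datetime(raw_ts, format=TS_FORMAT), stockname, float(price), int(volume)


df = pd.DataFrame(
    columns=["stockname", "price", "volume"],
    index=pd.DatetimeIndex([], name="timestamp"),
)

# io.StringIO stands in for the real file handle in this sketch.
filehandle = io.StringIO(
    "2017-06-01 09:30:00:000000,AAPL,153.1,100\n"
    "2017-06-01 09:30:00:500000,AAPL,153.2,200\n"
    "2017-06-01 09:30:01:250000,AAPL,153.0,50\n"
)

for line in filehandle:
    timestamp, stockname, price, volume = fetch(line)
    # A new index label appends a new row to df.
    df.loc[timestamp] = [stockname, price, volume]

print(df)
```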

This update would go on while I keep using a copy of `df`, let us say `tempdf`, to do resampling or any other task at any given point in time, because the original DataFrame `df` is being updated continuously.
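A minimal sketch of the copy-then-resample idea: `df.copy()` takes an independent snapshot, so rows added to `df` afterwards do not show up in `tempdf`. The prices, volumes, and the 1-second "last price / total volume" aggregation are illustrative choices, not something the question specifies.

```python
import pandas as pd

# Small stand-in for the continuously updated df (made-up values).
df = pd.DataFrame(
    {
        "stockname": ["AAPL"] * 4,
        "price": [153.1, 153.2, 153.0, 153.4],
        "volume": [100, 200, 50, 75],
    },
    index=pd.DatetimeIndex(
        [
            "2017-06-01 09:30:00.000",
            "2017-06-01 09:30:00.500",
            "2017-06-01 09:30:01.250",
            "2017-06-01 09:30:01.750",
        ],
        name="timestamp",
    ),
)

# Independent snapshot; later rows added to df do not appear in tempdf.
tempdf = df.copy()

# Resample the snapshot into 1-second bars: last price and total volume per second.
bars = tempdf.resample("1s").agg({"price": "last", "volume": "sum"})

# Keep updating the original; the snapshot is unaffected.
df.loc[pd.Timestamp("2017-06-01 09:30:02")] = ["AAPL", 153.5, 10]

print(bars)
print(len(df), len(tempdf))
```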

Best answer

```python
import datetime as dt
import time

import numpy as np
import pandas as pd

# Create an empty df whose index is a DatetimeIndex named "timestamp"
df = pd.DataFrame(
    columns=["stockname", "price", "volume"],
    index=pd.DatetimeIndex([], name="timestamp"),
)

for _ in range(10):  # for the purposes of functioning demo code
    time.sleep(0.01)  # give the Jupyter notebook a moment
    timestamp = dt.datetime.now()  # to be used as the index
    # If timestamps arrive as strings, parse them instead with
    # pd.to_datetime(raw_ts, format="%Y-%m-%d %H:%M:%S:%f")
    # Replace the random values below with your database read
    df.loc[timestamp] = ["AAPL", np.random.randint(1000), np.random.randint(10)]

tempdf = df.copy()
```

If you are reading a file or database continuously, you can replace the `for` loop with what you described in your question. @MattR's questions should also be addressed; if you need to continuously log or update data, I am not sure pandas is the best solution.
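One reason for that doubt: each `df.loc[new_label] = ...` enlargement copies the existing data, so appending row by row gets slower as the frame grows. A commonly used alternative, swapped in here and not taken from the answer, is to collect rows as plain dicts in a list and build a DataFrame only when a snapshot is needed. `record` and `snapshot` are hypothetical helper names for illustration.

```python
import datetime as dt

import pandas as pd

# Swapped-in technique: appending dicts to a list is cheap;
# a DataFrame is built only when a snapshot is requested.
rows = []


def record(timestamp, stockname, price, volume):
    """Hypothetical helper: call once per incoming line."""
    rows.append(
        {"timestamp": timestamp, "stockname": stockname, "price": price, "volume": volume}
    )


def snapshot():
    """Build a fresh DataFrame, indexed by timestamp, from everything recorded so far."""
    frame = pd.DataFrame(rows, columns=["timestamp", "stockname", "price", "volume"])
    return frame.set_index("timestamp")


start = dt.datetime(2017, 6, 1, 9, 30)
for i in range(5):
    record(start + dt.timedelta(milliseconds=250 * i), "AAPL", 153.0 + i / 10, 100)

tempdf = snapshot()  # plays the role of the copy used for resampling
print(tempdf.dtypes)
```

Because `snapshot()` builds a new DataFrame each time, `tempdf` is already independent of later `record` calls, so no separate `.copy()` is needed.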
