Quickly applying string operations in a pandas DataFrame
Suppose I have a DataFrame with 100k rows and a column name. I would like to split this name into first and last name as efficiently as possible. My current method is:

    import pandas

    def splitName(name):
        return pandas.Series(name.split()[0:2])

    df[['first', 'last']] = df.apply(lambda x: splitName(x['name']), axis=1)

Unfortunately, DataFrame.apply is really, really slow. Is there anything I can do to make this string operation nearly as fast as a numpy operation?
Thanks!
Best answer
Try (requires pandas >= 0.8.1):

    # Split every name on whitespace through the vectorized .str accessor
    splits = df['name'].str.split()
    # Take the first and second token of each row
    df['first'] = splits.str[0]
    df['last'] = splits.str[1]
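As a self-contained sketch of the approach in the answer, the snippet below builds a small DataFrame and splits the name column through the `.str` accessor instead of `DataFrame.apply`. The sample names are made up for illustration, and the code assumes a reasonably recent pandas; it is not a benchmark.

```python
import pandas as pd

# Hypothetical sample data; the question describes roughly 100k rows.
df = pd.DataFrame({"name": ["Ada Lovelace", "Alan Mathison Turing", "Grace Hopper"]})

# Split each name on whitespace in one vectorized call over the column.
splits = df["name"].str.split()

# .str[i] picks the i-th token from each row's list of tokens.
df["first"] = splits.str[0]
df["last"] = splits.str[1]

print(df)
```

Like the code in the question, this keeps only the first two tokens, so a middle name ends up in the last column. If that is not what you want, `splits.str[-1]` would take the final token instead.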