I have a pandas DataFrame, and I am using the fuzzywuzzy package in Python to match the first column of the DataFrame against the second column.
I have defined a function that should create an output with the first column, the second column, and the partial ratio score, but it is not working.
Could you please help?
import csv
import sys
import os
import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

def match(driver):
    driver["score"]=driver.apply(lambda row: fuzz.partial_ratio(row driver[driver.columns[0]], driver[driver.columns[1]]), axis=1)
    print(driver)
    return(driver)

Thanks,
-Abacus
Recommended answer
Inside the apply function you're passed a Series to work with, which here represents the current row. In your code you're effectively ignoring this Series and instead calling partial_ratio with two entire columns of the DataFrame (driver[col]) on every call.
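A small pandas-only sketch of that difference, using a toy two-column frame (the column names `one`/`two` and the values are illustrative assumptions). It shows that with `axis=1` each call receives a Series, that reading from that Series gives the current row's values, and that indexing `driver` inside the lambda returns the whole column on every call.

```python
import pandas as pd

# Toy two-column frame shaped like the question's data (values are illustrative).
driver = pd.DataFrame({"one": ["fuzz", "wuzz"], "two": ["fizz", "woo"]})

# With axis=1, apply calls the function once per row and passes that row as a Series.
row_types = driver.apply(lambda row: type(row).__name__, axis=1)

# Reading from the row Series gives the values for the current row only.
# iloc selects by position, mirroring driver.columns[0] / driver.columns[1].
pairs = driver.apply(lambda row: row.iloc[0] + "/" + row.iloc[1], axis=1)

# Indexing `driver` inside the lambda ignores `row`: every call sees the whole column.
whole_col_len = driver.apply(lambda row: len(driver[driver.columns[0]]), axis=1)

print(row_types.tolist())      # ['Series', 'Series']
print(pairs.tolist())          # ['fuzz/fizz', 'wuzz/woo']
print(whole_col_len.tolist())  # [2, 2]
```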
A minor change to your code should hopefully give you what you want.
from pandas import DataFrame
from fuzzywuzzy import fuzz

d = DataFrame({'one': ['fuzz', 'wuzz'], 'two': ['fizz', 'woo']})
d.apply(lambda s: fuzz.partial_ratio(s['one'], s['two']), axis=1)

0    75
1    33
dtype: int64
(Interestingly, the partial_ratio function will accept a Series as input, but only because it converts it internally into a string. :)