I've got some regression results from running statsmodels.formula.api.ols. Here's a toy example:
    import pandas as pd
    import numpy as np
    import statsmodels.formula.api as smf

    example_df = pd.DataFrame(np.random.randn(10, 3))
    example_df.columns = ["a", "b", "c"]
    fit = smf.ols('a ~ b', example_df).fit()

I'd like to apply the model to column c, but a naive attempt to do so doesn't work:
    fit.predict(example_df["c"])

Here's the exception I get:
    PatsyError: Error evaluating factor: NameError: name 'b' is not defined
        a ~ b
            ^

I can do something gross and create a new, temporary DataFrame in which I rename the column of interest:
    example_df2 = pd.DataFrame(example_df["c"])
    example_df2.columns = ["b"]
    fit.predict(example_df2)

Is there a cleaner way to do this? (Short of switching to statsmodels.api instead of statsmodels.formula.api.)
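For comparison, the renaming workaround itself can be written as a single expression with DataFrame.rename, which returns a new frame and leaves example_df untouched. This is a minimal sketch on the toy data above; the fixed random seed is an addition so the run is repeatable, and the approach is still the same rename trick, only without the named temporary.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data as in the question; the seed is an addition for reproducibility.
np.random.seed(0)
example_df = pd.DataFrame(np.random.randn(10, 3), columns=["a", "b", "c"])
fit = smf.ols("a ~ b", example_df).fit()

# Select column c as a one-column DataFrame and rename it to "b",
# the regressor name the formula expects. rename() returns a copy,
# so example_df itself keeps its original column names.
renamed = example_df[["c"]].rename(columns={"c": "b"})
pred_renamed = fit.predict(renamed)

# The two-step version from the question, for comparison.
example_df2 = pd.DataFrame(example_df["c"])
example_df2.columns = ["b"]
pred_original = fit.predict(example_df2)

print(np.allclose(pred_renamed, pred_original))
```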
Best answer
You can use a dictionary:
    >>> fit.predict({"b": example_df["c"]})
    array([ 0.84770672, -0.35968269,  1.19592387, -0.77487812, -0.98805215,
            0.90584753, -0.15258093,  1.53721494, -0.26973941,  1.23996892])

or create a NumPy array for the prediction, although that is much more complicated if there are categorical explanatory variables:
    >>> import statsmodels.api as sm  # needed for add_constant
    >>> fit.predict(sm.add_constant(example_df["c"].values), transform=False)
    array([ 0.84770672, -0.35968269,  1.19592387, -0.77487812, -0.98805215,
            0.90584753, -0.15258093,  1.53721494, -0.26973941,  1.23996892])