Using predict() on statsmodels.formula data with different column names using Python and Pandas

Updated: 2024-10-26 04:23:17

I've got some regression results from running statsmodels.formula.api.ols. Here's a toy example:

    import pandas as pd
    import numpy as np
    import statsmodels.formula.api as smf

    # Ten rows of random data in three columns
    example_df = pd.DataFrame(np.random.randn(10, 3))
    example_df.columns = ["a", "b", "c"]
    # Regress column a on column b
    fit = smf.ols('a ~ b', example_df).fit()

I'd like to apply the model to column c, but a naive attempt to do so doesn't work:

    fit.predict(example_df["c"])

Here's the exception I get:

    PatsyError: Error evaluating factor: NameError: name 'b' is not defined
        a ~ b
            ^

I can do something gross and create a new, temporary DataFrame in which I rename the column of interest:

    example_df2 = pd.DataFrame(example_df["c"])
    example_df2.columns = ["b"]
    fit.predict(example_df2)

Is there a cleaner way to do this? (short of switching to statsmodels.api instead of statsmodels.formula.api)

Best answer

You can use a dictionary:

    >>> fit.predict({"b": example_df["c"]})
    array([ 0.84770672, -0.35968269,  1.19592387, -0.77487812, -0.98805215,
            0.90584753, -0.15258093,  1.53721494, -0.26973941,  1.23996892])
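
The dictionary works because the formula interface looks up each variable by name in whatever data container predict() receives, so any mapping from "b" to the new values will do. Here is a minimal sketch of that idea, assuming a reasonably recent statsmodels release; the fixed seed and the names `renamed` and `by_hand` are illustrative additions, not part of the original post:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Same toy setup as in the question; the seed is only here for repeatability.
rng = np.random.default_rng(0)
example_df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["a", "b", "c"])
fit = smf.ols("a ~ b", example_df).fit()

# Option 1: a dict that maps the formula's variable name to the new values.
pred_dict = np.asarray(fit.predict({"b": example_df["c"]}))

# Option 2: rename on the fly instead of keeping a named temporary DataFrame.
renamed = example_df[["c"]].rename(columns={"c": "b"})
pred_renamed = np.asarray(fit.predict(renamed))

# Sanity check: evaluate the fitted line by hand on column c.
by_hand = fit.params["Intercept"] + fit.params["b"] * example_df["c"].to_numpy()

print(np.allclose(pred_dict, pred_renamed), np.allclose(pred_dict, by_hand))
```

The results are wrapped in np.asarray because, depending on the statsmodels version, predict() may return either a NumPy array or a pandas Series.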

Alternatively, you can create a NumPy array for the prediction, although that is much more complicated if there are categorical explanatory variables:

    >>> import statsmodels.api as sm
    >>> fit.predict(sm.add_constant(example_df["c"].values), transform=False)
    array([ 0.84770672, -0.35968269,  1.19592387, -0.77487812, -0.98805215,
            0.90584753, -0.15258093,  1.53721494, -0.26973941,  1.23996892])
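
To see where the extra complexity with categorical variables comes from, here is a rough sketch that fits a model with one categorical predictor and then predicts both ways. The column `g`, its levels "x" and "y", and the seed are hypothetical additions for illustration, and the sketch assumes the default treatment (dummy) coding with a single non-reference level:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["a", "b", "c"])
# Hypothetical two-level categorical column, added only for this example.
df["g"] = ["x", "y"] * 5
fit = smf.ols("a ~ b + g", df).fit()

# Formula route: still a single dict; the categorical encoding is handled for us.
pred_formula = np.asarray(fit.predict({"b": df["c"], "g": df["g"]}))

# Array route: each design-matrix column must be rebuilt by hand,
# in the same order as the fitted parameters.
columns = []
for name in fit.params.index:
    if name == "Intercept":
        columns.append(np.ones(len(df)))
    elif name == "b":
        columns.append(df["c"].to_numpy())
    else:
        # Assumed to be the dummy column for the non-reference level "y".
        columns.append((df["g"] == "y").to_numpy(dtype=float))
exog = np.column_stack(columns)
pred_array = np.asarray(fit.predict(exog, transform=False))

print(np.allclose(pred_formula, pred_array))
```

With more levels or interaction terms, the hand-built matrix needs one column per encoded term, which is why the dictionary form is usually the more convenient choice.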


Published: 2023-08-07 23:55:00

Article link: https://www.elefans.com/category/jswz/34/1466392.html