Using predict() on statsmodels.formula data with different column names using Python and Pandas

Updated: 2024-10-26 04:23:17

I have some regression results from running statsmodels.formula.api.ols. Here's a toy example:

    import pandas as pd
    import numpy as np
    import statsmodels.formula.api as smf

    example_df = pd.DataFrame(np.random.randn(10, 3))
    example_df.columns = ["a", "b", "c"]
    fit = smf.ols('a ~ b', example_df).fit()

I'd like to apply the model to column c, but a naive attempt to do so doesn't work:

    fit.predict(example_df["c"])

Here's the exception I get:

    PatsyError: Error evaluating factor: NameError: name 'b' is not defined
        a ~ b
            ^

I can do something gross and create a new, temporary DataFrame in which I rename the column of interest:

    example_df2 = pd.DataFrame(example_df["c"])
    example_df2.columns = ["b"]
    fit.predict(example_df2)

Is there a cleaner way to do this? (short of switching to statsmodels.api instead of statsmodels.formula.api)

Best answer

You can use a dictionary:

    >>> fit.predict({"b": example_df["c"]})
    array([ 0.84770672, -0.35968269,  1.19592387, -0.77487812, -0.98805215,
            0.90584753, -0.15258093,  1.53721494, -0.26973941,  1.23996892])
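
As a rough, self-contained check of the dictionary approach, the sketch below compares it with renaming the column and with computing intercept + slope * c by hand from fit.params. The seeded generator and the names pred_dict, pred_renamed, and manual are illustrative assumptions, not part of the original answer; depending on the statsmodels version, predict() may return a pandas Series rather than a NumPy array.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Seeded random data so the run is repeatable (an assumption for this sketch).
rng = np.random.default_rng(0)
example_df = pd.DataFrame(rng.standard_normal((10, 3)), columns=["a", "b", "c"])
fit = smf.ols("a ~ b", example_df).fit()

# The formula looks up the name "b", so supply column c under that name.
pred_dict = fit.predict({"b": example_df["c"]})

# Equivalent: rename the column on a one-column DataFrame.
pred_renamed = fit.predict(example_df[["c"]].rename(columns={"c": "b"}))

# Manual check: intercept + slope * c, using the fitted parameters.
manual = fit.params["Intercept"] + fit.params["b"] * example_df["c"]

print(np.allclose(pred_dict, pred_renamed), np.allclose(pred_dict, manual))
```

The dictionary form avoids building a temporary DataFrame, while rename() may read more naturally when several columns need to be remapped at once.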

Or build a NumPy array for the prediction and pass transform=False so that the formula is not applied to it. This needs statsmodels.api imported as sm, and it is much more complicated if there are categorical explanatory variables:

    >>> import statsmodels.api as sm
    >>> fit.predict(sm.add_constant(example_df["c"].values), transform=False)
    array([ 0.84770672, -0.35968269,  1.19592387, -0.77487812, -0.98805215,
            0.90584753, -0.15258093,  1.53721494, -0.26973941,  1.23996892])

Published: 2023-08-07 23:55:00
Source: https://www.elefans.com/category/jswz/34/1466392.html