I am using the R programming language. I used the "rpart" library and fit a decision tree using some data:
    # from a previous question: stackoverflow/questions/65678552/r-changing-plot-sizes
    library(rpart)
    car.test.frame$Reliability = as.factor(car.test.frame$Reliability)
    z.auto <- rpart(Reliability ~ ., car.test.frame)
    plot(z.auto)
    text(z.auto, use.n=TRUE, xpd=TRUE, cex=.8)

This is good, but I am looking for an easier way to summarize the results of this tree in case the tree becomes too big, complicated and cluttered (and impossible to visualize). I found another Stack Overflow post that shows how to obtain a listing of rules: "Extracting Information from the Decision Rules in rpart package".
    library(party)
    library(partykit)
    party_obj <- as.party.rpart(z.auto, data = TRUE)
    decisions <- partykit:::.list.rules.party(party_obj)
    cat(paste(decisions, collapse = "\n"))

This returns the following list of rules (each line is a rule corresponding to the plot of "z.auto"):
    Country %in% c("NA", "Germany", "Korea", "Mexico", "Sweden", "USA") & Weight >= 3167.5
    Country %in% c("NA", "Germany", "Korea", "Mexico", "Sweden", "USA") & Weight < 3167.5
    Country %in% c("NA", "Japan", "Japan/USA")

However, from this list, it is not possible to know which rule results in which value of "Reliability". For the time being, I am interpreting the tree by hand and tracing each rule to its result, but is there a way to add "the corresponding value of reliability" to each line?
e.g. Is it possible to produce something like this?
    Country %in% c("NA", "Germany", "Korea", "Mexico", "Sweden", "USA") & Weight >= 3167.5 then reliability = 3,7,4,0

Note 1: I am also not sure why the countries are appearing as "befgh" instead of their actual names.

Note 2: I am aware that there is a library "rpart.plot" that has a simpler way of obtaining these rules. However, I am using a computer that has neither internet access nor a USB port, so I cannot download the rpart.plot library. I have R with a few preloaded packages. I am trying to obtain the decision rules using libraries such as rpart, dplyr, purrr, party and partykit, plus functions from base R.
Thanks
Solution

This isn't my area of expertise, but perhaps this function (from www.togaware/datamining/survivor/Convert_Tree.html) will do what you want:
    library(rpart)
    car.test.frame$Reliability = as.factor(car.test.frame$Reliability)
    z.auto <- rpart(Reliability ~ ., car.test.frame)
    plot(z.auto, margin = 0.25)
    text(z.auto, pretty = TRUE, cex = 0.8, splits = TRUE, use.n = TRUE, all = FALSE)

    list.rules.rpart <- function(model)
    {
      if (!inherits(model, "rpart")) stop("Not a legitimate rpart tree")
      #
      # Get some information.
      #
      frm     <- model$frame
      names   <- row.names(frm)
      ylevels <- attr(model, "ylevels")
      ds.size <- model$frame[1,]$n
      #
      # Print each leaf node as a rule.
      #
      for (i in 1:nrow(frm))
      {
        if (frm[i,1] == "<leaf>")
        {
          # The following [,5] is hardwired - needs work!
          cat("\n")
          cat(sprintf(" Rule number: %s ", names[i]))
          cat(sprintf("[yval=%s cover=%d (%.0f%%) prob=%0.2f]\n",
                      ylevels[frm[i,]$yval], frm[i,]$n,
                      round(100*frm[i,]$n/ds.size), frm[i,]$yval2[,5]))
          pth <- path.rpart(model, nodes=as.numeric(names[i]), print.it=FALSE)
          cat(sprintf("   %s\n", unlist(pth)[-1]), sep="")
        }
      }
    }

    list.rules.rpart(z.auto)

Output:

     Rule number: 4 [yval=3 cover=10 (20%) prob=0.00]
       Country=Germany,Korea,Mexico,Sweden,USA
       Weight>=3168

     Rule number: 5 [yval=2 cover=18 (37%) prob=4.00]
       Country=Germany,Korea,Mexico,Sweden,USA
       Weight< 3168

     Rule number: 3 [yval=5 cover=21 (43%) prob=2.00]
       Country=Japan,Japan/USA
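If you would rather have one line per rule with the class counts attached, closer to the format asked for in the question, here is a minimal sketch that uses only rpart. `leaf_rules` is a hypothetical helper name of my own, not something rpart provides. It relies on the layout of `frame$yval2` described in the rpart object documentation for classification trees (fitted class, then one count per class, then one probability per class, then the node probability), so it does not hardwire a column index. With that layout and a five-level response, `yval2[,5]` in the function above would be the count of the fourth class rather than a probability, which is probably why the `prob` values look odd. I am also only guessing that the "3,7,4,0" in the question is meant to be per-class counts.

```r
library(rpart)

# Same model as in the question.
car.test.frame$Reliability <- as.factor(car.test.frame$Reliability)
z.auto <- rpart(Reliability ~ ., car.test.frame)

# Hypothetical helper (not part of rpart): returns one string per leaf with
# the split path, the predicted class, and the per-class counts in that leaf.
leaf_rules <- function(model) {
  stopifnot(inherits(model, "rpart"), identical(model$method, "class"))
  frm      <- model$frame
  ylevels  <- attr(model, "ylevels")
  k        <- length(ylevels)
  leaves   <- which(frm$var == "<leaf>")
  node_ids <- row.names(frm)[leaves]

  # Assumed yval2 layout for classification trees: fitted class, k class
  # counts, k class probabilities, node probability. Columns 2..(k + 1)
  # are therefore the counts.
  counts <- frm$yval2[leaves, 1 + seq_len(k), drop = FALSE]
  paths  <- path.rpart(model, nodes = as.numeric(node_ids), print.it = FALSE)

  vapply(seq_along(leaves), function(i) {
    # Each path starts with "root"; drop it and join the remaining splits.
    cond <- paste(paths[[node_ids[i]]][-1], collapse = " & ")
    sprintf("%s then predicted = %s, counts(%s) = %s",
            cond,
            ylevels[frm$yval[leaves[i]]],
            paste(ylevels, collapse = ","),
            paste(counts[i, ], collapse = ","))
  }, character(1))
}

rules <- leaf_rules(z.auto)
writeLines(rules)
```

Since `path.rpart` defaults to `pretty = 0`, the factor levels should come out with their full names rather than single-letter abbreviations. The function returns a character vector, so you can also write it to a file with `writeLines(rules, "rules.txt")` if the tree gets large.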