Classification - Ensemble (Bagging, Boosting, Random Forest) / ROCR
Reference 1: Ensemble techniques
Reference 2: ROCR, Lift chart
Ensemble
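A decision tree is unstable: a small change in the data can change the fitted prediction model substantially.
Ensemble methods build several prediction models from the given data and combine them into a single final prediction model.
Reference 1 - Bagging and Boosting
Reference 2 - random forest
Reference 3 - random forest and gradient boosting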
1. Decision Tree - Ensemble - Bagging
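Generate several bootstrap samples from the given data --> fit a prediction model to each sample --> combine the models into the final model
In practice the population distribution behind the training data is unknown, so the true average prediction model cannot be computed.
Bagging treats the training data as the population and averages models fitted to resamples of it, which reduces variance and can improve predictive accuracy.
It generally helps most with models that tend to overfit (low bias, high variance).
bootstrap : samples of the same size as the raw data, drawn from it at random with replacement
voting : combine the outputs of several models and take the majority class as the final result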
library(party) # ctree()
library(caret) # confusionMatrix()
# Generate bootstrap data
data_boot1 <- iris[sample(1:nrow(iris), replace = T), ]
data_boot2 <- iris[sample(1:nrow(iris), replace = T), ]
data_boot3 <- iris[sample(1:nrow(iris), replace = T), ]
data_boot4 <- iris[sample(1:nrow(iris), replace = T), ]
data_boot5 <- iris[sample(1:nrow(iris), replace = T), ]
# Modeling
tree1 <- ctree(Species ~ ., data_boot1)
tree2 <- ctree(Species ~ ., data_boot2)
tree3 <- ctree(Species ~ ., data_boot3)
tree4 <- ctree(Species ~ ., data_boot4)
tree5 <- ctree(Species ~ ., data_boot5)
plot(tree1)
plot(tree5)
pred1 <- predict(tree1, iris)
pred2 <- predict(tree2, iris)
pred3 <- predict(tree3, iris)
pred4 <- predict(tree4, iris)
pred5 <- predict(tree5, iris)
# Collect the predictions of each model
test <- data.frame(Species = iris$Species, pred1, pred2, pred3, pred4, pred5)
head(test)
## Species pred1 pred2 pred3 pred4 pred5
## 1 setosa setosa setosa setosa setosa setosa
## 2 setosa setosa setosa setosa setosa setosa
## 3 setosa setosa setosa setosa setosa setosa
## 4 setosa setosa setosa setosa setosa setosa
## 5 setosa setosa setosa setosa setosa setosa
## 6 setosa setosa setosa setosa setosa setosa
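# Combine the results of the 5 classifiers and decide the final result by voting
funcResultValue <- function(x) {
  result <- NULL
  for (i in 1:nrow(x)) {
    xtab <- table(t(x[i, ]))
    rvalue <- names(sort(xtab, decreasing = T)[1])
    result <- c(result, rvalue)
  }
  return(result)
}
test$result <- funcResultValue(test[ , 2:6])
# Recent caret versions require both arguments to be factors with the same levels
confusionMatrix(factor(test$result, levels = levels(test$Species)), test$Species)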
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 47 1
## virginica 0 3 49
##
## Overall Statistics
##
## Accuracy : 0.9733
## 95% CI : (0.9331, 0.9927)
## No Information Rate : 0.3333
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.96
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.0000 0.9400 0.9800
## Specificity 1.0000 0.9900 0.9700
## Pos Pred Value 1.0000 0.9792 0.9423
## Neg Pred Value 1.0000 0.9706 0.9898
## Prevalence 0.3333 0.3333 0.3333
## Detection Rate 0.3333 0.3133 0.3267
## Detection Prevalence 0.3333 0.3200 0.3467
## Balanced Accuracy 1.0000 0.9650 0.9750
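The five hand-written bootstrap, fit and predict steps above generalize to any number of models. The following is a minimal sketch rather than the original method: it assumes `rpart` trees in place of `ctree`, and `n_models`, the seed and the helper `majority_vote` are hypothetical names chosen for illustration.

```r
library(rpart)  # assumption: rpart trees stand in for party::ctree

set.seed(42)    # hypothetical seed, only for reproducibility
n_models <- 25  # hypothetical number of bootstrap models

# Fit one tree per bootstrap sample (rows drawn with replacement, same size as iris)
trees <- lapply(seq_len(n_models), function(b) {
  boot_idx <- sample(nrow(iris), replace = TRUE)
  rpart(Species ~ ., data = iris[boot_idx, ], method = "class")
})

# One column of predicted labels per tree
votes <- sapply(trees, function(fit) as.character(predict(fit, iris, type = "class")))

# Majority vote per row; ties go to the first label in alphabetical order
majority_vote <- function(row_votes) names(which.max(table(row_votes)))
bagged <- factor(apply(votes, 1, majority_vote), levels = levels(iris$Species))

table(predicted = bagged, actual = iris$Species)
mean(bagged == iris$Species)  # accuracy on the training data, so optimistic
```

As in the example above, the accuracy is measured on the same rows the bootstrap samples were drawn from, so it overstates how well the ensemble would do on new data.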
2. Decision Tree - Ensemble - Boosting
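Boosting combines models with weak predictive power into a strong prediction model. It can reduce the training error quickly and easily.
The aim is to classify better by giving more weight to misclassified observations.
Drawback : the second and later classifiers are trained on only part of the original data, so the training data needs to be large.
AdaBoost : for binary classification, assigns weights to n classifiers that each do slightly better than random guessing and combines them into the final classifier.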
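To make the weighting idea concrete, here is a minimal sketch of a single round of the standard discrete AdaBoost update. It is not the simplified resampling scheme used in the code that follows; the toy labels and predictions are hypothetical values chosen so the arithmetic is easy to check.

```r
# Toy labels and weak-learner predictions coded as +1 / -1 (hypothetical values)
y    <- c( 1,  1, -1, -1,  1, -1,  1, -1,  1, -1)
pred <- c( 1,  1, -1, -1, -1, -1,  1,  1,  1, -1)

n <- length(y)
w <- rep(1 / n, n)                    # uniform starting weights

miss  <- pred != y                    # rows the weak learner got wrong (2 of 10)
err   <- sum(w[miss])                 # weighted error rate
alpha <- 0.5 * log((1 - err) / err)   # vote weight of this weak learner

# Raise the weight of misclassified rows, lower the rest, then renormalize
w_new <- w * exp(-alpha * y * pred)
w_new <- w_new / sum(w_new)

round(w_new, 4)
```

After the update the two misclassified rows together carry half of the total weight, so the next weak learner is pushed to get them right. The final classifier is the sign of the alpha-weighted sum of the weak learners' predictions.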
library(tree)
data(kyphosis, package = "rpart") # Data on children who had corrective spinal surgery.
data <- kyphosis
head(data)
## Kyphosis Age Number Start
## 1 absent 71 3 5
## 2 absent 158 3 14
## 3 present 128 4 5
## 4 absent 2 5 1
## 5 absent 1 4 15
## 6 absent 1 2 16
totalCount <- nrow(data)
totalCount
## [1] 81
boost <- function(k, compare) {
  # Start with the same sampling probability for every row
  pr <- rep(1/totalCount, totalCount)
  # Holds the predicted probabilities and the accuracy of each model
  result <- matrix(0, k, 3) # k rows, 3 columns
  # Build k tree models
  for (j in 1:k) {
    # Unlike bagging, sample each row index with its assigned probability
    data.boost <- data[sample(1:totalCount, prob = pr, replace = T), ]
    # Fit a tree to the sampled data
    data.tree <- tree(Kyphosis ~ ., data.boost)
    # Holds the prediction for each row
    pred <- matrix(0, totalCount, 1)
    for (i in 1:totalCount) {
      # predict - probability of absent / present
      if (predict(data.tree, data[i, ])[ , 1] > 0.5) {
        pred[i, 1] <- "absent"
      } else {
        pred[i, 1] <- "present"
      }
    }
    # Predicted probabilities for the single test row (compare)
    result[j, 1] <- predict(data.tree, compare)[ , 1]
    result[j, 2] <- predict(data.tree, compare)[ , 2]
    result[j, 3] <- length(which(as.matrix(data)[ , 1] == pred)) / totalCount # accuracy
    pr <- rep(1/totalCount, totalCount)
    # Double the sampling probability of misclassified rows for the next iteration
    pr[as.matrix(data)[ , 1] != pred] <- 2/totalCount
  }
  return(result)
}
# Run 10 iterations, scoring the 80th row each time
boost.result <- boost(10, data[80, ])
boost.result
## [,1] [,2] [,3]
## [1,] 1.0000000 0.0000000 0.8641975
## [2,] 0.5555556 0.4444444 0.8148148
## [3,] 0.1000000 0.9000000 0.7283951
## [4,] 0.1428571 0.8571429 0.8271605
## [5,] 0.2857143 0.7142857 0.8271605
## [6,] 0.0000000 1.0000000 0.8271605
## [7,] 0.0000000 1.0000000 0.8518519
## [8,] 0.6000000 0.4000000 0.8271605
## [9,] 0.0000000 1.0000000 0.8518519
## [10,] 1.0000000 0.0000000 0.8518519
a <- t(boost.result[,1])%*%(boost.result[,3]) # accuracy-weighted score for absent
b <- t(boost.result[,2])%*%(boost.result[,3]) # accuracy-weighted score for present
a;b
## [,1]
## [1,] 3.092357
## [,1]
## [1,] 5.179248
# b is larger than a, so the 80th row is finally predicted as present
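The two totals are accuracy-weighted sums, not probabilities, so they do not lie between 0 and 1. A small sketch of how they could be rescaled into vote shares follows, using the values printed above; the names `scores` and `shares` are hypothetical.

```r
# Accuracy-weighted scores copied from the printed output above
scores <- c(absent = 3.092357, present = 5.179248)

# Rescale so the two shares sum to 1
shares <- scores / sum(scores)
round(shares, 3)

# The class with the larger share is the final prediction
names(which.max(shares))
```

Because each row of the result matrix has probabilities that sum to 1, the denominator equals the sum of the ten accuracies, and the share for present comes out at roughly 0.63.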
3. Decision Tree - Ensemble - Random Forest
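Random forest injects more randomness than bagging or boosting when building the models, by also restricting each split to a random subset of the input variables, and then linearly combines the models into the final model.
It can be run without removing variables even when there are very many input variables.
The final result is hard to interpret, but the predictive performance is good.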
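The extra randomness comes from how each split is chosen: only a random subset of the predictors is considered as split candidates. The sketch below illustrates that step for iris in base R. `floor(sqrt(p))` is the default `mtry` that `randomForest` uses for classification; the seed and variable names are hypothetical.

```r
predictors <- setdiff(names(iris), "Species")
p <- length(predictors)

# Default number of candidate variables per split for classification
mtry <- max(floor(sqrt(p)), 1)

set.seed(1)  # hypothetical seed, only for reproducibility
# Candidate variables for one split; a fresh subset is drawn at every split
candidates <- sample(predictors, mtry)
candidates
```

With four predictors this gives `mtry = 2`, which matches the "No. of variables tried at each split: 2" line in the model summary below.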
idx <- sample(2, nrow(iris), replace = T, prob = c(0.7, 0.3))
trainData <- iris[idx == 1, ]
testData <- iris[idx == 2, ]
library(randomForest)
# ntree = 100 : build 100 trees.
# proximity = T : also compute the proximity matrix between observations
model <- randomForest(Species ~ ., data = trainData, ntree = 100, proximity = T)
model
##
## Call:
## randomForest(formula = Species ~ ., data = trainData, ntree = 100, proximity = T)
## Type of random forest: classification
## Number of trees: 100
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 3.85%
## Confusion matrix:
## setosa versicolor virginica class.error
## setosa 38 0 0 0.00000000
## versicolor 0 31 1 0.03125000
## virginica 0 3 31 0.08823529
table(trainData$Species, predict(model))
##
## setosa versicolor virginica
## setosa 38 0 0
## versicolor 0 31 1
## virginica 0 3 31
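`predict(model)` without `newdata` returns out-of-bag (OOB) predictions, which is why the table above matches the OOB confusion matrix: each row is predicted only by the trees whose bootstrap sample did not contain it. A minimal base R sketch of how many rows a bootstrap sample leaves out, with a hypothetical sample size and seed:

```r
set.seed(1)   # hypothetical seed, only for reproducibility
n <- 150      # hypothetical number of training rows

# Fraction of rows NOT drawn in each of 1000 bootstrap samples
oob_frac <- replicate(1000, {
  idx <- sample(n, replace = TRUE)
  1 - length(unique(idx)) / n
})

mean(oob_frac)     # simulated out-of-bag fraction
(1 - 1 / n)^n      # theoretical value, close to exp(-1)
```

Roughly a third of the rows are out of bag for any given tree, so the OOB error behaves like a built-in validation estimate without a separate hold-out set.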
importance(model) # Mean decrease in Gini impurity. Variables with higher values contribute most to separating the classes.
## MeanDecreaseGini
## Sepal.Length 5.408025
## Sepal.Width 1.272408
## Petal.Length 30.947899
## Petal.Width 30.721860
plot(model, main = "randomForest model of iris")
# The error stabilizes once there are about 40 or more trees.
varImpPlot(model) # Plot the relative importance of the variables
pred <- predict(model, newdata = testData)
table(testData$Species, pred)
## pred
## setosa versicolor virginica
## setosa 12 0 0
## versicolor 0 16 2
## virginica 0 0 16
plot(margin(model, testData$Species))
ROCR
library(C50)
library(ROCR)
data(churn) # C50 dataset. Customers who switch service providers.
summary(churnTrain)
## state account_length area_code international_plan
## WV : 106 Min. : 1.0 area_code_408: 838 no :3010
## MN : 84 1st Qu.: 74.0 area_code_415:1655 yes: 323
## NY : 83 Median :101.0 area_code_510: 840
## AL : 80 Mean :101.1
## OH : 78 3rd Qu.:127.0
## OR : 78 Max. :243.0
## (Other):2824
## voice_mail_plan number_vmail_messages total_day_minutes total_day_calls
## no :2411 Min. : 0.000 Min. : 0.0 Min. : 0.0
## yes: 922 1st Qu.: 0.000 1st Qu.:143.7 1st Qu.: 87.0
## Median : 0.000 Median :179.4 Median :101.0
## Mean : 8.099 Mean :179.8 Mean :100.4
## 3rd Qu.:20.000 3rd Qu.:216.4 3rd Qu.:114.0
## Max. :51.000 Max. :350.8 Max. :165.0
##
## total_day_charge total_eve_minutes total_eve_calls total_eve_charge
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.00
## 1st Qu.:24.43 1st Qu.:166.6 1st Qu.: 87.0 1st Qu.:14.16
## Median :30.50 Median :201.4 Median :100.0 Median :17.12
## Mean :30.56 Mean :201.0 Mean :100.1 Mean :17.08
## 3rd Qu.:36.79 3rd Qu.:235.3 3rd Qu.:114.0 3rd Qu.:20.00
## Max. :59.64 Max. :363.7 Max. :170.0 Max. :30.91
##
## total_night_minutes total_night_calls total_night_charge
## Min. : 23.2 Min. : 33.0 Min. : 1.040
## 1st Qu.:167.0 1st Qu.: 87.0 1st Qu.: 7.520
## Median :201.2 Median :100.0 Median : 9.050
## Mean :200.9 Mean :100.1 Mean : 9.039
## 3rd Qu.:235.3 3rd Qu.:113.0 3rd Qu.:10.590
## Max. :395.0 Max. :175.0 Max. :17.770
##
## total_intl_minutes total_intl_calls total_intl_charge
## Min. : 0.00 Min. : 0.000 Min. :0.000
## 1st Qu.: 8.50 1st Qu.: 3.000 1st Qu.:2.300
## Median :10.30 Median : 4.000 Median :2.780
## Mean :10.24 Mean : 4.479 Mean :2.765
## 3rd Qu.:12.10 3rd Qu.: 6.000 3rd Qu.:3.270
## Max. :20.00 Max. :20.000 Max. :5.400
##
## number_customer_service_calls churn
## Min. :0.000 yes: 483
## 1st Qu.:1.000 no :2850
## Median :1.000
## Mean :1.563
## 3rd Qu.:2.000
## Max. :9.000
##
# Modeling
c5_options <- C5.0Control(winnow = FALSE, noGlobalPruning = FALSE)
model <- C5.0(churn ~ ., data = churnTrain, control = c5_options, rules = FALSE)
summary(model)
##
## Call:
## C5.0.formula(formula = churn ~ ., data = churnTrain, control =
## c5_options, rules = FALSE)
##
##
## C5.0 [Release 2.07 GPL Edition] Sun Mar 19 20:59:53 2017
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 3333 cases (20 attributes) from undefined.data
##
## Decision tree:
##
## total_day_minutes > 264.4:
## :...voice_mail_plan = yes:
## : :...international_plan = no: no (45/1)
## : : international_plan = yes: yes (8/3)
## : voice_mail_plan = no:
## : :...total_eve_minutes > 187.7:
## : :...total_night_minutes > 126.9: yes (94/1)
## : : total_night_minutes <= 126.9:
## : : :...total_day_minutes <= 277: no (4)
## : : total_day_minutes > 277: yes (3)
## : total_eve_minutes <= 187.7:
## : :...total_eve_charge <= 12.26: no (15/1)
## : total_eve_charge > 12.26:
## : :...total_day_minutes <= 277:
## : :...total_night_minutes <= 224.8: no (13)
## : : total_night_minutes > 224.8: yes (5/1)
## : total_day_minutes > 277:
## : :...total_night_minutes > 151.9: yes (18)
## : total_night_minutes <= 151.9:
## : :...account_length <= 123: no (4)
## : account_length > 123: yes (2)
## total_day_minutes <= 264.4:
## :...number_customer_service_calls > 3:
## :...total_day_minutes <= 160.2:
## : :...total_eve_charge <= 19.83: yes (79/3)
## : : total_eve_charge > 19.83:
## : : :...total_day_minutes <= 120.5: yes (10)
## : : total_day_minutes > 120.5: no (13/3)
## : total_day_minutes > 160.2:
## : :...total_eve_charge > 12.05: no (130/24)
## : total_eve_charge <= 12.05:
## : :...total_eve_calls <= 125: yes (16/2)
## : total_eve_calls > 125: no (3)
## number_customer_service_calls <= 3:
## :...international_plan = yes:
## :...total_intl_calls <= 2: yes (51)
## : total_intl_calls > 2:
## : :...total_intl_minutes <= 13.1: no (173/7)
## : total_intl_minutes > 13.1: yes (43)
## international_plan = no:
## :...total_day_minutes <= 223.2: no (2221/60)
## total_day_minutes > 223.2:
## :...total_eve_charge <= 20.5: no (295/22)
## total_eve_charge > 20.5:
## :...voice_mail_plan = yes: no (20)
## voice_mail_plan = no:
## :...total_night_minutes > 174.2: yes (50/8)
## total_night_minutes <= 174.2:
## :...total_day_minutes <= 246.6: no (12)
## total_day_minutes > 246.6:
## :...total_day_charge <= 43.33: yes (4)
## total_day_charge > 43.33: no (2)
##
##
## Evaluation on training data (3333 cases):
##
## Decision Tree
## ----------------
## Size Errors
##
## 27 136( 4.1%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 365 118 (a): class yes
## 18 2832 (b): class no
##
##
## Attribute usage:
##
## 100.00% total_day_minutes
## 93.67% number_customer_service_calls
## 87.73% international_plan
## 20.73% total_eve_charge
## 8.97% voice_mail_plan
## 8.01% total_intl_calls
## 6.48% total_intl_minutes
## 6.33% total_night_minutes
## 4.74% total_eve_minutes
## 0.57% total_eve_calls
## 0.18% account_length
## 0.18% total_day_charge
##
##
## Time: 0.1 secs
# The tree has too many branches, so drop variables with low attribute usage and refit
drops <- c("total_day_charge", "account_length", "total_eve_calls", "total_day_calls", "total_eve_minutes")
churnTrain2 <- churnTrain[, !(names(churnTrain) %in% drops)]
model2 <- C5.0(churn ~ ., data = churnTrain2, control = c5_options, rules = FALSE)
summary(model2)
##
## Call:
## C5.0.formula(formula = churn ~ ., data = churnTrain2, control
## = c5_options, rules = FALSE)
##
##
## C5.0 [Release 2.07 GPL Edition] Sun Mar 19 20:59:53 2017
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 3333 cases (15 attributes) from undefined.data
##
## Decision tree:
##
## total_day_minutes > 264.4:
## :...voice_mail_plan = yes:
## : :...international_plan = no: no (45/1)
## : : international_plan = yes:
## : : :...total_day_minutes <= 275.4: yes (4)
## : : total_day_minutes > 275.4: no (4/1)
## : voice_mail_plan = no:
## : :...total_eve_charge > 15.95:
## : :...total_night_minutes > 126.9: yes (94/1)
## : : total_night_minutes <= 126.9:
## : : :...total_day_minutes <= 277: no (4)
## : : total_day_minutes > 277: yes (3)
## : total_eve_charge <= 15.95:
## : :...total_eve_charge <= 12.26: no (15/1)
## : total_eve_charge > 12.26:
## : :...total_day_minutes <= 277:
## : :...total_night_minutes <= 224.8: no (13)
## : : total_night_minutes > 224.8: yes (5/1)
## : total_day_minutes > 277:
## : :...total_night_minutes <= 151.9: no (6/2)
## : total_night_minutes > 151.9: yes (18)
## total_day_minutes <= 264.4:
## :...number_customer_service_calls > 3:
## :...total_day_minutes > 160.2:
## : :...total_eve_charge <= 12.05: yes (19/5)
## : : total_eve_charge > 12.05: no (130/24)
## : total_day_minutes <= 160.2:
## : :...total_eve_charge <= 19.83: yes (79/3)
## : total_eve_charge > 19.83:
## : :...total_day_minutes <= 120.5: yes (10)
## : total_day_minutes > 120.5: no (13/3)
## number_customer_service_calls <= 3:
## :...international_plan = yes:
## :...total_intl_calls <= 2: yes (51)
## : total_intl_calls > 2:
## : :...total_intl_minutes <= 13.1: no (173/7)
## : total_intl_minutes > 13.1: yes (43)
## international_plan = no:
## :...total_day_minutes <= 223.2: no (2221/60)
## total_day_minutes > 223.2:
## :...total_eve_charge <= 20.5: no (295/22)
## total_eve_charge > 20.5:
## :...voice_mail_plan = yes: no (20)
## voice_mail_plan = no:
## :...total_night_minutes > 174.2: yes (50/8)
## total_night_minutes <= 174.2:
## :...total_day_minutes <= 246.6: no (12)
## total_day_minutes > 246.6:
## :...total_day_minutes <= 254.9: yes (4)
## total_day_minutes > 254.9: no (2)
##
##
## Evaluation on training data (3333 cases):
##
## Decision Tree
## ----------------
## Size Errors
##
## 26 139( 4.2%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 362 121 (a): class yes
## 18 2832 (b): class no
##
##
## Attribute usage:
##
## 100.00% total_day_minutes
## 93.67% number_customer_service_calls
## 87.73% international_plan
## 23.76% total_eve_charge
## 8.97% voice_mail_plan
## 8.01% total_intl_calls
## 6.48% total_intl_minutes
## 6.33% total_night_minutes
##
##
## Time: 0.1 secs
plot(model2, type = "simple")
pred_train <- predict(model2, churnTrain2, type = "class")
confusionMatrix(pred_train, churnTrain2$churn)
## Confusion Matrix and Statistics
##
## Reference
## Prediction yes no
## yes 362 18
## no 121 2832
##
## Accuracy : 0.9583
## 95% CI : (0.9509, 0.9648)
## No Information Rate : 0.8551
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8154
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.7495
## Specificity : 0.9937
## Pos Pred Value : 0.9526
## Neg Pred Value : 0.9590
## Prevalence : 0.1449
## Detection Rate : 0.1086
## Detection Prevalence : 0.1140
## Balanced Accuracy : 0.8716
##
## 'Positive' Class : yes
##
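The summary statistics above follow directly from the four cells of the confusion matrix. As a worked check, the sketch below recomputes them in base R from the printed counts, with "yes" as the positive class; the object names are hypothetical.

```r
# Counts copied from the confusion matrix above (rows = prediction, columns = reference)
cm <- matrix(c(362, 121, 18, 2832), nrow = 2,
             dimnames = list(Prediction = c("yes", "no"), Reference = c("yes", "no")))

tp <- cm["yes", "yes"]; fp <- cm["yes", "no"]
fn <- cm["no", "yes"];  tn <- cm["no", "no"]

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)   # true positive rate
specificity <- tn / (tn + fp)   # true negative rate
ppv         <- tp / (tp + fp)   # positive predictive value
balanced    <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity,
        ppv = ppv, balanced = balanced), 4)
```

The high accuracy is driven mostly by the large "no" class; sensitivity shows that about a quarter of the actual churners are missed.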
# Test
head(churnTest)
## state account_length area_code international_plan voice_mail_plan
## 1 HI 101 area_code_510 no no
## 2 MT 137 area_code_510 no no
## 3 OH 103 area_code_408 no yes
## 4 NM 99 area_code_415 no no
## 5 SC 108 area_code_415 no no
## 6 IA 117 area_code_415 no no
## number_vmail_messages total_day_minutes total_day_calls total_day_charge
## 1 0 70.9 123 12.05
## 2 0 223.6 86 38.01
## 3 29 294.7 95 50.10
## 4 0 216.8 123 36.86
## 5 0 197.4 78 33.56
## 6 0 226.5 85 38.51
## total_eve_minutes total_eve_calls total_eve_charge total_night_minutes
## 1 211.9 73 18.01 236.0
## 2 244.8 139 20.81 94.2
## 3 237.3 105 20.17 300.3
## 4 126.4 88 10.74 220.6
## 5 124.0 101 10.54 204.5
## 6 141.6 68 12.04 223.0
## total_night_calls total_night_charge total_intl_minutes total_intl_calls
## 1 73 10.62 10.6 3
## 2 81 4.24 9.5 7
## 3 127 13.51 13.7 6
## 4 82 9.93 15.7 2
## 5 107 9.20 7.7 4
## 6 90 10.04 6.9 5
## total_intl_charge number_customer_service_calls churn
## 1 2.86 3 no
## 2 2.57 0 no
## 3 3.70 1 no
## 4 4.24 1 no
## 5 2.08 2 no
## 6 1.86 1 no
churnTest$pred <- predict(model2, churnTest, type = "class") # predicted class
churnTest$pred_prob <- predict(model2, churnTest, type = "prob") # class probabilities
confusionMatrix(churnTest$pred, churnTest$churn)
## Confusion Matrix and Statistics
##
## Reference
## Prediction yes no
## yes 146 9
## no 78 1434
##
## Accuracy : 0.9478
## 95% CI : (0.936, 0.958)
## No Information Rate : 0.8656
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.7421
## Mcnemar's Test P-Value : 3.091e-13
##
## Sensitivity : 0.65179
## Specificity : 0.99376
## Pos Pred Value : 0.94194
## Neg Pred Value : 0.94841
## Prevalence : 0.13437
## Detection Rate : 0.08758
## Detection Prevalence : 0.09298
## Balanced Accuracy : 0.82277
##
## 'Positive' Class : yes
##
# Model Evaluation by ROCR chart
head(churnTest$pred_prob)
## yes no
## 1 0.02706792 0.9729321
## 2 0.01114727 0.9888527
## 3 0.02488945 0.9751106
## 4 0.02706792 0.9729321
## 5 0.02706792 0.9729321
## 6 0.07481390 0.9251861
c5_pred <- prediction(churnTest$pred_prob[, "yes"], churnTest$churn)
c5_model.perf <- performance(c5_pred, "tpr", "fpr")
# True positive rate (tpr) = Sensitivity
# False positive rate (fpr) = 1 - Specificity
c5_model.perf
## An object of class "performance"
## Slot "x.name":
## [1] "False positive rate"
##
## Slot "y.name":
## [1] "True positive rate"
##
## Slot "alpha.name":
## [1] "Cutoff"
##
## Slot "x.values":
## [[1]]
## [1] 0.000000000 0.000000000 0.000000000 0.000000000 0.001386001
## [6] 0.002079002 0.002079002 0.002772003 0.002772003 0.002772003
## [11] 0.005544006 0.006237006 0.006237006 0.006930007 0.009009009
## [16] 0.049203049 0.158697159 0.169092169 0.227304227 0.228690229
## [21] 0.975744976 0.989604990 0.992376992 0.995148995 1.000000000
##
##
## Slot "y.values":
## [[1]]
## [1] 0.0000000 0.1026786 0.1830357 0.3616071 0.4017857 0.5491071 0.5625000
## [8] 0.5669643 0.6294643 0.6339286 0.6428571 0.6517857 0.6562500 0.6562500
## [15] 0.6696429 0.7410714 0.7991071 0.8258929 0.8392857 0.8392857 0.9821429
## [22] 0.9866071 0.9955357 1.0000000 1.0000000
##
##
## Slot "alpha.values":
## [[1]]
## [1] Inf 0.98355605 0.98056624 0.98047278 0.95499550 0.95181143
## [7] 0.92226496 0.82898290 0.82637087 0.78622863 0.70724572 0.69081909
## [13] 0.30641636 0.22898290 0.22463675 0.18431233 0.07481390 0.07155716
## [19] 0.04106273 0.02898290 0.02706792 0.02488945 0.01114727 0.01035104
## [25] 0.00690069
plot(c5_model.perf, col = "red")
AUROC <- performance(c5_pred, "auc")
AUROC@y.values
## [[1]]
## [1] 0.8804094
# 0.8804094 : Good
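The AUC is the area under the ROC curve, so it can be reproduced from the curve's points with the trapezoid rule. The sketch below does this in base R using the `x.values` (false positive rate) and `y.values` (true positive rate) printed above; the variable names are hypothetical and the inputs are the rounded printed values.

```r
# False positive rates copied from the printed x.values slot
fpr <- c(0.000000000, 0.000000000, 0.000000000, 0.000000000, 0.001386001,
         0.002079002, 0.002079002, 0.002772003, 0.002772003, 0.002772003,
         0.005544006, 0.006237006, 0.006237006, 0.006930007, 0.009009009,
         0.049203049, 0.158697159, 0.169092169, 0.227304227, 0.228690229,
         0.975744976, 0.989604990, 0.992376992, 0.995148995, 1.000000000)

# True positive rates copied from the printed y.values slot
tpr <- c(0.0000000, 0.1026786, 0.1830357, 0.3616071, 0.4017857, 0.5491071, 0.5625000,
         0.5669643, 0.6294643, 0.6339286, 0.6428571, 0.6517857, 0.6562500, 0.6562500,
         0.6696429, 0.7410714, 0.7991071, 0.8258929, 0.8392857, 0.8392857, 0.9821429,
         0.9866071, 0.9955357, 1.0000000, 1.0000000)

# Trapezoid rule: width of each step times the average height of its two ends
auc_trapezoid <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc_trapezoid
```

The result agrees with the value reported by `performance(c5_pred, "auc")` to about four decimal places, the small difference coming from rounding in the printed points.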
c5_model.lift <- performance(c5_pred, "lift", "rpp") # rpp : Rate of positive predictions
plot(c5_model.lift, col = "red")
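Each point on the lift chart compares how concentrated the true positives are among the rows flagged as positive with what random selection would give. As a worked check, the sketch below computes the single point that corresponds to the class predictions in the test confusion matrix shown earlier (146, 9, 78, 1434), following the ROCR definition of lift as the true positive rate divided by the rate of positive predictions; the variable names are hypothetical.

```r
# Counts copied from the test-set confusion matrix ("yes" is the positive class)
tp <- 146; fp <- 9
fn <- 78;  tn <- 1434
n  <- tp + fp + fn + tn

rpp  <- (tp + fp) / n    # rate of positive predictions (x axis)
tpr  <- tp / (tp + fn)   # true positive rate
lift <- tpr / rpp        # lift (y axis)

round(c(rpp = rpp, tpr = tpr, lift = lift), 4)
```

A lift of about 7 means that customers flagged as churners are roughly seven times more likely to churn than a customer picked at random from the test set.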