## Lending Club Discussion => Investors - LC => Topic started by: rawraw on June 06, 2014, 09:01:54 AM

Title: Regression Trees
Post by: rawraw on June 06, 2014, 09:01:54 AM
Has anyone here put LC data through Regression Trees?  I just came across this stuff on a work assignment, where they used it on a credit portfolio to find advanced relationships that would be too difficult to detect without big data technology.  I'm considering signing up for the free trial and running LC data through it to see what it shows, but was curious if anyone here had done it or had experience with it. And yes, I'm looking at you, Fred, AnilG, and BryceMason.

Examples:
http://www.mu-sigma.com/
http://www.salford-systems.com/
http://www.angoss.com/

Title: Re: Regression Trees
Post by: gamassey on June 06, 2014, 09:33:58 AM
I have always thought that a neural net was the best way to build loan prediction systems.

http://www.r-bloggers.com/using-neural-networks-for-credit-scoring-a-simple-example/
Title: Re: Regression Trees
Post by: AnilG on June 06, 2014, 12:06:19 PM
PeerCube uses decision trees/regression trees to split the loan attributes before calculating the BLE Risk Index. If you are familiar with R, you can do this with the 'party' or 'rpart' packages. As the example below (also attached) shows, even with only three loan attributes and ROI as the response, the analysis gets complex.

Personally, I suggest lenders look into the genetic algorithm approach described by David M. Patierno at http://blog.dmpatierno.com/post/3161338411/lending-club-genetic-algorithm. PeerCube offers it as the Public Loan Filter 'DMP Genetic Algorithm'.

```
> print(loandp_ctree)

Conditional inference tree with 29 terminal nodes

Response:  roi
Inputs:  annual_inc, dti, loan_amnt
Number of observations:  56129

1) dti <= 24.84; criterion = 1, statistic = 986.644
  2) loan_amnt <= 15150; criterion = 1, statistic = 472.029
    3) dti <= 17.95; criterion = 1, statistic = 187.221
      4) annual_inc <= 40310; criterion = 1, statistic = 62.871
        5) loan_amnt <= 7500; criterion = 1, statistic = 45.188
          6) dti <= 8.48; criterion = 1, statistic = 15.918
            7)*  weights = 2005
          6) dti > 8.48
            8) annual_inc <= 23404.1; criterion = 0.99, statistic = 8.542
              9)*  weights = 678
            8) annual_inc > 23404.1
              10)*  weights = 2138
        5) loan_amnt > 7500
          11) annual_inc <= 30680; criterion = 0.997, statistic = 10.895
            12)*  weights = 939
          11) annual_inc > 30680
            13)*  weights = 1667
      4) annual_inc > 40310
        14) dti <= 10.78; criterion = 1, statistic = 21.98
          15) loan_amnt <= 9150; criterion = 0.993, statistic = 9.285
            16)*  weights = 5602
          15) loan_amnt > 9150
            17)*  weights = 4552
        14) dti > 10.78
          18) annual_inc <= 61920; criterion = 1, statistic = 18.053
            19)*  weights = 4426
          18) annual_inc > 61920
            20)*  weights = 5270
    3) dti > 17.95
      21) annual_inc <= 40040; criterion = 1, statistic = 67.193
        22) loan_amnt <= 8700; criterion = 1, statistic = 33.864
          23)*  weights = 2318
        22) loan_amnt > 8700
          24) annual_inc <= 28700; criterion = 0.951, statistic = 5.731
            25)*  weights = 121
          24) annual_inc > 28700
            26)*  weights = 1058
      21) annual_inc > 40040
        27) annual_inc <= 89000; criterion = 1, statistic = 17.068
          28)*  weights = 5719
        27) annual_inc > 89000
          29)*  weights = 968
  2) loan_amnt > 15150
    30) annual_inc <= 67500; criterion = 1, statistic = 118.866
      31) loan_amnt <= 25475; criterion = 1, statistic = 15.061
        32) annual_inc <= 50350; criterion = 1, statistic = 24.175
          33)*  weights = 1916
        32) annual_inc > 50350
          34) dti <= 24.15; criterion = 0.993, statistic = 9.243
            35)*  weights = 2527
          34) dti > 24.15
            36)*  weights = 86
      31) loan_amnt > 25475
        37)*  weights = 322
    30) annual_inc > 67500
      38) loan_amnt <= 25050; criterion = 1, statistic = 104.304
        39) annual_inc <= 101004; criterion = 1, statistic = 23.78
          40)*  weights = 4034
        39) annual_inc > 101004
          41)*  weights = 2626
      38) loan_amnt > 25050
        42) annual_inc <= 94800; criterion = 1, statistic = 20.277
          43)*  weights = 998
        42) annual_inc > 94800
          44) annual_inc <= 189000; criterion = 0.982, statistic = 7.495
            45)*  weights = 1429
          44) annual_inc > 189000
            46)*  weights = 306
1) dti > 24.84
  47) dti <= 29.94; criterion = 1, statistic = 26.847
    48) loan_amnt <= 19500; criterion = 1, statistic = 18.034
      49) annual_inc <= 64500; criterion = 1, statistic = 14.775
        50)*  weights = 1976
      49) annual_inc > 64500
        51)*  weights = 550
    48) loan_amnt > 19500
      52) annual_inc <= 51690.48; criterion = 0.989, statistic = 8.394
        53)*  weights = 84
      52) annual_inc > 51690.48
        54)*  weights = 632
  47) dti > 29.94
    55) annual_inc <= 47500; criterion = 0.999, statistic = 13.619
      56)*  weights = 530
    55) annual_inc > 47500
      57)*  weights = 652
```
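ctree chooses each split with conditional-inference (permutation) tests, and that is where the `criterion` and `statistic` fields above come from. To show what a single split search looks like in its simplest form, here is a minimal sketch of the CART/rpart-style rule. This is a swapped-in technique, not what ctree does: sort one attribute, try every threshold, and keep the one that most reduces the squared error of the response. The `dti`/`roi` numbers are made up for illustration and are not LC data.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(
    x: List[float], y: List[float], min_leaf: int = 2
) -> Optional[Tuple[float, float]]:
    """CART-style search over one attribute.

    Returns (threshold, sse_reduction) for the split 'x <= threshold' that
    most reduces the squared error of y, or None if no split leaves at
    least min_leaf rows on each side.
    """
    pairs = sorted(zip(x, y))
    parent = sse([v for _, v in pairs])
    best: Optional[Tuple[float, float]] = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # can't separate rows that share the same x value
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        reduction = parent - sse(left) - sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        if best is None or reduction > best[1]:
            best = (threshold, reduction)
    return best


# Toy numbers, invented for illustration (not LC data).
dti = [5, 8, 12, 15, 22, 26, 28, 31]
roi = [0.09, 0.10, 0.08, 0.09, 0.05, 0.02, 0.03, 0.01]
threshold, gain = best_split(dti, roi)
print(threshold)  # 18.5
```

A full tree repeats this search over every attribute, takes the best split, and recurses into each side until a stopping rule says to stop. That rule can be a minimum leaf size or a maximum depth, or, in ctree's case, a significance test that fails.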
Title: Re: Regression Trees
Post by: brycemason on June 06, 2014, 01:58:37 PM
It can work well, but I'm not a fan. I prefer to understand the reasons why we see relationships, and the computer finding the best way to slide down regression trees / random forests like a game of Plinko on The Price Is Right just doesn't satisfy my curiosity. There are many ways to skin a cat.
Title: Re: Regression Trees
Post by: rawraw on June 06, 2014, 07:16:27 PM
Well, the way the person used it wasn't necessarily any different from your approach.  He basically said everyone has biases and tends to look at the same things in the same way.  He used it to find splits he wouldn't necessarily have predicted, then analyzed each one to figure out the reason behind it and whether he thought it was useful.  In one example, he found that some borrowers in a higher-risk population actually had the same probability of default as the low-risk population if they had a certain characteristic.  He went and talked to the production staff to figure out why, and they were then able to incorporate that into the models they built themselves.
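The check described above asks whether one subgroup of a higher-risk population defaults at roughly the low-risk rate. That can be sketched as a simple group-by. This is a minimal sketch with a hypothetical record layout `(risk_band, has_attr, defaulted)`, a hypothetical tolerance, and synthetic counts. The actual variable the analyst found was not named.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (risk_band, has_attr, defaulted).
Record = Tuple[str, bool, bool]


def default_rates(records: List[Record]) -> Dict[Tuple[str, bool], float]:
    """Default rate for every (risk_band, has_attr) cell."""
    counts: Dict[Tuple[str, bool], List[int]] = defaultdict(lambda: [0, 0])
    for band, has_attr, defaulted in records:
        cell = counts[(band, has_attr)]
        cell[0] += int(defaulted)  # defaults in this cell
        cell[1] += 1  # loans in this cell
    return {key: d / n for key, (d, n) in counts.items()}


def lookalike_cells(
    records: List[Record], high: str = "high", low: str = "low", tol: float = 0.01
) -> List[bool]:
    """has_attr values whose high-risk cell defaults within tol of the whole low-risk band."""
    low_rows = [r for r in records if r[0] == low]
    low_rate = sum(r[2] for r in low_rows) / len(low_rows)
    return [
        attr
        for (band, attr), rate in default_rates(records).items()
        if band == high and abs(rate - low_rate) <= tol
    ]


def make(band: str, attr: bool, n: int, n_default: int) -> List[Record]:
    """Build n synthetic loans, the first n_default of which defaulted."""
    return [(band, attr, i < n_default) for i in range(n)]


# Synthetic counts for illustration only.
data = (
    make("low", False, 100, 3) + make("low", True, 100, 3)
    + make("high", False, 100, 12) + make("high", True, 100, 3)
)
print(lookalike_cells(data))  # [True]
```

With real data you would also want enough loans in each cell and a significance test before trusting such a pattern. After that, the remaining step is to go find out why the pattern exists.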

Quote
I have always thought that a neural net was the best way to build loan prediction systems.
I've only seen neural nets used to determine which population a borrower belongs to and then to choose the credit scorecard for that population.  I do think neural nets are interesting; I just have less exposure to them at this point.

Title: Re: Regression Trees
Post by: Fred on June 08, 2014, 02:24:47 AM
Quote from: rawraw
Has anyone here put LC data through Regression Trees?  I just came across this stuff on a work assignment, where they used it on a credit portfolio to find advanced relationships that would be too difficult to detect without big data technology.  I'm considering signing up for the free trial and running LC data through it to see what it shows, but was curious if anyone here had done it or had experience with it. And yes, I'm looking at you, Fred, AnilG, and BryceMason.

I have not tried regression trees, although I have tried multivariate regressions wrapped in a genetic algorithm: the regression models were crossed over and mutated to get better models.
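Fred's description leaves the details open: regression models that are crossed over and mutated by a genetic algorithm. Here is one minimal interpretation, not Fred's actual setup. Each "model" is assumed to be a bitmask choosing which candidate variables enter an ordinary least-squares regression, and fitness is assumed to be adjusted R² on the training data. The encoding, the fitness function, tournament selection, one-point crossover, bit-flip mutation, and all parameters are assumptions. The data is synthetic.

```python
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ w = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(rows: Sequence[Sequence[float]], y: Sequence[float], mask: Sequence[int]) -> float:
    """Adjusted R^2 of an OLS fit (with intercept) on the columns where mask == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    X = [[1.0] + [row[j] for j in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    w = solve(xtx, xty)  # normal equations
    mean = sum(y) / len(y)
    ss_res = sum((t - sum(wi * xi for wi, xi in zip(w, r))) ** 2 for r, t in zip(X, y))
    ss_tot = sum((t - mean) ** 2 for t in y)
    n, p = len(y), len(cols)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)


def evolve(rows, y, n_features: int, pop_size: int = 20, generations: int = 30,
           p_mut: float = 0.1, seed: int = 0) -> List[int]:
    """Evolve feature masks; fitness is adjusted R^2 (an assumed choice)."""
    rng = random.Random(seed)
    cache: Dict[Tuple[int, ...], float] = {}

    def fitness(mask: List[int]) -> float:
        key = tuple(mask)
        if key not in cache:
            cache[key] = adjusted_r2(rows, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = [pop[0]]  # elitism: carry the best model over unchanged
        while len(nxt) < pop_size:
            # Tournament selection: best of 3 random models, twice.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with probability p_mut.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)


# Synthetic data: only columns 0 and 2 actually drive y.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(5)] for _ in range(60)]
y = [2 * r[0] - 3 * r[2] + data_rng.gauss(0, 0.1) for r in rows]
best = evolve(rows, y, n_features=5)
print(best, round(adjusted_r2(rows, y, best), 3))
```

Scoring fitness in-sample like this invites overfitting. In practice, you would score candidate models on held-out loans.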

However, I am interested in learning what you do for your work with these tools.  Keep us posted.
Title: Re: Regression Trees
Post by: brycemason on June 08, 2014, 10:47:10 AM
That's a fine approach to improving models, but you had better use a very strict significance threshold (a very low alpha, i.e. a high confidence level), because in my book this is a fishing expedition. If you check every permutation with a genetic algorithm or a tree, about alpha (e.g. 5% at alpha = 0.05) of the variables with no real effect are bound to test significant just by chance. So back testing and a strict alpha are important here, IMO.
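To put numbers on the fishing-expedition point, use standard notation where alpha is the significance level. Suppose m candidate variables with no real effect are each tested at alpha. About m·alpha of them will look significant by chance, and the probability of at least one false positive is 1 − (1 − alpha)^m, assuming independent tests. The Bonferroni correction, which tests each variable at alpha/m, is one standard way to tighten the threshold. It is shown here as an illustration, not as something anyone in this thread said they used.

```python
def fishing_expedition(m: int, alpha: float) -> dict:
    """Multiple-testing arithmetic for m independent tests of variables with no real effect."""
    return {
        "expected_false_positives": m * alpha,
        "p_at_least_one": 1 - (1 - alpha) ** m,
        "bonferroni_alpha": alpha / m,
        "p_at_least_one_bonferroni": 1 - (1 - alpha / m) ** m,
    }


stats = fishing_expedition(m=100, alpha=0.05)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Take 100 null variables at alpha = 0.05. That works out to about 5 expected false positives and roughly a 99.4% chance of at least one. Testing each at the Bonferroni level of 0.0005 brings that chance down to about 4.9%. Back testing on data the search never saw is the other safeguard.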
Title: Re: Regression Trees
Post by: rawraw on June 08, 2014, 01:43:51 PM
Good points.  I knew there was a reason we kept you stats guys around :)
Title: Re: Regression Trees
Post by: 99greenballoons on June 10, 2014, 10:43:30 AM
Quote from: rawraw
Has anyone here put LC data through Regression Trees?  ...to find advanced relationships that would be too difficult to detect without ~~big data~~ machine learning technology.
FTFY (fixed that for you).  Whether the service uses "big data technology" (some form of MapReduce or MPP) depends on the quantity of data, not so much on the algorithm used.  A common case of the over-inclusive buzzword  :)

My account is only now nearing the one-year mark (average note maturity: 6 months), but it has been successful so far with multiple regression via a genetic algorithm.
Title: Re: Regression Trees
Post by: rawraw on June 10, 2014, 02:39:22 PM
Yeah, I got a 30-day free trial from the account reps.  Now it's time to analyze that data and then back test it for predictive power.
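For the back-testing step, one common approach with loan data is an out-of-time split. You choose a rule (or fit a model) using only loans issued before a cutoff date, then check how it does on loans issued after the cutoff. Here is a minimal sketch. The `Loan` record layout, the cutoff, and the `dti <= 18.5` filter are hypothetical stand-ins for whatever the tool finds. The toy numbers are invented to show a rule looking worse out of sample and are not results. Recent loans may not have matured yet, so a real backtest should restrict the test window to loans with known outcomes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Tuple


@dataclass
class Loan:  # hypothetical record layout
    issue_date: date
    dti: float
    defaulted: bool


def out_of_time_split(loans: List[Loan], cutoff: date) -> Tuple[List[Loan], List[Loan]]:
    """Loans issued before the cutoff are for fitting; the rest are for testing."""
    train = [loan for loan in loans if loan.issue_date < cutoff]
    test = [loan for loan in loans if loan.issue_date >= cutoff]
    return train, test


def default_rate(loans: List[Loan], rule: Callable[[Loan], bool]) -> float:
    """Default rate among loans the rule would have picked (NaN if none)."""
    picked = [loan for loan in loans if rule(loan)]
    if not picked:
        return float("nan")
    return sum(loan.defaulted for loan in picked) / len(picked)


def rule(loan: Loan) -> bool:
    """Hypothetical filter, assumed to have been chosen on the training window."""
    return loan.dti <= 18.5


loans = [
    Loan(date(2012, 1, 1), 10.0, False),
    Loan(date(2012, 3, 1), 25.0, True),
    Loan(date(2012, 6, 1), 12.0, False),
    Loan(date(2012, 9, 1), 15.0, True),
    Loan(date(2013, 2, 1), 9.0, False),
    Loan(date(2013, 5, 1), 28.0, True),
    Loan(date(2013, 8, 1), 17.0, True),
    Loan(date(2013, 11, 1), 14.0, True),
]
train, test = out_of_time_split(loans, cutoff=date(2013, 1, 1))
train_rate = default_rate(train, rule)
test_rate = default_rate(test, rule)
print(f"train {train_rate:.3f}, test {test_rate:.3f}")  # train 0.333, test 0.667
```

Splitting by issue date rather than at random keeps later information out of the fitting window. That is closer to how the rule would actually have been used.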
Title: Re: Regression Trees
Post by: nuncsystems on November 12, 2019, 06:14:24 AM
I also think a neural net is the best way to build loan prediction systems.

Example: https://www.nuncsystems.com/big-data.html