Author Topic: Regression Trees  (Read 6063 times)

rawraw

  • Hero Member
  • *****
  • Posts: 2767
    • View Profile
Regression Trees
« on: June 06, 2014, 09:01:54 AM »
Has anyone here put LC data through regression trees? I just came across them on a work assignment, where they were used on a credit portfolio to find relationships that would be too difficult to detect without big data technology. I'm considering signing up for the free trial and running LC data through it to see what it shows, but I was curious whether anyone here had done this or had experience with it. And yes, I'm looking at you, Fred, AnilG, and BryceMason.

Examples:
http://www.mu-sigma.com/
http://www.salford-systems.com/
http://www.angoss.com/

 
« Last Edit: June 06, 2014, 09:03:48 AM by rawraw »

gamassey

  • Jr. Member
  • **
  • Posts: 72
    • View Profile
    • Email
Re: Regression Trees
« Reply #1 on: June 06, 2014, 09:33:58 AM »
I have always thought that a neural net was the best way to build loan prediction systems.

http://www.r-bloggers.com/using-neural-networks-for-credit-scoring-a-simple-example/
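
For anyone who wants to tinker, here is a minimal sketch of that idea in R with the nnet package. It is not the code from the linked post, and the file loans.csv and the columns dti, annual_inc, loan_amnt, and defaulted are hypothetical names.

Code:
# Minimal sketch: single-hidden-layer net that predicts default from three
# loan attributes. File and column names are hypothetical.
library(nnet)

loans <- read.csv("loans.csv")
loans$defaulted <- factor(loans$defaulted)     # 0/1 outcome as a factor

# Scale inputs so the optimizer behaves
loans$dti        <- scale(loans$dti)
loans$annual_inc <- scale(loans$annual_inc)
loans$loan_amnt  <- scale(loans$loan_amnt)

# Simple holdout split for an honest error estimate
set.seed(42)
test_idx <- sample(nrow(loans), floor(0.2 * nrow(loans)))
train <- loans[-test_idx, ]
test  <- loans[test_idx, ]

# 5 hidden units with light weight decay to limit overfitting
fit <- nnet(defaulted ~ dti + annual_inc + loan_amnt,
            data = train, size = 5, decay = 1e-3, maxit = 500)

# Predicted default probabilities and a quick confusion table
p <- predict(fit, test, type = "raw")
table(actual = test$defaulted, predicted = as.integer(p > 0.5))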

AnilG

  • Hero Member
  • *****
  • Posts: 1094
    • View Profile
    • PeerCube
Re: Regression Trees
« Reply #2 on: June 06, 2014, 12:06:19 PM »
PeerCube uses a decision/regression tree to split the loan attributes before calculating the BLE Risk Index. If you are familiar with R, you can do this with the 'party' or 'rpart' packages. As the example below (and the attached image) shows, even with just three loan attributes and ROI the analysis gets complex; a sketch of the call that produces this kind of output follows the tree.

Personally, I suggest lenders look into genetic algorithms, as described by David M. Patierno: http://blog.dmpatierno.com/post/3161338411/lending-club-genetic-algorithm. PeerCube offers it as the public loan filter 'DMP Genetic Algorithm'.

Code:
> print(loandp_ctree)

Conditional inference tree with 29 terminal nodes

Response:  roi
Inputs:  annual_inc, dti, loan_amnt
Number of observations:  56129

1) dti <= 24.84; criterion = 1, statistic = 986.644
  2) loan_amnt <= 15150; criterion = 1, statistic = 472.029
    3) dti <= 17.95; criterion = 1, statistic = 187.221
      4) annual_inc <= 40310; criterion = 1, statistic = 62.871
        5) loan_amnt <= 7500; criterion = 1, statistic = 45.188
          6) dti <= 8.48; criterion = 1, statistic = 15.918
            7)*  weights = 2005
          6) dti > 8.48
            8) annual_inc <= 23404.1; criterion = 0.99, statistic = 8.542
              9)*  weights = 678
            8) annual_inc > 23404.1
              10)*  weights = 2138
        5) loan_amnt > 7500
          11) annual_inc <= 30680; criterion = 0.997, statistic = 10.895
            12)*  weights = 939
          11) annual_inc > 30680
            13)*  weights = 1667
      4) annual_inc > 40310
        14) dti <= 10.78; criterion = 1, statistic = 21.98
          15) loan_amnt <= 9150; criterion = 0.993, statistic = 9.285
            16)*  weights = 5602
          15) loan_amnt > 9150
            17)*  weights = 4552
        14) dti > 10.78
          18) annual_inc <= 61920; criterion = 1, statistic = 18.053
            19)*  weights = 4426
          18) annual_inc > 61920
            20)*  weights = 5270
    3) dti > 17.95
      21) annual_inc <= 40040; criterion = 1, statistic = 67.193
        22) loan_amnt <= 8700; criterion = 1, statistic = 33.864
          23)*  weights = 2318
        22) loan_amnt > 8700
          24) annual_inc <= 28700; criterion = 0.951, statistic = 5.731
            25)*  weights = 121
          24) annual_inc > 28700
            26)*  weights = 1058
      21) annual_inc > 40040
        27) annual_inc <= 89000; criterion = 1, statistic = 17.068
          28)*  weights = 5719
        27) annual_inc > 89000
          29)*  weights = 968
  2) loan_amnt > 15150
    30) annual_inc <= 67500; criterion = 1, statistic = 118.866
      31) loan_amnt <= 25475; criterion = 1, statistic = 15.061
        32) annual_inc <= 50350; criterion = 1, statistic = 24.175
          33)*  weights = 1916
        32) annual_inc > 50350
          34) dti <= 24.15; criterion = 0.993, statistic = 9.243
            35)*  weights = 2527
          34) dti > 24.15
            36)*  weights = 86
      31) loan_amnt > 25475
        37)*  weights = 322
    30) annual_inc > 67500
      38) loan_amnt <= 25050; criterion = 1, statistic = 104.304
        39) annual_inc <= 101004; criterion = 1, statistic = 23.78
          40)*  weights = 4034
        39) annual_inc > 101004
          41)*  weights = 2626
      38) loan_amnt > 25050
        42) annual_inc <= 94800; criterion = 1, statistic = 20.277
          43)*  weights = 998
        42) annual_inc > 94800
          44) annual_inc <= 189000; criterion = 0.982, statistic = 7.495
            45)*  weights = 1429
          44) annual_inc > 189000
            46)*  weights = 306
1) dti > 24.84
  47) dti <= 29.94; criterion = 1, statistic = 26.847
    48) loan_amnt <= 19500; criterion = 1, statistic = 18.034
      49) annual_inc <= 64500; criterion = 1, statistic = 14.775
        50)*  weights = 1976
      49) annual_inc > 64500
        51)*  weights = 550
    48) loan_amnt > 19500
      52) annual_inc <= 51690.48; criterion = 0.989, statistic = 8.394
        53)*  weights = 84
      52) annual_inc > 51690.48
        54)*  weights = 632
  47) dti > 29.94
    55) annual_inc <= 47500; criterion = 0.999, statistic = 13.619
      56)*  weights = 530
    55) annual_inc > 47500
      57)*  weights = 652
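
For reference, a tree like the one above takes only a couple of lines of R to grow. This is a minimal sketch, not PeerCube's actual code; the data frame loandp is assumed to hold historical loans with the columns shown in the output (roi, annual_inc, dti, loan_amnt).

Code:
# Conditional inference tree (party), which produces output like the above
library(party)
loandp_ctree <- ctree(roi ~ annual_inc + dti + loan_amnt, data = loandp)
print(loandp_ctree)   # text listing of the splits
plot(loandp_ctree)    # graphical version, similar to the attached image

# CART-style regression tree (rpart) as an alternative
library(rpart)
loandp_rpart <- rpart(roi ~ annual_inc + dti + loan_amnt,
                      data = loandp, method = "anova")
printcp(loandp_rpart) # complexity table, useful for pruning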
---
Anil Gupta
PeerCube Thoughts blog https://www.peercube.com/blog
PeerCube https://www.peercube.com

brycemason

  • Hero Member
  • *****
  • Posts: 801
    • View Profile
    • P2P-Picks.com
    • Email
Re: Regression Trees
« Reply #3 on: June 06, 2014, 01:58:37 PM »
It can work well, but I'm not a fan. I prefer to understand the reasons why we see relationships, and the computer finding the best way to slide down regression trees / random forests like a game of Plinko on the Price is Right just doesn't satisfy my curiosity. There are many ways to skin a cat.

rawraw

  • Hero Member
  • *****
  • Posts: 2767
    • View Profile
Re: Regression Trees
« Reply #4 on: June 06, 2014, 07:16:27 PM »
Quote from: brycemason
It can work well, but I'm not a fan. I prefer to understand the reasons why we see relationships, and the computer finding the best way to slide down regression trees / random forests like a game of Plinko on the Price is Right just doesn't satisfy my curiosity. There are many ways to skin a cat.
Well, the way the person used it wasn't necessarily any different from your approach. He basically said everyone has biases and tends to look at the same kinds of things in the same kinds of ways. He used the trees to find splits he wouldn't necessarily have predicted, then analyzed them to determine the reason and whether they were useful. In one example, he found that some borrowers in a higher-risk population actually had the same probability of default as the low-risk population if they had a certain attribute. He went and talked to the production staff to figure out why that was, and then they were able to incorporate it into the models they built themselves.

Quote from: gamassey
I have always thought that a neural net was the best way to build loan prediction systems.
I've only seen neural nets used to determine which population a borrower belongs to, and then to choose the credit scorecard based on that population. I do think neural nets are interesting; I just have less exposure to them at this point.

« Last Edit: June 06, 2014, 07:19:14 PM by rawraw »

Fred

  • Hero Member
  • *****
  • Posts: 1421
    • View Profile
Re: Regression Trees
« Reply #5 on: June 08, 2014, 02:24:47 AM »
Quote from: rawraw
Has anyone here put LC data through regression trees? I just came across them on a work assignment, where they were used on a credit portfolio to find relationships that would be too difficult to detect without big data technology. I'm considering signing up for the free trial and running LC data through it to see what it shows, but I was curious whether anyone here had done this or had experience with it. And yes, I'm looking at you, Fred, AnilG, and BryceMason.

I have not tried regression trees, although I have tried multivariate regressions wrapped in a genetic algorithm -- the regression models were crossed and mutated to produce better models.

However, I am interested in learning what you do for your work with these tools.  Keep us posted.
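
A minimal sketch of that cross-and-mutate idea in R, purely illustrative and not Fred's actual setup; the file loans.csv and its roi column are hypothetical. Each chromosome is a 0/1 vector choosing which candidate columns enter a linear regression, and fitness is holdout RMSE.

Code:
# Illustrative genetic search over regression variable subsets.
set.seed(1)
loans <- na.omit(read.csv("loans.csv"))
num_cols   <- names(loans)[sapply(loans, is.numeric)]
candidates <- setdiff(num_cols, "roi")           # candidate predictors

holdout <- sample(nrow(loans), floor(0.3 * nrow(loans)))
train <- loans[-holdout, ]
test  <- loans[holdout, ]

# Fitness: holdout RMSE of a regression using the selected columns
fitness <- function(bits) {
  if (sum(bits) == 0) return(Inf)
  f   <- reformulate(candidates[bits == 1], response = "roi")
  fit <- lm(f, data = train)
  sqrt(mean((test$roi - predict(fit, test))^2))
}

pop_size <- 30; n_gen <- 25; p_mut <- 0.05
pop <- matrix(rbinom(pop_size * length(candidates), 1, 0.5), nrow = pop_size)

for (g in 1:n_gen) {
  scores <- apply(pop, 1, fitness)
  keep   <- pop[order(scores)[1:(pop_size / 2)], , drop = FALSE]   # selection
  kids <- t(sapply(1:(pop_size / 2), function(i) {
    parents <- keep[sample(nrow(keep), 2), ]
    cut     <- sample(ncol(pop) - 1, 1)                            # crossover
    child   <- c(parents[1, 1:cut], parents[2, (cut + 1):ncol(pop)])
    flip    <- runif(ncol(pop)) < p_mut                            # mutation
    child[flip] <- 1 - child[flip]
    child
  }))
  pop <- rbind(keep, kids)
}

best <- pop[which.min(apply(pop, 1, fitness)), ]
cat("Selected predictors:", paste(candidates[best == 1], collapse = ", "), "\n")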

brycemason

  • Hero Member
  • *****
  • Posts: 801
    • View Profile
    • P2P-Picks.com
    • Email
Re: Regression Trees
« Reply #6 on: June 08, 2014, 10:47:10 AM »
That's a fine approach to improve models, but one had better use a very strict significance threshold, because in my book this is a fishing expedition. If you check every permutation with a genetic algorithm or a tree, roughly an alpha fraction of the variables are bound to come up significant just by chance. So back-testing and a strict alpha are important here, IMO.
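
That point is easy to demonstrate with a quick simulation on purely synthetic data (an illustration only, nothing to do with actual LC data): regress a noise outcome on 100 noise predictors and roughly 5% of them clear the usual 0.05 threshold anyway.

Code:
# 100 candidate predictors of pure noise vs. an outcome of pure noise
set.seed(123)
n <- 2000; k <- 100
X <- matrix(rnorm(n * k), n, k, dimnames = list(NULL, paste0("x", 1:k)))
y <- rnorm(n)                              # unrelated to every predictor

fit  <- lm(y ~ X)
pval <- summary(fit)$coefficients[-1, 4]   # p-values, intercept dropped
cat("Spuriously 'significant' at 0.05:", sum(pval < 0.05), "of", k, "\n")

# A Bonferroni-style cutoff (0.05 / k) is one way to read "strict alpha"
cat("Significant at 0.05/k:", sum(pval < 0.05 / k), "\n")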

rawraw

  • Hero Member
  • *****
  • Posts: 2767
    • View Profile
Re: Regression Trees
« Reply #7 on: June 08, 2014, 01:43:51 PM »
Quote from: brycemason
That's a fine approach to improve models, but one had better use a very strict significance threshold, because in my book this is a fishing expedition. If you check every permutation with a genetic algorithm or a tree, roughly an alpha fraction of the variables are bound to come up significant just by chance. So back-testing and a strict alpha are important here, IMO.
Good points.  I knew there was a reason we kept you stats guys around :)

99greenballoons

  • Newbie
  • *
  • Posts: 8
    • View Profile
Re: Regression Trees
« Reply #8 on: June 10, 2014, 10:43:30 AM »
Quote from: rawraw
Has anyone here put LC data through regression trees? ...to find relationships that would be too difficult to detect without big data machine learning technology.
FTFY. Whether the service uses "big data technology" (some form of MapReduce or MPP) depends on the quantity of data, not so much on the algorithm used. A common case of the over-inclusive buzzword. :)

My account is only just now nearing the one-year mark (6mo note average maturity), but has been successful so far with multiple regression via a genetic algorithm.

rawraw

  • Hero Member
  • *****
  • Posts: 2767
    • View Profile
Re: Regression Trees
« Reply #9 on: June 10, 2014, 02:39:22 PM »
Quote from: 99greenballoons
Quote from: rawraw
Has anyone here put LC data through regression trees? ...to find relationships that would be too difficult to detect without big data machine learning technology.
FTFY. Whether the service uses "big data technology" (some form of MapReduce or MPP) depends on the quantity of data, not so much on the algorithm used. A common case of the over-inclusive buzzword. :)

My account is only just now nearing the one-year mark (6mo note average maturity), but has been successful so far with multiple regression via a genetic algorithm.
Yeah, I got a 30-day free trial from the account reps. Now it's time to analyze that data and then back-test for predictive power.

nuncsystems

  • Newbie
  • *
  • Posts: 1
    • View Profile
Re: Regression Trees
« Reply #10 on: November 12, 2019, 06:14:24 AM »
I also think that a neural net is the best way to build loan prediction systems.

example: https://www.nuncsystems.com/big-data.html