Topic: Lending Club loan default prediction model question (Read 1944 times)

larrydag

  • Newbie
  • Posts: 14
Lending Club loan default prediction model question
« on: February 10, 2019, 08:57:10 AM »
About 2 years ago I built a loan default prediction model for Lending Club loans, and I've been investing modestly with it since then. I'm getting about a 5.5 to 6% adjusted return on my loans, so I think it's working fairly well. I'm trying to improve the model and hopefully one day achieve 10% returns. Has anyone else built similar models and come up with creative variable transformations on the historical loan data? Here are some that I've come up with:

loan_to_income = loan amount / income
payment_to_income = installment / income
time_since_earliest_credit_line = issue date - earliest credit line date
open_acc_ratio = open_acc / total_acc
curr_bal_ratio = tot_cur_bal / total_bal_ex_mort

Some of these are more predictive than others. Does anyone have any other interesting transforms?
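A minimal Python sketch of the five transformations above, assuming a loan record keyed by the column names in LC's downloadable loan data (`loan_amnt`, `annual_inc`, `installment`, `earliest_cr_line`, `issue_d`, `open_acc`, `total_acc`, `tot_cur_bal`, `total_bal_ex_mort`; verify against your own export) and dates in "Mon-YYYY" form. The helper names and the example record are hypothetical. Credit age is computed as issue date minus earliest credit line so it comes out positive, in months.

```python
from datetime import datetime
from typing import Optional


def safe_ratio(num: float, den: Optional[float]) -> Optional[float]:
    """Return num / den, or None when the denominator is missing or zero."""
    if den is None or den == 0:
        return None
    return num / den


def months_between(start: str, end: str) -> int:
    """Whole months from start to end; dates assumed to look like 'Dec-2011'."""
    s = datetime.strptime(start, "%b-%Y")
    e = datetime.strptime(end, "%b-%Y")
    return (e.year - s.year) * 12 + (e.month - s.month)


def engineer_features(loan: dict) -> dict:
    """Compute the five transformations for one loan record (hypothetical helper)."""
    return {
        "loan_to_income": safe_ratio(loan["loan_amnt"], loan["annual_inc"]),
        # installment is monthly and annual_inc is annual; divide annual_inc by 12
        # if you prefer a monthly payment-to-income ratio.
        "payment_to_income": safe_ratio(loan["installment"], loan["annual_inc"]),
        # Positive credit age: months from earliest credit line to issue date.
        "time_since_earliest_credit_line": months_between(loan["earliest_cr_line"], loan["issue_d"]),
        "open_acc_ratio": safe_ratio(loan["open_acc"], loan["total_acc"]),
        "curr_bal_ratio": safe_ratio(loan["tot_cur_bal"], loan["total_bal_ex_mort"]),
    }


# Made-up example record.
example = {
    "loan_amnt": 12000, "annual_inc": 60000, "installment": 398.57,
    "earliest_cr_line": "Jan-2005", "issue_d": "Dec-2015",
    "open_acc": 8, "total_acc": 20,
    "tot_cur_bal": 150000, "total_bal_ex_mort": 30000,
}
print(engineer_features(example))
```

Returning None for a zero denominator leaves the imputation decision (median, flag variable, etc.) to the modeling step rather than hiding it inside the feature code.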

My inspiration for developing a Lending Club model came from LendingRobot  http://blog.lendingrobot.com/research/predicting-the-number-of-payments-in-peer-lending/

Rob L

  • Hero Member
  • Posts: 2043
Re: Lending Club loan default prediction model question
« Reply #1 on: February 10, 2019, 09:48:43 AM »
I recommend the book "Credit Scoring, Response Modeling and Insurance Rating" by Steven Finlay.
I also recommend the statistical package R, as it's free, open source and very powerful.
Finally, I recommend the following LA thread, particularly the post by brycemason on 12/23/2015.
In particular, note the reference to "the four horsemen of the consumer credit scoring apocalypse".
https://forum.lendacademy.com/index.php/topic,3570.msg31594.html#msg31593

Anyway, test each transformation for covariance with your other model factors to see whether it adds statistically significant value.
loan_to_income (one of your transformations) is (or was) one of the four biggies.
Good luck with that 10%!


AnilG

  • Hero Member
  • Posts: 1090
Re: Lending Club loan default prediction model question
« Reply #2 on: February 11, 2019, 12:29:43 AM »
Three important transformations are:
  • Installment to Income
  • Loan Amount to Revolving Balance
  • Credit Age (Loan Issue Date - Earliest Credit Date)
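A short Python sketch of the second and third transformations, assuming LC's `loan_amnt`, `revol_bal`, `earliest_cr_line` and `issue_d` columns with "Mon-YYYY" dates (check against your data). Credit age is taken as issue date minus earliest credit date so older credit histories get larger values. A zero revolving balance returns None instead of dividing by zero; how to treat that case is a modeling choice left open here.

```python
from datetime import datetime
from typing import Optional


def loan_to_revolving_balance(loan_amnt: float, revol_bal: float) -> Optional[float]:
    """Loan amount divided by revolving balance; None when there is no revolving balance."""
    return loan_amnt / revol_bal if revol_bal > 0 else None


def credit_age_years(earliest_cr_line: str, issue_d: str) -> float:
    """Years from the earliest credit line to loan issue ('Mon-YYYY' dates assumed)."""
    first = datetime.strptime(earliest_cr_line, "%b-%Y")
    issued = datetime.strptime(issue_d, "%b-%Y")
    months = (issued.year - first.year) * 12 + (issued.month - first.month)
    return months / 12


print(loan_to_revolving_balance(15000, 12000))  # 1.25
print(credit_age_years("Mar-2001", "Mar-2016"))  # 15.0
```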
---
Anil Gupta
PeerCube Thoughts blog https://www.peercube.com/blog
PeerCube https://www.peercube.com

Rob L

  • Hero Member
  • Posts: 2043
Re: Lending Club loan default prediction model question
« Reply #3 on: February 11, 2019, 10:30:33 AM »
This is a bit of a technicality, but it's important. If you want to use your model's results to determine whether the LC model has "mispriced" a loan (i.e., to find the best loans), then your model must not incorporate any output of the LC model (grade, interest rate, etc.). If your model is independent of the LC model and LC changes its model (underwriting standards), the change should be immediately evident when you compare against your own model's results, which have not changed. Installment is based on LC's assigned interest rate, which is the key product of its model, so you can't use installment in your model. The best you can do is use loan to income. (Credit to Bryce for this insight, a very long time ago.)
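To make the point about installment concrete: for a fully amortizing fixed-rate loan, the monthly installment is a function of principal, term and the interest rate LC assigns, so installment / income carries LC's rate decision into your model, while loan amount / income does not. A minimal Python sketch of the standard amortization formula; the income and the rates in the loop are made-up illustrations, not LC's rate table.

```python
def monthly_installment(principal: float, annual_rate: float, n_months: int) -> float:
    """Fully amortizing payment: P * r / (1 - (1 + r) ** -n), with r the monthly rate."""
    r = annual_rate / 12
    if r == 0:
        return principal / n_months
    return principal * r / (1 - (1 + r) ** -n_months)


income = 60000  # hypothetical annual income
for rate in (0.07, 0.12, 0.20):  # same $10,000 36-month loan, different assigned rates
    pay = monthly_installment(10000, rate, 36)
    print(f"rate={rate:.0%}  installment={pay:.2f}  "
          f"installment/income={pay / income:.5f}  loan/income={10000 / income:.5f}")
```

Only the installment columns move with the rate; loan / income prints 0.16667 on every row, which is why it stays independent of LC's pricing.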


AnilG

  • Hero Member
  • Posts: 1090
Re: Lending Club loan default prediction model question
« Reply #4 on: February 11, 2019, 08:17:30 PM »
Installment to income represents capability to pay. What does loan amount to income represent?

---
Anil Gupta
PeerCube Thoughts blog https://www.peercube.com/blog
PeerCube https://www.peercube.com

Rob L

  • Hero Member
  • Posts: 2043
Re: Lending Club loan default prediction model question
« Reply #5 on: February 12, 2019, 10:15:42 AM »
My answer would simply be "something very important".
In a multivariate logistic regression, the statistical significance of loan amount / income is quite large, surpassed only by FICO. Using words of the English language to describe a relationship seems sensible enough, but our own personal biases attach a significance (or lack thereof) that may not be accurate. I'll go where the numbers take me, and I prefer to exclude all LC model results from being inputs to my own model, period. That permits my model to be a completely unbiased observer, so to speak, when evaluating the outputs of the LC model to determine which loans meet my own investment criteria and which do not.

For those who may not already know, I no longer invest in LC loans, and it's been a long time since I participated in any of this.
Judging from the cumulative ROIs I posted recently, I don't plan to resume. But then again, I never planned to resume anyway.

https://forum.lendacademy.com/index.php/topic,5076.0.html

EDIT: Changed "linear regression" to "logistic regression", which is what was used. Like I said, it's been quite a while.
« Last Edit: February 13, 2019, 08:51:29 AM by Rob L »

Roux

  • Newbie
  • Posts: 23
Re: Lending Club loan default prediction model question
« Reply #6 on: February 12, 2019, 08:07:21 PM »
Our data scientist, Guangming Lang, used machine learning to mine the LC historical data. He used a combination of R and XGBoost to train our Liquid P2P loan selection models. I believe these are one-click installs on AWS if you're inclined to tackle such a project.

https://liquidp2p.com/
https://www.linkedin.com/in/gmlang/
https://www.r-project.org/about.html
https://xgboost.readthedocs.io/en/latest/

larrydag

  • Newbie
  • Posts: 14
Re: Lending Club loan default prediction model question
« Reply #7 on: February 12, 2019, 08:55:24 PM »
Thanks for all of the replies. I should have shared a little about myself and my methods. I have experience building predictive credit models in financial institutions. My primary tool of choice for building predictive models is R. I'm very fond of the GLMNET package, and my methods resemble Frank Harrell's "Regression Modeling Strategies".

Roux

  • Newbie
  • Posts: 23
Re: Lending Club loan default prediction model question
« Reply #8 on: February 12, 2019, 09:18:05 PM »
Maybe you and Guangming should have a chat... lol. I'm a serial entrepreneur, not a data scientist. I knew what I wanted to build and assembled a team. He obviously was a critical team member. Guangming also authored a book on scoring consumer credit. I would be happy to show and discuss some of his work in detail if you want to PM me.



Rob L

  • Hero Member
  • Posts: 2043
Re: Lending Club loan default prediction model question
« Reply #9 on: February 13, 2019, 09:40:44 AM »
Very good, larrydag. Thanks for the tip on Frank Harrell's book.
Please share a bit more of your experience, if you will. It would be so interesting to see how things are now.
Claim "secret sauce" where appropriate.

1) Is LC offering enough loans that meet your criteria for you to stay fully invested? Would it be too much to ask for that $ amount?
2) Presumably you are using the API to access new loans at the four "feeding times". Is there still a race? Do you consider speed important?
3) What's the Term and Grade allocation of your portfolio 36(%A, %B, ...) and 60(%A, %B, ...) where %x is a percent of the total $ principal invested?

TIA
 

mikedev10

  • Newbie
  • Posts: 37
Re: Lending Club loan default prediction model question
« Reply #10 on: February 13, 2019, 07:51:09 PM »
There's literally nothing to pick from at those 4 times a day in the primary market. I made a very picky algo and it never buys anything. Of the 1000 loans issued today, I think the majority are bought as whole loans, then the remainder is bought by retail investors using a simple rules mix of grade and duration, and then a few loans dribble out to the API, like 20-30 a day.


AnilG

  • Hero Member
  • Posts: 1090
Re: Lending Club loan default prediction model question
« Reply #11 on: February 14, 2019, 09:47:15 PM »
Did you use an installment/income term in addition to loan amount/income and FICO in your multivariate logistic regression? Did you also use a separate monthly income term in your regression? If not, then your statement is ingenuous, as you didn't consider the relative importance of these terms with respect to each other. If you had considered the relative merits of these terms together in your regression, you would know that monthly income is a very important "borrower characteristics" datapoint and that any transformation containing monthly income will be weighted heavily in a regression. The first step of any regression analysis is to identify the important and influential attributes to include in the regression.

The English language explanation for the loan amount/income transformation is simple. This transformation represents whether a borrower with a given income can pay back the loan amount, irrespective of duration. The installment/income transformation represents whether a borrower with a given income can make regular installment payments over a given duration to pay back the loan amount. It is a "borrower indebtedness" datapoint and goes along with DTI.

When you are lending on the LC primary market, you are deciding whether to lend on LC's given terms of lending (interest rate, duration, installment). If you were deciding the terms of lending yourself (e.g., Prosper 1.0), then your strategy of not considering platform-recommended terms in assessing loan quality would be effective, and you would come up with your own acceptable terms at which to lend.

Sorry to see you discontinue lending, but I'm not surprised.

---
Anil Gupta
PeerCube Thoughts blog https://www.peercube.com/blog
PeerCube https://www.peercube.com

Rob L

  • Hero Member
  • Posts: 2043
Re: Lending Club loan default prediction model question
« Reply #12 on: February 15, 2019, 11:36:31 AM »
Installment / income wasn't used, for the reason I mentioned before; loan amount / income was. (Actually, installment / income was used in early models, but somewhere along the way Bryce noted the problem with using installment and replaced it with loan amount / income.) IIRC the change didn't have a major effect on the model results.

Yes, income and loan amount were also included separately.

The objective of the model was to produce results very much like Prosper 1.0, yielding an independent probability of default (i.e., "risk"). In addition to the risk model, a measure of "reward" was also computed (using the LC-assigned interest rate, etc.). Prosper 1.0 offered no comparative "reward" basis, which IMO is why it failed. Using risk and reward, it was simple enough to rank the set of loans LC offered from best to worst and purchase only the ones ranked best. Of course, this determination is all relative. If all the loans are lousy, then selecting the best ones will still be lousy, and vice versa.

Back in 2013 and 2014 there were lots of very good loans (as we now know from hindsight), and picking the relatively best ones was a winning bet. As time moved forward, the risk / reward ratio increased, and I had to choose whether to accept less reward for the risk or buy fewer loans. Unfortunately, I lowered my lending standards and accepted less reward for the risk. I had no idea it would get as bad as it did. Had I not lowered my lending standards, my guess is that I would have completely stopped purchasing LC D&E notes in 2015. My bad. Actually, the 16Q2 fiasco saved me from myself, as it caused me to stop purchasing loans, sell half of my loan portfolio and reassess. I did buy a few more higher-risk (D&E) loans in the fall, then switched to all B and stopped all purchases in Feb 2017. I was ready to leave LC for good.

All the credit for the model is Bryce Mason's, not mine, but I have a pretty good handle on how it worked, and my comments are based on that understanding. We collaborated on quite a number of things back then.
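The risk/reward ranking described above can be sketched roughly as follows. This is not Bryce's model: it assumes you already have an independent probability of default (PD) from your own model, takes "reward" from LC's assigned interest rate, and uses a deliberately crude one-period expected-return proxy, (1 - PD) * rate - PD * LGD, with a hypothetical loss-given-default of 100%. The loan IDs, PDs and rates are made up.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    loan_id: str
    pd: float    # probability of default from your own independent model
    rate: float  # LC-assigned annual interest rate (the "reward")


def expected_return(loan: Loan, lgd: float = 1.0) -> float:
    """Crude one-period proxy: earn the rate if no default, lose LGD of principal if default."""
    return (1 - loan.pd) * loan.rate - loan.pd * lgd


def rank_loans(loans, lgd=1.0, min_return=0.0):
    """Rank best to worst and keep only loans above a minimum expected return."""
    ranked = sorted(loans, key=lambda loan: expected_return(loan, lgd), reverse=True)
    return [loan for loan in ranked if expected_return(loan, lgd) > min_return]


offered = [
    Loan("A1", pd=0.05, rate=0.10),
    Loan("B2", pd=0.15, rate=0.18),
    Loan("C3", pd=0.08, rate=0.16),
]
for loan in rank_loans(offered):
    print(loan.loan_id, round(expected_return(loan), 4))
```

Because the ranking is relative, the `min_return` floor is what keeps "the best of a lousy batch" from being bought; raising it corresponds to holding lending standards rather than lowering them.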

I'm sorry to be leaving LC as it's been both an interesting and profitable hobby.
Guess when it became less profitable it also became less interesting.

larrydag

  • Newbie
  • Posts: 14
Re: Lending Club loan default prediction model question
« Reply #13 on: February 15, 2019, 08:55:34 PM »
My modeling method is a Cox proportional hazards multivariate survival model tuned with GLMNET. Nothing really special. I've never put a survival model into production and wanted to give it a go. I've worked in auto finance for the last 7 years and have done applied math and data analysis for most of my career. I've built credit scoring models for large lenders. It is actually quite fun, in my opinion.
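For readers new to this setup: a Cox proportional hazards model treats each loan as a time-to-event record (for example, months on book until default, with loans still current or paid off treated as censored, which is one modeling choice among several), and glmnet fits it by minimizing a penalized negative log partial likelihood. The Python sketch below only evaluates that kind of objective for a given coefficient vector; it is not glmnet's solver. It handles tied event times with the Breslow convention, and the 1/N scaling of the likelihood term is an assumption for illustration. The penalty uses the elastic-net form lambda * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2) that glmnet documents. The toy data are made up.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(beta: Sequence[float],
                                   X: Sequence[Sequence[float]],
                                   times: Sequence[float],
                                   events: Sequence[int]) -> float:
    """Negative Cox log partial likelihood; Breslow handling of tied event times."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:  # censored loans contribute only through the risk sets
            continue
        # Risk set: every loan still on book (no event or censoring yet) at time t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        nll -= eta[i] - math.log(risk_sum)
    return nll


def elastic_net_objective(beta, X, times, events, lam: float, alpha: float) -> float:
    """(1/N) * negative log partial likelihood + elastic-net penalty (scaling assumed)."""
    n = len(times)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    penalty = lam * (alpha * l1 + (1 - alpha) / 2 * l2)
    return cox_neg_log_partial_likelihood(beta, X, times, events) / n + penalty


# Toy data: months on book, 1 = defaulted, 0 = censored; one feature per loan.
times = [3, 7, 12, 20]
events = [1, 1, 0, 1]
X = [[0.9], [0.4], [0.1], [0.2]]
print(elastic_net_objective([0.0], X, times, events, lam=0.1, alpha=1.0))
print(elastic_net_objective([1.0], X, times, events, lam=0.1, alpha=1.0))
```

In practice the tuning step larrydag describes amounts to choosing lambda (and alpha) by cross-validation, with the solver handling the minimization.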

1) Staying fully invested: To be honest, I do it as a hobby. I've only invested a few thousand in the last couple of years. I'm doing it to keep my chops up, and it interests me.
2) API and speed: Yes, I'm using the API. I don't invest frequently enough to tell whether speed is important.
3) Term and grade allocation: 36: B 10%, C 33%, D 17%, E 14%; 60: B 2%, C 11%, D 4%, E 5%, F/G 2%
« Last Edit: February 15, 2019, 08:59:38 PM by larrydag »

AnilG

  • Hero Member
  • Posts: 1090
Re: Lending Club loan default prediction model question
« Reply #14 on: February 16, 2019, 01:15:37 AM »
So, you had no theoretical basis/reason for excluding "installment/income" from your model in favor of "loan amount/income". That's all I wanted to highlight, as a forum participant reached out to me offline for more clarification on the merit of using installment over loan amount. I typically don't get into back-and-forth on internet forums. Thanks for your time in explaining the reasoning.
 

---
Anil Gupta
PeerCube Thoughts blog https://www.peercube.com/blog
PeerCube https://www.peercube.com