Author Topic: feedback on a statistical model built in Excel  (Read 11725 times)

Kowser

  • Newbie
  • *
  • Posts: 18
    • View Profile
    • Email
feedback on a statistical model built in Excel
« on: April 01, 2014, 07:37:13 PM »
I am by no means a statistician, but 2 years ago when I first learned about LC and decided I had to get involved as a lender, I spent many hours learning how to work with Excel Pivot Tables because that was the closest tool I had available. I came up with the following formula, and will do my best to explain as easily as possible. I'd love some feedback from the mathematically inclined community.

Example Scenario

1000 total loans with the following dumbed down criteria:
Ownership = Home / Rent
Purpose = Debt Consolidation /  CC Refi

Our loans have an overall 5% total default rate (so 50 defaults). Default breakdown:

Our 1000 loans based on Ownership:
Renters = 30 defaults out of 500 loans = 6% default rate, therefore = 1.2x average of 5%
Owners = 20 defaults out of 500 loans = 4% default rate, therefore = 0.8x average of 5%

Our 1000 loans based on Purpose:
Debt Consolidation =  35 defaults out of 560 loans = 6.25% default rate, therefore = 1.25x average of 5%
CC Refi = 15 defaults out of 440 loans = 3.41% default rate, therefore = 0.682x average of 5%

So, can I conclude?

Two new loans: A Renter pursuing a CC Refi (vs) A home owner for debt consolidation, can be calculated as follows

Renter => 1.2 x 0.682 = 0.8184 x 5% = 4.092% default risk
Owner => 0.8 x 1.25 = 1 x 5% = 5% default risk

Is this math anywhere near the ballpark of sane statistical analysis?!

PS. Thank you to anyone who managed to read (and understand) all of that  ;D
« Last Edit: April 01, 2014, 07:41:36 PM by Kowser »

Thatguybil

  • Newbie
  • *
  • Posts: 39
    • View Profile
Re: feedback on a statistical model built in Excel
« Reply #1 on: April 01, 2014, 08:46:39 PM »
You are correct from a mathematical standpoint.
From a statistical standpoint you may or may not be "close".

Is the default rate follow normal bell shaped curve or is it non-normal?
You have a large sample size so you might assume a normal curve in which case your math would be ok.
On the other hand you can look at the default rate data and see that it does not look like a pretty bell shape.
That would argue that the data is non-normal and you need  to use non-parametric tools.

Also you are only analyzing two variables. Your math is assuming that the other variables either do not matter or are exactly the same between the to samples.  Dangerous assumption.

Also just because you calculated a difference between the two it does not mean that that diference is statically significant.  You would need to calculate a confidence interval.

In short what you have done "makes sence" and is mathematically correct.  It is not however statistically valid. That does not mean it is wrong!

If you want to make a more robust model I would suggest looking at doing a Hazard ratio (survival ratio). In the medical world we use it to predict survival after an event based on some intervention compared to another intervention.  This model made the most sense to me to use for predicting the likely hood of a loan "surviving" to maturity.

I am only failure with stats from the point of view of medical literature. Finance uses the same tools but applies it differently. Likely someone with a robust background in financial statistical models could suggest a better or more appropriate model to use.




Fred

  • Hero Member
  • *****
  • Posts: 1421
    • View Profile
Re: feedback on a statistical model built in Excel
« Reply #2 on: April 01, 2014, 10:31:20 PM »
Since you have the data, you can pivot your table in a 2-dimensional way like this:



The default probability for "A Renter pursuing a CC Refi" would be the D number, and
the default probability for "A home owner for debt consolidation" would be the A number, and so on.

As a rule of thumb, when you have the raw data, you should strive to get the most granular joint probabilities (i.e., A ... D in the above table).  Most of everything else can be calculated from these.

Kowser

  • Newbie
  • *
  • Posts: 18
    • View Profile
    • Email
Re: feedback on a statistical model built in Excel
« Reply #3 on: April 02, 2014, 02:02:33 AM »
I would love to get that granular, but when you have as many criteria as LC loans do, it's just not possible. Hence the math question, in it's most rudimentary form to be able to ask it. I have attached an image of only a fraction of the real spreadsheet just to show the magnitude of how big this gets.
« Last Edit: April 02, 2014, 02:04:29 AM by Kowser »

Jmar42

  • Newbie
  • *
  • Posts: 36
    • View Profile
    • Email
feedback on a statistical model built in Excel
« Reply #4 on: April 02, 2014, 03:06:52 AM »
Very pretty.

rawraw

  • Hero Member
  • *****
  • Posts: 2756
    • View Profile
Re: feedback on a statistical model built in Excel
« Reply #5 on: April 02, 2014, 10:35:06 AM »
Are the defaults dollar weighted or just by count?

edward

  • Full Member
  • ***
  • Posts: 195
    • View Profile
Re: feedback on a statistical model built in Excel
« Reply #6 on: April 02, 2014, 03:54:56 PM »
Kowser,
That's an amazing spreadsheet. Thank you so much for sharing! It confirms most of my filter choices, but gives me some new issues to look at as well.

Kowser

  • Newbie
  • *
  • Posts: 18
    • View Profile
    • Email
Re: feedback on a statistical model built in Excel
« Reply #7 on: April 02, 2014, 07:13:47 PM »
It is all weighted by count of loans. Since my investments are all the same ($50), the loan amounts makes no difference to me other than as a variable for default. Here is the full spreedsheet. It's old, but I have found the trends don't change much between iterations. I may update it again soon. Defaults were anything on that date that was a 16+ days late or worse. Current was all others (Grace, Current, Fully Paid). Keeps it simple and shows trends easily enough. I also did some other blending to help get more data points for low sample areas, but this should be fairly accurate.


States are after applicable cities are EXTRACTED. So NV isn't so bad when you take out Las Vegas/Henderson. Same for Florida.

I don't remember the average % default here. 5.6%?

*DER is derogatory

Hope this helps visualize the spectrum, thought biased through my methods it may be.

Also, I think this is only for loans from 2012 onwards? I don't remember.
« Last Edit: April 02, 2014, 07:23:33 PM by Kowser »

Jmar42

  • Newbie
  • *
  • Posts: 36
    • View Profile
    • Email
Re: feedback on a statistical model built in Excel
« Reply #8 on: April 02, 2014, 07:31:01 PM »
Is there similar data you can pull for the secondary market on folio?

Fred

  • Hero Member
  • *****
  • Posts: 1421
    • View Profile
Re: feedback on a statistical model built in Excel
« Reply #9 on: April 02, 2014, 10:14:15 PM »
I would love to get that granular, but when you have as many criteria as LC loans do, it's just not possible.

Seem like you can rotate your PURPOSE into columns, so you'll have 2-layer column headers.

Then use the Bayes' rule to "condition out" the variables that you are not interested in (i.e., the SUBGRADES).  The column headers will become 1-layer (PURPOSE).  THen you can use the OWN rows to get the joint probabilities of <OWN, PURPOSE> .

Alternative, you can simply replace the SUBGRADE with PURPOSE as the columns.

Kowser

  • Newbie
  • *
  • Posts: 18
    • View Profile
    • Email
Re: feedback on a statistical model built in Excel
« Reply #10 on: April 03, 2014, 12:23:32 AM »
@Fred. I think I may have miscommunicated my point. I'm not just trying to use purpose/ownership, I was simply making an example of something simple to explain my math, but in the grand scheme, the math has to work across everything, so I was curious if it would. When you multiply 18+ different multipliers and come out with 0.38 as the final number, I wanted to know if that was remotely useful in assessing that loan's risk relative to the others (38% as likely as average to default, etc...)

@Jmar42 - Not sure how that would look any different. This represents the same loans that are sold, so if you pull out their attributes and find the respective multipliers, you'd have the same thing analysis as a new loan. I do use the same table for Folio, but I also have the added factor of time. So the far right table, which is unlabeled, shows the % of principal remaining and the likely hood of loans at that principal level defaulting relative to average. So 60-month loans have lost about 75% of their default risk by the time they are 20% repaid (80% remaining).

As for actual results:
I have too much folio stuff to get a pure reading on the results of this table, but a friend of mine is holding a pure hold strategy 14.5% adjusted NAR after 1 year and 15.2% adjusted in another account. This was also when C & Ds were about the only thing available to buy. I prefer XIRR, but she wasn't as active in keeping her money going, so that wouldn't reflect accurately. Overall, pretty damn good results if I could get it set up with an API, but that is a little beyond my skill level for now.

Fred93

  • Hero Member
  • *****
  • Posts: 2102
    • View Profile
Re: feedback on a statistical model built in Excel
« Reply #11 on: April 03, 2014, 01:00:01 AM »
but a friend of mine is holding a pure hold strategy 14.5% adjusted NAR after 1 year and 15.2% adjusted in another account.

Remember that after 1 year, if you were buying only 36 month loans, about 3/4 of the defaults haven't happened yet.  If you were buying 60 month loans, then even more.  NAR doesn't attempt to include the effects of default on now-current loans, which is most of the loans that will default!   Remember that a loan can't default in the first 5 months, because there's a time lag before stop pay is recognized as a default, so if you think of defaults as "stop pay" events, after 12 months, you've had only 7 throws of the dice (out of 36 or 60 total so far).  Judging the return on investment counting only 7 out of 36 events is potentially quite misleading.

rawraw

  • Hero Member
  • *****
  • Posts: 2756
    • View Profile
Re: feedback on a statistical model built in Excel
« Reply #12 on: April 03, 2014, 10:34:44 AM »
@Fred. I think I may have miscommunicated my point. I'm not just trying to use purpose/ownership, I was simply making an example of something simple to explain my math, but in the grand scheme, the math has to work across everything, so I was curious if it would. When you multiply 18+ different multipliers and come out with 0.38 as the final number, I wanted to know if that was remotely useful in assessing that loan's risk relative to the others (38% as likely as average to default, etc...)

I would listen to Fred.  He is probably the best financial quant on the forum, although he has little competition lol

rawraw

  • Hero Member
  • *****
  • Posts: 2756
    • View Profile
Re: feedback on a statistical model built in Excel
« Reply #13 on: April 03, 2014, 10:35:52 AM »
It is all weighted by count of loans. Since my investments are all the same ($50), the loan amounts makes no difference to me other than as a variable for default.
This means that your data assumes that loan size has no influence on the outcome, correct?  I'm fairly certain I've seen analysis that suggests that isn't the case though, but may be mistaken.

Jmar42

  • Newbie
  • *
  • Posts: 36
    • View Profile
    • Email
Re: feedback on a statistical model built in Excel
« Reply #14 on: April 03, 2014, 10:43:43 AM »
So, folio has the same data that I can pull on the loans?