Author Topic: Old Data Download  (Read 6179 times)

qwertyfan

  • Newbie
  • *
  • Posts: 2
    • View Profile
    • Email
Old Data Download
« on: December 07, 2015, 07:44:07 PM »
Does anyone have a logged-in Lending Club data download with the old, augmented fields available? The files were named LoanStats3a_securev1.csv et al. The current files on Lending Club have significantly fewer fields available. The Internet Archive does not have the 'secure' version of the files either, only an insecure version without all the fields. The latest file I found is from April 2014, which leaves a big hole in my data and makes modeling harder. I would really appreciate anyone sharing their data, and would be glad to buy them the digital equivalent of a coffee :)

PhilGD

  • Full Member
  • ***
  • Posts: 150
    • View Profile
    • Email
Re: Old Data Download
« Reply #1 on: December 07, 2015, 08:08:34 PM »
The only files I am aware of with *all* the credit data available are the ones on the Internet Archive. They cover August 2012 - June 2014, which is all of LoanStats3b and January to June of LoanStats3c. Prior to August 2012, LC did not provide expanded credit data.

Those old files on the internet archive, as you mentioned, are the "insecure" files. The data that is missing from the "insecure" files is the borrower FICO score at time of origination and the 3-digit zip code. These data points can be pulled from the current files provided by LC, and merged with the "old" files.
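The merge described above could be sketched roughly as follows. This is only an illustration: it assumes both files share a loan identifier column named `id`, and that the FICO and zip fields are named `fico_range_low`, `fico_range_high` and `zip_code`. None of those names are confirmed in this thread, and the sample rows are made up.

```python
import csv
import io

def merge_fields(old_rows, current_rows, key="id",
                 fields=("fico_range_low", "fico_range_high", "zip_code")):
    """Copy `fields` from current_rows into old_rows, matching on `key`."""
    lookup = {row[key]: row for row in current_rows}
    merged = []
    for row in old_rows:
        out = dict(row)
        match = lookup.get(row[key])
        for field in fields:
            # Leave the field empty when the loan is absent from the current file.
            out[field] = match.get(field, "") if match else ""
        merged.append(out)
    return merged

# Tiny in-memory stand-ins for the two downloads (made-up values).
old_csv = "id,open_il_6m\n1,2\n2,0\n"
cur_csv = ("id,fico_range_low,fico_range_high,zip_code\n"
           "1,700,704,941xx\n"
           "3,680,684,100xx\n")

old_rows = list(csv.DictReader(io.StringIO(old_csv)))
cur_rows = list(csv.DictReader(io.StringIO(cur_csv)))
merged = merge_fields(old_rows, cur_rows)
```

Loans present in the old file but missing from the current one keep empty values for the copied fields, so they remain easy to spot afterwards.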

If anyone has data beyond this, I'd be more than happy to provide an additional digital equivalent of a coffee.
« Last Edit: December 07, 2015, 08:12:35 PM by PhilGD »

qwertyfan

  • Newbie
  • *
  • Posts: 2
    • View Profile
    • Email
Re: Old Data Download
« Reply #2 on: December 07, 2015, 08:45:59 PM »
Great, thank you for your help.

As you suggested, I looked on the Internet Archive and got all the data up to Sep 30th, 2014. The only fields missing are the ones you indicated, which are available in the new data. Unless someone downloaded data after that date but before LC got rid of the additional variables, I believe this is the best we'll do.

Rob L

  • Hero Member
  • *****
  • Posts: 2117
    • View Profile
Re: Old Data Download
« Reply #3 on: December 08, 2015, 02:05:46 AM »
I think I have the old files you are looking for.
Send me a PM and I'll send you a Dropbox link that will let you download them.

qwertuser

  • Newbie
  • *
  • Posts: 3
    • View Profile
Re: Old Data Download
« Reply #4 on: August 23, 2017, 01:15:56 PM »
I am looking for the old Lending Club files as well. I tried searching the Internet Archive but couldn't find them. It would be very helpful if someone could provide a link to the old files.
 
Thanks

panther02912

  • Newbie
  • *
  • Posts: 13
    • View Profile
Re: Old Data Download
« Reply #5 on: August 24, 2017, 11:52:42 AM »
This is regarding data that LC once provided, but no longer posts for fresh or recent loans, right? If so, then the missing data would help predict/model older loans, but it's not clear how one could use it to predict new loans.

Or are you, qwertuser et al, trying to figure out performance of older loans for Folio purchases?

PhilGD

  • Full Member
  • ***
  • Posts: 150
    • View Profile
    • Email
Re: Old Data Download
« Reply #6 on: August 24, 2017, 02:38:12 PM »
The question in the OP isn't relevant anymore because LC is back to providing the expanded data attributes for all loans issued from August 2012 - present in their historical data downloads. Therefore the granularity of the historical data is now on par (for the most part) with the granularity of the data on loans available for investment.

Rob L

  • Hero Member
  • *****
  • Posts: 2117
    • View Profile
Re: Old Data Download
« Reply #7 on: August 24, 2017, 02:59:47 PM »
There were quite a number of data fields in the LoanStats files prior to the LC IPO that were sanitized or removed.
It was assumed the changes were made to better protect the identity of the borrowers.
However, since this info isn't available any more it's hard to see what value it could have today (except possibly for Folio).

qwertuser

  • Newbie
  • *
  • Posts: 3
    • View Profile
Re: Old Data Download
« Reply #8 on: August 29, 2017, 01:42:55 PM »
I am very new to P2P lending and only started putting in a little bit of money last month. Based on what I've read here, the returns seem to have gone down in the last couple of months. However, I am trying to take an algorithmic/AI-based approach to figure out the best loans. A model is only as good as the data you build it on.

I am trying to find variables that can be good predictors. The usual ones like Purpose, Home Ownership, dti, etc. are fairly commonly used when deciding whether to buy a loan or not. However, there are more than 100 variables in that dataset and I am trying to see if there are other predictors I can find.

One other issue that I am facing is that I am unable to identify which columns in the files don't change once a loan is issued. There are columns like "open_acc" and "total_acc". Let's say a new loan was available for investment in 2012 where the "open_acc" was 5. The loan was paid back in 2015, but 3 accounts were opened in 2013. Will the Lending Club file show "open_acc" as 5 (original value) or 8 (updated value)? There are probably 50 such variables where this sort of update can take place. For example, "last_fico_high" is a very good predictor of defaults, but I am sure that it changes once the loan is issued. Thus I am trying to find out which variables are immutable once a loan is issued and which ones change thereafter. I am guessing the old files might help me figure this out. I have looked at the data dictionary on their website, but it has not been very helpful. I was wondering if anyone has suggestions to tackle this problem.

Also, some of the files up to 2014 are missing columns like "open_acc_6m", "open_il_6m", "open_il_12m", "open_il_24m", "mths_since_rcnt_il", etc. I am not sure whether the old data has these variables or not, and thus want to take a look at it.

Any suggestions are welcome.
Thanks

Fred93

  • Hero Member
  • *****
  • Posts: 2227
    • View Profile
Re: Old Data Download
« Reply #9 on: August 29, 2017, 02:05:27 PM »
Quote
One other issue that I am facing is that I am unable to identify which columns in the files don't change once a loan is issued. There are columns like "open_acc" and "total_acc". Let's say a new loan was available for investment in 2012 where the "open_acc" was 5. The loan was paid back in 2015, but 3 accounts were opened in 2013. Will the Lending Club file show "open_acc" as 5 (original value) or 8 (updated value)? There are probably 50 such variables where this sort of update can take place.

Almost none of the credit variables update. The "last fico" fields are not part of the initial set of fields, i.e. they are not available prior to loan issuance. After loan issuance, they update more or less monthly. The employment verification field updates during the loan application process, which may be before or after you buy a note, so that field is essentially useless for decision-making.

Quote
For example, "last_fico_high" is a very good predictor of defaults, but I am sure that it changes once the loan is issued.

Precisely.

Quote
Thus I am trying to find out which variables are immutable once a loan is issued and which ones change thereafter. I am guessing the old files might help me figure this out. I have looked at the data dictionary on their website, but it has not been very helpful. I was wondering if anyone has suggestions to tackle this problem.

All the credit variables are immutable, with the exceptions noted above.  The old files won't help you.  The data dictionary is poor.
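For anyone who wants to check this empirically, one option is to compare two downloads of the same LoanStats file taken some months apart and list the columns whose values differ for the same loan. The sketch below assumes a shared loan identifier column named `id` (an assumption, not something confirmed in this thread) and uses made-up rows. A column that never changes in your sample is not proven immutable, but a column that does change is clearly not.

```python
import csv
import io

def changed_columns(snapshot_a, snapshot_b, key="id"):
    """Return columns whose value differs for at least one loan found in both snapshots."""
    b_by_key = {row[key]: row for row in snapshot_b}
    changed = set()
    for row_a in snapshot_a:
        row_b = b_by_key.get(row_a[key])
        if row_b is None:
            continue  # loan is not present in both snapshots
        for col, val in row_a.items():
            if col in row_b and row_b[col] != val:
                changed.add(col)
    return changed

# Two hypothetical downloads of the same file, a few months apart.
early = "id,open_acc,last_fico_range_high\n1,5,704\n2,9,684\n"
late = "id,open_acc,last_fico_range_high\n1,5,659\n2,9,684\n"

rows_early = list(csv.DictReader(io.StringIO(early)))
rows_late = list(csv.DictReader(io.StringIO(late)))
result = changed_columns(rows_early, rows_late)
```

In this toy data only the "last fico" style column moves between snapshots, which is the pattern described above.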

Quote
Also, some of the files up to 2014 are missing columns like "open_acc_6m", "open_il_6m", "open_il_12m", "open_il_24m", "mths_since_rcnt_il", etc. I am not sure whether the old data has these variables or not, and thus want to take a look at it.

Over time, LC has several times changed which fields are provided.  Fields have gone away.  Fields have been added.  Loans whose applications were made at a time when a particular field was not in use don't have the field and will never have it.  This adds to the challenge of backtesting.
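To see which fields exist for which vintages before backtesting, a rough approach is to compute, per column, the fraction of non-empty values by issue year. The sketch below assumes an issue date column named `issue_d` with values like `Dec-2013`. Both the name and the format are assumptions here, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict

def coverage_by_year(rows, date_col="issue_d"):
    """Return {column: {year: fraction of rows with a non-empty value}}."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for row in rows:
        year = row[date_col].split("-")[-1]  # "Dec-2013" -> "2013"
        totals[year] += 1
        for col, val in row.items():
            if col != date_col and val is not None and val.strip() != "":
                filled[col][year] += 1
    columns = [c for c in rows[0] if c != date_col] if rows else []
    return {c: {y: filled[c][y] / totals[y] for y in totals} for c in columns}

# Made-up sample: open_il_6m only starts being populated in 2015.
sample = ("issue_d,open_acc,open_il_6m\n"
          "Dec-2013,5,\n"
          "Jan-2014,7,\n"
          "Dec-2015,4,1\n"
          "Dec-2015,6,2\n")

cov = coverage_by_year(list(csv.DictReader(io.StringIO(sample))))
```

A column with zero coverage in early years and full coverage later is a sign the field was introduced partway through, so a backtest using it can only cover the later vintages.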


qwertuser

  • Newbie
  • *
  • Posts: 3
    • View Profile
Re: Old Data Download
« Reply #10 on: August 29, 2017, 02:18:53 PM »
Thanks for replying. Is there a comprehensible list of immutable variables?

There are variables like mthsSinceLastDelinq, mthsSinceLastMajorDerog, mthsSinceLastRecord, mthsSinceRcntIl or totCollAmt, totCurBal, totHiCredLim, totalAcc or openIl12m, openIl24m, openIl6m, percentBcGt75 and many more. I don't know whether any of them are even useful, and I am not sure which ones to discard completely.


Fred93

  • Hero Member
  • *****
  • Posts: 2227
    • View Profile
Re: Old Data Download
« Reply #11 on: August 29, 2017, 02:55:17 PM »
Quote
Thanks for replying. Is there a comprehensible list of immutable variables?

There is no comprehensive list of anything.

Rob L

  • Hero Member
  • *****
  • Posts: 2117
    • View Profile
Re: Old Data Download
« Reply #12 on: August 30, 2017, 11:02:02 AM »
Start with only the data fields (CSV columns) that are available in the files containing new loans that are made available 4 times a day. For making buy/pass decisions nothing else matters. A couple of these fields change in real time as notes are being purchased by investors; for example FUNDED_AMOUNT and possibly INVESTOR_COUNT. I strongly believe all the rest are immutable. Many, but not all, of these fields are simply copied into the LoanStats file for historic record (and back testing of course) and never change once there. If this were not true then back testing with NSR or equivalent would have been fundamentally flawed from the start. I've never heard anyone propose this. One caution. An empty field may indicate data not available, or it may not. For example if the field MTHS_SINCE_RECENT_BC_DLQ is empty it could mean data not available, or more likely there has never been a BC_DLQ.
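A minimal sketch of that starting point, under some assumptions: the column lists below are invented for illustration, the name matching (lowercase, underscores removed) is only a heuristic that will miss abbreviated names, and the two funding-time fields are the ones named above. The last helper keeps an explicit "was empty" flag so an empty cell is not silently treated as zero.

```python
def normalize(name):
    """Lowercase and drop underscores so 'TOTAL_ACC', 'total_acc' and 'totalAcc' compare equal."""
    return name.replace("_", "").lower()

# Fields said to change while a loan is still funding (normalized form).
MUTABLE_DURING_FUNDING = {"fundedamount", "investorcount"}

def usable_columns(listing_columns, history_columns):
    """History columns that also appear in the new-loan listing, minus funding-time fields."""
    listing = {normalize(c) for c in listing_columns} - MUTABLE_DURING_FUNDING
    return [c for c in history_columns if normalize(c) in listing]

def with_missing_flag(row, col):
    """Return (value_or_None, is_empty) so an empty cell stays distinct from a real number."""
    raw = (row.get(col) or "").strip()
    return (float(raw) if raw else None, raw == "")

# Hypothetical column lists for a listing file and a LoanStats file.
listing_cols = ["FUNDED_AMOUNT", "INVESTOR_COUNT", "OPEN_ACC", "MTHS_SINCE_RECENT_BC_DLQ"]
history_cols = ["open_acc", "last_fico_range_high", "mths_since_recent_bc_dlq", "funded_amount"]

keep = usable_columns(listing_cols, history_cols)
value, is_empty = with_missing_flag({"mths_since_recent_bc_dlq": ""}, "mths_since_recent_bc_dlq")
```

Whether an empty MTHS_SINCE_RECENT_BC_DLQ means "never delinquent" or "data not available" still has to be decided per field; the flag only keeps the two cases from being merged by accident.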