While processing the loans and notes files distributed by LC, I've noted a few things that make one scratch the noggin'. Some of them are just nonsensical sloppiness, such as giving the same field, containing the same data, different names in different files, or providing the very same information in different files but with different formats or conventions. Then there are some things that just seem some kind of wrong.
Lending Club employees, feel free to defend yourselves right here on the forum!
Those of you who've been thru this will no doubt chuckle, 'cause you've all been thru this before me.
This is a mix of venting and a question or two. I'll just spew...
1. In the loans files, there are some loans that have the status of "Current" and yet have never made any payments! It was my understanding that the status "Issued" was for new loans that have not yet made payments, and "Current" was for loans that have made payments. That's true sometimes, however it clearly ain't that simple. I didn't count 'em, but there are over a dozen, and they are all recent.
Examples: ID = 12385147, or 12407908, or 12625678
In each case, total_pymnt = 0 and last_pymnt is null. These loans have made no payments.
2. In the loans files, the term of a loan is in a field named "term" and is a text string, either " 36 months" or " 60 months" (yes it begins with a space!), whereas in the notes.csv file, the same information is in a field named "LoanMaturity.Maturity", and here it is a simple numeric, ie "36" or "60". Different name. Different format. Same information.
3. In the loans files, the size of the loan is called "funded_amount". In the notes.csv file, the same information is called "AmountLent". (OK, of course in one case it's the total loan amount, and in the other case it's the note amount, but it's the same concept, and if I read both the loan files and my notes file in to do similar analysis on them, this number goes the same place in that calculation.)
4. Interest rate is really interesting. It should be identical, but it has a different name, different format, and different data. I'll use numbers from Loan ID 367384. In the loans files, the field is named "int_rate", and the data is "11.26%" (yes, the percent sign is in the data); however, in the notes.csv file the field is named "InterestRate" and the data is "0.112628". This is not only a different format, it is a different number. Seems to be that way for every loan. How can this be? Where did those extra digits come from? They're not shown to borrowers or lenders on the web site. Are they paying me that extra money?
5. In the loans file, there are empty fields in some loans. These are simply empty, ie they end up null fields when you read them into a database. In the notes.csv file, there are also some empty fields, but IN ADDITION, there are also fields that contain the text string "null"!
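For anyone reading both files into the same analysis, here is a minimal sketch of how the differences in #2, #4, and #5 might be normalized. The function names (clean_null, parse_term, parse_rate) are made up for illustration, it uses only the Python standard library, and it assumes the values look like the examples quoted above.

```python
from decimal import Decimal
from typing import Optional


def clean_null(value: Optional[str]) -> Optional[str]:
    """Treat None, empty strings, and the literal text 'null' as missing (#5)."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped == "" or stripped.lower() == "null":
        return None
    return stripped


def parse_term(value: Optional[str]) -> Optional[int]:
    """Accept ' 36 months' (loans files) or '36' (notes.csv) and return 36 (#2)."""
    cleaned = clean_null(value)
    if cleaned is None:
        return None
    # The first whitespace-separated token is the number of months.
    return int(cleaned.split()[0])


def parse_rate(value: Optional[str]) -> Optional[Decimal]:
    """Accept '11.26%' (loans files) or '0.112628' (notes.csv) as a fraction (#4)."""
    cleaned = clean_null(value)
    if cleaned is None:
        return None
    if cleaned.endswith("%"):
        # Percent string: drop the sign and scale down to a fraction.
        return Decimal(cleaned[:-1]) / Decimal("100")
    return Decimal(cleaned)


# Values quoted in the post for Loan ID 367384.
loans_rate = parse_rate("11.26%")
notes_rate = parse_rate("0.112628")
# Round the notes value to four decimal places of the fraction,
# i.e. two decimal places of a percent.
notes_rate_rounded = notes_rate.quantize(Decimal("0.0001"))
```

With the Loan ID 367384 numbers, notes_rate_rounded equals loans_rate, so at least in that one example the loans figure looks like a rounded copy of the notes figure. Whether that holds for every loan is something to check, not something this sketch proves.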
These are not the only differences. They're a sample.
Looking at these files one might imagine that they came from different companies, but in fact they came from the very same company!
If anyone has any insight on #1 or #4 I'd be interested to hear it.
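In case anyone wants to look for the #1 loans in their own copy of the loans file, here is a rough sketch of the check. The field names total_pymnt and last_pymnt are the ones mentioned above; the column names "loan_status" and "id" are assumptions on my part, so adjust them to whatever your file uses. The first two sample rows use IDs and values from #1, and the third row is made up purely to show a loan that does not get flagged.

```python
import csv
import io


def current_without_payments(rows, status_field="loan_status", id_field="id"):
    """Return the IDs of loans marked Current that show no payments at all."""
    flagged = []
    for row in rows:
        status = (row.get(status_field) or "").strip()
        paid = (row.get("total_pymnt") or "").strip()
        last = (row.get("last_pymnt") or "").strip()
        # An empty total is treated the same as zero here.
        no_payment = paid == "" or float(paid) == 0.0
        # The last payment date may be empty or the literal text 'null'.
        no_last = last == "" or last.lower() == "null"
        if status == "Current" and no_payment and no_last:
            flagged.append(row.get(id_field))
    return flagged


# Tiny in-memory stand-in for a loans file (column names are assumed).
SAMPLE = """id,loan_status,total_pymnt,last_pymnt
12385147,Current,0,
12407908,Current,0,
99999999,Current,350.12,2014-03-01
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
flagged = current_without_payments(rows)
print(flagged)
```

To run it against a real download, swap the io.StringIO(SAMPLE) part for an opened file handle.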