Lend Academy Network Forum

Lending Club Discussion => Investors - LC => Topic started by: Fred93 on April 23, 2014, 02:02:13 AM

Title: LC database sloppiness
Post by: Fred93 on April 23, 2014, 02:02:13 AM
While processing the loans and notes files distributed by LC, I've noted a few things that make one scratch the noggin'.  Some of them are just nonsensical sloppiness, such as giving the same field, containing the same data, different names in different files, or providing the very same information in different files but with different formats or conventions.  Then there are some things that are just some kind of wrong.

Lending Club employees, feel free to defend yourselves right here on the forum!

Those of you who've been thru this will no doubt chuckle, 'cause you've all been thru this before me.

This is a mix of venting and a question or two.  I'll just spew...

1. In the loans files, there are some loans that have the status of "Current" and yet have never made any payments!  It was my understanding that the status "Issued" was for new loans that have not yet made payments, and "Current" was for loans that have made payments.  That's true sometimes, however it clearly ain't that simple.  I didn't count 'em, but there are over a dozen, and they are all recent.

Examples: ID = 12385147, or 12407908, or 12625678

In each case, total_pymnt = 0 and last_pymnt is null.  These loans have made no payments.

2. In the loans files, the term of a loan is in a field named "term" and is a text string, either " 36 months" or " 60 months" (yes it begins with a space!), whereas in the notes.csv file, the same information is in a field named "LoanMaturity.Maturity", and here it is a simple numeric, ie "36" or "60".  Different name.  Different format.  Same information.

3. In the loans files, the size of the loan is called "funded_amount".  In the notes.csv file, the same information is called "AmountLent".  (OK, of course in one case it's the total loan amount, and in the other case it's the note amount, but it's the same concept, and if I read both the loan files and my notes file in to do similar analysis on them, this number goes in the same place in that calculation.)

4. Interest rate is really interesting.  It should be identical, but it has a different name, different format, and different data.  I'll use numbers from Loan ID 367384.  In the loans files, the field is named "int_rate", and the data is "11.26%"  (yes the percent sign is in the data), however, in the notes.csv file the field is named "InterestRate" and the data is "0.112628"  .  This is not only a different format, it is a different number.  Seems to be that way for every loan.  How can this be?  Where did those extra digits come from?  They're not shown to borrowers or lenders on the web site.  Are they paying me that extra money?

5. In the loans file, there are empty fields in some loans.  These are simply empty, ie they end up null fields when you read them into a database.  In the notes.csv file, there are also some empty fields, but IN ADDITION, there are also fields that contain the text string "null"!
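If you load both files into one analysis, a small normalization step smooths over points 2 through 5. This is a minimal Python sketch, not anything LC provides: the column names are the ones quoted above, while `NOTES_TO_LOANS` and the helper functions are hypothetical names for illustration. Renaming AmountLent only lines the columns up; a note amount is still not a loan amount.

```python
from decimal import Decimal

# Hypothetical map from notes.csv column names to the loans-file names quoted
# in this post. AmountLent is a note amount, funded_amount a loan amount.
NOTES_TO_LOANS = {
    "AmountLent": "funded_amount",
    "LoanMaturity.Maturity": "term",
    "InterestRate": "int_rate",
}


def clean(value):
    """Strip stray spaces; map empty cells and the literal text "null" to None."""
    if value is None:
        return None
    text = value.strip()
    return None if text == "" or text.lower() == "null" else text


def term_months(value):
    """' 36 months' (loans files) or '36' (notes.csv) -> 36."""
    text = value.strip()
    if text.endswith("months"):
        text = text[: -len("months")]
    return int(text.strip())


def rate_fraction(value):
    """'11.26%' (loans files) -> Decimal('0.1126'); '0.112628' (notes.csv) is parsed as-is."""
    text = value.strip()
    if text.endswith("%"):
        return Decimal(text[:-1]) / 100
    return Decimal(text)


def normalize_note(row):
    """Rename a notes.csv row to loans-file names, then parse term and rate."""
    out = {NOTES_TO_LOANS.get(key, key): clean(val) for key, val in row.items()}
    if out.get("term") is not None:
        out["term"] = term_months(out["term"])
    if out.get("int_rate") is not None:
        out["int_rate"] = rate_fraction(out["int_rate"])
    return out


# Point 4, loan 367384: same loan, two different numbers.
loan_rate = rate_fraction("11.26%")
note_rate = rate_fraction("0.112628")
print(loan_rate, note_rate - loan_rate)  # 0.1126 0.000028

# "NextPaymentDate" is an illustrative column name, not checked against the real file.
print(normalize_note({"LoanMaturity.Maturity": "36", "InterestRate": "0.112628",
                      "AmountLent": "25", "NextPaymentDate": "null"}))
```

Using `Decimal` rather than float keeps the comparison exact, so the 0.000028 gap in point 4 shows up as-is instead of being blurred by binary rounding.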

These are not the only differences.  They're a sample.

Looking at these files one might imagine that they came from different companies, but in fact they came from the very same company!

If anyone has any insight on #1 or #4 I'd be interested to hear it. 
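The loans described in #1 can be listed by filtering on the three fields mentioned there. A minimal sketch, assuming the column names quoted in the post (id, loan_status, total_pymnt, last_pymnt), which may not exactly match the header of your own download; the sample rows are invented. Rerunning it on a fresh download shows whether a flagged loan has simply caught up.

```python
import csv
import io


def current_without_payments(rows):
    """IDs of loans whose status is "Current" but which show no payment at all.

    Assumed column names: id, loan_status, total_pymnt, last_pymnt.
    """
    flagged = []
    for row in rows:
        paid = float(row.get("total_pymnt") or 0)
        last = (row.get("last_pymnt") or "").strip()
        if row.get("loan_status") == "Current" and paid == 0 and not last:
            flagged.append(row["id"])
    return flagged


# Made-up rows for illustration only.
sample = io.StringIO(
    "id,loan_status,total_pymnt,last_pymnt\n"
    "1001,Current,0,\n"
    "1002,Current,312.55,Apr-2014\n"
    "1003,Issued,0,\n"
)
print(current_without_payments(csv.DictReader(sample)))  # ['1001']
```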
Title: Re: LC database sloppiness
Post by: lascott on April 23, 2014, 08:21:06 AM
Quote from: Fred93
4. Interest rate is really interesting.  It should be identical, but it has a different name, different format, and different data.  I'll use numbers from Loan ID 367384.  In the loans files, the field is named "int_rate", and the data is "11.26%"  (yes the percent sign is in the data), however, in the notes.csv file the field is named "InterestRate" and the data is "0.112628"  .  This is not only a different format, it is a different number.  Seems to be that way for every loan.  How can this be?  Where did those extra digits come from?  They're not shown to borrowers or lenders on the web site.  Are they paying me that extra money?

Certainly they are keeping all numbers out to several digits. I'm sure there is some "banking" standard.

I noticed something similar on my first payment ever as I've been looking at my activity for just over a month.

On the Account Activity page, the credit/debit/my_balance columns have little dotted underlines. A mouseover gives a pop-up box that shows the "real" number out to 10 decimal places.

(https://forum.lendacademy.com/proxy.php?request=http%3A%2F%2Fi.imgur.com%2Fd00acBd.png&hash=1022e76fba1ce51beff2e4d463366709)
Title: Re: LC database sloppiness
Post by: core on April 23, 2014, 08:42:28 AM
Quote from: lascott
Certainly they are keeping all numbers out to several digits. I'm sure there is some "banking" standard.

That wasn't Fred93's point.  This is the stated interest rate, an exact number agreed to by both parties (ok 3 parties), not an accrual calculation like your example case.

Quote from: lascott
On the Account Activity page, the credit/debit/my_balance columns have little dotted underlines. A mouseover gives a pop-up box that shows the "real" number out to 10 decimal places.

Those numbers are not as "real" as you say.  Those numbers do not agree with any form of math known to man nor beast.  See this thread (http://www.lendacademy.com/forum/index.php?topic=595.0).  If you're going to calculate numbers out to 12 digits they should agree with something.  They should be able to be duplicated.  They don't and they can't be.  Something is very wrong there.
Title: Re: LC database sloppiness
Post by: core on April 23, 2014, 09:15:55 AM
Quote from: Fred93
Examples: ID = 12385147, or 12407908, or 12625678

In each case, total_pymnt = 0 and last_pymnt is null.  These loans have made no payments.

I checked these last night and the middle one did have a recent payment which posted less than 2 days ago.  As for why your file didn't reflect that, I suspect this is all just the usual crap about everything being out of sync.  The freshness and accuracy of the data you see depends on what place on the site you're looking at, where you are when you're looking at it, and what EdgeCast proxy you happen to get tossed to along the way.  I am getting SO SICK OF THIS.  If I reload the freaking page 5 times from 5 different IPs I shouldn't get 5 differing versions of the data.  FICO trend up, down, up, flat, down, all in seconds, ARGH!!!!!!!!

Quote from: Fred93
in the notes.csv file the field is named "InterestRate" and the data is "0.112628"  .  This is not only a different format, it is a different number.  Seems to be that way for every loan.  How can this be?  Where did those extra digits come from?  They're not shown to borrowers or lenders on the web site.  Are they paying me that extra money?

LC paying you that extra interest?  HAH, that's a good one there.

I believe this is the first time I've seen this mentioned.  Good find.  This may or may not explain some of the interest discrepancies that have always existed.  I'm sure that's part of it. 

As for why this number is slightly larger, I can only offer guesses:
1. Something to do with accounting days in a year, 360 rather than 365
2. As the earth's rotation slows down you need to increase the actual interest rate to make the same return.  Best to hide this from users.
3. Stephanie's vigorish
4. The usual LC incompetence; an error
5. Note "protection money".  You don't want your notes to go on fire, do you?  Things burn, you know.
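Guess 1 can at least be checked with arithmetic. This sketch assumes the simplest reading of it, scaling the stated 11.26% by 365/360 or by 360/365; a day-count convention could enter LC's calculation in some other way that this does not test.

```python
from decimal import Decimal

# Assumption: guess 1 means rescaling the stated rate by 365/360 (or 360/365).
stated = Decimal("0.1126")    # "11.26%" from the loans file, loan 367384
notes = Decimal("0.112628")   # InterestRate for the same loan in notes.csv

up = stated * 365 / 360       # daily rate from a 360-day year, accrued over 365 days
down = stated * 360 / 365     # the opposite direction

print(round(up, 6), round(down, 6), notes)  # 0.114164 0.111058 0.112628
```

Neither direction lands on 0.112628, so a plain 360/365 rescaling does not seem to account for the extra digits.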

I agree with most all of your post except for the bits about fields being named slightly different things.  I wouldn't necessarily call this sloppy especially if they are on totally separate areas of the site which have nothing to do with each other.  I know when I'm writing something I often forget if I called it amount_received, received_amount, or recv_amt.  As a result certain blocks of files tend to have their own names depending on what year they were written.   If they don't interact with each other there's no need for them to be called the exact same thing, unless you suffer from a certain affliction that Freud was fond of.

Also keep in mind most of LC's code appears to be written by janitors in various states of sobriety late at night.  I'd be glad to cut them some slack on field names as long as the stuff works.  Maybe it will some day.
Title: Re: LC database sloppiness
Post by: rawraw on April 23, 2014, 11:53:37 AM
Core, have you asked LC about these concerns?  I'm sure Stephanie is waiting for your email or call.
Title: Re: LC database sloppiness
Post by: core on April 23, 2014, 01:03:43 PM
Quote from: rawraw
I'm sure Stephanie is waiting for your email or call

Yeah I'll just bet she is.  As in hard up for a date maybe.  She should have returned my calls when she had the chance.

Not necessarily related, but... In Missouri where she's originally from, there are plenty of horses.  Not so many in San Francisco near the new LC job.  Hmmm.
Title: Re: LC database sloppiness
Post by: Fred93 on April 23, 2014, 01:50:19 PM
At LC's request, I sent them a screenshot showing that the notes file has more digits in the interest rate than the loans files.  I thought words explained this one pretty well, but I figured I wouldn't argue about a picture.

I think we should all ask for a data dictionary for the notes.csv file.  That way next time some guy sees that he got paid back more principal than the "AmountLent" he won't be so befuddled.
Title: Re: LC database sloppiness
Post by: cnmor54 on April 23, 2014, 02:30:57 PM
Quote from: Fred93
1. In the loans files, there are some loans that have the status of "Current" and yet have never made any payments! 

I believe (and what I believe is all that matters) that the ones that say current are in processing; the payment just hasn't been posted yet.
Title: Re: LC database sloppiness
Post by: rawraw on April 23, 2014, 03:33:31 PM
I wonder if LC has some sort of internal or external data validation process, like an internal audit of the systems.  It seems like in their line of work, they'd need it.  But I don't know how closely they are regulated since they aren't a bank.
Title: Re: LC database sloppiness
Post by: lascott on April 23, 2014, 06:35:57 PM
Quote from: lascott
On the Account Activity page, the credit/debit/my_balance columns have little dotted underlines. A mouseover gives a pop-up box that shows the "real" number out to 10 decimal places.

Quote from: core
Those numbers are not as "real" as you say.  Those numbers do not agree with any form of math known to man nor beast.  See this thread (http://www.lendacademy.com/forum/index.php?topic=595.0).  If you're going to calculate numbers out to 12 digits they should agree with something.  They should be able to be duplicated.  They don't and they can't be.  Something is very wrong there.

Not sure why you need to act so harsh to posters.

He was comparing "11.26%" to "0.112628", and I was simply pointing out that a general "display" number can be used in one area and a computational number in the underlying software.  I was pointing out it is the same number (obviously rounded) but used for different purposes.

In my first payment ever on LC, I was wondering how they charged me $0.01 for something less than $1 ($0.86) since the fee amount is 1% (100bp) as I understand it.  Then it was clear they were just making the "report" look friendlier.  And nicely the visual dotted underline clued me in that there was more behind it and "allayed my fears" so to speak.
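The fee example is easy to reproduce: 1% of $0.86 is $0.0086, which shows as $0.01 once rounded to the cent. A sketch using Python's `decimal` module; the half-up rounding to the cent is an assumption for illustration, not LC's documented rule.

```python
from decimal import Decimal, ROUND_HALF_UP

payment = Decimal("0.86")
fee_rate = Decimal("0.01")   # the 1% (100bp) service fee as described in the post

exact_fee = payment * fee_rate   # the full-precision figure behind the dotted underline
# Assumption: the report rounds half-up to whole cents.
shown_fee = exact_fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(exact_fee, shown_fee)  # 0.0086 0.01
```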
(https://forum.lendacademy.com/proxy.php?request=http%3A%2F%2Fi.imgur.com%2Fd00acBd.png&hash=1022e76fba1ce51beff2e4d463366709)

Thanks for the reminder about that thread. Certainly they are doing calcs daily, multiple-times-per-day, or "continuous".  Hard to believe they are not following some industry practice. Perhaps I'm naive tho as I'm a new "bank/banker".
Title: Re: LC database sloppiness
Post by: edward on April 23, 2014, 07:41:25 PM
Quote from: lascott
Not sure why you need to act so harsh to posters.

I agree with your concern, lascott. Some people on here, as knowledgeable as they are, just haven't mastered basic manners. Even if someone makes a mistake or doesn't understand something, we should be helping each other, or at least having a civil discourse while we exchange opinions under an inviting atmosphere. I'd bet there are some of this forum's followers who want to ask a question or make a comment, but once they read some of the postings on here, don't dare ask a question or try to contribute. I wish the forum was friendlier. I've learned so much on here, but some days I hate having to wade through all the muck to get there.
Title: Re: LC database sloppiness
Post by: core on April 23, 2014, 08:28:34 PM
Edward, perhaps you would be so kind as to define "muck" as it applies here.  Let me guess, your definition of "muck" is "that which does not interest Edward that day".  Do you seriously find it annoying to have to read 7 total posts per day rather than just the 4 informative ones?  This forum isn't exactly high volume.

As for being harsh to posters, I do not see what is harsh about pointing out that LC's numbers appear to be wrong.  If anything the harshness was directed at LC, not lascott.  I don't know what you guys are on about.

Is harsh anything like this, edward?
Quote from: edward
Yeah, read the homepage. Novel idea.

I'm just trying to understand the meaning of harsh here.  That jab of yours seems to fit nicely.  Is this a good example of those basic manners you want to see from everyone here?  A good example of the wholesome nurturing environment you seem to need?
Title: Re: LC database sloppiness
Post by: Bohb Daishi on April 24, 2014, 02:38:05 AM
Quote from: lascott
On the Account Activity page, the credit/debit/my_balance columns have little dotted underlines. A mouseover gives a pop-up box that shows the "real" number out to 10 decimal places.

Quote from: core
Those numbers are not as "real" as you say.  Those numbers do not agree with any form of math known to man nor beast.  See this thread (http://www.lendacademy.com/forum/index.php?topic=595.0).  If you're going to calculate numbers out to 12 digits they should agree with something.  They should be able to be duplicated.  They don't and they can't be.  Something is very wrong there.

Quote from: lascott
Not sure why you need to act so harsh to posters.

Quote from: edward
I agree with your concern, lascott. Some people on here, as knowledgeable as they are, just haven't mastered basic manners. Even if someone makes a mistake or doesn't understand something, we should be helping each other, or at least having a civil discourse while we exchange opinions under an inviting atmosphere. I'd bet there are some of this forum's followers who want to ask a question or make a comment, but once they read some of the postings on here, don't dare ask a question or try to contribute. I wish the forum was friendlier. I've learned so much on here, but some days I hate having to wade through all the muck to get there.

Welcome to the internet. But unlike your post, Core's reply didn't seem rude at all. He was stating a fact, not insulting someone's etiquette.
Title: Re: LC database sloppiness
Post by: Fred93 on April 25, 2014, 01:28:50 PM
I have a theory on the interest rates with six digits, such as "0.112628" in my notes.csv file.

I observed that these only occur on very old notes.  That has led me to suspect that this is a "historical oddity", which likely has little importance for the present or future.

I usually apply date cutoffs when I analyze my notes.csv file anyway, because my strategy has changed so much over time that I am usually uninterested in performance of old notes.
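One way to test the "only very old notes" theory is to flag every InterestRate in notes.csv that has more than four decimal places (anything finer than a two-decimal percentage such as 11.26%) and then look at the issue dates of the flagged notes. A minimal sketch; `has_extra_precision` is a hypothetical helper name.

```python
from decimal import Decimal


def has_extra_precision(rate_text):
    """True if a notes.csv rate has more digits than a two-decimal percentage allows.

    "0.1126" (11.26%) has 4 decimal places; "0.112628" has 6.
    normalize() drops trailing zeros, so "0.1100" counts as 2 places.
    """
    exponent = Decimal(rate_text).normalize().as_tuple().exponent
    return exponent < -4


rates = ["0.1126", "0.112628", "0.0788"]
print([r for r in rates if has_extra_precision(r)])  # ['0.112628']
```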

Title: Re: LC database sloppiness
Post by: yojoakak on April 25, 2014, 07:22:12 PM
Quote from: Fred93
I have a theory on the interest rates with six digits, such as "0.112628" in my notes.csv file.

For computers of the vintage of LendingClub's servers, you could see all kinds of strange floating-point errors.

For example, 1 / 10 added 10 times would not equal 1!!!!

(https://forum.lendacademy.com/proxy.php?request=http%3A%2F%2Fi60.tinypic.com%2F554gbb.jpg&hash=5e591fc21183688568fbcdf5d6ca00b4)

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html (David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic")
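The 1/10 example is easy to reproduce in Python, and it happens on any IEEE 754 binary floating-point hardware today, not just older servers. The sketch also shows two standard-library workarounds, `math.fsum` and the `decimal` module. Errors of this kind sit around the 16th or 17th significant digit, though, so on their own they seem unlikely to explain a gap as large as 0.000028 (0.112628 vs 0.1126).

```python
import math
from decimal import Decimal

tenth = 1 / 10          # binary float: not exactly 0.1
total = 0.0
for _ in range(10):
    total += tenth      # each addition rounds to the nearest double

print(total, total == 1.0)         # 0.9999999999999999 False
print(math.fsum([tenth] * 10))     # 1.0 (correctly rounded sum)
print(sum([Decimal("0.1")] * 10))  # 1.0 (exact decimal arithmetic)
```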
Title: Re: LC database sloppiness
Post by: Fred on April 26, 2014, 06:26:10 PM
Quote from: Fred93
Some of them are just nonsensical sloppiness, such as giving the same field, containing the same data, different names in different files, or providing the very same information in different files but with different formats or conventions.  Then there are some things that are just some kind of wrong.

I am not defending LC in any way on this; however, even Bloomberg often produces incomplete, inconsistent, or even incorrect data.  This is from a company that charges a ton of money for its data!

I think the reasons are the same -- multiple teams working independently from each other.  There are different backend databases and different teams that update them.  I wouldn't be surprised if there are nightly batch jobs that simply copy data from one database to the others.

For display/downloadable data, there are yet different teams (depending on the pages or links) that handle precision, formats, labels, etc.
Title: Re: LC database sloppiness
Post by: Thatguybil on April 27, 2014, 09:31:08 PM
Quote from: Fred93
Some of them are just nonsensical sloppiness, such as giving the same field, containing the same data, different names in different files, or providing the very same information in different files but with different formats or conventions.  Then there are some things that are just some kind of wrong.

Quote from: Fred
I am not defending LC in any way on this; however, even Bloomberg often produces incomplete, inconsistent, or even incorrect data.  This is from a company that charges a ton of money for its data!

I think the reasons are the same -- multiple teams working independently from each other.  There are different backend databases and different teams that update them.  I wouldn't be surprised if there are nightly batch jobs that simply copy data from one database to the others.

For display/downloadable data, there are yet different teams (depending on the pages or links) that handle precision, formats, labels, etc.

Heck, I am amazed at the data integrity errors that can creep into databases when going from a real-time database to a historical index database.

The back end guys never like it when you ask why you get different results when you run the same query on databases that are intended to be synced up.
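When two extracts are supposed to be in sync (a live table and its historical copy, or two downloads taken a day apart), a keyed diff shows which records are missing from either side and which fields disagree. A minimal sketch, assuming both extracts are lists of dicts sharing a unique "id" column; `diff_extracts` is a hypothetical helper, not part of any LC tool, and the sample rows are invented.

```python
def diff_extracts(a, b, key="id"):
    """Compare two lists of row dicts that should hold the same records.

    Returns (only_in_a, only_in_b, changed), where changed maps each shared key
    to {field: (value_in_a, value_in_b)} for the fields whose values differ.
    Assumes `key` is unique within each extract.
    """
    index_a = {row[key]: row for row in a}
    index_b = {row[key]: row for row in b}
    only_a = sorted(index_a.keys() - index_b.keys())
    only_b = sorted(index_b.keys() - index_a.keys())
    changed = {}
    for k in sorted(index_a.keys() & index_b.keys()):
        ra, rb = index_a[k], index_b[k]
        fields = {f: (ra.get(f), rb.get(f)) for f in ra.keys() | rb.keys()
                  if ra.get(f) != rb.get(f)}
        if fields:
            changed[k] = fields
    return only_a, only_b, changed


live = [{"id": "1", "status": "Current"}, {"id": "2", "status": "Issued"}]
history = [{"id": "1", "status": "Late"}, {"id": "3", "status": "Current"}]
print(diff_extracts(live, history))
# (['2'], ['3'], {'1': {'status': ('Current', 'Late')}})
```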
Title: Re: LC database sloppiness
Post by: P2PFact on May 07, 2014, 10:41:27 PM
I also see 20k records in LC data that are almost empty. You can see loan id, member id, loan amnt, address. But that's about it. No loan_status, no term, etc.

I downloaded the data on 4/18. Maybe just bad luck? Does anyone have a similar issue?
Title: Re: LC database sloppiness
Post by: Fred93 on May 07, 2014, 10:46:06 PM
Quote from: P2PFact
I also see 20k records in LC data that are almost empty. You can see loan id, member id, loan amnt, address. But that's about it. No loan_status, no term, etc.

I downloaded the data on 4/18. Maybe just bad luck? Does anyone have a similar issue?

I suspect you're referring to the "policy code = 2" loans.  Check out the policy code field.  I believe this is intentional.  These are, I believe, a class of loans which they are not yet offering to the public.  If you google you can find some blogs which mention them.  Just another detail not explained in the official documentation.
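To confirm that the near-empty records are the policy_code 2 loans, count the rows with an empty loan_status, keyed by policy code. A sketch, assuming the loans-file columns are named "loan_status" and "policy_code"; the helper name and sample rows are hypothetical.

```python
from collections import Counter


def blank_status_by_policy(rows):
    """Count rows with an empty or missing loan_status, keyed by policy_code.

    Assumed column names: loan_status, policy_code.
    """
    counts = Counter()
    for row in rows:
        if not (row.get("loan_status") or "").strip():
            counts[(row.get("policy_code") or "").strip()] += 1
    return counts


rows = [
    {"loan_status": "", "policy_code": "2"},
    {"loan_status": "Current", "policy_code": "1"},
    {"policy_code": "2"},
]
print(blank_status_by_policy(rows))
```

If every blank-status row lands under "2", the empty fields are the policy code 2 loans rather than a bad download.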
Title: Re: LC database sloppiness
Post by: P2PFact on May 07, 2014, 11:29:20 PM
Quote from: Fred93
I suspect you're referring to the "policy code = 2" loans.  Check out the policy code field.  I believe this is intentional.  These are, I believe, a class of loans which they are not yet offering to the public.  If you google you can find some blogs which mention them.  Just another detail not explained in the official documentation.

Yeah you are exactly right. All are policy 2 loans. Here are the stats:
State  Loans  Total amount  Avg amount
AK        41      $442,475     $10,792
AL       280    $2,528,325      $9,030
AR       138    $1,124,675      $8,150
AZ       512    $4,332,925      $8,463
CA      3335   $29,471,150      $8,837
CO       405    $3,893,325      $9,613
CT       280    $2,385,575      $8,520
DC        45      $351,675      $7,815
DE        52      $414,300      $7,967
FL      1376   $11,434,600      $8,310
GA       619    $5,776,200      $9,332
HI       141    $1,252,200      $8,881
ID         1        $6,000      $6,000
IL       651    $5,644,900      $8,671
IN       352    $2,807,550      $7,976
KS       163    $1,391,575      $8,537
KY       172    $1,506,600      $8,759
LA       175    $1,585,225      $9,058
MA       335    $3,808,525     $11,369
MD       495    $4,355,875      $8,800
MI       565    $4,669,925      $8,265
MN       365    $2,999,700      $8,218
MO       273    $2,407,625      $8,819
MT        53      $452,275      $8,533
NC       500    $4,505,325      $9,011
NH        89      $890,050     $10,001
NJ       711    $6,507,675      $9,153
NM       116    $1,160,650     $10,006
NV       378    $3,297,850      $8,724
NY      1634   $14,618,200      $8,946
OH       629    $5,355,275      $8,514
OK       146    $1,318,100      $9,028
OR       252    $2,135,175      $8,473
PA       578    $4,898,425      $8,475
RI       106      $898,175      $8,473
SC       184    $1,638,575      $8,905
SD        39      $316,975      $8,128
TN       281    $2,389,775      $8,505
TX      1108   $10,533,925      $9,507
UT       136    $1,254,525      $9,224
VA       555    $5,057,925      $9,113
VT        27      $198,925      $7,368
WA       437    $3,854,400      $8,820
WI       223    $1,938,350      $8,692
WV        71      $583,000      $8,211
WY        37      $320,800      $8,670
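Per-state figures like the ones above come from a simple group-by. This sketch assumes loans-file columns named "policy_code", "addr_state" and "loan_amnt", with the policy code stored as the text "2"; it is an illustration, not P2PFact's actual script, and the sample rows are invented.

```python
from collections import defaultdict


def state_stats(rows):
    """Per-state (count, total, average) of loan_amnt for policy_code 2 loans.

    Assumed column names: policy_code, addr_state, loan_amnt.
    """
    totals = defaultdict(lambda: [0, 0.0])   # state -> [count, total amount]
    for row in rows:
        if str(row["policy_code"]).strip() != "2":
            continue                          # skip ordinary (policy 1) loans
        entry = totals[row["addr_state"]]
        entry[0] += 1
        entry[1] += float(row["loan_amnt"])
    return {state: (n, total, total / n) for state, (n, total) in sorted(totals.items())}


rows = [
    {"policy_code": "2", "addr_state": "ID", "loan_amnt": "6000"},
    {"policy_code": "1", "addr_state": "ID", "loan_amnt": "9000"},
    {"policy_code": "2", "addr_state": "AK", "loan_amnt": "10000"},
    {"policy_code": "2", "addr_state": "AK", "loan_amnt": "12000"},
]
print(state_stats(rows))  # {'AK': (2, 22000.0, 11000.0), 'ID': (1, 6000.0, 6000.0)}
```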
Title: Re: LC database sloppiness
Post by: Fred93 on May 08, 2014, 02:09:24 AM
Some background on policy_code=2 loans can be found in this writeup...

http://www.lendacademy.com/policy-code-2-loans-lending-club/