Topic: LC database sloppiness

Fred93

  • Hero Member
  • Posts: 2166
LC database sloppiness
« on: April 23, 2014, 02:02:13 AM »
While processing the loans and notes files distributed by LC, I've noted a few things that make one scratch the noggin'.  Some of them are just nonsensical sloppiness, such as giving the same field, containing the same data, different names in different files, or providing the very same information in different files but with different formats or conventions.  Then there are some things that are just some kind of wrong.

Lending Club employees, feel free to defend yourself right here on the forum!

Those of you who've been thru this will no doubt chuckle, 'cause you've all been here before me.

This is a mix of venting and a question or two.  I'll just spew...

1. In the loans files, there are some loans that have the status of "Current" and yet have never made any payments!  It was my understanding that the status "Issued" was for new loans that have not yet made payments, and "Current" was for loans that have made payments.  That's true sometimes; however, it clearly ain't that simple.  I didn't count 'em, but there are over a dozen, and they are all recent.

Examples: ID = 12385147, or 12407908, or 12625678

In each case, total_pymnt = 0 and last_pymnt is null.  These loans have made no payments.
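For anyone who wants to pull these out of their own copy, here's a minimal sketch, assuming the loans file is read as CSV with a header row. The `total_pymnt` and `last_pymnt` names are the ones given above; the `id` and `loan_status` column names, the function name, and the sample rows are hypothetical placeholders, not LC's documented schema.

```python
import csv
import io

def current_without_payments(rows):
    """Return IDs of loans marked "Current" that show no payments at all.

    Hypothetical column names: "id" and "loan_status". The payment fields
    "total_pymnt" and "last_pymnt" are the names used in the post.
    """
    flagged = []
    for row in rows:
        is_current = row["loan_status"].strip() == "Current"
        nothing_paid = float(row["total_pymnt"] or 0) == 0.0
        no_last_payment = not (row["last_pymnt"] or "").strip()
        if is_current and nothing_paid and no_last_payment:
            flagged.append(row["id"])
    return flagged

# Made-up sample rows, only to show the shape of the check.
SAMPLE = """id,loan_status,total_pymnt,last_pymnt
1001,Current,0,
1002,Current,512.34,Mar-2014
1003,Issued,0,
"""

print(current_without_payments(csv.DictReader(io.StringIO(SAMPLE))))  # ['1001']
```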

2. In the loans files, the term of a loan is in a field named "term" and is a text string, either " 36 months" or " 60 months" (yes, it begins with a space!), whereas in the notes.csv file, the same information is in a field named "LoanMaturity.Maturity", and here it is a simple numeric, i.e. "36" or "60".  Different name.  Different format.  Same information.
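If you want both files on the same footing, a small normalizer does it. This is a sketch assuming the only two forms are the ones described above (" 36 months" and "36"); `term_months` is a hypothetical helper name.

```python
def term_months(value):
    """Normalize a loan term to an integer number of months.

    Accepts the loans-file form (" 36 months", leading space included)
    and the notes.csv form ("36").
    """
    text = str(value).strip()              # drop the stray leading space
    if text.endswith("months"):
        text = text[: -len("months")].strip()
    return int(text)

print(term_months(" 36 months"), term_months("60"))  # 36 60
```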

3. In the loans files, the size of the loan is called "funded_amount".  In the notes.csv file, the same information is called "AmountLent".  (OK, of course in one case it's the total loan amount, and in the other case it's the note amount, but it's the same concept, and if I read both the loan files and my notes file in to do similar analysis on them, this number goes in the same place in that calculation.)
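One way to line the two files up is a rename map. The sketch below covers only the name pairs from items 2 to 4 of this post; `LOANS_TO_NOTES` and `rename_loan_fields` are hypothetical names, and renaming only aligns the keys, so the values still need their formats converted.

```python
# Hypothetical mapping, covering only the name pairs mentioned in this post.
# Caveat from the post: funded_amount is the whole loan, AmountLent is your note.
LOANS_TO_NOTES = {
    "term": "LoanMaturity.Maturity",
    "funded_amount": "AmountLent",
    "int_rate": "InterestRate",
}

def rename_loan_fields(row):
    """Return a copy of a loans-file row keyed by the notes.csv names."""
    return {LOANS_TO_NOTES.get(key, key): value for key, value in row.items()}

print(rename_loan_fields({"id": "1", "funded_amount": "10000"}))
# {'id': '1', 'AmountLent': '10000'}
```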

4. Interest rate is really interesting.  It should be identical, but it has a different name, different format, and different data.  I'll use numbers from Loan ID 367384.  In the loans files, the field is named "int_rate", and the data is "11.26%" (yes, the percent sign is in the data); however, in the notes.csv file the field is named "InterestRate" and the data is "0.112628".  This is not only a different format, it is a different number.  Seems to be that way for every loan.  How can this be?  Where did those extra digits come from?  They're not shown to borrowers or lenders on the web site.  Are they paying me that extra money?
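To put the two rates side by side, here's a sketch that parses both formats with Python's `decimal` module (chosen so binary float rounding doesn't muddy the comparison). The function names are hypothetical. The last line only shows that 0.112628 rounds to the displayed 11.26%; it doesn't explain where the extra digits come from.

```python
from decimal import Decimal

def rate_from_loans(text):
    """Parse the loans-file form, e.g. "11.26%", into a fraction."""
    return Decimal(text.strip().rstrip("%")) / 100

def rate_from_notes(text):
    """Parse the notes.csv form, e.g. "0.112628", which is already a fraction."""
    return Decimal(text.strip())

loans_rate = rate_from_loans("11.26%")
notes_rate = rate_from_notes("0.112628")
print(loans_rate, notes_rate, notes_rate - loans_rate)       # 0.1126 0.112628 0.000028
print(notes_rate.quantize(Decimal("0.0001")) == loans_rate)  # True
```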

5. In the loans file, there are empty fields in some loans.  These are simply empty, i.e. they end up as null fields when you read them into a database.  In the notes.csv file, there are also some empty fields, but IN ADDITION, there are also fields that contain the text string "null"!
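A minimal sketch of treating both kinds of "nothing" the same way before loading into a database, assuming the file is read with Python's `csv.DictReader`. Only the literal "null" text comes from the post; `clean_row`, the column names, and the sample rows are hypothetical.

```python
import csv
import io

# Values to treat as missing: truly empty fields, plus the literal text "null"
# that the post reports seeing in notes.csv.
MISSING = {"", "null"}

def clean_row(row):
    """Return a copy of a CSV row with empty and "null" fields mapped to None."""
    return {key: None if value is None or value.strip() in MISSING else value
            for key, value in row.items()}

# Made-up sample; "LoanId" and "NextPaymentDate" are hypothetical column names.
SAMPLE = "LoanId,AmountLent,NextPaymentDate\n1,25,null\n2,50,\n"
rows = [clean_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(rows)
```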

These are not the only differences.  They're a sample.

Looking at these files one might imagine that they came from different companies, but in fact they came from the very same company!

If anyone has any insight on #1 or #4 I'd be interested to hear it. 

lascott

  • Hero Member
  • Posts: 1423
  • Appreciate my post and want to try LendingRobot? URL below
Re: LC database sloppiness
« Reply #1 on: April 23, 2014, 08:21:06 AM »
Quote
4. Interest rate is really interesting.  It should be identical, but it has a different name, different format, and different data.  I'll use numbers from Loan ID 367384.  In the loans files, the field is named "int_rate", and the data is "11.26%" (yes, the percent sign is in the data); however, in the notes.csv file the field is named "InterestRate" and the data is "0.112628".  This is not only a different format, it is a different number.  Seems to be that way for every loan.  How can this be?  Where did those extra digits come from?  They're not shown to borrowers or lenders on the web site.  Are they paying me that extra money?
Certainly they are keeping all numbers out to several digits. I'm sure there is some "banking" standard.

I noticed something similar with my very first payment; I've only been looking at my account activity for just over a month.

On the Account Activity the credit/debit/my_balance columns have little dotted underscores. A mouse over gives a pop up box that shows the "real" number out to 10 decimal points. 

Tools I use: (main) BlueVestment: https://www.bluevestment.com/app/pricing + https://www.interestradar.com/ , (others) Lending Robot referral link: https://www.lendingrobot.com/ref/scott473/  & Peercube referral code: DFVA9Y

core

  • Hero Member
  • Posts: 1784
  • Your loss is my gain
Re: LC database sloppiness
« Reply #2 on: April 23, 2014, 08:42:28 AM »
Quote
Certainly they are keeping all numbers out to several digits. I'm sure there is some "banking" standard.

That wasn't Fred93's point.  This is the stated interest rate, an exact number agreed to by both parties (ok 3 parties), not an accrual calculation like your example case.

Quote
On the Account Activity the credit/debit/my_balance columns have little dotted underscores. A mouse over gives a pop up box that shows the "real" number out to 10 decimal points. 

Those numbers are not as "real" as you say.  Those numbers do not agree with any form of math known to man nor beast.  See this thread.  If you're going to calculate numbers out to 12 digits they should agree with something.  They should be able to be duplicated.  They don't and they can't be.  Something is very wrong there.

core

  • Hero Member
  • Posts: 1784
  • Your loss is my gain
Re: LC database sloppiness
« Reply #3 on: April 23, 2014, 09:15:55 AM »
Quote
Examples: ID = 12385147, or 12407908, or 12625678

In each case, total_pymnt = 0 and last_pymnt is null.  These loans have made no payments.

I checked these last night and the middle one did have a recent payment which posted less than 2 days ago.  As for why your file didn't reflect that, I suspect this is all just the usual crap about everything being out of sync.  The freshness and accuracy of the data you see depends on what place on the site you're looking at, where you are when you're looking at it, and what EdgeCast proxy you happen to get tossed to along the way.  I am getting SO SICK OF THIS.  If I reload the freaking page 5 times from 5 different IPs I shouldn't get 5 differing versions of the data.  FICO trend up, down, up, flat, down, all in seconds, ARGH!!!!!!!!

Quote
in the notes.csv file the field is named "InterestRate" and the data is "0.112628".  This is not only a different format, it is a different number.  Seems to be that way for every loan.  How can this be?  Where did those extra digits come from?  They're not shown to borrowers or lenders on the web site.  Are they paying me that extra money?

LC paying you that extra interest?  HAH, that's a good one there.

I believe this is the first time I've seen this mentioned.  Good find.  This may or may not explain some of the interest discrepancies that have always existed.  I'm sure that's part of it. 

As for why this number is slightly larger, I can only offer guesses:
1. Something to do with accounting days in a year, 360 rather than 365
2. As the earth's rotation slows down you need to increase the actual interest rate to make the same return.  Best to hide this from users.
3. Stephanie's vigorish
4. The usual LC incompetence; an error
5. Note "protection money".  You don't want your notes to go on fire, do you?  Things burn, you know.
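Guess #1 is the only one on that list you can poke at with arithmetic. A quick sketch, assuming a day-count mix-up would show up as a simple 365/360 (or 360/365) rescaling of the stated 11.26%; that's an assumption, since LC doesn't say how any such conversion would be done.

```python
from decimal import Decimal

stated = Decimal("0.1126")    # int_rate "11.26%" from the loans file
notes = Decimal("0.112628")   # InterestRate from notes.csv

# Assumption: a day-count mix-up would show up as a simple rescaling.
candidates = {
    "365/360": stated * 365 / 360,
    "360/365": stated * 360 / 365,
}

for label, value in candidates.items():
    print(label, round(value, 6), "off by", round(value - notes, 6))
```

Both rescalings land roughly 0.0015 away from 0.112628, much farther off than the 0.000028 gap, so a plain 360-vs-365 rescaling doesn't reproduce that number by itself.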

I agree with almost all of your post except for the bits about fields being named slightly different things.  I wouldn't necessarily call this sloppy, especially if they are in totally separate areas of the site which have nothing to do with each other.  I know when I'm writing something I often forget whether I called it amount_received, received_amount, or recv_amt.  As a result, certain blocks of files tend to have their own names depending on what year they were written.  If they don't interact with each other there's no need for them to be called the exact same thing, unless you suffer from a certain affliction that Freud was fond of.

Also keep in mind most of LC's code appears to be written by janitors in various states of sobriety late at night.  I'd be glad to cut them some slack on field names as long as the stuff works.  Maybe it will some day.

rawraw

  • Hero Member
  • Posts: 2768
Re: LC database sloppiness
« Reply #4 on: April 23, 2014, 11:53:37 AM »
Core, have you asked LC about these concerns?  I'm sure Stephanie is waiting for your email or call.

core

  • Hero Member
  • Posts: 1784
  • Your loss is my gain
Re: LC database sloppiness
« Reply #5 on: April 23, 2014, 01:03:43 PM »
Quote
I'm sure Stephanie is waiting for your email or call

Yeah I'll just bet she is.  As in hard up for a date maybe.  She should have returned my calls when she had the chance.

Not necessarily related, but... In Missouri where she's originally from, there are plenty of horses.  Not so many in San Francisco near the new LC job.  Hmmm.

Fred93

  • Hero Member
  • Posts: 2166
Re: LC database sloppiness
« Reply #6 on: April 23, 2014, 01:50:19 PM »
At LC's request, I sent them a screenshot showing that the notes file has more digits in the interest rate than the loans files.  I thought words explained this one pretty well, but I figured I wouldn't argue about a picture.

I think we should all ask for a data dictionary for the notes.csv file.  That way next time some guy sees that he got paid back more principal than the "AmountLent" he won't be so befuddled.

cnmor54

  • Full Member
  • Posts: 153
Re: LC database sloppiness
« Reply #7 on: April 23, 2014, 02:30:57 PM »
Quote
1. In the loans files, there are some loans that have the status of "Current" and yet have never made any payments! 

I believe (and what I believe is all that matters) that the ones that say current are in processing; the payment just hasn't been posted yet.

rawraw

  • Hero Member
  • Posts: 2768
Re: LC database sloppiness
« Reply #8 on: April 23, 2014, 03:33:31 PM »
I wonder if LC has some sort of internal or external data validation process, like an internal audit of the systems.  It seems like in their line of work, they'd need it.  But I don't know how closely they are regulated since they aren't a bank.

lascott

  • Hero Member
  • Posts: 1423
  • Appreciate my post and want to try LendingRobot? URL below
Re: LC database sloppiness
« Reply #9 on: April 23, 2014, 06:35:57 PM »
Quote
On the Account Activity the credit/debit/my_balance columns have little dotted underscores. A mouse over gives a pop up box that shows the "real" number out to 10 decimal points. 

Those numbers are not as "real" as you say.  Those numbers do not agree with any form of math known to man nor beast.  See this thread.  If you're going to calculate numbers out to 12 digits they should agree with something.  They should be able to be duplicated.  They don't and they can't be.  Something is very wrong there.
Not sure why you need to act so harsh to posters.

He was comparing "11.26%" to "0.112628", and I was simply pointing out that a rounded "display" number can be used in one area while a full-precision computational number is used by the underlying software.  It is the same number (obviously rounded), used for different purposes.

On my first payment ever on LC, I was wondering how they charged me $0.01 on a payment of less than $1 ($0.86), since as I understand it the fee is 1% (100bp), which works out to $0.0086.  Then it was clear they were just rounding to make the "report" look friendlier.  And nicely, the dotted underline clued me in that there was more behind it and "allayed my fears", so to speak.
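The store-at-full-precision, display-rounded idea looks something like this sketch. LC's actual rounding rule isn't stated anywhere in this thread, so the half-up rounding and the `display_amount` name are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def display_amount(amount):
    """Round a stored amount to cents for display (half-up is an assumption)."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)

payment = Decimal("0.86")
fee = payment * Decimal("0.01")          # 1% fee, kept at full precision
print(fee, display_amount(fee))          # 0.0086 0.01

# The same idea applied to a rate: store the fraction, display a 2-decimal percent.
print((Decimal("0.112628") * 100).quantize(CENT))   # 11.26
```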


Thanks for the reminder about that thread.  Certainly they are doing calcs daily, multiple times per day, or "continuous".  Hard to believe they are not following some industry practice.  Perhaps I'm naive tho, as I'm a new "bank/banker".

edward

  • Full Member
  • Posts: 195
Re: LC database sloppiness
« Reply #10 on: April 23, 2014, 07:41:25 PM »
Quote
Not sure why you need to act so harsh to posters.

I agree with your concern, lascott. Some people on here, as knowledgeable as they are, just haven't mastered basic manners. Even if someone makes a mistake or doesn't understand something, we should be helping each other, or at least having a civil discourse as we exchange opinions in an inviting atmosphere. I'd bet some of this forum's followers want to ask a question or make a comment, but once they read some of the postings on here, don't dare ask a question or try to contribute. I wish the forum were friendlier. I've learned so much on here, but some days I hate having to wade through all the muck to get there.

core

  • Hero Member
  • Posts: 1784
  • Your loss is my gain
Re: LC database sloppiness
« Reply #11 on: April 23, 2014, 08:28:34 PM »
Edward, perhaps you would be so kind as to define "muck" as it applies here.  Let me guess, your definition of "muck" is "that which does not interest Edward that day".  Do you seriously find it annoying to have to read 7 total posts per day rather than just the 4 informative ones?  This forum isn't exactly high volume.

As for being harsh to posters, I do not see what is harsh about pointing out that LC's numbers appear to be wrong.  If anything the harshness was directed at LC, not lascott.  I don't know what you guys are on about.

Is harsh anything like this, edward?
Quote
Yeah, read the homepage. Novel idea.
I'm just trying to understand the meaning of harsh here.  That jab of yours seems to fit nicely.  Is this a good example of those basic manners you want to see from everyone here?  A good example of the wholesome nurturing environment you seem to need?
« Last Edit: April 23, 2014, 08:47:25 PM by core »

Bohb Daishi

  • Sr. Member
  • Posts: 481
  • I eat free lunches
Re: LC database sloppiness
« Reply #12 on: April 24, 2014, 02:38:05 AM »
Quote
On the Account Activity the credit/debit/my_balance columns have little dotted underscores. A mouse over gives a pop up box that shows the "real" number out to 10 decimal points. 

Those numbers are not as "real" as you say.  Those numbers do not agree with any form of math known to man nor beast.  See this thread.  If you're going to calculate numbers out to 12 digits they should agree with something.  They should be able to be duplicated.  They don't and they can't be.  Something is very wrong there.
Not sure why you need to act so harsh to posters.
I agree with your concern, lascott. Some people on here, as knowledgeable as they are, just haven't mastered basic manners. Even if someone makes a mistake or doesn't understand something, we should be helping each other, or at least having a civil discourse as we exchange opinions in an inviting atmosphere. I'd bet some of this forum's followers want to ask a question or make a comment, but once they read some of the postings on here, don't dare ask a question or try to contribute. I wish the forum were friendlier. I've learned so much on here, but some days I hate having to wade through all the muck to get there.

Welcome to the internet. But unlike your post, Core's reply didn't seem rude at all. He was stating a fact, not insulting someone's etiquette.
There are three ways to make a living in this business: be first, be smarter, or cheat.

Fred93

  • Hero Member
  • Posts: 2166
Re: LC database sloppiness
« Reply #13 on: April 25, 2014, 01:28:50 PM »
I have a theory on the interest rates with six digits, such as "0.112628" in my notes.csv file.

I observed that these only occur on very old notes.  That has led me to suspect that this is a "historical oddity", which likely has little importance for the present or future. 

I usually apply date cutoffs when I analyze my notes.csv file anyway, because my strategy has changed so much over time that I am usually uninterested in performance of old notes.
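To check the "only very old notes" theory against your own file, one approach is to flag every rate that carries more digits than a two-decimal percentage needs, then look at those notes' issue dates. A sketch under that assumption; `extra_precision` is a hypothetical helper, and apart from 0.112628 the sample values are made up.

```python
from decimal import Decimal

def extra_precision(rate_text, places=4):
    """True if a rate string carries more than `places` decimal places.

    A two-decimal percentage such as 11.26% needs only four places as a
    fraction (0.1126), so anything longer has "extra" digits.
    """
    exponent = Decimal(rate_text.strip()).normalize().as_tuple().exponent
    return -exponent > places

rates = ["0.112628", "0.1126", "0.1349"]   # sample values (last two made up)
print([r for r in rates if extra_precision(r)])   # ['0.112628']
```

`normalize()` strips trailing zeros first, so a value padded as "0.112600" is not flagged.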


yojoakak

  • Hero Member
  • Posts: 763
Re: LC database sloppiness
« Reply #14 on: April 25, 2014, 07:22:12 PM »
Quote
I have a theory on the interest rates with six digits, such as "0.112628" in my notes.csv file.


For computers of the vintage of LendingClub's servers you could see all kinds of strange floating point errors.

For example, 1 / 10 added 10 times would not equal 1!!!!
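The 1/10 example is easy to reproduce. A short Python illustration using ordinary IEEE 754 double-precision floats; this behavior comes from binary floating point itself, so it shows up on current hardware as well. For scale, the error here is on the order of 1e-16, while the gap between 0.112628 and 0.1126 is 0.000028. The `math.fsum` and `decimal` lines just show two standard ways of avoiding the drift.

```python
import math
from decimal import Decimal

# Add 0.1 ten times using ordinary binary floating point (IEEE 754 doubles).
total = 0.0
for _ in range(10):
    total += 0.1  # 0.1 has no exact binary representation

print(total)             # 0.9999999999999999
print(total == 1.0)      # False
print(abs(total - 1.0))  # roughly 1.1e-16

# Two common ways around it:
print(math.fsum([0.1] * 10) == 1.0)                               # True: compensated summation
print(sum((Decimal("0.1") for _ in range(10)), Decimal(0)) == 1)  # True: decimal arithmetic
```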



http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html