Topics - PhilGD

Investors - LC / Default rates by employment title
« on: December 16, 2016, 01:20:15 AM »
This forum has been a huge resource for me since March 2015, and in the spirit of the holidays I've decided to share a recent batch of work I've done on Lending Club's historical data. For the past week I've been working to crack open the employment title data and see if it reliably predicts charge-off rates. My complete findings are contained in the attached PDF file.

The biggest challenge in this project was dealing with spelling errors and other issues that arise from a free-text field, which make it hard to group the data. For instance, using Excel, it is not easy to simply group all "Presidents" in one category. I had to deal with the people who misspelled a word, included extraneous spaces and special characters, or used compound descriptors like "President & CEO."


1. All loans issued between September 2007 (the earliest issue date) and June 2015 were included in my analysis - more than 643,000 loans. A "charge off" was defined as a loan status of "charged off" or "default," where the date of last payment occurred 0 - 12 months after the loan was issued. If the loan was charged off in month 18, for example, it was not counted as charged off for the purpose of this project. By aligning the data in this way, I was able to remove the effect of loan age for loans issued recently. Why not include all loans through November of 2015 (i.e., all loans with up to twelve months of payment data)? Because I wanted a buffer to account for delinquent loans that might roll into charge-off status. Why choose 12 months of payment data as the cut-off? A: I wanted the largest sample possible; B: fewer than 12 months wouldn't allow for enough seasoning; C: more than 12 months would whittle down the database too significantly; D: a person's employment status is more likely to change as a loan gets older.

2. After I had gathered the data and defined what a "charge off" is, I used a pivot table to determine the most commonly used employment titles. If you're curious, the top three most common were "teacher," "manager," and "owner." I decided to create a short list of employment title categories by taking the top 100 most common titles. The list eventually grew to 124 total categories, since some common categories were not detected by the initial pivot table analysis.

3. The pivot table report showed a clear separation between the low-hanging fruit in the database and everything else. The low-hanging fruit were the people who used a simple description for their employment title and spelled it correctly, such as "teacher." The difficult people used the name of their employer instead of an employment title, or used a compound description such as "President & CEO," or misspelled a word, or included special characters such as "&" or "/".

4. I quickly encountered a problem: some people could be included in multiple employment categories. Based on my shortlist, "Assistant director systems engineering" was an assistant, a director, an engineer, and a systems engineer all at once. I resolved to allow for multiple employment categories/labels per loan in order to overcome this problem:

Breakdown of loans by number of labels
One label          335,495
Two labels          71,845
Three labels         3,556
Four labels             98
not labeled        271,000
emp title blank     36,886
total              643,698

5. I used keyword searches to label as many loans as possible. For example, "fire fighter," "fire marshall" and "fire chief" were all lumped together. Similarly, "CEO," "COO," "CTO," "CFO", and their non-abbreviated versions were all lumped together into the "C-suite" category.

6. As noted in the table above, over a quarter million loans in my sample remain unlabeled. This represents the really difficult ones - most of them are not employment titles, but employer names, and hence impossible to categorize. Others are indeed employment titles, but they contain severe misspellings or belong in categories that are too uncommon to be statistically significant. There are also probably many loans that could be labeled, but belong to a category that I missed.

7. I calculated charge-off rates for each category and sorted from lowest-to-highest. Median income was also calculated for each employment label. Complete results for all categories are included in the attached PDF file. Below is a small sampling of the categories for the purpose of including a pretty chart:

8. I validated the results by splitting the sample into "earlier loans" and "later loans" and recalculating the charge-off rates. If the results are reasonably similar across the two samples, then we can conclude that employment title is predictive of charge-off rates. The cutoff date that I chose was October 2014 - loans issued on or before this date are in the "earlier" sample and loans issued after this date are in the "later" sample. I chose this cutoff point solely in the interest of creating equally-sized samples - I wanted an even split. Below is the chart that proves my methodology is mostly accurate:

Please feel free to dig into the complete data set (attached) and offer feedback. I'm particularly interested in suggestions for how to automate the labeling process - parsing through text data for keywords and misspellings is not easy to do in Excel. But if it can be accomplished, then I'm confident it would be useful for including this data in a regression model.
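For anyone who wants to try the labeling outside Excel, here is a minimal Python sketch of the keyword approach described in steps 4 and 5. It is not the method used for the PDF, just one way it might be automated. The keyword map is a tiny hypothetical stand-in for the 124 categories, the sample titles are made up, and the 0.85 similarity cutoff for catching misspellings is an arbitrary guess that would need tuning against the real data.

```python
import re
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical keyword map (category -> keywords); the real shortlist has 124 categories.
CATEGORY_KEYWORDS = {
    "teacher": ["teacher"],
    "president": ["president"],
    "c-suite": ["ceo", "coo", "cto", "cfo", "chief executive officer"],
    "fire service": ["fire fighter", "firefighter", "fire marshal", "fire chief"],
    "assistant": ["assistant"],
    "director": ["director"],
    "engineer": ["engineer"],
    "systems engineer": ["systems engineer"],
}

# Longer single-word keywords double as the spelling-correction vocabulary.
VOCAB = sorted({kw for kws in CATEGORY_KEYWORDS.values()
                for kw in kws if " " not in kw and len(kw) >= 5})


def normalize(title):
    """Lowercase, turn special characters into spaces, and fix near-miss spellings."""
    words = re.sub(r"[^a-z0-9]+", " ", title.lower()).split()
    fixed = []
    for word in words:
        # Assumed cutoff of 0.85; short words are left alone to avoid false corrections.
        match = get_close_matches(word, VOCAB, n=1, cutoff=0.85) if len(word) >= 5 else []
        fixed.append(match[0] if match else word)
    return " ".join(fixed)


def _pattern(keyword):
    # Short abbreviations must match whole words ("coo" must not hit "cook");
    # longer keywords match as word prefixes so "engineer" also hits "engineering".
    tail = r"\b" if len(keyword) <= 3 else ""
    return re.compile(r"\b" + re.escape(keyword) + tail)


PATTERNS = {cat: [_pattern(kw) for kw in kws] for cat, kws in CATEGORY_KEYWORDS.items()}


def label_title(title):
    """Return every category whose keywords appear in the title (possibly none)."""
    text = normalize(title)
    return {cat for cat, pats in PATTERNS.items() if any(p.search(text) for p in pats)}


def charge_off_rates(loans):
    """loans: iterable of (emp_title, charged_off) pairs.

    A loan with several labels counts toward each of them.
    """
    totals, bad = defaultdict(int), defaultdict(int)
    for title, charged_off in loans:
        for label in label_title(title):
            totals[label] += 1
            bad[label] += bool(charged_off)
    return {label: bad[label] / totals[label] for label in totals}


print(sorted(label_title("Assistant director systems engineering")))
# ['assistant', 'director', 'engineer', 'systems engineer']
print(sorted(label_title("Presdent & CEO")))
# ['c-suite', 'president']
```

Fuzzy matching only rescues mild misspellings of words already in the keyword list. Titles that are really employer names would still come back with no labels, which is probably the right outcome for them.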

LendingRobot / Forward-looking return dropped 192 bps in one day
« on: July 18, 2016, 11:09:54 PM »
I'm a big fan of the Lending Robot app for iOS and I check it regularly. One feature I like about the app is the forward-looking return, which I assume is calculated using a proprietary Lending Robot model.

As of yesterday, the forward-looking return on my account was 10.88%. Today I logged on and the number was down to 8.96%. I was surprised to see such a large swing from one day to the next. Any idea what might be going on? My ANAR as reported by Lending Club has not changed significantly recently. The LR app shows that I've had 4 notes become late and 2 charged off since my last visit. The size of my portfolio is approx. 1,000 notes.

Investors - LC / Why does LC limit loan term to 36 or 60 months?
« on: July 15, 2016, 08:34:03 PM »
I received another direct mail solicitation from Discover Personal Loans today and I noticed that Discover offers loan terms of 36, 48, 60, 72, and 84 months. So I was wondering if there's a reason why marketplace lenders like LC and Prosper only offer terms of 36 or 60 months? In other words, if the underwriting tech at LC is as good as they claim then why not compete across all loan terms?


In-depth article on what went down leading up to the big LC announcement last week; not much new information here but still a good read.

Investors - LC / Has LC stopped updating their Performance Files?
« on: March 23, 2016, 05:43:23 PM »
I've noticed that although the giant payment history file is still being updated each month, the aggregate performance files haven't been touched since January. Has anyone heard anything about what's going on here?

The article references research from Morgan Stanley and commentary from Peer IQ and Monja related to credit quality in the P2P space.

The recent trouble referred to in the headline is securitizations of Prosper loans that are on watch for downgrade by Moody's, and a 2015 securitization of CircleBack loans that has hit its cumulative loss trigger earlier than expected.

I'm not sure if there has been a blog post about this, but I noticed today that Lending Club has released once again a trove of bureau data in the historical loan files. Where previously the files were restricted to 60-ish fields, the count has jumped to 115, and most of them are populated with numbers fresh for the crunching.

The still-empty fields consist of the 15 new data points that LC began providing earlier in January. But the freshly populated fields contain data that has been restricted from retail investors since mid-2014, and which I know weren't available last time I downloaded loan stats in February.

BlueVestment / No Note Purchases in Last Several Days?
« on: January 02, 2016, 03:03:10 PM »
I have around $600 idle in my LC account and BV hasn't purchased any new notes for a few days. Also, when I tried to log into BlueVestment today, it struggled to load and showed my account as inactive with zero notes purchased. Yet it already purchased 33 notes last week.

Is anyone else having issues?


Investors - LC / Borrower with 9 recent inquiries
« on: November 24, 2015, 11:33:02 AM »
Spotted this loan today with 9 inquiries in the past 6 months:

Yet Lending Club's standard policy states that they don't accept any borrowers with greater than 5 recent inquiries.

See page 8: 

Investors - LC / Huge news RE: credit data
« on: November 12, 2015, 10:57:36 AM »
Hello friends,

In the recent past we discussed the impending removal of a large set of credit attributes from the Lending Club data files. See the thread below:

LC had posted that they were removing nearly all of the most useful credit data fields from the browse notes file and the API.

As of this morning, Lending Club has amended their "Recent and Upcoming Changes" article. The new update indicates that we should no longer expect important credit data to be removed. In fact, they are adding 15 new attributes effective November 25th. See the article for yourself:

NEW credit attributes to be added include:

OPEN_ACC_6M          Number of open trades in last 6 months
OPEN_IL_6M           Number of currently active installment trades
OPEN_IL_12M          Number of installment accounts opened in past 12 months
OPEN_IL_24M          Number of installment accounts opened in past 24 months
MTHS_SINCE_RCNT_IL   Months since most recent installment accounts opened
TOTAL_BAL_IL         Total current balance of all installment accounts
IL_UTIL              Ratio of total current balance to high credit/credit limit on all install acct
OPEN_RV_12M          Number of revolving trades opened in past 12 months
OPEN_RV_24M          Number of revolving trades opened in past 24 months
MAX_BAL_BC           Maximum current balance owed on all revolving accounts
ALL_UTIL             Balance to credit limit on all trades
TOTAL_CREDIT_BC      Total credit line on open revolving trades
INQ_FI               Number of personal finance inquiries
TOTAL_FI_TL          Number of credit union trades
INQ_LAST_12M         Number of credit inquiries in past 12 months
TOT_COLL_AMT*        Total collections amount ever
ACC_NOW_DELINQ*      Accounts now delinquent
TOT_CUR_BAL*         Total current balance

I can only speculate that this update is intended to compete more effectively with Prosper in the institutional investor arena. We know that Prosper already provides these attributes and more. However, on Prosper, they've removed the ability to track performance of loans so this is clearly a win for LC investors.

It's a buyer's market out there right now, and appears to have been for quite some time. Loans with multiple notes offered for sale have price dispersions on the order of 1,000 basis points or more. People try to sell their late notes at a deep discount, but forget to adjust the price after the loan is brought back to current status. If you have a credit model that is able to evaluate aged notes, the opportunities to take advantage of these mispricings are so hilariously good as to almost feel unethical...

Let the seller beware.

Investors - LC / Why does LC stop updating FICO scores after 499?
« on: September 15, 2015, 12:57:35 PM »
I've been looking through the historical notes files and I don't see any notes with a FICO score lower than 499. In fact, the FICO range at this level will be listed as:

  • last_fico_range_low = 0
  • last_fico_range_high = 499

Do they just automatically charge off notes below the 499 FICO threshold?

Investors - LC / Good News for LC Algorithm-style Investors
« on: July 07, 2015, 07:13:48 PM »
Good news for credit modelers:

Once again, LC has pushed back the date for their planned removal of credit attributes from the primary market issuance data. Now slated for "As early as October 15, 2015".

Must be getting some significant push-back from the institutions.

Does anyone know where I can find Lending Club's historical data file for the 2013 - 2014 cohort? I am talking about the credit file that contains around 100 data points per loan, which Lending Club used to provide before switching to the more limited version that's currently available on their website.

I already have the 2012 - 2013 file (the LoanStats3b file). I'm also compiling a new database containing the primary market issuance files from April 2015 to the present. I have monthly unemployment rates compiled at the zip code level for 651 3-digit zip code and state areas. I'd be willing to trade either of these files if anyone has access to the LoanStats3c file. If you want my unemployment data, I'll also include a FIPS-code to zip-code conversion table so you can easily update it every month. The conversion table can delineate between the BLS's Local Area Unemployment series IDs, counties, and zip codes for 3,144 counties. It is organized as follows:

Series ID | County | Zip Code | time series data
LAUCN010010000000003 | Autauga County, AL | 36003 | time series data
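As a rough illustration of how the series ID in that sample row ties back to a county FIPS code, here is a small Python sketch. It assumes the series ID layout as I understand it from the BLS format documentation: "LA" prefix, one seasonal-adjustment character, a 15-character area code (area type plus FIPS plus zero padding), and a 2-digit measure code where "03" is the unemployment rate. The function and class names are made up for the example, so check the layout against the BLS documentation before relying on it.

```python
from typing import NamedTuple


class LausSeries(NamedTuple):
    seasonal: str     # assumed: "U" = not seasonally adjusted, "S" = seasonally adjusted
    area_type: str    # assumed: "CN" = county
    county_fips: str  # 5-digit state + county FIPS code
    measure: str      # assumed: "03" = unemployment rate


def parse_laus_county_id(series_id: str) -> LausSeries:
    """Split a county-level Local Area Unemployment series ID into its parts."""
    if len(series_id) != 20 or not series_id.startswith("LA"):
        raise ValueError(f"not a 20-character LA series ID: {series_id!r}")
    area_code = series_id[3:18]  # 15-character area code
    if area_code[:2] != "CN":
        raise ValueError(f"not a county series: {series_id!r}")
    return LausSeries(
        seasonal=series_id[2],
        area_type=area_code[:2],
        county_fips=area_code[2:7],
        measure=series_id[18:],
    )


print(parse_laus_county_id("LAUCN010010000000003"))
# LausSeries(seasonal='U', area_type='CN', county_fips='01001', measure='03')
```

The extracted FIPS code (01001 for Autauga County, AL) is the key that would join against the county-to-zip column of the conversion table.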

I want a raw csv - I'm not interested in websites that let you view one loan at a time without offering to download the entire dataset.


I've been scouring all the U.S.-based lending websites I can find in search of an alternative to Lending Club and Prosper.

So far, every platform I've found will only cater to accredited investors and institutions. Are there really just two options in the U.S. for us small time retail folk?
