Author Topic: Automatic order creation and selection  (Read 5223 times)

sociallender

  • Sr. Member
  • ****
  • Posts: 273
Automatic order creation and selection
« on: February 14, 2013, 12:08:21 AM »
Hello everyone,

I am new to Lending Club but pretty good with numbers and programming.  I started an account at LC and found it too time-consuming to choose good loans and then create an order including each loan.  So I created some software that statistically data-mines the loans.  I also created a Windows application that lets LC users create an order easily, without manually clicking on each loan's URL.

I just started with LC so not sure how the loan selection is going to turn out but thought some of you may be interested in trying out the software (in beta test now) as well as the loan selection process.  The site is still under construction but the URL is sociallender.blogspot.com

I will be updating the site daily with the loans that meet the statistical model's criteria.  I currently use regression with penalization to create strict selection criteria.

For those that are interested in my other investment venture, I have a stock market system that I have implemented using similar modelling techniques (NNs) with over a year of production history.  It is assetclassta.blogspot.com.  As you can see, I enjoy numbers :)

John




Zach

  • Administrator
  • Hero Member
  • *****
  • Posts: 622
Re: Automatic order creation and selection
« Reply #1 on: February 14, 2013, 12:24:09 AM »
Interesting ideas...

1) Can you explain how you assess the quality of loans?
2) How does your stock market system work?
3) The google docs spreadsheet is not public and requires that you request access to view it...(was this intentional?)


sociallender

  • Sr. Member
  • ****
  • Posts: 273
Re: Automatic order creation and selection
« Reply #2 on: February 14, 2013, 01:04:07 AM »
zpbsfg,

The loans are modeled using LC's historical database.  Unfortunately, it is very difficult to explain the rules created by the model.  Many algorithms, such as neural networks, are black boxes with no way to tell (well, actually there are some newer techniques that can explain some of them).  Other algos, such as decision trees, can show the process, but with this number of attributes the trees contain hundreds of nodes.  In my case, I tested many algos (SVMs, Bayes, linear regression, NNs) and settled on random forests with cost adjustment for imbalanced data.  My stat package provides attribute selection as well as precision metrics.  With 10-fold cross-validation, the model seems to do a good job generalizing on true positives (good loans), with precision at approx. 93%.  However, many loans are discarded due to the cost penalization even though they do become fully paid.  The idea for me was to only choose loans that have a good chance of being paid in full, even if that means discarding many false negatives (good loans that were incorrectly classified as bad).

The stock market system is a completely different methodology using neural networks.  I am working on an overview document of how it works in detail.  I hope to have it posted there in the next few weeks.

Thanks for pointing out that the link wasn't working.  It was not my intention.  I just changed it to point to the folder containing the spreadsheet.  I am not quite sure why gdocs is giving me issues.  I am using googlecl to upload the docs, but it doesn't seem to be putting them in the subfolder correctly.  I have to move them manually, and I think that may be the issue.  The folder link should be working now, though (and the icon now links to the folder).

John

yojoakak

  • Hero Member
  • *****
  • Posts: 765
Re: Automatic order creation and selection
« Reply #3 on: February 14, 2013, 02:05:44 AM »
You should add a link directly to the loan, e.g. add a new column to the left of A and fill it with this:

="https://www.lendingclub.com/browse/loanDetail.action?loan_id=" & B2
« Last Edit: February 14, 2013, 02:10:11 AM by yojoakak »

Zach

  • Administrator
  • Hero Member
  • *****
  • Posts: 622
Re: Automatic order creation and selection
« Reply #4 on: February 14, 2013, 02:35:06 AM »
Can you explain these returns to me?
Weeks               99        99
Compound Return     64.61%    13.14%
Spread              51.47%
Avg Weekly Return   0.65%     0.13%
Wins                59        50
Percent Wins        59.60%    50.51%
Max Win             7.24%     6.15%
Max Loss            -10.67%   -10.67%

Does this indicate your annual returns average about 33%?
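
For anyone checking the arithmetic behind that question, here is a minimal sketch. It assumes a 52-week year and that "Compound Return" is the total compounded growth over the 99 weeks; the function name is made up for illustration. Multiplying the 0.65% average weekly return by 52 gives roughly 33.8%, while annualizing the 64.61% compound figure comes out nearer 30%.

```python
def annualized_return(total_return: float, weeks: int, weeks_per_year: float = 52.0) -> float:
    """Convert a compound return earned over `weeks` into a compound annual rate."""
    growth = 1.0 + total_return
    return growth ** (weeks_per_year / weeks) - 1.0


# Figures quoted in the table above: 99 weeks, 64.61% vs 13.14% compound return.
system = annualized_return(0.6461, 99)
benchmark = annualized_return(0.1314, 99)

# Simple (non-compounded) estimate: average weekly return times 52 weeks.
simple = 0.0065 * 52

print(f"system: {system:.1%}  benchmark: {benchmark:.1%}  simple: {simple:.1%}")
```

The two estimates differ because the simple one ignores compounding and uses an arithmetic average of weekly returns.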

breitenm

  • Newbie
  • *
  • Posts: 15
Re: Automatic order creation and selection
« Reply #5 on: February 14, 2013, 03:45:30 AM »
Hi John,

That looks pretty interesting. I built something similar, but I am counting all loans that are (or ever were) late as bad loans. I'm a very conservative investor and try to invest only in loans that cause no problems whatsoever :)

Can you share a variable importance plot from the random forest? Does desc_len in the spreadsheet refer to the length of the loan description? Also, have you tried splitting the data in time? What I mean is that you could train the model on loans issued up to, say, January 2011, and then estimate the accuracy on loans that were issued after that point in time.
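
A rough sketch of the time-based split being asked about, assuming each loan record carries an issue date. The field names (`loan_id`, `issued`, `status`) and the sample rows are hypothetical, not taken from the LC data file.

```python
from datetime import date


def split_by_issue_date(loans, cutoff):
    """Return (train, test): loans issued on or before `cutoff`, and loans issued after it."""
    train = [loan for loan in loans if loan["issued"] <= cutoff]
    test = [loan for loan in loans if loan["issued"] > cutoff]
    return train, test


# Hypothetical sample rows, for illustration only.
loans = [
    {"loan_id": 1, "issued": date(2010, 6, 1), "status": "Fully Paid"},
    {"loan_id": 2, "issued": date(2011, 1, 15), "status": "Charged Off"},
    {"loan_id": 3, "issued": date(2011, 3, 1), "status": "Fully Paid"},
    {"loan_id": 4, "issued": date(2012, 2, 1), "status": "Fully Paid"},
]

train, test = split_by_issue_date(loans, date(2011, 1, 31))
print(len(train), "training loans;", len(test), "test loans")
```

The point of splitting this way is that the model never sees a loan issued after the cutoff during training, which is closer to how it would be used on new listings.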
My LendingClub Credit Rating Model: http://cervisia.org/lc_credit/

New Jersey Guy

  • Hero Member
  • *****
  • Posts: 914
  • Hell Yea it's a Hemi!
Re: Automatic order creation and selection
« Reply #6 on: February 14, 2013, 10:22:46 AM »
"Of the over 18,000 loans, approximately 18% of the loans were charged off (loan was not paid in full).  As expected, grade A loans perform the best with an average default rate of only 8% while grade G defaulted 44% of the time."

Algorithms, models, peanut logs and fruitcakes.  All of this is beyond a simpleton like me.   As a regular Joe that can't add 3 numbers with a calculator correctly, I even find this statement off your website hard to believe.  Nearly half of G loans will default?

Perhaps some of you smarter old-timers who have been doing this longer can elaborate on the accuracy of this.  It appears to me it is inconsistent with what Lending Club reports.  Or, am I wrong?

With a 33% to 43% default rate on E, F and G loans, it doesn't seem possible to achieve a positive return if these are the grades you're diversifying in.

Return over deposits:   66.82%
IRR:   86.54%
As of April 30, 2014

breitenm

  • Newbie
  • *
  • Posts: 15
Re: Automatic order creation and selection
« Reply #7 on: February 14, 2013, 11:37:44 AM »
"Perhaps some of you smarter old-timers who have been doing this longer can elaborate on the accuracy of this.  It appears to me it is inconsistent with what Lending Club reports.  Or, am I wrong?"

I have a higher percentage of bad loans in my statistics, because I count loans that were late as bad also. If I recall correctly lending club counts loans that aren't fully paid yet (status "current") as successes in their statistics. In my opinion that's not correct, because until the loan is actually fully repaid nobody can know if the loan will be repaid in full or not (or if it will ever be late) and it should simply be excluded from the statistic.
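
To illustrate how much the treatment of unfinished loans can move the number, here is a small sketch with made-up counts. The status names and the choice of which statuses count as bad are assumptions for the example only.

```python
# Assumed status groupings, for illustration only.
BAD = {"Charged Off", "Default"}
UNRESOLVED = {"Current"}


def bad_rate(statuses, exclude_unresolved):
    """Fraction of bad loans, optionally dropping loans whose outcome is not yet known."""
    if exclude_unresolved:
        statuses = [s for s in statuses if s not in UNRESOLVED]
    bad = sum(1 for s in statuses if s in BAD)
    return bad / len(statuses)


# Hypothetical portfolio: 6 fully paid, 2 charged off, 12 still current.
statuses = ["Fully Paid"] * 6 + ["Charged Off"] * 2 + ["Current"] * 12

counted_as_good = bad_rate(statuses, exclude_unresolved=False)
excluded = bad_rate(statuses, exclude_unresolved=True)
print(f"current counted as good: {counted_as_good:.0%}, current excluded: {excluded:.0%}")
```

With these invented numbers the same two charged-off loans give 10% when current loans are counted as successes and 25% when they are excluded.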

sociallender

  • Sr. Member
  • ****
  • Posts: 273
Re: Automatic order creation and selection
« Reply #8 on: February 14, 2013, 11:47:26 AM »
@yojoakak

Good suggestion.  I am working on that for today's run

@zpbsfg

(for my other stock market blog) the second column is for my system (weekly market trades), the 3rd column is the benchmark S&P500.  So for example, over the past 99 weeks, my system has a compound return of 64% while the S&P during that same period is at 13%.  You can take a look at the performance page for more of the trades and breakdown.

@breitenm

Sadly, my stat package (Weka) does not have the ability to display the RF trees.  Even if it did, a random forest uses multiple trees, so determining the importance of a single attribute would also need to be coded.  It just doesn't have that function, to my knowledge.  However, I did run an attribute evaluator (InfoGain with Ranker) with the following results (descending order of importance):

Ranked attributes:
 0.033125    4 credit_grade
 0.031705    2 interest_rate
 0.020555    3 loan_length
 0.017653    9 fico_range
 0.013517   14 revolving_line_utilization
 0.009888    5 loan_purpose
 0.007297    8 monthly_income
 0.005468   15 inquiries_in_the_last_6
 0.004476    6 debt-to-income_ratio
 0.003225   21 months_since_last_record
 0.003147   20 public_records_on_file
 0.00206    22 employment_length
 0.001964   12 total_credit_lines
 0.001875   11 open_credit_lines
 0.001812    1 amount_requested
 0.001585   10 earliest_credit_line
 0.00086    23 desc_len
 0.000705    7 home_ownership
 0          13 revolving_credit_balance
 0          19 months_since_last_delinquency
 0          18 delinquencies_(last_2_yrs)
 0          16 accounts_now_delinquent
 0          17 delinquent_amount
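
For readers unfamiliar with the scores in the ranking above, this is a minimal sketch of information gain for a single categorical attribute. It is a generic textbook calculation on a toy data set, not a description of how Weka computes it internally, and numeric attributes would need to be binned first (an assumption about preprocessing).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after grouping by attribute value."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data, for illustration only: credit grade versus loan outcome.
grades = ["A", "A", "A", "A", "G", "G", "G", "G"]
outcome = ["good", "good", "good", "bad", "bad", "bad", "bad", "good"]
constant = ["x"] * 8  # an attribute that never varies carries no information

print(f"grade: {information_gain(grades, outcome):.4f}")
print(f"constant: {information_gain(constant, outcome):.4f}")
```

An attribute scores zero when knowing its value does not change the mix of good and bad loans, which is one way to read the zeros at the bottom of the ranking.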

Also, if I understand the time question, you are interested in knowing the accuracy using a percentage split of the training set instead of cross-validation folds?  I would have to include the date to get a correct split (no dates are used in the current training set).  However, the loans are sorted oldest to newest in the training file, so I did a 66% split and held out the remaining 34% (presumably the most recent loans) as the test set.  The precision on true positives is still 94%: of 6526 instances in the test set, 1429 loans were classified as good, and of these 1429, 1351 were correct and 78 were incorrect.  This is consistent with 10-fold cross-validation.  Hope that answers your question.
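
The precision figure can be checked directly from the counts quoted for the held-out set. This is only the arithmetic; the function name is made up for illustration.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of loans predicted good that really were good."""
    return true_positives / (true_positives + false_positives)


# Counts quoted above for the held-out 34% of the data.
p = precision(true_positives=1351, false_positives=78)

# Fraction of the test set the model was willing to select at all.
share_selected = 1429 / 6526

print(f"precision: {p:.1%}, share of test loans selected: {share_selected:.1%}")
```

The second number shows the trade-off described earlier in the thread: high precision comes with selecting only about a fifth of the available loans.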

@New Jersey Guy

Mmmm... I love peanuts and fruitcake!  I just did a simple pivot table in Excel of the loanStats.csv file that you can download from lendingclub.com.  However, I need to qualify that loans were categorized as charged off if their status was one of:

Charged Off
Default
Does not meet the current credit policy  Status: Charged Off
Does not meet the current credit policy  Status: Default

Unless I messed something up, these are the default rates for each loan grade, and the average across all loan grades is 18%.  If I am right (someone please confirm), then it pays to select your loans wisely.
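
For anyone who wants to confirm the pivot table without Excel, the same grouping could be sketched roughly like this. The column names (`grade`, `status`) and the sample rows are hypothetical; the real loanStats.csv headers may differ, and the status strings are normalized for spacing as a precaution.

```python
from collections import defaultdict

# Statuses treated as charged off, following the list in the post above.
CHARGED_OFF_STATUSES = {
    "Charged Off",
    "Default",
    "Does not meet the current credit policy Status: Charged Off",
    "Does not meet the current credit policy Status: Default",
}


def normalize(status):
    """Collapse runs of whitespace so spacing differences do not matter."""
    return " ".join(status.split())


def default_rate_by_grade(rows):
    """Return {grade: fraction of loans whose status counts as charged off}."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for row in rows:
        totals[row["grade"]] += 1
        if normalize(row["status"]) in CHARGED_OFF_STATUSES:
            bad[row["grade"]] += 1
    return {grade: bad[grade] / totals[grade] for grade in sorted(totals)}


# Hypothetical rows, for illustration only.
rows = [
    {"grade": "A", "status": "Fully Paid"},
    {"grade": "A", "status": "Fully Paid"},
    {"grade": "A", "status": "Fully Paid"},
    {"grade": "A", "status": "Charged Off"},
    {"grade": "G", "status": "Default"},
    {"grade": "G", "status": "Does not meet the current credit policy  Status: Charged Off"},
    {"grade": "G", "status": "Fully Paid"},
    {"grade": "G", "status": "Current"},
]

rates = default_rate_by_grade(rows)
for grade, rate in rates.items():
    print(f"grade {grade}: {rate:.0%}")
```

Note that this sketch keeps every loan in the denominator, including ones still current; whether that matches the pivot table is not stated in the thread.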

John


yojoakak

  • Hero Member
  • *****
  • Posts: 765
Re: Automatic order creation and selection
« Reply #9 on: February 14, 2013, 02:56:50 PM »
If you freeze the first row (View > Freeze Rows > Freeze 1 row) then the headers will stay in place.

Zach

  • Administrator
  • Hero Member
  • *****
  • Posts: 622
Re: Automatic order creation and selection
« Reply #10 on: February 14, 2013, 03:05:03 PM »
@sociallender

Is the only reason you're investing with LC because you want more diversification?
With the returns you have achieved in the stock market, it seems like a much better return than LC could possibly yield....?


sociallender

  • Sr. Member
  • ****
  • Posts: 273
Re: Automatic order creation and selection
« Reply #11 on: February 14, 2013, 03:42:25 PM »
3 months ago, I didn't even know lending club existed.  My brother asked me to look into it based on his friend's advice.  I did a cursory review and as time progressed it started to look more appealing.  I agreed to help him and now manage his account for him.  Yes, I believe that LC provides good diversification for my portfolio.  My stock market strategy is very volatile (low sharpe).  It is profitable but takes confidence and faith.  Not something that I am going to bet the whole bank on until I have a few more years of evidence to support more investment.   

This Lending Club project I am working on is just an extension of what I have already done for my brother (except the Windows software).  I have the orders automated from the command line for him, but others may find the software useful.  I always wanted to create Windows software and this was my first.  Not sure how helpful it will be, but I can tell you there was no way I was going to invest my brother's 20K in $25 increments manually.


New Jersey Guy

  • Hero Member
  • *****
  • Posts: 914
  • Hell Yea it's a Hemi!
Re: Automatic order creation and selection
« Reply #12 on: February 14, 2013, 04:00:12 PM »
" My stock market strategy is very volatile (low sharpe).  It is profitable but takes confidence and faith.  Not something that I am going to bet the whole bank on until I have a few more years of evidence to support more investment."

It's me again, Mr. Simpleton.
My portfolio of stocks and bonds is nothing more than mutual funds.  See, simple!  I put money in and hopefully it grows.

Personally, I'm glad to have you on board.  There are others on this board who are equally knowledgeable and into crunching numbers in order to squeeze out an extra .001%.
You'll fit right in and I look forward to taking advantage of all your hard work!

sociallender

  • Sr. Member
  • ****
  • Posts: 273
Re: Automatic order creation and selection
« Reply #13 on: February 14, 2013, 04:26:46 PM »
" My stock market strategy is very volatile (low sharpe).  It is profitable but takes confidence and faith.  Not something that I am going to bet the whole bank on until I have a few more years of evidence to support more investment."


Mr. Simpleton, you have the right strategy.  Slow and steady wins the race.  Best advice for 99.99% of investors.

Glad I can help some with their loan selection.  I am hoping to improve my results by more than .001%, though.  But I do agree that many try to split hairs when it comes to investments (loan selection).  I fell into this trap last year in the market.  I tried to get too complicated and suffered as a result.  Simple is almost always better (especially in the stock market)!

Anyone can do what I am doing.  It doesn't take much to data-mine these days.  Putting together an automated system is a bit more laborious.  Hopefully, I will have all the kinks worked out soon.