I have more or less validated your results. I say "more or less" because I never seem to do anything quite the way somebody else did. For example, I ran all completed 36-month loans originated after 2011, whereas you ran just 2013. I ran the "test" only in months where the loan was current, whereas I presume you ran it in months where the loan was current or late. I did this because I thought it would be better to look for current loans to sell, as late loans sell at very poor prices. There are probably more differences, as there are many details. In any case, I get curves of the same shape as yours, although the numbers are a little different. So I call that validated -- more or less.
But then... This analysis depends mightily on what fraction of loans in the universe are detected -- in other words, on the size of the set you intend to sell. I believe you used the word "sold" to describe them, and certainly the simulation presumes they are sold.
For grade D and a 60-point FICO drop (presuming I am understanding your numbers), you see 27% detected: 633/(633+1704). I see 20.5%. OK. Given our differences in algorithm and data set, those are similar numbers.
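For the record, that 27% works out from the quoted counts as follows (just a sanity check on the arithmetic):

```python
# Detected fraction from the counts quoted above:
# 633 detected vs. 1704 not detected among grade D loans.
detected, not_detected = 633, 1704
fraction = detected / (detected + not_detected)
print(f"{fraction:.1%}")  # → 27.1%
```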
But then... I ran this same test on my current portfolio. It detects only 4.2% of my loans! Yipes. That's way different from either 27% or 20.5%! Red flag.
Now you might think the fraction detected just affects the level of profitability, so I would simply have less profit from a selling program. Yeah, but if this number is so far off, then the other stats, like the fraction of detected loans that go on to default, may be way off too. That leads to a possibly bogus choice of threshold, etc. I feel like the robot from Lost in Space, waving his arms and yelling "Danger, danger".
Here's what I think is going on. When I use all loans in the history file, that's a really broad set of loans. I broke them down by grade, but nothing else. My portfolio is quite different from that, because when I bought those loans, I used a bunch of filter criteria beyond grade, such as inquiries, DTI, years of credit history, income, and so forth.
What I'm saying is that it seems likely that historical info from the full payments file is not representative of my portfolio, so statistics gathered on this broad unfiltered set do not apply to my portfolio.
So maybe the problem is even harder: we must filter the history, selecting only loans that pass filter criteria similar to those we used to select the loans in our portfolios. Only then will we have statistics we can trust as a guide to how our portfolio loans should behave.
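The mechanics of that idea can be sketched as below: compute detection stats on the full history, then again on only the historical loans that would have passed the portfolio's buying filters. The field names (`inq_last_6m`, `dti`, `detected`) and the specific filter are hypothetical placeholders, not the actual history-file columns or anyone's real criteria.

```python
def detection_rate(loans):
    """Fraction of loans flagged by the sell test ("detected")."""
    if not loans:
        return 0.0
    return sum(1 for ln in loans if ln["detected"]) / len(loans)

def portfolio_like(ln):
    """Stand-in for the filter criteria (inquiries, DTI, credit
    history, income, ...) that were used to buy the portfolio."""
    return ln["inq_last_6m"] == 0 and ln["dti"] < 20.0

# Tiny synthetic grade D history, just to show the mechanics.
history = [
    {"grade": "D", "inq_last_6m": 0, "dti": 15.0, "detected": False},
    {"grade": "D", "inq_last_6m": 2, "dti": 25.0, "detected": True},
    {"grade": "D", "inq_last_6m": 0, "dti": 18.0, "detected": False},
    {"grade": "D", "inq_last_6m": 1, "dti": 30.0, "detected": True},
]

broad = detection_rate(history)  # stats on the whole unfiltered set
filtered = detection_rate([ln for ln in history if portfolio_like(ln)])
print(f"broad: {broad:.1%}, filtered: {filtered:.1%}")
```

The point the toy numbers make is exactly the discrepancy above: the broad set and the filtered subset can have very different detection rates, so thresholds tuned on the broad set may not transfer.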
Doing one grade at a time was a good idea. The statistics are different for different grades. Maybe it just wasn't nearly enough differentiation.