loans/portfolio/partners

Mo Gad

Join Date: Feb 2019
Posts: 16

loans/portfolio/partners

18 Feb 2019, 06:20

Hi all,

I have data that looks like the following

Loan_ID	Loan_date	Lender_ID	Lead_lender	Borrower_ID	Borrower_Profits	Lender_Profits	VAR1	VAR2

1	31/10/2008	231	1	1100	0.1	0.12
1	31/10/2008	234	0	1100	0.1	0.13
1	31/10/2008	235	0	1100	0.1	0.1
5	20/12/2009	150	0	3000	0.2	0.07
5	20/12/2009	234	0	3000	0.2	0.11
5	20/12/2009	231	1	3000	0.2	0.07
5	20/12/2009	400	0	3000	0.2	0.09
7	01/10/2010	120	0	4000	0.05	0.12
7	01/10/2010	125	0	4000	0.05	0.14
7	01/10/2010	240	1	4000	0.05	0.11
7	01/10/2010	234	0	4000	0.05	0.1
9	01/10/2011	231	1	1500	0.15	0.01
9	01/10/2011	234	0	1500	0.15	0.2
9	01/10/2011	250	0	1500	0.15	0.15
3	01/10/2012	231	1	2300	0.1	0.13	0.175	0.136
3	01/10/2012	234	0	2300	0.1	0.01	0.175	0.136
3	01/10/2012	270	0	2300	0.1	0.09	0.175	0.136

This data represents loan data where multiple lenders (identified by Lender_ID) are giving loans to one borrower (identified by Borrower_ID). For example, where Loan_ID=1 there are 3 different lenders providing credit to one borrower. In each loan, there is one lead lender (i.e. main lender) identified by Lead_lender column and the remaining lenders are partners. Now I want to construct two variables as follows:

1. for each lead_lender, I need the average borrower_profits for all borrowers that the lead lender has been giving loans to over the past 3 years. For example, in the last loan (loan_id=3) the lead lender has an ID=231. This lender is also the lead lender in loan_id=9 and loan_id=5 in the past three years. So the average profits for his portfolio=(0.15+0.20) /2 = 0.175. See VAR1 column.

2. for each lead_lender, I need the average partner lender_profits for lenders that are most frequently working with the lead lender (say top 5 by frequency) over the past 3 years. For example, in the last loan (loan_id=3) the lead lender has an ID=231. This lender is frequently working with lender_id=234. So the average profits for his partners=(0.20+0.10+0.11) / 3 = 0.136. See VAR2 column.

Hope it is clear

Thank you for help

Gad

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#2

18 Feb 2019, 12:13

I'm confused. Lender 231 is the lead lender on loans 1, 3, 5, and 9. So why, in calculating average profits, are loans 1 and 3 excluded? Also, when lender 231 participates in a loan, it is always the lead in your example. If a lender participates as a non-lead in some loans, do you also want to include the profits on those loans in the average, or only those loans in which the lender is a lead?

I'm similarly confused about the second variable. Lender 234 participates in loans 1, 3, 5, 7, and 9. Why do you not include loans 7 and 1 in the calculation?

Also, since you want a calculation based on the top 5 partners, it would make sense to post example data where each lead lender (or at least 1 lead lender) actually has more than five different partners

When you post back, please use the -dataex- command to show your example data, rather than typing out a table that is both difficult for the eye to read and a nuisance to import into Stata. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
Comment
Mo Gad

Join Date: Feb 2019

Posts: 16
#3

19 Feb 2019, 04:51

Thank you Clyde Schechter . That is very helpful!

I'm confused. Lender 231 is the lead lender on loans 1, 3, 5, and 9. So why, in calculating average profits, are loans 1 and 3 excluded?

1 and 3 are excluded because I am looking at t-1,t-2 and t-3 only (i.e. the past three years from loan_date). This will help me to identify how healthy was the lead-lender portfolio of borrowers over the past 3 years.

Also, when lender 231 participates in a loan, it is always the lead in your example. If a lender participates as a non-lead in some loans, do you also want to include the profits on those loans in the average, or only those loans in which the lender is a lead?

I need only these cases where lender 231 was the lead-lender in the past three years from loan date. So if lender 231 was a partner in any of the previous three years it shouldn't be included.

I'm similarly confused about the second variable. Lender 234 participates in loans 1, 3, 5, 7, and 9. Why do you not include loans 7 and 1 in the calculation?

loan 7 should be included but loan 1 should not as it is t-4 and my focus is on t-1,t-2 and t-3.

This will help me to identify how healthy was the lead-lender top 5 partners over the past 3 years from loan date.

Since you want a calculation based on the top 5 partners, it would make sense to post example data where each lead lender actually has more than five different partners

I did it, thanks! Please see the example below:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte Loan_ID int(Loan_date Lender_ID) byte Lead_lender int Borrower_ID double(Borrower_Profits Lender_Profits) byte(VAR1 VAR2) 1 17836 231 1 1100 .1 .12 . . 1 17836 233 0 1100 .1 .1 . . 1 17836 234 0 1100 .1 .05 . . 1 17836 250 0 1100 .1 .13 . . 1 17836 125 0 1100 .1 .12 . . 1 17836 235 0 1100 .1 .1 . . 5 18251 150 0 3000 .2 .07 . . 5 18251 234 0 3000 .2 .11 . . 5 18251 231 1 3000 .2 .07 . . 5 18251 400 0 3000 .2 .09 . . 7 18536 231 0 4000 .05 .12 . . 7 18536 125 0 4000 .05 .14 . . 7 18536 240 1 4000 .05 .11 . . 7 18536 234 0 4000 .05 .1 . . 9 18901 231 1 1500 .15 .01 . . 9 18901 230 0 1500 .15 .07 . . 9 18901 125 0 1500 .15 .11 . . 9 18901 250 0 1500 .15 .02 . . 9 18901 233 0 1500 .15 .03 . . 9 18901 234 0 1500 .15 .2 . . 9 18901 270 0 1500 .15 .15 . . 3 19267 231 1 2300 .04 .13 . . 3 19267 234 0 2300 .04 .01 . . 3 19267 270 0 2300 .04 .09 . . 3 19267 233 0 2300 .04 .1 . . 3 19267 250 0 2300 .04 .01 . . 3 19267 230 0 2300 .04 .08 . . 22 19632 231 1 2500 .09 .12 . . 22 19632 234 0 2500 .09 .02 . . 22 19632 270 0 2500 .09 .1 . . 22 19632 250 0 2500 .09 .03 . . 22 19632 120 0 2500 .09 .05 . . end format %tdnn/dd/CCYY Loan_date

In this example:

1- If I want to calculate VAR1 (portfolio health) at loan_ID=22,
- I need to identify loans where lender 231 is the lead-lender in the last three years (2012,2011,2010)
- Then calculate average profit of the borrowers in these loans.

2- If I want to calculate VAR2 (partners health) at loan_ID=22:
- I need to identify loans where lender 231 is the lead-lender in the last three years (2012,2011,2010)
- Identify the top 5 partners (most frequent non-lead-lenders) working with the same lead-lender (lender 231) over the past three years
- Calculate the average profit of these partners over the past three years

Thanks again!
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30170

19 Feb 2019, 09:08

OK. Thanks for the clarifications. I think I get it now.

Code:

//    CREATE A VARIABLE SHOWING WHO THE LEAD LENDER IS IN EACH LOAN
by Loan_ID (Lead_lender), sort: assert (Lead_lender == 1) == (_n == _N)
by Loan_ID (Lead_lender): gen who_lead = Lender_ID[_N]

//    CREATE A YEAR VARIABLE
gen loan_year = yofd(Loan_date)

//    CALCULATE PORTFOLIO HEALTH
gen llbp = cond(Lead_lender, Borrower_Profits, .)
rangestat (mean) portfolio_health = llbp, by(who_lead) ///
    interval(loan_year -3 -1)
drop llbp

//    CREATE A DATA SET OF TOP 5 PARTNERS FOR EACH LEAD LENDER
tempfile dataset
save `dataset'
keep who_lead Loan_ID Lender_ID loan_year Lender_Profits
drop if who_lead == Lender_ID
tempfile copy
save `copy'
rangejoin loan_year -3 -1 using `copy', by(who_lead)
drop Lender_ID
by who_lead loan_year Lender_ID_U, sort: gen n_loans = _N
by who_lead loan_year Lender_ID_U: keep if _n == 1
gsort who_lead loan_year -n_loans
by who_lead loan_year: keep if _n <= 5
keep who_lead loan_year Lender_ID_U Lender_Profits
drop if missing(Lender_ID_U)
rename Lender_ID_U Lender_ID
//    AND NOW GET AVERAGE LENDER PROFITS, REDUCING TO ONE OBSERVATION
//    PER LEADER PER YEAR
collapse (mean) partner_health = Lender_Profits, by(who_lead loan_year)

//    REUNITE THIS WITH THE ORIGINAL DATA
merge 1:m who_lead loan_year using `dataset', assert(match using) nogenerate
order partner_health, last
order loan_year, after(Loan_date)
order who_lead, after(Lead_lender)

Notes:

1. To run this code you must install -rangestat- and -rangejoin-. The latter is by Robert Picard, and the former by him, Nick Cox, and Roberto Ferrer. Both are available from SSC.

2. Your instructions leave open the possibility that when selecting the "top 5" partners, there will be a tie for 5th place. If that happens, this code will break this tie randomly and the results will not be reproducible from one run of the code to the next.

3. In addition to the two requested variables, this code creates two other variables which are indispensable for these calculations but will probably also prove useful to you for additional analyses: who_lead is the lender ID number for the lead lender on each loan, and loan_year is just the year extracted from the loan date.

Comment

Mo Gad

Join Date: Feb 2019

Posts: 16
#5

19 Feb 2019, 14:50

Thank you very much Clyde Schechter for your help in this. The code is spot on and works nicely.

2. Your instructions leave open the possibility that when selecting the "top 5" partners, there will be a tie for 5th place. If that happens, this code will break this tie randomly and the results will not be reproducible from one run of the code to the next.

You are right about this. Every time I run the code I get different results for the second variable. To avoid this issue, I would like to construct two variables as follows:

1- partner_health_1

- identify all partners with whom the lead_lender has worked at least twice in the past three years. Then get the average profit of these partners. If none of the partners have worked more than once I need this variable to be zero.

2- partner_health_2

- identify all partners with whom the lead_lender has worked in the past three years. Then get the average profit of these partners.

Thanks in advance
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30170

19 Feb 2019, 15:12

I believe the following does it:

Code:

//    CREATE A VARIABLE SHOWING WHO THE LEAD LENDER IS IN EACH LOAN
by Loan_ID (Lead_lender), sort: assert (Lead_lender == 1) == (_n == _N)
by Loan_ID (Lead_lender): gen who_lead = Lender_ID[_N]

//    CREATE A YEAR VARIABLE
gen loan_year = yofd(Loan_date)

//    CALCULATE PORTFOLIO HEALTH
gen llbp = cond(Lead_lender, Borrower_Profits, .)
rangestat (mean) portfolio_health = llbp, by(who_lead) ///
    interval(loan_year -3 -1)
drop llbp

//    CREATE A DATA SET ALL PARTNERS FOR EACH LEAD LENDER IN PAST 3 YEARS
tempfile dataset
save `dataset'
keep who_lead Loan_ID Lender_ID loan_year Lender_Profits
drop if who_lead == Lender_ID
tempfile copy
save `copy'
rangejoin loan_year -3 -1 using `copy', by(who_lead)
drop Lender_ID
by who_lead loan_year Lender_ID_U, sort: gen n_loans = _N
by who_lead loan_year Lender_ID_U: keep if _n == 1
keep who_lead loan_year Lender_ID_U Lender_Profits n_loans
drop if missing(Lender_ID_U)
rename Lender_ID_U Lender_ID
gen lender_profits_2 = cond(n_loans >= 2, Lender_Profits, .)
//    AND NOW GET AVERAGE LENDER PROFITS, REDUCING TO ONE OBSERVATION
//    PER LEADER PER YEAR
collapse (mean) partner_health_1 = lender_profits_2 ///
                partner_health_2 = Lender_Profits ///
                (max) n_loans, by(who_lead loan_year)
replace partner_health_1 = 0 if n_loans < 2
drop n_loans

//    REUNITE THIS WITH THE ORIGINAL DATA
merge 1:m who_lead loan_year using `dataset', assert(match using) nogenerate
order partner_health*, last
order loan_year, after(Loan_date)
order who_lead, after(Lead_lender)

Comment

Mo Gad

Join Date: Feb 2019
Posts: 16

20 Feb 2019, 05:07

Thank you very much Clyde Schechter for this very useful code!

I had to make a couple of changes:

1- for some reason rangejoin creates some duplicates (I couldn't figure out the reason). I had to drop the duplicates as follows:

Code:

rangejoin loan_year -3 -1 using `copy', by(who_lead)
duplicates drop Loan_ID Lender_ID_U loan_year_U,force

2- for the following part of the code, I think the focus should be on Lender_Profits_U not Lender_Profits, as follows:

Code:

gen lender_profits_2 = cond(n_loans >= 2, Lender_Profits_U, .)

//    AND NOW GET AVERAGE LENDER PROFITS, REDUCING TO ONE OBSERVATION
//    PER LEADER PER YEAR
collapse (mean) partner_health_1 = lender_profits_2 ///
                partner_health_2 = Lender_Profits_U ///
                (max) n_loans, by(who_lead loan_year)
replace partner_health_1 = 0 if n_loans < 2
drop n_loans

Thanks!

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#8

20 Feb 2019, 13:25

Yes, you are correct on both counts. Thank you.
Comment

Announcement

loans/portfolio/partners

Comment

Comment

Comment

Comment

Comment

Comment

Comment