Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • loans/portfolio/partners

    Hi all,

    I have data that looks like the following
    Loan_ID Loan_date Lender_ID Lead_lender Borrower_ID Borrower_Profits Lender_Profits VAR1 VAR2
    1 31/10/2008 231 1 1100 0.1 0.12
    1 31/10/2008 234 0 1100 0.1 0.13
    1 31/10/2008 235 0 1100 0.1 0.1
    5 20/12/2009 150 0 3000 0.2 0.07
    5 20/12/2009 234 0 3000 0.2 0.11
    5 20/12/2009 231 1 3000 0.2 0.07
    5 20/12/2009 400 0 3000 0.2 0.09
    7 01/10/2010 120 0 4000 0.05 0.12
    7 01/10/2010 125 0 4000 0.05 0.14
    7 01/10/2010 240 1 4000 0.05 0.11
    7 01/10/2010 234 0 4000 0.05 0.1
    9 01/10/2011 231 1 1500 0.15 0.01
    9 01/10/2011 234 0 1500 0.15 0.2
    9 01/10/2011 250 0 1500 0.15 0.15
    3 01/10/2012 231 1 2300 0.1 0.13 0.175 0.136
    3 01/10/2012 234 0 2300 0.1 0.01 0.175 0.136
    3 01/10/2012 270 0 2300 0.1 0.09 0.175 0.136

    This data represents loan data where multiple lenders (identified by Lender_ID) are giving loans to one borrower (identified by Borrower_ID). For example, where Loan_ID=1 there are 3 different lenders providing credit to one borrower. In each loan, there is one lead lender (i.e. main lender) identified by Lead_lender column and the remaining lenders are partners. Now I want to construct two variables as follows:

    1. for each lead_lender, I need the average borrower_profits for all borrowers that the lead lender has been giving loans to over the past 3 years. For example, in the last loan (loan_id=3) the lead lender has an ID=231. This lender is also the lead lender in loan_id=9 and loan_id=5 in the past three years. So the average profits for his portfolio=(0.15+0.20) /2 = 0.175. See VAR1 column.

    2. for each lead_lender, I need the average partner lender_profits for lenders that are most frequently working with the lead lender (say top 5 by frequency) over the past 3 years. For example, in the last loan (loan_id=3) the lead lender has an ID=231. This lender is frequently working with lender_id=234. So the average profits for his partners=(0.20+0.10+0.11) / 3 = 0.136. See VAR2 column.

    Hope it is clear

    Thank you for help

    Gad

  • #2
    I'm confused. Lender 231 is the lead lender on loans 1, 3, 5, and 9. So why, in calculating average profits, are loans 1 and 3 excluded? Also, when lender 231 participates in a loan, it is always the lead in your example. If a lender participates as a non-lead in some loans, do you also want to include the profits on those loans in the average, or only those loans in which the lender is a lead?

    I'm similarly confused about the second variable. Lender 234 participates in loans 1, 3, 5, 7, and 9. Why do you not include loans 7 and 1 in the calculation?

    Also, since you want a calculation based on the top 5 partners, it would make sense to post example data where each lead lender (or at least 1 lead lender) actually has more than five different partners

    When you post back, please use the -dataex- command to show your example data, rather than typing out a table that is both difficult for the eye to read and a nuisance to import into Stata. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Thank you Clyde Schechter . That is very helpful!

      I'm confused. Lender 231 is the lead lender on loans 1, 3, 5, and 9. So why, in calculating average profits, are loans 1 and 3 excluded?

      1 and 3 are excluded because I am looking at t-1,t-2 and t-3 only (i.e. the past three years from loan_date). This will help me to identify how healthy was the lead-lender portfolio of borrowers over the past 3 years.

      Also, when lender 231 participates in a loan, it is always the lead in your example. If a lender participates as a non-lead in some loans, do you also want to include the profits on those loans in the average, or only those loans in which the lender is a lead?

      I need only these cases where lender 231 was the lead-lender in the past three years from loan date. So if lender 231 was a partner in any of the previous three years it shouldn't be included.

      I'm similarly confused about the second variable. Lender 234 participates in loans 1, 3, 5, 7, and 9. Why do you not include loans 7 and 1 in the calculation?

      loan 7 should be included but loan 1 should not as it is t-4 and my focus is on t-1,t-2 and t-3.

      This will help me to identify how healthy was the lead-lender top 5 partners over the past 3 years from loan date.

      Since you want a calculation based on the top 5 partners, it would make sense to post example data where each lead lender actually has more than five different partners

      I did it, thanks! Please see the example below:


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte Loan_ID int(Loan_date Lender_ID) byte Lead_lender int Borrower_ID double(Borrower_Profits Lender_Profits) byte(VAR1 VAR2)
       1 17836 231 1 1100  .1 .12 . .
       1 17836 233 0 1100  .1  .1 . .
       1 17836 234 0 1100  .1 .05 . .
       1 17836 250 0 1100  .1 .13 . .
       1 17836 125 0 1100  .1 .12 . .
       1 17836 235 0 1100  .1  .1 . .
       5 18251 150 0 3000  .2 .07 . .
       5 18251 234 0 3000  .2 .11 . .
       5 18251 231 1 3000  .2 .07 . .
       5 18251 400 0 3000  .2 .09 . .
       7 18536 231 0 4000 .05 .12 . .
       7 18536 125 0 4000 .05 .14 . .
       7 18536 240 1 4000 .05 .11 . .
       7 18536 234 0 4000 .05  .1 . .
       9 18901 231 1 1500 .15 .01 . .
       9 18901 230 0 1500 .15 .07 . .
       9 18901 125 0 1500 .15 .11 . .
       9 18901 250 0 1500 .15 .02 . .
       9 18901 233 0 1500 .15 .03 . .
       9 18901 234 0 1500 .15  .2 . .
       9 18901 270 0 1500 .15 .15 . .
       3 19267 231 1 2300 .04 .13 . .
       3 19267 234 0 2300 .04 .01 . .
       3 19267 270 0 2300 .04 .09 . .
       3 19267 233 0 2300 .04  .1 . .
       3 19267 250 0 2300 .04 .01 . .
       3 19267 230 0 2300 .04 .08 . .
      22 19632 231 1 2500 .09 .12 . .
      22 19632 234 0 2500 .09 .02 . .
      22 19632 270 0 2500 .09  .1 . .
      22 19632 250 0 2500 .09 .03 . .
      22 19632 120 0 2500 .09 .05 . .
      end
      format %tdnn/dd/CCYY Loan_date
      In this example:

      1- If I want to calculate VAR1 (portfolio health) at loan_ID=22,
      - I need to identify loans where lender 231 is the lead-lender in the last three years (2012,2011,2010)
      - Then calculate average profit of the borrowers in these loans.

      2- If I want to calculate VAR2 (partners health) at loan_ID=22:
      - I need to identify loans where lender 231 is the lead-lender in the last three years (2012,2011,2010)
      - Identify the top 5 partners (most frequent non-lead-lenders) working with the same lead-lender (lender 231) over the past three years
      - Calculate the average profit of these partners over the past three years



      Thanks again!

      Comment


      • #4
        OK. Thanks for the clarifications. I think I get it now.

        Code:
        //    CREATE A VARIABLE SHOWING WHO THE LEAD LENDER IS IN EACH LOAN
        by Loan_ID (Lead_lender), sort: assert (Lead_lender == 1) == (_n == _N)
        by Loan_ID (Lead_lender): gen who_lead = Lender_ID[_N]
        
        //    CREATE A YEAR VARIABLE
        gen loan_year = yofd(Loan_date)
        
        //    CALCULATE PORTFOLIO HEALTH
        gen llbp = cond(Lead_lender, Borrower_Profits, .)
        rangestat (mean) portfolio_health = llbp, by(who_lead) ///
            interval(loan_year -3 -1)
        drop llbp
        
        //    CREATE A DATA SET OF TOP 5 PARTNERS FOR EACH LEAD LENDER
        tempfile dataset
        save `dataset'
        keep who_lead Loan_ID Lender_ID loan_year Lender_Profits
        drop if who_lead == Lender_ID
        tempfile copy
        save `copy'
        rangejoin loan_year -3 -1 using `copy', by(who_lead)
        drop Lender_ID
        by who_lead loan_year Lender_ID_U, sort: gen n_loans = _N
        by who_lead loan_year Lender_ID_U: keep if _n == 1
        gsort who_lead loan_year -n_loans
        by who_lead loan_year: keep if _n <= 5
        keep who_lead loan_year Lender_ID_U Lender_Profits
        drop if missing(Lender_ID_U)
        rename Lender_ID_U Lender_ID
        //    AND NOW GET AVERAGE LENDER PROFITS, REDUCING TO ONE OBSERVATION
        //    PER LEADER PER YEAR
        collapse (mean) partner_health = Lender_Profits, by(who_lead loan_year)
        
        //    REUNITE THIS WITH THE ORIGINAL DATA
        merge 1:m who_lead loan_year using `dataset', assert(match using) nogenerate
        order partner_health, last
        order loan_year, after(Loan_date)
        order who_lead, after(Lead_lender)
        Notes:

        1. To run this code you must install -rangestat- and -rangejoin-. The latter is by Robert Picard, and the former by him, Nick Cox, and Roberto Ferrer. Both are available from SSC.

        2. Your instructions leave open the possibility that when selecting the "top 5" partners, there will be a tie for 5th place. If that happens, this code will break this tie randomly and the results will not be reproducible from one run of the code to the next.

        3. In addition to the two requested variables, this code creates two other variables which are indispensable for these calculations but will probably also prove useful to you for additional analyses: who_lead is the lender ID number for the lead lender on each loan, and loan_year is just the year extracted from the loan date.

        Comment


        • #5
          Thank you very much Clyde Schechter for your help in this. The code is spot on and works nicely.

          2. Your instructions leave open the possibility that when selecting the "top 5" partners, there will be a tie for 5th place. If that happens, this code will break this tie randomly and the results will not be reproducible from one run of the code to the next.

          You are right about this. Every time I run the code I get different results for the second variable. To avoid this issue, I would like to construct two variables as follows:

          1- partner_health_1

          - identify all partners with whom the lead_lender has worked at least twice in the past three years. Then get the average profit of these partners. If none of the partners have worked more than once I need this variable to be zero.

          2- partner_health_2

          - identify all partners with whom the lead_lender has worked in the past three years. Then get the average profit of these partners.



          Thanks in advance

          Comment


          • #6
            I believe the following does it:

            Code:
            //    CREATE A VARIABLE SHOWING WHO THE LEAD LENDER IS IN EACH LOAN
            by Loan_ID (Lead_lender), sort: assert (Lead_lender == 1) == (_n == _N)
            by Loan_ID (Lead_lender): gen who_lead = Lender_ID[_N]
            
            //    CREATE A YEAR VARIABLE
            gen loan_year = yofd(Loan_date)
            
            //    CALCULATE PORTFOLIO HEALTH
            gen llbp = cond(Lead_lender, Borrower_Profits, .)
            rangestat (mean) portfolio_health = llbp, by(who_lead) ///
                interval(loan_year -3 -1)
            drop llbp
            
            //    CREATE A DATA SET ALL PARTNERS FOR EACH LEAD LENDER IN PAST 3 YEARS
            tempfile dataset
            save `dataset'
            keep who_lead Loan_ID Lender_ID loan_year Lender_Profits
            drop if who_lead == Lender_ID
            tempfile copy
            save `copy'
            rangejoin loan_year -3 -1 using `copy', by(who_lead)
            drop Lender_ID
            by who_lead loan_year Lender_ID_U, sort: gen n_loans = _N
            by who_lead loan_year Lender_ID_U: keep if _n == 1
            keep who_lead loan_year Lender_ID_U Lender_Profits n_loans
            drop if missing(Lender_ID_U)
            rename Lender_ID_U Lender_ID
            gen lender_profits_2 = cond(n_loans >= 2, Lender_Profits, .)
            //    AND NOW GET AVERAGE LENDER PROFITS, REDUCING TO ONE OBSERVATION
            //    PER LEADER PER YEAR
            collapse (mean) partner_health_1 = lender_profits_2 ///
                            partner_health_2 = Lender_Profits ///
                            (max) n_loans, by(who_lead loan_year)
            replace partner_health_1 = 0 if n_loans < 2
            drop n_loans
            
            //    REUNITE THIS WITH THE ORIGINAL DATA
            merge 1:m who_lead loan_year using `dataset', assert(match using) nogenerate
            order partner_health*, last
            order loan_year, after(Loan_date)
            order who_lead, after(Lead_lender)

            Comment


            • #7
              Thank you very much Clyde Schechter for this very useful code!

              I had to make a couple of changes:

              1- for some reason rangejoin creates some duplicates (I couldn't figure out the reason). I had to drop the duplicates as follows:

              Code:
              rangejoin loan_year -3 -1 using `copy', by(who_lead)
              duplicates drop Loan_ID Lender_ID_U loan_year_U,force
              2- for the following part of the code, I think the focus should be on Lender_Profits_U not Lender_Profits, as follows:

              Code:
              gen lender_profits_2 = cond(n_loans >= 2, Lender_Profits_U, .)
              
              //    AND NOW GET AVERAGE LENDER PROFITS, REDUCING TO ONE OBSERVATION
              //    PER LEADER PER YEAR
              collapse (mean) partner_health_1 = lender_profits_2 ///
                              partner_health_2 = Lender_Profits_U ///
                              (max) n_loans, by(who_lead loan_year)
              replace partner_health_1 = 0 if n_loans < 2
              drop n_loans

              Thanks!

              Comment


              • #8
                Yes, you are correct on both counts. Thank you.

                Comment

                Working...
                X