Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a Reciprocal Dummy

    Hi all,

    I want to generate a dummy variable where over the past three years the the lead bank was a participant in a loan led by one of the current participants.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(facilityid companyid participantid) int datayear byte period
    71268  5978  5905 2000 1
    71268  5978 27454 2000 1
    71268  5978 33509 2000 1
    71268  5978  6001 2000 1
    71268  5978 29986 2000 1
    71590  6031 26335 2000 1
    71590  6031  5972 2000 1
    71590  6031  6532 2000 1
    71590  6031 33878 2000 1
    71590  6031 26175 2000 1
    71590  6031  6443 2000 1
    71590  6031 34769 2000 1
    71590  6031  7838 2000 1
    71590  6031  8861 2000 1
    71590  6031  8834 2000 1
    71590  6031  9849 2000 1
    71590  6031  8926 2000 1
    71590  6031 23679 2000 1
    71590  6031 23087 2000 1
    71590  6031  5851 2000 1
    71590  6031  7861 2000 1
    71590  6031  8857 2000 1
    71590  6031 23222 2000 1
    71590  6031 30898 2000 1
    71590  6031 29380 2000 1
    71590  6031  7829 2000 1
    71590  6031  7855 2000 1
    71590  6031  7922 2000 1
    71590  6031  5893 2000 1
    71592  6031  6443 2000 1
    71592  6031  9849 2000 1
    71592  6031 23222 2000 1
    71592  6031 33878 2000 1
    71592  6031  7855 2000 1
    71592  6031  5851 2000 1
    71592  6031 26175 2000 1
    71592  6031 34769 2000 1
    71592  6031 26335 2000 1
    71592  6031 23679 2000 1
    71592  6031  8857 2000 1
    71592  6031 29380 2000 1
    71592  6031  5893 2000 1
    71592  6031  8926 2000 1
    71592  6031  7829 2000 1
    71592  6031  8861 2000 1
    71592  6031 23087 2000 1
    71592  6031  6532 2000 1
    71592  6031  8834 2000 1
    71592  6031  7861 2000 1
    71592  6031  7838 2000 1
    71592  6031  9197 2000 1
    71592  6031  5972 2000 1
    71592  6031 30898 2000 1
    71592  6031  7922 2000 1
    71592  6031 84644 2000 1
    73007  7827 12316 2000 1
    73007  7827  5851 2000 1
    73007  7827 21976 2000 1
    73007  7827  6827 2000 1
    73567  7827 12316 2000 1
    73567  7827  5851 2000 1
    73567  7827  6827 2000 1
    73567  7827 21976 2000 1
    75031 24010  6443 2000 1
    75031 24010 32641 2000 1
    75031 24010 33982 2000 1
    75032 24010 32641 2000 1
    75032 24010 33982 2000 1
    75032 24010  6179 2000 1
    75032 24010  6443 2000 1
    75596  6179  7873 2000 1
    75596  6179 30898 2000 1
    75596  6179  6123 2000 1
    75596  6179  7827 2000 1
    75596  6179  7878 2000 1
    75596  6179  6532 2000 1
    75596  6179  7855 2000 1
    75596  6179  1468 2000 1
    75596  6179 84644 2000 1
    75596  6179  5893 2000 1
    75596  6179 29112 2000 1
    75596  6179  5905 2000 1
    75596  6179 30309 2000 1
    75596  6179  7860 2000 1
    75596  6179  8873 2000 1
    75596  6179 18874 2000 1
    75596  6179  8857 2000 1
    75596  6179 17337 2000 1
    75596  6179  7861 2000 1
    75596  6179  6089 2000 1
    75596  6179  5848 2000 1
    75596  6179  5889 2000 1
    75596  6179  5851 2000 1
    75596  6179  7845 2000 1
    75596  6179  6895 2000 1
    75596  6179  8834 2000 1
    75596  6179  7894 2000 1
    75596  6179  9141 2000 1
    75596  6179  6050 2000 1
    75598  5893 18642 2000 1
    end
    Companyid Participantid
    5978 5905
    So company 5978 has a link with 5905, but I am wondering if in a period of three years 5905 also created a link with 5978.

    I already created the three year period in the following way:

    Code:
    gen period=1 if inrange(datayear, 2000, 2002)
    replace period=2 if inrange(datayear, 2001, 2003)
    replace period=3 if inrange(datayear, 2002, 2004)
    replace period=4 if inrange(datayear, 2003, 2005)
    replace period=5 if inrange(datayear, 2004, 2006)
    replace period=6 if inrange(datayear, 2005, 2007)
    replace period=7 if inrange(datayear, 2006, 2008)
    replace period=8 if inrange(datayear, 2007, 2009)
    replace period=9 if inrange(datayear, 2008, 2010)
    replace period=10 if inrange(datayear, 2009, 2011)
    replace period=11 if inrange(datayear, 2010, 2012)
    replace period=12 if inrange(datayear, 2011, 2013)
    replace period=13 if inrange(datayear, 2012, 2014)
    replace period=14 if inrange(datayear, 2013, 2015)
    I followed with creating variables where the lead and participant are matched, and the other way around:

    Code:
    egen leaderparticipantid = concat(companyid participantid), punct(,)
    egen participantleaderid = concat(participantid companyid), punct(,)
    In short: I want to create a variable where i can find whether the leader has been a participant, and if in this case the lead bank has been a participant in one of his loans.
    Example:

    leadparticipantid participantleaderid dummyvariable Explaination
    1,2 2,1 1
    1,3 3,2 0
    2,4 4,3 1 Here is 1, because it matches
    3,4 in the leadpart variable
    2,6 6,2 0
    3,4 4,3 1

    This is of course a very simplified example. If this is a success I still need to find if this holds for certain periods.

    I tried using Vlookup, but that wasn't a success, at least with the knowledge I have.
    Furthermore, if I could swap the numbers in the participantleaderid around it would make life a lot easier.

    I hope you are able to help me out!

    Kind regards,

    David de Vos


  • #2
    I'm not sure I understand what you want. I think what you are looking for is this: look at each observation's company id and participant id. Search the rest of the data for that period and indicate whether or not there is another observation with the same two id's but in reverse order.



    Note that in your example data, there aren't actually instances of this, so the result is always 0. (Which is one reason why I'm not sure I've understood what you want.)

    Note also that your code for generating the period variable is both unwieldy and almost certainly wrong. Think about what happens to an observation with year = 2003. With the first -gen- command you get period set to missing value. The first -replace- command finds 2003 within range, so period now gets set to 1. The second -replace- command also finds 2003 within range, so period now gets reset to 2. The third replace command also finds 2003 within range, so period is now reset to 3. The fourth and following, do not find 2003 within range, so period remains 3. In short, this lengthy code is just a long-winded way of saying -gen period = year - 2000-, which, I'm pretty sure is not really what you want.

    If I have correctly understood what you want, the following will do it:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(facilityid companyid participantid) int datayear
    71268  5978  5905 2000
    71268  5978 27454 2000
    71268  5978 33509 2000
    71268  5978  6001 2000
    71268  5978 29986 2000
    71590  6031 26335 2000
    71590  6031  5972 2000
    71590  6031  6532 2000
    71590  6031 33878 2000
    71590  6031 26175 2000
    71590  6031  6443 2000
    71590  6031 34769 2000
    71590  6031  7838 2000
    71590  6031  8861 2000
    71590  6031  8834 2000
    71590  6031  9849 2000
    71590  6031  8926 2000
    71590  6031 23679 2000
    71590  6031 23087 2000
    71590  6031  5851 2000
    71590  6031  7861 2000
    71590  6031  8857 2000
    71590  6031 23222 2000
    71590  6031 30898 2000
    71590  6031 29380 2000
    71590  6031  7829 2000
    71590  6031  7855 2000
    71590  6031  7922 2000
    71590  6031  5893 2000
    71592  6031  6443 2000
    71592  6031  9849 2000
    71592  6031 23222 2000
    71592  6031 33878 2000
    71592  6031  7855 2000
    71592  6031  5851 2000
    71592  6031 26175 2000
    71592  6031 34769 2000
    71592  6031 26335 2000
    71592  6031 23679 2000
    71592  6031  8857 2000
    71592  6031 29380 2000
    71592  6031  5893 2000
    71592  6031  8926 2000
    71592  6031  7829 2000
    71592  6031  8861 2000
    71592  6031 23087 2000
    71592  6031  6532 2000
    71592  6031  8834 2000
    71592  6031  7861 2000
    71592  6031  7838 2000
    71592  6031  9197 2000
    71592  6031  5972 2000
    71592  6031 30898 2000
    71592  6031  7922 2000
    71592  6031 84644 2000
    73007  7827 12316 2000
    73007  7827  5851 2000
    73007  7827 21976 2000
    73007  7827  6827 2000
    73567  7827 12316 2000
    73567  7827  5851 2000
    73567  7827  6827 2000
    73567  7827 21976 2000
    75031 24010  6443 2000
    75031 24010 32641 2000
    75031 24010 33982 2000
    75032 24010 32641 2000
    75032 24010 33982 2000
    75032 24010  6179 2000
    75032 24010  6443 2000
    75596  6179  7873 2000
    75596  6179 30898 2000
    75596  6179  6123 2000
    75596  6179  7827 2000
    75596  6179  7878 2000
    75596  6179  6532 2000
    75596  6179  7855 2000
    75596  6179  1468 2000
    75596  6179 84644 2000
    75596  6179  5893 2000
    75596  6179 29112 2000
    75596  6179  5905 2000
    75596  6179 30309 2000
    75596  6179  7860 2000
    75596  6179  8873 2000
    75596  6179 18874 2000
    75596  6179  8857 2000
    75596  6179 17337 2000
    75596  6179  7861 2000
    75596  6179  6089 2000
    75596  6179  5848 2000
    75596  6179  5889 2000
    75596  6179  5851 2000
    75596  6179  7845 2000
    75596  6179  6895 2000
    75596  6179  8834 2000
    75596  6179  7894 2000
    75596  6179  9141 2000
    75596  6179  6050 2000
    75598  5893 18642 2000
    12345  18642 5893 2000
    end
    
    // gen byte wanted = .
    // forvalues i = 1/`=_N' {
    //     count if participantid == companyid[`i'] & companyid == participantid[`i'] ///
    //         & datayear - datayear[`i'] < 3
    //     replace wanted = `r(N)' > 0 in `i'
    // }
    //
    
    rename (companyid participantid) (participantid companyid)
    tempfile copy
    save `copy'
    
    rename (companyid participantid) (participantid companyid)
    
    rangejoin datayear 0 2 using `copy', by(companyid participantid)
    gen byte wanted = !missing(datayear_U)
    drop *_U
    Note: I have added an extra fake observation at the end of your -dataex- so that there will be at least one instance of the phenomenon you are looking for. I have also removed your period variable; right or wrong, it plays no role in the solution.

    Added: To run this code you will need the -rangejoin- command, by Robert Picard, available from SSC. To use -rangejoin- you will also need the -rangestat- command, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    Comment


    • #3
      1- How come, if use the -rangejoin- function the amount of observations goes from 51000 tot 81000?
      -rangejoin- pairs up each observation with all of the "reciprocal" observations within the three year period. So the more matches you have, the larger the data set will get. In your example data, there actually were no matches at all, so as I explained in #2, I added in one match to the data just for demonstration purposes. That didn't expand the data set. So I forgot about the fact that multiple matches for an observation would result in expansion. You can fix that with the following code (modifications to original in italics.)

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(facilityid companyid participantid) int datayear
      71268  5978  5905 2000
      71268  5978 27454 2000
      71268  5978 33509 2000
      71268  5978  6001 2000
      71268  5978 29986 2000
      71590  6031 26335 2000
      71590  6031  5972 2000
      71590  6031  6532 2000
      71590  6031 33878 2000
      71590  6031 26175 2000
      71590  6031  6443 2000
      71590  6031 34769 2000
      71590  6031  7838 2000
      71590  6031  8861 2000
      71590  6031  8834 2000
      71590  6031  9849 2000
      71590  6031  8926 2000
      71590  6031 23679 2000
      71590  6031 23087 2000
      71590  6031  5851 2000
      71590  6031  7861 2000
      71590  6031  8857 2000
      71590  6031 23222 2000
      71590  6031 30898 2000
      71590  6031 29380 2000
      71590  6031  7829 2000
      71590  6031  7855 2000
      71590  6031  7922 2000
      71590  6031  5893 2000
      71592  6031  6443 2000
      71592  6031  9849 2000
      71592  6031 23222 2000
      71592  6031 33878 2000
      71592  6031  7855 2000
      71592  6031  5851 2000
      71592  6031 26175 2000
      71592  6031 34769 2000
      71592  6031 26335 2000
      71592  6031 23679 2000
      71592  6031  8857 2000
      71592  6031 29380 2000
      71592  6031  5893 2000
      71592  6031  8926 2000
      71592  6031  7829 2000
      71592  6031  8861 2000
      71592  6031 23087 2000
      71592  6031  6532 2000
      71592  6031  8834 2000
      71592  6031  7861 2000
      71592  6031  7838 2000
      71592  6031  9197 2000
      71592  6031  5972 2000
      71592  6031 30898 2000
      71592  6031  7922 2000
      71592  6031 84644 2000
      73007  7827 12316 2000
      73007  7827  5851 2000
      73007  7827 21976 2000
      73007  7827  6827 2000
      73567  7827 12316 2000
      73567  7827  5851 2000
      73567  7827  6827 2000
      73567  7827 21976 2000
      75031 24010  6443 2000
      75031 24010 32641 2000
      75031 24010 33982 2000
      75032 24010 32641 2000
      75032 24010 33982 2000
      75032 24010  6179 2000
      75032 24010  6443 2000
      75596  6179  7873 2000
      75596  6179 30898 2000
      75596  6179  6123 2000
      75596  6179  7827 2000
      75596  6179  7878 2000
      75596  6179  6532 2000
      75596  6179  7855 2000
      75596  6179  1468 2000
      75596  6179 84644 2000
      75596  6179  5893 2000
      75596  6179 29112 2000
      75596  6179  5905 2000
      75596  6179 30309 2000
      75596  6179  7860 2000
      75596  6179  8873 2000
      75596  6179 18874 2000
      75596  6179  8857 2000
      75596  6179 17337 2000
      75596  6179  7861 2000
      75596  6179  6089 2000
      75596  6179  5848 2000
      75596  6179  5889 2000
      75596  6179  5851 2000
      75596  6179  7845 2000
      75596  6179  6895 2000
      75596  6179  8834 2000
      75596  6179  7894 2000
      75596  6179  9141 2000
      75596  6179  6050 2000
      75598  5893 18642 2000
      12345  18642 5893 2000
      end
      
      gen long obs_no = _n
      rename (companyid participantid) (participantid companyid)
      tempfile copy
      save `copy'
      
      rename (companyid participantid) (participantid companyid)
      
      rangejoin datayear 0 2 using `copy', by(companyid participantid)
      gen byte wanted = !missing(datayear_U)
      drop *_U
      by obs_no, sort: keep if _n == 1
      This will bring you back down to the original number of observations.

      2- In order to gen the second byte wanted:

      Code:
      gen byte wanted = !missing(datayear_U)
      do I have to drop the first generate byte wanted in the `copy' file? (Because if I do not, I get " variable wanted already defined")
      Please re-read my response in #2. I stated there quite clearly that I modified your data example to remove the wanted variable that was in there. That was done precisely because you can't -generate- a variable that already exists, and the existing variable was of no use. So you not only have to remove it in the `copy' file, you have to remove it altogether.

      Furthermore, what do you recommend to use for the period problem?
      The "period" does not actually have any relevance to this calculation. You are looking to match up observations when the date in one is within three years of the other. The -datayear 0 2- part of the -rangejoin- command takes care of that for you much more simply than any kind of period variable would. Your problem does not benefit from defining fixed 3-year periods like 2000-2002 etc because, for example, you are not looking to match observations dated between 2000 and 2002 with each other: you want to match 2000 with 2000, 2001, and 2002. But you want to match 2001 with 2001, 2002, and 2003, but not with 2000. The fixed interval 2000-2002 is only applicable to observations dated 2000, and the way -rangejoin- works, you don't need to specify that with variables identifying a three year period to which the observation belongs.

      Comment


      • #4
        Code:
        gen long obs_no = _n
        rangestat (count) obs_no, by(companyid borrowercompanyid) interval(datayear -3 -1)
        replace obs_no_count = 0 if missing(obs_no_count)
        Note: This assumes that "past three years" means from three years ago through one year ago, excluding the current year. If you want from two years ago through current year, change -3 -1 to -2 0.

        Comment

        Working...
        X