  • Lag Values Excluding Observation

    Hi,

    First, I am sorry for not being able to find a better title than this, but my problem is a little more complicated. The data example below shows directors (with unique director ids) belonging to firms (with unique firm ids). Fiscal years are denoted by fyear, while tsize, ind, city, and perf are the size tercile the company belongs to, its industry code, the city its headquarters are located in, and its past performance, respectively.

    I am trying to find the determinants of director hires. Specifically, I want to know how a director's positions at other firms in the previous year influence the probability of being selected by the current company (assuming the director is newly hired). For this reason, I need to compute whether, in the previous year, the director served at a company of the same size, and to do the same separately for industry and city. For performance, I have to calculate the average performance of all companies the director served at in the previous year.

    Here's the example dataset.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(gvkey fyear directorid) byte(tsize ind) str4 city float perf
    1001 2000 12345 2 12 "Hel"    .2
    1001 2000   123 2 12 "Hel"    .3
    1001 2000  4531 2 12 "Hel"    .1
    1002 2000    98 3 15 "Tamp"   .1
    1002 2000 12345 3 15 "Tamp"  .05
    1002 2000   123 3 15 "Tamp"  .04
    1003 2000   100 1  9 "Hel"   .12
    1003 2000    98 1  9 "Hel"   .02
    1003 2000   109 1  9 "Hel"  -.01
    1001 2001 12345 2 12 "Hel"    .2
    1001 2001   123 2 12 "Hel"    .3
    1001 2001  4531 2 12 "Hel"    .1
    1002 2001    98 3 15 "Tamp"   .1
    1002 2001 12345 3 15 "Tamp"  .05
    1002 2001   123 3 15 "Tamp"  .04
    1003 2001   100 1  9 "Hel"   .12
    1003 2001    98 1  9 "Hel"   .02
    1003 2001   109 1  9 "Hel"  -.01
    end

  • #2
    Thank you for using -dataex- on your first post!

    I believe you want the following:
    Code:
    tempfile copy
    save `copy'
    
    //    PAIR UP EACH OBSERVATION WITH ALL OF THE SAME DIRECTOR'S OBSERVATIONS FROM
    //    THE PRECEDING YEAR
    rangejoin fyear -1 -1 using `copy', by(directorid) prefix(prior_)
    
    //    DETERMINE WHETHER ANY WERE IN THE SAME INDUSTRY OR FIRM OF SAME SIZE TERCILE
    by directorid fyear, sort: egen byte same_industry = max(ind == prior_ind) if !missing(prior_ind)
    by directorid fyear: egen byte same_size = max(tsize == prior_tsize) if !missing(prior_tsize)
    
    //    CALCULATE MEAN PERFORMANCE IN PRIOR YEAR
    by directorid fyear: egen mean_prior_perf = mean(prior_perf)
    
    //    REDUCE TO ONE OBSERVATION PER DIRECTOR PER YEAR
    drop prior_*
    by directorid fyear: keep if _n == 1

    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
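
    (Both can be installed with -ssc install rangejoin- and -ssc install rangestat-.) The question also asks about city; following the same pattern, a minimal sketch (assuming the city and prior_city variables as in the example data, and inserted before the -drop prior_*- line) would be:
    Code:
    //    SKETCH ONLY: SAME-CITY INDICATOR, ANALOGOUS TO same_industry AND same_size
    by directorid fyear: egen byte same_city = max(city == prior_city) if !missing(prior_city)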



    • #3
      Thank you so much Clyde. This works like a charm. One more question, how would I assign 1,000 random directors (not picked) to every director who was picked. I have a big list of all directors along with their information and want to know how to pair the picked director with 1,000 random directors and then run a logit model.



      • #4
        How large is the file of directors? How many are picked? How many are unpicked?



        • #5
          The list of picked directors is around 80,000 over 14 years, but for each director selected I have to randomly assign 1,000 other directors from the 80,000 as "not picked" ones. I am trying to replicate a paper which uses this technique to proxy for the other potential candidates who could have been picked but weren't.



          • #6
            I don't understand. There are 80,000 directors that are "picked" and the same list is also the "not picked" directors? I don't get it at all.

            And in any case, you do realize that when you start with 80,000 picked directors and you match each of them with 1,000 other directors (regardless of where those come from), you end up with 80 million pairs of directors. Are you sure that is what you want?



            • #7
              Sorry for the confusion here. There are 80,000 new directors who are available to be picked. Some are picked by one firm and some by another. I have to retain the director who was chosen and randomly add 1,000 directors from the 79,999 left as "unpicked" directors. It will lead to 80 million pairs, and that's exactly what I am looking for.



              • #8
                You will potentially run up against two problems here: memory issues and speed issues. So the first thing is to strip the data set down to just the directorid's. Later you can merge the results back onto the original full data set to get the other variables (see the sketch at the end of this post).

                The following code illustrates how you can do it. I have created a toy data set with just 8000 directors to demonstrate the code. (I don't have the patience to run it with 80,000.) You would actually begin by using your real data set, keeping only the directorid variable, dropping any duplicates and then running this code from where it says "MATCHING BEGINS HERE."

                Code:
                //    CREATE TOY DATA SET
                clear*
                set obs 8000
                
                gen long directorid = _n
                
                //    MATCHING BEGINS HERE
                tempfile copy
                rename directorid unpicked
                save `copy'
                
                rename unpicked picked
                gen filename = `"`copy'"'
                
                set seed 1234
                
                capture program drop one_match
                program define one_match
                    local copy = filename[1]
                    cross using `copy'
                    drop if picked == unpicked
                    gen double shuffle = runiform()
                    sort shuffle
                    keep in 1/1000
                    drop shuffle
                    exit
                end
                
                runby one_match, by(picked) status    
                drop filename

                At no time will the data in memory be appreciably larger than just 80,000,000 observations, and those observations will each only contain two longs and a double. As for running time, with 8,000 observations it takes a bit under 2 minutes on my nothing-special computer. Since the execution time should go up as the square of the number of observations, I would expect this to take around 3 hours with 80,000 observations.

                The -status- option on -runby- will cause Stata to give you ongoing progress reports, so you won't have to wonder whether your computer is hung or is still working, and the progress report also gives you an estimate of the time remaining.

                -runby- is written by Robert Picard and me, and is available from SSC.
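
                As a sketch of the strip-down and merge-back steps described above (the file name directors_full.dta is hypothetical; adjust it to your own data set):
                Code:
                //    SKETCH ONLY (directors_full.dta IS A HYPOTHETICAL NAME FOR YOUR FULL DATA SET)
                //    PREPARATION: REDUCE TO JUST THE IDs BEFORE MATCHING
                use directors_full, clear
                keep directorid
                duplicates drop
                
                //    ... RUN THE MATCHING CODE ABOVE FROM "MATCHING BEGINS HERE" ...
                
                //    MERGE THE OTHER VARIABLES BACK FOR THE UNPICKED DIRECTORS
                //    (ASSUMES directors_full.dta HAS ONE OBSERVATION PER directorid; OTHERWISE
                //    MERGE AGAINST A DIRECTOR-LEVEL FILE INSTEAD)
                rename unpicked directorid
                merge m:1 directorid using directors_full, keep(master match) nogenerate
                rename directorid unpicked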



                • #9
                  Thank you so much once again, this works great. What would I have to do to make the picked and unpicked directors appear as one variable taking the values 1 and 0, respectively?



                  • #10
                    I don't follow what you want here. Right now you have a data set where each of the 80,000 picked directorid's appears in one variable in a block of 1000 observations--the other variable in the data set is the id of the matched unpicked director. It appears that you want to somehow stack them into one variable with a new variable that indicates picked vs unpicked status. But if you do that, you will lose the information about which picked director a given unpicked one is actually paired with. What am I missing here? Can you maybe show what the result you want would look like?



                    • #11
                      The data have a couple of other variables, namely company code and fiscal year. I want to create a dummy equal to 1 if the firm picks the director and 0 otherwise. By definition, each firm will pick a director, and I will have 1,000 unpicked directors for each company.



                      • #12
                        Will something like this work:

                        Code:
                        capture program drop one_match
                        program define one_match
                            local copy = filename[1]
                            cross using `copy'
                            gen double shuffle = runiform()
                            sort shuffle
                            keep in 1/1001
                            drop shuffle
                            exit
                        end
                        
                        gen dummy=(picked==unpicked)
                        



                        • #13
                          I have divided the data by fiscal year to make things work more smoothly. Here's an example for 2005.

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input str6 gvkey float fyear double directorid
                          "001004" 2005 346677
                          "001004" 2005 487132
                          "001004" 2005 320199
                          "164506" 2005  63027
                          "164506" 2005  51772
                          "164506" 2005  51647
                          "164506" 2005 374419
                          "164506" 2005  51697
                          "164506" 2005 374415
                          "164506" 2005 374410
                          "164506" 2005 374412
                          "164506" 2005   2455
                          "065569" 2005 334583
                          "021686" 2005 347086
                          "061523" 2005 345630
                          "011903" 2005  14468
                          "001449" 2005 333115
                          "025180" 2005  36502
                          "011361" 2005 449622
                          "004090" 2005  91707
                          "004090" 2005 373195
                          "065576" 2005  65022
                          "065072" 2005 443707
                          "144520" 2005  89327
                          "061407" 2005 370434
                          "061407" 2005 347235
                          "162076" 2005  32197
                          "162076" 2005 372988
                          "029478" 2005 444680
                          "029478" 2005 349916
                          "030584" 2005 204185
                          "030584" 2005 335869
                          "009899" 2005  51246
                          "009899" 2005  33072
                          "009899" 2005  36684
                          "009899" 2005  36525
                          "029274" 2005 200742
                          "022308" 2005 343872
                          "182309" 2005  41971
                          "147101" 2005 516078
                          "147101" 2005 516076
                          "142156" 2005 372911
                          "142156" 2005 372903
                          "061253" 2005 141369
                          "061253" 2005 339432
                          "110760" 2005 347143
                          "063643" 2005 350093
                          "063643" 2005  51386
                          "013619" 2005  36560
                          "013619" 2005 140728
                          "121553" 2005 443657
                          "001081" 2005 331921
                          "141846" 2005 203582
                          "141846" 2005 349883
                          "129632" 2005 374956
                          "014999" 2005 324239
                          "014999" 2005 442517
                          "001094" 2005 446200
                          "032459" 2005 510359
                          "032459" 2005 510558
                          "031564" 2005  33949
                          "124374" 2005 346820
                          "230796" 2005 342542
                          "180405" 2005 342489
                          "001706" 2005 331619
                          "001706" 2005 141698
                          "014753" 2005  57662
                          "014753" 2005 310278
                          "001119" 2005  57662
                          "001119" 2005 310278
                          "163628" 2005 142264
                          "163628" 2005 483715
                          "061726" 2005 488605
                          "061726" 2005 492115
                          "061726" 2005 349624
                          "161037" 2005 180752
                          "012540" 2005  37206
                          "012540" 2005  49351
                          "012540" 2005 182352
                          "133724" 2005 271624
                          "003905" 2005  34718
                          "003905" 2005 326472
                          "003905" 2005 440347
                          "124934" 2005 347595
                          "124934" 2005 347606
                          "124934" 2005 347597
                          "124934" 2005 347599
                          "030576" 2005 204208
                          "030576" 2005 490428
                          "030576" 2005 204217
                          "161067" 2005 349769
                          "161067" 2005 451616
                          "145977" 2005 206315
                          "145977" 2005 206327
                          "002831" 2005 596717
                          "002831" 2005 596712
                          "002831" 2005 596719
                          "002831" 2005 596720
                          "163679" 2005  61752
                          "023616" 2005 347841
                          end



                          • #14
                            OK. So here's some code, again starting with a toy data set. Use your actual data and go from "DATA MANAGEMENT BEGINS HERE"--although you may need to modify the -keep- commands so as to retain other important variables. The toy data has 2 firms with 5 years of data, and in each firm-year combination there is 1 picked director matched to 3 unpicked.

                            Code:
                            //    CREATE TOY DATA SET TO DEMONSTRATE CODE
                            clear*
                            tempfile controls
                            set obs 10
                            gen unpicked = _n
                            save `controls'
                            clear
                            set obs 10
                            gen firm_id = (_n <= 5)
                            by firm_id, sort: gen year = 2000+_n
                            gen picked = _n
                            cross using `controls'
                            drop if picked == unpicked
                            set seed 1234
                            gen double shuffle = runiform()
                            by firm_id year (shuffle), sort: keep if _n <= 3
                            drop shuffle
                            
                            //    DATA MANAGEMENT BEGINS HERE
                            preserve
                            keep firm_id year unpicked
                            rename unpicked directorid
                            gen byte selection = 0
                            tempfile unpicked
                            save `unpicked'
                            
                            restore
                            keep firm_id year picked
                            duplicates drop
                            rename picked directorid
                            gen byte selection = 1
                            append using `unpicked'
                            sort firm_id year selection directorid



                            • #15
                              Thank you so much Clyde. I am all set now.

