  • How to drop random years from panel data?

    I have a panel data set consisting of 125 countries and 36 years. I want to run an IV regression multiple times, randomly dropping 5 of the 36 years each time.

    At the moment (see code below), I run a loop 20 times and each round I drop the last year. How can I drop 5 random years each round (and put them back before the next round)?

    I had a look at -bootstrap-, but it doesn't seem to work here, since the variable fadum_avg and the other controls built from it have to be recalculated in each round of the loop.

    Code:
    postfile buffer coeff standart using mcs, replace
    
    forval x=0/20{
        tsset obs year
        drop if year>(2006-`x')
        bysort risocode: egen fadum_avg=mean(fadum)
        
    // use v, not x, so this loop does not overwrite the outer loop's counter
    foreach v of varlist all_Precip_jan-all_Precip_dec all_Temp_jan-all_Temp_dec{
        drop `v'_faavg
        gen `v'_faavg=`v'*fadum_avg
    }
        tsset obs year
        gen instrument=l.US_wheat_production*fadum_avg
        bysort risocode: egen ln_rgdpch_avg=mean(ln_rgdpch)
            
        gen oil_fadum_avg=oil_price_2011_USD*fadum_avg
        gen US_income_fadum_avg=USA_ln_income*fadum_avg
        gen US_democ_pres_fadum_avg=US_president_democ*fadum_avg
        
        local US_controls "oil_fadum_avg US_income_fadum_avg US_democ_pres_fadum_avg"
        local weather_controls "all_Precip_jan-all_Precip_dec all_Temp_jan-all_Temp_dec all_Precip_jan_faavg-all_Precip_dec_faavg all_Temp_jan_faavg-all_Temp_dec_faavg"    
        
    sort risocode year
        
        ivreg2 intra_state (wheat_aid=instrument) c.recipient_pc_cereals_prod_avg#i.year c.cereal_pc_import_quantity_avg#i.year c.ln_rgdpch_avg#i.year ///
        c.real_us_nonfoodaid_ecaid_avg#i.year c.real_usmilaid_avg#i.year `weather_controls' `US_controls' i.year##i.wb_region_n i.risocode_n ,cluster(risocode) ffirst
        outreg2 wheat_aid using "T2_TESTneu5.xls", se noast nocons lab dec(5) adds(KP F-Stat, e(rkf))
        
        post buffer (_b[wheat_aid]) (_se[wheat_aid])
        
    // drop the variables created in this round so they can be generated again in the next
        drop ln_rgdpch_avg
        drop instrument
        drop fadum_avg
        drop oil_fadum_avg
        drop US_income_fadum_avg
        drop US_democ_pres_fadum_avg
    }
    
    postclose buffer

  • #2
    See whether -sample- can do what you want. For example, from the documentation:

    sample 50, count by(sex) would draw a sample of size 50 for men and 50 for women.



    • #3
      Thank you Joro, I know about -sample- but I would not know how to use it in this context. How would I go about randomly selecting 5 years with -sample-?
      As far as I know, it only randomly selects observations.



      • #4
        Post some data we can toy around with (-dataex-) if you try it yourself and it does not work.

        It seems to me that if you say something like


        sample 31, count by(panel_id_variable)


        this should do the trick.
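        A minimal sketch of how -sample- could sit inside the repetition loop, with -preserve- and -restore- putting the full data back before each round. It uses the Grunfeld data (10 companies, 20 years) as a stand-in, so it keeps 15 of 20 years per company instead of 31 of 36; the seed and the tempfile name are arbitrary choices. Because -sample- draws separately within each company, different companies will generally lose different years.

        ```stata
        webuse grunfeld, clear
        set seed 2018
        tempfile results
        postfile buffer coeff se using `results', replace
        quietly forvalues j = 1/20 {
            preserve
            // keep 15 of the 20 years, drawn separately within each company
            sample 15, count by(company)
            regress invest mvalue
            post buffer (_b[mvalue]) (_se[mvalue])
            // bring back the full data set for the next round
            restore
        }
        postclose buffer
        use `results', clear
        summarize coeff se
        ```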



        • #5
          I agree with Joro that some example data would be nice. I believe that:
          Code:
          sample 31, count by(panel_id_variable)
          will choose different years for each country at each repetition, which might not be what you want.

          Presuming you do want to exclude the same years for each country at each repetition, I can suggest an approach. It will work, I think, but I hope someone else has a simpler solution. Note that what I offer presumes there are no issues with missing years and so forth:
          Code:
          set seed 49866
          local reps = 20
          // Initialize a matrix to contain the list of years
          levelsof year
          local list36 = r(levels)
          local list36 = "(" + subinstr("`list36'", " ", "\", .) + ")"
          matrix M = `list36'
          di "Checking initial matrix"
          mat list M
          di ""
          //  Repeatedly shuffle that matrix to get the years to exclude each time.
          forval i = 1/`reps'  {
              // I didn't want to write Stata code for a shuffler, but Mata can do it.
              mata: st_matrix("M", jumble(st_matrix("M")))
              // First 5 items will be the years to exclude.  Put them in a list for inlist().
              // Use k, not i, so the inner loop does not reuse the outer loop's counter.
              local exclude = ""
              forval k = 1/5 {
                  local exclude = "`exclude'" + string(M[`k', 1]) + ", "
              }
              local exclude = substr("`exclude'", 1, length("`exclude'") - 2)
              di "Next five to exclude: `exclude' "
              //        ivreg .... if !inlist(year, `exclude'), .......
          }



          • #6
            Mike is right, the solution with -sample- will drop different years across panel_id_variable. If this is not what is needed, but the same years need to be dropped across panel_id_variable, there is an easier solution than what Mike proposes.

            One round of the easier solution is to generate a random uniform variable that takes the same value for every observation in a given year. Then sort by panel_id and that random variable, and drop by panel_id the last 5 observations.
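            One round of that idea might look like the sketch below, again using the Grunfeld data as a stand-in and assuming a balanced panel with no missing years. Instead of dropping observations, it flags the retained years in a variable, so the data never need to be reloaded; the names u and include are illustrative only.

            ```stata
            webuse grunfeld, clear
            set seed 1234
            // one random number per year, copied to every company in that year
            bysort year: gen double u = runiform() if _n == 1
            by year: replace u = u[1]
            // within each company, flag all but the 5 years with the largest u
            bysort company (u): gen byte include = _n <= _N - 5
            // the same 15 years should be flagged for every company
            tabulate year include
            regress invest mvalue if include
            ```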



            • #7
              Here's another way to do it. We shuffle the data randomly and then identify the years to be excluded in the first panel as the years in the first 5 observations. Then the years are the same for all the panels.

              Code:
              webuse grunfeld , clear 
              
              set seed 2803 
              gen double random = . 
              gen include = . 
              
              quietly forval j = 1/20 { 
                  replace random = runiform() 
                  sort company random 
                  levelsof year in 1/5 , local(levels) sep(,) 
                  replace include = !inlist(year, `levels') 
                  summarize invest if include 
                  noisily di "#`j'{col 5}" %10.3f r(mean) 
              }



              • #8
                Thank you Mike, Joro, and Nick for your helpful advice.
                Joro Kolev, what would the Stata code look like? I tried the following, based on your advice, but I am missing something.

                Code:
                webuse grunfeld , clear
                save "in_sample.dta"
                
                postfile buffer coeff standart using dropping_random_years, replace
                
                quietly forval j = 1/20 {
                    use insample.dta, clear
                    bysort: year egen unirv=uniform()
                    sort company unirv
                    drop if unirv<6
                    
                    reg invest mvalue
                    
                    post buffer (_b[wheat_aid]) (_se[wheat_aid])
                }
                
                postclose buffer



                • #9
                  Code:
                   bysort: year egen unirv=uniform()
                  is quite wrong. The colon is in the wrong place and uniform() is not an egen function. For your purposes

                  Code:
                  gen unirv = uniform()
                  should be enough. Also,

                  Code:
                  drop if unirv < 6 
                  will lose everything: every result of uniform() lies between 0 and 1, so every observation qualifies.

                  But even otherwise, your code, which moves the data around with -use- and -drop- inside the loop, is more complicated than it needs to be.

                  More bugs:

                  Second time round the loop, unirv already exists and the generate statement will fail.

                  In the Grunfeld data there is no variable wheat_aid.

                  This works:

                  Code:
                  webuse grunfeld , clear
                  
                  set seed 1234 
                  postfile buffer coeff se using karl, replace
                  gen double unirv = . 
                  gen include = . 
                  
                  quietly forval j = 1/20 {
                      
                      replace unirv = runiform()
                      sort company unirv
                      levelsof year in 1/5, local(which) sep(,)  
                      replace include = !inlist(year, `which')   
                      reg invest mvalue if include 
                      post buffer (_b[mvalue]) (_se[mvalue])
                  }
                  
                  postclose buffer
                  Last edited by Nick Cox; 04 Dec 2018, 05:59.
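                  The original question also needs fadum_avg and the controls built from it to be recomputed from the retained years in each round. With the flag approach, one way is to give -egen, mean()- an expression that is missing for the excluded years, since mean() ignores missing values. A sketch, assuming Grunfeld's mvalue stands in for fadum, kstock for one of the weather variables, and company for risocode; mvalue_avg and kstock_mvavg are hypothetical names.

                  ```stata
                  webuse grunfeld, clear
                  set seed 1234
                  gen double unirv = .
                  gen byte include = .
                  tempfile results
                  postfile buffer coeff se using `results', replace
                  quietly forvalues j = 1/20 {
                      replace unirv = runiform()
                      sort company unirv
                      levelsof year in 1/5, local(which) sep(,)
                      replace include = !inlist(year, `which')
                      // rebuild the average from the retained years only;
                      // cond() turns excluded years into missing, which mean() skips
                      capture drop mvalue_avg kstock_mvavg
                      bysort company: egen double mvalue_avg = mean(cond(include, mvalue, .))
                      gen double kstock_mvavg = kstock * mvalue_avg
                      regress invest kstock_mvavg if include
                      post buffer (_b[kstock_mvavg]) (_se[kstock_mvavg])
                  }
                  postclose buffer
                  ```

                  The average is copied to the excluded years as well, which is harmless here because the regression is restricted with -if include-.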



                  • #10
                    If I am not wrong, the following does one round of dropping 5 years, along the lines I was suggesting:

                    Code:
                    . webuse grunfeld , clear
                    
                    . egen uniform = max(runiform()), by(year)
                    
                    . bysort company (uniform): drop if _n> _N-5
                    (50 observations deleted)



                    • #11
                      And the whole thing would be something like:

                      Code:
                      set seed 1234 
                      postfile buffer coeff se using karl, replace
                      
                      quietly forval j = 1/20 {
                          webuse grunfeld , clear
                          egen uniform = max(runiform()), by(year)
                          bysort company (uniform): drop if _n > _N-5
                          reg invest mvalue
                          post buffer (_b[mvalue]) (_se[mvalue])
                      }
                      
                      postclose buffer
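                      Reloading with -webuse- in every round downloads the data 20 times. A sketch of a variant that loads the data once and uses -preserve- and -restore- to bring the dropped years back before the next round; the tempfile here is an arbitrary replacement for the karl results file.

                      ```stata
                      webuse grunfeld, clear
                      set seed 1234
                      tempfile results
                      postfile buffer coeff se using `results', replace
                      quietly forvalues j = 1/20 {
                          preserve
                          // one random number per year, shared by all companies
                          egen double uniform = max(runiform()), by(year)
                          // drop the same 5 years (largest random numbers) in every company
                          bysort company (uniform): drop if _n > _N - 5
                          regress invest mvalue
                          post buffer (_b[mvalue]) (_se[mvalue])
                          // restore the full data; uniform disappears with it
                          restore
                      }
                      postclose buffer
                      use `results', clear
                      ```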
