How to drop random years from panel data?

Karl Kiesinger

Join Date: Jul 2018
Posts: 14

01 Dec 2018, 07:47

I have a panel data set covering 125 countries over 36 years. I want to run an IV regression multiple times and randomly drop 5 of the 36 years each time.

At the moment (see code below), I run a loop 20 times and each round I drop the last year. How can I drop 5 random years each round (and put them back before the next round)?

I had a look at -bootstrap-, but it doesn't seem to work here, since the variable fadum_avg and the other controls built from it have to be recalculated in each round of the loop.

Code:

postfile buffer coeff standart using mcs, replace

forval x=0/20{
    tsset obs year
    drop if year>(2006-`x')
    bysort risocode: egen fadum_avg=mean(fadum)
    
    foreach x of varlist all_Precip_jan-all_Precip_dec all_Temp_jan-all_Temp_dec{
        drop `x'_faavg
        gen `x'_faavg=`x'*fadum_avg
    }
    tsset obs year
    gen instrument=l.US_wheat_production*fadum_avg
    bysort risocode: egen ln_rgdpch_avg=mean(ln_rgdpch)
        
    gen oil_fadum_avg=oil_price_2011_USD*fadum_avg
    gen US_income_fadum_avg=USA_ln_income*fadum_avg
    gen US_democ_pres_fadum_avg=US_president_democ*fadum_avg
    
    local US_controls "oil_fadum_avg US_income_fadum_avg US_democ_pres_fadum_avg"
    local weather_controls "all_Precip_jan-all_Precip_dec all_Temp_jan-all_Temp_dec all_Precip_jan_faavg-all_Precip_dec_faavg all_Temp_jan_faavg-all_Temp_dec_faavg"    
    
    sor risocode year
    
    ivreg2 intra_state (wheat_aid=instrument) c.recipient_pc_cereals_prod_avg#i.year c.cereal_pc_import_quantity_avg#i.year c.ln_rgdpch_avg#i.year ///
    c.real_us_nonfoodaid_ecaid_avg#i.year c.real_usmilaid_avg#i.year `weather_controls' `US_controls' i.year##i.wb_region_n i.risocode_n ,cluster(risocode) ffirst
    outreg2 wheat_aid using "T2_TESTneu5.xls", se noast nocons lab dec(5) adds(KP F-Stat, e(rkf))
    
    post buffer (_b[wheat_aid]) (_se[wheat_aid])
    
    // clean up: drop the generated variables before the next round
    drop ln_rgdpch_avg
    drop instrument
    drop fadum_avg
    drop oil_fadum_avg
    drop US_income_fadum_avg
    drop US_democ_pres_fadum_avg
}

postclose buffer

Tags: loop, panel data

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

01 Dec 2018, 08:06

See whether -sample- can do what you want. E.g., from the documentation:

sample 50, count by(sex) would draw a sample of size 50 for men and 50 for women.

Karl Kiesinger

Join Date: Jul 2018

Posts: 14
#3

01 Dec 2018, 08:35

Thank you Joro, I know about -sample- but I would not know how to use it in this context. How would I go about randomly selecting 5 years with -sample-?
As far as I know, it only randomly selects observations.

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

01 Dec 2018, 09:04

Post some data we can toy around with (-dataex-) if you try it yourself and it does not work.

It seems to me that if you say something like

sample 31, count by(panel_id_variable)

this should do the trick.

Mike Lacy

Join Date: Apr 2014
Posts: 2416

01 Dec 2018, 10:11

I agree with Joro that some example data would be nice. I believe that:

Code:

sample 31, count by(panel_id_variable)

will choose different years for each country at each repetition, which might not be what you want.
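A quick way to see this is to tabulate the years that survive the draw. The sketch below uses the Grunfeld panel (10 companies, 20 years) as a stand-in for the real data, keeps 15 years per company, and uses an arbitrary seed; none of these choices come from the original post.

```stata
* Sketch only: Grunfeld data, the seed and the sample size are illustrative
webuse grunfeld, clear
set seed 12345                      // arbitrary, for reproducibility only
sample 15, count by(company)        // keep 15 randomly chosen observations per company

* Number of companies that retained each year
tabulate year
```

If the same years had been kept for every company, every year listed would show a count of 10. Counts that vary from year to year indicate that the draw was made separately within each company.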

Presuming you do want to exclude the same years for each country at each repetition, I can suggest an approach. It will work, I think, but I hope someone else has a simpler solution. Note that what I offer presumes there are no issues with missing years and so forth:

Code:

set seed 49866
local reps = 20
// Initialize a matrix to contain the list of years
levelsof year
local list36 = r(levels)
local list36 = "(" + subinstr("`list36'", " ", "\", .) + ")"
matrix M = `list36'
di "Checking initial matrix"
mat list M
di ""
//  Repeatedly shuffle that matrix to get the years to exclude each time.
forval i = 1/`reps'  {
    // I didn't want to write Stata code for a shuffler, but Mata can do it.
    mata: st_matrix("M", jumble(st_matrix("M")))
    // First 5 items will be the years to exclude.  Put them in a list for inlist().
    local exclude = ""
    forval k = 1/5 {    // separate index k, so the outer loop index i is not reused
        local exclude =  "`exclude'" + string(M[`k', 1]) + ", "
    }
    local exclude = substr("`exclude'", 1, length("`exclude'") -2)
    di "Next five to exclude: `exclude' "
    //        ivreg .... if !inlist(year, `exclude'), .......
}


Joro Kolev

Join Date: Aug 2018

Posts: 3050
#6

01 Dec 2018, 10:47

Mike is right, the solution with -sample- will drop different years across panel_id_variable. If this is not what is needed, and the same years need to be dropped across panel_id_variable, there is an easier solution than what Mike proposes.

One round of the easier solution is to generate a random uniform variable that takes the same value for all observations in a given year. Then sort by panel_id and that random variable and, within each panel_id, drop the last 5 observations.

Nick Cox

Join Date: Mar 2014
Posts: 35694

02 Dec 2018, 01:51

Here's another way to do it. We shuffle the data randomly and then identify the years to be excluded in the first panel as the years in the first 5 observations. Then the years are the same for all the panels.

Code:

webuse grunfeld , clear 

set seed 2803 
gen double random = . 
gen include = . 

quietly forval j = 1/20 { 
    replace random = runiform() 
    sort company random 
    levelsof year in 1/5 , local(levels) sep(,) 
    replace include = !inlist(year, `levels') 
    summarize invest if include 
    noisily di "#`j'{col 5}" %10.3f r(mean) 
}


Karl Kiesinger

Join Date: Jul 2018
Posts: 14

04 Dec 2018, 05:43

Thank you Mike, Joro, and Nick for your helpful advice.
Joro Kolev, what would the Stata code look like? I tried this, following your advice, but I am missing something.

Code:

webuse grunfeld , clear
save "in_sample.dta"

postfile buffer coeff standart using dropping_random_years, replace

quietly forval j = 1/20 {
    use insample.dta, clear
    bysort: year egen unirv=uniform()
    sort company unirv
    drop if unirv<6
    
    reg invest mvalue
    
    post buffer (_b[wheat_aid]) (_se[wheat_aid])
}

postclose buffer


Nick Cox

Join Date: Mar 2014

Posts: 35694
#9

04 Dec 2018, 05:49

Code:

bysort: year egen unirv=uniform()

is quite wrong. The colon is in the wrong place and uniform() is not an egen function. For your purposes

Code:

gen unirv = uniform()

should be enough. Also,

Code:

drop if unirv < 6

will lose everything: uniform() returns values between 0 and 1, so every observation satisfies the condition.

Even setting that aside, your code, which shuffles the data around (reloading with use and then dropping inside the loop), is more complicated than it need be.

More bugs:

Second time round the loop, unirv already exists and the generate statement will fail.

In the Grunfeld data there is no variable wheat_aid.

This works:

Code:

webuse grunfeld , clear
set seed 1234
postfile buffer coeff se using karl, replace
gen double unirv = .
gen include = .

quietly forval j = 1/20 {
    replace unirv = runiform()
    sort company unirv
    levelsof year in 1/5, local(which) sep(,)
    replace include = !inlist(year, `which')
    reg invest mvalue if include
    post buffer (_b[mvalue]) (_se[mvalue])
}

postclose buffer

Last edited by Nick Cox; 04 Dec 2018, 05:59.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#10

04 Dec 2018, 06:43

If I am not mistaken, the code below does one round of dropping 5 years, along the lines I was suggesting:

Code:

. webuse grunfeld , clear
. egen uniform = max(runiform()), by(year)
. bysort company (uniform): drop if _n> _N-5
(50 observations deleted)
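
One way to confirm that the same five years are removed from every panel is to count the companies left in each remaining year. This is a minimal sketch, assuming the Grunfeld data (10 companies, 20 years); the variable name u and the seed are illustrative.

```stata
* Sketch only: check that the dropped years coincide across companies
webuse grunfeld, clear
set seed 1234                                  // arbitrary seed
egen double u = max(runiform()), by(year)      // one random value per year
bysort company (u): drop if _n > _N - 5        // drop each company's 5 largest-u years

* If the dropped years are the same for all companies,
* every remaining year still has all 10 companies.
bysort year: assert _N == 10
tabulate year
```

The -assert- stops with an error if any remaining year is missing for some company, which would signal that the dropped years differ across panels.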

Joro Kolev

Join Date: Aug 2018
Posts: 3050

#11

04 Dec 2018, 06:53

And the whole thing would be something like:

Code:

set seed 1234
postfile buffer coeff se using karl, replace

quietly forval j = 1/20 {
    webuse grunfeld , clear
    egen uniform = max(runiform()), by(year)
    bysort company (uniform): drop if _n> _N-5

    reg invest mvalue
    post buffer (_b[mvalue]) (_se[mvalue])
}

postclose buffer
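
Coming back to the original question, where variables such as fadum_avg have to be recomputed on the retained years in each round, one possible variant wraps each round in -preserve- and -restore- so that the dropped years come back automatically. This is only a sketch under stated assumptions: it uses the Grunfeld data, and the names u, mvalue_avg and x are stand-ins for the poster's own variables, not part of the original code.

```stata
* Sketch only: data set and variable names u, mvalue_avg, x are illustrative
webuse grunfeld, clear
set seed 1234                          // arbitrary seed

tempname buffer
tempfile results
postfile `buffer' coeff se using `results', replace

forvalues j = 1/20 {
    preserve                           // snapshot of the full panel

    // one random value per year, shared by all companies
    egen double u = max(runiform()), by(year)
    bysort company (u): drop if _n > _N - 5

    // panel mean recomputed on the 15 retained years only
    bysort company: egen double mvalue_avg = mean(mvalue)
    generate double x = kstock * mvalue_avg

    quietly regress invest mvalue x
    post `buffer' (_b[mvalue]) (_se[mvalue])

    restore                            // dropped years return, temporary variables vanish
}

postclose `buffer'
use `results', clear
summarize coeff se
```

Because -restore- brings back the data exactly as they were at -preserve-, no explicit -drop- of the generated variables is needed. One caveat for the real application: dropping years creates gaps in the panel, so a lagged variable such as l.US_wheat_production is probably best created before the years are dropped.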
