  • Panel data loop: dropping years

    I have a panel data set (36 years, 120 countries) and want to check how the results change when different years are dropped. However, part of the preliminary "Basic coding & cleaning" depends on both the number of years and the particular years used.
    a) How can I loop over my entire analysis and drop, e.g., the most recent year in each iteration?
    b) How can I define the range inside the -forval- commands so that it depends on the number of years used (since it will differ from 36 in each iteration)?


    Code:
    ************************************
    *** Some basic coding & cleaning ***
    ************************************
    use "FAid_Final.dta", clear
    tsset obs year
    
    tab cont, gen(contdum)
    tab year, gen(ydum)
    tab risocode, gen(cdum)
    
    
    forval x=1/36{
        gen cont1_y`x'=contdum1*ydum`x'
    }
    forval x=1/36{
        gen cont2_y`x'=contdum2*ydum`x'
    }
    forval x=1/36{
        gen cont3_y`x'=contdum3*ydum`x'
    }
    forval x=1/36{
        gen cont4_y`x'=contdum4*ydum`x'
    }
    forval x=1/36{
        gen cont5_y`x'=contdum5*ydum`x'
    }
    forval x=1/36{
        gen cont6_y`x'=contdum6*ydum`x'
    }
    forval x=1/36{
        gen rcereal_y`x'=recipient_pc_cereals_prod_avg*ydum`x'
    }
    forval x=1/36{
        gen rimport_y`x'=cereal_pc_import_quantity_avg*ydum`x'
    }
    forval x=1/36{
        gen usec_y`x'=real_us_nonfoodaid_ecaid_avg*ydum`x'
    }
    forval x=1/36{
        gen usmil_y`x'=real_usmilaid_avg*ydum`x'
    }
    
    gen USA_ln_income = ln(USA_rgdpch)
    
    bysort risocode: egen ln_rgdpch_avg=mean(ln_rgdpch) if year>=1971 & year<=2006
    
    forval x=1/36{
        gen gdp_y`x'=ln_rgdpch_avg*ydum`x'
    }
    gen oil_fadum_avg=oil_price_2011_USD*fadum_avg
    gen US_income_fadum_avg=USA_ln_income*fadum_avg
    gen US_democ_pres_fadum_avg=US_president_democ*fadum_avg
    
    local US_controls "oil_fadum_avg US_income_fadum_avg US_democ_pres_fadum_avg"
    local weather_controls "all_Precip_jan-all_Precip_dec all_Temp_jan-all_Temp_dec all_Precip_jan_faavg-all_Precip_dec_faavg all_Temp_jan_faavg-all_Temp_dec_faavg"
    local country_chars_controls "gdp_y2-gdp_y36 usmil_y2-usmil_y36 usec_y2-usec_y36"
    local cereals_controls "rcereal_y2-rcereal_y36 rimport_y2-rimport_y36"
    local baseline_controls "oil_fadum_avg US_income_fadum_avg US_democ_pres_fadum_avg gdp_y2-gdp_y36 usmil_y2-usmil_y36 usec_y2-usec_y36 rcereal_y2-rcereal_y36 rimport_y2-rimport_y36 all_Precip_jan-all_Precip_dec all_Temp_jan-all_Temp_dec all_Precip_jan_faavg-all_Precip_dec_faavg all_Temp_jan_faavg-all_Temp_dec_faavg"
    
    sor risocode year
    save "in_sample.dta", replace
    
    
    ***********************************
    *** TABLE 1: Summary Statistics ***
    ***********************************
    
    use "in_sample.dta", clear
    
    xi: ivreg2 intra_state (wheat_aid=instrument) `US_controls' i.risocode i.year*i.wb_region if year>=1971 & year<=2006, cluster(risocode)






  • #2
    Looking at all those loops, and combining that with your asking whether there is a way to drop one year from each, suggests to me that you are trying to create a large number of indicator ("dummy") variables and interaction terms for use in a regression model (though you don't actually use them in the code you show). If I have guessed right, there is no need to create these variables at all. I also surmise from your use of -xi- that you are unaware that all of this has been superseded by factor-variable notation. If so, you can just eliminate all those loops. Whatever regression you want can be done with factor-variable notation, and Stata will generate the indicators and interactions for you automatically "on the fly."

    Code:
    regression_command outcome_variable (i.cont c.recipient_pc_cereals_prod_avg ///
    c.cereal_pc_import_quantity_avg c.real_us_nonfoodaid_ecaid_avg ///
    c.real_usmilaid_avg)##i.year perhaps_other_variables, perhaps_options
    Do read -help fvvarlist- so you understand what this notation means and what Stata does with it. And do your best to almost entirely forget you ever knew about -xi-. A few commands do not allow factor-variable notation, but most of what they do is better done with newer commands that do. Some exotic situations still require -xi- and cannot use factor variables instead, but most Stata users will never encounter any of them. So move -xi- to the dusty corners of your remote memory.
    Last edited by Clyde Schechter; 13 Nov 2018, 20:04.
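
    The factor-variable notation described in #2 can be tried on a data set that ships with Stata. The sketch below assumes the auto data purely for illustration (its variables are unrelated to the thread's data) and shows that a parenthesized group interacted with i.foreign fits the same model as the written-out form:

    ```stata
    * Illustration only: auto.dta stands in for the thread's data
    sysuse auto, clear

    * Grouped form: main effects of mpg, weight and foreign, plus
    * mpg x foreign and weight x foreign interactions
    regress price (c.mpg c.weight)##i.foreign

    * Equivalent written-out form
    regress price c.mpg##i.foreign c.weight##i.foreign
    ```

    No indicator or interaction variables are added to the data set; Stata builds them when the command runs.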



    • #3
      Thank you very much Clyde. Sorry it took me so long, but I needed time to process your input.
      (1) I incorporated your suggestion about fvvarlist into my code and left out xi (see the second code block below). However, when I put more than one variable in parentheses to interact with i.year, Stata tells me: syntax is "(all instrumented variables = instrument variables)"
      Code:
      ivreg2 intra_state (wheat_aid=instrument) (c.recipient_pc_cereals_prod_avg c.cereal_pc_import_quantity_avg)##i.year, cluster(risocode) ffirst
      (2) Still, that leaves my question of how to loop over the entire “Preliminary Coding” and “IV regression” and check how the results react to dropping different years. As you can see in the code, the variable fadum_avg is interacted with other variables, and fadum_avg depends on how many and which years are used. So all of that has to be inside the loop.
      Code:
      **************************
      *** preliminary coding ***
      **************************
      
      ***generate the fadum_avg of the period 1987 - 2006 for FAO18 food aid year
      *generate dummies
      generate fadum = 0
      replace fadum = 1 if wheat_aid>0
      replace fadum = . if wheat_aid == .
      *generate fadum_avg
      bysort risocode: egen fadum_avg=mean(fadum) if year>=1987 & year<=2006
      
      *Creating all instruments
      gen instrument=l.US_wheat_production*fadum_avg
      la var instrument "Baseline interaction instrument: US wheat prod (t-1) x avg food aid prob (1971-2006)"
      
      *Creating locals
      gen oil_fadum_avg=oil_price_2011_USD*fadum_avg
      gen US_income_fadum_avg=USA_ln_income*fadum_avg
      gen US_democ_pres_fadum_avg=US_president_democ*fadum_avg
      
      local US_controls "oil_fadum_avg US_income_fadum_avg US_democ_pres_fadum_avg"
      local weather_controls "all_Precip_jan-all_Precip_dec all_Temp_jan-all_Temp_dec all_Precip_jan_faavg-all_Precip_dec_faavg all_Temp_jan_faavg-all_Temp_dec_faavg"
      
      
      *********************
      *** IV Regression ***
      *********************
      
      ivreg2 intra_state (wheat_aid=instrument) c.recipient_pc_cereals_prod_avg##i.year c.cereal_pc_import_quantity_avg##i.year c.real_us_nonfoodaid_ecaid_avg##i.year c.real_usmilaid_avg##i.year `weather_controls' `US_controls' c.risocode, cluster(risocode) ffirst
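
      One possible way to structure the loop asked about in (2) is sketched here on Stata's Grunfeld example panel rather than the data in this thread. The variables mvalue_avg and z, the choice of three passes, and the use of the built-in -ivregress- in place of -ivreg2- are all illustrative assumptions; the point is only that everything depending on the retained years sits between -preserve- and -restore-.

      ```stata
      * Illustration only: Grunfeld panel (company, year, invest, mvalue, kstock)
      webuse grunfeld, clear
      xtset company year

      summarize year, meanonly
      local first = r(min)
      local last  = r(max)

      * Pass 0 keeps all years; each later pass drops one more recent year
      forvalues k = 0/2 {
          local end = `last' - `k'
          preserve
          keep if inrange(year, `first', `end')

          * Rebuild everything that depends on the retained years
          bysort company (year): egen mvalue_avg = mean(mvalue)
          * Hypothetical stand-in for an interaction instrument
          generate z = L.kstock * mvalue_avg

          * Number of years in this pass, if an explicit count is still needed
          levelsof year, local(yrs)
          local ny : word count `yrs'
          display "years `first'-`end' (`ny' years)"

          * i.year adapts to the retained years, so no fixed 1/36 range is needed
          ivregress 2sls invest kstock i.year (mvalue = z)
          restore
      }
      ```

      Because i.year is expanded from whatever years remain in memory, the hard-coded -forval x=1/36- ranges are not needed; the local ny is shown only for cases where a count is still wanted.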
