  • Generating neighbors' values for panel data

    Hi All:

    I have a problem where I have a (balanced) panel data set, and each cross-sectional unit in the data set comes with two neighbors, contained in the variables nb1 and nb2 (although some neighbor values are missing). I'd like to put the values of some of the neighbors' covariates on the proper line so I can include them in regression analysis. I know how to do this without a panel structure, and I found a recent helpful entry by Nick Cox that confirms what I thought. But I'm stuck for the panel data case.

    The identities of the neighbors do not change over time.

    The first block below gives a simple example of the structure, and the second block gives what I would like to have. I'd really appreciate any hints. Thanks, Jeff.

    Code:
    id year nb1 nb2 x
    
    
    1 2000 2 3 100
    1 2001 2 3 110
    1 2002 2 3 120
    2 2000 1 4 200
    2 2001 1 4 210
    2 2002 1 4 220
    3 2000 4 . 300
    3 2001 4 . 310
    3 2002 4 . 320
    4 2000 3 2 400
    4 2001 3 2 410
    4 2002 3 2 420
    
    id year nb1 nb2 x x_nb1 x_nb2
    
    1 2000 2 3 100 200 300
    1 2001 2 3 110 210 310
    1 2002 2 3 120 220 320
    2 2000 1 4 200 100 400
    2 2001 1 4 210 110 410
    2 2002 1 4 220 120 420
    3 2000 4 . 300 400  .
    3 2001 4 . 310 410  .
    3 2002 4 . 320 420  .
    4 2000 3 2 400 300 200
    4 2001 3 2 410 310 210
    4 2002 3 2 420 320 220

  • #2
    Hi Jeff, the following may not be the most elegant way of doing it but it gives you the desired results:
    Code:
    sort id year
    by id: gen n = _n
    local N = n[_N]
    gen x_nb1 = x[(nb1 - 1) * `N' + n]
    gen x_nb2 = x[(nb2 - 1) * `N' + n]
    drop n
    https://twitter.com/Kripfganz
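A note on the subscripting approach in #2: it appears to rely on three things that hold in the example but are not checked. The data must be sorted by id and year, the panel must be balanced with the same years for every id, and id must run 1, 2, 3, ... with no gaps. Below is a minimal sketch that makes those assumptions explicit with assert before doing the lookup. The variable T and the three checks are additions for illustration and are not part of the original code.

```stata
clear
input id year nb1 nb2 x
1 2000 2 3 100
1 2001 2 3 110
1 2002 2 3 120
2 2000 1 4 200
2 2001 1 4 210
2 2002 1 4 220
3 2000 4 . 300
3 2001 4 . 310
3 2002 4 . 320
4 2000 3 2 400
4 2001 3 2 410
4 2002 3 2 420
end

sort id year
by id: gen n = _n          // position of the year within each id
by id: gen T = _N          // number of years observed for each id

* checks added for illustration: balanced panel, same years, ids 1, 2, ...
assert T == T[1]
assert year == year[n]
local T = T[1]
assert id == ceil(_n / `T')

* observation number of the neighbor in the same year
gen x_nb1 = x[(nb1 - 1) * `T' + n]
gen x_nb2 = x[(nb2 - 1) * `T' + n]
drop n T
list, sepby(id)
```

If any assert fails, the subscript arithmetic would point at the wrong observation, and a merge on the neighbor id and year is the safer route.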


    • #3
      The typical way to match values from other observations is to use merge.

      Code:
      clear
      input id year nb1 nb2 x
      1 2000 2 3 100
      1 2001 2 3 110
      1 2002 2 3 120
      2 2000 1 4 200
      2 2001 1 4 210
      2 2002 1 4 220
      3 2000 4 . 300
      3 2001 4 . 310
      3 2002 4 . 320
      4 2000 3 2 400
      4 2001 3 2 410
      4 2002 3 2 420
      end
      save "main.dta", replace
      
      * save neighbor data for nb1
      keep id year x
      rename (id x) (nb1 x_nb1)
      save "nb1.dta", replace
      
      * save neighbor data for nb2
      use "main.dta", clear
      keep id year x
      rename (id x) (nb2 x_nb2)
      save "nb2.dta", replace
      
      * go back to main data and merge with neighbor data
      use "main.dta", clear
      merge m:1 nb1 year using "nb1.dta", keep(master match) nogen
      merge m:1 nb2 year using "nb2.dta", keep(master match) nogen
      
      isid id year, sort
      list, sepby(id)
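With more than two neighbor variables, the same merge steps could be wrapped in a loop. This is only a sketch of one way to do it: the loop, the use of preserve and restore, and the temporary file name nbr are assumptions added here, not part of the code in #3.

```stata
clear
input id year nb1 nb2 x
1 2000 2 3 100
1 2001 2 3 110
1 2002 2 3 120
2 2000 1 4 200
2 2001 1 4 210
2 2002 1 4 220
3 2000 4 . 300
3 2001 4 . 310
3 2002 4 . 320
4 2000 3 2 400
4 2001 3 2 410
4 2002 3 2 420
end

foreach v of varlist nb1 nb2 {
    * build a lookup file keyed on (neighbor id, year)
    preserve
    keep id year x
    rename (id x) (`v' x_`v')
    tempfile nbr
    save `nbr'
    restore

    * attach the neighbor's x; unmatched (missing) neighbors stay missing
    merge m:1 `v' year using `nbr', keep(master match) nogenerate
}

isid id year, sort
list, sepby(id)
```

Because the files are temporary, nothing is left on disk after the do-file finishes.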


      • #4
        Thanks Sebastian and Robert. Your suggestions are better than my brute force attempts. Much appreciated.

        It does seem like Stata should allow double indexing for panel data sets so I could just reference x[nb1, year]. At least that's what the computer programmer still lurking somewhere in me thinks.


        • #5
          The programmer lurking in me since the late 1960s remembers having access to no more than a single record from each of several input files ("tapes") at a time, and thus finds Robert's solution comfortingly familiar. The only index I knew was in the back of the FORTRAN manual.


          • #6
            If you conflate the id with the year, you can use rangestat (from SSC) to locate the observation that falls in the degenerate interval [nb1_year,nb1_year]. To install rangestat, type in Stata's Command window:

            Code:
            ssc install rangestat
            A solution would look something like:
            Code:
            clear
            input id year nb1 nb2 x
            1 2000 2 3 100
            1 2001 2 3 110
            1 2002 2 3 120
            2 2000 1 4 200
            2 2001 1 4 210
            2 2002 1 4 220
            3 2000 4 . 300
            3 2001 4 . 310
            3 2002 4 . 320
            4 2000 3 2 400
            4 2001 3 2 410
            4 2002 3 2 420
            end
            
            * create an observation id that conflates id and year
            gen double bigid = id * 10000 + year
            
            * get the value of x from the observation that falls in the interval
            gen double idyear = nb1 * 10000 + year
            rangestat (min) x_nb1 = x, interval(bigid idyear idyear)
            
            * with -rangestat-, missing values exclude observations, so use a value
            * that will never match
            replace idyear = nb2 * 10000 + year
            replace idyear = 0 if mi(idyear)
            rangestat (min) x_nb2 = x, interval(bigid idyear idyear)
            
            list, sepby(id)
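The bigid construction in #6 works because year occupies at most four digits, so id * 10000 + year maps each (id, year) pair to a distinct number. Below is a minimal sketch of checks one might run before relying on it. The assert lines and the isid call are additions for illustration, not part of the original post, and the sketch does not need rangestat to run.

```stata
clear
input id year nb1 nb2 x
1 2000 2 3 100
1 2001 2 3 110
1 2002 2 3 120
2 2000 1 4 200
2 2001 1 4 210
2 2002 1 4 220
3 2000 4 . 300
3 2001 4 . 310
3 2002 4 . 320
4 2000 3 2 400
4 2001 3 2 410
4 2002 3 2 420
end

* year must be a whole number in 0..9999 so it cannot spill into the id digits
assert year == int(year) & inrange(year, 0, 9999)
* id must be a positive whole number
assert id == int(id) & id > 0

gen double bigid = id * 10000 + year
isid bigid            // stops with an error if two observations share a bigid
```

For example, id 1 in year 2000 becomes 12000 and id 4 in year 2002 becomes 42002.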


            • #7
              Hi everyone,

              My panel data cover 66 countries, 20 industries, and the years 2006-2014. My dependent variable, OFDI_ikt, varies by country, industry, and year. There is only one origin country, China, and there are 66 host countries. I want to find the determinants of China's outward foreign direct investment.

              First, I tell Stata that this is a panel.

              egen id=group(country industry)
              xtset id year
              xtreg lnofdi_ikt lngdp_i lngdp_china countryskill_i industryskill_k countryskill×industryskill_ik fta bits ruleoflaw i.year, cluster(industry)


              Is it right to cluster on industry rather than on the panel id?


              Best Regards,

              Meng
              Last edited by Meng Zhang; 11 May 2016, 07:43.


              • #8
                Dear Meng: your question has nothing to do with the topic of this thread. Start a new topic on the main forum page: http://www.statalist.org/forums/foru...ussion/general
                Last edited by Steve Samuels; 11 May 2016, 10:44.
                Steve Samuels
                Statistical Consulting

                Stata 14.2
