Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • tab between years in long form panel

    Say I have a panel data and I wish to see how many individuals moved from occupation "3" to occupation "2". This is easily done with the panel in wide format, how can I achieve this in long format (which is the preferable way to deal with panel in stata as panel estimation models require it). Following is an example code to go from long to wide and calculate this:

    Code:
    clear all
    webuse nlswork
    bysort idcode: gen t = _n
    keep idcode t occ_code
    reshape wide occ_code, i(idcode) j(t)
    tab occ_code1 occ_code13, row
    The result btw is that 19.1% of the females in the data that had occupation code 3 in the first wave of the panel had occupation code 2 in the last wave of the panel.

  • #2
    Code:
    count if occ_code[_n-1] == 3 & occ_code == 2 & idcode == idcode[_n-1]
    is an answer to your specific question for long layout assuming panel sort order (I follow the suggestion of Clyde Schechter to use this term rather than format or structure). (Shape works for me too.)

    I guess you're wanting something more general too. A talk by Austin Nichols at a Stata Conference is packed with detailed advice.

    https://www.stata.com/meeting/boston...14_nichols.pdf

    Comment


    • #3
      I am perhaps a little confused by the problem statement.
      Code:
      . webuse nlswork, clear
      (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
      
      . xtset
             panel variable:  idcode (unbalanced)
              time variable:  year, 68 to 88, but with gaps
                      delta:  1 unit
      
      . levelsof year
      68 69 70 71 72 73 75 77 78 80 82 83 85 87 88
      This suggests to me that there are 15 waves of data, not 13. Modifying the code presented in post #1 to make use of the existing year variable:
      Code:
      . keep idcode year occ_code
      
      . reshape wide occ_code, i(idcode) j(year)
      (note: j = 68 69 70 71 72 73 75 77 78 80 82 83 85 87 88)
      
      Data                               long   ->   wide
      -----------------------------------------------------------------------------
      Number of obs.                    28534   ->    4711
      Number of variables                   3   ->      16
      j variable (15 values)             year   ->   (dropped)
      xij variables:
                                     occ_code   ->   occ_code68 occ_code69 ... occ_code88
      -----------------------------------------------------------------------------
      
      . tab occ_code88 if occ_code68==3
      
      88 occ_code |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |         17        7.11        7.11
                2 |         36       15.06       22.18
                3 |        132       55.23       77.41
                4 |         11        4.60       82.01
                5 |          7        2.93       84.94
                6 |         14        5.86       90.79
                7 |          1        0.42       91.21
                8 |         15        6.28       97.49
               11 |          2        0.84       98.33
               12 |          1        0.42       98.74
               13 |          3        1.26      100.00
      ------------+-----------------------------------
            Total |        239      100.00
      we see that of the 239 women with an occupation code of 3 in the first (1968) wave who do not have a missing occupation code in the final (1988) wave, 36 (15.06%) had an occupation code of 2 in the final (1988) wave. (If you add the missing option to the tab command, you will see that an additional 341 of the women with an occupation code of 3 in the first wave have a missing occupation code in the final wave.)

      To address the original question, we can accomplish this without reshaping as follows.
      Code:
      . generate occ_code88 = F20.occ_code if year==68
      (27,966 missing values generated)
      
      . tab occ_code88 if occ_code==3 & year==68
      
       occ_code88 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |         17        7.11        7.11
                2 |         36       15.06       22.18
                3 |        132       55.23       77.41
                4 |         11        4.60       82.01
                5 |          7        2.93       84.94
                6 |         14        5.86       90.79
                7 |          1        0.42       91.21
                8 |         15        6.28       97.49
               11 |          2        0.84       98.33
               12 |          1        0.42       98.74
               13 |          3        1.26      100.00
      ------------+-----------------------------------
            Total |        239      100.00
      Last edited by William Lisowski; 24 Aug 2017, 06:56.

      Comment

      Working...
      X