  • Transition probabilities

    Hi!

    I am using panel data to compute transition probabilities. The data are appended for the years 2000 to 2017. I have a variable emp_state that takes the value 1 if a person is in paid employment, 2 if self-employed, and 3 if unemployed. I want to compute the probabilities of moving from one state in year t to another state in year t+1 for all years. This means I have a 3x3 transition matrix for each year, and I need to compute it for the period 2000-2016. I use the following code (Stata 15.1), where persnr is the individual id and syear is the survey year:

    xtset persnr syear, yearly
    gen nextyr=f.emp_state
    quietly {

    forval j = 1/9 {
    gen f`j' = .
    }

    local i = 1

    forval y= 2000/2017{
    count if emp_state <. & nextyr <. & syear == `y'
    if r(N) > 0 {
    tab emp_state nextyr if syear == `y', row matcell(e_state)
    replace f1 = e_state[1,1] in `i'
    replace f2 = e_state[1,2] in `i'
    replace f3 = e_state[1,3] in `i'
    replace f4 = e_state[2,1] in `i'
    replace f5 = e_state[2,2] in `i'
    replace f6 = e_state[2,3] in `i'
    replace f7 = e_state[3,1] in `i'
    replace f8 = e_state[3,2] in `i'
    replace f9 = e_state[3,3] in `i'

    }
    local ++i
    }

    gen p11 = f1 / (f1 + f2+f3)
    gen p12 = f2 / (f1 + f2+f3)
    gen p13 = f3 / (f1 + f2+f3)
    gen p21 = f4 / (f4 + f5+f6)
    gen p22 = f5 / (f4 + f5+f6)
    gen p23 = f6 / (f4 + f5+f6)
    gen p31 = f7 / (f7 + f8+f9)
    gen p32 = f8 / (f7 + f8+f9)
    gen p33 = f9 / (f7 + f8+f9)

    }

    gen year = 2000 + _n
    format p* %4.3f
    list year f* p* if f1 <.

    The code seems to work fine in computing the transition probabilities; however, the output looks like this:

         year   fsize    f1    f2   f3    f4      f5    f6   f7    f8    f9    persnr     p11     p12
      1. 2001       2   965   156    1   152   10695   208    5   160   373   203.000   0.860   0.139

          p13     p21     p22     p23     p31     p32     p33
        0.001   0.014   0.967   0.019   0.009   0.297   0.693

    I do not understand what fsize is and why it is there at all. Also, why is persnr (the individual id) there, and why with only one specific value rather than all ids?
    Is it possible to get the table without fsize and persnr?
    I would greatly appreciate any help.

  • #2
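    Apparently fsize and persnr are variables that were already in your original data. They appear in the output because the variable list f* p* selects every variable whose name begins with the letter f or p. The value shown for persnr is simply the id stored in that observation, displayed as 203.000 because format p* %4.3f was applied to it as well.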
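
    A minimal sketch of one way to keep the listing to the intended variables, assuming the counts are named f1 to f9 and the probabilities p11 to p33 as in post #1. The ? wildcard matches exactly one character, so f? and p?? skip fsize and persnr. The toy data below are invented for illustration; if the real data hold other variables that fit these patterns (a hypothetical pid, say), they would be listed too.

```stata
clear
set obs 3
generate persnr = 200 + _n      // invented ids
generate fsize  = 2             // invented; stands in for the existing variable
generate year   = 1999 + _n
forvalues j = 1/9 {
    generate f`j' = 10*`j'      // placeholder counts
}
foreach k in 11 12 13 21 22 23 31 32 33 {
    generate p`k' = 0.1         // placeholder probabilities
}

// f* and p* pick up fsize and persnr as well
ds f* p*

// f? is "f" plus exactly one character; p?? is "p" plus exactly two
ds f? p??
list year f? p??, noobs
```

    Spelling the names out in full (f1 f2 ... p33) would work just as well; the wildcards are only shorter.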

    With that said, I think you would be better off creating a new dataset of transition probabilities rather than sticking them arbitrarily into your existing dataset. Here is an approach.
    Code:
    // create invented example data since none was provided
    set obs 1000
    set seed 42
    generate persnr = ceil(_n/5)          // 1 to 200
    generate syear = mod(_n-1,5)+2001     // 2001 to 2005
    generate emp_state = runiformint(1,3) // 1, 2, or 3
    
    // create a dataset of probabilities using the example data
    xtset persnr syear, yearly
    generate nextyr=f.emp_state
    drop if missing(nextyr)
    generate f = 1
    collapse (sum) f, by(syear emp_state nextyr)
    bysort syear emp_state: egen all = total(f)
    generate p = f/all
    
    // review intermediate output
    format %9.3f p
    list if syear==2001, noobs abbreviate(12) sepby(syear emp_state)
    
    // create the final table
    generate fromto = 10*emp_state+nextyr
    drop emp_state nextyr all
    reshape wide f p, i(syear) j(fromto)
    order syear f* p*
    list, clean noobs
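
    As a sanity check, the proportions for a single year can be compared with the row percentages that tabulate reports. This is a minimal sketch that rebuilds the invented example data from the code above, so it is meant to be run on its own (collapse replaces the data in memory). The year 2001 is an arbitrary choice.

```stata
clear
set obs 1000
set seed 42
generate persnr = ceil(_n/5)          // 1 to 200
generate syear = mod(_n-1,5)+2001     // 2001 to 2005
generate emp_state = runiformint(1,3) // 1, 2, or 3
xtset persnr syear, yearly
generate nextyr = f.emp_state

// row percentages are the estimated probabilities of leaving each
// state in 2001 for each state in 2002; observations with a missing
// nextyr are excluded by tabulate automatically
tabulate emp_state nextyr if syear == 2001, row nofreq
```

    Each row of the table should sum to 100 and match the p values listed for that year, up to rounding.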
    Code:
    . // create invented example data since none was provided
    . set obs 1000
    number of observations (_N) was 0, now 1,000
    
    . set seed 42
    
    . generate persnr = ceil(_n/5)          // 1 to 200
    
    . generate syear = mod(_n-1,5)+2001     // 2001 to 2005
    
    . generate emp_state = runiformint(1,3) // 1, 2, or 3
    
    .
    . // create a dataset of probabilities using the example data
    . xtset persnr syear, yearly
           panel variable:  persnr (strongly balanced)
            time variable:  syear, 2001 to 2005
                    delta:  1 year
    
    . generate nextyr=f.emp_state
    (200 missing values generated)
    
    . drop if missing(nextyr)
    (200 observations deleted)
    
    . generate f = 1
    
    . collapse (sum) f, by(syear emp_state nextyr)
    
    . bysort syear emp_state: egen all = total(f)
    
    . generate p = f/all
    
    .
    . // review intermediate output
    . format %9.3f p
    
    . list if syear==2001, noobs abbreviate(12) sepby(syear emp_state)
    
      +-----------------------------------------------+
      | syear   emp_state   nextyr    f   all       p |
      |-----------------------------------------------|
      |  2001           1        1   21    65   0.323 |
      |  2001           1        2   24    65   0.369 |
      |  2001           1        3   20    65   0.308 |
      |-----------------------------------------------|
      |  2001           2        1   29    75   0.387 |
      |  2001           2        2   21    75   0.280 |
      |  2001           2        3   25    75   0.333 |
      |-----------------------------------------------|
      |  2001           3        1   26    60   0.433 |
      |  2001           3        2   20    60   0.333 |
      |  2001           3        3   14    60   0.233 |
      +-----------------------------------------------+
    
    .
    . // create the final table
    . generate fromto = 10*emp_state+nextyr
    
    . drop emp_state nextyr all
    
    . reshape wide f p, i(syear) j(fromto)
    (note: j = 11 12 13 21 22 23 31 32 33)
    
    Data                               long   ->   wide
    -----------------------------------------------------------------------------
    Number of obs.                       36   ->       4
    Number of variables                   4   ->      19
    j variable (9 values)            fromto   ->   (dropped)
    xij variables:
                                          f   ->   f11 f12 ... f33
                                          p   ->   p11 p12 ... p33
    -----------------------------------------------------------------------------
    
    . order syear f* p*
    
    . list, clean noobs
    
        syear   f11   f12   f13   f21   f22   f23   f31   f32   f33     p11     p12     p13     p21     p22     p23     p31     p32     p33  
         2001    21    24    20    29    21    25    26    20    14   0.323   0.369   0.308   0.387   0.280   0.333   0.433   0.333   0.233  
         2002    21    28    27    23    20    22    19    23    17   0.276   0.368   0.355   0.354   0.308   0.338   0.322   0.390   0.288  
         2003    14    23    26    21    23    27    19    27    20   0.222   0.365   0.413   0.296   0.324   0.380   0.288   0.409   0.303  
         2004    19    15    20    21    25    27    24    22    27   0.352   0.278   0.370   0.288   0.342   0.370   0.329   0.301   0.370  
    
    .
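
    One caveat with real data: if some transition never occurs in a given year (post #1 shows cells with counts as low as 1), collapse produces no row for it, and reshape wide leaves the corresponding f and p missing rather than 0. Below is a minimal sketch of one way to handle this, assuming the f11 to f33 naming used above. The small long-format dataset is invented, and the transition from state 1 to state 3 is deliberately absent in 2001.

```stata
clear
// invented counts; there is no row for 2001, emp_state 1, nextyr 3
input syear emp_state nextyr f
2001 1 1 8
2001 1 2 2
2001 2 1 1
2001 2 2 6
2001 2 3 3
2001 3 1 2
2001 3 2 3
2001 3 3 5
2002 1 1 7
2002 1 2 2
2002 1 3 1
2002 2 1 2
2002 2 2 5
2002 2 3 3
2002 3 1 1
2002 3 2 4
2002 3 3 5
end

generate fromto = 10*emp_state + nextyr
drop emp_state nextyr
reshape wide f, i(syear) j(fromto)

// a transition that never occurred has a count of 0, not missing
foreach v of varlist f?? {
    replace `v' = 0 if missing(`v')
}

// recompute each probability from its row total; a row total of 0
// (nobody in that state that year) leaves the probability missing
forvalues i = 1/3 {
    forvalues j = 1/3 {
        generate p`i'`j' = f`i'`j' / (f`i'1 + f`i'2 + f`i'3)
    }
}
format %9.3f p??
list syear f1? p1?, clean noobs
```

    Note that a transition that never occurs in any year would not get a variable at all, since reshape only creates columns for the values of fromto it finds.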
    Last edited by William Lisowski; 08 May 2021, 09:26.


    • #3
      See also the recent thread https://www.statalist.org/forums/for...-data-of-firms and https://www.stata.com/meeting/boston...14_nichols.pdf


      • #4
        Thank you, William and Nick, for the help. I agree that I should not stick the probabilities arbitrarily into my existing dataset.
        Thanks a lot, William, for your input and for taking the time to provide good code.
        Thanks, Nick, for the links. They are very helpful.
