Transition probabilities

Zainab Iftikhar

Join Date: Apr 2021

Posts: 13
#1

08 May 2021, 05:45

Hi!

I am using panel data to compute transition probabilities. The data are appended for the years 2000 to 2017. I have a variable emp_state that takes the value 1 if a person is in paid employment, 2 if self-employed, and 3 if unemployed. I want to compute the probabilities of moving from one state in year t to another state in year t+1 for all years. This means I have a 3x3 transition matrix for each year, and I need to compute this for the period 2000-2016. I use the following code (Stata 15.1), where persnr is the individual ID and syear is the survey year:

xtset persnr syear, yearly
gen nextyr=f.emp_state
quietly {

    forval j = 1/9 {
        gen f`j' = .
    }

    local i = 1

    forval y= 2000/2017{
        count if emp_state <. & nextyr <. & syear == `y'
        if r(N) > 0 {
            tab emp_state nextyr if syear == `y', row matcell(e_state)
            replace f1 = e_state[1,1] in `i'
            replace f2 = e_state[1,2] in `i'
            replace f3 = e_state[1,3] in `i'
            replace f4 = e_state[2,1] in `i'
            replace f5 = e_state[2,2] in `i'
            replace f6 = e_state[2,3] in `i'
            replace f7 = e_state[3,1] in `i'
            replace f8 = e_state[3,2] in `i'
            replace f9 = e_state[3,3] in `i'

        }
        local ++i
    }

    gen p11 = f1 / (f1 + f2+f3)
    gen p12 = f2 / (f1 + f2+f3)
    gen p13 = f3 / (f1 + f2+f3)
    gen p21 = f4 / (f4 + f5+f6)
    gen p22 = f5 / (f4 + f5+f6)
    gen p23 = f6 / (f4 + f5+f6)
    gen p31 = f7 / (f7 + f8+f9)
    gen p32 = f8 / (f7 + f8+f9)
    gen p33 = f9 / (f7 + f8+f9)

}

gen year = 2000 + _n
format p* %4.3f
list year f* p* if f1 <.

The code seems to work fine in computing the transition probabilities. However, the output looks like this:

1. year fsize f1 f2 f3 f4 f5 f6 f7 f8 f9 persnr p11 p12
2001 2 965 156 1 152 10695 208 5 160 373 203.000 0.860 0.139

p13 p21 p22 p23 p31 p32 p33
0.001 0.014 0.967 0.019 0.009 0.297 0.693

I do not understand what fsize is or why it is there. Also, why is persnr (the individual ID) there, and why does it show only one specific number rather than all IDs?
Is it possible to get the table without fsize and persnr?
I would greatly appreciate any help.

William Lisowski

Join Date: Dec 2014
Posts: 10150
#2

08 May 2021, 08:34

Apparently fsize and persnr are variables in your original data, so they were selected by the variable list f* p*, which matches every variable whose name begins with the letter f or p.
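
To illustrate the wildcard behavior, here is a minimal sketch on invented variables (not the original data; the values of persnr and fsize are placeholders). It assumes the names used in the question: f1 to f9 for counts and p11 to p33 for probabilities. In a variable list, * matches any run of characters while ? matches exactly one, so f? p?? should select only the count and probability variables. The same reasoning applies to format p* %4.3f, which also matches persnr and would explain why it is displayed as 203.000.

```stata
clear
set obs 2
generate persnr = 200 + _n     // placeholder for the individual ID
generate fsize  = 2            // placeholder for the unrelated variable
forvalues j = 1/9 {
    generate f`j' = `j'        // placeholder counts
}
foreach ij in 11 12 13 21 22 23 31 32 33 {
    generate p`ij' = 1/3       // placeholder probabilities
}

// expand each pattern to see which variables it selects
unab broad  : f* p*
unab narrow : f? p??
display "`broad'"
display "`narrow'"

// list only the intended variables
list f? p??, noobs
```

With the original code, the last two lines could then be written as format p?? %4.3f and list year f? p?? if f1 <., assuming no other variable names in the data fit those patterns.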

With that said, I think you would be better off creating a new dataset of transition probabilities rather than sticking them arbitrarily into your existing dataset. Here is an approach.

Code:

// create invented example data since none was provided
set obs 1000
set seed 42
generate persnr = ceil(_n/5)          // 1 to 200
generate syear = mod(_n-1,5)+2001     // 2001 to 2005
generate emp_state = runiformint(1,3) // 1, 2, or 3

// create a dataset of probabilities using the example data
xtset persnr syear, yearly
generate nextyr=f.emp_state
drop if missing(nextyr)
generate f = 1
collapse (sum) f, by(syear emp_state nextyr)
bysort syear emp_state: egen all = total(f)
generate p = f/all

// review intermediate output
format %9.3f p
list if syear==2001, noobs abbreviate(12) sepby(syear emp_state)

// create the final table
generate fromto = 10*emp_state+nextyr
drop emp_state nextyr all
reshape wide f p, i(syear) j(fromto)
order syear f* p*
list, clean noobs

Code:

. // create invented example data since none was provided
. set obs 1000
number of observations (_N) was 0, now 1,000

. set seed 42

. generate persnr = ceil(_n/5)          // 1 to 200

. generate syear = mod(_n-1,5)+2001     // 2001 to 2005

. generate emp_state = runiformint(1,3) // 1, 2, or 3

.
. // create a dataset of probabilities using the example data
. xtset persnr syear, yearly
       panel variable:  persnr (strongly balanced)
        time variable:  syear, 2001 to 2005
                delta:  1 year

. generate nextyr=f.emp_state
(200 missing values generated)

. drop if missing(nextyr)
(200 observations deleted)

. generate f = 1

. collapse (sum) f, by(syear emp_state nextyr)

. bysort syear emp_state: egen all = total(f)

. generate p = f/all

.
. // review intermediate output
. format %9.3f p

. list if syear==2001, noobs abbreviate(12) sepby(syear emp_state)

  +-----------------------------------------------+
  | syear   emp_state   nextyr    f   all       p |
  |-----------------------------------------------|
  |  2001           1        1   21    65   0.323 |
  |  2001           1        2   24    65   0.369 |
  |  2001           1        3   20    65   0.308 |
  |-----------------------------------------------|
  |  2001           2        1   29    75   0.387 |
  |  2001           2        2   21    75   0.280 |
  |  2001           2        3   25    75   0.333 |
  |-----------------------------------------------|
  |  2001           3        1   26    60   0.433 |
  |  2001           3        2   20    60   0.333 |
  |  2001           3        3   14    60   0.233 |
  +-----------------------------------------------+

.
. // create the final table
. generate fromto = 10*emp_state+nextyr

. drop emp_state nextyr all

. reshape wide f p, i(syear) j(fromto)
(note: j = 11 12 13 21 22 23 31 32 33)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       36   ->       4
Number of variables                   4   ->      19
j variable (9 values)            fromto   ->   (dropped)
xij variables:
                                      f   ->   f11 f12 ... f33
                                      p   ->   p11 p12 ... p33
-----------------------------------------------------------------------------

. order syear f* p*

. list, clean noobs

    syear   f11   f12   f13   f21   f22   f23   f31   f32   f33     p11     p12     p13     p21     p22     p23     p31     p32     p33  
     2001    21    24    20    29    21    25    26    20    14   0.323   0.369   0.308   0.387   0.280   0.333   0.433   0.333   0.233  
     2002    21    28    27    23    20    22    19    23    17   0.276   0.368   0.355   0.354   0.308   0.338   0.322   0.390   0.288  
     2003    14    23    26    21    23    27    19    27    20   0.222   0.365   0.413   0.296   0.324   0.380   0.288   0.409   0.303  
     2004    19    15    20    21    25    27    24    22    27   0.352   0.278   0.370   0.288   0.342   0.370   0.329   0.301   0.370  

.
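
As a quick sanity check on the approach above, the three probabilities in each row of the transition matrix should sum to 1 within every year. The sketch below rebuilds the same invented example data and verifies this. The names rowsum1 to rowsum3 are introduced here purely for illustration, p is stored as a double to keep the tolerance tight, and the check assumes all nine transitions occur at least once so that p11 to p33 all exist after the reshape.

```stata
clear
set obs 1000
set seed 42
generate persnr = ceil(_n/5)          // 1 to 200
generate syear = mod(_n-1,5)+2001     // 2001 to 2005
generate emp_state = runiformint(1,3) // 1, 2, or 3

// same steps as above, with p stored as a double
xtset persnr syear, yearly
generate nextyr = f.emp_state
drop if missing(nextyr)
generate f = 1
collapse (sum) f, by(syear emp_state nextyr)
bysort syear emp_state: egen all = total(f)
generate double p = f/all
generate fromto = 10*emp_state + nextyr
drop emp_state nextyr all
reshape wide f p, i(syear) j(fromto)

// each origin state's probabilities should sum to 1 in every year;
// rowtotal() treats a missing cell (transition never observed) as zero
forvalues i = 1/3 {
    egen double rowsum`i' = rowtotal(p`i'1 p`i'2 p`i'3)
    assert abs(rowsum`i' - 1) < 1e-8
}
list syear rowsum*, noobs
```

If an assert fails on real data, the likely cause is an origin state with no observed transitions in some year, which leaves that row entirely missing rather than summing to 1.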

Last edited by William Lisowski; 08 May 2021, 09:26.


Nick Cox

Join Date: Mar 2014

Posts: 35461
#3

08 May 2021, 09:44

See also the recent thread https://www.statalist.org/forums/for...-data-of-firms and https://www.stata.com/meeting/boston...14_nichols.pdf
Zainab Iftikhar

Join Date: Apr 2021

Posts: 13
#4

09 May 2021, 15:13

Thank you, William and Nick, for the help. I agree that I should not stick the probabilities arbitrarily into my existing dataset.
Thanks a lot, William, for your input and for taking the time to provide good code.
Thanks, Nick, for the link. It is very helpful.