Construction of mobility matrix or transition matrix for panel data of firms

Jessica Thacker

04 May 2021, 12:58

Hi,

I have a panel of firms from 2011-2015 (please refer to the sample dataset below), with panel id "cocode" and year "Year". I wish to find how the regime of each firm changes from one year to another, i.e. I want to know
how many firms in regime 0 in 2011 moved to regime 2 in 2012
how many firms in regime 2 in 2011 moved to regimes 0, 1, 2 in 2014
how many firms in regime 1 in 2011 moved to regime 2 in 2013
(a) Which commands can be used in Stata for this? I have used xttrans, but I am unsure whether that is the correct way to obtain transition probabilities. I would be very grateful if someone could help me with this.
(b) I also want to know how the transition matrix is calculated in Stata by the xttrans command, e.g. for the transition matrix from 2011 to 2014. Is Stata using only the data for 2011 and 2014, or is it using the data for 2011, 2012, 2013, and 2014 to calculate the transition matrix from 2011 to 2014?
(c) What is the role of delta in xtset when finding these transition matrices?
(d) How are missing values treated by Stata?

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(cocode Year regime)
1 2011 1
1 2012 1
1 2013 .
1 2014 1
1 2015 1
2 2011 2
2 2012 2
2 2013 0
2 2014 0
2 2015 1
3 2011 1
3 2012 1
3 2013 .
3 2014 2
3 2015 .
4 2011 1
4 2012 1
4 2013 1
4 2014 0
4 2015 2
5 2011 2
5 2012 2
5 2013 2
5 2014 0
5 2015 0
end

I would be really grateful if someone could help me with this.

Last edited by Jessica Thacker; 04 May 2021, 13:01.

Mike Lacy

04 May 2021, 14:54

I haven't used -xttrans-, but I believe what you want can be done pretty easily on a do-it-yourself basis. The following hasn't been rigorously checked, but should be close:

Code:

// Each observation gets prior year value
bysort cocode (Year): gen regime_prior = ///
   regime[_n-1] if (Year[_n-1] == Year - 1)
//
// All years, tab regime this year vs. prior
tab regime regime_prior, matcell(T)
mat list T
// Convert to probabilities based on column totals
forval j = 1/`=colsof(T)' {
   // totals each column
   local total = 0
   forval i = 1/`=rowsof(T)' {
      local total = `total' + T[`i', `j']
   }
   di "total col `j' = " `total'
   //
   forval i = 1/`=rowsof(T)' {
      mat T[`i', `j'] = T[`i', `j']/`total'
   }
}
mat list T
// Perhaps you wanted each year separately?
// Here that is in raw frequency terms.
quiet summarize Year
local start = r(min) + 1
local stop = r(max)
forval y = `start'/`stop' {
   di "Year = `y'"
   tab regime regime_prior if (Year == `y'), matcell(T`y')
}
mat dir
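A possible shortcut for the column-normalisation loop above, offered as a sketch using the example data from the first post (rebuilt compactly in the first lines, same 25 values). Mata's colon division divides each column of the count matrix by its column total in one line, and the `col` option of -tabulate- reports the same column percentages directly. The matrix name `P` is just an illustrative choice; a column with a zero total would give missing values.

```stata
* Example data from the first post, rebuilt compactly (same 25 observations)
clear
set obs 25
gen cocode = ceil(_n/5)
gen Year = 2010 + _n - 5*(cocode - 1)
local r "1 1 . 1 1 2 2 0 0 1 1 1 . 2 . 1 1 1 0 2 2 2 2 0 0"
gen regime = real(word("`r'", _n))

* Prior-year regime, only if the previous record is exactly one year earlier
bysort cocode (Year): gen regime_prior = regime[_n-1] if Year[_n-1] == Year - 1

* Counts: rows = regime this year, columns = regime the year before
tab regime regime_prior, matcell(T)

* Divide each column by its column total (colon division with a row vector)
mata: st_matrix("P", st_matrix("T") :/ colsum(st_matrix("T")))
mat list P

* The same column percentages, straight from -tabulate-
tab regime regime_prior, col nofreq
```

Each column of `P` should then sum to 1.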


Nick Cox

05 May 2021, 03:00

As flagged in the previous thread you started

https://www.statalist.org/forums/for...ition-matrices

time-series operators make this much easier. The machinery takes care of gaps, etc. This is a two-step process: first construct variables containing previous states; then create the tables you want.
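To illustrate what "takes care of gaps" means, here is a small sketch on the example data from the first post (rebuilt compactly), with firm 2's 2013 record dropped as a hypothetical gap. Looking back one row with `[_n-1]` then pairs 2014 with 2012, whereas `L.regime` is missing because there is no 2013 record. The `delta()` option of -xtset- sets the length of one period, so `L.` means one delta earlier; with biennial data declared as `delta(2)`, `L.` would refer to two years earlier. Missing states are left out of -tabulate- unless the `missing` option is given. The names `prior_row` and `prior_year` are illustrative.

```stata
* Example data from the first post, rebuilt compactly (same 25 observations)
clear
set obs 25
gen cocode = ceil(_n/5)
gen Year = 2010 + _n - 5*(cocode - 1)
local r "1 1 . 1 1 2 2 0 0 1 1 1 . 2 . 1 1 1 0 2 2 2 2 0 0"
gen regime = real(word("`r'", _n))

* Hypothetical gap: firm 2 has no record at all for 2013
drop if cocode == 2 & Year == 2013

* One period = 1 year (delta(1) is also the default)
xtset cocode Year, delta(1)

* Previous row, whatever year it happens to be
bysort cocode (Year): gen prior_row = regime[_n-1]
* Previous year: missing when that year is absent or its regime is missing
gen prior_year = L.regime

list cocode Year regime prior_row prior_year if cocode == 2, noobs

* Keep missing states as their own row/column instead of dropping them
tab regime prior_year, missing
```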

In your "i.e." (really e.g.) you have

i.e. want to know
how many of firms in regime 0 in 2011 moved to regime 2 in 2012
how many of firms in regime 2 in 2011 moved to regime 0,1, 2 in 2014
how many of firms in regime 1 in 2011 move to regime 2 in 2013

The first would be selecting 2012 and comparing with 1 year previous.
The second would be selecting 2014 and comparing with 3 years previous,

and so on.
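The selections just described can also be written directly as counts. A sketch, using the example data from the first post (rebuilt compactly) and the lag operators `L1.`, `L2.`, `L3.` inside `if` conditions; a comparison with a missing lagged value is false, so firm-years whose earlier state is missing or absent simply drop out.

```stata
* Example data from the first post, rebuilt compactly (same 25 observations)
clear
set obs 25
gen cocode = ceil(_n/5)
gen Year = 2010 + _n - 5*(cocode - 1)
local r "1 1 . 1 1 2 2 0 0 1 1 1 . 2 . 1 1 1 0 2 2 2 2 0 0"
gen regime = real(word("`r'", _n))
tsset cocode Year

* Regime 0 in 2011 -> regime 2 in 2012: select 2012, look 1 year back
count if Year == 2012 & L1.regime == 0 & regime == 2

* Regime 2 in 2011 -> regimes 0, 1, 2 in 2014: select 2014, look 3 years back
tab regime if Year == 2014 & L3.regime == 2

* Regime 1 in 2011 -> regime 2 in 2013: select 2013, look 2 years back
count if Year == 2013 & L2.regime == 1 & regime == 2
```

With these example data the first and third counts are 0 (no firm is in regime 0 in 2011, and the three regime-1 firms of 2011 are in regime 1 or missing in 2013), while both regime-2 firms of 2011 are in regime 0 in 2014.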

The tables here show counts, but you can get probabilities by standard options. You can phrase this in terms of future states if you prefer. Note also the missing option.

Code:

clear
input float(cocode Year regime)
1 2011 1
1 2012 1
1 2013 .
1 2014 1
1 2015 1
2 2011 2
2 2012 2
2 2013 0
2 2014 0
2 2015 1
3 2011 1
3 2012 1
3 2013 .
3 2014 2
3 2015 .
4 2011 1
4 2012 1
4 2013 1
4 2014 0
4 2015 2
5 2011 2
5 2012 2
5 2013 2
5 2014 0
5 2015 0
end

tsset cocode Year 

quietly forval j = 1/3 { 
    gen L`j'regime = L`j'.regime 
    if `j' > 1 local s "s"
    label var L`j'regime "`j' year`s' earlier" 
}

describe 

tab regime L1regime 

tab regime L3regime if Year == 2014 

. describe 

Contains data
  obs:            25                          
 vars:             6                          
---------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------------------------
cocode          float   %9.0g                 
Year            float   %9.0g                 
regime          float   %9.0g                 
L1regime        float   %9.0g                 1 year earlier
L2regime        float   %9.0g                 2 years earlier
L3regime        float   %9.0g                 3 years earlier
---------------------------------------------------------------------------------------------
Sorted by: cocode  Year
     Note: Dataset has changed since last saved.

. 
. tab regime L1regime 

           |          1 year earlier
    regime |         0          1          2 |     Total
-----------+---------------------------------+----------
         0 |         2          1          2 |         5 
         1 |         1          5          0 |         6 
         2 |         1          0          3 |         4 
-----------+---------------------------------+----------
     Total |         4          6          5 |        15 

. 
. tab regime L3regime if Year == 2014 

           |    3 years earlier
    regime |         1          2 |     Total
-----------+----------------------+----------
         0 |         1          2 |         3 
         1 |         1          0 |         1 
         2 |         1          0 |         1 
-----------+----------------------+----------
     Total |         3          2 |         5
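As a worked reading of the first table above (my arithmetic, not part of the original reply): its columns are the regime one year earlier, so dividing each count n_ij by its column total gives the estimated probability of being in regime i this year given regime j the year before. The diagonal measures persistence; for example, of the 5 firm-years in regime 2 a year earlier, 3 (60%) stayed in regime 2 and 2 (40%) moved to regime 0. As I understand it, -xttrans- reports the transpose of this layout (rows = earlier state, row percentages).

```latex
\hat{p}_{ij} = \frac{n_{ij}}{\sum_{k} n_{kj}}, \qquad
\hat{P} =
\begin{pmatrix}
2/4 & 1/6 & 2/5 \\
1/4 & 5/6 & 0/5 \\
1/4 & 0/6 & 3/5
\end{pmatrix}
\approx
\begin{pmatrix}
0.50 & 0.17 & 0.40 \\
0.25 & 0.83 & 0.00 \\
0.25 & 0.00 & 0.60
\end{pmatrix}
```

Rows i are regime 0, 1, 2 this year; columns j are regime 0, 1, 2 one year earlier; each column sums to 1.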


Jessica Thacker

10 May 2021, 02:57

Thanks Nick Cox and Mike Lacy!
Jessica Thacker

23 May 2021, 04:26

Hi,
This continues the example above, but with a different question. I want to construct a graph with Year on the x-axis and separate lines for the percentage of firms in regimes 0, 1, and 2. I have tried xtline and twoway, but they give weird plots.

Nick Cox

23 May 2021, 11:46

Code:

clear
input float(cocode Year regime)
1 2011 1
1 2012 1
1 2013 .
1 2014 1
1 2015 1
2 2011 2
2 2012 2
2 2013 0
2 2014 0
2 2015 1
3 2011 1
3 2012 1
3 2013 .
3 2014 2
3 2015 .
4 2011 1
4 2012 1
4 2013 1
4 2014 0
4 2015 2
5 2011 2
5 2012 2
5 2013 2
5 2014 0
5 2015 0
end

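* One observation per Year x regime, count in _freq (missing regimes excluded)
contract Year regime if regime < . 
* One row per Year: counts in _freq0, _freq1, _freq2
reshape wide _freq , i(Year) j(regime)
* Combinations that never occur become 0 rather than missing
mvencode _* , mv(0)

* Percentage of non-missing firms in each regime, by year
forval j = 0/2 {
    gen pc`j' = 100 * _freq`j' / (_freq0 + _freq1 + _freq2)
}

* pc? matches pc0, pc1, pc2: one line per regime
line pc? Year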


Jessica Thacker

24 May 2021, 01:42

Thank you so much Nick Cox. It really worked.

Michael Duarte Goncalves

05 Dec 2023, 02:23

Hi, I am writing about exactly the same topic. I am struggling to obtain a proper transition matrix.
Here is what I tried:

Code:

use "export_contratos_original_date_cleaned.dta", clear

// Each observation gets prior date value
bysort id (date_contract_start): gen product_prior = ///
   product_classification_encod[_n-1] if (date_contract_end[_n-1] < date_contract_start)
// All dates, tab product_classification_encod "current" contract vs. previous one
tab product_classification_encod product_prior if !missing(powers_tariff2_less_15000w) & !missing(powers_tariff2_less_15000w[_n-1]), matcell(T)
mat list T
// Convert to probabilities based on column totals
forval j = 1/`=colsof(T)' {
   // totals each column
   local total = 0
   forval i = 1/`=rowsof(T)' {
      local total = `total' + T[`i', `j']
   }
   di "total col `j' = " `total'
   //
   forval i = 1/`=rowsof(T)' {
      mat T[`i', `j'] = T[`i', `j']/`total'
   }
}
mat list T


// Each observation gets prior date value
bysort id (date_contract_start): gen tariff_prior = ///
   tariff_ekon_id_encod[_n-1] if (date_contract_end[_n-1] < date_contract_start) 

 
// All dates, tab tariff_ekon_id_encod "current" contract vs. previous one
tab tariff_ekon_id_encod tariff_prior if !missing(powers_tariff2_less_15000w) & !missing(powers_tariff2_less_15000w[_n-1]), matcell(U)
mat list U

// Convert to probabilities based on column totals
forval k = 1/`=colsof(U)' {
   // totals each column (row index l, column index k)
   local total_tar = 0
   forval l = 1/`=rowsof(U)' {
      local total_tar = `total_tar' + U[`l', `k']
   }
   di "total col `k' = " `total_tar'
   //
   forval l = 1/`=rowsof(U)' {
      mat U[`l', `k'] = U[`l', `k']/`total_tar'
   }
}
mat list U

But I am not sure about the result: my date variables are daily dates, not years, yet the loop I applied is based on the code above.
Here is a small dataex:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id double(date_contract_start date_contract_end) long(product_classification_encod tariff_ekon_id_encod)
1001 18887 21700 1 1
1001 21701 22431 1 2
1001 22432 22645 1 4
1001 22646 22676 1 4
1001 22677 22735 1 4
1001 22736 23010 1 4
1001 23011 23069 1 4
1001 23070     . 4 4
1005 18800 21639 1 1
1005 21640 21651 1 1
end
format %td date_contract_start
format %td date_contract_end
label values product_classification_encod product_classification_encod
label def product_classification_encod 1 "Clasico", modify
label def product_classification_encod 4 "Tarifa Justa", modify
label values tariff_ekon_id_encod tariff_ekon_id_encod
label def tariff_ekon_id_encod 1 "20A", modify
label def tariff_ekon_id_encod 2 "20DHA", modify
label def tariff_ekon_id_encod 4 "20TD", modify

Could you give me more insight into this, please?
Also, could you tell me how to read a transition matrix, please?
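One possible approach for contract data like these, offered as a sketch rather than a tested answer: since the dates are irregular, number the contracts within each id in start-date order and use that sequence number as the time variable, so that `L.` means "the previous contract". The names `seq`, `product_prev`, `tariff_prev` and `gap_days` are made up for illustration, and the value labels from the dataex are omitted. To read the resulting tables, go row by row: with rows = previous contract and the `row` option, each row sums to 100% and gives the estimated probability of moving from that product (or tariff) to each column category.

```stata
* Example data from the December 2023 post (value labels omitted)
clear
input long id double(date_contract_start date_contract_end) long(product_classification_encod tariff_ekon_id_encod)
1001 18887 21700 1 1
1001 21701 22431 1 2
1001 22432 22645 1 4
1001 22646 22676 1 4
1001 22677 22735 1 4
1001 22736 23010 1 4
1001 23011 23069 1 4
1001 23070     . 4 4
1005 18800 21639 1 1
1005 21640 21651 1 1
end
format %td date_contract_start date_contract_end

* Contract order within id becomes the "time" variable: L. = previous contract
bysort id (date_contract_start): gen seq = _n
xtset id seq

* Product and tariff of the previous contract
gen product_prev = L.product_classification_encod
gen tariff_prev = L.tariff_ekon_id_encod

* Days between the end of the previous contract and the start of this one
gen gap_days = date_contract_start - L.date_contract_end

* Rows = previous contract, columns = current contract; row % sum to 100
tab product_prev product_classification_encod, row
tab tariff_prev tariff_ekon_id_encod, row
```

This treats consecutive contracts as one step regardless of how much calendar time separates them; if that matters, `gap_days` could be used in an `if` condition to keep only back-to-back contracts (in this small example every gap is 1 day).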


Castor Comploj

16 Apr 2024, 07:52

-xttrans- does the same as -tab myvar myvar_lead, nofreq row-, where myvar_lead = f.myvar.
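A minimal check of that equivalence on the example data from the first post (rebuilt compactly), assuming the data are -xtset- by firm and year; `regime_next` is an illustrative name. Both tables should show the regime at t in the rows, the regime at t+1 in the columns, and row percentages, pooled over all adjacent pairs of years. As I understand it, this also bears on question (b) in the first post: -xttrans- pools one-period transitions over all years rather than comparing two chosen years.

```stata
* Example data from the first post, rebuilt compactly (same 25 observations)
clear
set obs 25
gen cocode = ceil(_n/5)
gen Year = 2010 + _n - 5*(cocode - 1)
local r "1 1 . 1 1 2 2 0 0 1 1 1 . 2 . 1 1 1 0 2 2 2 2 0 0"
gen regime = real(word("`r'", _n))
xtset cocode Year

* Transition probabilities as reported by -xttrans-
xttrans regime

* Hand-made version: each year's regime against next year's regime
gen regime_next = F.regime
tab regime regime_next, row nofreq
```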

Can anyone tell me how best to make a visual representation of the resulting table, though?