150x150 crosstab in stata, showing timeseries movement between categories

EmilBeBri

Join Date: Aug 2014

Posts: 10
#16

16 Mar 2015, 09:06

All right, thanks! I tried something like that, but in the end Nick Cox proved to be a bit better than I am at writing Stata code, so I just used his tsspell. That doesn't help much with understanding the logic, of course, which your code above does. Great!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#17

16 Mar 2015, 09:19

The logic behind tsspell (SSC) is spelled out in the paper you cited in post #3 in the thread. The small question, if there is one, of why the program is not mentioned in the paper is answered by the fact that the paper was already long enough, without several extra pages on tsspell.
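For readers who have not used it, here is a minimal sketch of what a tsspell call might look like on data shaped like the example in this thread. It assumes tsspell has been installed from SSC and that the data can be declared as a panel with tsset; the five-row dataset is made up for illustration, and the variable names _spell, _seq and _end are the program's defaults as I understand them.

```stata
* Assumption: tsspell is installed, e.g. via
*   ssc install tsspell
clear
input id year occ_code unempl
1 1999 4 0
1 2000 4 0
1 2001 . 1
1 2002 . 1
1 2003 5 0
end

* tsspell needs tsset data
tsset id year

* Mark spells of unemployment within each panel.
* By default this creates _spell (spell number), _seq (position within
* the spell) and _end (flags the last observation of each spell).
tsspell, cond(unempl == 1)

list, sepby(id) noobs
```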
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

#18

16 Mar 2015, 09:36

I've had this thread in the back of my head for a while because I worry when people start doing arithmetic with observation subscripts. This is usually not needed, in the same way that looping over observations is generally not needed. Here's an even simpler solution for the original problem:

Code:

clear
input id year occ_code unempl
1 1999 4 0
1 2000 4 0
1 2001 . 1
1 2002 . 1
1 2003 . 1
1 2004 5 0
1 2005 5 0
1 2006 5 0
1 2007 . 1
2 1999 . 1
2 2000 . 1
2 2001 . 1
2 2002 2 0
2 2003 2 0
2 2004 . 1
2 2005 2 0
2 2006 2 0
2 2007 . 1
3 1999 1 0
3 2000 1 0
3 2001 . 1
3 2002 1 0
3 2003 . 1
3 2004 2 0
3 2005 2 0
3 2006 3 0
3 2007 . 1
4 1999 1 0
4 2000 2 0
4 2001 3 0
4 2002 4 0
4 2003 . 1
4 2004 4 0
4 2005 3 0
4 2006 . 1
4 2007 3 0
5 1999 1 0
5 2000 2 0
5 2001 3 0
5 2002 4 0
5 2003 . 1
5 2004 4 0
5 2005 3 0
5 2006 . 1
5 2007 3 0
6 2005 15 0
6 2006 . 1
6 2007 16 0
end

* verify assumptions about the data
isid id year, sort

* Tag observations that start a new job
by id: gen newjob = unempl == 0 & unempl[_n-1] == 1

* Discard unemployment obs
drop if unempl

* Note the previous occupation and reduce to new job observations
by id: gen old_occ = occ_code[_n-1]
keep if newjob

list id year old_occ occ_code, sepby(id) noobs

* Calculate the frequency for each transition
collapse (count) occ_=year, by(old_occ occ_code)

* Create a cross-tab
list, sepby(occ_code) noobs
reshape wide occ_, i(old_occ) j(occ_code)
mvencode _all, mv(0)
list

Comment

Klaudia Erhardt

Join Date: Mar 2015

Posts: 74
#19

16 Mar 2015, 09:42

"I worry when people start doing arithmetics with observation subscripting."

And why?
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#20

16 Mar 2015, 12:01

I worry in the sense that if I don't propose a more Stataish way, some people on Statalist will probably pick up on the idea and start looping over observations or doing observation arithmetic when there are far simpler ways to achieve the same results. Don't forget the OP's comment: "But damn, I had to strain my head in order to understand the idea of using hardbrackets within hard brackets!".
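To make the contrast concrete, here is a small sketch (not taken from the posts above) that tags the start of a new job twice: once with an explicit loop over observations and once with a single by: statement. The five-row dataset and the variable names newjob_loop and newjob_by are made up for illustration.

```stata
clear
input id year unempl
1 1999 0
1 2000 1
1 2001 0
2 1999 1
2 2000 0
end
sort id year

* Looping over observations: works, but it is verbose and the panel
* boundary has to be checked by hand
gen byte newjob_loop = 0
forvalues i = 2/`=_N' {
    quietly replace newjob_loop = 1 if unempl[`i'] == 0 & unempl[`i' - 1] == 1 & id[`i'] == id[`i' - 1] in `i'
}

* The same result in one line; under by: the subscript _n-1 never
* reaches into the previous id
by id: gen byte newjob_by = unempl == 0 & unempl[_n-1] == 1

* Both approaches should agree on every observation
assert newjob_loop == newjob_by
```

The by: version needs no explicit check on id because _n restarts at 1 within each group, so unempl[_n-1] is missing for the first observation of every person.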
Comment
Klaudia Erhardt

Join Date: Mar 2015

Posts: 74
#21

17 Mar 2015, 04:04

Hello Robert,

I see your point. But just to make that clear: my "using hardbrackets within hard brackets" bit of syntax has nothing to do with looping over observations. Rather, it is a way to determine the x of _n-x when you have to reference a value in another record whose relative position is determined by a variable. Usually one refers to _n - #, but here I referred to _n-x, with x being the value of temp[_n-1].
Your far simpler solution works when it is allowed to drop observations. My less Stataish way (?? whatever that means) is more general, as it allows you to "jump" over the records you dropped.
But let's not start a competition on who has the better solution. There are often several solutions, and it is a matter of taste or of 'programming style' which one appeals more to someone.
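For readers who have not seen the earlier posts, here is a minimal sketch of the nested-subscript idea described above. The way temp is built here (a running count of consecutive unemployment years) is an assumption made for illustration and not necessarily the code posted earlier in the thread; the data are the first rows of id 1 from the example.

```stata
clear
input id year occ_code unempl
1 1999 4 0
1 2000 4 0
1 2001 . 1
1 2002 . 1
1 2003 . 1
1 2004 5 0
end
sort id year

* temp (assumed construction): length so far of the current
* unemployment spell, 0 while employed. replace works through the
* observations in order, so temp[_n-1] is already updated.
gen temp = 0
by id: replace temp = cond(_n == 1, 1, temp[_n-1] + 1) if unempl == 1

* Brackets within brackets: step back over the whole unemployment
* spell (temp[_n-1] records) plus one more to reach the last job
by id: gen old_occ = occ_code[_n - temp[_n-1] - 1] if unempl == 0 & unempl[_n-1] == 1

list, sepby(id) noobs
```

In 2004 the previous record has temp equal to 3, so the subscript resolves to _n - 4 and picks up the occupation held in 2000 without dropping any observations.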

No offense meant! (I looked that one up on leo.org; no idea if it really says what I intend to say.)
Greetings, Klaudia
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

#22

17 Mar 2015, 09:41

I understand that observation index arithmetic is not the same as looping, and it certainly doesn't carry the execution-time penalty of looping over observations. But the approach still stems from a reflex (probably inherited from other computer languages) to target an individual observation via its computed position. My point is that almost every time you think of a solution that involves index arithmetic, there is a simpler solution that uses basic Stata commands.

Even though the end game in the example in this thread requires destroying the original data, you can still make the required frequency computations on the full sample without index arithmetic. Here's a reworked example:

Code:

clear
input id year occ_code unempl
1 1999 4 0
1 2000 4 0
1 2001 . 1
1 2002 . 1
1 2003 . 1
1 2004 5 0
1 2005 5 0
1 2006 5 0
1 2007 . 1
2 1999 . 1
2 2000 . 1
2 2001 . 1
2 2002 2 0
2 2003 2 0
2 2004 . 1
2 2005 2 0
2 2006 2 0
2 2007 . 1
3 1999 1 0
3 2000 1 0
3 2001 . 1
3 2002 1 0
3 2003 . 1
3 2004 2 0
3 2005 2 0
3 2006 3 0
3 2007 . 1
4 1999 1 0
4 2000 2 0
4 2001 3 0
4 2002 4 0
4 2003 . 1
4 2004 4 0
4 2005 3 0
4 2006 . 1
4 2007 3 0
5 1999 1 0
5 2000 2 0
5 2001 3 0
5 2002 4 0
5 2003 . 1
5 2004 4 0
5 2005 3 0
5 2006 . 1
5 2007 3 0
6 2005 15 0
6 2006 . 1
6 2007 16 0
end

* verify assumptions about the data
isid id year, sort

* Tag observations that start a new job
by id: gen newjob = unempl == 0 & unempl[_n-1] == 1

* move unemployment obs out of the way
sort unempl id year

* Note the previous occupation among the employment obs (nothing is dropped yet)
by unempl id: gen old_occ = occ_code[_n-1]

* Calculate the frequency for each transition on full data
bysort old_occ occ_code: egen occ_ = total(newjob)

* Build the cross-tab
by old_occ occ_code: keep if _n == 1
drop if occ_ == 0
keep old_occ occ_code occ_
reshape wide occ_, i(old_occ) j(occ_code)
mvencode _all, mv(0)
list

Comment

EmilBeBri

Join Date: Aug 2014

Posts: 10
#23

20 Mar 2015, 07:59

Nick Cox, I am aware that the logic is spelled out (!) in your journal article, and I will return to it in order to grasp it fully. However, since, as Klaudia notes, fixing the problem with counting "across" panels didn't actually have any practical implication for the end result, I chose to just use your .ado file without understanding the finer details of this particular logic, because I needed to move on. But I do want to "get it", so I'll return to the article later. I want to know everything there is to know :D
Comment
EmilBeBri

Join Date: Aug 2014

Posts: 10
#24

11 May 2015, 06:03

Hi again, just wanted to let you know the result of the help you gave me, which I learned tremendously from. Here is a map of the job-to-job mobility for *all* unemployed people in Denmark during 1996-2009. Most of you (except you, Klaudia) probably won't understand the labels for the different job types, but regardless, it's very pretty! And thanks again. (This is made in R with ggplot2 and is still a work in progress.)

https://www.dropbox.com/s/9chg5ztmqv...e.150.pdf?dl=0

and

https://www.dropbox.com/s/etr6di9c00...e.150.pdf?dl=0
Comment
