Hello,
I'm in a bit over my head here and I hope you can help me, or at least point me in the right direction.
I have a very large dataset (5.8 million observations per year, over 14 years) on individuals' occupations over time. I need to summarize the changes in occupation over this time period, so I can see which occupations people move from and to after periods of unemployment, like this:
http://i.stack.imgur.com/ujr8R.png
However, there are 150 categories, which means that I can't get Stata to show the whole crosstab without line breaks. I need to output the crosstab as .csv or some other format for further manipulation in GNU R afterwards. So the first question is this:
1) How can I export a crosstab of this size to a csv/xls file?
One solution would be to construct the data in such a way that I can import it into SPSS, which is perfectly able to output such a huge crosstab to an Excel sheet.
Now, my data is structured like this, with (.) denoting missing values:
http://i.stack.imgur.com/Bh3To.png
As you can see, this is a simple data frame in long format. I need to restructure the data so that I can create a crosstab that shows the movement between occupations over the years. E.g. one person might be occupied with "A" in 1996, then with "B" in 1997, then again with "A" in 1998. Over these years he would be counted twice, as he goes from A to B and then from B to A. So the idea is that the crosstab just counts the number of shifts between categories, no matter the year.
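To make the counting rule concrete, here is a minimal sketch of what I have in mind, not a tested solution for the full data. The variable names `id`, `year` and `occ`, the toy observations and the output file name `transitions.csv` are all made up for illustration. It also assumes that missing years should be skipped, so that a move is counted across the gap, and that only actual changes of category are counted. Person 1 below is the A, B, A example and contributes two shifts.

```stata
* Toy data standing in for the real file (id, year, occ are hypothetical names)
clear
input id year str1 occ
1 1996 "A"
1 1997 "B"
1 1998 "A"
2 1996 "A"
2 1997 ""
2 1998 "C"
end

* Assumption: skip missing years so a shift is counted across the gap
drop if missing(occ)

* Previous observed occupation of the same person, ordered by year
bysort id (year): generate occ_from = occ[_n-1]
rename occ occ_to

* Drop each person's first observation and (assumption) years without a change
drop if missing(occ_from)
drop if occ_from == occ_to

* Collapse to one row per from/to pair with its frequency, then export
contract occ_from occ_to, freq(n)
export delimited using "transitions.csv", replace
```

This gives the table in long form (one row per from/to pair plus a count) rather than as a 150 x 150 grid, which avoids the line-break problem and should be easy to turn back into a matrix in R.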
I have created an example dataset:
https://www.dropbox.com/s/ihxo2temqp...hange.csv?dl=0
I hope my question is asked in a clear and precise way. I do not take lightly the fact that you are spending your time on this, so if I can improve my question in any way, please say so and I will do my utmost to refine it so as not to waste your time. Thank you in advance.
I also asked the question on Stack Overflow:
http://stackoverflow.com/questions/2...een-categories
Code:
import delimited "path\to\testforstackexhange.csv", delimiter(";") varnames(1) clear