Urgent: counting correctly in panel data

Anders Green

Join Date: Mar 2018

Posts: 11
#1

Urgent: counting correctly in panel data

18 Apr 2018, 05:35

Hello everybody,

I am working on a panel dataset consisting of individual-by-month data for one year. I have attached a picture of its structure and I would really like your help.

I have a variable - Branche - which contains information on the individuals workplace/industry each month. It is a six digit code and it is changes in this variable, I would like to count by individual under three certain conditions.

First of, I would like to count the total number of changes in Branche (going from six digits to another six digits). A change from six digits to missing is not relevant, but from missing to six digits is - unless the individual has missing for the first months and then has a six digit code. In other words it is only going from sex digits to another with "missing breaks" being welcome. Check out the column "Count" in the picture.

Second of, another variable - Service - takes on either "B" or "OF". If Service is "OF" then Branche typically is missing (.). I would like to count each time Branche changes from six digits to another without a "missing break" due to "OF" in Service. Check out the column "B-to-B".

Third of, I would like to do the opposite of "second" and count each time it goes B-to-OF/missing-to-B, i.e. from a six digit code to missing and to another six digits again. Check "B-to-OF-to-B".

I have tried replacing with multiple conditions on Branche and Service, but is does not seem to do the trick correctly. One example of code is:

gen b2b = 0

by ID: replace b2b = 1 if Branche[_n] != Branche[_n-12] & Branche[_n] !=. & Branche[_n-12] != . & Service[_n-1/12] != "OF" & b2b[_n-1/11] != 1

The last part of the code is to secure that a situation with ten missing (from _n = 2 to _n=11) is still being counted if _n = 1 and _n-12 are different in terms of Branche. And to be secure that a change is only counted once.

Last edited by Anders Green; 18 Apr 2018, 06:27.
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

18 Apr 2018, 05:44

Claims of urgency are widely regarded as anti-social and explicitly advised against. There is no queue here that you can jump because you are under time pressure (you think that other people's problems are less important than yours: would you like to explain to them all why that is so?).

You would be quite likely to get an answer if you followed advice on giving data examples.

See https://www.statalist.org/forums/help

especially https://www.statalist.org/forums/help/#stata
Comment

Anders Green

Join Date: Mar 2018
Posts: 11

18 Apr 2018, 06:26

My sincere apologies because of course you are right, mr. Cox. The lack of respect was solely driven by a lot of frustration with my own abilities to solve the puzzle and I would like to change my headline accordingly, but it is not possible, as far as I can see. But yes, a sincere apology from my behalf. It will not repeat it self.

In relation to -dataex- I did try it, but my dataset is 6.6 million observations long and the example does not provide clarity of the different situations I am asking for help for. It looks like this (ID in the above example = lobenr below. The same for: Branche = fuldbranche and Service = status):

EDIT: I tried changing it to fit my problem.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double lobenr byte month str4 arbsted long fuldbranche str4 nykom2008 str2(kon type) double(national alderultimo mdr aar dag) str2 status str16 branche str10(udd1 sektor) long ofvsb
1000002  1 ""    . "851" "1" "D" 1  36.77481177275838 12 2008 28 "OF" "" "Erhverv" "" 2
1000002  2 ""    450011 "851" "1" "D" 1  36.77481177275838 12 2008 28 "B" "" "Erhverv" "" 2
1000002  3 ""    . "851" "1" "D" 1  36.77481177275838 12 2008 28 "OF" "" "Erhverv" "" 2
1000002  4 ""    435014 "851" "1" "D" 1  36.77481177275838 12 2008 28 "B" "" "Erhverv" "" 2
1000002  5 ""    . "851" "1" "D" 1  36.77481177275838 12 2008 28 "OF" "" "Erhverv" "" 2
1000002  6 ""    . "851" "1" "D" 1  36.77481177275838 12 2008 28 "OF" "" "Erhverv" "" 2
1000002  7 ""    . "851" "1" "D" 1  36.77481177275838 12 2008 28 "OF" "" "Erhverv" "" 2
1000002  8 ""    452222 "851" "1" "D" 1  36.77481177275838 12 2008 28 "B" "" "Erhverv" "" 2
1000002  9 ""    . "851" "1" "D" 1  36.77481177275838 12 2008 28 "OF" "" "Erhverv" "" 2
1000002 10 ""    . "851" "1" "D" 1  36.77481177275838 12 2008 28 "B" "" "Erhverv" "" 2
1000002 11 ""    487787 "851" "1" "D" 1  36.77481177275838 12 2008 28 "OF" "" "Erhverv" "" 2
1000002 12 ""    . "851" "1" "D" 1  36.77481177275838 12 2008 28 "OF" "" "Erhverv" "" 2
1000009  1 ""    . "101" "0" "I" 2 38.614647501711154 12 2008 28 "B" "" ""        "" 2
1000009  2 ""    . "101" "0" "I" 2 38.614647501711154 12 2008 28 "OF" "" ""        "" 2
1000009  3 "101" 351478 "101" "0" "I" 2 38.614647501711154 12 2008 28 "B"  "" ""        "" 1
1000009  4 "101" 468889 "101" "0" "I" 2 38.614647501711154 12 2008 28 "B"  "" ""        "" 1
1000009  5 ""    . "101" "0" "I" 2 38.614647501711154 12 2008 28 "B"  "" ""        "" 1
1000009  6 ""    . "101" "0" "I" 2 38.614647501711154 12 2008 28 "OF" "" ""        "" 2
1000009  7 "101" 471315 "101" "0" "I" 2 38.614647501711154 12 2008 28 "B"  "" ""        "" 1
1000009  8 "101" 471315 "101" "0" "I" 2 38.614647501711154 12 2008 28 "B"  "" ""        "" 1
end
label values ofvsb ofvsb
label def ofvsb 1 "B", modify
label def ofvsb 2 "OF", modify

Last edited by Anders Green; 18 Apr 2018, 06:32.

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

18 Apr 2018, 07:11

OK; thanks.

For #1 consider

Code:

. bysort lobenr (month) : gen count = fuldbranche != fuldbranche[_n-1] &  fuldbranche < .

. by lobenr : replace count = 0 if count == 1 & sum(count) == 1
(2 real changes made)

. list lobenr month fuldbranche count , sepby(lobenr)

     +------------------------------------+
     |  lobenr   month   fuldbr~e   count |
     |------------------------------------|
  1. | 1000002       1          .       0 |
  2. | 1000002       2     450011       0 |
  3. | 1000002       3          .       0 |
  4. | 1000002       4     435014       1 |
  5. | 1000002       5          .       0 |
  6. | 1000002       6          .       0 |
  7. | 1000002       7          .       0 |
  8. | 1000002       8     452222       1 |
  9. | 1000002       9          .       0 |
 10. | 1000002      10          .       0 |
 11. | 1000002      11     487787       1 |
 12. | 1000002      12          .       0 |
     |------------------------------------|
 13. | 1000009       1          .       0 |
 14. | 1000009       2          .       0 |
 15. | 1000009       3     351478       0 |
 16. | 1000009       4     468889       1 |
 17. | 1000009       5          .       0 |
 18. | 1000009       6          .       0 |
 19. | 1000009       7     471315       1 |
 20. | 1000009       8     471315       0 |
     +------------------------------------+

The trick there is explained at https://www.stata.com/support/faqs/d...t-occurrences/

Last edited by Nick Cox; 18 Apr 2018, 07:24.

Comment

Jesse Wursten

Join Date: Jan 2016

Posts: 915
#5

18 Apr 2018, 07:14

One tip I give is that it can be very useful to xtset your data. E.g. xtset ID month will tell stata that you have monthly data for each ID. Then you can use L. and F. to refer to lagged and lead values, and Stata will know not to consider the first observation of the next customer as consecutive to the last value of the current one. Then it should just be a matter of combining the right conditions?
Comment

Announcement