Create a variable based on times of one value appeared in the other variable in panel data

Connie Gao

Join Date: Jun 2016

Posts: 40
#1

26 Apr 2018, 09:23

Hi Experts:

I have panel data that look like this:

ID employment New Variable
1 1 1
1 1 1
1 1 1
1 2 0
1 2 0
1 1 1
1 2 0
2 1 1
2 1 1
2 2 2
2 2 2
2 2 2
2 2 2

I want to create a new variable that equals 2 if the value 2 in the variable employment appears continuously from the first time it is observed to the last observation for that ID. If the value 2 only appears sporadically and does not persist to the last observation, then the new variable should record it as 0.

Is there anyone who knows how to code this?

Thank you in advance!

Connie
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

26 Apr 2018, 10:20

Your example calculation of the new variable is inconsistent with your explanation of what you want. Your explanation calls for a variable that will take on the values 0 and 2, but your example includes many observations where it is 1. Moreover, you say that you want it to be 2 if employment remains 2 throughout once the first observation with 2 occurs. This is the case for ID 2 in your data, yet you have it as 1 in some of that person's observations.

Since I don't understand your example, I'll just show you how to get a variable that does what you asked for in words. Perhaps you can take it from there.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id employment wanted_cg)
1 1 1
1 1 1
1 1 1
1 2 0
1 2 0
1 1 1
1 2 0
2 1 1
2 1 1
2 2 2
2 2 2
2 2 2
2 2 2
end

gen long obs_no = _n
by id (obs_no), sort: gen byte two_in_two_out = sum((employment==2) != (employment[_n-1] ==2))
by id (obs_no): gen byte wanted_cs = two_in_two_out[_N] == 1
replace wanted_cs = 2*wanted_cs

Note: The variable wanted_cs which I have calculated above does what you asked for in your explanation. The variable wanted_cg is what you showed as "new variable" in your example. I leave it to you to reconcile the difference between them.

In the future, when showing data examples, please use the -dataex- command to do so, as I have done in this response. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
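As a rough illustration of that workflow, here is a minimal sketch. It uses the auto dataset shipped with Stata rather than the poster's data, and the choice of variables and the `in 1/5` range are arbitrary assumptions made for the example. The install step needs an internet connection and is skipped if -dataex- is already available.

```stata
* Install -dataex- only if Stata cannot already find it.
capture which dataex
if _rc ssc install dataex

* Load a dataset shipped with Stata (a stand-in for your own data).
sysuse auto, clear

* Show the first five observations of three variables in a form
* that can be pasted straight into a forum post.
dataex make price mpg in 1/5
```

The output starts with `clear` and `input` and finishes with `end`, so anyone can copy it into the do-file editor and recreate the example exactly.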

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Nick Cox

Join Date: Mar 2014
Posts: 35642
#3

26 Apr 2018, 10:25

Presumably you have some kind of time variable too; if not, that is surprising, if not alarming.
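If such a time variable exists, it is safer to sort on it explicitly than to rely on the current order of the data. The sketch below assumes a hypothetical variable named wave and uses made-up observations entered deliberately out of order; neither comes from the original post.

```stata
clear
input byte(id wave employment)
1 3 2
1 1 1
1 2 2
2 2 1
2 1 2
end

* wave is a hypothetical time variable. Sorting on it within id means
* that [_n-1] is the previous wave and [_N] is the last wave.
bysort id (wave): gen byte changeto2 = sum(employment == 2 & employment[_n-1] != 2)
by id: gen byte ends_in_2 = employment[_N] == 2

list, sepby(id)
```

With the sort made explicit, the result no longer depends on how the observations happened to be entered.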

You don't appear to state all the rules, which are:

2 for 2 if there was just one change to 2 before the end of the panel

0 for 2 otherwise

1 otherwise.

Code:

clear
input id employment new
1 1 1
1 1 1
1 1 1
1 2 0
1 2 0
1 1 1
1 2 0
2 1 1
2 1 1
2 2 2
2 2 2
2 2 2
2 2 2
end

sort id, stable
by id : gen time = _n

by id : gen changeto2 = sum(employ == 2 & employ[_n-1] != 2)

by id : gen wanted = cond(employ == 2 & changeto2[_N] == 1, 2, employ == 1)

list, sepby(id)

     +------------------------------------------------+
     | id   employ~t   new   time   change~2   wanted |
     |------------------------------------------------|
  1. |  1          1     1      1          0        1 |
  2. |  1          1     1      2          0        1 |
  3. |  1          1     1      3          0        1 |
  4. |  1          2     0      4          1        0 |
  5. |  1          2     0      5          1        0 |
  6. |  1          1     1      6          1        1 |
  7. |  1          2     0      7          2        0 |
     |------------------------------------------------|
  8. |  2          1     1      1          0        1 |
  9. |  2          1     1      2          0        1 |
 10. |  2          2     2      3          1        2 |
 11. |  2          2     2      4          1        2 |
 12. |  2          2     2      5          1        2 |
 13. |  2          2     2      6          1        2 |
     +------------------------------------------------+


EDIT: Clyde makes very similar comments. Teachers' t test may be applied at your discretion.


Connie Gao

Join Date: Jun 2016

Posts: 40
#4

27 Apr 2018, 01:31

Clyde and Nick: thank you so much for the fabulous coding!! It is just what I wanted. I will remember to use -dataex- next time.
Connie Gao

Join Date: Jun 2016

Posts: 40
#5

01 May 2018, 13:34

Hi Professor Clyde and Nick: regarding my post above, I am wondering how to handle missing values in this case.

Suppose I have the data below. By my definition, if employment = 2 appears continuously to the last wave, it is still coded as 2 even if there is a missing value in between. This is the case for ID 1. If the last wave is missing but the previous waves show employment = 2 continuously, it is still coded as 2. This is the case for ID 2. However, if there is a missing value in some wave and the last wave has employment = 1, then the previous employment = 2 values should be coded as 0. This is the case for ID 3.

Code:

clear
input byte(id employment wanted)
1 1 1
1 2 2
1 2 2
1 . .
1 2 2
2 1 1
2 2 2
2 2 2
2 2 2
2 . .
3 1 1
3 2 0
3 2 0
3 . .
3 1 1
end

I am not sure how to revise the code you provided so that it handles missing values.

Thank you,

Connie
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#6

01 May 2018, 13:48

I'm confused. You start by talking about missing values, but in your example, you never change the missing values: you leave them missing in wanted. So I'll just ignore that part of the post.

It seems you are concerned with recoding employment = 2 as employment = 0 if the final observation for a given id has employment = 1.

Code:

clear
input byte(id employt wanted)
1 1 1
1 2 2
1 2 2
1 . .
1 2 2
2 1 1
2 2 2
2 2 2
2 2 2
2 . .
3 1 1
3 2 0
3 2 0
3 . .
3 1 1
end

gen long obs_no = _n
by id (obs_no), sort: replace employt = 0 if employt == 2 & employt[_N] == 1

// VERIFY RESULTS ARE AS DESIRED
assert employt == wanted
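The recode above looks only at the final observation, so it would not catch a panel in which 2 appears sporadically and the last wave also happens to be 2, as for ID 1 in the first post. One possible way to cover both situations, offered as a sketch rather than a definitive solution, is to carry the last nonmissing value forward and then count changes to 2, as in Nick's earlier reply. The names filled, changeto2 and wanted2 are chosen here for illustration, and the code assumes that the order of observations within id is the time order.

```stata
clear
input byte(id employment wanted)
1 1 1
1 2 2
1 2 2
1 . .
1 2 2
2 1 1
2 2 2
2 2 2
2 2 2
2 . .
3 1 1
3 2 0
3 2 0
3 . .
3 1 1
end

* Assumption: the current order within id is the time order.
gen long obs_no = _n

* Carry the last nonmissing value forward within id, so that a missing
* wave neither starts nor ends a spell of 2s.
by id (obs_no), sort: gen byte filled = employment
by id (obs_no): replace filled = filled[_n-1] if missing(filled) & _n > 1

* Count the changes to 2 on the filled series.
by id (obs_no): gen byte changeto2 = sum(filled == 2 & filled[_n-1] != 2)

* 2 if the only spell of 2s runs to the last observation, 0 for any
* other 2, 1 for employment 1; missing waves stay missing.
by id (obs_no): gen byte wanted2 = cond(employment == 2, 2 * (changeto2[_N] == 1 & filled[_N] == 2), employment)

* Check against the values requested in the example.
assert wanted2 == wanted
```

On the data from the first post, the same code gives 0 for the 2s of ID 1, because changeto2 ends at 2 there, and 2 for the 2s of ID 2.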