Create Indicator Based on Change in Another Variable

Cody Fiduccia

Join Date: Jun 2019
Posts: 13

Create Indicator Based on Change in Another Variable

05 Jun 2019, 15:25

Hi again, STATA friends!

Great first experience using the forum, thank you to the respondents for the help. I now have a question regarding categorical variables based off of the content of other variables.

I'm trying to create a field which will tell me if, and how many times, an individual in the data changed living location based on city. I could be more specific and try to obtain changes by the field containing street address, but there is input [spelling] error on the dataset's side which would taint the results. City changes will be fine for now.

I used the following code to successfully create a variable:

by ind_id (city), sort: gen changed = (city[1] != city[_N])

...but it doesn't account for an individual who may have moved more than once. I would like to have a result which displays a category for how many changes in 'city' are noted per ind_id, but then stop at the final change.

For example, I'd like to create the variable changed below:

ind_id	visit_id	city	changed
1	1	Hope	0
1	2	Hope	0
1	3	Aurora	1
1	4	Aurora	.
1	5	Aurora	.

2	1	Jackson	0
2	2	Jackson	0

3	1	Hope	0
3	2	Aurora	1
3	3	Aurora	.
3	4	Jackson	2
3	5	Jackson	2
3	6	Hope	3

Thanks in advance and I'm sorry I couldn't paste any code or screenshots in,data is housed on a secured remote desktop for privacy reasons.

-Peter

Last edited by Cody Fiduccia; 05 Jun 2019, 15:27.

Tags: categorical, data, panel data, string

Cody Fiduccia

Join Date: Jun 2019

Posts: 13
#2

05 Jun 2019, 15:45

NOTE: If it makes a difference, I could also note a change in zip code (numerical field), instead of city (string field)!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#3

05 Jun 2019, 17:45

Well, I don't fully understand what you want. It's something like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte(ind_id visit_id) str7 city byte changed 1 1 "Hope" 0 1 2 "Hope" 0 1 3 "Aurora" 1 1 4 "Aurora" . 1 5 "Aurora" . 2 1 "Jackson" 0 2 2 "Jackson" 0 3 1 "Hope" 0 3 2 "Aurora" 1 3 3 "Aurora" . 3 4 "Jackson" 2 3 5 "Jackson" 2 3 6 "Hope" 3 end by ind_id (visit_id), sort: gen wanted = sum(city != city[_n-1]) - 1

Where this disagrees with your variable, changed, is when you have missing value for changed. At first I thought perhaps you want to put a missing value there after the final change for a given ind_id. That's what you (appear to) do with ind_id1. But then with ind_id3, you have a missing value when visit_id == 3 even though there are still changes to come. So I haven't been able to figure out what you want those missing values to signify. But apart from the missing values, this code gives you what you asked for.
1 like
Comment

Cody Fiduccia

Join Date: Jun 2019
Posts: 13

05 Jun 2019, 18:44

Hi Clyde - Thank you once again! Apologies for my error in describing the example data above. By indicating '.' I meant to signify that I wished the sequence to end after the first instance of each change in city. What you provided did work, though!

Is there a way by which I can modify the code you provided above to provide a numerical output only for the first instance of change? For example, once someone is noted as moved, until the next instance of city change it does not repeat the number but instead marks it as null. The reason I'm asking for it this way is because I'd like to be able to tabulate 'wanted' and show how many people moved cities within one year (but as it is now it is also counting each instance of the new city, inflating my n per category within 'wanted').

I've modified the example with which we were working in the previous reply to what I'm looking for:

ind_id	visit_id	city	wanted
1	1	"Hope"	0
1	2	"Hope"	-
1	3	"Aurora"	1
1	4	"Aurora"	-
1	5	"Aurora"	-

2	1	"Jackson"	0
2	2	"Jackson"	-

3	1	"Hope"	0
3	2	"Aurora"	1
3	3	"Aurora"	-
3	4	"Jackson"	2
3	5	"Jackson"	-
3	6	"Hope"	3

Thank you so much again for the help, I'm learning a lot.

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30101

05 Jun 2019, 19:12

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(nd_id visit_id) str9 city byte wanted
1 1 `""Hope""'    0
1 2 `""Hope""'    .
1 3 `""Aurora""'  1
1 4 `""Aurora""'  .
1 5 `""Aurora""'  .
2 1 `""Jackson""' 0
2 2 `""Jackson""' .
3 1 `""Hope""'    0
3 2 `""Aurora""'  1
3 3 `""Aurora""'  .
3 4 `""Jackson""' 2
3 5 `""Jackson""' .
3 6 `""Hope""'    3
end

by nd_id (visit_id), sort: gen change_count = sum(city != city[_n-1]) - 1
by nd_id (visit_id): replace change_count = . if city == city[_n-1]

Comment

Cody Fiduccia

Join Date: Jun 2019

Posts: 13
#6

05 Jun 2019, 19:26

Clyde - wonderful, exactly what I was looking for and it helped rework my tables to show a rough proxy for mobility across the year of data. You're a lifesaver! Thank you very much, I sincerely appreciate both the easy-to-follow instructions as well as the exceedingly timely responses.

Warm regards,
PCF
Comment
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#7

06 Jun 2019, 01:14

Just make 1-line for the solution.

Code:

bys nd_id (visit_id): gen x = sum(_n>1) if city != city[_n-1]
3 likes
Comment

Announcement

Create Indicator Based on Change in Another Variable

Comment

Comment

Comment

Comment

Comment

Comment