Matching data over time?

Enas Farag

Join Date: Oct 2022

Posts: 25
#1

28 Oct 2022, 18:13

Hello,

I have a sample from the Current Population Survey (CPS). My data are monthly, from 2015 to the present, for the variables empsame, empstat, and labforce. EMPSAME indicates whether the respondent is still working for the same employer, and in the same job, that he or she reported as the main job in the previous month's survey. I want to define the "newly employed" as those who answered NO to EMPSAME, because this means they are working for a different employer than in the previous month.
Once I have the set of "newly employed" in period t, I then want to split it into those who were employed in t-1 and those who were not employed in t-1.

I have an identifier variable that gives a unique id for each individual surveyed.

I do not know how to use the data at hand to achieve this. Any ideas on how to use the identifier to track individuals across consecutive months, so that I can construct these two sets?
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

28 Oct 2022, 19:06

The solution to your problem will be something along these lines:

Code:

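by id (date), sort: egen newly_employed = max(empsame == "No")   // 1 in every obs of an id that ever answers "No", else 0
by id (date): gen transition_type = empstat[_n-1] if empsame == "No"   // empstat from that id's previous observation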
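For anyone following along, here is a minimal sketch of the same idea on made-up data. It assumes, hypothetically, a numeric person identifier -id-, separate year and month variables, and string codings "Yes"/"No" for empsame and "Employed"/"Unemployed"/"NILF" for empstat. Real CPS extracts are usually numeric with value labels, so the comparisons would need adjusting. Instead of -empstat[_n-1]-, this version uses -xtset- with the lag operator L., which returns missing when the previous calendar month is not observed for that person.

```stata
* Hypothetical example data: the coding of empsame/empstat is assumed, not CPS's own
clear
input byte id int year byte month str3 empsame str10 empstat
1 2015 1 "Yes" "Employed"
1 2015 2 "No"  "Employed"
2 2015 1 "Yes" "Unemployed"
2 2015 2 "No"  "Employed"
3 2015 1 "Yes" "NILF"
3 2015 3 "No"  "Employed"
end

* Build a Stata monthly date and declare the person-by-month panel
gen date = ym(year, month)
format date %tm
xtset id date

* 0/1 indicators (time-series operators do not accept string variables)
gen byte employed = (empstat == "Employed")
gen byte new_job = (empsame == "No")

* Employment status exactly one calendar month earlier, only for new-job observations
gen byte emp_prev = L.employed if new_job

* Label the two sets; left blank when the previous month is not observed
gen str19 transition = ""
replace transition = "job-to-job" if new_job & emp_prev == 1
replace transition = "from non-employment" if new_job & emp_prev == 0

list id date empsame empstat emp_prev transition, sepby(id)
```

In this toy data, -empstat[_n-1]- would give id 3's 2015m3 interview the 2015m1 value ("NILF"), whereas L.employed is missing because 2015m2 is absent. Which behavior is appropriate depends on how skipped months occur in your data.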

But how you would write the actual code depends on the details of how your data are organized and coded. Without example data to work with, nobody can give you the exact code you need: there are just too many possibilities, all consistent with your description, each requiring a different version of the code. If you are able to figure it out from what I've shown, that's great.

If not, post back and use the -dataex- command to show example data from your actual data set, choosing a sample that reflects the particular issues your question raises. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
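As an illustration, here is -dataex- run on a few lines of made-up data. With a real extract you would run only the last command on your own dataset, listing the variables relevant to the question (the names -id- and -date- are assumptions carried over from the code above). -dataex- prints a ready-to-paste block that recreates the data with -input-.

```stata
* Made-up example data (variable names and coding are hypothetical)
clear
input long id float date str3 empsame
101 660 "Yes"
101 661 "No"
102 660 "Yes"
end
format date %tm

* If -dataex- is not found (older Stata), install it first with: ssc install dataex
* Print the example in a form that can be copied into a forum post
dataex id date empsame
```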

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Last edited by Clyde Schechter; 28 Oct 2022, 19:09.
Enas Farag

Join Date: Oct 2022

Posts: 25
#3

30 Oct 2022, 13:44

Hello Clyde,
Thank you so much. I am a beginner in Stata. I do not know whether I can find a dataset in Stata that is similar to mine. Would it be helpful if I created a link for the dataset I am using?
Also, in your first command, why are you using "max" before the variable condition?
Thanks
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#4

30 Oct 2022, 15:35

Would it be helpful if I created a link for the dataset I am using?

Sorry, but no. I'm not going to risk my computer security downloading data from a source that I am not familiar with. As I suggested in #2, use the -dataex- command to show example data from your Stata dataset. If the issue is that your data is not yet imported into Stata, then it is premature to be asking for help with code. Import the data and then show the example from there.

...why are you using "max" before the variable condition?

The expression -empsame == "No"-, which is the argument of the -egen, max()- function, is a logical expression. Stata evaluates it in each observation: when it is true, the result is 1; when it is false, the result is 0. Among all the observations for a given id, some may have -empsame == "No"- and others not, so the expression will be a mixture of 0's and 1's. If -empsame == "No"- is true at least once for that id, those observations have the value 1, and since the others have 0, the largest value is 1. If it is never true for that id, every observation has 0, and the largest value is 0. So the command sets newly_employed to 1 in every observation of an id for which empsame is "No" at least once, and to 0 in every observation of an id for which it never is.
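To see this on a small made-up example (the string coding of -empsame- is assumed, as in #2), the sketch below first creates the 0/1 value of the logical expression as its own variable, so it can be compared row by row with the per-id maximum that -egen, max()- produces.

```stata
* A logical expression is 1 when true and 0 when false:
* the first line displays 1, the second displays 0
display ("No" == "No")
display ("Yes" == "No")

* Hypothetical data: id 1 reports a new employer once, id 2 never does
clear
input byte id str3 empsame
1 "Yes"
1 "No"
1 "Yes"
2 "Yes"
2 "Yes"
end

* Row-by-row value of the logical expression
gen byte is_no = (empsame == "No")

* Largest of those 0/1 values within each id: 1 if "No" ever occurs, else 0
by id, sort: egen byte ever_no = max(empsame == "No")

list id empsame is_no ever_no, sepby(id)
```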

The key concept is that logical expressions in Stata have the numeric value 1 when they are true and 0 when they are false. This makes it possible to calculate "ever" with the -egen, max()- function and, by similar reasoning, "always" with the -egen, min()- function.
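A short sketch of "always" next to "ever", again on hypothetical data with an assumed string coding. One caveat worth checking in real data: an empty string (or, for a numeric variable, a missing value) makes an equality test such as -empsame == "No"- evaluate to 0 rather than missing, so those observations count as "not No".

```stata
* Hypothetical data: id 1 answers "No" every month, id 2 only some months, id 3 never
clear
input byte id str3 empsame
1 "No"
1 "No"
2 "No"
2 "Yes"
3 "Yes"
3 "Yes"
end

* Smallest 0/1 value within id: 1 only if "No" holds in every observation ("always")
by id, sort: egen byte always_no = min(empsame == "No")
* Largest 0/1 value within id: 1 if "No" holds at least once ("ever")
by id: egen byte ever_no = max(empsame == "No")

list, sepby(id)
```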