Identify time variable in Panel dataset

Ali Abutaleb

Join Date: Jun 2022

Posts: 36
#1

Identify time variable in Panel dataset

30 Jun 2022, 13:23

Hello all,

I have a five-quarter longitudinal dataset from the UK Labour Force Survey (LFS), covering April 2019 to June 2020. The LFS has a rotating panel, so we can follow people for five consecutive quarters (five waves).

The first thing I need to define is the time index in this panel data, as seen in the example below. There is a variable "PERSID" (the persistent identifier), a variable "FLOW" (categories relating to labour force gross flows), and a variable "FLEXW73" (whether the worker has a zero-hours contract).

Note that because I have longitudinal data, most variables are repeated five times; in other words, "FLEXW71" is for quarter 1, "FLEXW72" is for quarter 2, and so on.

My question:
How can I use the variable "PERSID" as a time variable? For example, suppose I need to describe the flow of a group of workers (FLEXW73) from the first quarter to the last quarter (the increase or decrease in workers on this type of contract). What do I have to do in this case?

Any suggestions on how to set up the dataset for this?

input double PERSID byte(FLOW anflow FLEXW73)
210292030101 3 1 1
400192040101 3 11 1
480892050102 3 1 1
600292010103 3 1 1
610392030101 3 1 1
630392070101 3 1 1
640892040102 3 1 1
790292040101 3 1 1
860792020101 3 1 1
910492010102 5 22 1
1020292020103 3 1 1
1110392030101 6 7 1
1201092020101 3 1 1
1230392030101 3 1 1
1240192040101 3 1 1
1280692060102 3 1 1
end
label values FLOW FLOW
label def FLOW 3 "In employment at first quarter; in employment at final quarter (EE)", modify
label def FLOW 4 "In employment at first quarter; unemployed at final quarter (EU)", modify
label def FLOW 5 "In employment at first quarter; inactive at final quarter (EN)", modify
label def FLOW 6 "Unemployed at first quarter; in employment at final quarter (UE)", modify
label def FLOW 9 "Inactive at first quarter; in employment at final quarter (NE)", modify
label def FLOW 10 "Inactive at first quarter; unemployed at final quarter (NU)", modify
label def FLOW 11 "Inactive at first quarter; inactive at final quarter (NN)", modify
label values anflow anflow
label def anflow 1 "In employment in all quarters (E)", modify
label def anflow 4 "In employment at first quarter; unemployed at final quarter (EU)", modify
label def anflow 5 "In employment at first quarter; inactive at final quarter (EN)", modify
label def anflow 7 "Unemployed at first quarter; in employment at final quarter (UE)", modify
label def anflow 8 "Inactive at first quarter; in employment at final quarter (NE)", modify
label def anflow 10 "Employed at first; unemployed; in employment at final quarter(EUE)", modify
label def anflow 11 "Employed at first; inactive; in employment at final quarter (ENE)", modify
label def anflow 14 "Inactive at first; employed; inactive at final quarter (NEN)", modify
label def anflow 21 "Inactive at first; unemployed; employed at final quarter (NUE)", modify
label def anflow 22 "3 or 4 moves between categories", modify
label values FLEXW73 FLEXW73
label def FLEXW73 1 "Yes", modify

Thank you,
Ali

Last edited by Ali Abutaleb; 30 Jun 2022, 13:54.
Daniel Schaefer

Join Date: Mar 2020

Posts: 818
#2

30 Jun 2022, 17:17

When you say you have variables repeated five times, one for each wave, this makes me think your data is in wide format. You won't have a time variable unless your data is in long format. I would suggest reading the -reshape- documentation as a next step. https://www.stata.com/manuals13/dreshape.pdf
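
As a minimal sketch of the reshape suggested here, assume one row per person and wave-specific columns named FLEXW71, FLEXW72, FLEXW73 as in the original question. The values below are made up for illustration, and the name "wave" for the new time variable is an assumption; the real LFS column names and coding may differ.

```stata
* Toy wide-format data: one row per person, one column per wave.
* Values are invented for illustration only.
clear
input double PERSID byte(FLEXW71 FLEXW72 FLEXW73)
210292030101 1 1 1
400192040101 2 1 1
480892050102 1 2 2
end
format PERSID %15.0f

* Stack the wave-specific columns; the numeric suffix becomes -wave-.
reshape long FLEXW7, i(PERSID) j(wave)

* PERSID identifies the person (panel variable); wave is the time variable.
xtset PERSID wave
list, sepby(PERSID)
```

After the reshape there is one row per person per wave, so PERSID stays the person identifier and the newly created wave variable plays the role of time.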
Ali Abutaleb

Join Date: Jun 2022

Posts: 36
#3

05 Jul 2022, 14:04

Originally posted by Daniel Schaefer

When you say you have variables repeated five times, one for each wave, this makes me think your data is in wide format. You won't have a time variable unless your data is in a long format. I would suggest reading the -reshape- documentation as a next step. https://www.stata.com/manuals13/dreshape.pdf

Many thanks for your reply and suggestion.
Yes, my data is in wide format, so I reshaped it to long format. However, I now have some missing values in one of the variables. How should I deal with this? Should I just drop the missing values, or might that affect the results?

More explanation:

In the wide format I have these variables:
Hourpay1, Hourpay2, Hourpay3, Hourpay4, Hourpay5.
Flexw73, Flexw74, Flexw75 (only three waves here, because this question is asked only in some quarters).

In the long format I now have:
Hourpay (for 5 quarters)
Flexw7 (for 3 quarters only), so this variable is missing in two quarters.
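
The situation described above can be reproduced with a small toy example. This is only a sketch: the identifier "id", the variable "quarter", and all values are invented for illustration, and the variable names follow the spelling used in this post rather than the actual LFS file.

```stata
* Toy wide data: Hourpay exists for quarters 1-5, Flexw7 only for quarters 3-5.
* All values are invented for illustration.
clear
input long id double(Hourpay1 Hourpay2 Hourpay3 Hourpay4 Hourpay5) byte(Flexw73 Flexw74 Flexw75)
1 10.5 10.5 11 11 11.5 1 1 2
2 9 9.2 9.2 9.5 9.5 2 2 2
end

* Reshape both stubs at once; quarter takes the values 1 to 5.
reshape long Hourpay Flexw7, i(id) j(quarter)

* Quarters 1 and 2 have no Flexw7 column in wide form, so Flexw7 is missing there.
list, sepby(id)
count if missing(Flexw7)
```

In this toy data the missing values appear only in the quarters where the question was not asked, which is a different situation from a respondent not answering.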

Could you advise the best way to deal with this issue?
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#4

05 Jul 2022, 14:19

Hey Ali, watch this please. Here at Statalist, for optimal service (which I do hope you'll get here), we demand (more like strongly encourage) that you follow the instructions in the YouTube video I've linked here. That is, provide a data example using dataex AND the code you've tried so far (properly formatted in the appropriate delimiters).

Using Statalist is just like making apple pie. If I asked you to make my sister's favorite version of apple pie, you'd likely need to know what the SPECIFIC ingredients are and the steps needed to make said pie. So far you've given some of the ingredients, but they aren't formatted in code delimiters, and you haven't given the recipe that you tried and that failed. You've just sort of described the issue, which is fine, but we need the real details.

Perhaps reshape is the answer. Maybe it isn't. But to know the truth, we must see your data and code you've tried.
Daniel Schaefer

Join Date: Mar 2020

Posts: 818
#5

05 Jul 2022, 16:02

Hi Ali,

What to do about missing data is a nuanced question.

is it just drop the missing value? Or this maybe affect the result?

Yes, dropping missing observations can bias your results. You may be able to detect bias by taking summary statistics on your analytic sample and comparing them with those for the observations that have missing values. If there are statistically significant differences between the means for these groups, then that is evidence that you will introduce some bias when you drop missing observations. You should definitely think about this carefully if you are going to drop entire waves of data from your analysis. That being said, listwise deletion may be your best option. It depends on the details of your data, your research question, and your modeling strategy. Jared Greathouse is quite right when he says that reshape long may not even be best for you and your data. It depends on the details of the analysis you want to do. I would strongly advise reaching out to a senior colleague to discuss this with them in detail.
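
One rough way to run the comparison described above is sketched below on simulated long-format data. Everything here is an assumption made for illustration: the variable names (id, quarter, Hourpay, Flexw7, miss_flex), the simulated values, and the choice of Hourpay as the variable to compare.

```stata
* Simulated long-format panel: 200 people, 5 quarters each (illustration only).
clear
set seed 12345
set obs 200
generate long id = _n
expand 5
bysort id: generate byte quarter = _n
generate double Hourpay = 10 + rnormal()
* Flexw7 is only recorded from quarter 3 onward, so it is missing in quarters 1-2.
generate byte Flexw7 = runiform() < 0.1 if quarter >= 3

* Flag the rows that listwise deletion would drop.
generate byte miss_flex = missing(Flexw7)
tabulate quarter miss_flex

* Compare mean Hourpay between rows with and without Flexw7.
ttest Hourpay, by(miss_flex)

* The t test treats rows as independent; clustering on the person is a
* more cautious check when each person contributes several rows.
regress Hourpay i.miss_flex, vce(cluster id)
```

A clear difference between the two groups would suggest that dropping the rows with missing Flexw7 changes the composition of the sample. No difference in this one variable does not, by itself, rule out bias.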