Lag string variables

Alyssa Beavers

Join Date: Feb 2015

Posts: 72
#1


06 Feb 2019, 07:05

Hello,

I lagged (almost) all my variables, and I noticed that my one string variable is now all missing in the lagged variable.

Is there a way to lag string variables?

Many thanks,
Alyssa
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

06 Feb 2019, 10:30

To answer this correctly requires understanding the command you tried that did not work. As the Statalist FAQ linked to from the top of the page recommends,

12.1 What to say about your commands and your problem

Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!

So we know what Stata did - created missing values - but we don't know the command you gave. I can offhand think of at least three different commands you might have used, depending on whether you have cross sectional data or a single time series, and whether you used the time series variable list notation or not.

Alyssa Beavers

Join Date: Feb 2015
Posts: 72
#3

06 Feb 2019, 11:09

Hi William Lisowski

I did the following:

Code:

local vars "Garden_Active_ GardenZip_ r_number_adults_ r_number_children_ GardenType__ Site_Visit_Curr_or_Prior_ Cold_Crop_ Fall_Crop_ Hot_Crop_ Seeds_ Sold_GID_ Size__orig__ Pickups_ MGTP_Curr_Yr_or_Prior_ Total_Classes_ Social_ Volunteer_ UR_Curr_Yr_or_Prior_ SOD_Curr_or_Prior_ KGD_Curr_Or_Prior_ Community_Garden_ Family_Garden_ School_Garden_ Market_Garden_ Yr_Act_Prior_ r2_number_adults_ r2_number_children_ r_Size_in_Acres_ Soil_Test_Curr_or_Prior_ r_Do_you_own_ Size_Cat_";
xtset Garden_ID Year;
foreach var in `vars'{;
     gen L`var'= L.`var';
};

All of the variables lagged except for Size__orig__. The above command did create a lag variable for size, but all of its values are missing. I have read elsewhere that Stata has issues with lagging string variables, but I was wondering if there is a workaround.


Nick Cox

Join Date: Mar 2014

Posts: 35736
#4

06 Feb 2019, 11:52

So, why is Size__orig__ string at all? This is the real question.

Code:

tab Size__orig__

is a first step here.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

06 Feb 2019, 11:54

Thank you. Now that I see you have cross-sectional data, I think the following will start you on your way to what you want.

Code:

bysort Garden_ID (Year): generate str16 LSize__orig__ = Size__orig__[_n-1] if Year[_n-1]==Year-1
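
To see what the `if` condition does, here is a small sketch on made-up data (the values are invented for illustration and are not from the actual dataset). The previous row's string is copied only when that row is exactly one year earlier, so a gap in the years gives an empty string rather than a value from two or more years back.

```stata
* Toy data, invented for illustration only
clear
input Garden_ID Year str16 Size__orig__
1 2015 "10x10"
1 2016 "10 by 20"
1 2018 "quarter acre"
2 2015 "small"
2 2016 "small"
end

* Within each garden, sorted by year, take the previous row's string,
* but only if the previous row is the immediately preceding year
bysort Garden_ID (Year): generate str16 LSize__orig__ = Size__orig__[_n-1] if Year[_n-1] == Year - 1

list, sepby(Garden_ID)
```

In this toy example the first year of each garden has an empty lag, and so does garden 1 in 2018, because 2017 is absent.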

LEILA BEN AOUN

Join Date: Jan 2018
Posts: 9
#6

06 Feb 2019, 12:00

Hi,

If you don't find a solution, one option is to build the lag with a merge. Add a variable period_f1 that holds the next period, keep only the key, the variable you want to lag, and period_f1, and save the result as fwd_data. Then merge that file back onto the original data using the key and the period variable, which gives you the lagged variable.

original data

Id	period	expenses
A	2000m12	red
A	2001m1	yellow
A	2001m2	blue
A	2001m3	green
A	2001m4	violet
A	2001m5	red
A	2001m6	green
A	2001m7	blue

adding the variable :

Id	period	expenses	period_f1
A	2000m12	red	2001m1
A	2001m1	yellow	2001m2
A	2001m2	blue	2001m3
A	2001m3	green	2001m4
A	2001m4	violet	2001m5
A	2001m5	red	2001m6
A	2001m6	green	2001m7
A	2001m7	blue	2001m8

then save it, keeping only Id, expenses, and period_f1

Id	expenses	period_f1
A	red	2001m1
A	yellow	2001m2
A	blue	2001m3
A	green	2001m4
A	violet	2001m5
A	red	2001m6
A	green	2001m7
A	blue	2001m8

then rename period_f1 to period and merge this dataset onto the original one
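
In Stata these steps might look roughly like the sketch below. It assumes the original data are in memory, that Id and period uniquely identify rows, and that period is a numeric monthly date, so that adding 1 moves it forward one month. The name L_expenses is made up for the lagged copy, and a temporary file stands in for a saved fwd_data.

```stata
* Assumes the original data are in memory, with Id, a numeric monthly
* date period, and the string variable expenses
preserve
keep Id period expenses
replace period = period + 1          // plays the role of period_f1
rename expenses L_expenses           // hypothetical name for the lagged copy
tempfile fwd_data
save `fwd_data'
restore

* Each row picks up the value from the previous period, where one exists
merge 1:1 Id period using `fwd_data', keep(master match) nogenerate
```

The keep(master match) option drops the extra row for the period after the last one, which has no counterpart in the original data.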

I hope this will help you!


Nick Cox

Join Date: Mar 2014

Posts: 35736
#7

06 Feb 2019, 12:19

If a lagged string variable makes sense, you really don't need the approach in #6. #5 gets you there directly.
Alyssa Beavers

Join Date: Feb 2015

Posts: 72
#8

06 Feb 2019, 12:35

Hi Nick Cox

Size__orig__ was a free text option and I am recoding it later in my do-file.

Thanks,
Alyssa
Nick Cox

Join Date: Mar 2014

Posts: 35736
#9

06 Feb 2019, 12:37

So what sense does the previous value make? For a categorical data analysis?
Alyssa Beavers

Join Date: Feb 2015

Posts: 72
#10

06 Feb 2019, 12:44

Hi Nick Cox ,

I am not sure I understand your question. I am looking at how variables in one year influence the outcome variable in the following year. I am using multilevel longitudinal logistic regression. William Lisowski, I see that you said I have cross-sectional data, but I actually have longitudinal data.

Thanks,
Alyssa
Nick Cox

Join Date: Mar 2014

Posts: 35736
#11

06 Feb 2019, 12:45

How are you going to use lagged string variables in your analyses?
Alyssa Beavers

Join Date: Feb 2015

Posts: 72
#12

06 Feb 2019, 12:49

Hi Nick Cox ,

Thanks for the clarification. Yes, I am recoding into categorical variables.

Many thanks
Alyssa
William Lisowski

Join Date: Dec 2014

Posts: 10150
#13

06 Feb 2019, 13:04

Alyssa Beavers I was too hasty; I always mix up longitudinal and cross-sectional. The command is right (but keep reading, I think it's inappropriate), despite my incorrect comment.

With that said, I do not understand why you don't recode your string variable before lagging it.
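
As a rough sketch of that order of operations, assume for illustration that each distinct string can stand as its own category (free-text answers will usually need cleaning or manual grouping first). The names Size_cat and LSize_cat are made up.

```stata
* Turn the string into a labeled numeric variable; Size_cat is a hypothetical name
encode Size__orig__, generate(Size_cat)

* A numeric variable can be lagged with the time-series operator
xtset Garden_ID Year
generate LSize_cat = L.Size_cat

* Reuse the value label that encode created (named after the new variable)
label values LSize_cat Size_cat
```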

I also don't understand why you are creating lagged variables, when most Stata modeling commands will accept (and prefer) that you use time series variable list notation to include them in the model. That is, this code

Code:

xtset Garden_ID Year
xtreg y L.X

is preferred to this code

Code:

xtset Garden_ID Year
generate LX = L.X
xtreg y LX

because, for example, using the L. notation in your model lets postestimation commands know that L.X and X are related, while LX and X will be treated as two unrelated variables.

I also note that your variables all seem to end in at least one "_" character, which suggests you reshaped your data from a wide layout to a long layout. You can get rid of the unwanted trailing underscore characters with

Code:

rename (*_) (*)

as described in the output of help rename group.
Alyssa Beavers

Join Date: Feb 2015

Posts: 72
#14

14 Feb 2019, 11:18

Hi William,

Thanks for your response. With regard to your question about why I don't use time-series notation: the command I am using for analysis (gllamm) does not accept it.
Yes, you are right that I could recode before lagging, but it is nice for me to have both the string and categorized data together in the final, lagged dataset so I can see how I categorized each string entry. Also, thanks for letting me know how to get rid of the underscores after my variable names!

Many thanks,
Alyssa
