Using a loop to append multiple waves of data

Chris Boulis

Join Date: Feb 2019

Posts: 368
#1

Using a loop to append multiple waves of data

24 Sep 2019, 00:34

I want to create a loop to append multiple waves of panel data. Most vars appear in every wave, but three additional vars appear in only 4 waves. Should I therefore create two separate loops to append the waves?

I understand that when appending waves the var names need to be the same. In the panel data I have, a prefix is added to each varname to reflect the wave (age in wave 1 is aage, in wave 2 is bage, in wave 3 is cage), so for appending the vars need to be given the same name. Can we code the removal of this prefix within the loop that appends the waves? And if not, how would I go about coding this?

Using the source datafile, I save the renamed vars to a temp datafile, then save the appended result to a new datafile, right? Some clarity in the code on this would be appreciated.

Thank you in advance.
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#2

24 Sep 2019, 00:46

Answered in part in this post: https://www.statalist.org/forums/for...nd-or-to-merge

But in general:
You need to rename the variable to 'age' in each dataset if you want Stata to put the values of 'age' from datasets a, b, and c into a single variable called age.
If you do not rename, Stata will still append, but the appended dataset will have 3 variables, aage, bage and cage. An observation from dataset a will then hold its value in aage and be missing for bage and cage.
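A minimal sketch of renaming inside the append loop, under some assumptions that are not from the question: the wave files are called wave_a.dta, wave_b.dta and wave_c.dta (hypothetical names), every wave-specific variable starts with the wave letter, and no unprefixed variable (such as a person identifier) starts with that letter. It is meant as a starting point, not a definitive implementation.

```stata
* Start with an empty temporary file that accumulates the waves
clear
tempfile combined
save `combined', emptyok

foreach w in a b c {
    * wave_`w'.dta is a hypothetical file name; adjust to your own files
    use "wave_`w'.dta", clear

    * Strip the wave prefix: aage -> age, bage -> age, and so on.
    * Caution: this renames EVERY variable starting with `w',
    * so an unprefixed variable starting with that letter would be mangled.
    rename `w'* *

    * Record which wave each observation came from
    * (assumes no variable is already called wave after renaming)
    generate str1 wave = "`w'"

    * Add the waves collected so far, then store the result
    append using `combined'
    save `combined', replace
}

use `combined', clear
* save "all_waves.dta", replace   // hypothetical name for the final file
```

A single loop should be enough here: variables that exist in only some waves are simply filled with missing values for the other waves, so a second loop for the three extra vars is probably not needed.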

So if you have a dataset with variables A B C, and you then append a dataset with variables A B, this is fine. Values for C will be set to missing for the appended set.
If you have a dataset with variables A B C, and you append a dataset with variables A B C X, this is fine too. Values for X will be set to missing for the dataset that was already in memory.
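The second case can be checked with a small toy example. The values below are made up purely for illustration.

```stata
* Dataset to be appended: has the extra variable X
clear
input A B C X
4 5 6 7
end
tempfile extra
save `extra'

* Dataset in memory: only A B C
clear
input A B C
1 2 3
end

append using `extra'

* X is missing for the observation that was already in memory
list
```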
Chris Boulis

Join Date: Feb 2019

Posts: 368
#3

24 Sep 2019, 01:55

Thank you Jorrit Gosens.
Muskan Singh

Join Date: Sep 2019

Posts: 11
#4

29 Sep 2019, 08:26

I have three dta files for three sectors of an economy (these are the only three sectors that constitute the country): rural, urban, and semi-urban. Each of the files has 3000 observations. I have to run this OLS regression:
log(wage) = a + sum over j of b_j*sector_j + c*hhsize + error

where the sum runs over the sectors j = 1 to 3, sector_j is the indicator for sector j, and b_j is its coefficient.

Since sector is a categorical variable, the final results will report coefficients for only 2 of the 3 categories (the third is the omitted base category).

The catch is that I have to run this equation at the national level without loading the data for all sectors at once. That means I cannot have more than 3000 observations in memory when I run the equation at the national level, so appending cannot be done explicitly. This is a programming task that I have to accomplish. Please help!
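One possible approach, sketched under assumptions: OLS coefficients depend on the data only through the cross-product matrices X'X and X'y, and these can be accumulated one file at a time with matrix accum and then summed. The file names (rural.dta, urban.dta, semiurban.dta) and variable names (wage, hhsize) below are hypothetical, and rural is arbitrarily taken as the base category.

```stata
local k = 0
foreach f in rural urban semiurban {
    local ++k
    * only one sector's observations are in memory at any time
    use "`f'.dta", clear
    generate double lnwage = ln(wage)

    * sector indicators; rural (k == 1) is the omitted base
    generate byte urban     = (`k' == 2)
    generate byte semiurban = (`k' == 3)

    * cross products of (lnwage urban semiurban hhsize _cons);
    * matrix accum adds the constant as the last row and column
    matrix accum A`k' = lnwage urban semiurban hhsize
}

* national cross-product matrix is the sum over the three sectors
matrix A  = A1 + A2 + A3

* split into X'X (regressors and constant) and X'y
matrix XX = A[2..5, 2..5]
matrix Xy = A[2..5, 1]

* OLS coefficients: b = (X'X)^-1 X'y
matrix b = invsym(XX) * Xy
matrix list b
```

This gives the coefficients only. Standard errors would additionally need the residual sum of squares, which can be built from the same matrix A.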