
  • Using a loop to append multiple waves of data

    I want to create a loop to append multiple waves of panel data. Most variables appear in every wave, but three additional variables appear in only four waves. Should I therefore create two separate loops to append the waves?

    I understand that when appending waves, the variable names need to be the same. In my panel data, a prefix is added to each variable name to reflect the wave (age in wave 1 is aage, in wave 2 bage, in wave 3 cage), so the prefix has to be removed before appending. Can I code the removal of this prefix within the loop that appends the waves? And if not, how would I go about coding this?

    Using the source data file, I save the renamed variables to a temporary data file and then save the appended data to a new data file, right? Some clarity in the code on this would be appreciated.

    Thank you in advance.
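    One loop should be enough for both cases, since appending tolerates variables that exist in only some waves. As a language-neutral sketch of the logic (written in Python, not Stata, with toy data; `strip_prefix`, `append_waves`, and the added `wave` variable are hypothetical names), each wave's prefix is stripped inside the loop before the rows are stacked, and a variable present in only some waves comes out missing for the others. In Stata itself these steps would typically map onto `use`, the group form of `rename` (e.g. `rename a* *` to drop an `a` prefix), `save` to a `tempfile`, and `append using`; check that mapping against the Stata manual.

```python
# Hypothetical toy panel: each wave is a list of rows (dicts), and every
# variable name carries a one-letter wave prefix ("a" = wave 1, "b" = wave 2, ...).
waves = {
    "a": [{"apid": 1, "aage": 30}, {"apid": 2, "aage": 45}],
    "b": [{"bpid": 1, "bage": 31, "bincome": 500}],  # income exists in wave 2 only
    "c": [{"cpid": 1, "cage": 32}],
}


def strip_prefix(row, prefix):
    """Drop the wave prefix from every variable name that starts with it.

    Assumes no unprefixed variable name happens to start with the same letter.
    """
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in row.items()
    }


def append_waves(waves):
    """Rename inside the loop, then stack all waves into one long dataset.

    A variable missing from a wave comes out as None for that wave's rows,
    much as Stata's append fills absent variables with missing values.
    """
    stacked = []
    for wave_number, (prefix, rows) in enumerate(waves.items(), start=1):
        for row in rows:
            clean = strip_prefix(row, prefix)
            clean["wave"] = wave_number  # remember which wave each row came from
            stacked.append(clean)
    columns = []  # union of variable names, in order of first appearance
    for row in stacked:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in stacked]


for row in append_waves(waves):
    print(row)
# {'pid': 1, 'age': 30, 'wave': 1, 'income': None}
# {'pid': 2, 'age': 45, 'wave': 1, 'income': None}
# {'pid': 1, 'age': 31, 'wave': 2, 'income': 500}
# {'pid': 1, 'age': 32, 'wave': 3, 'income': None}
```

    Keeping a wave identifier is a design choice worth making in any version: once the prefixes are gone, it is the only record of which wave an observation came from.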

  • #2
    Answered in part in this post: https://www.statalist.org/forums/for...nd-or-to-merge

    But in general:
    You need to rename the variable to 'age' in each dataset if you want Stata to put the values of 'age' from datasets a, b, and c into a single variable called age.
    If you do not rename, Stata will still append, but the appended dataset will have three variables, aage, bage, and cage: observations from dataset a will have values in aage and missing values in bage and cage, and likewise for the observations from datasets b and c.

    So if you have a dataset with variables A B C, and you then append a dataset with variables A B, this is fine. Values for C will be set to missing for the appended observations.
    If you have a dataset with variables A B C, and you append a dataset with variables A B C X, this is also fine. Values for X will be set to missing for the dataset that was already in memory.
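    The rules above can be mimicked in a small Python sketch (not Stata; `append` here is a hypothetical helper on toy data, not a real library call). Each dataset is a pair of variable names and rows; stacking keeps the union of variable names, and a variable absent from one dataset is missing (`None`) for that dataset's observations. The first case shows what happens when aage, bage, and cage are not renamed.

```python
def append(master, using):
    """Stack `using` below `master`, Stata-append style.

    Each dataset is a (variable_names, rows) pair, where rows are dicts.
    The result has the union of variable names (master's first); a variable
    absent from one dataset is missing (None) for that dataset's rows.
    """
    m_vars, m_rows = master
    u_vars, u_rows = using
    all_vars = m_vars + [v for v in u_vars if v not in m_vars]
    rows = [{v: row.get(v) for v in all_vars} for row in m_rows + u_rows]
    return all_vars, rows


# Not renamed: three differently named variables, each mostly missing.
a = (["aage"], [{"aage": 30}])
b = (["bage"], [{"bage": 41}])
c = (["cage"], [{"cage": 52}])
names, rows = append(append(a, b), c)
print(names)  # ['aage', 'bage', 'cage']
for row in rows:
    print(row)
# {'aage': 30, 'bage': None, 'cage': None}
# {'aage': None, 'bage': 41, 'cage': None}
# {'aage': None, 'bage': None, 'cage': 52}

# A B C in memory, then append A B: C is missing for the appended rows.
abc = (["A", "B", "C"], [{"A": 1, "B": 2, "C": 3}])
ab = (["A", "B"], [{"A": 4, "B": 5}])
print(append(abc, ab)[1][1])  # {'A': 4, 'B': 5, 'C': None}

# A B C in memory, then append A B C X: X is missing for the original rows.
abcx = (["A", "B", "C", "X"], [{"A": 6, "B": 7, "C": 8, "X": 9}])
print(append(abc, abcx)[1][0])  # {'A': 1, 'B': 2, 'C': 3, 'X': None}
```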



    • #3
      Thank you, Jorrit Gosens.



      • #4
        I have three .dta files for the three sectors of an economy: rural, urban, and semi-urban (together, these three sectors make up the whole country). Each file has 3,000 observations. I have to run this OLS regression:
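        $$\log(\text{wage}_i) = a + \sum_{s=1}^{3} b_s\,\text{sector}_{si} + c\,\text{hhsize}_i + \varepsilon_i$$

        where the sum runs over the sectors $s = 1, 2, 3$, $\text{sector}_{si}$ equals 1 if observation $i$ is in sector $s$ (0 otherwise), $b_s$ is the coefficient on sector $s$, and $\varepsilon_i$ is the error term.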

        Since sector is a categorical variable, one category is left out as the base, so the final results will show coefficients for only 2 of the 3 sectors.
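        To see why only two sector coefficients appear, here is a small Python sketch (the names `SECTORS` and `sector_dummies`, and the choice of rural as the base category, are assumptions for illustration). With an intercept in the model, a full set of three indicators always sums to 1 and so duplicates the constant column exactly; one indicator has to be dropped, and the remaining coefficients are measured relative to that base sector.

```python
SECTORS = ("rural", "urban", "semi-urban")


def sector_dummies(sector, base="rural"):
    """Return 0/1 indicators for every sector except the base category."""
    if sector not in SECTORS:
        raise ValueError(f"unknown sector: {sector!r}")
    return [1 if sector == s else 0 for s in SECTORS if s != base]


# A full set of three indicators always sums to 1, i.e. it equals the
# intercept column, which is why one category has to be left out.
full_sets = [[1 if s == t else 0 for t in SECTORS] for s in SECTORS]
assert all(sum(row) == 1 for row in full_sets)

for s in SECTORS:
    print(s, sector_dummies(s))
# rural [0, 0]
# urban [1, 0]
# semi-urban [0, 1]
```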

        The catch is that I have to run this equation at the national level without loading the data for all sectors at once. That is, I cannot have more than 3,000 observations in memory while estimating the national-level equation, so the files cannot simply be appended. This is a programming task that I have to accomplish. Please help!
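        One possible approach (a sketch, not necessarily what the assignment intends): OLS coefficients depend on the data only through the cross-products X'X and X'y, and both are sums over observations. They can therefore be accumulated file by file, with only one sector's observations in memory at a time, and the normal equations (X'X)b = X'y solved once at the end. The Python below uses hypothetical names (`load_sector`, `accumulate`, `chunked_ols`) and tiny synthetic, noise-free files (8 rows each rather than 3,000) generated from known coefficients, so the recovered estimates can be checked. In Stata, `matrix accum` builds X'X for the data currently in memory, so the same idea could plausibly be carried over there.

```python
# Sketch: OLS on three sector files without ever holding more than one file
# in memory, by accumulating X'X and X'y and solving the normal equations.
SECTORS = ("rural", "urban", "semi-urban")  # "rural" is the base (assumption)
K = 4  # columns: intercept, urban dummy, semi-urban dummy, hhsize


def design_row(sector, hhsize):
    """One row of X: constant, two sector dummies (rural omitted), hhsize."""
    return [1.0, float(sector == "urban"), float(sector == "semi-urban"), float(hhsize)]


def load_sector(sector):
    """Hypothetical loader standing in for reading one sector's .dta file.

    Returns (x_row, log_wage) pairs built from synthetic, noise-free data with
    known coefficients: intercept 1.0, urban 0.3, semi-urban 0.1, hhsize 0.05.
    """
    effect = {"rural": 0.0, "urban": 0.3, "semi-urban": 0.1}[sector]
    return [(design_row(sector, h), 1.0 + effect + 0.05 * h) for h in range(1, 9)]


def accumulate(xtx, xty, rows):
    """Add one file's contribution to X'X and X'y (both are sums over rows)."""
    for x, y in rows:
        for i in range(K):
            xty[i] += x[i] * y
            for j in range(K):
                xtx[i][j] += x[i] * x[j]


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def chunked_ols(sectors):
    xtx = [[0.0] * K for _ in range(K)]
    xty = [0.0] * K
    for sector in sectors:
        rows = load_sector(sector)  # only this sector's data is held now
        accumulate(xtx, xty, rows)
    return solve(xtx, xty)


beta = chunked_ols(SECTORS)
print([round(b, 6) for b in beta])  # [1.0, 0.3, 0.1, 0.05]
```

        Standard errors can be handled the same way if y'y is accumulated too: the residual sum of squares is $\text{RSS} = y'y - b'X'y$, which gives $\hat\sigma^2 = \text{RSS}/(n - k)$ and $\widehat{\operatorname{Var}}(b) = \hat\sigma^2 (X'X)^{-1}$, with $n$ the total number of observations across all files.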

