  • Labels when using append

    Hi,

    I have a number of data sets from the same survey in different years. In each of these data sets I create a combination variable xy using the code below:
    Code:
    egen xy = group(x  y), label
    So the new variable takes the labels of x and y. However, the underlying values differ in each year depending on which combinations are available. For example, in Year 1 there may be 350 combinations, whereas in Year 2 only 340. Therefore, values 341-350 won't exist in Year 2.

    I then need to append all the data sets together but when I do so my labels get mixed up and I am even missing some labels in each year. How can I solve this problem?

    Thanks in advance.

  • #2
    Two possibilities:

    1. The original x y are what you want to work with.

    2. The value labels are informative. So decode to string equivalent; then if desired encode to a consistent numeric variable.
    But watch out for problems with inconsistent spelling or extra punctuation characters (including spaces).
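A minimal sketch of how possibility 2 could look, assuming two toy yearly files built on the spot. The data, the tempfile `y1`, and the names `xy_str` and `xy_all` are made up for illustration and are not from the original post. Each year's xy is decoded to its label text, the files are appended, and the combined strings are encoded once so that every label gets a single number across all years.

```stata
clear all

* Toy year 1 file (hypothetical data)
input str1 x str1 y
"a" "p"
"a" "q"
"b" "q"
end
generate year = 1
egen xy = group(x y), label
* Keep the label text as a string, then discard the year-specific numbering
decode xy, generate(xy_str)
drop xy
label drop xy
tempfile y1
save `y1'

* Toy year 2 file (hypothetical data) with a different set of combinations
clear
input str1 x str1 y
"a" "p"
"b" "q"
"b" "r"
end
generate year = 2
egen xy = group(x y), label
decode xy, generate(xy_str)
drop xy
label drop xy

* Append, then number the pooled strings once for all years
append using `y1'
encode xy_str, generate(xy_all)
tabulate xy_all year
```

Because encode works on the label text, any spelling or spacing difference between years would produce separate categories, which is the caveat mentioned above.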



    • #3
      Thanks for this.

      I realised I failed to mention something important about my xy variable. After creating it, I modified some of its small categories, pooling them together based on a specific characteristic (the same categories in each year). So, for example, in Year 1 I did
      Code:
      replace xy=105 if xy==121
      label define xy 105 "xymodifiedlabel", modify
      In Year 2 I did the same thing, except that the number behind the label was originally 108 rather than 105, i.e.:
      Code:
      replace xy=108 if xy==120
      label define xy 108 "xymodifiedlabel", modify
      And I did this in all my years before appending them. So now when I append, my xy variable has the same label but for a different number. In other words, the labels are the same in all my years but the numbers behind them are different in each year because of the way the variable was generated.

      Sorry if I am not being clear. I can see how what I did confuses Stata; I just don't know how to correct it. Would appending my files first and then creating the xy variable solve the problem? For example, I would then deal with small categories by doing
      Code:
      replace xy=105 if xy==121 & year==1
      How would I then assign the xymodifiedlabel label to the 105 category in just Year 1, the same label to the 108 category in Year 2, and so on?

      Apologies for the long post.



      • #4
        If you made modifications consistently, the advice in #2 stands. But the more you tell us, the more it seems that you should work on the original two variables after appending all the datasets. That way, you can be sure that you are acting consistently. Even more important, the more ad hoc changes you make in different files at different times, the less confidence any user or assessor of your work can have about the consequences.

        This may seem like extra work, but it's advisable for an audit trail worth anything at all.
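A rough sketch of that workflow, assuming toy data: the two small files, the tempfile `y1`, and the choice of which category to pool are hypothetical and only for illustration. The yearly files keep just the original x and y, the append comes first, and xy is built once, so the same number and label apply in every year.

```stata
clear all

* Toy year 1 file (hypothetical data): only the original x and y
input str1 x str1 y
"a" "p"
"a" "q"
"b" "q"
end
generate year = 1
tempfile y1
save `y1'

* Toy year 2 file (hypothetical data)
clear
input str1 x str1 y
"a" "p"
"b" "q"
"b" "r"
end
generate year = 2

* Append first, then build the combination variable once for all years
append using `y1'
egen xy = group(x y), label

* Pool a small category once; the rule now applies identically in each year
replace xy = 1 if xy == 2
label define xy 1 "xymodifiedlabel", modify
tabulate xy year
```

With this order of operations there is no need for year-specific replace statements, because a given combination of x and y can only ever receive one value.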

        By the way, Stata is never confused. It is sometimes puzzled at what you are asking, but never confused. Confusion is the privilege of the user.



        • #5
          Thank you for this. I tried everything; however, I end up with consistently defined labels for my variable but with different values behind each labelled category. Is there a way to modify the values of my variable (before the append) so that they are consistent in all years? Maybe by creating a new variable based on the old one, but with the option to change the values behind each label?



          • #6
            Code:
            recode oldvar, gen(newvar)
            is the code I was looking for.
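For anyone reading later, a hedged sketch of what that might look like with actual rules filled in. The toy values, the rule itself, and the name `xy_new` are assumptions for illustration; the post above only gives the bare command. Here a Year 2 file uses 108 and 120 where Year 1 uses 105, and recode moves both onto 105 while attaching the label in the same step.

```stata
clear all

* Toy year 2 values (hypothetical): 108 is the pooled category, 120 a small one
input xy
108
120
3
end

* Map year 2's codes onto year 1's numbering and label the pooled value;
* values not named in a rule are copied unchanged into the new variable
recode xy (108 120 = 105 "xymodifiedlabel"), generate(xy_new)
list xy xy_new
```

As far as I know, the value label created this way covers only the values named in the rules, so the remaining categories would still need their labels defined separately.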
