  • Generate new variable from several variables

    I have a survey dataset covering several countries and a few time periods. Each country and period has its own variable (coded 0 and 1), all of which refer to practically the same thing. For example, variable A1 is only for individuals in country A in period 1, variable A2 is for country A in period 2, variable B1 is for country B in period 1, and so on. Since I want to combine the variables, I initially did something like:
    Code:
    egen combined = rowtotal(A1 A2 B1)
    It wasn't my intention to add anything, but I did so since I figured the other variables would be empty for each observation anyway. However, when I looked in the data browser, the observations with responses such as "not applicable", "refusal", etc., also got a value of 0.

    My question is this: is there a way to combine these variables while still maintaining the "not applicable", "refusal", etc responses?

  • #2
    There are many ways to combine variables.

    With the egen command, rowtotal(A1 A2 B1) will give you A1 + A2 + B1, ignoring missing values (they are treated as 0). If that is what you want, you're fine, but note the missing option too: with the missing option specified, if all of those variables are missing in an observation (row), the result will be missing too (not zero).
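
To make the effect of the missing option concrete, here is a minimal sketch on toy data modeled on the thread (the values are illustrative, not the poster's real data). As far as I know, the result for an all-missing row is the system missing value, so the specific code .a is not carried over.

```stata
* Toy data: illustrative values only, not the original dataset
clear
input A1 A2 B1
1 . .
. 0 .
. . .a
end

* Default behaviour: missing values count as 0, so row 3 sums to 0
egen combined = rowtotal(A1 A2 B1)

* With the missing option: a row whose arguments are all missing stays missing
egen combined_m = rowtotal(A1 A2 B1), missing

list, abbreviate(12)
```

So the missing option avoids the spurious zeros, but it does not by itself preserve which kind of missing ("refusal", "not applicable", etc.) was recorded.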


    Otherwise, if that doesn't answer your question, what is it that you want?



    • #3
      The dataset I am using looks something like this:
      Country  year  ID  A1  A2  A3               B1  B2  B3
      A        1     1   1   .   .                .   .   .
      A        2     2   .   0   .                .   .   .
      A        3     3   .   .   did not respond  .   .   .
      B        1     4   .   .   .                1   .   .
      B        2     5   .   .   .                .   0   .
      B        3     6   .   .   .                .   .   did not respond


      Question A1 is answered by person 1, who is interviewed in country A in period 1, and so on. Variable A2 will be missing for the person with ID 1, since A2 is answered only by people in country A in period 2, and so on. I want to generate a variable that looks something like this:
      Combined
      1
      0
      did not respond
      1
      0
      did not respond


      Initially I thought rowtotal would solve my problem, but it turns out it treats "did not respond" as 0, giving me (1,0,0,1,0,0) instead of the result above.
      I was hoping it could identify those who did not respond, since refusing to answer the survey question is different from a yes/no response of 1 or 0, and it is also not simply missing.

      When I run sum on the variables, it shows min and max as 0 and 1 respectively. However, this is clearly not the case when I look in the data browser.
      I should point out, though, that the values appear in blue when I view the variables in the browser. Does this have something to do with rowtotal giving me 0 for "did not respond"?
      I was hoping I could copy the exact information into the new variable.



      • #4
        Tell us more about your value labels, i.e. see which value labels are mentioned in


        Code:
        describe A1 A2 A3 B1 B2 B3
        and then use

        Code:
        label list
        to show them to us. If necessary look at the details for

        Code:
        help label 
        My guess is that "did not respond" is a value label corresponding to an extended missing value such as .a.



        • #5
          I did as above and yes, you are right. The responses are coded .a, .b, .c for "did not respond", "not applicable", etc.
          Is there a way to carry these into the new variable instead of getting a value of 0, as I did with rowtotal?



          • #6
            I think you need custom code, something like

            Code:
            clear 
            input str1 Country year ID A1 A2 A3 B1 B2 B3
            A 1 1 1 . . . . .
            A 2 2 . 0 . . . .
            A 3 3 . . .a . . .
            B 1 4 . . . 1 . .
            B 2 5 . . . . 0 .
            B 3 6 . . . . . .a
            end 
            
            gen wanted = . 
            
            foreach v in A1 A2 A3 B1 B2 B3 { 
                replace wanted = `v' if `v' != . 
            } 
            
            label def wanted .a "did not respond" 
            label val wanted wanted 
            
            list wanted, sepby(Country) 
            
                 +-----------------+
                 |          wanted |
                 |-----------------|
              1. |               1 |
              2. |               0 |
              3. | did not respond |
                 |-----------------|
              4. |               1 |
              5. |               0 |
              6. | did not respond |
                 +-----------------+
            Note that #3 was almost as good as a dataex example, but not quite. See #12 in the FAQ Advice.



            • #7
              Thanks a lot, Nick! This is exactly what I want.
              I am just wondering, how does Stata read the "did not respond" observations? Are they treated as missing, or do I now have a variable with 3 categories?



              • #8
                Do read the code to understand what it does. The extended missing value .a is not equal to the system missing value . so it satisfies the condition != . and overwrites the initial system missing value in wanted. Then a value label is attached.
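
The comparison at the heart of that loop can be checked directly. This is only a small sketch of how Stata orders missing values, not part of the code in #6:

```stata
* System missing and extended missing are distinct values
display .a != .

* Extended missing values still count as missing
display missing(.a)

* Numeric missing values sort as . < .a < .b < ... < .z
display .a > .
```

Each line should display 1. This is why the loop tests != . rather than !missing(): the latter would skip .a and leave wanted as system missing for those observations.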

                My code does nothing about "not applicable", "refusal", etc. You're expected to extend that code yourself. You didn't show the value labels as requested in #4, so I can't write code for anything but .a.



                • #9
                  To expand on #8: the code will catch extended missing values .b, .c, and so on up to .z, but you need to define the extra value labels yourself. That is, suppose .c means refusal. Then the code would read

                  Code:
                    
                   label def wanted .a "did not respond"  .c "refusal"



                  • #10
                    I was not aware of the Stata term "extended missing value" until you mentioned it again.
                    I have read somewhere that extended missing values are also excluded from statistical analyses by default, so this answers my question in #7.
                    Case closed. Thank you very much for your help.
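
For anyone wanting to confirm this on their own data, here is a minimal sketch using the six values of wanted from #6 (retyped here so the example is self-contained). It shows the extended missing values as a labeled category in a tabulation that asks for missings, and then shows summarize leaving them out.

```stata
* The six values of wanted from the example in #6
clear
input wanted
1
0
.a
1
0
.a
end

label define wanted .a "did not respond"
label values wanted wanted

* The missing option shows .a as its own labeled row
tabulate wanted, missing

* summarize uses only the nonmissing observations
summarize wanted
display r(N)
```

The final display should show 4, not 6: the two .a observations are treated as missing by summarize, even though they carry a value label.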
