
  • a new variable based on two existing variables

    Hi,

    I'm using the SHARE dataset, where there is a dummy variable for people having played chess/cards in the previous year (ac035d10) and another one for people having played sudoku/puzzles in the previous year (ac035d9).
    I want to generate a new dummy variable "edu_inf" (people involved in informal learning activities in the previous year) based on the two existing variables.
    Obviously some people could have played both chess/cards and sudoku/puzzles in the previous year.
    How can I generate this new variable in Stata?



  • #2
    Fine, but what is the definition of the new indicator (*) variable you want?

    You can have either

    Code:
    gen both = ac035d9 & ac035d10
    or

    Code:
    gen either = ac035d9 | ac035d10 
    or both!

    (*) I have to recommend this term over "dummy".



    • #3
      Welcome to Statalist.

      So I assume that both ac035d10 and ac035d9 are coded 1 for having done the specified activity and 0 for having not done it, and I assume you want to know whether the individual has done at least one type of learning activity.

      In what follows, it will help you to understand that when Stata evaluates a logical expression, a "true" result returns a value of 1, and a "false" result returns a value of 0. Conversely, Stata treats values of 0 as "false" and any non-zero value - including the Stata missing values like . - as "true".
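
      As a quick illustration of that rule, the following sketch can be run from a do-file. The expressions are made up for demonstration and do not come from the SHARE data.

      ```stata
      * A logical expression returns 1 when true and 0 when false
      display (3 > 2)
      display (3 < 2)

      * Used as a logical value, any nonzero number counts as true,
      * and that includes the system missing value .
      display (5 | 0)
      display (. | 0)
      display (0 | 0)

      * A comparison involving missing is still an ordinary comparison:
      * missing is not equal to 1, so this is false
      display (. == 1)
      ```

      The six commands display 1, 0, 1, 1, 0 and 0 respectively. The last two lines are the contrast that matters below: a missing value used directly counts as true, but a missing value compared with 1 gives false.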

      The most general syntax for what you want is
      Code:
      generate edu_inf = 0
      replace edu_inf =1 if  ac035d10==1 | ac035d9==1
      Note the use of the "vertical line" (also called "pipe") character to indicate "or". For any observation for which both variables have missing values, the result will be 0.

      That can be shortened to
      Code:
      generate edu_inf = ac035d10==1 | ac035d9==1
      because if one condition or the other (or both) is true, the result will be a 1, otherwise the result will be a zero. Again, if both are missing the result will be zero.

      Finally, if there are no missing values for either of the variables, this can be further shortened to
      Code:
      generate edu_inf = ac035d10 | ac035d9
      but in this case, if either is missing, the result will be 1 which is not what you want. I am rarely willing to make that assumption in code that I write, because inevitably the code gets applied to data different than what I developed it on, and there are missing values, and the results are not what I would want.
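
      To see the difference side by side, here is a minimal sketch on a made-up five-observation dataset. The data values and the names edu_strict and edu_short are assumptions for illustration only.

      ```stata
      clear
      input ac035d10 ac035d9
      0 0
      0 1
      1 1
      . 0
      . .
      end

      * Explicit comparison: a missing input fails the ==1 test,
      * so observations 4 and 5 get 0
      generate byte edu_strict = ac035d10==1 | ac035d9==1

      * Shortcut: a missing input counts as "true",
      * so observations 4 and 5 get 1
      generate byte edu_short = ac035d10 | ac035d9

      list, clean noobs
      ```

      The two new variables agree on the first three observations and differ only where an input is missing.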



      • #4
        Thank you. Both ac035d10 and ac035d9 actually have missing values. Can I also keep missing values for the new variable edu_inf?



        • #5
          I want to be sure I understand what "keep the missing values" means. I'd also like to address the question of whether you meant "both" or "at least 1". Please specify the results you want for each of the following possibilities.

          Also, are the missing values in your data the Stata system missing value . or do you have Stata extended missing values like .a .b .c ... .z?

          ac035d10   ac035d9
          0          missing
          1          missing
          missing    0
          missing    1
          missing    missing
          0          0
          0          1
          1          0
          1          1
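
          One way to check which of these combinations actually occur in a dataset is a two-way tabulation that includes missing values. The sketch below uses a made-up seven-observation dataset as a stand-in for the real variables.

          ```stata
          clear
          input ac035d10 ac035d9
          0 0
          0 1
          1 0
          1 1
          . 1
          0 .
          . .
          end

          * The missing option adds a row and a column for missing values,
          * so every combination present in the data shows up in the table
          tabulate ac035d10 ac035d9, missing

          * Count the observations where at least one input is missing
          count if missing(ac035d10, ac035d9)
          ```

          On real data, only the tabulate command is needed; the input block just creates example values.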
          Last edited by William Lisowski; 28 Feb 2019, 15:57.



          • #6
            I've attached the codebook output for both variables. I think it should be easier to understand. Based on these two variables I want to generate a new one. The missing values in my data are the system missing value. Thanks in advance.



            • #7
              Adding
              Code:
              replace edu_inf = . if missing(ac035d10, ac035d9)==1
              after any one of the three sets of code in post #3 will replace the calculated value of edu_inf with a missing value if either, or both, of the two input variables are missing.
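
              The same result can also be obtained in a single command by restricting generate with an if qualifier, since observations excluded by the qualifier are left missing. This is a sketch on made-up data, assuming the "at least one activity" definition.

              ```stata
              clear
              input ac035d10 ac035d9
              0 0
              0 1
              1 1
              . 1
              . .
              end

              * Computed only where both inputs are observed;
              * observations failing the if qualifier stay missing
              generate byte edu_inf = (ac035d10==1 | ac035d9==1) if !missing(ac035d10, ac035d9)

              list, clean noobs
              ```

              Here edu_inf is 0, 1, 1 for the first three observations and missing for the last two. Whether the fourth observation (one input missing, the other equal to 1) should instead be coded 1 depends on the definition of the new variable.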

              With that said, let me offer some advice to improve your future posts. Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ. Many members here will not download files from unknown sources.



              • #8
                "Based on these two variables I want to generate a new one."
                This remains insufficient as a precise definition.



                • #9
                  Thank you very much. For the next posts I will follow your advice.
