  • Adding 2 dummy variables to create 1 dummy variable

    Hello,



    These are tabulations of the 2 dummy variables. A13aN3 is coded for women and A13bN3 is coded for men so presumably there should be no overlap.

    . tab A13aN3, missing

         A13aN3 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |     15,623       20.28       20.28
              1 |     10,078       13.08       33.36
              . |     51,343       66.64      100.00
    ------------+-----------------------------------
          Total |     77,044      100.00

    . tab A13bN3, missing

         A13bN3 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |      9,290       12.06       12.06
              1 |      5,845        7.59       19.64
              . |     61,909       80.36      100.00
    ------------+-----------------------------------
          Total |     77,044      100.00

    I'm trying to add the 2 dummy variables together to create 1 dummy variable. This is the code I used...

    g A13N3 = 0 if A13aN3 == 0 | A13bN3 == 0
    replace A13N3 = 1 if A13aN3 == 1 | A13bN3 == 1

    I'm expecting A13N3=0 to have 24,913 obs and A13N3=1 to have 15,923. This is my output.

          A13N3 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |     13,919       18.07       18.07
              1 |     11,798       15.31       33.38
              . |     51,327       66.62      100.00
    ------------+-----------------------------------
          Total |     77,044      100.00


    Thanks for any assistance.

    Mary




  • #2
    Welcome to Statalist.

    Your output is very hard to read. It would be much better if you used code tags. See the Statalist FAQ, especially the section on asking questions effectively.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      If any observations have a 0 on one of your 2 variables of interest and a 1 on the other, then your expectation will clearly be wrong. Beyond that, it is hard to say exactly what is going on without a data sample (use -dataex-; see the FAQ). Also, a cross-tab of the two variables will be more useful than the individual tabulations. Please use CODE blocks (see the FAQ).
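
      The tabulations in #1 already point to overlap: the combined variable is nonmissing for 13,919 + 11,798 = 25,717 observations, whereas the two indicators are nonmissing 25,701 + 15,135 = 40,836 times in total, so 40,836 - 25,717 = 15,119 observations must be nonmissing on both. A minimal sketch of the suggested cross-tab follows. The toy data are hypothetical and only make the example runnable; on the real dataset one would skip the input block and run the tabulate and count lines directly.

      ```stata
      * Toy data (hypothetical values, not the poster's dataset)
      clear
      input A13aN3 A13bN3
      0 .
      1 .
      . 0
      . 1
      0 0
      1 0
      1 1
      . .
      end

      * Cross-tabulate, showing missing values as their own row and column
      tabulate A13aN3 A13bN3, missing

      * Observations with both indicators nonmissing
      count if !missing(A13aN3, A13bN3)

      * Observations where the indicators disagree (0 on one, 1 on the other)
      count if !missing(A13aN3, A13bN3) & A13aN3 != A13bN3
      ```

      Any nonzero count from the last two commands means that adding the marginal frequencies will not give the frequencies of a combined variable.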



      • #4
        Your first rule is that the result is 0 if either argument is 0, and your second rule is that the result is 1 if either argument is 1. The second rule overrides the first if one argument is 1 and the other 0.

        The rules can be combined by asking for max(A13aN3, A13bN3) if that is what you want.

        (You can literally add (0, 1) indicators but the result will be 0, 1, 2. Otherwise it's a matter of what are your rules for combining them.)
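
        As a rough illustration of these points, the sketch below runs the two-step code from #1 on hypothetical toy data and compares it with max() and with literal addition. The variable names viamax and sum01 are made up for the example. It relies on max() ignoring missing arguments unless all of them are missing.

        ```stata
        * Toy data (hypothetical values, not the poster's dataset)
        clear
        input A13aN3 A13bN3
        0 0
        0 1
        1 0
        1 1
        0 .
        . 1
        . .
        end

        * The two-step code from #1: the replace overrides the 0
        * whenever either indicator is 1
        generate A13N3 = 0 if A13aN3 == 0 | A13bN3 == 0
        replace A13N3 = 1 if A13aN3 == 1 | A13bN3 == 1

        * max() ignores missing arguments; it is missing only if all are missing
        generate viamax = max(A13aN3, A13bN3)

        * For (0, 1, .) indicators the two results agree in every observation
        assert A13N3 == viamax

        * Literal addition gives 0, 1, or 2, and is missing if either is missing
        generate sum01 = A13aN3 + A13bN3

        list, separator(0)
        ```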



        • #5
          Thanks Richard. This is a little better, but I don't know if I used the code tags properly.

          These are the two variables A13aN3 and A13bN3. I want to create a 3rd variable A13N3 with 24,913 obs of 0's, 15,923 obs of 1's, and 36,208 missing.

          Nick, I'm not sure how to use the max command. Should I still use that based on this additional info?

          [ A13aN3 | Freq. Percent Cum.
          ------------+-----------------------------------
          0 | 15,623 20.28 20.28
          1 | 10,078 13.08 33.36
          . | 51,343 66.64 100.00
          ------------+-----------------------------------
          Total | 77,044 100.00
          ]


          [
          A13bN3 | Freq. Percent Cum.
          ------------+-----------------------------------
          0 | 9,290 12.06 12.06
          1 | 5,845 7.59 19.64
          . | 61,909 80.36 100.00
          ------------+-----------------------------------
          Total | 77,044 100.00
          ]



          • #6
            max() is in Stata a function, not a command. So how you should use it is in something like

            Code:
            gen max = max(A13aN3, A13bN3)
            but whether you should use it depends on what you want to do, not on the marginal distributions. Perhaps min() not max() would make more sense for your analyses; I have no way to know what your variables mean or how they should be combined for your purposes.

            Neither function will produce a variable with that marginal distribution. You are confusing the combination of values and the addition of frequencies. Forgetting about missing values, consider a 2 x 2 table with

            value1  value2  freq
              1       1      a
              1       0      b
              0       1      c
              0       0      d

            If your rule is max (0 or 1 yields 1), then the new variable has 1 with frequency a + b + c and 0 with frequency d.

            If your rule is min (0 or 1 yields 0), then the new variable has 1 with frequency a and 0 with frequency b + c + d.

            You're asking for a rule that has 1 with frequency 2a + b + c and 0 with frequency b + c + 2d -- which implies double counting.
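
            The frequency arithmetic above can be checked with a small sketch. The cell counts a = 4, b = 3, c = 2, d = 1 are hypothetical, chosen only so that every sum is distinct, and the names vmax and vmin are made up for the example.

            ```stata
            * Hypothetical cell frequencies: a = 4, b = 3, c = 2, d = 1
            clear
            input value1 value2 freq
            1 1 4
            1 0 3
            0 1 2
            0 0 1
            end

            * One observation per unit: 10 observations in all
            expand freq

            generate vmax = max(value1, value2)
            generate vmin = min(value1, value2)

            * max rule: 1 has frequency a + b + c = 9, 0 has frequency d = 1
            tabulate vmax

            * min rule: 1 has frequency a = 4, 0 has frequency b + c + d = 6
            tabulate vmin

            * Adding the marginal counts of 1s gives 7 + 6 = 13 = 2a + b + c,
            * more than the 10 observations available
            count if value1 == 1
            count if value2 == 1
            ```

            With these counts the added marginals come to 13 ones and 7 zeros, 20 in total for 10 observations, which is the double counting described above.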

            NOTE: @Rich Goldstein's reply appeared just seconds before mine in #4 and so was not visible when I posted. I think he's making exactly the same point.
            Last edited by Nick Cox; 07 Sep 2021, 14:10.



            • #7
              Hi Mary. No, that posting is still not very legible! But Nick and Rich G. are giving you good advice anyway. Some people will wade through hard-to-read code and decipher it, but I am usually too lazy to do that.

              See

              https://www.statalist.org/forums/help

              especially section 12 (12.3 discusses code tags).
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
