  • dummy variable excluding missing observations

    Hello,


    I am trying to create a dummy variable based on the data available from two other variables. The dummy variable should be 1 if:

    gen dummyvar = 0
    replace dummyvar = 1 if var1< -1.875 | var1 > 8.325 & var2 < -2.65 | var2 > 10.55


    Otherwise it should be 0. This works so far, but Stata also generates 1 if there are no observations in both var1 and var2. Is there a way to tell Stata to keep the dummy variable at 0 if there are no observations in var1 and var2?




    Thank you for your help!


  • #2
    The Stata jargon here is subtle:

    You do have observations even if they contain missing values.

    You do have some missing values on two variables.

    (An observation is an entire row, case or record in the dataset.)

    The code you want is possibly something like

    Code:
    gen dummyvar = (var1< -1.875 | var1 > 8.325) & (var2 < -2.65 | var2 > 10.55) & !missing(var1, var2)
    which produces values of 1 or 0. Nevertheless, there are many advantages in

    Code:
    gen dummyvar = (var1< -1.875 | var1 > 8.325) & (var2 < -2.65 | var2 > 10.55) if !missing(var1, var2)
    which produces values of 1 or 0 or missing.
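
    The difference between the two forms can be seen on a small toy dataset. This is only a sketch: the two rows and the names d_and and d_if are made up for illustration, and the thresholds are the ones from the question.

    ```stata
    * Toy data (hypothetical): one complete row, one row with var1 missing
    clear
    input var1 var2
    -6.3 14.8
       . 14.8
    end

    * Extra condition in the expression: rows with a missing value get 0
    gen byte d_and = (var1 < -1.875 | var1 > 8.325) & (var2 < -2.65 | var2 > 10.55) & !missing(var1, var2)

    * if qualifier: rows with a missing value are left as missing
    gen byte d_if = (var1 < -1.875 | var1 > 8.325) & (var2 < -2.65 | var2 > 10.55) if !missing(var1, var2)

    list
    ```

    In the second row d_and is 0 and d_if is missing, so the second form keeps "not an outlier" and "unknown" apart.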

    N.B. Please do note the reminder given you in an earlier thread that you are asked to use a full real name here, e.g. given name plus family name. Some members here won't support those who don't respect this request.
    Last edited by Nick Cox; 26 Nov 2014, 06:31.



    • #3
      It is working so far, but it now replaces 0 with . if either value (var1 or var2) is missing. How can I alter the command so that Stata only generates "." if both var1 and var2 are missing?


      Best wishes,

      Martin Pattillo



      • #4
        I doubt that you really want that. If one variable only is missing then such observations would be coded 1 or 0 and (either way) lumped with observations that do have non-missing values on both. What possible advantage could that convey?



        • #5
          Chances are, you do not want to alter the command that way. If one of the variables is missing and you set the resulting dummy to 1 or 0, you do not take into account your uncertainty about the "true" value behind the missing value. There is probably no way you could know whether value 0 or 1 should be assigned if one of the variables has missing values, so you necessarily make assumptions about this missing value.

          It makes things easier if you add a little more context. Instead of variable 1 and variable 2, try telling us more about the substantive problem you are facing.

          Best
          Daniel



          • #6
            This is part of the data.
            country var1 var2
            Afghanistan 9.4
            Albania 3.8 5
            Algeria 1.9 3.7
            American Samoa
            Andorra 3.2 5.9
            Angola 1.6 11.7
            Antigua and Barbuda 3.5 2.3
            Argentina 4.3 5.2
            Armenia -1.9 7.6
            Aruba 3.9 -0.1
            Australia 3.6 3.1
            Austria 2.6 1.7
            Azerbaijan -6.3 14.8
            Bahamas, The 2.6 0.6
            Bahrain 5 5.4
            (Var1= GDP annual growth % for 1990, Var2= GDP annual growth % for 2012)

            The command:


            gen dummyvar = (var1< -1.875 | var1 > 8.325) & (var2 < -2.65 | var2 > 10.55) if !missing(var1, var2)


            will set my dummy variable to 1 for Afghanistan. Yet since the value is smaller than 10.55, I would like it to be 0. Also for American Samoa it sets a 1 rather than 0.
            Is there a way to fix this?

            Thanks,

            Martin Pattillo



            • #7
              I'm not clear yet precisely what the definition of your dummy variable is intended to be. Note the code below (and note too the use of CODE delimiters -- please see the FAQ recommendations about this -- it's how you should have pasted your data snippet):

              Code:
               . list
                     +-----------------------------------+
                   |             country   var1   var2 |
                   |-----------------------------------|
                1. |         Afghanistan      .    9.4 |
                2. |             Albania    3.8      5 |
                3. |             Algeria    1.9    3.7 |
                4. |      American Samoa      .      . |
                5. |             Andorra    3.2    5.9 |
                   |-----------------------------------|
                6. |              Angola    1.6   11.7 |
                7. | Antigua and Barbuda    3.5    2.3 |
                8. |           Argentina    4.3    5.2 |
                9. |             Armenia   -1.9    7.6 |
               10. |               Aruba    3.9    -.1 |
                   |-----------------------------------|
               11. |           Australia    3.6    3.1 |
               12. |             Austria    2.6    1.7 |
               13. |          Azerbaijan   -6.3   14.8 |
               14. |        Bahamas, The    2.6     .6 |
               15. |             Bahrain      5      5 |
                   +-----------------------------------+
                . gen dummyvar1 = (var1< -1.875 | var1 > 8.325) & (var2 < -2.65 | var2 > 10.55) if !missing(var1, var2)
              (2 missing values generated)
                . gen dummyvar2 = 0
                . replace dummyvar2 = 1 if var1< -1.875 | var1 > 8.325 & var2 < -2.65 | var2 > 10.55
              (4 real changes made)
                . gen dummyvar3 = 0
                . replace dummyvar3 = 1 if  (var1< -1.875) | (var1 > 8.325 & var2 < -2.65) |  (var2 > 10.55)
              (4 real changes made)
                . clonevar dummyvar4 = dummyvar3
                . replace dummyvar4 = 0 if missing(var1, var2)
              (1 real change made)
                . list
                     +-------------------------------------------------------------------------------+
                   |             country   var1   var2   dummyv~1   dummyv~2   dummyv~3   dummyv~4 |
                   |-------------------------------------------------------------------------------|
                1. |         Afghanistan      .    9.4          .          0          0          0 |
                2. |             Albania    3.8      5          0          0          0          0 |
                3. |             Algeria    1.9    3.7          0          0          0          0 |
                4. |      American Samoa      .      .          .          1          1          0 |
                5. |             Andorra    3.2    5.9          0          0          0          0 |
                   |-------------------------------------------------------------------------------|
                6. |              Angola    1.6   11.7          0          1          1          1 |
                7. | Antigua and Barbuda    3.5    2.3          0          0          0          0 |
                8. |           Argentina    4.3    5.2          0          0          0          0 |
                9. |             Armenia   -1.9    7.6          0          1          1          1 |
               10. |               Aruba    3.9    -.1          0          0          0          0 |
                   |-------------------------------------------------------------------------------|
               11. |           Australia    3.6    3.1          0          0          0          0 |
               12. |             Austria    2.6    1.7          0          0          0          0 |
               13. |          Azerbaijan   -6.3   14.8          1          1          1          1 |
               14. |        Bahamas, The    2.6     .6          0          0          0          0 |
               15. |             Bahrain      5      5          0          0          0          0 |
                   +-------------------------------------------------------------------------------+
              So, which is the "correct" definition, if any? Observe also the use of parentheses to "bind" different parts of the logical conditions in the definition (I made one assumption -- you may have intended another). Also, be clear(er) about how missing values are intended to be treated (as other posters pointed out).
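
              Two rules of Stata expressions explain why dummyvar1 and dummyvar2 differ: numeric missing (.) compares as larger than any number, and & is evaluated before |. A minimal sketch that checks both rules with constants only, so it does not depend on the data above:

              ```stata
              * Numeric missing sorts above every number, so this comparison is true
              display (. > 10.55)        // shows 1

              * & binds more tightly than |, so a | b & c means a | (b & c)
              display (1 | 0 & 0)        // shows 1
              display ((1 | 0) & 0)      // shows 0
              ```

              This is why the unparenthesised replace gives 1 for American Samoa: var2 > 10.55 is true when var2 is missing, and that term stands alone on one side of an |.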



              • #8
                The condition for both variables being missing is simply

                Code:
                 
                missing(var1) & missing(var2)
                At this point I merely flag that Martin has been warned by three people now that what he wants is likely to be a bad idea and let him decide what he does.
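
                For completeness, a sketch of one way that condition could be applied. It assumes that a single missing value should count as "not an outlier", which is exactly the assumption questioned in the replies above. The names out1, out2 and dummyvar5 are hypothetical; the four rows are Afghanistan, American Samoa, Azerbaijan and Albania from the posted data.

                ```stata
                clear
                input var1 var2
                   .  9.4
                   .    .
                -6.3 14.8
                 3.8    5
                end

                * Guard each comparison, because missing counts as larger than any number
                gen byte out1 = (var1 < -1.875 | var1 > 8.325) & !missing(var1)
                gen byte out2 = (var2 < -2.65 | var2 > 10.55) & !missing(var2)

                * Result is missing only when both inputs are missing
                gen byte dummyvar5 = out1 & out2 if !(missing(var1) & missing(var2))

                list
                ```

                With these rows dummyvar5 is 0, missing, 1 and 0 in turn.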



                • #9
                  "will set my dummy variable to 1 for Afghanistan. Yet since the value is smaller than 10.55, I would like it to be 0."
                  Although Nick said it all, as my very last contribution to this, may I ask what exactly makes you so sure about your first condition, i.e. that the "true", yet unobserved, value of var1 for Afghanistan is smaller than -1.875 or greater than 8.325? Only if you knew this - which you obviously do not from the data you show - could you justify setting the indicator to something other than missing. Note that I argue that setting it to 1 seems just as unjustified. Of course, you did not tell us what exactly your indicator is supposed to indicate.

                  Best
                  Daniel



                  • #10
                    Thank you all very much for your help. The dummy variable is supposed to identify outliers in GDP growth. So I am less concerned with the missing data than with identifying the countries with exceptional growth. Since the dummy identifies countries with exceptional growth by using 1, it would not be correct for my analysis if Stata also generated a 1 for countries whose data are not available.

                    Thank you all again!

                    Best wishes,

                    Martin Pattillo
