
  • How to drop some observations within one variable

    Hi all,

    I'm currently using the IPUMS data on gross rent and income (among other things). Unfortunately, one possible value in the dataset is "N/A", and the variable's storage type is long. How would I go about dropping only the observations where rentgrs is coded as N/A while keeping the others? Once the N/A observations are dropped, I'll be able to convert the variable to int.


  • #2
    Hi all,

    I figured it out, I just used "drop if rentgrs < 1"



    • #3
      "N/A" has no special meaning in Stata the way NA does in R. drop if rentgrs < 1 only appears to work because we aren't actually looking at the values of this variable; we are looking at its value labels. This means, somewhat confusingly, that when you see the number 1950 in your data window, you are looking at the label for 1950 rather than the underlying value, which could be completely different. In general, it is a very bad sign that you have numbers as labels like this.

      The code drop if rentgrs < 1 is meaningless in this context. It is complete nonsense. The fact that it doesn't give an error and seems to drop all of the N/A's does not at all mean what you've done is correct.

      My guess is that you have a deeper issue: you have some data that was generated by someone using R. You read the data in, but since it contained N/A strings rather than empty strings for missing values, Stata read that entire column as a string. You then incorrectly used the -encode- command to convert the string to a number, not realizing that -encode- treats strings as categories, assigns each one an arbitrary integer as its value, then attaches the original string as a label. Everything you did with this column after that was nonsense. This is a silent error: Stata will give you results, but those results will be incorrect.

      You need to completely reload your data from the file system. Next, any variable that contains N/A needs the following:

      Code:
      replace variablename = "" if variablename == "N/A"
      Then use the -destring- command to convert the variable into its numeric equivalent.
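
A minimal sketch of those two steps on made-up data, assuming the variable really was read in as a string. The names rent_str and rent_num and the values are hypothetical and are not taken from the IPUMS extract.

```stata
clear
input str10 rent_str
"950"
"N/A"
"1200"
"N/A"
"700"
end

* Blank out the placeholder so every remaining entry is purely numeric
replace rent_str = "" if rent_str == "N/A"

* Convert to a numeric variable; empty strings become missing (.)
destring rent_str, generate(rent_num)

* Drop the observations that were N/A, if that is what is wanted
drop if missing(rent_num)
list
```

If the placeholder is left in place, -destring- refuses to convert the variable unless the force option is given, in which case the nonnumeric entries are set to missing.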



      • #4
        Here is just a quick demonstration:

        Code:
        clear
        input str20(string)
        "543"
        "1234"
        "475"
        "6854"
        "374"
        "573"
        "NA"
        "NA"
        "57493"
        "234"
        end
        
        encode string, generate(labels)
        gen values = labels
        list
        I start with a string (should be red in the viewer), then I encode it into a new variable called "labels" (should be blue in the viewer). Finally, I copy the underlying values of "labels" into a new "values" variable (should be black in the editor).

        Code:
        . list
        
             +--------------------------+
             | string   labels   values |
             |--------------------------|
          1. |    543      543        5 |
          2. |   1234     1234        1 |
          3. |    475      475        4 |
          4. |   6854     6854        8 |
          5. |    374      374        3 |
             |--------------------------|
          6. |    573      573        6 |
          7. |     NA       NA        9 |
          8. |     NA       NA        9 |
          9. |  57493    57493        7 |
         10. |    234      234        2 |
             +--------------------------+
        Here we see the string value in the first column, the labels of the encoded variable in the second column, and the actual mathematical quantity that the statistical software will use when doing math in the third column.
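
To check the mapping that -encode- created, one option is to list the value label and to display the variable without its labels. This is only a sketch on a shortened version of the toy data; it assumes the default behavior in which -encode- names the value label after the new variable, so the label is called "labels" here.

```stata
clear
input str20(string)
"543"
"1234"
"NA"
"234"
end

encode string, generate(labels)

* Show which integer code was assigned to each string
label list labels

* Show the underlying codes instead of the labels
list string labels, nolabel

* This comparison acts on the codes 1 to 4, not on 543, 1234, ...
count if labels < 3
```

Because the codes follow the alphabetical order of the strings, any inequality on the encoded variable compares those codes rather than the numbers shown on screen.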



        • #5
          Originally posted by

          Code:
          replace variablename = "" if variablename == "N/A"

          Then use the -destring- command to convert the variable into its numeric equivalent.
          Last edited by Benjamin Vu; 03 Oct 2023, 17:44.
