
  • Deleting observations in long data

    Hi Statalist -- I have data that looks like this:
    id     math class
    12345  .
    12345  .
    12345  1
    10489  .
    10489  .
    10489  .
    10489  .
    I want to drop id #10489 because it is missing on all 4 rows, but I want to keep id #12345 and all of its rows because it has a 1 on the variable math class, and switch its missing values to 0. I've been reading up on the collapse command, but I'm still unclear about what the appropriate procedure would be. Any help is greatly appreciated!

  • #2
    Do something like

    Code:
    webuse nlswork, clear
    egen nvalid = total(!missing(union)), by(idcode)
    drop if nvalid == 0
    replace union = 0 if missing(union)
    There are various other ways to do this using egen commands.
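As one sketch of such an alternative, the same idea can be applied to the toy data from the original post with egen's count() function, which counts nonmissing values. The variable names id and math, and the helper variable nvalid, are assumptions for illustration, since the actual dataset was not posted.

```stata
clear
input id math
12345 .
12345 .
12345 1
10489 .
10489 .
10489 .
10489 .
end

* count the nonmissing values of math within each id
egen nvalid = count(math), by(id)

* an id with no nonmissing values at all is dropped entirely
drop if nvalid == 0

* remaining missings belong to ids with at least one observed value
replace math = 0 if missing(math)
drop nvalid

list, sepby(id)
```

Only the three rows for id 12345 should remain, with the two missing values recoded to 0.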
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Another way:
      Code:
      clear all
      set more off
      
      input ///
      id math
      12345     .
      12345     .
      12345     1
      10489     .
      10489     .
      10489     .
      10489     .
      end
      
      list, sepby(id)
      
      *----- what you want -----
      
      bysort id (math): drop if missing(math[1]) & missing(math[_N])
      
      list, sepby(id)
      With Richard's setup:
      Code:
      bysort idcode (union): drop if missing(union[1]) & missing(union[_N])
      You should:

      1. Read the FAQ carefully.

      2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

      3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

      4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



      • #4
        Actually, at the risk of some opacity, you can make it even shorter:

        Code:
        by id (math), sort: drop if missing(math[1])
        because missing values are, in Stata, greater than all nonmissing values, so once the data are sorted within id, if the first value is missing, they all are.
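A minimal sketch of why the shortcut works, using the toy data from #3. The flag variable all_missing is a hypothetical name added only to make the sort order visible before anything is dropped.

```stata
clear
input id math
12345 .
12345 .
12345 1
10489 .
10489 .
10489 .
10489 .
end

* numeric missing values (., .a, ..., .z) sort after every nonmissing number,
* so within id the first sorted value is missing only if all values are missing
by id (math), sort: generate byte all_missing = missing(math[1])

list, sepby(id)

drop if all_missing
drop all_missing
```

After the sort, the 1 for id 12345 comes first within its group, so that group is flagged 0 and kept, while every row of id 10489 is flagged 1 and dropped.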



        • #5
          I thought there were simpler solutions out there but couldn't remember what they were. Nice work.

          This article by Nick Cox shows all sorts of neat tricks for working with panel data: http://www.stata-journal.com/sjpdf.h...iclenum=dm0033

          As a sidelight, I would make a copy of math class and then recode either the original or the copy. Doing things like recoding missings to 0 is something you might regret later, or at least want to double-check and see if it matters. In general if I am making major changes to variables I want to have a way to get back to the original.
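One way to keep that route back to the original, sketched on a fragment of the toy data. The name math_orig is an assumption for illustration; clonevar copies the variable together with its storage type and labels.

```stata
clear
input id math
12345 .
12345 .
12345 1
end

* keep an untouched copy before recoding
clonevar math_orig = math

* recode only the working variable
replace math = 0 if missing(math)

* cross-check exactly which values the recode changed
tabulate math_orig math, missing
```

The cross-tabulation shows the recoded missings in their own row, which makes it easy to check later whether the recode matters.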
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          Stata Version: 17.0 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Originally posted by Clyde Schechter View Post
            Actually, at the risk of some opacity, you can make it even shorter:

            Code:
            by id (math), sort: drop if missing(math[1])
            because missing values are, in Stata, greater than all nonmissing values, so once the data are sorted within id, if the first value is missing, they all are.
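            Clyde makes a good point. This doesn't seem to be the case for Tracy Lam, but we must be careful if the variable we're interested in is string and not numeric. Missing strings sort the opposite way: they come first within the group, not last, so a missing first value no longer implies that every value is missing. An example gone bad, in which id 12345 is dropped even though it has a nonmissing value: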

            Code:
            clear all
            set more off
            
            input ///
            id str5 math
            12345     
            12345     
            12345     one
            10489     
            10489     
            10489     
            10489     
            end
            
            list, sepby(id)
            
            *----- what you want -----
            
            by id (math), sort: drop if missing(math[1])
            
            list, sepby(id)
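A sketch of a version that should be safe for string variables as well: testing both the first and the last sorted value, as in #3, does not depend on where missings sort. The empty strings are typed explicitly as "" here, which is an assumption about how the data would be entered.

```stata
clear
input id str5 math
12345 ""
12345 ""
12345 "one"
10489 ""
10489 ""
10489 ""
10489 ""
end

* empty strings sort first, so also test the last value in each group;
* both ends are missing only when every value in the group is missing
by id (math), sort: drop if missing(math[1]) & missing(math[_N])

list, sepby(id)
```

Here id 12345 survives because its last sorted value is "one", while id 10489 is dropped because both ends of its group are empty.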
