Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dropping in a Range

    Hello,

    I want to drop missing values in a variable, but only if the variable is missing for all years (2000-2013).

    Code:
    input double ccode1 float(ccode2 year) long pending_beginning
    2 223 2000  .
    2 223 2001  .
    2 223 2002  .
    2 223 2003  .
    2 223 2004  .
    2 223 2005  .
    2 223 2006  .
    2 223 2007  .
    2 223 2008  .
    2 223 2009  .
    2 223 2010  .
    2 223 2011  .
    2 223 2012  .
    2 223 2013  .
    2 225 2000  1
    2 225 2001  1
    2 225 2002  1
    2 225 2003  1
    2 225 2004  1
    2 225 2005  0
    2 225 2006  0
    2 225 2007  .
    2 225 2008  0
    In this case, I want to drop the missing values seen with 2-223 between 2000 and 2013, but what we see in 2-225 between 2000 and 2013 (or 2008 seen here) is okay.

    I tried to play around with the drop in range codes, but can't seem to put my head around how to make a working code for this particular range.


  • #2
    Hi,

    You could "reshape" to wide, then use "egen" with "rownonmiss" to create a new variable, and keep all observations where the new variable is !=0

    After the reshape, the "missings" command with "dropobs" would do the same thing for you.

    Just reshape back to long afterwards and you're good.

    I hope this helps

    Comment


    • #3
      Originally posted by Andy dB View Post
      Hi,

      You could "reshape" to wide, then use "egen" with "rownonmiss" to create a new variable, and keep all observations where the new variable is !=0

      After the reshape, the "missings" command with "dropobs" would do the same thing for you.

      Just reshape back to long afterwards and you're good.

      I hope this helps
      Hi Andy,

      Thanks. It's a bit confusing, though. I did come across this in my search:
      http://www.stata.com/statalist/archi.../msg00948.html

      No idea if it's something similar. Trying to see if there's a simpler way, because it's quite confusing to wrap my head around it.

      Comment


      • #4
        Does this do what you want?
        Code:
        clear
        input double ccode1 float(ccode2 year) long pending_beginning
        2 223 2000  .
        2 223 2001  .
        2 223 2002  .
        2 223 2003  .
        2 223 2004  .
        2 223 2005  .
        2 223 2006  .
        2 223 2007  .
        2 223 2008  .
        2 223 2009  .
        2 223 2010  .
        2 223 2011  .
        2 223 2012  .
        2 223 2013  .
        2 225 2000  1
        2 225 2001  1
        2 225 2002  1
        2 225 2003  1
        2 225 2004  1
        2 225 2005  0
        2 225 2006  0
        2 225 2007  .
        2 225 2008  0
        end
        bysort ccode1 ccode2 (year): egen good = total(!missing(pending_beginning))
        drop if good==0
        Code:
        . list, clean noobs
        
            ccode1   ccode2   year   pendin~g   good  
                 2      225   2000          1      8  
                 2      225   2001          1      8  
                 2      225   2002          1      8  
                 2      225   2003          1      8  
                 2      225   2004          1      8  
                 2      225   2005          0      8  
                 2      225   2006          0      8  
                 2      225   2007          .      8  
                 2      225   2008          0      8

        Comment


        • #5
          The post you sent suggests exactly the same - to reshape and drop those with all variables missing. How about:

          Code:
          reshape wide pending_begin, i(ccode1 ccode2) j(year)
          missings dropobs pending_beginning*
          reshape long

          Comment


          • #6
            Originally posted by William Lisowski View Post
            Does this do what you want?
            Code:
            clear
            input double ccode1 float(ccode2 year) long pending_beginning
            2 223 2000 .
            2 223 2001 .
            2 223 2002 .
            2 223 2003 .
            2 223 2004 .
            2 223 2005 .
            2 223 2006 .
            2 223 2007 .
            2 223 2008 .
            2 223 2009 .
            2 223 2010 .
            2 223 2011 .
            2 223 2012 .
            2 223 2013 .
            2 225 2000 1
            2 225 2001 1
            2 225 2002 1
            2 225 2003 1
            2 225 2004 1
            2 225 2005 0
            2 225 2006 0
            2 225 2007 .
            2 225 2008 0
            end
            bysort ccode1 ccode2 (year): egen good = total(!missing(pending_beginning))
            drop if good==0
            Code:
            . list, clean noobs
            
            ccode1 ccode2 year pendin~g good
            2 225 2000 1 8
            2 225 2001 1 8
            2 225 2002 1 8
            2 225 2003 1 8
            2 225 2004 1 8
            2 225 2005 0 8
            2 225 2006 0 8
            2 225 2007 . 8
            2 225 2008 0 8
            Hi Will,

            Reviewing the dataset and it's looking like it's what I want. Thank you!

            how exactly does
            Code:
            total(!missing(pending_beginning))
            read, though?

            Comment


            • #7
              Originally posted by Andy dB View Post
              The post you sent suggests exactly the same - to reshape and drop those with all variables missing. How about:

              Code:
              reshape wide pending_begin, i(ccode1 ccode2) j(year)
              missings dropobs pending_beginning*
              reshape long
              Hi Andy,

              in between your's and Will's reply, I was trying to fumble with the reshape command, but it wasn't working because there was a problem with the dataset that I originally didn't pickup/overlooked. I tried Will's suggestion after, and it worked. But I'm going to bookmark this one in the meantime as well. Thank you.

              Comment


              • #8
                Start by understanding that when Stata evaluates a logical expression or function, it evaluates true as 1 and false as 0.

                Then from help missing() we see that that missing(pending_beginning) will be true (1) if pending_beginning is missing and 0 (false) otherwise.

                The ! is Stata's not operator, which reverses truth and falsity, so !missing(pending_beginning) is 1 (true) if pending_beginning is not missing and 0 otherwise.

                The egen total function gives the total of the values of !missing(pending_beginning), in other words will be the number of non-missing values.

                Comment


                • #9
                  reshape is a very helpful command for when it is needed, but a double reshape is over-elaborate here. Here is another solution:

                  Code:
                  bysort code1 code2 (pending_beginning) : drop if missing(pending_beginning[1])
                  If after sorting, the first value in each block is missing, then they all are.

                  See also http://www.stata.com/support/faqs/da...ions-in-group/

                  Comment


                  • #10
                    Thank you Nick and William!

                    Comment


                    • #11
                      Hello,

                      I have a similar question. Namely, I want to identify those brands (true as 1 and false as 0 ) for which the variable downward_phase for the whole time period 2003q1-2016q4 was 0 . I used the following command bysort brand neu_qsif ( downward_phase ): egen good = total(downward_phase==0) but the result is not correct. In the example below the downward_phase was not 0 from 2003q1-2016q4 only during the first year. Thank you in advance.
                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input str21 intname float(brand neu_qsif peak_quarter peak_su) byte downward_phase float good
                      "INS.ACTRAPID HM" 3 172 173 533.75 0 1
                      "INS.ACTRAPID HM" 3 173 173 533.75 0 1
                      "INS.ACTRAPID HM" 3 174 173 533.75 0 1
                      "INS.ACTRAPID HM" 3 175 173 533.75 0 1
                      "INS.ACTRAPID HM" 3 176 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 177 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 178 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 179 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 180 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 181 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 182 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 183 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 184 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 185 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 186 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 187 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 188 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 189 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 190 173 533.75 1 0
                      "INS.ACTRAPID HM" 3 191 173 533.75 1 0
                      end
                      format %tq neu_qsif
                      format %tq peak_quarter

                      Comment


                      • #12
                        I just wanted to say that I solved the problem myself.

                        First I replaced 0 values with missing:
                        foreach x of varlist downward_phase {
                        replace `x' = (.) if(`x' == 0)
                        }
                        Then used the solution from the previous post:
                        bysort brand (neu_qsif ): egen good = total (!missing( downward_phase ))


                        Comment

                        Working...
                        X