  • How to drop a missing series

    Hello,
    I have a panel dataset, extracted using the Stata wbopendata package.
    I want to drop a country if a series, X, has no observations for the sample period.

    What is the syntax to do it?
    Thanking you in advance

  • #2

    Code:
    bysort country (X) : drop if missing(X[1]) & missing(X[_N])

    If after sorting on X both first and last observations in a panel have missing values, then all do.
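One way to see why testing both ends is enough: Stata sorts numeric missing values after all non-missing numbers, while an empty string (the string missing value) sorts before any other string, so checking both X[1] and X[_N] covers either storage type. The sketch below is only an illustration on made-up data; the names country and X follow the thread, and the values are assumptions.

```stata
* Made-up data: country A is all missing on X, country B is only partly missing
clear
input str1 country float X
"A" .
"A" .
"B" 2
"B" .
"B" 1
end

* Within each country, sort on X; numeric missing values sort to the end
bysort country (X) : drop if missing(X[1]) & missing(X[_N])

* Only country B should remain, including its one missing value
list, sepby(country)
```

Country B survives because its first sorted value is non-missing, even though its last one is missing.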



    • #3
      For more on the principles, see the FAQ https://www.stata.com/support/faqs/d...ions-in-group/



      • #4
        Thank you, Sir. It worked. However, I have the following question regarding dropping observations.
        Since the number of observations differs across IDs in the panel data setup, I want to drop an ID if a variable has less than, say, 20 observations.

        Kindly help.



        • #5
          I guess you’re concerned if a panel has less than 20 observations — not whether a variable is so deficient. The number of observations for any variable is the same as the number of observations in the dataset.

          Code:
          bysort country : drop if _N < 20
          may be what you seek.
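Before dropping on panel size, it may help to look at how many observations each country has. The following is a minimal sketch on made-up data, with 3 standing in for the threshold of 20; the data values are assumptions for illustration.

```stata
* Made-up data: country A has 2 observations, country B has 3
clear
input str1 country float X
"A" 1
"A" .
"B" 1
"B" 2
"B" .
end

* Frequencies are observations (rows) per country, whether or not X is missing
tabulate country

* Drop countries with fewer than 3 observations (3 is a stand-in for 20)
bysort country : drop if _N < 3
```

Note that country B is kept even though one of its values on X is missing, because _N counts rows, not non-missing values.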



          • #6
            Thank you, Sir, for your fast reply.
            Unfortunately, the code you suggested does not solve my problem. Maybe my question was not clear. I want to drop that ID whose variable X has less than 20 observations.



            • #7
              You need to say what you mean by observations. I unsurprisingly follow Stata’s definition. An observation in other terms is a row, case or record in the dataset. _N always counts observations.

              What is the Santosh definition? If you’re thinking of counting non-missing values, perhaps, then you need to spell that out.
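The difference between the two counts can be sketched side by side. This is an illustration only: n_obs and n_nonmiss are hypothetical variable names, and the data are made up.

```stata
* Made-up data: one country with 5 observations, 2 of them missing on X
clear
input str1 country float X
"C" 1
"C" 2
"C" 3
"C" .
"C" .
end

* n_obs: observations (rows) in the panel, missing or not
bysort country : generate n_obs = _N

* n_nonmiss: non-missing values of X in the panel
egen n_nonmiss = count(X), by(country)

list, sepby(country)
```

Here n_obs is 5 in every row and n_nonmiss is 3, so a rule based on _N and a rule based on the count of non-missing values can give different answers for the same country.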

              As always, explaining what you want with a data example might help. Please read and act on FAQ Advice #12.

              Otherwise put, #1 is one question and #4 is a different question, if answered using Stata’s definitions. It seems that your current question is different again.
              Last edited by Nick Cox; 10 Mar 2019, 15:00.



              • #8
                No reply, although our time zones may be quite different. Let's try again. In this made-up example, let's suppose the magic number is 5. Clearly, or so I hope, the principle is the same for 20, or any other magic number.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str1 country float X
                "A" .
                "A" .
                "A" .
                "A" .
                "A" .
                "B" 1
                "B" 2
                "B" 3
                "C" 1
                "C" 2
                "C" 3
                "C" .
                "C" .
                end
                
                list, sepby(country)
                
                     +-------------+
                     | country   X |
                     |-------------|
                  1. |       A   . |
                  2. |       A   . |
                  3. |       A   . |
                  4. |       A   . |
                  5. |       A   . |
                     |-------------|
                  6. |       B   1 |
                  7. |       B   2 |
                  8. |       B   3 |
                     |-------------|
                  9. |       C   1 |
                 10. |       C   2 |
                 11. |       C   3 |
                 12. |       C   . |
                 13. |       C   . |
                     +-------------+
                Country A has all missing values on X. So (subject to being interested in other variables not mentioned here) that is no use to us, hence the answer in #2:

                Code:
                . bysort country (X) : drop if missing(X[1]) & missing(X[_N])
                (5 observations deleted)
                Try it yourself, and Country A disappears.

                In writing #2 I was relying on your title in #1, "How to drop a missing series". The text in #1, "no observations for the sample period", I interpreted as really meaning a problem of missing values. If there are no observations, then there is nothing to drop!

                Then in #4 you stated: "I want to drop an ID if a variable has less than, say, 20 observations." So, now a different question. Consider Country B with just 3 observations. (Remember, my threshold is 5, to keep the examples short.)


                Code:
                . bysort country (X) : drop if _N < 5
                (3 observations deleted)
                Try it yourself, and Country B disappears.

                Then in #6 you explain #4 by -- so far as I can see -- just repeating the same idea in very slightly different form.

                As #7 explains, you may be in need of a count of non-missing values. Consider Country C with 5 observations, but only 3 observations with non-missing values. Here is one way to look for such countries.

                Code:
                . egen nmX = count(X), by(country)
                
                . list
                
                     +-------------------+
                     | country   X   nmX |
                     |-------------------|
                  1. |       C   1     3 |
                  2. |       C   2     3 |
                  3. |       C   3     3 |
                  4. |       C   .     3 |
                  5. |       C   .     3 |
                     +-------------------+
                
                . drop if nmX < 5
                (5 observations deleted)
                Try it yourself, and Country C disappears.
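In the thread the three drop commands were run one after another, so only Country C was left by the third. Starting again from the full made-up dataset, a count-based rule can be sketched in one pass, and it also removes a country whose values are all missing, since its count is 0. The local macro threshold and its value 3 are assumptions for illustration; the original question would use 20.

```stata
* Same made-up data as in the -dataex- example
clear
input str1 country float X
"A" .
"A" .
"A" .
"A" .
"A" .
"B" 1
"B" 2
"B" 3
"C" 1
"C" 2
"C" 3
"C" .
"C" .
end

* Hypothetical threshold; replace 3 with 20 for the original question
local threshold 3

* Count non-missing values of X within each country
egen nmX = count(X), by(country)

* Drop countries with too few non-missing values, then tidy up
drop if nmX < `threshold'
drop nmX

list, sepby(country)
```

With a threshold of 3, Country A (no non-missing values) is dropped, while Countries B and C (3 non-missing values each) are kept.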

                If you don't give concrete examples, all the weight is on your words, which have to be clear. Terminology, including using terminology correctly, is then crucial.

                Once again, in Stata

                1. An observation in other terms is an entire row, case or record in the dataset. That is, it will include values for one or more variables, the columns, fields or features of the dataset. A value might be missing for a given variable in a given observation, but the observation is all of those values, missing or not.

                2. Observations not in the dataset but that might have been are, in my view, best not described as missing. There isn't, so far as I know, an agreed term for them, but "omitted" or "absent" might serve. But we do not always need special terms, particularly if an example makes the problem clear.
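The two cases can be told apart in a small sketch. It assumes a year variable, which the thread does not show, and made-up values; fillin is an official Stata command that adds the absent combinations and flags them in a new variable, _fillin.

```stata
* Made-up panel: country B has no observation at all for year 2002
clear
input str1 country int year float X
"A" 2001 1
"A" 2002 .
"B" 2001 3
end

* A's 2002 row is an observation with a missing value on X;
* B's 2002 row is simply absent from the dataset
count if missing(X)

* fillin adds the absent country-year combinations and marks them with _fillin == 1
fillin country year

list, sepby(country)
```

After fillin, the previously absent observation for B in 2002 exists with X missing, and _fillin distinguishes it from the observation for A in 2002 that was in the data all along.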
                Last edited by Nick Cox; 11 Mar 2019, 04:21.



                • #9
                  Dear Sir,
                  I am extremely sorry for the late reply. In addition to being in a different time zone, I was busy with office work. I admit that I was not clear about the definition of "observation", hence the confusion and lack of clarity in my questions. In #1, I wanted to drop a country if it has all missing values on X. The code in #2 solved that problem. In #4, I wanted to drop a country if the count of non-missing values is less than a threshold. Now your detailed explanation has helped me understand the problem better and your code in #8 has solved the problem.

                  Once again I thank you for taking the time to explain in great detail.



                  • #10
                    Thanks for the closure!
