Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace missing cases within a variable under some circumstances

    Hi Everyone!
    I have a dataset looks like the below. I wanna replace missing values withing "locale" with the first valid value in "locale", while the data are sorted by "endyear" and "unitid". I tried the following command buy it did not work. I appreciate your advice.
    Best,
    Nader



    bysort unitid endyear: replace locale = locale[n+1] if locale==.


    Endyear-unitid-locale
    2000-1-.
    2001-1-.
    2002-1-2
    2003-1-3
    2000-2-.
    2001-2-.
    2002-2-9
    2000-3-5
    2001-3-.
    2002-3-6

    I would like to get the following:

    Endyear-unitid-locale
    2000-1-2
    2001-1-2
    2002-1-2
    2003-1-3
    2000-2-9
    2001-2-9
    2002-2-9
    2000-3-5
    2001-3-6
    2002-3-6














  • #2
    Welcome to Statalist.

    Can you tell us what would you expect to do for the following example?
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int endyear byte(unitid locale)
    2000 4  .
    2001 4 10
    2002 4  .
    2003 4 30
    2004 4  .
    end
    Note also how I posted the sample data - in a way that can be copied and pasted into the do-file editor to recreate the example.

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. In particular, please read FAQ #12 and help those whose help you seek by posting example data using the dataex command. If you are running Stata 15.1 or later, it is already installed. For earlier versions of Stata, install dataex by typing ssc install dataex. Type help dataex to read the simple instructions for using it. Using dataex will enable those who want to help you to quickly and easily create a 100% faithful replica of your situation to test their ideas and code on.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.



    Comment


    • #3
      Hi William,
      Thanks for your reply. I would like to replace missing values within "locale" with the first next valid value in "locale" across "unitid"s.
      Following is parts of my data that shows how my data looks like and what I need. Hope my new post fits Statalist posting policies.
      Best,
      Nader


      . *How my data looks like
      . input int endyear byte(unitid locale)
      endyear unitid locale
      1. 2000 1 .
      2. 2001 1 .
      3. 2002 1 .
      4. 2003 1 30
      5. 2001 2 1
      6. 2002 2 .
      7. 2003 2 2
      8. 2004 2 .
      9. 2005 2 3

      . *What I need
      . input int _endyear byte(_unitid _locale)
      endyear unitid locale
      1. 2000 1 30
      2. 2001 1 30
      3. 2002 1 30
      4. 2003 1 30
      5. 2001 2 1
      6. 2002 2 2
      7. 2003 2 2
      8. 2004 2 3
      9. 2005 2 3



      Last edited by Mahdie Raj; 10 Dec 2017, 20:36.

      Comment


      • #4
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int endyear byte(unitid locale)
        2000 1  .
        2001 1  .
        2002 1  .
        2003 1 30
        2001 2  1
        2002 2  .
        2003 2  2
        2004 2  .
        2005 2  3
        end
        
        
        gsort unitid -endyear
        by unitid: replace locale = locale[_n-1] if missing(locale)
        sort unitid endyear
        Your original code was, in a sense, on the right track. But Stata processes its commands from the top of the data set to the bottom, so -replace locale = locale[_n+1] if missing(locale- would only result in a change in a single observation immediately above the first missing value of locale, it would not propagate backwards.

        The key is to exploit Stata's top to bottom processing by sorting the data in reverse order on date. Then, when done, restore the original sort order.

        Note: It appears that what you pasted into #3 is the output that you get from reading in the data, as opposed to the output of -dataex- itself. They look somewhat similar, but if you compare what you have to what I show here you will notice that yours has extra numbering at the left edge that corresponds to no variable. That layout is difficult to bring in to Stata. Please read -help dataex- carefully and use it effectively in the future. Thanks!

        Comment


        • #5
          Thanks Clyde. Your trick worked perfectly! Much appreciated!

          Comment


          • #6
            You got excellent advice, but this is also an FAQ. See therefore

            https://www.stata.com/support/faqs/d...issing-values/

            https://www.statalist.org/forums/for...-interpolation

            Comment


            • #7
              For completeness, here is a mipolate (SSC) solution.


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int endyear byte(unitid locale)
              2000 1  .
              2001 1  .
              2002 1  .
              2003 1 30
              2001 2  1
              2002 2  .
              2003 2  2
              2004 2  .
              2005 2  3
              end
              
              mipolate locale endyear, by(unitid) backward gen(locale2) 
              
              list, sepby(unitid) 
              
                   +-------------------------------------+
                   | endyear   unitid   locale   locale2 |
                   |-------------------------------------|
                1. |    2000        1        .        30 |
                2. |    2001        1        .        30 |
                3. |    2002        1        .        30 |
                4. |    2003        1       30        30 |
                   |-------------------------------------|
                5. |    2001        2        1         1 |
                6. |    2002        2        .         2 |
                7. |    2003        2        2         2 |
                8. |    2004        2        .         3 |
                9. |    2005        2        3         3 |
                   +-------------------------------------+

              Comment


              • #8
                Thanks Nick! much appreciated!

                Comment

                Working...
                X