Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling a number of missing variables sharing prefix with "replace"

    Hi Statalisters,

    I've merged two datasets which are cross-sectional(year of 2004) and time series(year from 2012-2015) with student id.
    Because the variables from those two are different, I should fill the missing values in the rest 4 years with the number in year 2004.

    Searching Statalist, I found the following code to fill the BYAID for 2012-2015 with 2004.

    bysort BYSID (BYAID) : replace BYAID = BYAID[_n-1] if missing(BYAID)

    However, there are a bunch of other variables starting with BYA#, BYT#, FY#... (# means series of numbers).

    I've tried with the following code,

    bysort BYSID (BYA*) : replace BYA* = BYA*[_n-1] if missing(BYA*)

    But there was error message : BYA ambiguous abbreviation

    How could I fill all the missing variable with these prefix, without error??

  • #2
    You need to do this with a loop over the variables:

    Code:
    foreach v of varlist BYA* BYT* FY* {
        bysort BYSID (`v'): replace `v' = `v'[_n-1] if missing(`v')
    }
    Note: Because no example data was provided, this code is untested and may contain typos or other errors. Also note that this code only works correctly with numeric variables. String variables will be mishandled with this code.

    In the future, when asking for help with code, it is usually wise to also give sample data so that those who want to help you can test their code on your example before posting it. To show example data, you should use the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Added: All of that said, the code shown generalizes your approach to a set of variables. But I think the approach itself is wrong. What -bysort BYSID (`v'): replace `v' = `v'[_n-1] if missing(`v') does is sort the observations within BYSID in numeric order by the value of `v'. Because, for numeric variables, missing value is larger than any non-missing number, all the observations with a missing value of `v' are sorted to the end, and they are then replaced by the last non-missing value in the sorted order. But the last non-missing value in the sorted order is the largest value of `v', which may not be the most recent value of `v'. I believe what you want is the most recent, not the largest, right? So for that, the approach is slightly different:

    Code:
    foreach v of varlist BYA* BYT* FY* {
        bysort BYSID (year `v'): replace `v' = `v'[_n-1] if missing(`v')
    }
    This code will use the chronologically most recent non-missing value (and if a given BYSID has more than one observation in the same year, it will use the one of largest value) to fill the missings.

    Finally, to carry the discussion to a somewhat more abstract level, this method is often called last observation carried forward (LOCF). While it enjoyed a long period of great popularity, it has lost much of its appeal in favor of other methods of dealing with missing data. Depending on the nature of your data and what you will be doing with your it, you might want to consider other approaches.
    Last edited by Clyde Schechter; 21 Nov 2021, 17:08.

    Comment


    • #3
      It works well! Thanks a lot, Mr. Schechter.

      Comment

      Working...
      X