Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace missing values with existing observations by firm and year

    Hello,

    I have some data where the number variable has some missing values. I want to replace the missing value with the value of the same firm and year. I tried:
    Code:
    bysort firm year: replace number = number[_n-1] if missing(number)
    It works partially, but not for the case if the first value or first few values of the same firm and year is missing. I wonder how it can be improved. Thanks a lot for any suggestion

    Original data
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long firm float(year lag number)
    1 2019  2  .
    1 2019 10  9
    1 2019  3  .
    1 2019  9  .
    1 2019 32  .
    1 2019  9  .
    1 2019  3  .
    1 2019  8  .
    1 2019  3  9
    1 2020 26 14
    1 2020 24  .
    1 2020 34  .
    1 2020 22 14
    1 2020 27 14
    1 2020 43  .
    1 2020 32  .
    1 2020 25 14
    1 2020 46 14
    1 2020 12  .
    1 2020 32  .
    1 2020 33  .
    1 2020 47  .
    1 2020 24 14
    end
    Here is the data after I apply the code above
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long firm float(year lag number)
    1 2019  2  .
    1 2019 10  9
    1 2019  3  9
    1 2019  9  9
    1 2019 32  9
    1 2019  9  9
    1 2019  3  9
    1 2019  8  9
    1 2019  3  9
    1 2020 26 14
    1 2020 24 14
    1 2020 34 14
    1 2020 22 14
    1 2020 27 14
    1 2020 43 14
    1 2020 32 14
    1 2020 25 14
    1 2020 46 14
    1 2020 12 14
    1 2020 32 14
    1 2020 33 14
    1 2020 47 14
    1 2020 24 14
    end
    Last edited by Yun Cheng; 12 May 2023, 17:48.

  • #2
    This approach only makes sense if all of the non-missing values of number are the same for any given firm-year combination. The code below first verifies this is true in your data set (it is true in the example you posted), and, if so, proceeds to fill in all missing values with the unique non-missing value.
    Code:
    by firm year (number), sort: assert number == number[1] | missing(number)
    by firm year (number): replace number = number[1]

    Comment


    • #3
      Hi Clyde
      Thanks for your help. The code works well. I forget to say that in my data, it is indeed the case that all of the non-missing values of number are the same for any given firm-year combination.

      Comment

      Working...
      X