Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing missing values with zeros (using rangestat sum)

    Dear all,

    I have an unbalanced panel data set with many missing values each firm-year observation. My goal is to create a variable called 'firm size' based on firm budget available in 5 preceding years. Using rangestat, I got the following result. (Please note that all missing years had to be filled for other variables)

    rangestat (sum) budget, interval(year -5 -1) by (firm)
    firm year budget budget_sum
    103 1999 . .
    103 2000 . 0
    103 2001 1.36E+06 0
    103 2002 . 1364182
    103 2003 . 1364182
    103 2004 . 1364182
    103 2005 . 1364182
    103 2006 . 1364182
    103 2007 8.21E+06 0
    103 2008 . 8205689.5
    103 2009 . 8205689.5
    103 2010 . 8205689.5
    103 2011 . 8205689.5
    103 2012 . 8205689.5
    103 2013 . 0
    118 2010 . .
    118 2011 5.36E+06 0
    118 2012 . 5356702.6
    118 2013 . 5356702.6
    118 2014 1.10E+07 5356702.6
    118 2015 . 16356703
    157 2003 1.20E+07 .
    157 2004 . 12000000
    157 2005 . 12000000
    157 2006 . 12000000
    157 2007 . 12000000
    157 2008 . 12000000
    157 2009 . 0
    157 2010 . 0
    157 2011 . 0
    157 2012 . 0
    The problem I have here is that those zeros highlighted in red should be replaced with missing values (.) because 0 indicates that there was no budget, whereas . indicates that information on budget was unavailable over the prior 5 years for each firm. For example, for firm 157, there is only one budget value available for 10 years, and its last four observations should be missing rather than 'zero'.


    I would like to know if I can achieve this in a simple or better way.


    Thank you in advance for your help.


    +
    The title of this post should be 'Replacing zeros with missing values (using rangestat sum)'.
    I am not allowed to change the title now. Sorry for any confusion.
    Last edited by Anna Pak; 09 Aug 2018, 13:21.

  • #2
    Well, a quick and simple approach would be to just -mvdecode budget_sum, mv(0)-. But that would be wrong if there are any instances where the sum really is zero: that is there are non missing budget values that actually add up to zero. So, anticipating that those might exist, the following approach would be robust to that:

    Code:
    gen byte has_budget = !missing(budget)
    rangestat (sum) budget has_budget, by(firm) interval(year -5 -1)
    replace budget_sum = . if has_budget_sum == 0

    Comment


    • #3
      Dear Clyde,

      Thank you so much for a quick response. I really appreciate the time you took to think about this and I find your comments to be very helpful. Have a great day!


      Best regards,

      Anna

      Comment

      Working...
      X