Dear all,
I have an unbalanced panel data set with many missing values each firm-year observation. My goal is to create a variable called 'firm size' based on firm budget available in 5 preceding years. Using rangestat, I got the following result. (Please note that all missing years had to be filled for other variables)
rangestat (sum) budget, interval(year -5 -1) by (firm)
The problem I have here is that those zeros highlighted in red should be replaced with missing values (.) because 0 indicates that there was no budget, whereas . indicates that information on budget was unavailable over the prior 5 years for each firm. For example, for firm 157, there is only one budget value available for 10 years, and its last four observations should be missing rather than 'zero'.
I would like to know if I can achieve this in a simple or better way.
Thank you in advance for your help.
+
The title of this post should be 'Replacing zeros with missing values (using rangestat sum)'.
I am not allowed to change the title now. Sorry for any confusion.
I have an unbalanced panel data set with many missing values each firm-year observation. My goal is to create a variable called 'firm size' based on firm budget available in 5 preceding years. Using rangestat, I got the following result. (Please note that all missing years had to be filled for other variables)
rangestat (sum) budget, interval(year -5 -1) by (firm)
firm | year | budget | budget_sum |
103 | 1999 | . | . |
103 | 2000 | . | 0 |
103 | 2001 | 1.36E+06 | 0 |
103 | 2002 | . | 1364182 |
103 | 2003 | . | 1364182 |
103 | 2004 | . | 1364182 |
103 | 2005 | . | 1364182 |
103 | 2006 | . | 1364182 |
103 | 2007 | 8.21E+06 | 0 |
103 | 2008 | . | 8205689.5 |
103 | 2009 | . | 8205689.5 |
103 | 2010 | . | 8205689.5 |
103 | 2011 | . | 8205689.5 |
103 | 2012 | . | 8205689.5 |
103 | 2013 | . | 0 |
118 | 2010 | . | . |
118 | 2011 | 5.36E+06 | 0 |
118 | 2012 | . | 5356702.6 |
118 | 2013 | . | 5356702.6 |
118 | 2014 | 1.10E+07 | 5356702.6 |
118 | 2015 | . | 16356703 |
157 | 2003 | 1.20E+07 | . |
157 | 2004 | . | 12000000 |
157 | 2005 | . | 12000000 |
157 | 2006 | . | 12000000 |
157 | 2007 | . | 12000000 |
157 | 2008 | . | 12000000 |
157 | 2009 | . | 0 |
157 | 2010 | . | 0 |
157 | 2011 | . | 0 |
157 | 2012 | . | 0 |
I would like to know if I can achieve this in a simple or better way.
Thank you in advance for your help.
+
The title of this post should be 'Replacing zeros with missing values (using rangestat sum)'.
I am not allowed to change the title now. Sorry for any confusion.
Comment