I have a large and messy dataset of the form
...in which there are many more products and firms. I know from other information that Firm C discontinued production in year 4, so its subsequent null values are legitimate. But I also know that the nulls and zeroes for Firms A and B are data errors. Are there any strategies for identifying the anomalous values, such as looking for nulls or zeroes that are adjacent to nonzero values? Perhaps by excluding data runs?
TIA!
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 Product byte(Year FirmA FirmB FirmC)
"P"  1  .  0 19
"P"  2  . 26 12
"P"  3  8 14 12
"P"  4  . 12  0
"P"  5  0 12  .
"P"  6 12 33  .
"P"  7 18 22  .
"P"  8 12 13  .
"P"  9 16 17  .
"P" 10 12 13  .
"P" 11  9 18  .
"P" 12 10 15  .
"P" 13 11 11  .
"P" 14 12 27  .
"P" 15 13 40  .
"P" 16 11 31  .
"P" 17 15 23  .
"P" 18 17 30  .
"P" 19 18 20  .
"P" 20  8 21  .
"P" 21 13 23  .
"P" 22 17 25  .
"P" 23 21 23  .
"P" 24 27  .  .
end
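To make the "data runs" idea concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes the example data above are in memory, that every product-firm series has one row per consecutive year, and that a short run of nulls or zeroes is more likely an error than a discontinuation. The cutoff of 3 years and the variable names firmid, value, empty, spell, spell_len and suspect are my own placeholders.

```stata
* Reshape to long form: one row per product-firm-year
reshape long Firm, i(Product Year) j(firmid) string
rename Firm value

* Treat both missing and zero as "empty"
gen byte empty = missing(value) | value == 0

* Number the consecutive runs (spells) of empty / non-empty values
* within each product-firm series, in year order
bysort Product firmid (Year): gen long spell = sum(empty != empty[_n-1])

* Length of each spell
bysort Product firmid spell: gen long spell_len = _N

* Flag empty values that sit in short spells (assumed cutoff: 3 years)
local maxgap 3
gen byte suspect = empty & spell_len <= `maxgap'

sort Product firmid Year
list Product firmid Year value if suspect, sepby(firmid)
```

On the example data this should flag years 1, 2, 4 and 5 for Firm A and years 1 and 24 for Firm B, while leaving Firm C's long run from year 4 onward unflagged. A firm that discontinued production within the last few years of the sample would be flagged wrongly, so the cutoff would need checking against whatever outside information is available.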