Using Stata to identify data errors

Welch Suggs

Join Date: Apr 2015

Posts: 17
#1

03 Oct 2016, 11:45

I have a large and messy dataset of the form

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 Product byte(Year FirmA FirmB FirmC)
"P"  1  .  0 19
"P"  2  . 26 12
"P"  3  8 14 12
"P"  4  . 12  0
"P"  5  0 12  .
"P"  6 12 33  .
"P"  7 18 22  .
"P"  8 12 13  .
"P"  9 16 17  .
"P" 10 12 13  .
"P" 11  9 18  .
"P" 12 10 15  .
"P" 13 11 11  .
"P" 14 12 27  .
"P" 15 13 40  .
"P" 16 11 31  .
"P" 17 15 23  .
"P" 18 17 30  .
"P" 19 18 20  .
"P" 20  8 21  .
"P" 21 13 23  .
"P" 22 17 25  .
"P" 23 21 23  .
"P" 24 27  .  .
end

...in which there are many more products and firms. I know from other information that Firm C discontinued production in year 4, so the subsequent null values are legitimate. But I also know that the nulls and zeroes for Firms A and B are data errors. Are there any strategies to identify the anomalous values, such as looking for nulls or zeroes that are adjacent to nonzero values? Perhaps by excluding data runs?

TIA!
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#2

03 Oct 2016, 12:48

See this FAQ by Nick Cox and Vince Wiggins on how to identify runs in a time series.

http://www.stata.com/support/faqs/da...-observations/

Alternatively, install tsspell from SSC and read its help file:

Code:

ssc install tsspell
help tsspell

You then just have to check conditions such as

Code:

list if runA < 2 & FirmA == .
list if runA < 2 & FirmA == 0

etc.
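
For anyone who wants to see the run logic spelled out, here is a minimal sketch using only built-in commands, in the spirit of the FAQ rather than tsspell itself. It treats missing and zero values of FirmA as suspect, numbers the runs of suspect and valid values within each product, and records each run's length. The variable names badA, runidA, runA, lastobs, and trailA are made up for illustration and do not come from the thread. The rule that a trailing run of suspect values may be a legitimate discontinuation is an assumption based on the Firm C description in #1.

```stata
* Sketch only: built-in commands, no tsspell.
* badA, runidA, runA, lastobs, trailA are illustrative names.
clear
input str1 Product byte(Year FirmA FirmB FirmC)
"P"  1  .  0 19
"P"  2  . 26 12
"P"  3  8 14 12
"P"  4  . 12  0
"P"  5  0 12  .
"P"  6 12 33  .
"P"  7 18 22  .
"P"  8 12 13  .
"P"  9 16 17  .
"P" 10 12 13  .
"P" 11  9 18  .
"P" 12 10 15  .
"P" 13 11 11  .
"P" 14 12 27  .
"P" 15 13 40  .
"P" 16 11 31  .
"P" 17 15 23  .
"P" 18 17 30  .
"P" 19 18 20  .
"P" 20  8 21  .
"P" 21 13 23  .
"P" 22 17 25  .
"P" 23 21 23  .
"P" 24 27  .  .
end

* 1 if the value is suspect (missing or zero), 0 otherwise
gen byte badA = missing(FirmA) | FirmA == 0

* Number the runs: a new run starts whenever badA changes within a product
bysort Product (Year): gen int runidA = sum(badA != badA[_n-1])

* Length of the run each observation belongs to
bysort Product runidA (Year): gen int runA = _N

* Mark runs that extend to the product's last year (assumed to be
* possible discontinuations rather than errors)
bysort Product (Year): gen byte lastobs = _n == _N
bysort Product runidA (Year): gen byte trailA = lastobs[_N]

* Suspect values followed by valid data later on
list Product Year FirmA runA if badA & !trailA, sepby(runidA)
```

On the example data this lists years 1, 2, 4 and 5 for FirmA. Repeating the steps for FirmC would leave the zero in year 4 and the missing values after it unflagged, since that run reaches the final year. The same rule would also skip the missing FirmB value in year 24, which #1 says is an error, so trailing runs still need to be checked against outside information.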
Welch Suggs

Join Date: Apr 2015

Posts: 17
#3

03 Oct 2016, 12:58

Thank you!