
  • Data validation through looping

    Dear Statalist,

    As I'm new to Stata and haven't used any commands yet (I have mainly been copy-pasting from Excel into Stata), I am looking for a way to validate my data automatically.

    When copying from one source to another, a "," or "." can get misplaced and turn a value such as -1.4652 into -14652. Hence I'm looking for a way to loop through all my data and flag values above or below some threshold X.

    Pseudo-code in Python:

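    X = 10000  # threshold (example value)
    data = [[-1.4652, 2.3], [-14652, 0.8]]  # hypothetical example rows
    for row in data:
        for cell in row:
            if abs(cell) > X:  # catches values both over X and under -X
                print("this value is probably wrong:", cell)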

    Same with columns of course, but you get the picture.

    How would I go about this?

    Thanks in advance.

    - Martin.

  • #2
    I would start with codebook, compact. Among other things, this gives you the minimum and maximum value of each variable, so you can quickly trim down your list of suspicious variables.
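    A minimal sketch of that first step, assuming the auto example dataset that ships with Stata (substitute your own data). Both commands print one overview line per variable, so a misplaced decimal separator tends to show up as a Min or Max far away from the Mean.

```stata
sysuse auto, clear
* one line per variable: Obs, Unique, Mean, Min, Max and Label
codebook, compact
* summarize gives a similar overview: Obs, Mean, standard deviation, Min and Max
summarize
```

    Variables whose extremes look implausible are the ones worth checking with a loop like the one below.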

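    Alternatively, you can do something like this:
    Code:
    sysuse auto, clear
    * collect the names of all numeric variables in r(varlist)
    ds, has(type numeric)
    * lower and upper bounds for plausible values
    local lb = -10000
    local ub =  10000
    * mark = 1 for observations with at least one value outside [lb, ub]
    gen byte mark = 0
    foreach var of varlist `r(varlist)' {
        replace mark = !inrange(`var', `lb', `ub') if mark == 0 & !missing(`var')
    }
    list if mark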
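    If you also want to know which variable triggered the flag, one possible variant of the loop above appends the name of each offending variable to a new string variable. This is a sketch under assumptions: the variable name badvars is hypothetical, the bounds are the same illustrative -10000/10000 as above, and make is the identifier variable of the auto data.

```stata
sysuse auto, clear
* store the numeric variable names before creating any new variables
ds, has(type numeric)
local numvars `r(varlist)'
* illustrative bounds, as in the loop above
local lb = -10000
local ub =  10000
* badvars (hypothetical name) collects the names of out-of-range variables
gen str244 badvars = ""
foreach var of local numvars {
    quietly replace badvars = badvars + " `var'" if !inrange(`var', `lb', `ub') & !missing(`var')
}
* mark = 1 if at least one variable was out of range
gen byte mark = badvars != ""
list make badvars if mark
```

    Missing values are skipped explicitly, as in the original loop, so they are not reported as out of range.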
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thank you for the quick answer, Maarten.
      I'll try your solution when I find time.



      • #4
        Bill Rising from StataCorp actually put together a program for building validation rules into your datasets: http://www.stata.com/meeting/13uk/ri...alk.beamer.pdf

