  • Check datasets for variables

    (Apologies if this is a double post, I just tried to post this but I can't find my topic.)


    Hi everyone,

    I'm working on a project using 23 years of data, with each year in its own dataset. There have been differences in the data collected over the years, and this is reflected in the variables recorded. I've spent loads of time manually creating a spreadsheet to illustrate which variables are available in which years, as in the screenshot below.

    I imagine that surely there must be a way to do this automatically, to have Stata produce a table like this (albeit without my fancy conditional formatting), but I have no idea how.

    Any suggestions that might help me avoid this tedium in future?

    [Screenshot: Screen Shot 2016-06-09 at 19.38.56.png]

  • #2
    So something like this:

    Code:
    local filelist: dir "." files "*.dta" // MODIFY TO YOUR SITUATION
    
    clear
    tempfile building
    save `building', emptyok
    
    foreach f of local filelist {
        use `"`f'"', clear
        // HERE INSERT CODE TO EXTRACT
        // YEAR FROM FILENAME `f' INTO
        // LOCAL MACRO year
        describe, replace
        gen year = `year'
        append using `building'
        save `"`building'"', replace
    }
    
    use `building', clear
    keep name year
    gen byte present_ = 1
    reshape wide present_, i(year) j(name) string
    rename present_* *
    
    export excel using variable_year_crosswalk.xlsx, firstrow(variables) replace
    The resulting spreadsheet variable_year_crosswalk.xlsx will have a matrix of years X variable names along the lines you showed. It will not be color coded: I don't think that can be managed within Stata. But there will be a 1 in cells corresponding to a year when a variable is present, and blank otherwise. I guess you can convert that to color-coding from within Excel. (I don't use Excel much, and my Excel skills are rather limited.)
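
    The placeholder in the loop above (extracting the year from the filename `f' into the local macro year) could be filled in along these lines. This is only a sketch: it assumes that the first run of four digits in each filename is the year, and survey_1998.dta is a made-up filename used for illustration. Adjust the pattern to your own naming scheme.

```stata
// Hypothetical filename; inside the loop, `f' is set by foreach instead.
local f "survey_1998.dta"

// ASSUMPTION: the first run of four digits in the name is the year.
if regexm(`"`f'"', "[0-9][0-9][0-9][0-9]") {
    local year = regexs(0)    // regexs(0) returns the whole matched text
}
else {
    display as error `"no four-digit year found in `f'"'
    exit 198
}
display "`year'"
```

    Stopping with an error when no year is found is a deliberate choice here: otherwise `year' would be empty (or left over from the previous file) and the later gen year = `year' would fail or silently mislabel that dataset.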


    • #3
      See also missingplot (http://www.stata.com/statalist/archi.../msg00154.html) once you've sorted on year.

      EDIT: No, that is for single datasets. Not the problem set!
      Last edited by Nick Cox; 09 Jun 2016, 15:19.
