  • reapplying variable labels after reshape wide

    Hello!

    My variable labels do not "carry over" to the new variables after I reshape my datasets from long to wide, and I am having trouble saving, retrieving, and reapplying the variable labels to the reshaped data. My value labels are retained; only the variable labels are lost. For example, I would like the variable label for each original variable ev901_ to be applied to each new variable ev901_1 ... ev901_n. Ideally, each new label would be the original label plus a suffix or prefix identifying which of _1, _2, ..., _n it belongs to, but I would be satisfied with the same original variable label applied to all reshaped variables. I am hoping someone here can help me troubleshoot my code or suggest an alternative approach.

    I am reshaping, in a loop, 12 datasets (one for each country). The number of reshaped variables per stub (the _n above) ranges from 6 to 11 depending on the dataset. The basic structure of my do-file is as follows:
    Code:
    **********************
    *****    SET UP
    **********************
    
    **Set locals for country names and for the file numbers of the IR and EV files
    
    local cnames  "KH EG HN JO KE KY MW TJ TZ UG ZM ZW"
    local IRnames "72 61 62 6C 70 61 61 61 63 60 61 62"
    local EVnames "72 61 62 6C 70 61 61 61 63 60 61 62"
    
    **Set index
    local i = 0
    
    **Set number of countries - here = 12 /*WILL NEED TO UPDATE IF ADDING GUATEMALA*/
    forvalues x = 1/12 {
        local i = `i' + 1
        ** get names of country, IR, and EV files for country number x
        local  c : word `i' of `cnames'
        local IR : word `i' of `IRnames'
        local EV : word `i' of `EVnames'
    
    *Start from EV file
    use "$path`c'EV12`EV'.dta", clear
    
    ********************
    ***RESHAPE: long to wide EV file
    ********************
    
    *Drop unnecessary variables
    drop v005 v007 v008 v011 v017 v018 v019 v101 v102 v106 ///
        ev906-ev917 ev913a ev902a
    
    **Add suffix "_" to var stubs
    rename (ev* cmcclock numevents endmo startmo1 startmo obsmo obsmoend obsdur mfporno start startstate) =_
    rename (evid_) (evid)
    
    **Reshape
    reshape wide ev004_ ev9* cmcclock_ eventorder_ evinwindow_ numevents_ endmo_ start* obs* mfporno_, i(survey v000 v001 v002 v003 caseid) j(evid)
    
    save "$path`c'EVw.dta", replace    
    }
    I have tried a couple of approaches, unsuccessfully, to apply the variable labels, as follows.

    First, I adapted some code that I found in a response to a post (which I can no longer locate) in which someone had the same problem when reshaping their data from wide to long. What I did was add this code BEFORE the reshape command:
    Code:
    **Save variable labels in local macro to reapply after reshaping
    ds ev* survey cmcclock numevents endmo startmo1 startmo obsmo obsmoend obsdur mfporno start startstate
    local ev_vars `r(varlist)'
        foreach v of varlist `ev_vars' {
            local var_label_`v': var label `v'
        }
    (I should note that at the time that I ran this code, all my variables were named as listed immediately above; I had not added the "_" suffix to them as I now do in the first code box.) Then I ran the reshape command, and then added this code AFTER the reshape process:
    Code:
    **Retrieve and reapply var labels
        foreach d of local ev_vars {
            foreach v of varlist `e'* {
                local number: subinstr local v "`e'" ""
                label var `v' `"`var_label_`e'' `number'"'
            }
        }
    This appeared to work, in that Stata returned no error messages when I ran the code. However, the variable labels were not applied to the reshaped variables.
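A likely reason nothing useful was labeled: the outer loop declares `d` (`foreach d of local ev_vars`), but the body refers to `e'`, which is never defined. An undefined local expands to nothing, so the inner loop's `*` matches every variable and the saved-label macro name collapses to an empty one; the loop runs without error but writes labels that are essentially the variable's own name. (If the do-file is run piecewise, locals from the earlier run are also gone.) The sketch below is a corrected version on made-up toy data; the variables `caseid`, `evid`, `ev901`, `ev901a` and their values are assumptions for illustration, using the unsuffixed names described above. It also skips stubs that no longer exist after the reshape (such as the j variable `evid`) and ignores wildcard matches whose remainder is not a number, so `ev901*` does not relabel `ev901a1`.

```stata
* Toy long data standing in for an EV file (hypothetical values)
clear
input str4 caseid evid ev901 ev901a
"a" 1 1200 3
"a" 2 1203 5
"b" 1 1190 2
end
label variable ev901  "CMC event ends"
label variable ev901a "Duration of event"

* Save each variable label in a local named after the variable
local ev_vars ev901 ev901a evid
foreach v of local ev_vars {
    local var_label_`v' : variable label `v'
}

reshape wide ev901 ev901a, i(caseid) j(evid)

* Reapply: the loop variable is e, and the body uses it consistently
foreach e of local ev_vars {
    capture unab matched : `e'*
    if _rc continue                    // stub gone after reshape (e.g. evid)
    foreach v of local matched {
        local number : subinstr local v "`e'" ""
        * keep only stub + numeric j value (skips ev901a1 when e is ev901)
        if missing(real("`number'")) continue
        label variable `v' `"`var_label_`e'' `number'"'
    }
}
describe
```

With these toy data, `ev9012` should end up labeled "CMC event ends 2" and `ev901a1` "Duration of event 1".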

    Next, I tried to adapt some code described in this FAQ: http://www.stata.com/support/faqs/da...after-reshape/
    However, I had difficulty separating the code relevant to applying value labels (which I don't need) from the code for applying variable labels and abandoned my attempt.

    Third, I tried a bit more of a brute force approach. I defined a program with the variable labels in it as follows:
    Code:
    program define make_labels
    local li=1
    while `li'<=maxli{
        lab var ev004_`li'="Event number"
        lab var ev900_`li'="CMC event begins"
        lab var ev901_`li'="CMC event ends"
        lab var ev901a_`li'="Duration of event"
        lab var ev902_`li'="Event code"
        lab var ev903_`li'="Discontinuation code"
        lab var ev904_`li'="Previous event"
        lab var ev905_`li'="Next event"
        lab var cmcclock_`li'="CMC start of observation period"
        lab var eventorder_`li'="timing of event relative to clock start (12mos)"
        lab var evinwindow_`li'="Event occurs in observation period"
        lab var numevents_`li'="Total number of events woman experiences in observation period"
        lab var endmo_`li'="Month before interview in which event ended"
        lab var startmo1_`li'="Month before interview in which event started"
        lab var startmo_`li'="Month before interview within observation period in which event started"
        lab var obsmo_`li'="Month in observation period in which event started"
        lab var obsmoend_`li'="Month in observation period in which event ended"
        lab var obsdur_`li'="Duration of event within observation period"
        lab var mfporno_`li'="Event is modern temporary contraception or not"
        lab var startstate_`li'="Modern contraceptive use is state at clock start (12mos)"
        
        li=`li'+1
        }
    end
    The program is defined BEFORE the loop opens, around the place that I set the locals for the file names (cnames, IRnames, EVnames). I also added this line where I set those locals:
    Code:
    **Set local for max li value (number of episodes--evid--in datafile) for labeling vars
    local linames "7 10 11 10 7 9 7 9 7 9 7 6"
    I also added this line after the forvalues statement, right after the similar line for local EV:
    Code:
        ** get max li for country number x
        local maxli : word `i' of `linames'
    The defined program is then run after the reshape command.

    This strategy didn't work. When it is run exactly this way, I get the error "maxli not found" when the make_labels program tries to run. I tried removing "word" from the line of code reading "local maxli : word `i' of `linames'" (since I want this to be a value and not a string), but then I get the error "1 not allowed". I also commented out both the local linames "..." and the local maxli : word... lines of code and changed the program definition to read while `li'<=11{ (since 11 is the highest maxli in any of the datasets). Doing this gave me an invalid syntax error.
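The "maxli not found" error is a scope problem: a program cannot see the caller's locals, and a bare `maxli` (without macro quotes) is read as a variable or scalar name. The invalid syntax is most likely from `lab var x="..."` (label variable takes the label directly, with no `=`) and from `li=...`, which needs `local li = ...`. A minimal sketch of a corrected program that receives the episode count as an argument; only two of the labels are shown, the suffix `(li)` is an addition for illustration, and the toy variables are made up:

```stata
capture program drop make_labels
program define make_labels
    * The number of episodes is passed in as the first argument
    args maxli
    forvalues li = 1/`maxli' {
        label variable ev004_`li' "Event number (`li')"
        label variable ev901_`li' "CMC event ends (`li')"
        * ... the remaining label variable lines follow the same pattern
    }
end

* Toy wide data with two episodes (made-up variables)
clear
set obs 1
foreach v in ev004_1 ev004_2 ev901_1 ev901_2 {
    generate `v' = 0
}

* Inside the country loop this would be: make_labels `maxli'
make_labels 2
describe
```

Called as `make_labels `maxli'` after the reshape, the `word ... of` line for `maxli` can stay as written; the string "7" is accepted as a number by `forvalues`.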

    I tried a couple of other ways to set maxli as well. I tried:
    Code:
    *Set maxli for each survey (used in make_labels program)
        /*Will need to UPDATE if Guatemala is added*/
    if v000=="ZW"{
        scalar maxli=6
        }
    if inlist(v000,"KH","KE","MW","TZ","ZM"){
        scalar maxli=7
        }
    if inlist(v000,"KY","TJ","UG"){
        scalar maxli=9
        }
    if inlist(v000,"EG","JO"){
        scalar maxli=10
        }
    if v000=="HN" {
        scalar maxli=11
        }
    and
    Code:
    scalar maxli=6 if `c'==ZW
    scalar maxli=7 if `c'==KH | `c'==KE | `c'==MW | `c'==TZ | `c'==ZM
    scalar maxli=9 if `c'==KY | `c'==TJ | `c'==UG
    scalar maxli=10 if `c'==EG | `c'==JO
    scalar maxli=11 if `c'==HN
    and this (which is essentially the same thing)
    Code:
    scalar maxli=6 if v000=="ZW"
    scalar maxli=7 if v000=="KH" | v000=="KE" | v000=="MW" | v000=="TZ" | v000=="ZM"
    scalar maxli=9 if v000=="KY" | v000=="TJ" | v000=="UG"
    scalar maxli=10 if v000=="EG" | v000=="JO"
    scalar maxli=11 if v000=="HN"
    In each of these cases, Stata balked at the if statement.
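Two separate issues are at play: the `scalar` command does not accept an `if` qualifier, and the programming command `if` (as in `if v000=="ZW" {`) evaluates its expression once, using the first observation of `v000`, rather than selecting rows. Comparing the country code held in the local `c` works when the macro is quoted as a string. Alternatively, the number of episodes can be read from the data before reshaping, assuming `evid` runs 1, 2, ..., n. A sketch of both options on made-up toy data; `maxli_hardcoded` is a hypothetical name used only to keep the two results apart:

```stata
* Toy long EV data: woman "a" has 3 episodes, woman "b" has 1 (made up)
clear
input str4 caseid evid
"a" 1
"a" 2
"a" 3
"b" 1
end

* Option 1: branch on the country code stored in the local c (quoted)
local c "ZW"
if "`c'" == "ZW" local maxli_hardcoded 6
else if inlist("`c'", "KH", "KE", "MW", "TZ", "ZM") local maxli_hardcoded 7

* Option 2: read the largest episode number from the data
summarize evid, meanonly
local maxli = r(max)
display "hard-coded: `maxli_hardcoded'   from data: `maxli'"
```

Option 2 removes the need to update a per-country list when a new survey (such as Guatemala) is added.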

    Any guidance or suggestions would be most welcome!

    Thanks,
    Kerry MacQuarrie
    Last edited by K MacQuarrie; 05 May 2016, 09:28.

  • #2
    Kerry, Here's how I would approach this problem:

    Code:
    clear
    webuse reshape1
    reshape long inc ue, i(id) j(year)
    
    
    lab var id "Id number"
    lab var year "Year"
    lab var sex "Gender"
    lab var inc "Monthly Income"
    lab var ue "Don't Know this Variable"
    
    
    **NOW YOU HAVE A LABELED LONG DATASET
    
    *below is a varlist of the variables to be reshaped
    *you can use unab or some other way to get this varlist
    
    local vlist "inc ue"
    local j "year"
    
    *Create label for each variable in vlist for each level of J
    
    levelsof `j', local(J)
    foreach var of varlist `vlist' {
        foreach j of local J {
            local newlist `newlist' `var'`j'
            local lablist "`lablist' `"`:variable label `var'' (`j')"'"
            }
        }
    
    *RESHAPE
    
    reshape wide inc ue, i(id) j(year)
    noi di "`newlist'"
    noi di `"`lablist'"'
    
    *LABEL the new variables
    
    foreach new of local newlist {
        gettoken lab lablist : lablist
        lab var `new'  "`lab'"
        }
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1
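Applied to the naming in the original question (stubs ending in "_", j variable `evid`), the same idea might look like the sketch below, with `unab` building the stub list as suggested above. The toy data and the pattern `ev9*_` are assumptions for illustration. Clearing `newlist` and `lablist` before the loop matters when this runs once per country, since the lists would otherwise keep growing.

```stata
* Toy long data using the question's naming (made-up values)
clear
input str4 caseid evid ev901_ ev901a_
"a" 1 1200 3
"a" 2 1203 5
"b" 1 1190 2
end
label variable ev901_  "CMC event ends"
label variable ev901a_ "Duration of event"

* Let unab expand the pattern into the stubs to reshape
unab vlist : ev9*_
local j "evid"

* Build parallel lists of new names and labels before reshaping
levelsof `j', local(J)
local newlist
local lablist
foreach var of local vlist {
    foreach jval of local J {
        local newlist `newlist' `var'`jval'
        local lablist `"`lablist' `"`:variable label `var'' (`jval')"'"'
    }
}

reshape wide `vlist', i(caseid) j(`j')

* Pop one label per new variable, in the same order
foreach new of local newlist {
    gettoken lab lablist : lablist
    label variable `new' `"`lab'"'
}
describe
```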



    • #3
      Hi Carole,
      Thank you for your kind instructions.
      Could you please share the equivalent code for preserving the variable labels when reshaping data from wide to long?
      Thanks in advance for your response.
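A minimal sketch of the wide-to-long direction, assuming the label on the first wide variable of each stub is the one worth keeping: save it before `reshape long`, then apply it to the long variable afterwards. The toy data mimic the layout of the `reshape1` example above; the values and labels are made up.

```stata
* Toy wide data (made-up values and labels)
clear
input id sex inc80 inc81 ue80 ue81
1 0 5000 5500 0 1
2 1 2000 2200 1 0
end
label variable inc80 "Monthly income"
label variable ue80  "Unemployment"

local stubs "inc ue"

* Save one label per stub, taken from its first wide variable
foreach s of local stubs {
    unab wide : `s'*
    gettoken first : wide
    local lab_`s' : variable label `first'
}

reshape long `stubs', i(id) j(year)

* Apply the saved label to the long variable
foreach s of local stubs {
    label variable `s' `"`lab_`s''"'
}
describe
```

If the wide labels carry a year or episode suffix, that suffix would need to be stripped from the saved label before it is reapplied.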



      • #4
        I would recommend trying greshape instead of reshape.

        Code:
        ssc install gtools
