
  • Linking variables that are named differently across different datasets

    Hi There,

    I have been trying to solve this problem for a while and just can't figure it out. I am using household surveys published every year for the last 15 years. Each year contains the same variables, but the same variable is often named differently from one year to the next. I would like to run a per-year analysis on a long list of variables, but to do that I have to rename every variable whose name differs in that year's dataset. For example, to rename one variable I would write:

    global years "10 11 12 13 14 15 16 17 18 19 20 21"

    foreach X of global years {
        use "$Data\GHS\GHS_merged_20`X'.dta", clear
        numlabel, add

        * Connection to mains
        cap rename Q330aMains eng_mains
        cap rename Q327aMains eng_mains
        cap rename Q328AMAINS eng_mains
        cap rename Q528aMains eng_mains
        cap rename ENG_MAINS eng_mains
        tab eng_mains
        gen byte mains=eng_mains==1
        tab mains

        save "$Data\GHS\GHS_cleaned_20`X'.dta", replace
    }

    These variables are all the same, and there are several variables that I have to rename this way. The labels seem to be consistent across years, but the variable names are not. Is there a simple way to rename every variable that has a specific label, or some other way to link the same variable when it is named differently across years of the same household survey, so that I can link the datasets over time?
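One way to make the rename step fail loudly, instead of silently as a bare `cap rename` does, is to try each candidate name in turn with `confirm variable`, stop at the first one that exists, and raise an error if none does. This is a minimal sketch on the shipped auto.dta so that it runs anywhere; the candidate names `mileage` and `MPG` and the new name `fuel_econ` are hypothetical stand-ins for the GHS names (Q330aMains Q327aMains Q328AMAINS Q528aMains ENG_MAINS renamed to eng_mains).

```stata
* Sketch: rename whichever of several candidate names exists to one standard name.
* The candidates are hypothetical stand-ins using auto.dta; for the GHS files the
* list would be Q330aMains Q327aMains Q328AMAINS Q528aMains ENG_MAINS.
sysuse auto, clear
local newname fuel_econ
local found 0
foreach old in mileage MPG mpg {
    * exact: do not accept an abbreviation as a match
    capture confirm variable `old', exact
    if !_rc {
        rename `old' `newname'
        local found 1
        continue, break
    }
}
* Unlike a bare "cap rename", stop with an error if no candidate was found
if !`found' {
    display as error "none of the candidate names was found"
    exit 111
}
describe `newname'
```

Inside the year loop, the `sysuse` line would be replaced by the `use` of that year's file, with the same check repeated for each variable to be standardized.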


  • #2
    If you know the variable label, you can use ds to identify the variable. The only caveat is that variable names are unique within a dataset, whereas variable labels need not be.

    Code:
    sysuse auto, clear
    qui ds, has(varlab "Car type")
    rename `r(varlist)' whatever
    Result:

    Code:
    . desc whatever
    
                  storage   display    value
    variable name   type    format     label      variable label
    -------------------------------------------------------------------------------------------------------------------------------------
    whatever        byte    %8.0g      origin     Car type

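Because variable labels need not be unique, it may be worth checking that the label pattern picks out exactly one variable before renaming. A minimal sketch on auto.dta; the pattern `*Mileage*` and the new name `fuel_econ` are illustrative choices, not part of the original answer.

```stata
* Sketch: rename by variable label, but only if the label pattern is unambiguous.
sysuse auto, clear
quietly ds, has(varlabel *Mileage*)
local hits `r(varlist)'
local n : word count `hits'
if `n' != 1 {
    display as error "expected one match, found `n': `hits'"
    exit 198
}
rename `hits' fuel_econ
describe fuel_econ
```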


    • #3
      I am guessing here that "label" means "variable label", not "value label". The command ds will find all variables with a variable label matching a pattern. Here is an example.

      Code:
      .  sysuse auto, clear
      (1978 automobile data)
      
      . d
      
      Contains data from C:\Program Files\Stata17\ado\base/a/auto.dta
       Observations:            74                  1978 automobile data
          Variables:            12                  13 Apr 2020 17:45
                                                    (_dta has notes)
      -----------------------------------------------------------------------
      Variable      Storage   Display    Value
          name         type    format    label      Variable label
      -----------------------------------------------------------------------
      make            str18   %-18s                 Make and model
      price           int     %8.0gc                Price
      mpg             int     %8.0g                 Mileage (mpg)
      rep78           int     %8.0g                 Repair record 1978
      headroom        float   %6.1f                 Headroom (in.)
      trunk           int     %8.0g                 Trunk space (cu. ft.)
      weight          int     %8.0gc                Weight (lbs.)
      length          int     %8.0g                 Length (in.)
      turn            int     %8.0g                 Turn circle (ft.)
      displacement    int     %8.0g                 Displacement (cu. in.)
      gear_ratio      float   %6.2f                 Gear ratio
      foreign         byte    %8.0g      origin     Car origin
      -----------------------------------------------------------------------
      Sorted by: foreign
      
      . ds, has(varlabel *origin*)
      foreign

      Conversely, if your problem is about value labels, ds has options for that too, but findname (from the Stata Journal) has even more.
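With several variables to standardize, the same ds call can be run once per label pattern. A minimal sketch on auto.dta; the patterns and new names are hypothetical examples, paired by position, and for the GHS files they would be whatever labels the survey questions actually carry.

```stata
* Sketch: standardize several variables at once from variable-label patterns.
* Patterns and new names are hypothetical; they are paired by position,
* so the two lists must have the same length.
sysuse auto, clear
local patterns *Mileage* *Repair* *Trunk*
local newnames fuel repairs boot
local n : word count `patterns'
forvalues i = 1/`n' {
    local pat : word `i' of `patterns'
    local new : word `i' of `newnames'
    quietly ds, has(varlabel `pat')
    local hits `r(varlist)'
    local k : word count `hits'
    * rename only when the pattern identifies exactly one variable
    if `k' == 1 {
        rename `hits' `new'
    }
    else {
        display as error "pattern `pat' matched `k' variables: `hits'"
    }
}
describe
```

Keeping the pattern/name pairs in one place means the same list can be reused unchanged inside the loop over years.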

      This is marginal to your question, but many Stata programmers would write

      Code:
      global years "10 11 12 13 14 15 16 17 18 19 20 21"
      
      foreach X of global years {

      as

      Code:
      forval X = 10/21 {

      but that is a question of style only.
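A small check that the two loop headers visit the same values: each loop appends its values to a local macro, and the two macros are compared. This is only an illustration; the macro names are arbitrary.

```stata
* Sketch: confirm that both loop styles iterate over the same values 10..21.
global years "10 11 12 13 14 15 16 17 18 19 20 21"
local from_global
foreach X of global years {
    local from_global `from_global' `X'
}
local from_forval
forvalues X = 10/21 {
    local from_forval `from_forval' `X'
}
* Compare the macros directly (assert would check nothing with no data in memory)
if "`from_global'" != "`from_forval'" {
    display as error "the two loops differ"
    exit 9
}
display "`from_forval'"
```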

