Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge while keeping value labels in master and using dataset unchanged

    Hello!

    I am trying to merge two datasets from a survey which contains household ((HH) level data with different variables and thus value labels. First of all, I had to convert these datasets from SPSS to Stata, and then merging them through its common identificator name as, let's say, id. Thus, I used the following command:

    use household.dta, clear

    merge 1:1 id using using.dta

    This succesfully merges the two datasets, but the problem is that the labels of the variables are changed. For example, before merging the datasets, we could have two variables as follows:

    HH_A: Categorical variable in master dataset that takes the values of
    1 = "X"
    2 = "Y"
    3 = "Z"

    HH_1: Categorical variable in using dataset that takes the values of
    1 = "A"
    2 = "B"
    3 = "C"

    When both datasets are merged, HH_1 wrongly takes the value labels of the variable in the master dataset, so that for the values that are equal to 1 the label showed in the merged dataset is X, for 2 "Y" and for 3 "Z", which does not make any sense. How could I fix this? I already tried to reformulate the command so that:

    use using.dta, clear

    merge 1:1 id using household.dta, nolabel

    as nolablel is supposed to order Stata not to copy value-label definitions from using, but this does not work either and the value labels are mixed again but the other way around.

    I thank you for the guidance in advance!

    Best,
    Michelle

  • #2
    Here is a toy example of what might be the problem that you describe:

    Code:
    // the toy-using data
    clear
    input id HH_1
    1 1
    2 2
    3 3
    end
    
    label define HH 1 "A" 2 "B" 3 "C"
    label values HH_1 HH
    
    tempfile using
    save "`using'"
    
    // the toy master
    clear
    input id HH_A
    1 1
    2 2
    3 3
    end
    
    label define HH 1 "X" 2 "Y" 3 "Z"
    label values HH_A HH
    
    // merge
    merge 1:1 id using "`using'"
    
    list
    The final output is

    Code:
    . list
    
         +--------------------------------+
         | id   HH_A   HH_1        _merge |
         |--------------------------------|
      1. |  1      X      X   matched (3) |
      2. |  2      Y      Y   matched (3) |
      3. |  3      Z      Z   matched (3) |
         +--------------------------------+
    What happens is that the master and using dataset share value labels with the same name.


    One possible solution is to change the value label names in one of the datasets before merge-ing. elabel (SSC) makes this easy.

    Code:
    elabel rename (*) (*_master)
    will add _master as a suffix to all value label names in the dataset. Here is our example revised:

    Code:
    // install -elabel- from SSC; do this once only
    *ssc install elabel
    
    // the toy-using data
    clear
    input id HH_1
    1 1
    2 2
    3 3
    end
    
    label define HH 1 "A" 2 "B" 3 "C"
    label values HH_1 HH
    
    /* new */
    // make sure there are no value labels named with suffix _master
    elabel dir *_master
    assert ("`r(names)'" == "")
    /* *** */
    
    tempfile using
    save "`using'"
    
    // the toy master
    clear
    input id HH_A
    1 1
    2 2
    3 3
    end
    
    label define HH 1 "X" 2 "Y" 3 "Z"
    label values HH_A HH
    
    /* new */
    // add suffix _master to all value label names
    elabel rename (*) (*_master)
    /* *** */
    
    // merge
    merge 1:1 id using "`using'"
    
    list
    And the final output is

    Code:
    . list
    
         +--------------------------------+
         | id   HH_A   HH_1        _merge |
         |--------------------------------|
      1. |  1      X      A   matched (3) |
      2. |  2      Y      B   matched (3) |
      3. |  3      Z      C   matched (3) |
         +--------------------------------+

    Comment


    • #3
      Thank you very much Daniel! Your example fit exactly within my question and thus solved the problem of mixing value labels when merging both databases

      Best,
      Michelle

      Comment

      Working...
      X