Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Modifying labels when appending datasets

    I am appending datasets in which the same variable have different label values. In the example below is a simplification of the problem. In the first dataset the variable -prod_id- has values according to the label cerais__id (which ranges from 1 to 22) whereas this variable has values according to the label acucar__id in the second dataset (which ranges from 114 to 115). How can I have a code that attaches, after appending, a value label to variable -prod_id- that contains all values defined in both datasets (e.g. 1-22 as well as 114-115)?
    Many thanks!!

    Code:
    . use cerais_2, clear
    
    . des prod_id
    
                  storage   display    value
    variable name   type    format     label      variable label
    ----------------------------------------------------------------------------------------------------------------------------------------
    prod_id         double  %6.2f      cerais__id
                                                  Roster instance identifier
    
    . label list cerais__id
    cerais__id:
               1 Arroz local de bolanha
               2 Arroz local de Npampam
               3 Arroz importado perfumado
               4 Arroz importado simples
               5 Milho bacil em espiga
               6 Milho bacil em grão
               7 Milho preto
               8 Milho cavalo/Sorgo
              10 Fundo
              12 Farinha de milho bacil
              13 Farinha de milho preto
              14 Farinha de trigo local ou importado
              16 Massas alimentícias
              17 Pão moderno
              18 Pão tradicional
              19 Croissants
              20 Bolachas
              21 Bolos
              22 Donetes/panguetes, panquecas
    
    . 
    . use acucar_2, clear
    
    . des prod_id
    
                  storage   display    value
    variable name   type    format     label      variable label
    ----------------------------------------------------------------------------------------------------------------------------------------
    prod_id         double  %6.2f      acucar__id
                                                  Roster instance identifier
    
    . label list acucar__id
    acucar__id:
             114 Açúcar (em pó ou cubos)
             115 Mel
    
    . 
    . use cerais_2, clear
    
    . append using acucar_2
    
    . des prod_id
    
                  storage   display    value
    variable name   type    format     label      variable label
    ----------------------------------------------------------------------------------------------------------------------------------------
    prod_id         double  %6.2f      cerais__id
                                                  Roster instance identifier
    
    . 
    end of do-file
    
    .

  • #2
    This is a common problem.

    The simplest solution is to -decode- the variable in each data set separately, creating string variables out of them. Drop the numeric variables. Then append the data sets. Then -encode- the string variable and you will have one combined value label covering all the values, and assuring that it is consistently labeled across the two data sets.

    Comment


    • #3
      Although I would not be sure about simplicity, Clyde's approach is probably the preferred one. If you wanted to keep the original coding, you could also do something like

      Code:
      use acucar_2 , clear // dataset to be appended
      tempname mylabels // temporary do-file for value label definitions
      label copy acucar__id cerais__id // create a "renamed" value label
      label save cerais__id using "`mylabels'" // save value lable definitions
      label drop cerais__id // optionally drop the "renamed" label
      use cerais_2 , clear // master dataset
      append using acucar_2 // append
      do "`mylabels'" // modify the value label
      Best
      Daniel

      Comment


      • #4
        Many thanks, Clyde and Daniel!! I will use Daniel's proposition as I want to use the original code values. When decoding and then coding, new code values are generated.

        As I have 10 datasets which I would like to append, I try writing Daniel's code inside a foreach. However, I got error r(950) even though my data is not large. Could you please help me find what I did wrong?

        Code inside a foreach:
        Code:
        local index a b c d e f g h i k
        local j = 1
        
        foreach k in cerais carne peixe leite oleos fruta legumes leguminosa acucar bebidas {
              use `k', clear
        
              * Standarizing variable names across datasets
              rename `k'__id prod_id
              local J : word `j' of `index'
              rename (s18Bq02`J' s18Bq02`J'_uni s18Bq02`J'_tam s18Bq03`J' s18Bq04`J' s18Bq04`J'_uni s18Bq04`J'_tam s18Bq05`J') ///
              (s18Bq02 s18Bq02_uni s18Bq02_tam s18Bq03 s18Bq04 s18Bq04_uni s18Bq04_tam s18Bq05)
              save `k'_2.dta, replace
          
              tempname mylabels // temporary do-file for value label definitions
              label copy `k'__id cerais__id, replace // create a "renamed" value label
              label save cerais__id using "`mylabels'", replace // save value lable definitions
              label drop cerais__id // optionally drop the "renamed" label
        
              local ++j
        }
        
        
        use cerais_2 , clear // master dataset
        
        foreach k in carne peixe leite oleos fruta legumes leguminosa acucar bebidas {
              append using `k'_2 // append
              do "`mylabels'" // modify the value label
        }
        
        save consumption.dta, replace

        Stata output
        Code:
        . local index a b c d e f g h i k
        
        . local j = 1
        
        .
        . foreach k in cerais carne peixe leite oleos fruta legumes leguminosa acucar bebidas {
          2.       use `k', clear
          3.
        .           rename `k'__id prod_id
          4.           local J : word `j' of `index'
          5.           rename (s18Bq02`J' s18Bq02`J'_uni s18Bq02`J'_tam s18Bq03`J' s18Bq04`J' s18Bq04`J'_uni s18Bq04`J'_tam s18Bq05`J') ///
        >           (s18Bq02 s18Bq02_uni s18Bq02_tam s18Bq03 s18Bq04 s18Bq04_uni s18Bq04_tam s18Bq05)
          6.           save `k'_2.dta, replace
          7.  
        .           tempname mylabels // temporary do-file for value label definitions
          8.           label copy `k'__id cerais__id, replace // create a "renamed" value label
          9.           label save cerais__id using "`mylabels'", replace // save value lable definitions
         10.           label drop cerais__id // optionally drop the "renamed" label
         11.
        .           local ++j
         12. }
        file cerais_2.dta saved
        r(950);
        
        end of do-file
        
        r(950);
        
        .
        Last edited by Paula de Souza Leao Spinola; 16 Apr 2019, 06:34.

        Comment


        • #5
          Edit

          Forget most of my previous (this, now edited) post. I think I might have messed this up. Look into your working directory. You will find lots of files like

          Code:
          __000000.do
          __000001.do
          Erase all those files. Well, inspect one or two to see that you do not need them, then erase them.

          I told you to code

          Code:
          tempname mylabels // <- temporary _name_
          label save ... using `mylabels'
          when I should have coded

          Code:
          tempfile mylabel // <- temporary _file_
          label save ... `mylabels'
          Also, use more than one temporary file name. In your main code try something along the lines

          Code:
          foreach k in cerais carne peixe leite oleos fruta legumes leguminosa acucar bebidas {
                use `k', clear
                ...
                save `k'_2.dta, replace
                
                local ++j // move up
                
                tempfile mylabels`j' // <- make the tempfile names unique
                ...
                label save cerais__id using "`mylabels`j''" // <- note placement of `j'
          }
          then later

          Code:
          local j 1 // <- reset j
          foreach k in carne peixe leite oleos fruta legumes leguminosa acucar bebidas {
                append using `k'_2 // append
                do "`mylabels`++j''" // <- note placement of `j'
          }
          Best
          Daniel
          Last edited by daniel klein; 16 Apr 2019, 07:17.

          Comment


          • #6
            Many thanks Daniel. I found out the problem. The issue was that I should not include the master dta (cerais.dta) in the first loop. The code below works perfectly!


            Code:
            . local index b c d e f g h i k
            
            . local j = 1
            
            . 
            . foreach k in carne peixe leite oleos fruta legumes leguminosa acucar bebidas {
              2.       use `k', clear
              3. 
            .           * Standarizing variable names across datasets
            .           rename `k'__id prod_id
              4.           local J : word `j' of `index'
              5.           rename (s18Bq02`J' s18Bq02`J'_uni s18Bq02`J'_tam s18Bq03`J' s18Bq04`J' s18Bq04`J'_uni s18Bq04`J'_tam s18Bq05`J') ///
            >           (s18Bq02 s18Bq02_uni s18Bq02_tam s18Bq03 s18Bq04 s18Bq04_uni s18Bq04_tam s18Bq05)
              6.           save `k'_2.dta, replace 
              7.           local ++j
              8. 
            .           tempname mylabels_`k' // temporary do-file for value label definitions
              9.           label copy `k'__id cerais__id, replace // create a "renamed" value label
             10.           label save cerais__id using "`mylabels_`k''", replace // save value lable definitions
             11.           label drop cerais__id // optionally drop the "renamed" label
             12. }
            file carne_2.dta saved
            file __000000.do saved
            file peixe_2.dta saved
            file __000001.do saved
            file leite_2.dta saved
            file __000002.do saved
            file oleos_2.dta saved
            file __000003.do saved
            file fruta_2.dta saved
            file __000004.do saved
            file legumes_2.dta saved
            file __000005.do saved
            file leguminosa_2.dta saved
            file __000006.do saved
            file acucar_2.dta saved
            file __000007.do saved
            file bebidas_2.dta saved
            file __000008.do saved
            
            . 
            . use cerais , clear // master dataset
            
            .       rename cerais__id prod_id
            
            .           rename (s18Bq02a s18Bq02a_uni s18Bq02a_tam s18Bq03a s18Bq04a s18Bq04a_uni s18Bq04a_tam s18Bq05a) ///
            >           (s18Bq02 s18Bq02_uni s18Bq02_tam s18Bq03 s18Bq04 s18Bq04_uni s18Bq04_tam s18Bq05)
            
            .           
            . foreach k in carne peixe leite oleos fruta legumes leguminosa acucar bebidas {
              2.           append using `k'_2 // append
              3.       do "`mylabels_`k''" // modify the value label
              4. }
            
            . label define cerais__id 23 `"Carne de vaca"', modify
            
            . label define cerais__id 25 `"Carne de carneiro"', modify
            
            . label define cerais__id 26 `"Carne de cabra"', modify
            
            . label define cerais__id 27 `"Miudezas e tripas (fígado, rins, etc.)"', modify
            
            . label define cerais__id 28 `"Carne de porco"', modify
            
            . label define cerais__id 29 `"Galinha vivo"', modify
            
            . label define cerais__id 30 `"Carne de galinha"', modify
            
            . 
            end of do-file
            
            . label define cerais__id 35 `"Tainha fresco"', modify
            
            . label define cerais__id 36 `"Djafal fresco"', modify
            
            . label define cerais__id 37 `"Carapau fresco"', modify
            
            . label define cerais__id 38 `"Bagre fresco"', modify
            
            . label define cerais__id 39 `"Bagre fumado"', modify
            
            . label define cerais__id 40 `"Bentaninha fumado (selebesonhe)"', modify
            
            . label define cerais__id 41 `"Peixe seco"', modify
            
            . label define cerais__id 42 `"Caranguejos, camarão e outros mariscos"', modify
            
            . 
            end of do-file
            
            . label define cerais__id 44 `"Leite fresco"', modify
            
            . label define cerais__id 45 `"Leite coalhado, iogurte"', modify
            
            . label define cerais__id 48 `"Leite em pó"', modify
            
            . label define cerais__id 49 `"Queijo"', modify
            
            . label define cerais__id 52 `"Ovos"', modify
            
            . 
            end of do-file
            
            . label define cerais__id 53 `"Manteiga"', modify
            
            . label define cerais__id 54 `"Manteiga de karite"', modify
            
            . label define cerais__id 55 `"Óleo de palma"', modify
            
            . label define cerais__id 56 `"Óleo de mancarra"', modify
            
            . label define cerais__id 58 `"Óleo de soja"', modify
            
            . 
            end of do-file
            
            . label define cerais__id 60 `"Manga"', modify
            
            . label define cerais__id 61 `"Ananás"', modify
            
            . label define cerais__id 63 `"Banana doce"', modify
            
            . label define cerais__id 64 `"Limão"', modify
            
            . label define cerais__id 66 `"Abacate"', modify
            
            . label define cerais__id 67 `"Melancia, melão"', modify
            
            . label define cerais__id 68 `"Cajú"', modify
            
            . label define cerais__id 69 `"Noz de coco"', modify
            
            . 
            end of do-file
            
            . label define cerais__id 72 `"Salada (alface)"', modify
            
            . label define cerais__id 73 `"Repolho"', modify
            
            . label define cerais__id 74 `"Cenoura"', modify
            
            . label define cerais__id 75 `"Feijão verde"', modify
            
            . label define cerais__id 76 `"Pepino"', modify
            
            . label define cerais__id 77 `"Beringela, abóbora/curgete"', modify
            
            . label define cerais__id 78 `"Pimento fresco"', modify
            
            . label define cerais__id 79 `"Tomate fresco"', modify
            
            . label define cerais__id 81 `"Candja fresco"', modify
            
            . label define cerais__id 82 `"Candja seco"', modify
            
            . label define cerais__id 83 `"Cebola fresca"', modify
            
            . label define cerais__id 84 `"Alho"', modify
            
            . label define cerais__id 85 `"Folhas de azeda (badjique)"', modify
            
            . label define cerais__id 86 `"Folhas de cabaceira"', modify
            
            . label define cerais__id 91 `"Concentrado de tomate"', modify
            
            . 
            end of do-file
            
            . label define cerais__id 92 `"Ervilhas"', modify
            
            . label define cerais__id 93 `"Ervilhas secas"', modify
            
            . label define cerais__id 96 `"Mancarra frescos com casca"', modify
            
            . label define cerais__id 97 `"Mancarra secos com casca"', modify
            
            . label define cerais__id 98 `"Mancarra sem casca ou pilada"', modify
            
            . label define cerais__id 99 `"Mancarra torrada"', modify
            
            . label define cerais__id 100 `"Mancarra ralada"', modify
            
            . label define cerais__id 101 `"Sésamo"', modify
            
            . label define cerais__id 102 `"Castanha de cajú"', modify
            
            . label define cerais__id 104 `"Mandioca"', modify
            
            . label define cerais__id 105 `"Inhame"', modify
            
            . label define cerais__id 106 `"Banana plantain"', modify
            
            . label define cerais__id 107 `"Batata inglesa"', modify
            
            . label define cerais__id 108 `"Taro, manfafa"', modify
            
            . label define cerais__id 109 `"Batata-doce"', modify
            
            . label define cerais__id 111 `"Farinhas de mandioca"', modify
            
            . 
            end of do-file
            
            . label define cerais__id 114 `"Açúcar (em pó ou cubos)"', modify
            
            . label define cerais__id 115 `"Mel"', modify
            
            . 
            end of do-file
            
            . label define cerais__id 129 `"Café"', modify
            
            . label define cerais__id 130 `"Chá"', modify
            
            . label define cerais__id 131 `"Chocolate em pó"', modify
            
            . label define cerais__id 133 `"Sumos de fruta (laranja, hibisco, cajú, etc.)"', modify
            
            . label define cerais__id 134 `"Água filtrada"', modify
            
            . label define cerais__id 137 `"Cervejas e Vinhos tradicional ( Vinho de caju)"', modify
            
            . label define cerais__id 138 `"Cervejas industriais"', modify
            
            . 
            end of do-file
            
            . 
            . save consumption.dta, replace
            file consumption.dta saved
            
            . 
            end of do-file
            
            .

            Comment


            • #7
              I am happy that your code works for you. However, it will leave all those

              Code:
              __000000.do
              files on disk, which is probably not what you want. Use tempfile instead of tempname to prevent this.

              One other potential problem here might arise because label save always adds option modify to the value label definitions. This might be dangerous when the same values are associated with different text/labels in different datasets. It does not seem to be the case here, but we could imagine situations where such overlaps might occur. If such overlaps occur, you would overwrite labels without even noticing. My elabel package (SSC) implements a variation of label save to overcome this problem. You can replace your label save commands with

              Code:
              * ssc install elabel
              elabel save cerais__id using "`mylabels'" , option(add)
              where option(add) specifies that, instead of modify, option add be added to the label define commands that are saved to disk.

              Best
              Daniel

              Comment


              • #8
                A comment on the potential problem of same values associated with different text/labels mentioned in #7:

                One strategy may be to read the different value labels definitions into a data file using -uselabel- repeatedly, then validate and resolve any conflicts:
                Code:
                * assuming 3 dta files: carne peixe leite
                
                local names carne peixe leite
                
                foreach fn in `names' {
                
                    use *_id in 1 using `fn' , clear
                    ds
                    assert 1 == `: word count  `r(varlist)' '
                    uselabel `: value label `r(varlist)' '
                    tempfile `fn'
                    save ``fn'' , replace
                    clear
                }    
                
                foreach fn in `names' {
                    
                    append using ``fn''
                }
                
                isid value label
                isid value     
                isid label
                Example record:
                Code:
                     +------------------------------------------+
                     |    lname   value           label   trunc |
                     |------------------------------------------|
                  1. | carne_id      23   Carne de vaca       0 |
                     +------------------------------------------+
                The new value label expression could then be build from the variables value and label after necessary corrections and validation.

                Comment


                • #9
                  Originally posted by Bjarte Aagnes View Post
                  A comment on the potential problem of same values associated with different text/labels mentioned in #7:

                  One strategy may be to read the different value labels definitions into a data file using -uselabel- repeatedly, then validate and resolve any conflicts
                  That might indeed be one strategy. Let me add some minor comments on the code.

                  Code:
                  ...
                  use *_id in 1 using `fn' , clear
                  ds
                  assert 1 == `: word count `r(varlist)' '
                  uselabel `: value label `r(varlist)' '
                  ...
                  could be simplified to

                  Code:
                  use *_id in 1 using `fn' , clear
                  assert c(k) == 1
                  uselabel // will use the one value label
                  while in

                  Code:
                  isid value label
                  isid value
                  isid label
                  the third condition might not always be desired but I would add a check for a single value label name; perhaps

                  Code:
                  assert lname==lname[1]
                  Originally posted by Bjarte Aagnes View Post
                  The new value label expression could then be build from the variables value and label after necessary corrections and validation.
                  I will use the opportunity to mention that elabel has a convenient subcommand to do this. Say, you store the appended files from uselabel as mylabel.dta. You define the value label(s) from that file typing

                  Code:
                  elabel load using mylabel
                  Best
                  Daniel
                  Last edited by daniel klein; 16 Apr 2019, 13:14.

                  Comment


                  • #10
                    Hi everyone,
                    I'm dealing with 10 data set that i want to append, in one variable, district I have the problem mention in #7. For each data set the variable district have same value label for different names. I tried the code:

                    Code:
                    use "dataset1.dta", clear
                    
                    tempname mylabels_dist // temporary do-file for value label definitions
                    elabel save distrito using "`mylabels_dist'", option(add) // save value lable definitions
                    label drop distrito // optionally drop the "renamed" label
                    
                    use "dataset2.dta", clear // master dataset
                    append "using dataset1.dta" // append
                    do "`mylabels_dist'" // modify the value label
                    And I get the error:

                    invalid attempt to modify label
                    r(180);
                    Can one help me with that, thanks.
                    Last edited by Jorge Mouco; 16 Dec 2020, 03:19.

                    Comment

                    Working...
                    X