  • Changing variable labels using another dataset

    Hi All,

    I need to change the labels of the following variables using another file:

    Variables in the first file are:

    v1: 1A1a1LV ab_dv
    v2: 1A1a2IM ab_dv
    v3: 1A1a3LV ab_dv


    The second file:

        +---------------------------------+
        | ab_id   ab_name                 |
        |---------------------------------|
     1. | 1A1a1   //Oral Comprehension    |
     2. | 1A1a1   //Oral Comprehension    |
     3. | 1A1a2   //Written Comprehension |
     4. | 1A1a2   //Written Comprehension |
     5. | 1A1a3   //Oral Expression       |
        |---------------------------------|
     6. | 1A1a3   //Oral Expression       |
     7. | 1A1a4   //Written Expression    |
     8. | 1A1a4   //Written Expression    |
     9. | 1A1b1   //Fluency of Ideas      |
    10. | 1A1b1   //Fluency of Ideas      |
        +---------------------------------+



    The values in the second column of the second file (ab_name) should be the labels of the variables in the first file. Variable names in the first file begin with values in the first column of the second file (ab_id).

    For example, the label for v1 should be Oral Comprehension, since it begins with 1A1a1, and the label for v2 should be Written Comprehension, since it begins with 1A1a2.

    Is there a way to do that? I appreciate your help with it.
    Last edited by Monica Muller; 07 Jul 2015, 13:16.

  • #2
    The description of your second dataset seems pretty clear (although it would be even better if you used code delimiters). Are there only ten observations, or is this just a part of the dataset?

    Please clarify what you mean by this notation:

    v1: 1A1a1LV ab_dv
    Is the variable's name v1? Is its current label "1A1a1LV ab_dv"? Or does this notation mean something different? If so, what does it mean?

    Best
    Daniel

    • #3
      Sorry if I wasn't clear. This is just part of my data; I have about 200,000 observations.
      By v1: 1A1a1LV ab_dv I mean that 1A1a1LV ab_dv is the current variable label of v1.

      • #4
        Here is one approach

        Code:
        /*
            we start with the second dataset
        */
        
        u second_dataset
        
        /*
            we combine the relevant information ...
        */
        
        g id_name = ab_id + char(32) + ab_name
        
        /*
            and get rid of the double-slashes (//)
        */
        
        replace id_name = subinstr(id_name, "//", "", 1)
        
        /*
            now we put the values
            (i.e. id and labels) into a local
            
            in local id_name each element has the form
                <id><space><label>
        */
        
        duplicates drop id_name , force
        qui levelsof id_name , l(id_name)
        
        /*
            next we load the first dataset
        */
        
        u first_dataset , clear
        
        /*
            now we loop through the local id_name
            
            remember, each element has the form
                <id><space><label>
                
            we select all variables that have a
            variable label starting with <id>
            (i.e. we assume that any variable that
            has a variable label starting with <id>
            and followed by anything is supposed to
            get <label> as its new label attached)
            
            we then loop through these variables
            and apply <label> as the new variable label
        */
        
        foreach el of loc id_name {
            loc id : word 1 of `el'
            ds , has(varl `id'*)
            loc vars `r(varlist)'
            
            foreach v of loc varlist {
                loc label : subinstr loc el "`id' " ""
                la var `v' `"`label'"'
            }
        }
        (code is not tested)

        Best
        Daniel
        Last edited by daniel klein; 07 Jul 2015, 13:56.

        • #5
          Hi Daniel,

          Thanks for taking the time to answer my question. Unfortunately, it doesn't work on my computer. What is the has function you used in ds, has(var1 `id'*)? Also, I have about 1500 variables; you put var1 in the formula, but what about the rest of the variables? Sorry if my questions are stupid. I am very new to Stata.

          Thanks

          • #6
            Monica,

            please copy and paste my complete code. Note that contrary to what you think, I have not written

            ds, has(var1 `id'*).
            but

            Code:
            ds , has(varl `id'*)
            There is no 1 (i.e. the number 1) but an l (i.e. a lowercase L); varl is an abbreviation for varlabel.

            Since I am using local macros, please do also make sure to execute the complete code at once.

            For phrases like

            it doesn't work on my computer.
            please review the FAQs again, especially section 12. Please explain what exactly you typed, and what exactly Stata responded. Use code delimiters, as I did, to show what you typed and what you got back from Stata.

            What's the has function you used in ds, has(var1 `id'*)
            has in the above is called an option, not a function, in Stata (I am stressing this because using the correct terms makes communication easier and increases your chances of getting helpful replies). You can read about that option if you type

            Code:
            help ds
            Here is a toy example illustrating the point:

            Code:
            sysuse nlsw88 , clear
            ds , has(varlabel l*)
            Note that ds finds all three variables whose variable labels start with a lowercase l.

            I am using this very same approach. The first time through the loop, ds will pick out of your 1500 variables all those whose variable labels start with, say, 1A1a1. These will then receive the appropriate new label. The loop will then move on to the next id, say 1A1a3, and repeat the process.
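
To make the role of ds concrete, here is a minimal sketch on made-up data. The variable names x1, x2, x3 and the local found are hypothetical and exist only for this illustration. It shows that ds leaves the names of the matching variables in r(varlist), which is the list the inner loop then walks through.

```stata
* Toy data: three variables, two of which have labels starting with 1A1a1
clear
set obs 1
generate x1 = 1
generate x2 = 2
generate x3 = 3
label variable x1 "1A1a1IM ab_dv"
label variable x2 "1A1a1LV ab_dv"
label variable x3 "1A1a2IM ab_dv"

* ds stores the names of all variables whose label matches the pattern
quietly ds , has(varlabel 1A1a1*)

* copy the result into a local right away, before another command overwrites r()
local found `r(varlist)'
display "`found'"
```

With these labels, x1 and x2 should be listed and x3 should not, because its label starts with 1A1a2.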

            Best
            Daniel
            Last edited by daniel klein; 08 Jul 2015, 00:55.

            • #7
              Hi Daniel,
              Thank you very much for your thorough response. It makes sense now. I really appreciate it.

              • #8
                Dear Daniel,

                I tried the code but nothing happened. The labels are exactly what they were before. Here is the code I used:

                Code:
                use secondfile, replace
                gen id_names = ab_ei+ char(32)+ab_en
                quietly levelsof id_names, local (id_names)
                
                use firstfile, replace
                foreach element of local id_names {
                    local id : word 1 of `element'
                    ds , has(varl `id'*)
                    local vars `r(varlist)'
                    
                    foreach v of local varlist {
                        local label : subinstr local element "`id'" ""
                        label var `v' `"`label'"'
                    }
                }
                Do you mind taking a look and seeing where I am going wrong? Thanks a lot.
                P.S. The double slashes in my first post do not exist in the real files. I used them here to separate the two columns, so I didn't use that part of your code.

                • #9
                  Hm, what exactly do you mean by

                  but nothing happened
                  Did Stata not respond at all?

                  Could you provide a sample of both your datasets? If not, please show the output of

                  Code:
                  use secondfile , clear
                  d , s
                  l ab_ei ab_en in 1/5
                  
                  use firstfile , clear
                  d
                  Best
                  Daniel

                  • #10

                    Hi Daniel,

                    After running that code, Stata did not give me any errors, but the variable labels did not change at all. The variables had exactly the same labels as before running the code. I think something in this part of the code doesn't work:
                    Code:
                    foreach v of local varlist {
                            local label : subinstr local element "`id'" ""
                            label var `v' `"`label'"'
                    Here is the second file which only has two columns of id and name:

                    Code:
                     obs:            53                          
                     vars:             2                          8 Jul 2015 19:56
                     size:         1,908                          
                    Sorted by:  ab_ei
                    
                    
                         +-----------------------------------+
                         |     ab_ei                   ab_en |
                         |-----------------------------------|
                      1. |                                   |
                      2. | 1.A.1.a.1      Oral Comprehension |
                      3. | 1.A.1.a.2   Written Comprehension |
                      4. | 1.A.1.a.3         Oral Expression |
                      5. | 1.A.1.a.4      Written Expression |
                         +-----------------------------------+
                    Here is a small part of the first file, which is the file whose variable labels I need to change:

                    Code:
                                        
                        storage    display    value
                    variable name    type    format    label    variable label
                                        
                    
                    ab_dv1A1a1IM    float    %9.0g        1A1a1IM ab_dv
                    ab_se1A1a1IM    float    %9.0g        1A1a1IM ab_se
                    ab_lb1A1a1IM    float    %9.0g        1A1a1IM ab_lb
                    ab_ub1A1a1IM    float    %9.0g        1A1a1IM ab_ub
                    ab_dv1A1a1LV    float    %9.0g        1A1a1LV ab_dv
                    ab_se1A1a1LV    float    %9.0g        1A1a1LV ab_se
                    ab_lb1A1a1LV    float    %9.0g        1A1a1LV ab_lb
                    ab_ub1A1a1LV    float    %9.0g        1A1a1LV ab_ub
                    ab_dv1A1a2IM    float    %9.0g        1A1a2IM ab_dv
                    ab_se1A1a2IM    float    %9.0g        1A1a2IM ab_se
                    ab_lb1A1a2IM    float    %9.0g        1A1a2IM ab_lb
                    ab_ub1A1a2IM    float    %9.0g        1A1a2IM ab_ub
                    Thank you very much

                    • #11
                      Well, I am not surprised. Since we are working with strings (text) here, we need to be very precise. For example, 1A1a1 might look very similar to 1.A.1.a.1 to you, but to Stata these are completely different things. You never mentioned that there were dots in ab_ei in the second file. Instead, you inserted slashes (//) where there were none. The takeaway message is that you need to post much more carefully in the future, especially when talking about strings.
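
A minimal sketch of the point about exact string matching, using the two ids from this thread. The locals same and cleaned are hypothetical names used only for this illustration.

```stata
* string comparison is exact: the dots make these two ids different
local same = ("1A1a1" == "1.A.1.a.1")
display `same'

* subinstr() with . as the last argument removes every occurrence of the dot
local cleaned = subinstr("1.A.1.a.1", ".", "", .)
display "`cleaned'"

* after cleaning, the two ids are identical
display ("`cleaned'" == "1A1a1")
```

The first comparison should evaluate to 0 (false) and the last to 1 (true), which is why the pattern built from the id has to match the labels character for character.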

                      Here is the tweak that should make it work:

                      Code:
                      use secondfile , clear
                      g ab_ei2 = strtrim(stritrim(subinstr(ab_ei, ".", "", .)))
                      gen id_names = ab_ei2 + char(32) + ab_en
                      quietly levelsof id_names , local(id_names)
                      
                      use firstfile , clear
                      foreach element of local id_names {
                          local id : word 1 of `element'
                          quietly ds , has(varlabel `id'*)
                          local vars `r(varlist)'
                          
                          foreach v of loc varlist {
                              local label : subinstr local element "`id'" ""
                              label variable `v' `"`label'"'
                          }
                      }
                      Best
                      Daniel

                      • #12
                        Oh, sorry, sorry! I had already dropped the dots in my file. My bad. Here is the code I used to drop the dots:

                        Code:
                         drop if _n==1
                        ** the first line is because the first row is empty
                        replace ab_ei=subinstr(ab_ei,".","",.)
                        Because I run the whole code, including these edits, without saving over the original file, the output still showed the dots when I ran only the describe and list commands. Sorry about that. Here is the file I worked on:
                        Code:
                            +-------------------------------+
                            | ab_ei                   ab_en |
                            |-------------------------------|
                         1. | 1A1a1      Oral Comprehension |
                         2. | 1A1a2   Written Comprehension |
                         3. | 1A1a3         Oral Expression |
                         4. | 1A1a4      Written Expression |
                         5. | 1A1b1        Fluency of Ideas |
                            +-------------------------------+
                        What I said in the previous post is based on the correct file.
                        Last edited by Monica Muller; 09 Jul 2015, 09:47.

                        • #13
                          Typo on my part.

                          Code:
                          foreach v of local varlist
                          should be

                          Code:
                          foreach v of local vars
                          Alternatively you can change

                          Code:
                          local vars `r(varlist)'
                          to

                          Code:
                          local varlist `r(varlist)'
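
Why the typo produced no error message can be sketched as follows. This is a minimal illustration, and the counters n_wrong and n_right are hypothetical names introduced here. A foreach loop over a local that was never defined does not fail; it just runs zero times, so the label command inside it is never reached.

```stata
* the list is stored under the name vars ...
local vars a b c

* ... so a loop over the undefined local varlist runs zero times, without any error
local n_wrong = 0
foreach v of local varlist {
    local ++n_wrong
}

* a loop over the local that was actually defined runs once per element
local n_right = 0
foreach v of local vars {
    local ++n_right
}

display "iterations over varlist: `n_wrong'"
display "iterations over vars:    `n_right'"
```

Here the first count should stay at 0 and the second should reach 3. Displaying a local before looping over it is a quick way to catch a mismatch like this.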
                          Best
                          Daniel


                          Edit

                          Note that such things tend to happen because my programming style is not exactly "clean". I tend to abbreviate a lot, use (very) short names for locals, etc. when writing my (a)do-files. That is fine when I review my own code, but it tends to confuse others (e.g. #6 above). When posting to the list, however, I try not to write so cryptically (but see the abbreviation g for generate in #11 for an example of how hard it is to change styles).
                          Last edited by daniel klein; 09 Jul 2015, 10:10.

                          • #14
                            Thanks for the explanation. I could figure out your abbreviations. My advisor uses the same abbreviations, so I am used to them.
                            But even with the edits, the labels are still the same.
                            Last edited by Monica Muller; 09 Jul 2015, 10:27.

                            • #15
                              I am sorry, but you need to provide more information. I cannot reproduce this problem. This works perfectly for me:

                              Code:
                              cd c:/ado
                              
                              capture use firstfile.dta
                              if !(_rc) {
                                  exit
                              }
                              capture use secondfile.dta
                              if !(_rc) {
                                  exit
                              }
                              
                              clear
                              
                              input str5 ab_ei str7 ab_en
                              "1A1a1" "foo"
                              "1A1a2" "bar"
                              "1A1a3" "foo bar"
                              end
                              
                              list
                              
                              save secondfile.dta
                              
                              clear
                              
                              input float ab_dv1A1a1IM float ab_dv1A1a2IM float ab_dv1A1a3IM
                              42 23 4223
                              end
                              
                              label variable ab_dv1A1a1IM "1A1a1IM ab_dv"
                              label variable ab_dv1A1a2IM "1A1a2IM ab_dv"
                              label variable ab_dv1A1a3IM "1A1a3IM ab_dv"
                              
                              describe
                              
                              save firstfile.dta
                              
                              use secondfile.dta , clear
                              generate id_label = ab_ei + char(32) + ab_en
                              quietly levelsof id_label , local(id_label)
                              
                              use firstfile.dta , clear
                              describe
                              
                              foreach x of local id_label {
                                  local id : word 1 of `x'
                                  quietly ds , has(varlabel `id'*)
                                  local varlist `r(varlist)'
                                  
                                  foreach var of local varlist {
                                      local label : subinstr local x "`id'" ""
                                      label variable `var' `"`label'"'
                                  }
                              }
                              
                              describe
                              
                              erase firstfile.dta
                              erase secondfile.dta
                              and gives

                              Code:
                              . cd c:/ado
                              c:\ado
                              
                              .
                              . capture use firstfile.dta
                              
                              . if !(_rc) {
                              .         exit
                              . }
                              
                              . capture use secondfile.dta
                              
                              . if !(_rc) {
                              .         exit
                              . }
                              
                              .
                              . clear
                              
                              .
                              . input str5 ab_ei str7 ab_en
                              
                                       ab_ei      ab_en
                                1. "1A1a1" "foo"
                                2. "1A1a2" "bar"
                                3. "1A1a3" "foo bar"
                                4. end
                              
                              .
                              . list
                              
                                   +-----------------+
                                   | ab_ei     ab_en |
                                   |-----------------|
                                1. | 1A1a1       foo |
                                2. | 1A1a2       bar |
                                3. | 1A1a3   foo bar |
                                   +-----------------+
                              
                              .
                              . save secondfile.dta
                              file secondfile.dta saved
                              
                              .
                              . clear
                              
                              .
                              . input float ab_dv1A1a1IM float ab_dv1A1a2IM float ab_dv1A1a3IM
                              
                                   ab_dv~1IM  ab_dv~2IM  ab_dv~3IM
                                1. 42 23 4223
                                2. end
                              
                              .
                              . label variable ab_dv1A1a1IM "1A1a1IM ab_dv"
                              
                              . label variable ab_dv1A1a2IM "1A1a2IM ab_dv"
                              
                              . label variable ab_dv1A1a3IM "1A1a3IM ab_dv"
                              
                              .
                              . describe
                              
                              Contains data
                                obs:             1                          
                               vars:             3                          
                               size:            12                          
                              --------------------------------------------------------------------------------------------------------------------------------------
                                            storage  display     value
                              variable name   type   format      label      variable label
                              --------------------------------------------------------------------------------------------------------------------------------------
                              ab_dv1A1a1IM    float  %9.0g                  1A1a1IM ab_dv
                              ab_dv1A1a2IM    float  %9.0g                  1A1a2IM ab_dv
                              ab_dv1A1a3IM    float  %9.0g                  1A1a3IM ab_dv
                              --------------------------------------------------------------------------------------------------------------------------------------
                              Sorted by:  
                                   Note:  dataset has changed since last saved
                              
                              .
                              . save firstfile.dta
                              file firstfile.dta saved
                              
                              .
                              . use secondfile.dta , clear
                              
                              . generate id_label = ab_ei + char(32) + ab_en
                              
                              . quietly levelsof id_label , local(id_label)
                              
                              .
                              . use firstfile.dta , clear
                              
                              . describe
                              
                              Contains data from firstfile.dta
                                obs:             1                          
                               vars:             3                          9 Jul 2015 19:37
                               size:            12                          
                              --------------------------------------------------------------------------------------------------------------------------------------
                                            storage  display     value
                              variable name   type   format      label      variable label
                              --------------------------------------------------------------------------------------------------------------------------------------
                              ab_dv1A1a1IM    float  %9.0g                  1A1a1IM ab_dv
                              ab_dv1A1a2IM    float  %9.0g                  1A1a2IM ab_dv
                              ab_dv1A1a3IM    float  %9.0g                  1A1a3IM ab_dv
                              --------------------------------------------------------------------------------------------------------------------------------------
                              Sorted by:  
                              
                              .
                              . foreach x of local id_label {
                                2.         local id : word 1 of `x'
                                3.         quietly ds , has(varlabel `id'*)
                                4.         local varlist `r(varlist)'
                                5.        
                              .         foreach var of local varlist {
                                6.                 local label : subinstr local x "`id'" ""
                                7.                 label variable `var' `"`label'"'
                                8.         }
                                9. }
                              
                              .
                              . describe
                              
                              Contains data from firstfile.dta
                                obs:             1                          
                               vars:             3                          9 Jul 2015 19:37
                               size:            12                          
                              --------------------------------------------------------------------------------------------------------------------------------------
                                            storage  display     value
                              variable name   type   format      label      variable label
                              --------------------------------------------------------------------------------------------------------------------------------------
                              ab_dv1A1a1IM    float  %9.0g                   foo
                              ab_dv1A1a2IM    float  %9.0g                   bar
                              ab_dv1A1a3IM    float  %9.0g                   foo bar
                              --------------------------------------------------------------------------------------------------------------------------------------
                              Sorted by:  
                              
                              .
                              . erase firstfile.dta
                              
                              . erase secondfile.dta
                              
                              .
                              end of do-file
                              Best
                              Daniel
                              Last edited by daniel klein; 09 Jul 2015, 12:02.
