  • import codebook output

    Hello.

    I am participating in a recruitment process in which I have been given a dataset with around 370 variables and a text file containing the output of the codebook command.

    The problem is that the dataset came without variable labels. However, the variable names and their labels are included in the codebook output file they sent me. It seems that they had a dataset with variable labels, ran codebook, and then removed the labels before sending the data to me.

    Is there a way to import the codebook output from the text file as a dictionary, so that the variables in my dataset are labeled?

    Many thanks


    Below I provide an example of my dta and txt files:

    The dta
    Code:
    . des
    
    Contains data
      obs:           321                          
     vars:           377                          
     size:     1,429,413                          
    ---------------------------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    ---------------------------------------------------------------------------------------------------------            
    Savsg1c6_1      str27   %27s                  
    Savsg1c6_2      str26   %26s                  
    Savsg1c6_3      str26   %26s                  
    Savsg1c6_4      byte    %8.0g                 
    Savsg1c6_5      byte    %8.0g                 
    Savsg1c6_6      byte    %8.0g                 
    Savsg1c6_7      byte    %8.0g                 
    Savsg1c6_8      byte    %8.0g

    The txt containing the output of the command codebook.
    Code:
    Savsg1c6_1                            Where would you save the amount of money? (#1/8)
    --------------------------------------------------------------------------------------
    
                      type:  numeric (byte)
                     label:  save
    
                     range:  [1,8]                        units:  1
             unique values:  7                        missing .:  110/231
    
                tabulation:  Freq.   Numeric  Label
                                10         1  In a safe place in my house
                                38         2  In a bank account
                                67         3  In my mobile money account
                                 2         4  In my Kibubu box
                                 1         5  In a chama
                                 2         7  In a VICOBA
                                 1         8  Other
                               110         .  
    
    --------------------------------------------------------------------------------------
    Savsg1c6_2                            Where would you save the amount of money? (#2/8)
    --------------------------------------------------------------------------------------
    
                      type:  numeric (byte)
                     label:  save
    
                     range:  [2,7]                        units:  1
             unique values:  4                        missing .:  218/231
    
                tabulation:  Freq.   Numeric  Label
                                 2         2  In a bank account
                                 9         3  In my mobile money account
                                 1         4  In my Kibubu box
                                 1         7  In a VICOBA
                               218         .  
    
    --------------------------------------------------------------------------------------
    Savsg1c6_3                            Where would you save the amount of money? (#3/8)
    --------------------------------------------------------------------------------------
    
                      type:  numeric (byte)
                     label:  save
    
                     range:  [3,3]                        units:  1
             unique values:  1                        missing .:  230/231
    
                tabulation:  Freq.   Numeric  Label
                                 1         3  In my mobile money account
                               230         .  
    
    --------------------------------------------------------------------------------------
    Savsg1c6_4                            Where would you save the amount of money? (#4/8)
    --------------------------------------------------------------------------------------
    
                      type:  numeric (byte)
                     label:  save
    
                     range:  [.,.]                        units:  .
             unique values:  0                        missing .:  231/231
    
                tabulation:  Freq.   Numeric  Label
                               231         .  
    
    --------------------------------------------------------------------------------------
    Savsg1c6_5                            Where would you save the amount of money? (#5/8)
    --------------------------------------------------------------------------------------
    
                      type:  numeric (byte)
                     label:  save
    
                     range:  [.,.]                        units:  .
             unique values:  0                        missing .:  231/231
    
                tabulation:  Freq.   Numeric  Label
                               231         .  
    
    --------------------------------------------------------------------------------------
    Savsg1c6_6                            Where would you save the amount of money? (#6/8)
    --------------------------------------------------------------------------------------
    
                      type:  numeric (byte)
                     label:  save
    
                     range:  [.,.]                        units:  .
             unique values:  0                        missing .:  231/231
    
                tabulation:  Freq.   Numeric  Label
                               231         .  
    
    --------------------------------------------------------------------------------------
    Savsg1c6_7                            Where would you save the amount of money? (#7/8)
    --------------------------------------------------------------------------------------
    
                      type:  numeric (byte)
                     label:  save
    
                     range:  [.,.]                        units:  .
             unique values:  0                        missing .:  231/231
    
                tabulation:  Freq.   Numeric  Label
                               231         .  
    
    --------------------------------------------------------------------------------------
    Savsg1c6_8                            Where would you save the amount of money? (#8/8)
    --------------------------------------------------------------------------------------
    
                      type:  numeric (byte)
                     label:  save
    
                     range:  [.,.]                        units:  .
             unique values:  0                        missing .:  231/231
    
                tabulation:  Freq.   Numeric  Label
                               231         .

  • #2
    I copied and pasted your codebook output into Stata as a single string variable. Then

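    Code:
    * collapse runs of blanks to a single blank
    replace var1 = itrim(var1)
    * keep only the header lines (in this example they all contain "#")
    keep if strpos(var1, "#")
    * turn each header line into a -label var- command with the label in quotes
    replace var1 = "label var " + var1
    replace var1 = subinstr(var1, word(var1, 4), `"""' + word(var1, 4), 1)
    replace var1 = var1 + `"""'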

    gives me text that could be used as a do-file.

    I imagine the # trick only works for your example, but treating text as string data often works for me.

    A more general trick is selecting lines given that the previous or following line is all dashes.
    Last edited by Nick Cox; 18 Jun 2017, 04:25.
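
To make the steps in #2 concrete, here is a minimal sketch applying them to a single header line taken from the example in #1. The `input` block is only a stand-in for pasting the codebook text into a string variable; the name `var1` follows the post.

```stata
* One header line from the codebook text, held in a single string variable
clear
input str100 var1
"Savsg1c6_1                            Where would you save the amount of money? (#1/8)"
end

* collapse runs of blanks, then prepend the command
replace var1 = itrim(var1)
replace var1 = "label var " + var1
* open the quote before the first word of the label (word 4), then close it
replace var1 = subinstr(var1, word(var1, 4), `"""' + word(var1, 4), 1)
replace var1 = var1 + `"""'
list var1, clean noobs
```

This should leave var1 holding `label var Savsg1c6_1 "Where would you save the amount of money? (#1/8)"`, which is a valid line for a do-file.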


    • #3
      Many thanks, Nick!

      In the example I sent you, the code works because all the variable labels contain "#". However, this symbol does not appear in the labels of other variables. I include another subset of my 370 variables below. Is there a way to keep only the rows that lie between

      Code:
      --------------------------------------------------------------------------------------
      
      --------------------------------------------------------------------------------------
      ?

      Code:
      var1
      --------------------------------------------------------------------------------------
      admin_infoid                                                   Enter surveyID (Enter )
      --------------------------------------------------------------------------------------
      type:  numeric (long)
      range:  [107,99714]                  units:  1
      unique values:  231                      missing .:  0/231
      mean:   47560.4
      std. dev:   31089.9
      percentiles:        10%       25%       50%       75%       90%
      7933     18595     44404     77000     93005
      --------------------------------------------------------------------------------------
      admin_infoa1                                                Date and Time of Interview
      --------------------------------------------------------------------------------------
      type:  numeric (double)
      range:  [1.790e+12,1.794e+12]        units:  10000
      unique values:  221                      missing .:  0/231
      mean:   1.8e+12
      std. dev:   6.7e+08
      percentiles:        10%       25%       50%       75%       90%
      1.8e+12   1.8e+12   1.8e+12   1.8e+12   1.8e+12
      --------------------------------------------------------------------------------------
      admin_infoa2                                                           Enumerator Name
      --------------------------------------------------------------------------------------
      type:  numeric (byte)
      label:  fo
      range:  [2,10]                       units:  1
      unique values:  6                        missing .:  0/231
      tabulation:  Freq.   Numeric  Label
      36         2  crlhr kfjhj
      39         3  gjvqjlc qyjruj
      37         7  yslhdx rcjipxa
      41         8  kdhwqcr odichomm
      33         9  jbggo sxpv
      45        10  hmvhrdw ziqxjx
      --------------------------------------------------------------------------------------
      admin_infoa3                                                               (unlabeled)
      --------------------------------------------------------------------------------------
      type:  string (str16)
      unique values:  6                        missing "":  0/231
      tabulation:  Freq.  Value
      39  "gjvqjlc qyjruj"
      45  "hmvhrdw ziqxjx"
      37  "yslhdx rcjipxa"
      36  "crlhr kfjhj"
      33  "jbggo sxpv"
      41  "kdhwqcr odichomm"
      warning:  variable has embedded blanks
      --------------------------------------------------------------------------------------
      admin_infoloc                                                                 Location
      --------------------------------------------------------------------------------------
      type:  numeric (byte)
      label:  location
      range:  [1,3]                        units:  1
      unique values:  3                        missing .:  0/231
      tabulation:  Freq.   Numeric  Label
      43         1  Dar es Salaam
      95         2  Arusha
      93         3  Dodoma
      --------------------------------------------------------------------------------------
      admin_infoconsent                        Did the respondent consent to be interviewed?
      --------------------------------------------------------------------------------------
      type:  numeric (byte)
      label:  yesno
      range:  [1,1]                        units:  1
      unique values:  1                        missing .:  0/231
      tabulation:  Freq.   Numeric  Label
      231         1  Yes
      --------------------------------------------------------------------------------------
      admin_infonoconsent             Please write the reason the respondent did not consent
      --------------------------------------------------------------------------------------
      type:  numeric (byte)
      range:  [.,.]                        units:  .
      unique values:  0                        missing .:  231/231
      tabulation:  Freq.  Value
      231  .
      --------------------------------------------------------------------------------------
      demob1                                                                 Respondent Name
      --------------------------------------------------------------------------------------
      type:  string (str31)
      unique values:  231                      missing "":  0/231
      examples:  "dgcqa cabocjun"
      "wbkei jpodnmxv plxaz"
      "mlzzzq rmmhwc xaufv"
      "ffcljiuk kxwtvr"
      warning:  variable has embedded blanks
      --------------------------------------------------------------------------------------
      demob2                                                               Respondent Gender
      --------------------------------------------------------------------------------------
      type:  numeric (byte)
      label:  gender
      range:  [0,1]                        units:  1
      unique values:  2                        missing .:  0/231
      tabulation:  Freq.   Numeric  Label
      114         0  Male
      117         1  Female
      --------------------------------------------------------------------------------------
      demob3                                                                  Respondent Age
      --------------------------------------------------------------------------------------
      type:  numeric (byte)
      range:  [19,74]                      units:  1
      unique values:  47                       missing .:  0/231
      mean:   35.9351
      std. dev:   11.2401
      percentiles:        10%       25%       50%       75%       90%
      22        27        35        42        51
      --------------------------------------------------------------------------------------
      demob4                                                                  Marital Status
      --------------------------------------------------------------------------------------
      type:  numeric (byte)
      label:  marital
      range:  [1,6]                        units:  1
      unique values:  6                        missing .:  0/231
      tabulation:  Freq.   Numeric  Label
      71         1  Single
      128         2  Married
      8         3  Divorced/Separated
      10         4  Cohabiting, but not married
      2         5  Relationship, but not cohabiting
      12         6  Widowed
      Last edited by sladmin; 18 Jun 2017, 21:15. Reason: Anonymize data.


      • #4
        I already answered this, but to make it concrete you might select lines


        Code:
        keep if strpos(var1[_n-1], 42 * "-") & strpos(var1[_n+1], 42 * "-")

        42 is arbitrary here: we just need a sufficient condition.
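
The selection rule in #4 can be sketched end to end on a short excerpt of the codebook text from #3. This is a sketch under assumptions, not the poster's own code: the `input` block stands in for the pasted text, the names `varname`, `varlab`, and `cmd` are hypothetical, and the quoting with `char(34)` assumes that no label itself contains a double quote.

```stata
* A few lines of codebook text as one string variable (excerpt from #3)
clear
input str100 var1
"--------------------------------------------------------------------------------------"
"admin_infoloc                                                                 Location"
"--------------------------------------------------------------------------------------"
"type:  numeric (byte)"
"label:  location"
"--------------------------------------------------------------------------------------"
"demob3                                                                  Respondent Age"
"--------------------------------------------------------------------------------------"
"type:  numeric (byte)"
end

* keep lines whose neighbours on both sides contain a long run of dashes
replace var1 = trim(var1)
keep if strpos(var1[_n-1], 42 * "-") & strpos(var1[_n+1], 42 * "-")

* first word is the variable name; the rest of the line is the label
gen varname = word(var1, 1)
gen varlab  = trim(substr(var1, strlen(varname) + 1, .))
* codebook prints (unlabeled) when there is no label to recover
drop if varlab == "(unlabeled)"
gen cmd = "label variable " + varname + " " + char(34) + varlab + char(34)
list cmd, clean noobs
```

With this excerpt, two observations should remain, one `label variable` command each for admin_infoloc and demob3.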


        • #5
          Many thanks Nick!

          So would it be possible to write code that uses this output to label the respective variables in my dataset, without copying and pasting the output into my do-file?



          • #6
            This approach generates text you should copy into a do-file. Whether the original data are also in memory is up to you. Either way, running the do-file when the data are in memory seems the obvious next step. I don't understand the implication that it is problematic.
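
One way to avoid the copy-and-paste step is to write the generated commands to a do-file from within Stata and then run it. The sketch below is only an illustration: the file name `labels.do`, the variable `cmd`, and both toy datasets are assumptions standing in for the real command strings and the real unlabeled data.

```stata
* Toy stand-in for the command strings built from the codebook text
clear
set obs 2
gen str80 cmd = ""
replace cmd = "label variable demob2 " + char(34) + "Respondent Gender" + char(34) in 1
replace cmd = "label variable demob3 " + char(34) + "Respondent Age" + char(34) in 2

* write one command per line to a do-file (hypothetical name labels.do)
tempname fh
file open `fh' using "labels.do", write text replace
forvalues i = 1/`=_N' {
    file write `fh' (cmd[`i']) _n
}
file close `fh'

* Toy stand-in for the unlabeled dataset; the real one would be loaded with -use-
clear
set obs 1
gen byte demob2 = .
gen byte demob3 = .

* run the generated commands while the data are in memory
do "labels.do"
describe
```

After the `do` step, `describe` should show the two variable labels attached to demob2 and demob3.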
