  • value label does not display text for one value

    In a generally well-documented and prepared data file (with labels, missing values, etc.), I find that for variables that share a particular value label, the label associated with value = 1 does not display.

    Code:
    . tab varbad
       Has a Bad |
           Label |      Freq.     Percent        Cum.
    -------------+-----------------------------------
               1 |         30       42.25       42.25
       only some |         34       47.89       90.14
      hardly any |          7        9.86      100.00
    -------------+-----------------------------------
           Total |         71      100.00
    I had assumed someone forgot to define a label for the value 1, but this is not the problem:
    Code:
    label list LABZ
    LABZ:
              .i iap
               1 a great deal
               2 only some
               3 hardly any
              .d dk
              .n na
    I thought that perhaps LABZ contained a bad character, so I extracted the text for value 1 into a local and displayed it. Like -tab-, it returns only the value itself rather than the label text:
    Code:
    . local label1: label LABZ 1
    . di "`label1'"
    1
    Any guesses what might be wrong with this label? As it happens, this comes from the U.S. General Social Survey, so I'd like to figure this out and point out the problem to them, as I've seen it before in their files. I've attached a 100 observation sample for this variable.
    Last edited by Mike Lacy; 08 Sep 2015, 16:45. Reason: uploaded wrong file

  • #2
    Was your original file (from the U.S. General Social Survey) a Stata 14 dataset? Because your uploaded file is, and as a Stata 13 user I can't read it. This led me to think that the problem could result from Stata 14 reading a Stata 13 dataset.

    I downloaded a recent release and it opened nicely in Stata 13.1. It contains a LABZ label, but with more values than yours shows, so I expect I did not get the same file as yours. Thus, the fact that in my setup your extract-and-display example yields the same result as the label list command does not actually contradict your experience on a different file.

    It would be good if you could test the original data on a system running, say, Stata 13, or point us to the actual download URL (assuming you're not working with the sensitive data files) so others could start from the source.



    • #3
      What strikes me when I look at this label (and, in fact, many of the labels in the file William points at) is that .i appears first in the output of label list. How can this happen? If we type

      Code:
      label define foo .i "dot i" 1 "one"
      label list foo
      we see that Stata sorts the values before storing them. The same thing is true if we use Mata

      Code:
      mata : st_vlmodify("bar", (.i, 1)', ("dot i", "one")')
      label list bar
      I am almost sure that this is what causes the problem. The .i in the first place messes up other things, too. Try

      Code:
      generate newvar:LABZ = .i
      tabulate newvar , missing
      Or

      Code:
      label define LABZ .i "dot i" , modify
      which will yield a set of value labels, where .i has two different texts assigned.

      I have no idea how the label was defined and stored in this way in the first place. I also have no idea why exactly it only seems to affect the text for .i and 1.
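To see how many labels in a file show the symptom described above (values not stored in ascending order), something like the following might help. It is only a sketch: it assumes that st_vlload() returns the values in the order in which they are stored, which nothing in this thread confirms, and the local names sorted and nbad are made up for illustration.

```stata
* Sketch: flag value labels whose stored values are not in ascending order.
* Assumption (unverified): st_vlload() returns values in stored order.
quietly label dir
local labels `r(names)'
local nbad = 0

foreach lab of local labels {
    * load values (v) and texts (t) of the label into Mata
    mata : st_vlload("`lab'", v = ., t = "")
    * 1 if v already equals its sorted self, 0 otherwise
    mata : st_local("sorted", strofreal(all(v :== sort(v, 1))))
    if (`sorted' == 0) {
        display as text "`lab' is stored out of order"
        local ++nbad
    }
}
display as text "`nbad' label(s) flagged"
```

On a dataset whose labels were all created with label define, this should flag nothing, since Stata sorts the values before storing them.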

      I do have a work-around

      Code:
      quietly label dir
      local labels `r(names)'
      
      foreach lab of local labels {
          mata : st_vlload("`lab'", v = ., t = "")
          mata : st_vldrop("`lab'")
          mata : st_vlmodify("`lab'", v, t)
      }
      Best
      Daniel



      • #4
        In Stata 13.1 I was able to reproduce Daniel's results. Specifically,
        • label list shows the label for .i out of order at the top
        • the tabulate example does not use the label for .i
        • if I replace some of the .i values with 1 the tabulate example uses the label for 1, unlike Mike's experience
        • label define LABZ .i "dot i" , modify creates a label with two entries for .i, the second of which is used when I run the tabulate example
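For comparison, here is how the same steps behave on a label defined from scratch inside Stata. This is a minimal sketch; the names demo and x are invented for the illustration and have nothing to do with the GSS file.

```stata
* Sketch: the same steps on a cleanly defined label (hypothetical names)
clear
set obs 3
label define demo .i "iap" 1 "a great deal" 2 "only some"

* attach the label while creating the variable
generate byte x:demo = 1
replace x = 2  in 2
replace x = .i in 3
tabulate x, missing

* on a clean label, modify replaces the text for .i
label define demo .i "dot i", modify
label list demo
```

Here label list should show the values in sorted order with a single entry for .i, which is what makes the duplicated .i entry in the GSS label look like a defect in how the label was stored.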



        • #5
          Sorry to be slow to follow up; my day job got in the way. Anyway, I'm virtually certain I noticed this same thing last with the GSS data files loaded under Stata 13, before I even had Stata 14. I will have to check this, which I won't be able to do until next week, as I'm away from the relevant machine. However, if this is right, maybe the original problem is not a v.13 vs. v.14 issue. Regards, Mike
          Last edited by Mike Lacy; 09 Sep 2015, 21:00.



          • #6
            This is possibly a problem with the GSS data. I emailed Stata support about the issue and they suggested a solution. I then contacted NORC about the issue and they acknowledged the problem with no time frame to fix it.

            That said, you can read through the correspondence and see the solution I used (suggested by Stata support), which might be helpful for others. Here is my email to NORC:
            Code:
            To whom it may concern,  
              I downloaded the cumulative 1972-2014 GSS in Stata format from http://www3.norc.org/GSS+Website/Dow...ionunpublished and noticed that some of the variables have a problem with value labels.
              
              You can see from the Stata code below the issue with the variable conpress:
              
               . /* Show the problem using conpress*/
              . describe conpress
               
                            storage   display    value
              variable name   type    format     label      variable label
              -----------------------------------------------------------------------------------------------------------------------
              conpress        byte    %8.0g      LABBJ      confidence in press
               
              . label list LABBJ
              LABBJ:
                        .i iap
                         1 a great deal
                         2 only some
                         3 hardly any
                        .d dk
                        .n na
               
              . tabulate conpress, missing
               
                confidence |
                  in press |      Freq.     Percent        Cum.
              -------------+-----------------------------------
                         1 |      6,265       10.51       10.51
                 only some |     21,127       35.45       45.96
                hardly any |     12,210       20.49       66.45
                        dk |        819        1.37       67.82
                        .i |     19,063       31.99       99.81
                        na |        115        0.19      100.00
              -------------+-----------------------------------
                     Total |     59,599      100.00
               
             
              You can see that the value label is correctly created but only partially attached to the variable.  
              
              I contacted Stata technical support, thinking this is a Stata 14 issue, but it also occurs in Stata 13.  Their response was:
              
            "Was this dataset originally created in Stata, or was it transferred from another software package? Sometimes value labels will get dropped from values if the transfer to Stata doesn't match up 100%. It is difficult to know where the matching of value labels fails since I don't know the original source of the dataset, ..."
            In the interim, I solved the issue by writing out the labels and recreating them. Here is what it looks like with the variable conpress:

              . /* Fix the problem for conpress */
              . label save LABBJ using labels, replace
              (note: file labels.do not found)
              file labels.do saved

              . label drop LABBJ

              . do labels.do

              . label define LABBJ .i `"iap"', modify
              . label define LABBJ 1 `"a great deal"', modify
              . label define LABBJ 2 `"only some"', modify
              . label define LABBJ 3 `"hardly any"', modify
              . label define LABBJ .d `"dk"', modify
              . label define LABBJ .n `"na"', modify
              . end of do-file

              . label values conpress LABBJ

              . tabulate conpress, missing

                confidence |
                  in press |      Freq.     Percent        Cum.
              -------------+-----------------------------------
              a great deal |      6,265       10.51       10.51
                 only some |     21,127       35.45       45.96
                hardly any |     12,210       20.49       66.45
                        dk |        819        1.37       67.82
                       iap |     19,063       31.99       99.81
                        na |        115        0.19      100.00
              -------------+-----------------------------------
                     Total |     59,599      100.00

              . erase labels.do

            And this is a more general solution:

              *** Stata code with a general solution ***
              label save using labels, replace
              label drop _all
              do labels.do
              erase labels.do

            I thought you would be interested in knowing about the issue and seeking a solution. Please let me know if you find out if the problem is in the production of the dataset.

            Sincerely,
            Alan Neustadtl

              *** Stata code ***
              /* Show the problem using conpress*/
              describe conpress
              label list LABBJ
              tabulate conpress, missing

              /* Fix the problem for conpress */
              label save LABBJ using labels, replace
              label drop LABBJ
              do labels.do
              label values conpress LABBJ
              tabulate conpress, missing
              erase labels.do

              *** Stata code with a general solution ***
              label save using labels, replace
              label drop _all
              do labels.do
              erase labels.do
            And this is the reply from NORC:
            Code:
            Dear GSS user:
            Thank you for reporting a problem (and even suggesting a solution). This problem occurs when we convert the SPSS format to the Stata format. We are also thinking of the similar solution you suggested, but, unfortunately, we do not have an immediate plan to implement it, as we are tied to many other things. In the meantime, please refer to the Codebook, questionnaires, release notes, or the SPSS-formatted data for correct (complete) labels.
             
            Best,
             
            General Social Survey (GSS)
            NORC at the University of Chicago
            1155 E. 60th St. 2nd Floor, Chicago, IL 60637
            [email protected] | gss.norc.org
            Best,
            Alan Neustadtl

