  • Problem with -reshape long- applying value label to j values (2023 version)

    As in a similar discussion, "Problem with -reshape long- applying value label to j values" on Statalist, the j(newvar) variable takes on strange value labels after -reshape long-.

    The following is a reduced dataset in long format, after reshaping:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(panelid2 wave) str6 hhid str8 id float var
    1 1 "000064" "00006401"   .6229963
    1 2 "000064" "00006401"   .6187591
    1 3 "000064" "00006401"   .0485496
    1 4 "000064" "00006401"  .21651682
    1 5 "000064" "00006401"   .6109347
    2 1 "000064" "00006402"   .3473928
    2 2 "000064" "00006402" .007756131
    2 3 "000064" "00006402"   .6543282
    2 4 "000064" "00006402" .008100444
    2 5 "000064" "00006402"   .8569513
    3 1 "000214" "00021401"   .4994775
    3 2 "000214" "00021401"   .3270546
    3 3 "000214" "00021401"  .07871727
    3 4 "000214" "00021401"   .6700594
    3 5 "000214" "00021401"   .1014784
    4 1 "000214" "00021402"   .7776048
    4 2 "000214" "00021402"  .05866977
    4 3 "000214" "00021402"  .26950312
    4 4 "000214" "00021402"   .6614817
    4 5 "000214" "00021402"  .13838032
    end
    label values wave agecl
    label def agecl 1 "51-52", modify
    label def agecl 2 "53+", modify
    label def agecl 3 "tot", modify
    Before reshaping, the data looked like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 hhid str8 id float(var1 var2 var3 var4 var5) byte panelid2
    "000064" "00006401" .7488847 .4792806 .29489282  .789392  .3904327 1
    "000064" "00006402" .7396652  .409892   .474054 .4045655  .5660811 2
    "000214" "00021401"   .83459 .3540865  .9541889 .6604541  .9541239 3
    "000214" "00021402" .1537002 .8376353  .8663253 .2670887 .51788366 4
    end
    To this I applied a simple command, as I usually do:
    Code:
    reshape long var, i(panelid2) j(wave)
    I do not understand where the value labels "51-52", "53+", and "tot" in the dataset above come from. These values appear nowhere in my code or in my dataset. However, I now see that dataex reports the following section:
    Code:
    label values wave agecl
    label def agecl 1 "51-52", modify
    label def agecl 2 "53+", modify
    label def agecl 3 "tot", modify
    I did not write this myself. This seems to be a bug in Stata's -reshape-.

  • #2
    Your problem does not reproduce. I start with your data extract and then type your reshape command, and I do not get any value labels.

    Code:
    . d
    
    Contains data
     Observations:            20                  
        Variables:             5                  
    ------------------------------------------------------------------------------------------------------------------------------
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    ------------------------------------------------------------------------------------------------------------------------------
    panelid2        byte    %8.0g                
    wave            byte    %10.0g                
    hhid            str6    %9s                  
    id              str8    %9s                  
    var             float   %9.0g                
    ------------------------------------------------------------------------------------------------------------------------------
    Sorted by: panelid2  wave
         Note: Dataset has changed since last saved.
    It is likely the labels were already in your Stata memory from some previous code you executed.

    You may want to try
    Code:
    label drop _all
    to eliminate any labels straggling in your Stata memory, or even just

    Code:
    clear

    • #3
      Thanks. Running the commands in #2 before the reshape does not solve the issue. I tried
      Code:
      set trace on
      and found the relevant section here:
      Code:
           - $ReS_Call sort $ReS_i $ReS_j
            = version 17: sort panelid wave
            - }
            - restore, not
            - local isstr : char _dta[ReS_str]
            - local labn : char _dta[__JValLabName]
            - if "`labn'" != "" & `"`isstr'"' == "0" {
            = if "agecl" != "" & `"0"' == "0" {
            - local lab : char _dta[__JValLab]
            - capture label define `labn' `lab'
            = capture label define agecl  0 `"-50"'  1 `"51-52"'  2 `"53+"'  3 `"tot"' 
              -------------------------------------------------------------- begin label ---
              - version 10.0
              - local vv : display "version " string(_caller()) ", missing:"
              - gettoken val : 0
              - if (strpos("`val'", "val") > 0 ) {
              = if (strpos("define", "val") > 0 ) {
                gettoken val 0 : 0
                syntax anything [, nofix]
                if "`fix'" != "" {
                local fix ", nofix"
                }
                gettoken var rest : anything
                while `"`rest'"' != "" {
                gettoken lab rest : rest
                local label "`lab'"
                }
                local vlist : list anything - lab
                if "`lab'" == "." {
                local lab ""
                }
                foreach var of varlist `vlist' {
                `vv' _label `val' `var' `lab' `fix'
                }
                }
              - else {
              - `vv' _label `macval(0)'
              = version 5, missing: _label define agecl  0 `"-50"'  1 `"51-52"'  2 `"53+"'  
      > 3 `"tot"' 
              - }
              ---------------------------------------------------------------- end label ---
            - label values $ReS_j `labn'
            = label values wave agecl
      So I am puzzled about where this issue comes from, because these are data I generated within Stata.
      Anyway, if there is no solution to this, relabeling the values manually solves the issue. Note that this is only a labeling issue, not a data issue.
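
      The trace suggests that -reshape long- defines and attaches whatever label it finds in the dataset characteristics _dta[__JValLabName] and _dta[__JValLab]. The following is a minimal sketch that sets those two characteristics by hand on toy data, to mimic a dataset carrying such leftovers. The toy values are made up, and the assumption that these two characteristics alone trigger the behavior is read off the trace and not verified across Stata versions.

      Code:
      clear
      input byte panelid2 float(var1 var2 var3)
      1 .1 .2 .3
      2 .4 .5 .6
      end
      * mimic leftovers from an earlier reshape (assumed mechanism, based on the trace)
      char _dta[__JValLabName] "agecl"
      char _dta[__JValLab] `"1 "51-52" 2 "53+" 3 "tot""'
      reshape long var, i(panelid2) j(wave)
      * check whether wave now carries the value label agecl
      describe wave
      capture noisily label list agecl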

      Comment


      • #4
        The example in the other thread suggests that the problem comes up when a second reshape is attempted. Is that what is happening with you?

        Also, and this is certainly unsatisfactory and hack-y, but doing the following between the two reshapes will obviate the problem:

        Code:
        char _dta[__JValLabName]
        char _dta[__JValLab]
        char _dta[__JVarLab]
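
        To see what a dataset is carrying before clearing it, the dataset characteristics can be listed first. This is a minimal sketch, assuming the wide dataset is already in memory and that the stale label sits under the characteristic names that appear in the trace in #3.

        Code:
        * list every dataset-level characteristic, including any left by an earlier -reshape-
        char list _dta[]
        * clear the three characteristics named above
        char _dta[__JValLabName]
        char _dta[__JValLab]
        char _dta[__JVarLab]
        * list again to confirm they are gone
        char list _dta[]

        Characteristics are saved in the .dta file along with the data, so they come back each time the file is loaded. That would explain why running -clear- or -label drop _all- before loading the data does not help.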

        • #5
          I do not do two reshapes. However, the data may have been reshaped previously, leaving some information stored somewhere inside the dataset.
          The variable label of "wave" is also set to "coverscreen id of respondent" after the reshape, so the issue may be more complicated.
          In any case, the issue then lies in the data, so I will stop pursuing a fix from here on. Thanks for the help anyway.

          • #6
            I would consider this a bug, and I think you should inform technical support.

            But in the meantime, could you try putting the commands in #4 at the top of your code and see if it fixes the problem?

            • #7
              Yes, #4 works, even without -label drop _all-.
