
  • reshape and (value) labels

    A recent item on the Wishlist for Stata 18 claims that reshape does not preserve (value) labels. The problem is not exactly clear from that post. However, a probably related problem was reported here.

    According to this FAQ, in earlier versions of Stata, reshape did not preserve (value) labels. New versions of Stata should do that and, as far as I am concerned, they do.

    I will use the example dataset from the linked FAQ:

    Code:
    clear
    input id year answer inc
    1 80 0 5000
    1 81 1 5500
    1 82 0 6000
    2 80 1 2000
    2 81 0 2200
    2 82 1 3300
    3 80 0 3000
    3 81 1 2000
    3 82 1 1000
    end
    
    label define answer 0 "Yes" 1 "No"
    label define year 80 "1980" 81 "1981" 82 "1982"
    label values answer answer
    label values year year
    label variable id "Identification"
    label variable year "Year of study"
    label variable answer "Answer to question"
    label variable inc "value of inc"

    Here is what I get with Stata/SE 17.0 for Windows (64-bit x86-64), update level 06 Apr 2022 (outdated) on Windows 11:

    Code:
    . describe
    
    Contains data
     Observations:             9                  
        Variables:             4                  
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    id              float   %9.0g                 Identification
    year            float   %9.0g      year       Year of study
    answer          float   %9.0g      answer     Answer to question
    inc             float   %9.0g                 value of inc
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sorted by: 
         Note: Dataset has changed since last saved.
    
    . label list
    year:
              80 1980
              81 1981
              82 1982
    answer:
               0 Yes
               1 No
    
    . reshape wide inc answer, i(id) j(year)
    (j = 80 81 82)
    
    Data                               Long   ->   Wide
    -----------------------------------------------------------------------------
    Number of observations                9   ->   3           
    Number of variables                   4   ->   7           
    j variable (3 values)              year   ->   (dropped)
    xij variables:
                                        inc   ->   inc80 inc81 inc82
                                     answer   ->   answer80 answer81 answer82
    -----------------------------------------------------------------------------
    
    . describe
    
    Contains data
     Observations:             3                  
        Variables:             7                  
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    id              float   %9.0g                 Identification
    answer80        float   %9.0g      answer     80 answer
    inc80           float   %9.0g                 80 inc
    answer81        float   %9.0g      answer     81 answer
    inc81           float   %9.0g                 81 inc
    answer82        float   %9.0g      answer     82 answer
    inc82           float   %9.0g                 82 inc
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sorted by: id
    
    . label list
    answer:
               0 Yes
               1 No
    
    . 
    end of do-file
    I cannot see any problem here.

    Now, let's reshape long. You may or may not know that reshape stores lots of information in characteristics. That information facilitates switching quickly between wide and long. We will get rid of all those characteristics and pretend that we have started with the wide layout.

    Code:
    mata : st_local("reshape_chars", invtokens(st_dir("char", "_dta", "*")'))
    foreach c of local reshape_chars {
        char define _dta[`c'] // void
    }

    If I now reshape long, again, I get

    Code:
    . reshape long inc answer , i(id) j(year)
    (j = 80 81 82)
    
    Data                               Wide   ->   Long
    -----------------------------------------------------------------------------
    Number of observations                3   ->   9           
    Number of variables                   7   ->   4           
    j variable (3 values)                     ->   year
    xij variables:
                          inc80 inc81 inc82   ->   inc
                 answer80 answer81 answer82   ->   answer
    -----------------------------------------------------------------------------
    
    . describe
    
    Contains data
     Observations:             9                  
        Variables:             4                  
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    id              float   %9.0g                 Identification
    year            byte    %10.0g                
    answer          float   %9.0g      answer     
    inc             float   %9.0g                 
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sorted by: id  year
         Note: Dataset has changed since last saved.
    
    . label list
    answer:
               0 Yes
               1 No
    
    . 
    end of do-file

    I cannot find any problems with (value) labels after reshape. Perhaps I am missing the problem; maybe it only occurs in a specific edition of Stata and/or on a specific OS. I encourage everyone, but especially those claiming that there is a problem, to try to replicate my results. Please state the version, edition, and update level of Stata that you are using. Also, please state the OS you are on.
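
    For anyone replicating, a minimal sketch of how the requested details could be collected. It assumes nothing beyond the built-in about command and the c() system values; the wording of the displayed lines is only an illustration.

    ```stata
    * Print version, edition, and revision date of the running Stata
    about

    * Show the same details from c() system values, plus the operating system
    display "Stata version: `c(stata_version)'"
    display "Executable born: `c(born_date)'"
    display "OS: `c(os)' `c(osdtl)' (`c(machine_type)')"
    ```

    Pasting that output along with the results makes it easier to tell whether a difference is tied to an edition, an update level, or an OS.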

  • #2
    Your results are replicated on my setup: Stata/MP 17.0 for Windows (64-bit x86-64), revision 10 May 2022 (up to date), Windows 10.



    • #3
      Dear Daniel,

      The example you have provided works perfectly but this is not the issue.

      Have a look at this example:

      Code:
      sysuse auto, clear
      keep make mpg foreign
      Let's say we want to reshape by "foreign" which has a value label of 0 = Domestic and 1 = Foreign:

      Code:
      reshape wide mpg, i(make) j(foreign)

      After the reshape, the value label for "foreign" gets dropped. If it still existed somewhere, it would then be easy to rename these variables or assign them variable labels.

      Currently, what one needs to do is save the label information somewhere (in locals or external files), do the reshape, and then retrieve that information to "fix" the variables.
      So the suggestion is to let the labels stay wherever they are stored, maybe as an option in the reshape command, for example:
      Code:
      reshape wide mpg, i(make) j(foreign) preservelabel
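
      For comparison, the current workaround described above (save the label text in locals, reshape, then reapply it) might look like the following sketch. The local names lblname, levels, and text0/text1, as well as the wording of the new variable labels, are only illustrative.

      ```stata
      sysuse auto, clear
      keep make mpg foreign

      * Before reshape: remember the label text attached to each level of foreign
      local lblname : value label foreign
      levelsof foreign, local(levels)
      foreach l of local levels {
          local text`l' : label `lblname' `l'
      }

      reshape wide mpg, i(make) j(foreign)

      * After reshape: use the saved text to label the new wide variables
      foreach l of local levels {
          label variable mpg`l' "Mileage (mpg): `text`l''"
      }
      describe mpg0 mpg1
      ```

      This works, but it has to be repeated for every reshape, which is the motivation for the suggested option.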

      Hope this clarifies the issue.
      Last edited by Asjad Naqvi; 31 May 2022, 11:32.



      • #4
        This is a more specific description of what you seek. Thanks.

        You might want to type

        Code:
        char list
        after reshape. You will find, among other information,

        Code:
         _dta[__JValLab]:             0 `"Domestic"'  1 `"Foreign"'
        from which you could reconstruct the value label information.

        Unfortunately, the various characteristics are not documented and, thus, might be subject to change in the future.
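
        With that caveat, a sketch of the reconstruction could look like this. It assumes that _dta[__JValLab] is present and has the layout shown by char list above; foreign_lbl is a hypothetical label name chosen for this example.

        ```stata
        sysuse auto, clear
        keep make mpg foreign
        reshape wide mpg, i(make) j(foreign)

        * Assumption: the undocumented characteristic holds the value/text pairs
        local pairs : char _dta[__JValLab]
        if `"`pairs'"' != "" {
            * Rebuild a value label from the pairs (foreign_lbl is a made-up name)
            label define foreign_lbl `pairs'
            foreach v of varlist mpg0 mpg1 {
                * The suffix after "mpg" is the former value of foreign
                local j = substr("`v'", 4, .)
                local text : label foreign_lbl `j'
                label variable `v' "mpg, `text'"
            }
        }
        describe mpg0 mpg1
        ```

        The check on the local guards against the characteristic being absent, in which case the variables are simply left as reshape created them.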
