
  • The maximum value of a variable does not appear in the summary statistics, but it is in the dataset

    Dear Statalist,

    I am seeking your help.
    I have a variable, which I call Var1 for the purpose of this post.
    The maximum value of Var1 in the dataset is 56, but when I compute the summary statistics,
    Stata displays 48. What am I doing wrong?

    Thank you.

    Chwen Chwen

    --------------------

    Code:

    desc Var1

    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    ------------------------------------------------------------------
    Var1            long    %8.0g


    summarize Var1

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
            Var1 |      2,375    9.389895    12.50568          1         48



  • #2
    Can you find the 56 in data view mode? (Sort on the variable first: sort Var1.)



    • #3
      Hi George,

      yes, I can see the value 56 in data view mode. My question was why I couldn't see that value in the summary stats.

      After several trials, I found that using the real() function, as in the line below, resolves the issue.

      gen new_Var1=real(Var1)

      Indeed, when I compute the summary statistics on new_Var1, Stata displays the correct maximum value.





      • #4
        #3 is not a coherent explanation. If Var1 were a string variable, as implied by your code in #3, it would not have been possible to get summarize Var1 to work at all as reported in #1. Indeed, your describe output in #1 shows that Var1 was then a long variable, and so numeric.

        Something else is going on. Perhaps you are moving back and forth between different versions of the same dataset, or simplifying a more complicated history for us in the belief that some details are not important. Either way, the story is still contradictory, and although you are reporting success, it is possible that you have mangled your data with good intentions.

        Note that it is especially dangerous to use encode to convert string variables to numeric, if that is what was done at some point in the past. More on that at https://journals.sagepub.com/doi/pdf...867X1801800413
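        A toy illustration of that pitfall, using hypothetical data and variable names (s, coded, num). encode assigns integer codes 1, 2, ... to the distinct strings in sorted string order and attaches a value label that displays the original text, whereas destring converts the characters themselves to numbers. Because summarize ignores value labels, an encoded variable can look right when browsing yet report a much smaller maximum. This is only one possible way to get the symptom in #1, not a diagnosis of the actual data.

```stata
* Toy data (hypothetical): numbers stored as strings
clear
input str2 s
"1"
"9"
"10"
"56"
end

* encode: integer codes in sorted *string* order, plus a value label
* (named coded by default) that displays the original text
encode s, generate(coded)

* destring: converts the text itself to numbers
destring s, generate(num)

* With labels shown, coded looks identical to s; nolabel exposes the codes
list s coded num
list s coded num, nolabel
label list coded

* summarize works on the underlying codes, not the labels
summarize coded num
```

        As strings, "1" < "10" < "56" < "9", so the codes are 1 to 4 and summarize reports a maximum of 4 for coded, against 56 for num.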



        • #5
          Originally posted by Nick Cox
          Note that it is especially dangerous to encode string variables to numeric if that was done at some point in the past
          Given just how often people get confused over encode and destring, I wonder whether encode would benefit from issuing a note when used on variables that contain only numeric characters; something like:
          Code:
          encode stringvar , generate(numvar)
          note: stringvar contains only numeric characters; consider using destring
          where the word destring should link to the help file (or to a section in the help that explains the difference between encode and destring).
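          A similar check can be run by hand before converting. This is a minimal sketch with hypothetical data and names (stringvar, numvar): it uses real(), which returns missing for text that does not parse as a number, to test whether every non-empty value is numeric, and then chooses destring or encode accordingly. The test is looser than "only numeric characters" (real() also accepts signs and decimal points, as in "-5" or "2.5"), and the note text is displayed by this code itself, not by encode.

```stata
* Toy data (hypothetical): a string variable holding only numeric text
clear
input str3 stringvar
"12"
"7"
"300"
""
end

* Passes only if every non-empty value parses as a number
capture assert !missing(real(stringvar)) if !missing(stringvar)
if _rc == 0 {
    * Numeric text: convert the values themselves
    display as text "note: stringvar contains only numeric characters; consider using destring"
    destring stringvar, generate(numvar)
}
else {
    * Genuinely categorical text: integer codes plus value labels
    encode stringvar, generate(numvar)
}
summarize numvar
```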

