
  • Wrong Summary Statistics

    Hi,
    I have encountered a strange problem today.
    Stata 16.1 is giving me wrong summary statistics!!
    It seems it can't identify the minimum value.
    I get the following statistics with the "sum id esg" command, but the minimum value is actually 1.02!


        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              id |      3,443     204.336    136.5416          1        517
             esg |      3,443    46.75641     19.3107        .88      94.51

    Any idea what's happening here?
    I am attaching the file.

  • #2
    What do you think is wrong here? The results for id or the results for esg?



    • #3
      Sohan:
      it does not seem so:
      Code:
      . use "C:\Users\user\Downloads\stata-data.dta" 
      
      . sum
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
                id |      3,443     204.336    136.5416          1        517
              year |      3,443    2013.682    4.500988       2003       2019
               esg |      3,443    46.75641     19.3107        .88      94.51
      
      . tabstat id year esg,stat(N mean sd p50 min max)
      
         Stats |        id      year       esg
      ---------+------------------------------
             N |      3443      3443      3443
          Mean |   204.336  2013.682  46.75641
            SD |  136.5416  4.500988   19.3107
           p50 |       186      2015  46.45364
           Min |         1      2003       .88
           Max |       517      2019     94.51
      ----------------------------------------
      
      .
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        I get following statistics with "sum id esg" command but the minimum value is actually 1.02 !
        No.

        Code:
        use "C:\Users\user\Downloads\stata-data.dta" 
        list in 2554
        
              +------------------+
              |  id   year   esg |
              |------------------|
        2554. | 467   2018   .88 |
              +------------------+



        • #5
          Just a meta-comment here.

          It is striking how often people think Stata is doing something wrong when the problem is actually that the data is wrong (or at least not as expected). I'm not saying Stata is completely bug-free; no large program is. But errors in data sets are extremely common, and bugs in professionally developed software are uncommon. Whenever one gets unexpected results, the first thought should be "what is wrong with my data," not "what is wrong with Stata [or any other commercial statistics package]?"



          • #6
            Originally posted by Sergiy Radyakin:

            No.

            Code:
            use "C:\Users\user\Downloads\stata-data.dta"
            list in 2554
            
                  +------------------+
                  |  id   year   esg |
                  |------------------|
            2554. | 467   2018   .88 |
                  +------------------+

            Thanks, Sergiy.
            But the minimum value of esg is actually 1.02.

            . list in 1622

                  +-------------------+
                  |  id   year    esg |
                  |-------------------|
            1622. | 303   2014   1.02 |
                  +-------------------+



            • #7
              Originally posted by Nick Cox:
              What do you think is wrong here? The results for id or the results for esg?
              Hi Nick,
              It's about esg. The storage type and format are the same for both year and esg. While the min value is correct for year, it is not correct for esg!

              . sum

                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                        id |      3,443     204.336    136.5416          1        517
                      year |      3,443    2013.682    4.500988       2003       2019
                       esg |      3,443    46.75641     19.3107        .88      94.51

              . list in 1622

                    +-------------------+
                    |  id   year    esg |
                    |-------------------|
              1622. | 303   2014   1.02 |
                    +-------------------+



              • #8
                Others have answered your question already. A value of 0.88 exists in observation 2554, so I don't know why you continue to insist that it is not the minimum.

                As Clyde Schechter underlines, there are two possibilities: (1) you found a bug in a very often used command, or (2) you're misunderstanding your data. I am not a betting person, but if I were, I know which way I would bet.

                Here is another way to see it. I downloaded your .dta file (many users would not do this: see FAQ Advice #12) and used extremes from SSC.

                Code:
                . extremes esg
                
                  +----------------+
                  |  obs:      esg |
                  |----------------|
                  | 2554.      .88 |
                  | 1622.     1.02 |
                  | 2145.     3.05 |
                  | 1466.     3.11 |
                  | 2400.   3.2475 |
                  +----------------+
                
                  +---------------+
                  | 2760.   92.69 |
                  | 1794.   93.51 |
                  | 3238.   93.72 |
                  | 2359.   93.91 |
                  | 2809.   94.51 |
                  +---------------+



                • #9
                  sohan sust: as #4 has already pointed out to you, you seem to be wrong and Stata is right. I'm not clear why you think 1.02 is the minimum value in your data, when you have been shown that 0.88 is the minimum.

                  To check this, just do
                  Code:
                  sort esg
                  and look at the first observation.

                  I am showing you the top 2 observations after this sort, with their original observation numbers. 1.02 is the second-lowest value, not the lowest.

                  Code:
                  gen obs = _n
                  sort esg
                  list obs id year esg in 1/2, noobs
                  
                    +--------------------------+
                    |  obs    id   year    esg |
                    |--------------------------|
                    | 2554   467   2018    .88 |
                    | 1622   303   2014   1.02 |
                    +--------------------------+
                  Last edited by Hemanshu Kumar; 16 Sep 2022, 03:58.



                  • #10
                    Thanks. I have identified my mistake.



                    • #11
                      Sohan:
                      could you please explain your mistake for the benefit of those who may have the same/a similar issue in the future? Thanks.
                      Kind regards,
                      Carlo
                      (Stata 19.0)



                      • #12
                        I am guessing here, but the reason for the confusion could be that the value is listed as ".88", which is not a universally accepted notation for 0.88. The leading dot is then easily overlooked.

                        There is a discussion here (https://math.stackexchange.com/quest...al-less-than-1) that goes into different contexts, and I myself was shocked some 30 years ago to realize that the leading zero can be omitted. I think the experience is very similar for people who are used to a comma as the decimal delimiter, whichever the country or language.

                        Whether this is indeed the reason for the confusion, I don't know, but in the posted data .88 is also the only value less than 1, so it can easily be misread as 88.
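                        A minimal sketch of how one might check this, assuming only the two observations quoted earlier in the thread (observations 2554 and 1622), not the attached dataset: list the values below 1, then switch to a fixed display format. The choice of %9.2f is illustrative; any %f format should show the leading zero that the default %9.0g format omits.

```stata
* Hypothetical two-row example copied from the listings quoted above;
* it is NOT the attached dataset.
clear
input id year esg
467 2018 .88
303 2014 1.02
end

* summarize reports .88 as the minimum, not 1.02
summarize esg

* List every value below 1; under the default %9.0g format it prints as .88
list id year esg if esg < 1

* A fixed format displays the leading zero, making 0.88 harder to misread
format esg %9.2f
list id year esg
```

                        On the full file, `sort esg` or `extremes esg`, as in #8 and #9, remains the more complete check.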
