  • Variable mean in panel data

    Hello everyone!
    I have a panel dataset: 14 individuals observed over 31 years.
    I need some simple descriptive statistics; in particular, I need the mean of each variable for each individual (e.g., the mean of variable A for individual 1, 2, 3, and so on).
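    For reference, a minimal sketch of listing every individual's mean for several variables in one call. It is illustrated on the grunfeld example panel that #2 uses further down (webuse needs an internet connection). For the data in this post, the analogous call would presumably be tabstat A-I, by(id) statistics(mean) nototal, assuming A to I are stored next to each other as the summarize output suggests. These commands report the same means as summarize and egen; if the stored values themselves are wrong, the means will be too.

```stata
* Illustration on Stata's example panel (requires internet access for -webuse-)
webuse grunfeld, clear

* One row per company, one column per variable: the per-panel means
tabstat invest mvalue kstock, by(company) statistics(mean) nototal

* The same means kept as a dataset; collapse replaces the data in memory,
* so preserve/restore brings the original panel back afterwards
preserve
collapse (mean) invest mvalue kstock, by(company)
list, noobs
restore
```

tabstat only displays the table, while collapse gives a dataset of means that can be saved or merged back; which one fits depends on whether the means are needed for later computations.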

    By typing the following command

    sum if id == 1

    I obtain the following output:

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             Tab |          0
              id |         31           1           0          1          1
               t |         31          16    9.092121          1         31
               A |         31    205.4194    149.3472          2        432
               B |         31    300.4194    99.29444        126        412
    -------------+---------------------------------------------------------
               C |         31         139    15.79029        114        165
               D |         31    257.4839    18.91185        216        282
               E |         31    150.7742    125.7905          7        389
               F |         31    111.8387    78.85011          8        236
               G |         31    180.4516    146.6683          3        403
    -------------+---------------------------------------------------------
               H |         31    37.80645    25.08442          4         73
               I |         31    33.32258    22.42972          3         78

    I get the same result by running the command

    bysort id: egen mean_A = mean(A)

    However, these results are not the actual means.
    I can tell by simply looking at the values in the Data Browser: for instance, the mean of variable B for individual 1 should be around 5000.

    Is there another way to get accurate means of each variable for a single individual?

    Thank you in advance!

  • #2
    I can assure you that Stata's computation of the mean is correct. What are your rules? Do you want missing values to be treated as zero values? That is a restriction, but not necessarily a correct one. See the variable "wanted2" below.

    Code:
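    webuse grunfeld, clear
    keep company time invest
    keep if time<5
    keep if company<4
    * set the first two periods to missing
    replace invest=. if time<3
    * mean over nonmissing values only (Stata's default)
    bys company: egen wanted1= mean(invest)
    * mean with missing values counted as zeros
    bys company: egen wanted2= mean(cond(!missing(invest), invest, 0))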
    Result:

    Code:
    . l, sepby(company)
    
         +---------------------------------------------+
         | company   invest   time   wanted1   wanted2 |
         |---------------------------------------------|
      1. |       1        .      1    334.15   167.075 |
      2. |       1        .      2    334.15   167.075 |
      3. |       1    410.6      3    334.15   167.075 |
      4. |       1    257.7      4    334.15   167.075 |
         |---------------------------------------------|
      5. |       2        .      1     366.1    183.05 |
      6. |       2        .      2     366.1    183.05 |
      7. |       2    469.9      3     366.1    183.05 |
      8. |       2    262.3      4     366.1    183.05 |
         |---------------------------------------------|
      9. |       3        .      1      60.9     30.45 |
     10. |       3        .      2      60.9     30.45 |
     11. |       3     77.2      3      60.9     30.45 |
     12. |       3     44.6      4      60.9     30.45 |
         +---------------------------------------------+
    Last edited by Andrew Musau; 05 May 2022, 15:36.



    • #3
      Thank you for your answer and for clarifying how missing values are handled. I hadn’t thought about that.

      My concern is that, for example, variable A is GDP, which takes very large values and has no missing values. Yet its mean comes out as 205, as you can see.

      I am having trouble figuring out how to obtain a more representative result.

      Thank you again!



      • #4
        Have you browsed the data? The mean of variable "B" for id==1 is reported as 300.4194. If you believe this is not correct, copy and paste the output of the following:

        Code:
        dataex id t B if id==1



        • #5
          Very weird results sometimes arise because data that arrived as string variables were put through encode when destring should have been used.
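The encode-versus-destring problem can be reproduced on toy data. A minimal sketch, where the variable names and GDP values are hypothetical: encode assigns the codes 1, 2, 3, ... to the distinct strings in alphabetical order, so summarizing the encoded variable averages those codes rather than the numbers the strings spell out. In the first post, the maxima for individual 1 (for example 432 for A) do not exceed 434 = 14 × 31, the total number of observations, which is what such codes would look like; that is consistent with, not proof of, this explanation.

```stata
* Toy data (hypothetical values): GDP read in as text, e.g. from an import
clear
input str10 gdp_str
"5000"
"12000"
"800"
end

* Wrong: -encode- maps each distinct string to 1, 2, 3, ... in alphabetical
* order ("12000" < "5000" < "800"), so the codes have nothing to do with GDP
encode gdp_str, generate(gdp_bad)

* Right: -destring- converts the text to the numbers it spells out
destring gdp_str, generate(gdp_good)

* Repairing a variable that was already encoded: recover the text from
* its value labels with -decode-, then -destring- the result
decode gdp_bad, generate(gdp_txt)
destring gdp_txt, generate(gdp_fixed)

* gdp_bad averages the codes (mean 2); gdp_good and gdp_fixed average GDP
summarize gdp_bad gdp_good gdp_fixed

* On real data: an encoded variable is numeric and has a value label attached
describe gdp_bad
```

In the describe output, an encoded variable shows a label name in the "value label" column, which is a quick way to check the variables in the original data before re-importing or repairing them.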