  • xtsum unbalanced panels

    Hi everyone,

    I am new to Statalist. Thanks in advance for your help!

    I would like to ask about the interpretation of the following xtsum table. My sample contains 3449 firms and T = 12 time periods. The panel is unbalanced because a firm that exits the sample never re-enters; moreover, a firm exits the sample when it makes no sales in a given year (possible attrition bias). Finally, firms are first observed in different time periods: a firm may enter the sample in t=1, in t=5, or in any other period.
    What I do not understand is the reason why N, n and T are different for each variable in xtsum. Is it simply due to the unbalancedness of the panel?

    P.S. For completeness, I have also pasted the output from Stata directly.

    Code:
    Variable         |      Mean   Std. Dev.       Min        Max |    Observations
    -----------------+--------------------------------------------+----------------
    t        overall |  6.765197   3.439797          1         12 |     N =   28360
             between |             2.449812        1.5         12 |     n =    3449
             within  |             3.089224   1.265197    12.2652 | T-bar = 8.22267
                     |                                            |
    id_firm  overall |  3253.188   2150.782          1      10070 |     N =   28360
             between |              2258.51          1      10070 |     n =    3449
             within  |                    0   3253.188   3253.188 | T-bar = 8.22267
                     |                                            |
    salesf~m overall |  69405.53   637158.5   .0100969   1.91e+07 |     N =   27542
             between |             496200.5   .0101238   1.44e+07 |     n =    3449
             within  |             194191.3   -8507032    5134997 | T-bar =  7.9855
                     |                                            |
    age      overall |  27.27154   23.67263          1        107 |     N =   28360
             between |             22.49973          1      101.5 |     n =    3449
             within  |             3.089224   21.77154   32.77154 | T-bar = 8.22267
                     |                                            |
    newmol   overall |  .0422916   .2012575          0          1 |     N =   25537
             between |             .1127242          0          1 |     n =    3381
             within  |             .1647175  -.8667993   .9513825 | T-bar = 7.55309
                     |                                            |
    newmol~m overall |  .2407753    .427563          0          1 |     N =   24093
             between |              .300853          0          1 |     n =    3050
             within  |             .2978449  -.6683156   1.149866 | T-bar = 7.89934
                     |                                            |
    number   overall |  27.82434    94.3868          1       2045 |     N =   27542
             between |             77.40525          1   1948.333 |     n =    3449
             within  |             12.00478  -275.1757   331.8243 | T-bar =  7.9855
                     |                                            |
    inflow   overall |   1.75221   6.222528          0        199 |     N =   24093
             between |             4.839961          0   104.7273 |     n =    3050
             within  |             2.978054  -41.24779   116.3886 | T-bar = 7.89934
                     |                                            |
    salesi~t overall |  686.9673   8009.415          0   494231.2 |     N =   24093
             between |             4221.593          0   95303.28 |     n =    3050
             within  |               6283.8  -84693.08   431932.1 | T-bar = 7.89934
                     |                                            |
    outflow  overall |  1.793176   7.088856          0        259 |     N =   24093
             between |             5.553401          0        136 |     n =    3050
             within  |             2.953235  -58.75228   124.7932 | T-bar = 7.89934
                     |                                            |
    atcmain  overall |  .6957304   .2972208   .0494371          1 |     N =   27542
             between |             .2690053   .0744431          1 |     n =    3449
             within  |             .0960921   .2015287   1.431768 | T-bar =  7.9855
                     |                                            |
    deathf~m overall | -14.92422   35.46399       -100          0 |     N =   28360
             between |             29.92316      -99.5      -8.25 |     n =    3449
             within  |             31.30806  -105.6742   51.40911 | T-bar = 8.22267
                     |                                            |
    lratio   overall | -.0295049   .5119096  -2.989596   1.998053 |     N =   24093
             between |             .4940006  -2.956451   1.974946 |     n =    3050
             within  |             .4153589  -2.802703   2.688072 | T-bar = 7.89934
                     |                                            |
    nrecall  overall |  .1480285   .9719849          0         59 |     N =   27542
             between |             .5405977          0      21.75 |     n =    3449
             within  |             .7198124  -21.60197   53.23136 | T-bar =  7.9855
                     |                                            |
    s_it     overall |  .9711566   .1673693          0          1 |     N =   28360
             between |             .1424335         .5          1 |     n =    3449
             within  |             .1435737   .0544899   1.471157 | T-bar = 8.22267
    Thank you,

    Federico

  • #2
    The number of observations differs because each of your variables has a different number of missing values, and Stata excludes observations with missing values from its calculations.

    The misstable summarize command will report counts of missing values for each variable. See the output of help misstable for more details and other reporting options.

    Added in edit: by "missing values" I mean observations in the data where the value of the variable contains a Stata missing value code, not observations that are entirely omitted from the (unbalanced) data.
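
    As a minimal sketch of that check (not output from the original poster's data), the commands below count missing values for a few of the variables whose names appear in full in the xtsum table. The choice of variables is only illustrative.

    ```stata
    * Report counts of missing values for selected variables
    misstable summarize number inflow outflow lratio newmol

    * The same check for a single variable
    count if missing(inflow)

    * T-bar in xtsum is N divided by n; for inflow this is 24093/3050
    display 24093/3050
    ```

    If the missing counts match the gaps in N reported by xtsum (for example, 28,360 - 24,093 = 4,267 for inflow), the differences come from missing values. A smaller n, such as 3,050 instead of 3,449, would then mean that some firms have no nonmissing values at all for that variable.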



    • #3
      Many thanks, William! So is it correct to say that this is not an issue (that it is normal in unbalanced panel data)?



      • #4
        Without knowing why the missing values are missing, I cannot comment on whether it is an issue or not.

        Of your 28,360 observations overall, 4,267 observations have missing values for inflow, outflow, and several other variables - I'd wager they're the same 4,267 observations.

        Whether or not the data is panel data, you need to understand why these values are missing and what the implications are for your analysis.

        Are these the observations that you say "exit the sample because it makes no sales in a given year"? I believe it is customary to exclude from the dataset those observations that are not intended to be in the sample. They will be omitted from any analysis that uses the variables with the missing values - for example, your xtsum. Right now, the summaries for the other variables include values for observations that "exit the sample".
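
        One way to test the guess that the same observations are missing throughout is sketched below. It assumes the data are already xtset (they must be, since xtsum ran) and uses only variable names that appear in full in the table. Which variables to compare is my assumption.

        ```stata
        * How many observations lack inflow, and how many lack all three?
        count if missing(inflow)
        count if missing(inflow) & missing(outflow) & missing(lratio)

        * Tabulate the joint patterns of missingness
        misstable patterns inflow outflow lratio

        * Summarize other variables only where inflow is observed
        xtsum age number if !missing(inflow)
        ```

        If the two counts agree, the missing values sit in the same observations, and the restricted xtsum shows what the other summaries look like once those observations are left out.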

