  • Panel data: Multiple observations per year

    Hi,

    I'm new to Stata and its commands, so please excuse my inexperience. My dataset contains multiple observations per ID per year. In simplified form, it looks like this (with made-up numbers):
    ID  Time  ROE   Turnover  Age  Board_gender
    1   2012  25,4  1234      43   M
    1   2012                  53   M
    1   2012                  34   F
    1   2013  24,1  3402      45   M
    1   2013                  54   F
    1   2013                  43   F
    1   2013                  44   F
    1   2013                  34   M
    1   2014  33,1  3500      63   M
    1   2014                  52   F
    1   2015  32,2  3478      41   M
    1   2015                  38   M
    1   2015                  57   M
    1   2015                  42   F
    2   2012  24,5  4350      36   F
    2   2012                  61   M
    2   2013  33,4  4590      43   M
    2   2013                  45   M
    2   2013                  51   M
    ...and so on. I have more than 5,000 IDs.

    I would like to run pooled OLS, FE, IV etc., but my data are highly unbalanced, and when I -xtset- them I get the error "repeated time values within panel".
    My outcomes of interest are ROE and Turnover, and I want to know the effect of Board_gender and Age on them. Do I have to replace the Board_gender values with a single number, so that I only have one observation per year (and likewise for Age)? Or can I use some Stata tricks without deleting rows?

    Thank you so much in advance!

  • #2
    Johanne:
    welcome to this forum.
    If you have repeated time values within panel, you can simply -xtset- your data with -panelid- only:
    Code:
    xtset panelid
    This trick comes at the cost of Stata not supporting time-series related commands, such as lags and leads, that you might be interested in.
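A minimal sketch of the problem and the workaround, using a few made-up rows with the variable names from Johanne's example (ID, Time, Age). The full declaration fails because several board members share the same ID-Time pair; -capture noisily- shows the error without stopping a do-file, -duplicates report- shows how many pairs are repeated, and declaring the panel variable alone then works.

```stata
clear
* Made-up rows: two board members per firm in 2012
input ID Time Age
1 2012 43
1 2012 53
1 2013 45
2 2012 36
2 2012 61
end

* Fails with "repeated time values within panel"; -capture noisily- reports it and continues
capture noisily xtset ID Time

* How many ID-Time pairs occur more than once
duplicates report ID Time

* Declare the panel variable only
xtset ID
```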
    Besides:
    - Stata can handle both balanced and unbalanced panel datasets without any problem;
    - please be advised that OLS, FE and IV are pretty different beasts and hardly interchangeable.
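To see that pooled OLS and FE answer different questions, here is a sketch that runs both on the same panel and puts the estimates side by side. It reuses the nlswork data loaded later in this thread (internet access needed); clustering standard errors by idcode is an assumption added here, not part of the advice above. IV is left out because it needs an instrument that the thread does not provide.

```stata
* nlswork example data (one row per woman-year)
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* Panel variable only, as suggested above
xtset idcode

* Pooled OLS: one regression over all person-years; SEs clustered by person
regress ln_wage age, vce(cluster idcode)
estimates store pooled

* Fixed-effects (within) estimator: uses only within-person variation in age
xtreg ln_wage age, fe vce(cluster idcode)
estimates store fe

* Coefficients and standard errors side by side
estimates table pooled fe, se
```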
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you! And thank you for your quick response!

      Won't I miss the whole point of panel data if I don't include my time variable, or can Stata handle this? :-)

      (I do not need time-series related commands, but I would like to explore the evolution over the years and compare the results. If I drop the time variable, won't all my data get mixed up, even though they remain sorted by ID?)

      Kind regards,
      Johanne



      • #4
        Johanne:
        not quite.
        See the foreword of the Remarks and examples section, -xtset- entry, Stata .pdf manual (pages 504-505).
        That said, a toy-example might be helpful:
        Code:
        . use "https://www.stata-press.com/data/r16/nlswork.dta"
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . xtset idcode
               panel variable:  idcode (unbalanced)
        
        . xtreg ln_wage age, fe
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-sq:                                           Obs per group:
             within  = 0.1026                                         min =          1
             between = 0.0877                                         avg =        6.1
             overall = 0.0774                                         max =         15
        
                                                        F(1,23799)        =    2720.20
        corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
               _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
        -------------+----------------------------------------------------------------
             sigma_u |  .40635023
             sigma_e |  .30349389
                 rho |  .64192015   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000
        
        . xtset idcode year
               panel variable:  idcode (unbalanced)
                time variable:  year, 68 to 88, but with gaps
                        delta:  1 unit
        
        . xtreg ln_wage age, fe
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-sq:                                           Obs per group:
             within  = 0.1026                                         min =          1
             between = 0.0877                                         avg =        6.1
             overall = 0.0774                                         max =         15
        
                                                        F(1,23799)        =    2720.20
        corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
               _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
        -------------+----------------------------------------------------------------
             sigma_u |  .40635023
             sigma_e |  .30349389
                 rho |  .64192015   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000
        
        . help xtset
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Since I work with this kind of data, let me add a little bit to Carlo's helpful comments.

          The reason you have multiple observations in a given year is that you have multiple executives listed for a given company in a given year.

          However, your dependent variable is almost certainly at the firm level. It would make most sense to collapse your data to the firm level. I don't think you can neatly collapse string data, but that's not much of a problem.

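          If you need to count the number of men and women for each firm-year, you can use -bysort- on the firm and year variables together with the appropriate -egen- function. Note that -egen count()- counts nonmissing values, so count(Board_gender=="F") would simply return the number of board members listed in each firm-year; summing the 0/1 condition with -egen total()- gives the number of women:
          bysort ID Time: egen numfem = total(Board_gender=="F")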
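A sketch of the whole firm-year approach on a subset of Johanne's made-up rows. Assumptions: ROE and Turnover are recorded once per firm-year and missing on the other rows; ROE is typed with a decimal point because -input- does not read 25,4 as a number; the names female, board_size, share_female and mean_age are hypothetical. Turning Board_gender into a 0/1 indicator before -collapse- sidesteps the string-data issue.

```stata
clear
* Subset of the made-up example; ROE and Turnover are missing (.) on the extra rows
input ID Time ROE Turnover Age str1 Board_gender
1 2012 25.4 1234 43 M
1 2012 . . 53 M
1 2012 . . 34 F
1 2013 24.1 3402 45 M
1 2013 . . 54 F
1 2013 . . 43 F
1 2013 . . 44 F
1 2013 . . 34 M
2 2012 24.5 4350 36 F
2 2012 . . 61 M
end

* Number of women per firm-year: total() sums the 0/1 condition
bysort ID Time: egen numfem = total(Board_gender == "F")

* Numeric 0/1 version of the string variable, so it can be averaged
generate byte female = Board_gender == "F"

* One row per firm-year: firm-level outcomes, number of women, board size, female share, mean age
collapse (firstnm) ROE Turnover numfem (count) board_size=Age (mean) share_female=female mean_age=Age, by(ID Time)

* Time values are now unique within each firm, so the full declaration works
xtset ID Time
list, noobs
```

With the real data, the collapsed dataset could then go into something like -xtreg ROE share_female mean_age, fe-; this toy subset is too small for that regression to be meaningful.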

