  • Dealing with unbalanced panel data

    Hi,

    I intend to analyze the determinants of company payout policy using a panel dataset. The panel is unbalanced because companies enter and leave the stock exchange I am studying. My independent variables are financial data for the companies, and the data are complete for every company in every year it is listed. Ideally I would not drop companies that lack data for some years, since that could bias the results. I understood from a previous thread that Stata can handle unbalanced panel data, but I am not sure what this means in practice, i.e., are observations with missing data dropped? I have also seen multiple imputation suggested for individual missing values, but I am not sure whether it can, or should, be used when entire years are missing for some companies.

    Code:
    . xtset Conumber Year
           panel variable:  Conumber (unbalanced)
            time variable:  Year, 2005 to 2016, but with gaps
                    delta:  1 unit
    
    . xtdescribe
    
    Conumber:  1, 2, ..., 128                                    n =        128
        Year:  2005, 2006, ..., 2016                             T =         12
               Delta(Year) = 1 unit
               Span(Year)  = 12 periods
               (Conumber*Year uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             1       2       9        12        12      12      12
    
         Freq.  Percent    Cum. |  Pattern
     ---------------------------+--------------
           82     64.06   64.06 |  111111111111
            4      3.13   67.19 |  11..........
            4      3.13   70.31 |  111111......
            3      2.34   72.66 |  ...........1
            3      2.34   75.00 |  ...111111111
            3      2.34   77.34 |  ..1111111111
            3      2.34   79.69 |  111111111...
            2      1.56   81.25 |  ..........11
            2      1.56   82.81 |  .........111
           22     17.19  100.00 | (other patterns)
     ---------------------------+--------------
          128    100.00         |  XXXXXXXXXXXX
    I would be very thankful for any suggestions on dealing with the unbalanced data. As this is my first post, I hope I have followed the posting instructions correctly; I am happy to provide any additional information if necessary.

    (Stata version 14.2 SE)

  • #2
    Jerker:
    welcome to this forum.
    Your way of posting your first question is perfect.
    As far as your question is concerned:
    - you're correct that Stata can handle both balanced and unbalanced panel datasets; that is, it is not necessary for all panels to have the same number of observations;
    - missing values: Stata's estimation commands use listwise deletion and perform a complete-case analysis; that is, observations with a missing value in any variable (regressand or regressors) are excluded from the statistical analysis;
    - you're correct that -mi- and, I would add, -ipolate- are useful for dealing with missing data. However, there is an important step upstream: investigating whether the missingness underlying your data is ignorable or not (see -help mi- for more on this topic).
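A minimal sketch of what listwise deletion means in an unbalanced panel, using simulated data rather than Jerker's dataset: Conumber and Year mimic the layout in the -xtdescribe- output, while x1, x2 and y are hypothetical variables. The estimation sample reported by -regress- in e(N) should equal the number of rows with no missing value in any model variable; company-years that are absent altogether simply never enter the estimation.

```stata
* Simulated, hypothetical data: 128 firms observed 2005-2016
clear
set seed 12345
set obs 128
generate Conumber = _n
expand 12
bysort Conumber: generate Year = 2004 + _n
* Unbalanced panel: drop about 20% of firm-years at random
drop if runiform() < 0.2
xtset Conumber Year
* Hypothetical variables; x1 also gets item-level missing values
generate x1 = rnormal()
generate x2 = rnormal()
generate y = x1 + x2 + rnormal()
replace x1 = . if runiform() < 0.1
* Complete cases: rows with no missing value in y, x1 or x2
count if !missing(y, x1, x2)
local ncomplete = r(N)
* regress drops every row with a missing value in a model variable
regress y x1 x2
display "complete cases: `ncomplete'; used by regress: " e(N)
```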
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo,

      Thank you for your reply and advice. I looked into -mi- as you recommended. Although my knowledge of the matter is limited, I believe that the missingness in this case should be ignorable, as it should not depend on the missing values themselves.
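A sketch of the basic -mi- workflow for item-level missingness (individual values missing within an observed company-year), again on simulated data with hypothetical variables x1 and x2. Values of x1 are deleted completely at random, so the ignorability assumption holds by construction; with real data that assumption is exactly what has to be argued. The imputation model (linear regression of x1 on x2 and the outcome) and the 20 imputations are illustrative choices, not recommendations.

```stata
* Simulated, hypothetical data: 128 x 12 panel with an ordinal outcome
clear
set seed 12345
set obs 128
generate Conumber = _n
expand 12
bysort Conumber: generate Year = 2004 + _n
generate x1 = rnormal()
generate x2 = rnormal()
generate ystar = x1 + x2 + rnormal()
generate Logitrank = 0 if ystar < -0.5
replace Logitrank = 1 if ystar >= -0.5 & ystar < 1
replace Logitrank = 2 if ystar >= 1
drop ystar
* Delete 15% of x1 completely at random, so MAR holds by construction
replace x1 = . if runiform() < 0.15
* Declare mi data and register the variables
mi set wide
mi register imputed x1
mi register regular Conumber x2 Logitrank
* Impute x1 from x2 and the outcome: 20 imputations
mi impute regress x1 = x2 i.Logitrank, add(20) rseed(2017)
* Fit ologit on each completed dataset and pool with Rubin's rules
mi estimate: ologit Logitrank x1 x2, vce(cluster Conumber)
```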

      I am still not completely sure how the unbalanced panel data will work with my model. I intend to use an ordered logit model, and since some companies are observed for only one year, won't this be a problem? Below is a tabulation of my dependent variable showing the extent of missing values.

      Code:
      . tabulate Logitrank, missing
      
       Logit rank |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |        324       21.09       21.09
                1 |        799       52.02       73.11
                2 |        121        7.88       80.99
                . |        292       19.01      100.00
      ------------+-----------------------------------
            Total |      1,536      100.00
      I was also wondering whether it is possible to perform multiple imputation when entire observations are missing, or whether it can only be done when individual values are missing.



      • #4
        Jerker:
        as far as I can remember, there's no -xt- command for -ologit-. Hence, the only available option is to run -ologit- with standard errors clustered on the panel identifier (-Conumber- in your case), since observations from the same company are not independent.
        If you have missing values for all the observations in a given variable, I would consider deleting it.
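A sketch of the pooled -ologit- with cluster-robust standard errors, on simulated unbalanced data where a hypothetical firm-level term u makes observations from the same company correlated (x1 is also hypothetical). Clustering on Conumber adjusts the standard errors for that within-company dependence; it does not change the point estimates.

```stata
* Simulated, hypothetical data: unbalanced panel with a firm-level term u
clear
set seed 12345
set obs 128
generate Conumber = _n
* u is constant within firm, so a firm's observations are correlated
generate u = rnormal()
expand 12
bysort Conumber: generate Year = 2004 + _n
drop if runiform() < 0.2
generate x1 = rnormal()
generate ystar = x1 + u + rnormal()
generate Logitrank = 0 if ystar < -0.5
replace Logitrank = 1 if ystar >= -0.5 & ystar < 1
replace Logitrank = 2 if ystar >= 1
* Pooled ordered logit, standard errors clustered on the firm identifier
ologit Logitrank x1, vce(cluster Conumber)
```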
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Jerker:
          as an amendment to my previous reply, -xtologit- does exist, but only with a random-effects (-re-) specification.
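A sketch of the random-effects alternative with -xtologit-, on simulated unbalanced data of the same shape (hypothetical regressor x1 and firm effect u). The random-effects model assumes the firm effect is uncorrelated with the regressors, which holds here by construction. Its coefficients are conditional on the firm effect, so they are not directly comparable in scale with those from a pooled -ologit-.

```stata
* Simulated, hypothetical data: unbalanced panel with a firm effect u
clear
set seed 12345
set obs 128
generate Conumber = _n
generate u = rnormal()
expand 12
bysort Conumber: generate Year = 2004 + _n
drop if runiform() < 0.2
generate x1 = rnormal()
generate ystar = x1 + u + rnormal()
generate Logitrank = 0 if ystar < -0.5
replace Logitrank = 1 if ystar >= -0.5 & ystar < 1
replace Logitrank = 2 if ystar >= 1
* Declare the panel structure; gaps in Year are allowed
xtset Conumber Year
* Random-effects ordered logit
xtologit Logitrank x1
```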
          Kind regards,
          Carlo
          (Stata 19.0)

