Dealing with unbalanced panel data

Jerker Salokivi

Join Date: Jul 2018
Posts: 2


07 Jul 2018, 06:26

Hi,

I intend to analyze the determinants of company payout policy using a panel data set. The panel is unbalanced because companies enter and leave the stock exchange I am analyzing. My independent variables are financial data for these companies, and the data are complete for every company listed on the exchange in a given year. Ideally I would not drop the companies that lack data for some years, since doing so could introduce bias. I understand from a previous thread that Stata can handle unbalanced panel data, but I am not sure what this means in practice, i.e., are missing observations dropped? I have seen suggestions to use multiple imputation when individual values are missing, but I am not sure whether this can or should be done when entire years are missing for certain companies.

Code:

. xtset Conumber Year
       panel variable:  Conumber (unbalanced)
        time variable:  Year, 2005 to 2016, but with gaps
                delta:  1 unit

. xtdescribe

Conumber:  1, 2, ..., 128                                    n =        128
    Year:  2005, 2006, ..., 2016                             T =         12
           Delta(Year) = 1 unit
           Span(Year)  = 12 periods
           (Conumber*Year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       2       9        12        12      12      12

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+--------------
       82     64.06   64.06 |  111111111111
        4      3.13   67.19 |  11..........
        4      3.13   70.31 |  111111......
        3      2.34   72.66 |  ...........1
        3      2.34   75.00 |  ...111111111
        3      2.34   77.34 |  ..1111111111
        3      2.34   79.69 |  111111111...
        2      1.56   81.25 |  ..........11
        2      1.56   82.81 |  .........111
       22     17.19  100.00 | (other patterns)
 ---------------------------+--------------
      128    100.00         |  XXXXXXXXXXXX

I would be very thankful for suggestions on dealing with the unbalanced data. As this is my first time posting, I hope I followed the posting instructions correctly, and I would be happy to provide any additional information if necessary.

(Stata version 14.2 SE)


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

07 Jul 2018, 07:32

Jerker:
welcome to this forum.
Your way of posting your first question is perfect.
As far as your question is concerned:
- you're correct that Stata can handle both balanced and unbalanced panel datasets; that is, it is not necessary that all the panels have the same number of observations;
- missing values: Stata's estimation commands use listwise deletion, i.e. complete-case analysis; that is, observations with a missing value in any model variable (regressand or regressors) are excluded from the estimation;
- you're correct that -mi- and, I would add, -ipolate-, are useful to deal with missing data. However, there's a relevant step to take upstream, that is, investigating whether the missingness underlying your data is ignorable or not (see -help mi- for more on this topic).
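
To see how many firm-years listwise deletion would remove before fitting anything, something along these lines may help. It is only a sketch: `size` and `leverage` are hypothetical placeholders for the financial regressors, which are not named in this thread, and `nmiss` is an illustrative helper variable.

```stata
* Hypothetical regressors: size and leverage are placeholders,
* not variables taken from this thread
misstable summarize Logitrank size leverage

* Number of model variables that are missing in each firm-year
egen nmiss = rowmiss(Logitrank size leverage)

* Firm-years that a complete-case analysis would exclude
count if nmiss > 0

* How the exclusions are spread over the years
tabulate Year if nmiss > 0
```

After fitting a model, `e(sample)` marks the observations that were actually used, so `count if e(sample)` gives the size of the estimation sample.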

Kind regards,
Carlo
(Stata 19.0)
Jerker Salokivi

Join Date: Jul 2018

Posts: 2
#3

07 Jul 2018, 12:16

Dear Carlo,

Thank you for your reply and advice. I looked into -mi- as you recommended. Although my knowledge of the matter is limited, I believe that the missingness should be ignorable in this case, as it should not depend on the missing values themselves.

I am still not completely sure how the unbalanced panel data will work with my model. I intend to use an ordered logit model, and given that the lowest number of years for a given company is one, will this not become a problem? Below is a tabulation of the dependent variable showing the number of missing values.

Code:

. tabulate Logitrank, missing

 Logit rank |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        324       21.09       21.09
          1 |        799       52.02       73.11
          2 |        121        7.88       80.99
          . |        292       19.01      100.00
------------+-----------------------------------
      Total |      1,536      100.00

I was also wondering if it is possible to perform multiple imputation when complete observations are missing or can it only be done when individual values are missing?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

07 Jul 2018, 15:44

Jerker:
as far as I can remember, there's no -xt- command for -ologit-. Hence, the only available option is to run -ologit- with standard errors clustered on -panelid-, since you have nonindependent observations.
If you have missing values for all the observations in a given variable, I would consider deleting it.
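
A pooled ordered logit with standard errors clustered on the panel identifier could look roughly like this. Again a sketch under assumptions: `size` and `leverage` are hypothetical regressor names, and the year dummies are an optional addition rather than something prescribed here.

```stata
* Pooled ordered logit; size and leverage are hypothetical placeholders
* vce(cluster Conumber) allows for dependence among observations of the same company
ologit Logitrank size leverage i.Year, vce(cluster Conumber)

* Number of firm-years that entered the estimation
count if e(sample)
```

Companies observed in only one year still contribute that observation to the estimation; firm-years with a missing value in any model variable are excluded.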

Kind regards,
Carlo
(Stata 19.0)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

07 Jul 2018, 23:27

Jerker:
as an amendment to my previous reply, -xtologit- does exist, with the -re- specification only.
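
A minimal random-effects sketch, assuming the data are -xtset- as in the first post and using the hypothetical regressors `size` and `leverage`:

```stata
* Declare the panel structure (as in the first post)
xtset Conumber Year

* Random-effects ordered logit; size and leverage are hypothetical placeholders
xtologit Logitrank size leverage i.Year

* Same model with cluster-robust standard errors
xtologit Logitrank size leverage i.Year, vce(cluster Conumber)
```

No balancing of the panel is needed beforehand: the command works with whatever complete firm-years each company has.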

Kind regards,
Carlo
(Stata 19.0)