Making panel dataset balanced - "filling down"

Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#1

22 Jul 2020, 14:53

Hi All,

I posted this earlier but did not convey my question correctly. The dataset I have resembles the following:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float country str7 Gender str12 Education float(AverageValue year)
1 "Male"    "Not Educated" 2000 2000
1 "Female"  "Educated"     3000 2000
2 "Male"    "Educated"     3000 2000
3 "Female " "Not Educated" 4000 2000
1 "Male"    "Educated"     3000 2001
1 "Female"  "Educated"     3000 2001
2 "Female"  "Educated"     3000 2001
3 "Male"    "Educated"     3000 2001
3 "Male"    "Not Educated" 2000 2001
3 "Female"  "Educated"     3000 2001
3 "Female"  "Not Educated" 2000 2001
end

Here I have data by country and year on the average wages of males and females by education level (educated or not). For expository purposes there are only 3 countries and 2 years. A balanced panel would contain average wages for males and females at both education levels in every country and year. In this example, only country 3 in 2001 has a complete set of observations.

I wish to make this panel dataset balanced, i.e. to fill in placeholders for the combinations that have no observations. For example, for country 1 in 2000 I would expand "down" to add two more rows (one for educated males and one for not educated females), with missing values for the average value. I will then use an econometric model to impute these missing values, but to perform the imputation I first need the panel to be balanced.

Any guidance on this is much appreciated.

Many Thanks,
CS
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

23 Jul 2020, 01:23

Chinmay:
trying to convert an unbalanced panel into a balanced one is, in general, a bad idea: by ignoring the mechanisms and patterns underlying the missingness, in all likelihood you will end up with a panel that is far from the original one.
On top of that, Stata can handle both unbalanced and balanced panel datasets without any problem.
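
To illustrate the second point, here is a minimal sketch using the example data from #1. It assumes that each country-gender-education cell is treated as one panel unit; the variable name panel_id is made up for this illustration, and the strtrim() step is there only because one row in the example has "Female " with a trailing space.

Code:

```stata
clear
input float country str7 Gender str12 Education float(AverageValue year)
1 "Male"    "Not Educated" 2000 2000
1 "Female"  "Educated"     3000 2000
2 "Male"    "Educated"     3000 2000
3 "Female " "Not Educated" 4000 2000
1 "Male"    "Educated"     3000 2001
1 "Female"  "Educated"     3000 2001
2 "Female"  "Educated"     3000 2001
3 "Male"    "Educated"     3000 2001
3 "Male"    "Not Educated" 2000 2001
3 "Female"  "Educated"     3000 2001
3 "Female"  "Not Educated" 2000 2001
end

* Remove the trailing space so "Female " and "Female" are the same value
replace Gender = strtrim(Gender)

* Assumption: one panel unit per country x gender x education cell
egen panel_id = group(country Gender Education)

* Declare the panel as it is, without adding any observations
xtset panel_id year

* Show the pattern of years observed for each panel unit
xtdescribe
```

xtset should report the panel as unbalanced and accept it as is; panel estimators such as xtreg can then be run on the declared data.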

Kind regards,
Carlo
(Stata 19.0)
Paul Dickman

Join Date: Apr 2014

Posts: 294
#3

23 Jul 2020, 01:34

Code:

help fillin

fillin adds observations with missing data so that all interactions of varlist exist, thus making a complete rectangularization of varlist.

I’m not suggesting you should use this command (ignoring Carlo’s sound advice) but this is the command that does what you want.
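
For reference, a minimal sketch of how fillin might be applied to the example data in #1. It assumes that the panel should contain every combination of country, year, Gender, and Education, and that "Female " with a trailing space is a data-entry slip for "Female"; without the strtrim() step, fillin would treat it as a third gender.

Code:

```stata
clear
input float country str7 Gender str12 Education float(AverageValue year)
1 "Male"    "Not Educated" 2000 2000
1 "Female"  "Educated"     3000 2000
2 "Male"    "Educated"     3000 2000
3 "Female " "Not Educated" 4000 2000
1 "Male"    "Educated"     3000 2001
1 "Female"  "Educated"     3000 2001
2 "Female"  "Educated"     3000 2001
3 "Male"    "Educated"     3000 2001
3 "Male"    "Not Educated" 2000 2001
3 "Female"  "Educated"     3000 2001
3 "Female"  "Not Educated" 2000 2001
end

* Assumption: the trailing space in "Female " is unintended
replace Gender = strtrim(Gender)

* Add a placeholder row for every missing
* country x year x Gender x Education combination
fillin country year Gender Education

* _fillin == 1 marks the added rows; AverageValue is missing in them
sort country year Gender Education
list, sepby(country year)
count if _fillin == 1
```

With 3 countries, 2 years, 2 genders, and 2 education levels, the result should have 3 x 2 x 2 x 2 = 24 rows, of which 11 are original and 13 are placeholders with a missing AverageValue.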

Last edited by Paul Dickman; 23 Jul 2020, 01:38.

Attaullah Shah

Join Date: Aug 2014
Posts: 1669

23 Jul 2020, 05:17

As pointed out by the other members, one should be very clear about, and have a sound theoretical basis for, how missing values are filled. If you are, then once the fillin command has created the empty observations, you can use the fillmissing program, which is available from SSC.

Code:

ssc install fillmissing

clear
webuse fillin1
fillin sex race age_group
list

     +----------------------------------------------------+
     |    sex    race   age_gr~p      x1     x2   _fillin |
     |----------------------------------------------------|
  1. | female   white      20-24   20393   14.5         0 |
  2. | female   white      25-29       .      .         1 |
  3. | female   white      30-34       .      .         1 |
  4. | female   black      20-24       .      .         1 |
  5. | female   black      25-29       .      .         1 |
     |----------------------------------------------------|
  6. | female   black      30-34   39399   14.2         0 |
  7. |   male   white      20-24       .      .         1 |
  8. |   male   white      25-29   32750   12.7         0 |
  9. |   male   white      30-34       .      .         1 |
 10. |   male   black      20-24       .      .         1 |
     |----------------------------------------------------|
 11. |   male   black      25-29       .      .         1 |
 12. |   male   black      30-34       .      .         1 |
     +----------------------------------------------------+

. bys sex: fillmissing x1
(9 real changes made)

. list

     +----------------------------------------------------+
     |    sex    race   age_gr~p      x1     x2   _fillin |
     |----------------------------------------------------|
  1. | female   white      20-24   20393   14.5         0 |
  2. | female   white      25-29   20393      .         1 |
  3. | female   white      30-34   20393      .         1 |
  4. | female   black      20-24   20393      .         1 |
  5. | female   black      25-29   20393      .         1 |
     |----------------------------------------------------|
  6. | female   black      30-34   39399   14.2         0 |
  7. |   male   white      20-24   32750      .         1 |
  8. |   male   white      25-29   32750   12.7         0 |
  9. |   male   white      30-34   32750      .         1 |
 10. |   male   black      20-24   32750      .         1 |
     |----------------------------------------------------|
 11. |   male   black      25-29   32750      .         1 |
 12. |   male   black      30-34   32750      .         1 |
     +----------------------------------------------------+

If you want to fill the missing values with the mean, then

Code:

bys sex: fillmissing x1, with(mean)
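
If installing a community-contributed command is not an option, a group-mean fill can also be sketched with built-in commands only. This is an illustration rather than a replacement for fillmissing; x1_mean and x1_filled are names chosen for this example, and the filled values are written to a new variable so that the original x1 stays untouched.

Code:

```stata
clear
webuse fillin1
fillin sex race age_group

* Mean of the observed x1 values within each sex
bysort sex: egen double x1_mean = mean(x1)

* Copy x1 and fill only the gaps with the group mean
generate double x1_filled = x1
replace x1_filled = x1_mean if missing(x1_filled)

list sex race age_group x1 x1_filled _fillin, sepby(sex)
```

Keeping x1 and _fillin alongside x1_filled makes it easy to see afterwards which values were observed and which were filled.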

Last edited by Attaullah Shah; 23 Jul 2020, 05:22.

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.


Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#5

23 Jul 2020, 07:12

Thank you everyone, for the sound advice!

Best,
CS
