  • unbalanced panel - missing

    Hi everyone,

    I have an unbalanced panel for the 2006-2014 period, meaning that some IDs have no observations in some years. As a hypothetical example, ID "15" has no observations for 2006-2009:


    year ID X
    2010 15 .0029651
    2011 15 .0021118
    2012 15 .0011135
    2013 15 .0022467
    2014 15 .0022368
    Is there any way to convert this dataset so that it shows missing values for 2006-2009 instead of having no rows at all, as below?

    Code:
    year    ID     X
    2006    15     .
    2007    15     .
    2009    15     .
    2010    15    .0029651
    2011    15    .0021118
    2012    15    .0011135
    2013    15    .0022467
    2014    15    .0022368
    Thanks for any tips!!

    Mina

  • #2
    Mina:
    You can recode -.- as an extended missing value such as -.a-, or as a conventional "implausible" numerical value, say, 9999 (see -help missing- for further details).
    Code:
    replace X=.a if X==.
    Code:
    replace X=9999 if X==.
    In the latter case, you should remember to exclude those values from your analysis via an -if- qualifier:
    Code:
    tabstat X if X!=9999, stat(count mean sd p50 min max)
    Kind regards,
    Carlo
    (Stata 18.0 SE)


    • #3
      Dear Carlo

      Thank you for your prompt response!

      I have tried both options and got the following message

      (0 real changes made)
      Basically, nothing happens.
      In my example, the observations for 2006-2009 are absent from the data (the rows do not exist). I would like them to exist as observations with X set to "." or even "0". Either is fine, as long as it gets me to a balanced panel.

      Thank you for any further help!


      • #4

        Mina:
        I cannot replicate your problem:
        Code:
        . input year    ID     X

                  year         ID          X
          1. 2006    15     .
          2. 2007    15     .
          3. 2009    15     .
          4. 2010    15    .0029651
          5. 2011    15    .0021118
          6. 2012    15    .0011135
          7. 2013    15    .0022467
          8. 2014    15    .0022368
          9. end

        
        . replace X=9999 if X==.
        (3 real changes made)
        
        . list
        
             +----------------------+
             | year   ID          X |
             |----------------------|
          1. | 2006   15       9999 |
          2. | 2007   15       9999 |
          3. | 2009   15       9999 |
          4. | 2010   15   .0029651 |
          5. | 2011   15   .0021118 |
             |----------------------|
          6. | 2012   15   .0011135 |
          7. | 2013   15   .0022467 |
          8. | 2014   15   .0022368 |
             +----------------------+
        That said, please note that Stata can handle both balanced and unbalanced panel datasets without any problem.
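        As a minimal sketch of how to check this, -xtset- reports whether a panel is balanced and -xtdescribe- shows which years each ID is observed in. The rows for ID 15 are taken from your example; firm 1 and its values are made up purely for illustration.

        ```stata
        * Toy panel: ID 15 comes from the thread, ID 1 is hypothetical
        clear
        input year ID X
        2006  1 .001
        2007  1 .002
        2008  1 .003
        2010 15 .0029651
        2011 15 .0021118
        2012 15 .0011135
        2013 15 .0022467
        2014 15 .0022368
        end

        * Declare the panel structure; the output states whether it is balanced
        xtset ID year

        * Show the pattern of years in which each ID is observed
        xtdescribe
        ```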
        Kind regards,
        Carlo
        (Stata 18.0 SE)


        • #5
          Dear Carlo

          That is the crux of the issue. The rows for 2006-2009 do not exist in the data at all. When I browse my data, there are no observations with ".".


          my data looks like this

          Code:
            
               +----------------------+
               | year   ID          X |
               |----------------------|
            4. | 2010   15   .0029651 |
            5. | 2011   15   .0021118 |
               |----------------------|
            6. | 2012   15   .0011135 |
            7. | 2013   15   .0022467 |
            8. | 2014   15   .0022368 |
               +----------------------+
          Nothing appears for the years 2006-2009. But I would like to have this format:

          Code:
               +----------------------+
               | year   ID          X |
               |----------------------|
            1. | 2006   15       .    |
            2. | 2007   15       .    |
            3. | 2009   15       .    |
            4. | 2010   15   .0029651 |
            5. | 2011   15   .0021118 |
               |----------------------|
            6. | 2012   15   .0011135 |
            7. | 2013   15   .0022467 |
            8. | 2014   15   .0022368 |
               +----------------------+
          Or, let me put it differently. In 2007, only these firms have values of X:


          Code:
          input float ID int year float X
           1 2007          0
           9 2007          0
          11 2007          0.1
          12 2007          0
          13 2007          0.2
          14 2007          0
          15 2007          0
          end
          What I want is to see ALL firms in 2007 in the data (including IDs 2-8 and ID 10), but with a missing value for variable X:

          Code:
          input float ID int year float X
           1 2007          0
           2 2007          .
           3 2007          .
           4 2007          .
           5 2007          .
           6 2007          .
           7 2007          .
           8 2007          .
           9 2007          0
          10 2007          .
          11 2007          0.1
          12 2007          0
          13 2007          0.2
          14 2007          0
          15 2007          0
          end
          Sorry if I caused confusion and thank you so much for any help!
          Last edited by Mina Wu; 17 Apr 2017, 11:17.


          • #6
            You can use -fillin- to balance the sample. However, only a few Stata routines demand balanced data, and I suspect most if not all of them require that you actually have data on every observation, not just a row for it.

            Arbitrarily filling in with 0 will almost certainly mess up your estimation - you're arbitrarily adding a pile of data errors.
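
            A minimal sketch of what -fillin- does, on made-up toy data (the IDs, years and values below are only for illustration). It adds a row for every ID-year combination that is not already present, sets the other variables to missing in those rows, and flags them in a new variable _fillin. Note that it only combines values that already appear somewhere in the data, so a year that no firm has will not be created. If that matters, -tsfill, full- after -xtset- is an alternative that fills every year between the overall first and last year.

            ```stata
            * Hypothetical unbalanced panel: two firms observed in different years
            clear
            input ID year X
             1 2006 0
             1 2007 0
            15 2007 0
            15 2010 .0029651
            end

            * Create every ID-year combination seen in the data;
            * new rows get X = . and _fillin = 1
            fillin ID year
            sort ID year
            list, sepby(ID)

            * Alternative: also create the years nobody has (here 2008 and 2009)
            xtset ID year
            tsfill, full
            list, sepby(ID)
            ```

            Rows added by -fillin- have _fillin equal to 1, so they are easy to identify or drop again later.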


            • #7
              I completely agree with you. I am not using this data for estimation, but to copy and paste it into Excel for other analysis.
              Do you know how dataset "filling" works?

              Thanks!!
