
  • How to balance panel data with missing values?

    Dear all,
    I have panel data (strongly balanced) with some missing values. I need to run thresholdreg, which requires a balanced panel with no missing values. Can anyone please help me get rid of the missing values while keeping the data balanced?

    Thanks,

    Arshad

  • #2
    Find the number of periods in your panel (how many repeated observations each unit has). Let's say 3, for example. If it's a very large panel, you may need to write code that counts the maximum number of dates.
    Code:
    * number of observations per panel unit (id)
    bys id: gen pan_num = _N
    * keep only units observed in exactly 3 periods
    drop if pan_num != 3
    Your panel will now be balanced.
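A minimal sketch of that counting step, assuming the panel identifier is id as in the code above; the toy data are hypothetical. Instead of hard-coding 3, it reads the longest panel length from summarize and keeps only units of that length.

```stata
* Hypothetical toy panel: ids 1 and 2 have 3 periods, id 3 has only 2
clear
input id year
1 2001
1 2002
1 2003
2 2001
2 2002
2 2003
3 2001
3 2002
end

* Number of observations for each panel unit
bysort id: generate pan_num = _N

* Longest panel in the data, used instead of a hard-coded 3
summarize pan_num, meanonly
drop if pan_num != r(max)

* Every remaining id should now show the same count
tabulate id
```

This equalizes the number of observations per unit; it does not check which years each unit covers.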



    • #3
      I forgot a part. First loop through all the variables to drop observations with missing values, then balance the panel with the code above.
      Code:
      * drop every observation that has a missing value in any variable
      foreach var of varlist _all {
          drop if missing(`var')
      }
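As a hedged alternative to the loop: Stata's missing() function accepts several arguments and returns 1 if any of them is missing, so a single drop can cover an explicit list of variables. A minimal sketch with hypothetical numeric variables gdp and inv; unlike the loop over _all, it checks only the variables you name.

```stata
* Hypothetical toy panel with gaps in two numeric variables
clear
input id year gdp inv
1 2001 1.2 0.3
1 2002 .   0.4
1 2003 1.5 .
2 2001 2.1 0.5
2 2002 2.2 0.6
2 2003 2.4 0.7
end

* Drops every observation with a gap in gdp or inv (here id 1 in 2002 and 2003)
drop if missing(gdp, inv)

list, sepby(id)
```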



      • #4
        Arshad:
        welcome to the list.
        As an aside, before dealing with missing values, I would spend some time investigating whether or not the missingness is informative.
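One rough way to start on that check, sketched under assumptions: build a missingness indicator for a hypothetical variable gdp and look at how its share varies by year and by country. A concentration in particular years or countries suggests the missingness may not be random; this is a first look, not a formal test.

```stata
* Hypothetical toy panel in which gdp is missing more often in later years
clear
input id year gdp
1 2001 1.2
1 2002 1.3
1 2003 .
2 2001 2.1
2 2002 .
2 2003 .
end

* Indicator equal to 1 when gdp is missing
generate byte miss_gdp = missing(gdp)

* The mean of the indicator is the share of missing gdp, by year and by country
tabulate year, summarize(miss_gdp)
tabulate id, summarize(miss_gdp)
```

For several variables at once, Stata's misstable patterns command gives an overview of which combinations of variables are missing together.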
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you so much, everyone. I just read the notifications and your comments, and I am going to try it now. Thank you all, and thank you, Carlo. Arshad



          • #6
            Hello Andrew, I tried your solution, but I still get unbalanced data. I guess it may be because some countries have missing values for more variables and years than others.



            • #7

              Hello Andrew, I did this for each variable in my data, but my data are still unbalanced. I guess it may be because some countries have missing values for more variables and years than others. Can you please suggest something?
              Thanks
              Arshad



              • #8
                I don't understand why it isn't balancing. Is id the variable that identifies your countries? Did you change it in the code that balances the panel? If so, I may need an example of your data from dataex.

                Edit: Ah, I see. Do you also have a time variable that you are using, as in "xtset id year"?
                Last edited by Andrew Castro; 30 Dec 2016, 12:33.



                • #9
                  Hi Andrew, thanks for the reply. Yes, I have xtset id year; the data are panel data.



                  • #10
                    Alright, from my understanding, the panel is weakly balanced after the attempt to balance it: every country has the same number of observations, but not in the same years. If it were strongly balanced, that would imply a problem with missing data. How did you choose the number to use when balancing? Can you run "tab year" (or the same for whatever your time variable is) and post the output here, both before and after the balancing attempt?
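A small hypothetical illustration of the weakly versus strongly balanced distinction: after dropping missing values, two countries each keep 2 observations but in different years. The toy data are invented; xtset describes the panel and stores that description in r(balanced), and xtdescribe shows each country's participation pattern.

```stata
* Hypothetical toy panel after dropping missing values: each country keeps
* 2 observations, but country 1 is seen in 2001-2002 and country 2 in 2002-2003
clear
input id year
1 2001
1 2002
2 2002
2 2003
end

* xtset reports strongly balanced, weakly balanced, or unbalanced
xtset id year
local balance "`r(balanced)'"
display "`balance'"

* Participation pattern per country, then observations per year
xtdescribe
tabulate year
```

Equal counts per country are therefore not enough for a strongly balanced panel; the years must coincide as well.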



                    • #11
                      Yes, exactly, it says it's weakly balanced. The output of tab year before balancing is:


                      . tab year

                             year |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                             1996 |        104        5.00        5.00
                             1997 |        104        5.00       10.00
                             1998 |        104        5.00       15.00
                             1999 |        104        5.00       20.00
                             2000 |        104        5.00       25.00
                             2001 |        104        5.00       30.00
                             2002 |        104        5.00       35.00
                             2003 |        104        5.00       40.00
                             2004 |        104        5.00       45.00
                             2005 |        104        5.00       50.00
                             2006 |        104        5.00       55.00
                             2007 |        104        5.00       60.00
                             2008 |        104        5.00       65.00
                             2009 |        104        5.00       70.00
                             2010 |        104        5.00       75.00
                             2011 |        104        5.00       80.00
                             2012 |        104        5.00       85.00
                             2013 |        104        5.00       90.00
                             2014 |        104        5.00       95.00
                             2015 |        104        5.00      100.00
                      ------------+-----------------------------------
                            Total |      2,080      100.00

                      and after balancing it becomes:

                      . tab year

                             year |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                             1996 |         68        4.98        4.98
                             1997 |         69        5.05       10.04
                             1998 |         69        5.05       15.09
                             1999 |         71        5.20       20.29
                             2000 |         75        5.49       25.79
                             2001 |         75        5.49       31.28
                             2002 |         74        5.42       36.70
                             2003 |         77        5.64       42.34
                             2004 |         74        5.42       47.77
                             2005 |         76        5.57       53.33
                             2006 |         74        5.42       58.75
                             2007 |         70        5.13       63.88
                             2008 |         72        5.27       69.16
                             2009 |         68        4.98       74.14
                             2010 |         70        5.13       79.27
                             2011 |         66        4.84       84.10
                             2012 |         63        4.62       88.72
                             2013 |         65        4.76       93.48
                             2014 |         60        4.40       97.88
                             2015 |         29        2.12      100.00
                      ------------+-----------------------------------
                            Total |      1,365      100.00



                      • #12
                        So, after dropping missing values, the countries are no longer observed in the same set of years. You'll have to subset your data if you want to work with a strongly balanced panel. The question then becomes which countries to keep and for which years, along with the possible bias that doing so introduces into your data.
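One way to do that subsetting, as a minimal sketch on hypothetical toy data: count the distinct years in the data and keep only countries observed in every one of them. It assumes no duplicate id-year pairs (xtset would reject those anyway).

```stata
* Hypothetical toy panel after dropping missing values:
* countries 1 and 3 are observed in all three years, country 2 lacks 2002
clear
input id year
1 2001
1 2002
1 2003
2 2001
2 2003
3 2001
3 2002
3 2003
end

* Number of distinct years present in the data (tabulate stores it in r(r))
quietly tabulate year
local T = r(r)

* Keep only countries observed in every one of those years
bysort id: generate n_obs = _N
keep if n_obs == `T'

* The remaining panel should be strongly balanced
xtset id year
local balance "`r(balanced)'"
```

Applied to the table in #11, this would keep only countries with complete data in all 20 years. Dropping a sparse year such as 2015 (29 countries) before the keep step is one way to retain more countries at the cost of a shorter panel; which trade-off is acceptable depends on the bias concern above.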

