  • Unbalanced Data

    Hi everybody,

    I need to run POLS and FE analyses on highly unbalanced panel data: 60 countries over 35 years. No single country has the full range of observations, and observations on the income inequality index are especially sparse for each country.

    My two questions are the following:

    (1) Is it correct to use the standard panel-data regression commands on my unbalanced data, such as
    xtreg ... ... ... ... , robust
    xtreg ... ... ... ... , FE robust

    Are the xt commands enough for Stata to handle data with many missing observations?


    (2) I need to test the standard errors for heteroskedasticity. For the FE model I have found the user-written test "xttest3", but I haven't found a similar test for the POLS model. As for autocorrelation, I don't know how to check for it at all. Maybe I need to correct for autocorrelation too and use cluster(id) instead of "robust"? Would you please tell me how I can test both models in order to decide between clustering and correcting for heteroskedasticity only?

    If needed, dataset is attached
    Last edited by Lana Rais; 10 Sep 2015, 03:56.

  • #2
    Lana:
    your post raises several issues:
    - Stata can handle unbalanced panel data effectively under all the specifications;
    - if you intend to replace missing observations, you may want to consider -help ipolate- or -help mi- and the related entries in the Stata .pdf manual;
    - -xt- is the prefix of a suite of commands that deal with different types of panel data analysis; hence, sticking with your question, -xt- on its own is not enough to carry out what you're after, whereas -xtreg- is;
    - under -xtreg-, vce(robust) and vce(cluster) on the panel identifier give the same standard errors, as -xtreg- treats vce(robust) as clustering on the panel variable;
    - POLS implies autocorrelation (so there's no need to test for it), since you have multiple observations on the same units. In this instance you should use -vce(cluster)-, as vce(robust) in a pooled regression accounts for heteroskedasticity only.
    As a closing remark, please note that -FE- in your second line of code should be -fe-, as Stata is case-sensitive (by the way, I assume that you have already checked via -hausman- which specification suits your panel data regression best, i.e., -fe- or -re-).
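
    To make the last two points concrete, here is a minimal sketch on the -nlswork- example dataset rather than on your data; the regressors (age, tenure, hours) are picked purely for illustration. Note that -xtreg- without -fe- fits a random-effects model by default, so the pooled model is run with -regress-, after which -estat hettest- gives a Breusch-Pagan / Cook-Weisberg test for heteroskedasticity:

    ```stata
    * Sketch on a shipped example panel (nlswork), not the poster's data
    webuse nlswork, clear
    xtset idcode year

    * Pooled OLS, then the Breusch-Pagan / Cook-Weisberg heteroskedasticity test
    regress ln_wage age tenure hours
    estat hettest

    * Pooled OLS with standard errors clustered on the panel identifier
    regress ln_wage age tenure hours, vce(cluster idcode)

    * Fixed effects with cluster-robust standard errors
    xtreg ln_wage age tenure hours, fe vce(cluster idcode)
    ```

    With your own data, idcode would be replaced by your country identifier and year by your time variable.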
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you Carlo!


      • #4
        I have one more question with regard to my dataset.

        In Stata, can I drop some ids (countries, in my case) from my dataset to create a smaller subset for some regressions and afterwards return the dropped countries to the initial dataset?

        At the moment I have 3 datasets with different country groups (full dataset, developed and developing countries), but maybe there is code in Stata that allows me to keep only the one full dataset and turn it into smaller subsets when needed?


        • #5
          Lana:
          in my opinion, the best approach for dealing with this kind of issue is not to -drop- the undesired observations but to tag them with a dummy, something along the lines of (assuming your -country- variable is in numeric format):
          Code:
          * 1 for undesired countries; repeat, replacing -gen- with -replace-, for each country you want to tag as undesired
          gen undesired_countries=1 if country==<whatisneeded>
          * every country left untagged gets 0
          replace undesired_countries=0 if undesired_countries==.
          If the -country- variable is in -string- format, adjust the above as follows:
          Code:
          * 1 for undesired countries; repeat, replacing -gen- with -replace-, for each country you want to tag as undesired
          gen undesired_countries=1 if country=="whatisneeded"
          * every country left untagged gets 0
          replace undesired_countries=0 if undesired_countries==.
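
          Once the dummy exists, an -if- qualifier restricts any estimation command to the subset while the full dataset stays in memory. A minimal sketch on the -nlswork- example dataset, where race values 2 and 3 merely stand in for the countries to be left out and the variable name undesired is made up for the example:

          ```stata
          webuse nlswork, clear
          xtset idcode year

          * Tag several groups in one step; race 2 and 3 stand in for "undesired" countries
          gen byte undesired = inlist(race, 2, 3) if !missing(race)

          * Full sample
          xtreg ln_wage age tenure hours, fe vce(cluster idcode)

          * Subsample only: the excluded observations are skipped, not dropped
          xtreg ln_wage age tenure hours if undesired == 0, fe vce(cluster idcode)
          ```

          -inlist()- also accepts string arguments, so a string -country- variable could be tagged the same way.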
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Ah yes, I hadn't thought of using a dummy for this case!

            I use a few regional dummies in my analysis just to check whether there are differences between the Latin American, African and Asian worlds, so I have to think about this. Thanks.

            Carlo, would you be so kind as to share your opinion on one more issue? It would help me understand the problem of autocorrelation.

            In my data I assume cross-sectional autocorrelation, because economies are open and capital is free to move between countries. On top of that, there must be time series correlation.

            For the OLS analysis of my panel data I have computed full-period averages of each variable by country (arithmetic averages for all variables except growth, for which I use a geometric one). By regressing the averaged indicator on the set of other averages I hope to get rid of time series autocorrelation.

            But will averaging also eliminate cross-sectional autocorrelation?


            • #7
              Lana:
              I would take a step back.
              Why are you considering POLS? Is it because the F-test at the foot of the -xtreg, fe- output table turned out non-significant?
              If that is not the case, I would drop POLS and focus on -xtreg- instead, adding a vce(cluster) or vce(robust) option.
              Kind regards,
              Carlo
              (Stata 19.0)


              • #8
                Hi, I think Carlo's comments are right to the point and great. I would also ask why you're not considering random effects as well. The problem that I have with fixed effects is that it doesn't allow the inclusion of variables that vary only across panels and not within panels. You can always test which estimation method is more appropriate with the hausman command.

                Whether or not to include POLS is always a matter of personal choice. Even though the F-test that Carlo mentions may indicate that FE is the right estimation method, I find that it is sometimes interesting to see how similar the coefficients on each variable are across estimation methods.
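
                A minimal sketch of both ideas, the Hausman test and a side-by-side look at the coefficients, on the -nlswork- example dataset. The regressors are chosen only for illustration and the stored names pols, fe and re are arbitrary:

                ```stata
                webuse nlswork, clear
                xtset idcode year

                * Pooled OLS, fixed effects and random effects on the same regressors
                regress ln_wage age tenure hours
                estimates store pols
                xtreg ln_wage age tenure hours, fe
                estimates store fe
                xtreg ln_wage age tenure hours, re
                estimates store re

                * Hausman test of fe against re; it relies on the default (non-robust) variance estimates
                hausman fe re

                * Coefficients and standard errors side by side
                estimates table pols fe re, b(%9.4f) se(%9.4f)
                ```

                The models are fitted without vce(robust) or vce(cluster) here because -hausman- does not accept robust variance estimates; the final models can be refitted with clustered standard errors afterwards.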
                Alfonso Sanchez-Penalver


                • #9
                  Alfonso and Carlo, thank you for your answers.
                  I need to apply POLS and fixed effects methods to my panel data, and I need to run an OLS regression on cross-sectional averages, simply because I want to replicate a study that was performed many years ago. But I want to do it correctly.

                  For the cross-sectional OLS regression I average all the indicators in my panel data. I suppose this averaging gets rid of autocorrelation automatically, but I am not sure. Maybe I need to control for autocorrelation in this regression?
                  Last edited by Lana Rais; 11 Sep 2015, 09:03.


                  • #10
                    Lana:
                    as far as I understand your OLS approach, you should end up with a single average value for each panel unit across time periods; hence, I do not think that autocorrelation could be an issue, in that you no longer have multiple observations for each panel unit.
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      ok, many thanks


                      • #12
                        Hi, my first thought is that OLS on the panel averages is exactly the between estimator you get with xtreg, be. Here's some simple code to show this:
                        Code:
                        clear all
                        set more off
                        
                        webuse nlswork, clear
                        global x "age race collgrad grade hours"
                        keep idcode year ln_wage $x
                        
                        xtset idcode year
                        
                        ** Between estimator
                        * xtreg be
                        xtreg ln_wage $x, be
                        
                        * regress estimation
                        * collapse calculates the means of the variables you include in the varlist
                        * the by option tells collapse which group identifier to use for the means
                        * the cw option applies casewise deletion: observations with a missing value in any listed variable are ignored
                        collapse ln_wage $x, by(idcode) cw
                        regress ln_wage $x
                        I hope this helps make your life easier.
                        Alfonso Sanchez-Penalver


                        • #13
                          Alfonso, thanks, that is very kind of you! I need not just arithmetic averages but also a geometric average for one of the indicators. I have already written the code, but I will now check whether I can make it shorter or simpler.
                          Last edited by Lana Rais; 11 Sep 2015, 12:18.


                          • #14
                            Do you know about egenmore (available from SSC)? It adds more functions to the egen command. Once you have downloaded and installed it, you can create a new variable that holds the geometric mean of whatever variable you want, by group. The following code extends the one I sent before by calculating the geometric mean of age and then running the between estimation with xtreg, be and with regress, as before.
                            Code:
                            clear all
                            set more off
                            
                            webuse nlswork, clear
                            global x "age race collgrad grade hours"
                            keep idcode year ln_wage $x
                            
                            xtset idcode year
                            
                            ** Between estimator
                            * xtreg be
                            xtreg ln_wage $x, be
                            
                            preserve
                            
                            * regress estimation
                            * collapse calculates the means of the variables you include in the varlist
                            * the by option tells collapse which group identifier to use for the means
                            * the cw option applies casewise deletion: observations with a missing value in any listed variable are ignored
                            collapse ln_wage $x, by(idcode) cw
                            regress ln_wage $x
                            
                            restore, preserve
                            
                            egen gage = gmean(age), by(idcode)
                            global x "gage race collgrad grade hours"
                            xtreg ln_wage $x, be
                            
                            collapse ln_wage $x, by(idcode) cw
                            regress ln_wage $x
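
                            If installing egenmore is not an option, a geometric mean can also be built by hand as the exponential of the mean of logs. This is only a sketch: it assumes strictly positive values, the new variable names are made up, and g in the commented lines is a hypothetical growth-rate variable that does not exist in -nlswork-:

                            ```stata
                            webuse nlswork, clear

                            * Geometric mean by panel as exp(mean(ln(x))); needs strictly positive values
                            gen double ln_age = ln(age)
                            egen double mean_ln_age = mean(ln_age), by(idcode)
                            gen double gage_manual = exp(mean_ln_age)

                            * For a growth rate g stored as a fraction (hypothetical variable),
                            * average the growth factors 1+g rather than g itself:
                            * gen double ln_factor = ln(1 + g)
                            * egen double mean_ln_factor = mean(ln_factor), by(idcode)
                            * gen double g_geo = exp(mean_ln_factor) - 1
                            ```

                            Working with growth factors keeps the logarithm defined for negative growth rates, as long as g stays above -1.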
                            Alfonso Sanchez-Penalver


                            • #15
                              Many thanks. Afterwards, can I get back the initial content of my dataset? I mean, is there a command in Stata for that?

                              I want to run the OLS regression on averages first, and then POLS and FE on the full panel data.
                              Last edited by Lana Rais; 12 Sep 2015, 04:35.
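
                              One possible way to do this, sketched on the -nlswork- example dataset with illustrative regressors: -preserve- stores a temporary copy of the data, and -restore- brings it back after -collapse- has replaced the data in memory.

                              ```stata
                              webuse nlswork, clear
                              xtset idcode year

                              * Keep a temporary copy of the full panel
                              preserve

                              * Cross-section of panel means, then OLS on the averages
                              collapse ln_wage age tenure hours, by(idcode) cw
                              regress ln_wage age tenure hours

                              * Bring back the full panel
                              restore

                              * Pooled OLS and fixed effects on the full panel
                              regress ln_wage age tenure hours, vce(cluster idcode)
                              xtreg ln_wage age tenure hours, fe vce(cluster idcode)
                              ```

                              An alternative under the same assumptions is to -save- the collapsed data under a new file name and reload the original file with -use- whenever the full panel is needed.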
