
  • R Square in Panel Data

    Hi everyone,

    I searched the forum a bit but could not find a clear answer. My problem: I am running FE on quite a large panel data set (15,000 groups, 30,000 observations) with around 10 explanatory variables. However, when I estimate my model I get a within R-squared of 0.001 or 0.004, so extremely low. Yet individual variables are statistically significant (t > 3).

    What should I do about this? Is my model just garbage? On the other hand, I read somewhere that the significance of individual coefficients is more informative than R-squared in panel data.

    Any input on this?

    Many thanks in advance!!
    Last edited by Andreas Baltin; 18 Apr 2019, 08:10.
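A worked relation may help here (an added illustration, assuming conventional rather than clustered standard errors). In the within (demeaned) OLS regression, the t statistic of regressor j is tied to its partial R-squared R_j^2 and to the residual degrees of freedom, where n is the number of observations, N the number of groups, and K the number of regressors (taken here as about 10):

```latex
t_j^2 = \frac{R_j^2}{1 - R_j^2}\,(n - N - K),
\qquad n - N - K \approx 30000 - 15000 - 10 = 14990

t_j = 3 \;\Longrightarrow\; R_j^2 = \frac{9}{9 + 14990} \approx 0.0006
```

With roughly 15,000 residual degrees of freedom, a regressor that accounts for only about 0.06% of the within variation can already reach t = 3, so a within R-squared of 0.001 to 0.004 and t > 3 are not contradictory.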

  • #2
    I don't have an answer, but I came across this blog and related references when I, too, was interested in computing R^2 after fitting a mixed model. Has anyone coded these R^2 methods into Stata?

    https://jonlefcheck.net/2013/03/13/r...ffects-models/

    Nakagawa, S., and H. Schielzeth. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4(2): 133-142. DOI: 10.1111/j.2041-210x.2012.00261.x

    Johnson, P. C. D. 2014. Extension of Nakagawa & Schielzeth’s R2GLMM to random slopes models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12225.
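For the Gaussian random-intercept case, the marginal and conditional R^2 of Nakagawa and Schielzeth (2013) reduce to ratios of variance components. A minimal Python sketch, assuming the variance components have already been extracted from a fitted model (for example, from Stata's -mixed- output); the function name is hypothetical, and the GLMM and random-slope extensions are not covered:

```python
def nakagawa_r2(var_fixed, var_random, var_resid):
    """Marginal and conditional R^2 for a Gaussian linear mixed model
    with random intercepts (Nakagawa and Schielzeth 2013).

    var_fixed:  variance of the fixed-effects linear predictor X*beta
    var_random: sum of the random-intercept variance components
    var_resid:  residual variance
    """
    total = var_fixed + var_random + var_resid
    r2_marginal = var_fixed / total                    # fixed effects only
    r2_conditional = (var_fixed + var_random) / total  # fixed + random effects
    return r2_marginal, r2_conditional
```

For instance, `nakagawa_r2(2.0, 1.0, 1.0)` gives a marginal R^2 of 0.5 and a conditional R^2 of 0.75.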



    • #3
      Here is a useful link!

      https://stats.idre.ucla.edu/stata/fa...ize-for-mixed/



      • #4
        Originally posted by Dave Airey
        Hi Dave,

        Many thanks for the reply. I have read through your link, but it did not resolve my issue.



        • #5
          Andreas:
          https://us.sagepub.com/en-us/nam/fix...els/book226025, page 19 says that "The within R2 is just the usual R2 calculated for the regression using the mean deviation variables..." In the example worked out by Paul Allison, the within R2 was 0.033, that is, far from high.
          Kind regards,
          Carlo
          (Stata 18.0 SE)
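Allison's description translates directly into code. A minimal Python sketch for the one-regressor case (function and argument names are hypothetical): demean y and x within each group, regress demeaned y on demeaned x without an intercept, and take the ordinary R^2 of that regression:

```python
from collections import defaultdict

def within_r2(groups, y, x):
    """Return (beta, within R^2) for a one-regressor fixed-effects model."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum y, sum x, count]
    for g, yi, xi in zip(groups, y, x):
        s = sums[g]
        s[0] += yi
        s[1] += xi
        s[2] += 1
    # Mean-deviation ("within") transformation
    yd, xd = [], []
    for g, yi, xi in zip(groups, y, x):
        sy, sx, n = sums[g]
        yd.append(yi - sy / n)
        xd.append(xi - sx / n)
    # OLS slope on demeaned data (demeaned y has overall mean zero)
    beta = sum(a * b for a, b in zip(xd, yd)) / sum(v * v for v in xd)
    ssr = sum((a - beta * b) ** 2 for a, b in zip(yd, xd))
    tss = sum(v * v for v in yd)
    return beta, 1.0 - ssr / tss
```

Singleton groups demean to zero and contribute nothing to the within fit, which matters in a panel that averages fewer than two observations per group.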



          • #6
            Hi Carlo!

            Thanks for your reply; I think I understand it now, so that issue is solved. However, I have run into another problem.

            Another general question: as I said, I am working with panel data and about 10 dependent regressions. When I include an income variable, my sample drops to 11,500 groups and 18,000 observations, so the average number of observations per group is 1.6. If I exclude this variable, I get 13,000 groups with 23,000 observations, a considerably higher average of 1.8. The variable itself is not statistically significant, but it changes the other coefficients quite strongly (probably because a lot of observations are dropped). Do you have any advice on how to deal with this?

            Should I just not include this variable? Or include it and accept the reduction in observations? Or is there any way around it?

            Many thanks.



            • #7
              Andreas:
              I'm not clear on what you mean by
              ...10 dependent regressions.
              Do you mean you have 10 predictors/regressors on the right-hand side of your regression equation, or something else?
              As far as your last query is concerned, I think what you're experiencing is due to missing values in the income variable.
              As you may already be aware, by default Stata omits observations with missing values in any of the variables in the model (listwise deletion).
              Kind regards,
              Carlo
              (Stata 18.0 SE)
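The listwise deletion described above can be sketched in Python with a hypothetical row layout: each row is a dict with a group identifier `id` plus the model variables, and `None` stands in for Stata's missing value `.`. Counting what survives with and without `income` shows the kind of drop reported in the thread (18,000/11,500 ≈ 1.57 versus 23,000/13,000 ≈ 1.77 observations per group):

```python
def sample_summary(rows, variables):
    """Groups, observations, and average observations per group that
    survive listwise deletion on `variables` (None marks a missing value)."""
    complete = [r for r in rows if all(r[v] is not None for v in variables)]
    n_obs = len(complete)
    n_groups = len({r["id"] for r in complete})
    avg = n_obs / n_groups if n_groups else 0.0
    return n_groups, n_obs, avg
```

Because the two specifications are then estimated on different samples, part of the change in the other coefficients may come from the sample rather than from income itself. One way to separate the two in Stata is to save e(sample) from the model with income into a variable and re-estimate the model without income on those observations only.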



              • #8
                Originally posted by Carlo Lazzaro
                Hi Carlo,

                I think my concentration is slowly going. Yes, I meant I am using about 10 regressors. And yes, what I am experiencing is indeed due to missing values in my income variable. I have read a bit about imputing missing values; is this something I should look into more closely?

                The thing is, I am doing all of this work for my master's thesis in economics, and I feel like I am doing so many things that I had never even heard of before. Is it worth getting familiar with imputation methods, or is that overkill and should I just accept the lower number of observations? After all, I still have 18,000 observations left for my analysis.



                • #9
                  Andreas:
                  learning things that were never covered in academic courses is everyone's daily experience!
                  -mi- is challenging in both theory and application; on the other hand, a substantial amount of missing values can make your estimates basically meaningless (especially if the missingness is not ignorable, that is, if the missingness mechanism itself needs to be modeled).
                  I would recommend taking this issue up with your teacher/supervisor, both to agree on a shared research strategy and, more importantly, to avoid problems or misunderstandings as the discussion deadline approaches.
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)
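For a sense of what -mi estimate- does after imputation: it fits the model on each of the m imputed datasets and pools the results with Rubin's rules. A minimal Python sketch of that pooling step for one coefficient (the function name is hypothetical; it assumes the m point estimates and squared standard errors are already available and omits the degrees-of-freedom adjustments):

```python
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the m completed-data point estimates
    variances: the m completed-data variances (squared standard errors)
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b    # Rubin's total variance
    return q_bar, total
```

For example, `rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` pools to an estimate of 2.0 with total variance of about 1.83, larger than the within-imputation 0.5 because it also reflects uncertainty about the missing values.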



                  • #10
                    Hi Carlo!

                    Many thanks again for your post. The thing is, I have read up on imputation and often come across threads on Statalist where everyone says it is a sophisticated technique.

                    What I have done for now: I have shown in my essay that the missingness is uncorrelated with the transient error, and that the sub-sample is therefore a random sample from the full sample, so I proceed with a complete-case analysis.

                    As you say, I am not that happy with this solution either. I am just rolling with it for now because time is pretty tight, and I will look into this again once I have made more progress. I simply do not have the time to divert my attention to imputation methods for two or more days when I do not even necessarily need them. I am also meeting with my supervisor next week, so I will certainly ask him about it again!

                    Many thanks again!



                    • #11
                      Andreas:
                      there is a community-contributed command (type -search mcartest- to spot and install it) that makes the test you may have conducted in another way easier.
                      However, sad as it may sound, please note that missing completely at random (MCAR), as you seem to imply in your post, is possible but pretty rare. In other words, you should be able to give sound support for that statement in front of your supervisor and discussants.
                      As you seem to be under tight time constraints, I would also check with your supervisor whether you can leave the missing values out of your analysis and instead highlight in the Discussion/Conclusion of your dissertation that this methodological choice is a limitation of your research.
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)
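A much cruder check than Little's MCAR test (which -mcartest- implements) is to compare an observed variable between cases where income is observed and cases where it is missing; under MCAR the two groups should look alike. A minimal Python sketch using a Welch two-sample t statistic, with a hypothetical row layout (`None` marks a missing value) and hypothetical function names:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic for mean(a) - mean(b)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def compare_by_missingness(rows, target, other):
    """t statistic comparing `other` between rows where `target` is
    observed and rows where `target` is missing (None)."""
    observed = [r[other] for r in rows if r[target] is not None]
    missing = [r[other] for r in rows if r[target] is None]
    return welch_t(observed, missing)
```

A large |t| for any observed variable is evidence against MCAR; a small one is consistent with MCAR but does not establish it, and no such comparison can rule out missingness that depends on the unobserved income values themselves.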
