  • Linearity in panel data, QQ Plots, Zeros vs. Missing data

    Hi Stata folks,

    With thanks for the help thus far, I've got a few issues I'm still struggling with.
    • I've read that QQ Plots are a good way to assess whether the relationship between a regressor and the DV is linear, but are they applicable to panel data?
    • Attached are QQ Plots I've generated, which seem odd to me given that the data points diverge from the fit-line. I'm also attaching my dataset.
    • Conceptually, I'm also thinking about the best way to represent a government policy. The literature I'm drawing on uses "policy-years": a numeric variable that takes the value 1 in a year when the policy was active (say, 2004) and 0 in a year when it was not (say, 2003). But this seems to be creating overdispersion. If I set the zeros to "." does Stata just drop those observations? Does this seem more or less defensible than the approach taken in the article I am citing?
    Thanks in advance for your thoughts,

    -nick


  • #2
    QQ plots are not, in my opinion, a good way to investigate linearity; where did you read that?

    Use lowess instead.


    • #3
      Sorry, but I don't really understand your last question mentioning policy, except that Stata does not drop any observation just because it includes missings; it just ignores that observation in any model fit, a different thing altogether. This is easy enough to determine by experiment. Conversely, the question in return is how you think that Stata could use a missing value in a model fit.

      On the larger question, I can't see that quantile-quantile plots such as you have drawn them bear directly or even indirectly on linearity of relationship, for at least two reasons:

      1. You sort each distribution from smallest to largest, but that discards absolutely all information on which data points are paired with which in the dataset.

      2. The point about a quantile-quantile plot is that identical distributions define a natural reference situation, e.g. that you might expect or rather assess whether a distribution of residuals is close to Gaussian, or that males and females might as a first approximation have the same distribution for something, except that the exact structure of difference is also interesting and important. I can't see that this expectation carries over to quite different responses or predictors, except in the loosest sense that a researcher might expect many distributions all to be skewed, or whatever.

      #1 is really fundamental here, however. It is the same point as saying that knowing marginal distributions tells you nothing directly about relationships between variables.
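Point #1 can be checked numerically. The following is a minimal sketch in Python rather than Stata, on simulated data (the variable names and the seed are illustrative assumptions, not taken from the attached dataset): two pairings of the same values give exactly the same quantile-quantile points but very different correlations.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)                 # arbitrary seed, for repeatability only
x = [rng.gauss(0, 1) for _ in range(200)]
y_linear = [2 * v + 1 for v in x]       # y is exactly linear in x
y_shuffled = y_linear[:]
rng.shuffle(y_shuffled)                 # same values, pairing with x destroyed

# A QQ plot of y against x only ever sees the two sorted lists.
qq_linear = list(zip(sorted(x), sorted(y_linear)))
qq_shuffled = list(zip(sorted(x), sorted(y_shuffled)))

same_qq = qq_linear == qq_shuffled      # identical QQ points in both cases
r_linear = pearson(x, y_linear)         # perfect linear relationship
r_shuffled = pearson(x, y_shuffled)     # relationship removed by shuffling
print(same_qq, round(r_linear, 3), round(r_shuffled, 3))
```

The two QQ plots are indistinguishable even though one pairing is perfectly linear and the other is essentially unrelated, which is why the plot cannot speak to linearity of relationship.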

      I am curious where you read this or heard this.

      All that said, a scatter plot matrix is an immensely more direct way of thinking about linearity. You can be selective and focus on just some relationships if you prefer (using (e.g.) crossplot from SSC).

      Or fit a model, then look at an added-variable plot for ideas about which parts of the relationship you got right and wrong. (If all else fails, a set of residual vs predictor plots can be about as useful.)

      Panels: We've touched on this point before in earlier threads. Whether to plot panels separately or to lump them together is your choice. But in general I see no reason why the aggregate of distributions from quite different panels is going to help. The key remains to look at distributions if and only if those distributions bear upon your analysis.
      Last edited by Nick Cox; 23 Jun 2015, 11:24.


      • #4
        Hi Rich,

        Thanks for your note. I'm going off of Hamilton's Statistics with Stata and Stata's documentation: http://www.stata.com/manuals13/rdiagnosticplots.pdf

        But I will experiment with some Lowess plots as I see they can be used to smooth too.

        Thanks,

        -nick



        • #5
          I looked at the Stata doc you cited but "linear" was not found on a search; what page are you referring to?

          I don't have the Hamilton book.


          • #6
            Here's one source that says of QQ plots: "If the data are truly sampled from a Gaussian distribution, the QQ plot will be linear" (http://graphpad.com/faq/file/1872QQ%...ty%20plots.pdf).

            However, there is definitely support for your suggestion, which I will try. This source recommends using scatter plots, plotting residuals (both of which I have done) and using a "scatter plot smoother such as lowess... to give a visual estimation of the conditional mean" (http://www.ma.utexas.edu/users/mks/statmistakes/modelcheckingplots.html).
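To make "a visual estimation of the conditional mean" concrete, here is a minimal sketch in Python of a simplified local linear smoother in the spirit of lowess. It is not Stata's lowess command: the function name `lowess_sketch`, the `frac` bandwidth and the simulated data are all assumptions for illustration, and the robustness iterations of full lowess are omitted.

```python
import random

def lowess_sketch(xs, ys, frac=0.5):
    """Tricube-weighted local linear fit at each x (no robustness iterations)."""
    n = len(xs)
    k = max(3, int(frac * n))            # number of neighbours setting the window
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12        # window half-width around x0
        # Tricube weights: close to 1 near x0, falling to 0 at distance h.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))   # local line evaluated at x0
    return fitted

rng = random.Random(0)                   # arbitrary seed
xs = [i / 10 for i in range(-20, 21)]    # 41 evenly spaced points on [-2, 2]
ys = [x * x + rng.gauss(0, 0.2) for x in xs]   # curved relationship plus noise
smooth = lowess_sketch(xs, ys)
print(round(smooth[0], 2), round(smooth[20], 2), round(smooth[-1], 2))
```

On data like these the smoothed values are high at both ends and low in the middle, so the curve bends where a straight-line fit would be nearly flat. That departure from a straight line is the kind of evidence about linearity a smoother can provide.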


            • #7
              Your quote is talking about normality of the data, not linearity of the relationship.


              • #8
                That helps a lot to explain, even though at best it only applies to one particular application of QQ plots, in which normal quantiles are on one axis.

                With reference to QQ plots:

                Linearity in the very special sense of linearity and equality y = x implies identical distributions. This is linearity on the specified QQ plot, and nothing to do with linearity as meant in regression models. That last reservation needs to be repeated for all cases.

                Linearity in the slightly more general sense y = x + a implies that distributions differ by at most an additive shift (e.g. that the means are different, but otherwise the distributions are identical).

                Linearity in another slightly more general sense y = bx implies that distributions differ multiplicatively, which in the case of normal distributions might mean only differing SDs. But in general, and with nothing else said, it implies that you should perhaps be working or thinking on logarithmic scale.

                I don't think there is an easy general interpretation of y = a + bx on QQ plots, although I may just be feeling tired.
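The additive and multiplicative cases above can be illustrated with a minimal sketch in Python. It is idealized: the second sample is an exact transform of the first, so the paired quantiles fall exactly on a line, whereas independent real samples would only scatter around it. The helper `qq_line`, the seed and the constants 3 and 2 are illustrative assumptions.

```python
import random

def qq_line(qx, qy):
    """Least-squares intercept and slope through paired quantiles."""
    n = len(qx)
    mx = sum(qx) / n
    my = sum(qy) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(qx, qy))
    sxx = sum((a - mx) ** 2 for a in qx)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(1)                   # arbitrary seed
x = sorted(rng.gauss(0, 1) for _ in range(500))

shifted = sorted(v + 3 for v in x)       # same shape, shifted by a = 3
scaled = sorted(2 * v for v in x)        # same shape, multiplied by b = 2

a_shift, b_shift = qq_line(x, shifted)   # QQ points lie on y = x + 3
a_scale, b_scale = qq_line(x, scaled)    # QQ points lie on y = 2x
print(round(a_shift, 6), round(b_shift, 6), round(a_scale, 6), round(b_scale, 6))
```

In both cases the straight line on the QQ plot describes how two distributions compare, not how paired values of two variables relate.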

                Chambers, Cleveland, Kleiner and Tukey remains one of the best treatments of QQ plots.
                http://www.amazon.com/Graphical-Analysis-Wadsworth-Statistics-Probability/dp/053498052X




                • #9
                  Thanks for your comments. And Nick, I just saw your note from 11:10 am (not sure why it didn't show up before).

                  Just to recap: To best assess whether there's a linear relationship between an X and a Y, I should use scatter plots and, based on Rich's point, try a Lowess plot?

                  Nick: I've been using AV-plots to check for outliers and to assess the impact of regressors on my model, but it seems difficult to assess linearity with them.


                  • #10
                    Note that times you see are just local to your time zone. As a geographer I have to remind you that the planet is spherical and the rest of the world is not in California!

                    The numbering of posts in a thread is less labile. (The fudging there is that there is some scope for posters to delete posts.)

                    Added variable plots are not a panacea any more than anything else. To see patterns it helps to have (1) a large number of data points (2) a model that is not too lousy. These are counsels of perfection. If your model is far from the "true" model, or no such simple "true" model exists even as a rough approximation, then added-variable plots can be cryptic in my experience.

                    From what I recall of your problem, you don't have very clear relationships at all, which is no doubt part of what you are trying to work on. You need to factor out the size of states as a predictor; otherwise lumping California through to Wyoming may not make much sense.


                    • #11
                      Ok, thanks for clarifying re posting time -- I assumed it adjusted from UTC for each time zone (hmm... feature request?).

                      I'm attaching the AV plots for my main variables. I find these plots useful at times and cryptic at others!