  • Scatter plots

    First of all, I'm just learning STATA. So if my question is too basic, forgive me. I am running a regression of an equation that uses a dummy variable for pre-1992 and post-1992 data. Therefore, I will be running a regression for the equation without the dummy, with the dummy covering the pre-1992 data, and one with the dummy covering post-1992 data. I would like to create a scatter plot showing all three regression lines. How can I do this?

  • #2
    You need to have a look at lfit, not scatter.

    If your regressions are something like this:
    Code:
    regress y x
    regress y x if year<1992
    regress y x if year>=1992
    then this should do the trick:
    Code:
    twoway (lfit y x) (lfit y x if year<1992) (lfit y x if year>=1992)
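
    Since the question asked for a scatter plot with all three lines, a minimal sketch along the same lines overlays the raw data as well. It assumes the same variable names y, x and year as above; the legend labels are only illustrative.
    Code:
    * sketch only: y, x and year are assumed to exist as in the regressions above
    twoway (scatter y x) ///
           (lfit y x) ///
           (lfit y x if year<1992) ///
           (lfit y x if year>=1992 & !missing(year)), ///
           legend(order(2 "All years" 3 "Before 1992" 4 "1992 onward"))
    The !missing(year) condition is there because Stata treats missing values as larger than any number, so year>=1992 alone would also pick up observations with no year recorded.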


    • #3
      Actually I've created a dummy "pre" with the value of zero and a dummy "post" with the value of one. So my regressions are

      regress y x1 x2 x3 x4
      regress y x1 x2 x3 x4 pre
      regress y x1 x2 x3 x4 post

      So, if I understand your response, my code should read:

      twoway (lfit y x) (lfit y x if pre) (lfit y x if post)?


      • #4
        That's legal, but note that there is no connection between the regress commands and the twoway command.

        The first does various (multiple) regressions; the second shows the results of various regressions with one predictor.

        If that's what you want, OK. If it's not, we need a better explanation of what you want to do, but check out margins and marginsplot, possibly.
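
        As a rough sketch of the margins route after a multiple regression: this assumes a single 0/1 indicator named post, that x1 is the continuous predictor of interest, and a purely hypothetical grid of x1 values in at(), which would need to match the range of your data.
        Code:
        * sketch only: post is an assumed 0/1 indicator; the at() grid is hypothetical
        regress y c.x1 x2 x3 x4 i.post
        margins post, at(x1=(0(10)50))
        marginsplot
        Here margins computes predictions for each period over the x1 grid, averaging over the other predictors as observed, and marginsplot graphs them as one line per period.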

        (It's my role today to remind you that "Stata" is the way to write it. See the FAQ Advice, all the way to the end, please.)


        • #5
          ... except that if all your values of pre are 0, you will see nothing for the second plot!

          i.e. on second thoughts you need just one indicator (what you call dummy).


          • #6
            Well, in certain circles in the States, they are called indicators. But, in most of the textbooks, they're called dummies.


            • #7
              I did not explain my wording. I've heard of too many occasions when the term "dummy variables" has been wildly misread as offensive or disparaging, which is good enough reason to me to prefer the term "indicator variable".

              It's hard to know what's majority usage across statistical science from sampling several texts in just one application area, which is what most people do. But here majority usage is immaterial to my preference.

              Another area for small debates is what you call dependent and independent variables, that or something else. It wouldn't surprise me if dependent and independent were still the most common terms, but that doesn't stop them being lousy choices.


              • #8
                Nick, you might have heard this somewhere, but I don't know of any political scientist or economist who would know what an "indicator" variable is. The standard term in these disciplines is dummy. A Google search on "econometrics dummy variables" leads to lots of links. A Google search on "econometrics indicator variables" leads to lots of links for econometrics and dummy variables. Typing "findit dum" in Stata leads to a FAQ by Bill Gould on "How do I create dummy variables" (http://www.stata.com/support/faqs/da...mmy-variables/ ) and many other links.


                Ric Uslaner
                Last edited by Euslaner; 09 Jun 2014, 14:29.


                • #9
                  I am (a) expressing personal preferences (b) offering a specific argument why "dummy variable" is a lousy term. On (a) anyone else can candidly disagree and express their own personal preferences. On (b) I do have horror stories of "dummy" being misunderstood.

                  I don't know many political scientists and I've never been one. As you are one, Ric, I bow to your impressions on what is common in your field. Perhaps other political scientists will tune in and comment.

                  But I know lots of economists and I think they are generally well educated mathematically and widely aware what an indicator variable is. That is consistent with "dummy" being the majority term.

                  I must work on Bill Gould and try to convince him of my position.

                  This is difficult territory. For example, I dislike words that mix Greek and Latin roots and tried to dissuade StataCorp from the invented word "transmorphic". I failed. At the same time, usages can become entrenched to the extent that protest is silly and futile. On "television", it's too late.


                  • #10
                    I just checked three prominent econometric texts, Nick. Two had no mention of indicator variables. The third had an entry in the index and when you go to the page you see a chapter on "Regression on Dummy Variables." The word "indicator" does not appear in the chapter of this book (Gujarati, widely used since it is less mathematical than others). The other two are Johnston and an older text by Draper and Smith. If you find "dummy" disparaging, what about "regression"?


                    • #11
                      Naturally I agree; "dummy variable" is a (very) widely used term; I never said otherwise.

                      I am sure you warn your students about samples of 3. But your claim was quite different: that economists don't know what "indicator variables" are. I am confident that economists -- and to follow your example, econometricians -- worthy of the name and reputation know lots of mathematical terminology that they never use in their introductory texts or teaching.

                      "regression" is too well established for me to tilt at. Besides, I always enjoy telling the story of where it comes from.


                      • #12
                        To answer the original question, I think what is wanted is

                        Code:
                        reg y x
                        predict y1
                        reg y x pre
                        predict y2
                        reg y x post
                        predict y3
                        twoway (scatter y1 x, sort) (scatter y2 x, sort) (scatter y3 x, sort)
                        where -pre- and -post- are the dummy (indicator) variables.


                        • #13
                          I do imagine that's useful to somebody but it doesn't correspond to the OP's last post, which signalled a prior multiple regression.


                          • #14
                            Originally posted by cjevansaicp
                            Actually I've created a dummy "pre" with the value of zero and a dummy "post" with the value of one. So my regressions are

                            regress y x1 x2 x3 x4
                            regress y x1 x2 x3 x4 pre
                            regress y x1 x2 x3 x4 post

                            So, if I understand your response, my code should read:

                            twoway (lfit y x) (lfit y x if pre) (lfit y x if post)?
                            Well, no: now your regressions have multiple x variables, but the twoway command only has one.
                            You need to describe your analysis in more detail. If we know what question you are trying to address we may be able to provide more helpful responses.

                            Going back to the first example with only one x.
                            You are fitting a line \(y = mx + c\) where m is the slope of the line and c is the intercept, where the line crosses the y-axis.

                            To identify the two time periods only one dummy variable is needed, not two.

                            Code:
                            gen byte post = year >= 1992 if !missing(year)
                            regress y x
                            regress y x post
                            regress y c.x##i.post
                            The first regression ignores the time period.
                            The second regression allows the intercept to be different between time periods, but the slope is the same.
                            The third regression allows both the slope and the intercept to differ between the time periods.
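
                            To put the fitted lines on a scatter plot, one option is to predict from the model and draw the predictions by period. A sketch for the third model, assuming post has been generated as a 0/1 indicator (yhat is a hypothetical variable name):
                            Code:
                            * sketch only: c. marks x as continuous in the interaction
                            regress y c.x##i.post
                            predict yhat
                            twoway (scatter y x) ///
                                   (line yhat x if post==0, sort) ///
                                   (line yhat x if post==1, sort), ///
                                   legend(order(2 "Before 1992" 3 "1992 onward"))
                            Because this model lets both slope and intercept differ, the two lines coincide with what lfit would draw separately for each period.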

                            Which one do you want?
