Scatter plots

cjevansaicp

Join Date: Jun 2014

Posts: 3
#1

Scatter plots

07 Jun 2014, 16:59

First of all, I'm just learning STATA. So if my question is too basic, forgive me. I am running a regression of an equation that uses a dummy variable for pre-1992 and post-1992 data. Therefore, I will be running a regression for the equation without the dummy, with the dummy covering the pre-1992 data, and one with the dummy covering post-1992 data. I would like to create a scatter plot showing all three regression lines. How can I do this?
Kieran McCaul

Join Date: Apr 2014

Posts: 60
#2

07 Jun 2014, 17:35

You need to have a look at lfit, not scatter.

If your regressions are something like this:

Code:

regress y x
regress y x if year<1992
regress y x if year>=1992

then this should do the trick:

Code:

twoway (lfit y x) (lfit y x if year<1992) (lfit y x if year>=1992)
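
Since the original question asked for a scatter plot carrying all three lines, the lfit overlays can be combined with a scatter layer. The following is only a sketch: it assumes the variable names y, x and year used in the commands above, the marker and legend choices are illustrative, and the extra !missing(year) condition is there because Stata treats missing values as larger than any number.

```stata
* Sketch for a do-file (/// continues a command onto the next line).
* Assumes variables y, x and year exist, as in the commands above.
twoway (scatter y x, msymbol(oh) mcolor(gs10))        ///
       (lfit y x)                                      ///
       (lfit y x if year < 1992)                       ///
       (lfit y x if year >= 1992 & !missing(year)),    ///
       legend(order(2 "All years" 3 "Before 1992" 4 "1992 onwards"))
```

Each lfit layer fits its own one-predictor regression, so the lines shown are not taken from any earlier regress command.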
Comment
cjevansaicp

Join Date: Jun 2014

Posts: 3
#3

07 Jun 2014, 23:15

Actually I've created a dummy "pre" with the value of zero and a dummy "post" with the value of one. So my regressions are

regress y x1 x2 x3 x4
regress y x1 x2 x3 x4 pre
regress y x1 x2 x3 x4 post

So, if I understand your response, my code should read:

twoway (lfit y x) (lfit y x if pre) (lfit y x if post)?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35670
#4

08 Jun 2014, 01:47

That's legal, but note that there is no connection between the regress commands and the twoway command.

The first does various (multiple) regressions; the second shows the results of various regressions with one predictor.

If that's what you want, OK. If it's not, we need a better explanation of what you want to do, but check out margins and marginsplot, possibly.
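
As a rough illustration of the margins and marginsplot route, here is a minimal sketch. It assumes the variable names y, x1 to x4 and post from the previous post, enters post with the i. prefix so that margins can treat it as a factor variable, and picks the smallest and largest observed x1 as the plotting points purely for convenience.

```stata
* Sketch only. y, x1-x4 and post are the names used in this thread.
regress y x1 x2 x3 x4 i.post

* Evaluate predictions at the smallest and largest x1 in the estimation sample.
summarize x1 if e(sample), meanonly
local lo = r(min)
local hi = r(max)

* Predicted y for each value of post at those two x1 values,
* averaging over the observed values of x2-x4.
margins post, at(x1=(`lo' `hi'))
marginsplot, noci
```

Because this model has no interaction between x1 and post, the two plotted lines should be parallel; they differ only by the coefficient on post.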

(It's my role today to remind you that "Stata" is the way to write it. See the FAQ Advice, all the way to the end, please.)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35670
#5

08 Jun 2014, 04:24

... except that if all your values of pre are 0, you will see nothing for the second plot!

i.e. on second thoughts you need just one indicator (what you call dummy).
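
A quick way to check this before plotting is to count what each if condition selects. This is a sketch assuming the variables pre and post from the earlier post; note that a condition such as "if pre" is true for every nonzero value, including missing.

```stata
* pre and post are the variables named in the posts above.
tabulate pre post, missing   // cross-tabulate the two indicators, showing missing values
count if pre                 // observations selected by "if pre" (nonzero, including missing)
count if post                // observations selected by "if post"
```

If the first count is 0, the layer drawn with "if pre" has nothing to plot.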
Comment
cjevansaicp

Join Date: Jun 2014

Posts: 3
#6

08 Jun 2014, 23:36

Well, in certain circles in the States, they are called indicators. But, in most of the textbooks, they're called dummies.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35670
#7

09 Jun 2014, 01:37

I did not explain my wording. I've heard of too many occasions when the term "dummy variables" has been wildly misread as offensive or disparaging, which is good enough reason to me to prefer the term "indicator variable".

It's hard to know what's majority usage across statistical science from sampling several texts in just one application area, which is what most people do. But here majority usage is immaterial to my preference.

Another area for small debates is what you call dependent and independent variables, that or something else. It wouldn't surprise me if dependent and independent were still the most common terms, but that doesn't stop them being lousy choices.
Comment
Euslaner

Join Date: Apr 2014

Posts: 186
#8

09 Jun 2014, 14:25

Nick, you might have heard this somewhere, but I don't know of any political scientist or economist who would know what an "indicator" variable is. The standard term in these disciplines is "dummy". A Google search on "econometrics dummy variables" leads to lots of links. A Google search on "econometrics indicator variables" leads to lots of links for econometrics and dummy variables. Typing "findit dum" leads to a FAQ by Bill Gould on "How do I create dummy variables" (http://www.stata.com/support/faqs/da...mmy-variables/ ) and many other links.

Ric Uslaner

Last edited by Euslaner; 09 Jun 2014, 14:29.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35670
#9

09 Jun 2014, 14:41

I am (a) expressing personal preferences and (b) offering a specific argument why "dummy variable" is a lousy term. On (a) anyone else can candidly disagree and express their own personal preferences. On (b) I do have horror stories of "dummy" being misunderstood.

I don't know many political scientists and I've never been one. As you are one, Ric, I bow to your impressions on what is common in your field. Perhaps other political scientists will tune in and comment.

But I know lots of economists and I think they are generally well educated mathematically and widely aware what an indicator variable is. That is consistent with "dummy" being the majority term.

I must work on Bill Gould and try to convince him of my position.

This is difficult territory. For example, I dislike words that mix Greek and Latin roots and tried to dissuade StataCorp from the invented word "transmorphic". I failed. At the same time, usages can become entrenched to the extent that protest is silly and futile. On "television", it's too late.
Comment
Euslaner

Join Date: Apr 2014

Posts: 186
#10

09 Jun 2014, 15:38

I just checked three prominent econometric texts, Nick. Two had no mention of indicator variables. The third had an entry in the index and when you go to the page you see a chapter on "Regression on Dummy Variables." The word "indicator" does not appear in the chapter of this book (Gujarati, widely used since it is less mathematical than others). The other two are Johnston and an older text by Draper and Smith. If you find "dummy" disparaging, what about "regression"?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35670
#11

09 Jun 2014, 15:50

Naturally I agree; "dummy variable" is a (very) widely used term; I never said otherwise.

I am sure you warn your students about samples of 3. But your claim was quite different: that economists don't know what "indicator variables" are. I am confident that economists -- and to follow your example, econometricians -- worthy of the name and reputation know lots of mathematical terminology that they never use in their introductory texts or teaching.

"regression" is too well established for me to tilt at. Besides, I always enjoy telling the story of where it comes from.
Comment
Jeph Herrin

Join Date: Apr 2014

Posts: 335
#12

10 Jun 2014, 11:47

To answer the original question, I think what is wanted is

Code:

reg y x
predict y1
reg y x pre
predict y2
reg y x post
predict y3
twoway (scatter y1 x, sort) (scatter y2 x, sort) (scatter y3 x, sort)

Where -pre- and -post- are the dummy indicators.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35670
#13

10 Jun 2014, 11:59

I do imagine that's useful to somebody but it doesn't correspond to the OP's last post, which signalled a prior multiple regression.
Comment
Kieran McCaul

Join Date: Apr 2014

Posts: 60
#14

10 Jun 2014, 16:00

Originally posted by cjevansaicp

Actually I've created a dummy "pre" with the value of zero and a dummy "post" with the value of one. So my regressions are

regress y x1 x2 x3 x4
regress y x1 x2 x3 x4 pre
regress y x1 x2 x3 x4 post

So, if I understand your response, my code should read:

twoway (lfit y x) (lfit y x if pre) (lfit y x if post)?

Well no, now you have multiple x variables and the twoway command only has one.
You need to describe your analysis in more detail. If we know what question you are trying to address we may be able to provide more helpful responses.

Going back to the first example with only one x.
You are fitting a line \(y = mx + c\) where m is the slope of the line and c is the intercept, where the line crosses the y-axis.

To identify the two time periods only one dummy variable is needed, not two.

Code:

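* post is 1 from 1992 onwards, 0 before, and missing if year is missing
gen byte post = year >= 1992 if !missing(year)
regress y x
regress y x post
* c. marks x as continuous; a bare x##post would treat x as categorical
regress y c.x##i.post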

The first regression ignores the time period.
The second regression allows the intercept to be different between time periods, but the slope is the same.
The third regression allows both the slope and the intercept to differ between the time periods.
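
Written out in the notation of the line equation above, the three models are roughly as follows. The symbols d and e are labels introduced here for the extra coefficients; they do not come from the Stata output.

\( y = c + m x \)
\( y = c + m x + d \cdot \text{post} \)
\( y = c + m x + d \cdot \text{post} + e \cdot (x \times \text{post}) \)

In the second model the post-1992 intercept is \(c + d\) with the slope still \(m\); in the third the post-1992 intercept is \(c + d\) and the slope is \(m + e\).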

Which one do you want?
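
Whichever model is chosen, its fitted values can be drawn over the raw data. The following is a minimal sketch rather than a definitive recipe: it assumes y, x and post exist as above, and yhat1 to yhat3 are hypothetical names for new variables.

```stata
* Assumes y, x and post exist as above; yhat1-yhat3 are hypothetical names.
regress y x
predict yhat1            // fitted values, one common line
regress y x i.post
predict yhat2            // same slope, different intercepts
regress y c.x##i.post
predict yhat3            // different slopes and intercepts

* Draw the pooled line and the third model (one line per period) over the data.
twoway (scatter y x, mcolor(gs10))                 ///
       (line yhat1 x, sort)                        ///
       (line yhat3 x if post == 0, sort)           ///
       (line yhat3 x if post == 1, sort),          ///
       legend(order(2 "Pooled" 3 "Pre-1992" 4 "1992 onwards"))
```

Swapping yhat2 for yhat3 in the last two layers would show the parallel-lines model instead.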
Comment
