Dummy Variable

Master Student

Join Date: Sep 2017

Posts: 1
#1

15 Sep 2017, 00:52

Hi everyone,
we want to run a regression analysis with a dummy variable as a control variable.
We imported our data from an excel sheet, and our variable was already coded as 1 and 0 in excel. We are not sure whether the results of our regression analysis are correct.
Does STATA already recognize this as a dummy variable? Do we have to mark it explicitly with commands in STATA?

Thank you :-)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

15 Sep 2017, 01:01

Master Student (as per FAQ, please note the strong preference on this forum for real given and family names. Obviously, nobody is forced to comply with that standard, but complying with it seems to increase the poster's chances of getting helpful replies. If you decide to follow that road only in part, please avoid nicknames such as -Thomas Bayes-, -Roger Federer-, -Dare Devil-. Thanks).
See -help fvvarlist-.
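The question in #1 can also be checked directly before any modelling. A minimal sketch, using the shipped auto data purely as a stand-in for the posters' Excel file (their file and variable names are not given, so `foreign` here is only a placeholder for the imported dummy):

```stata
* Stand-in data: in practice this would be your own -import excel- call
sysuse auto, clear

* An indicator imported from a spreadsheet must be stored as numeric, not string;
* if this fails, the variable needs -destring- (or -encode-) first
confirm numeric variable foreign

* Every nonmissing value should be 0 or 1
assert inlist(foreign, 0, 1) if !missing(foreign)
tabulate foreign, missing

* The i. prefix declares the variable explicitly as a factor (indicator) variable
regress mpg i.foreign
```

Nothing needs to be "marked" for the regression itself to be correct; the checks only confirm that the variable really is a numeric 0/1 column.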

Kind regards,
Carlo
(Stata 19.0)

Nick Cox

Join Date: Mar 2014
Posts: 35696

15 Sep 2017, 01:10

Please read the FAQ Advice carefully, especially

http://www.statalist.org/forums/help#realnames why we ask for real names

http://www.statalist.org/forums/help#stata showing examples, and not expecting that we can see precisely what you've done

http://www.statalist.org/forums/help#spelling Stata, not STATA (and for that matter Excel, not excel; it's in your best interests to know how to spell software names correctly in presentations, papers and books you write)

The question is one you can answer yourself. Let's see if Stata cares if you declare a dummy (indicator) variable as such.

Code:

. sysuse auto, clear 
(1978 Automobile Data)

. levelsof foreign 
0 1

. regress mpg foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     13.18
       Model |  378.153515         1  378.153515   Prob > F        =    0.0005
    Residual |  2065.30594        72  28.6848048   R-squared       =    0.1548
-------------+----------------------------------   Adj R-squared   =    0.1430
       Total |  2443.45946        73  33.4720474   Root MSE        =    5.3558

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |   4.945804   1.362162     3.63   0.001     2.230384    7.661225
       _cons |   19.82692   .7427186    26.70   0.000     18.34634    21.30751
------------------------------------------------------------------------------

. regress mpg i.foreign 

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     13.18
       Model |  378.153515         1  378.153515   Prob > F        =    0.0005
    Residual |  2065.30594        72  28.6848048   R-squared       =    0.1548
-------------+----------------------------------   Adj R-squared   =    0.1430
       Total |  2443.45946        73  33.4720474   Root MSE        =    5.3558

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   4.945804   1.362162     3.63   0.001     2.230384    7.661225
       _cons |   19.82692   .7427186    26.70   0.000     18.34634    21.30751
------------------------------------------------------------------------------

In this instance, the results are identical, just presented slightly differently. After all, the algebra is identical: either way, the regressor is the same column of 0s and 1s.
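A quick way to see why the algebra is identical: with a single 0/1 regressor, the slope is the difference between the two group means and the constant is the mean of the group coded 0. A sketch that checks this on the same auto data (the scalar names are arbitrary):

```stata
sysuse auto, clear

* Mean mpg in each group
summarize mpg if foreign == 0, meanonly
scalar mean0 = r(mean)
summarize mpg if foreign == 1, meanonly
scalar mean1 = r(mean)

* Plain 0/1 regressor
regress mpg foreign
scalar b_plain = _b[foreign]

* Same variable declared as a factor variable
regress mpg i.foreign
scalar b_factor = _b[1.foreign]

* Same estimate both ways, equal to the difference in group means
assert reldif(b_plain, b_factor) < 1e-8
assert reldif(b_factor, mean1 - mean0) < 1e-8

* The constant is the mean of the base group (foreign == 0)
assert reldif(_b[_cons], mean0) < 1e-8
```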

However, there are other contexts in which it's a good idea to be explicit about which variables are indicators. See

Code:

help fvvarlist
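As a hedged illustration of two things -help fvvarlist- covers that a bare 0/1 variable does not give you: choosing the base level with the ib#. prefix, and interacting the indicator with a continuous variable using ## (with c. marking the continuous one). The choice of weight as the interacting variable is only for illustration.

```stata
sysuse auto, clear

* Default base is the lowest level (0 = Domestic)
regress mpg i.foreign
scalar b_foreign = _b[1.foreign]

* ib1. makes level 1 (Foreign) the base, so the reported contrast flips sign
regress mpg ib1.foreign
assert reldif(_b[0.foreign], -b_foreign) < 1e-8

* ## adds both main effects and their interaction; c. marks weight as continuous
regress mpg i.foreign##c.weight
```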


Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#4

15 Sep 2017, 08:10

I can think of one situation that arises fairly often where the use of foreign vs i.foreign makes a difference. If you ask -margins- to calculate marginal effects of this variable, it will use a different definition for i.foreign than it uses for foreign: with foreign it computes a derivative (an instantaneous rate of change), whereas with i.foreign it computes the discrete change from 0 to 1. In a linear model the end result will be the same (at least to within rounding error), but in a non-linear model the results will usually be different.
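A sketch of the contrast. The linear part reuses the auto example; for the non-linear part, the shipped nlsw88 data and the variables union, collgrad and age are assumptions chosen only to have a binary outcome and a 0/1 predictor, not anything from this thread.

```stata
* Linear model: derivative and discrete change coincide
sysuse auto, clear
regress mpg foreign
margins, dydx(foreign) post
scalar lin_cont = _b[foreign]

regress mpg i.foreign
margins, dydx(foreign) post
scalar lin_fact = _b[1.foreign]

assert reldif(lin_cont, lin_fact) < 1e-8

* Non-linear model (logit)
sysuse nlsw88, clear

* collgrad entered as continuous: average derivative
logit union collgrad age
margins, dydx(collgrad) post
scalar nl_cont = _b[collgrad]

* collgrad entered as a factor variable: average discrete change from 0 to 1
logit union i.collgrad age
margins, dydx(collgrad) post
scalar nl_fact = _b[1.collgrad]

display "derivative: " nl_cont "   discrete change: " nl_fact
```

The point is not the particular numbers but that the two scalars agree after -regress- and generally differ after -logit-.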
Richard Williams

Join Date: Apr 2014

Posts: 4987
#5

15 Sep 2017, 16:56

Building on what Clyde says, Nick made an interesting point about 0/1 variables a while back. If Stata encounters a 0/1 variable, it doesn't know whether (a) it really can take only those two values, or (b) it is continuous and can take other values, but 0 and 1 were the only ones observed in the sample. So, by default, margins treats it as continuous unless you use the i. notation, in which case it knows to treat it as categorical.
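One visible consequence, sketched on the auto data: -margins- can only report predictions at each level of a variable it knows is categorical. With foreign entered as a plain regressor, asking for margins by foreign is rejected; with i.foreign, it gives the predicted mean for each group.

```stata
sysuse auto, clear

* foreign entered as a plain (continuous) regressor
regress mpg foreign

* margins cannot compute predictions by level of a non-factor variable
capture margins foreign
assert _rc != 0

* With the i. prefix, margins treats foreign as categorical
regress mpg i.foreign
margins foreign
```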

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam