  • Dummy Variable

    Hi everyone,
    we want to run a regression analysis with a dummy variable as a control variable.
    We imported our data from an excel sheet and our variable was already coded into 1 and 0 in excel. We are not sure whether the results of our regression analysis are correct.
    Does STATA already recognize this as a dummy variable? Do we have to mark it explicitly with commands in STATA?

    Thank you :-)

  • #2
    Master Student (as per FAQ, please note the strong preference on this forum for real given and family names. Obviously, nobody is forced to comply with that standard, but complying with it seems to increase the poster's chances of getting helpful replies. If you decide to follow that road only in part, please avoid nicknames such as -Thomas Bayes-, -Roger Federer-, -Dare Devil-. Thanks).
    See -help fvvarlist-.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Please read the FAQ Advice carefully, especially

      http://www.statalist.org/forums/help#realnames why we ask for real names

      http://www.statalist.org/forums/help#stata showing examples, and not expecting that we can see precisely what you've done

      http://www.statalist.org/forums/help#spelling Stata, not STATA (and for that matter Excel, not excel; it's in your best interests to know how to spell software names correctly in presentations, papers and books you write)

      The question is one you can answer yourself. Let's see if Stata cares if you declare a dummy (indicator) variable as such.

      Code:
      . sysuse auto, clear 
      (1978 Automobile Data)
      
      . levelsof foreign 
      0 1
      
      . regress mpg foreign
      
            Source |       SS           df       MS      Number of obs   =        74
      -------------+----------------------------------   F(1, 72)        =     13.18
             Model |  378.153515         1  378.153515   Prob > F        =    0.0005
          Residual |  2065.30594        72  28.6848048   R-squared       =    0.1548
      -------------+----------------------------------   Adj R-squared   =    0.1430
             Total |  2443.45946        73  33.4720474   Root MSE        =    5.3558
      
      ------------------------------------------------------------------------------
               mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           foreign |   4.945804   1.362162     3.63   0.001     2.230384    7.661225
             _cons |   19.82692   .7427186    26.70   0.000     18.34634    21.30751
      ------------------------------------------------------------------------------
      
      . regress mpg i.foreign 
      
            Source |       SS           df       MS      Number of obs   =        74
      -------------+----------------------------------   F(1, 72)        =     13.18
             Model |  378.153515         1  378.153515   Prob > F        =    0.0005
          Residual |  2065.30594        72  28.6848048   R-squared       =    0.1548
      -------------+----------------------------------   Adj R-squared   =    0.1430
             Total |  2443.45946        73  33.4720474   Root MSE        =    5.3558
      
      ------------------------------------------------------------------------------
               mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           foreign |
          Foreign  |   4.945804   1.362162     3.63   0.001     2.230384    7.661225
             _cons |   19.82692   .7427186    26.70   0.000     18.34634    21.30751
      ------------------------------------------------------------------------------
      In the first instance, the results are identical, just presented slightly differently. After all, the algebra is identical.
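
      A quick way to confirm this without comparing the tables by eye is to store the slope from each fit and look at the difference. This is only a minimal sketch; the scalar names b_plain and b_factor are illustrative and not part of the original example.

      ```stata
      * Fit the same model both ways on the auto data used above
      sysuse auto, clear

      * foreign entered as a plain 0/1 regressor
      quietly regress mpg foreign
      scalar b_plain = _b[foreign]

      * foreign declared as a factor variable; the slope is stored under 1.foreign
      quietly regress mpg i.foreign
      scalar b_factor = _b[1.foreign]

      * The two slopes should agree up to rounding
      display "difference = " b_plain - b_factor
      ```

      Note that the coefficient name changes from foreign to 1.foreign once the factor-variable prefix is used, which matters if you refer to it later, for example in -test- or -lincom-.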

      However, there are other contexts in which it's a good idea to be explicit on what is an indicator. See

      Code:
      help fvvarlist


      • #4
        I can think of one situation that arises fairly often where the use of foreign vs i.foreign makes a difference. If you ask -margins- to calculate marginal effects of this variable, it will use a different definition for i.foreign than it uses for foreign. In a linear model the end result will be the same (at least to within rounding error) but in a non-linear model the results will usually be different.
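
        To see the difference in a non-linear model, something along these lines should do. It is a sketch under my own assumptions: I use the nlsw88 example data shipped with Stata, with union as the outcome and collgrad as the 0/1 regressor, none of which comes from the original question.

        ```stata
        * Illustrative data choice: nlsw88 ships with Stata
        sysuse nlsw88, clear

        * collgrad entered as a plain regressor:
        * margins reports the average derivative of Pr(union) with respect to collgrad
        quietly logit union collgrad age
        margins, dydx(collgrad)

        * collgrad declared as a factor variable:
        * margins reports the average discrete change in Pr(union) from collgrad = 0 to 1
        quietly logit union i.collgrad age
        margins, dydx(collgrad)
        ```

        The logit coefficients are the same in both fits; only the definition that -margins- applies to collgrad changes, so the two dy/dx values will generally not match.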


        • #5
          Building on what Clyde says, Nick made an interesting point about 0/1 variables a while back. If Stata encounters a 0/1 variable, it doesn't know whether (a) it really can have only those two values, or (b) it is continuous and can take more values than that, but 0 and 1 were the only ones observed in the sample. So, by default, margins treats it as continuous. If you use the i. notation, it knows to treat the variable as categorical.
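
          One place this shows up directly is when you ask -margins- for predictive margins at each level of the variable. A minimal sketch on the auto data follows; the scalar name rc_plain is just illustrative, and I would expect the first -margins- call to be refused rather than to run.

          ```stata
          sysuse auto, clear

          * foreign entered as a plain regressor: margins has no levels to work with,
          * so asking for margins by foreign is expected to fail
          quietly regress mpg foreign weight
          capture noisily margins foreign
          scalar rc_plain = _rc
          display "return code without i. = " rc_plain

          * foreign declared as a factor variable: margins at foreign = 0 and foreign = 1
          quietly regress mpg i.foreign weight
          margins foreign
          ```

          If you have already fitted the model without the i. prefix, -margins, at(foreign=(0 1))- is one way to get predictions at the two values without refitting.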
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam
