
  • Question about regress y if x == 0

    I came across some code that looks like this:
    Code:
    regress y if x == 0
    Both x and y are dummy variables. What does this regression do? This is the first time I have seen a regression without x (I am used to reg y x).


    Many thanks in advance!

  • #2
    It looks as if whoever wrote it wants something like the mean, the standard error of the mean, and the confidence interval for y when x is zero. Unless the author wants something else that is specifically returned by -regress- (such as a test of whether the mean differs from zero, or the variance or SD), it would seem more straightforward, at least to me, to have used -ci means-.
    Code:
    sysuse auto
    regress mpg if !foreign
    ci means mpg if !foreign


    • #3
      Yao:
      as an aside to Joseph's helpful insight, you can easily verify for yourself that:
      Code:
      . sysuse auto.dta
      . regress mpg if !foreign
      
            Source |       SS           df       MS      Number of obs   =        52
      -------------+----------------------------------   F(0, 51)        =      0.00
             Model |           0         0           .   Prob > F        =         .
          Residual |  1147.44231        51  22.4988688   R-squared       =    0.0000
      -------------+----------------------------------   Adj R-squared   =    0.0000
             Total |  1147.44231        51  22.4988688   Root MSE        =    4.7433
      
      ------------------------------------------------------------------------------
               mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             _cons |   19.82692    .657777    30.14   0.000     18.50638    21.14747
      ------------------------------------------------------------------------------
      
      * which yields the very same results as:
      
      . mean mpg if !foreign
      
      Mean estimation                   Number of obs   =         52
      
      --------------------------------------------------------------
                   |       Mean   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
               mpg |   19.82692    .657777      18.50638    21.14747
      --------------------------------------------------------------
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Thank you all for the replies above. Now it is clear. But let me ask the question another way: given that both y and x are dummy variables, what is the difference between

        Code:
        regress y if x == 0
        Code:
        regress y x


        • #5
          First, if yvar is binary, -logistic- or -logit- should preferably be used instead of the -regress- command. I am thinking here of modeling, that is, of adding further xvars.

          Second, the "if" condition selects the zero category of xvar. For example, if xvar were "diabetic", we would have selected those who are not diabetic.
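
          As a minimal sketch of the first point, using the auto data rather than the poster's data: with a binary outcome and a single dummy regressor the model is saturated, so -regress- and -logit- should imply the same fitted probabilities, and the two only start to differ once further xvars are added. The variable highprice and the 6000 cutoff are assumptions made up for this illustration.

          ```stata
          * Sketch on the auto data; highprice is a hypothetical binary outcome
          sysuse auto, clear
          generate byte highprice = price > 6000 if !missing(price)

          * Linear probability model: fitted values are the two group proportions
          quietly regress highprice foreign
          predict double p_lpm

          * Logit with the same single dummy regressor: predicted probabilities
          quietly logit highprice foreign
          predict double p_logit

          * Compare the two sets of fitted probabilities side by side
          summarize p_lpm p_logit
          tabulate foreign, summarize(p_logit)
          ```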
          Best regards,

          Marcos


          • #6
            Yao:
            Marcos has already replied helpfully to your last query.
            Marcos is also right in underlining that, provided the regressand (y) is a two-level categorical variable, there's no room for -regress-.
            Possibly, whoever created the example was trying to investigate something that has basically no bearing on practical statistics.
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              Let me add some background on these two commands. The dependent variable y here is whether you receive the offer, i.e., the take-up rate. The x is the treatment indicator.

              To build Table 1 of this paper (https://academic.oup.com/qje/article/133/3/1561/4768294), the authors' do-file includes the two commands above.

              For example, I want to know the mean value of y given x = 0. Then I can do
              Code:
              summarize y if x == 0
              (this one I know), or
              Code:
              regress y if x == 0
              (this one I don't understand), or
              Code:
              regress y x
              In fact, I guess that if y and x are both dummies, then running a regression in this case is similar to running -tabulate-. Am I right?


              • #8
                Yao:
                thanks for the further clarifications.
                Glancing at the paper, it seems that the authors expressed y as a percentage and then (legitimately) used -regress- (because y is actually not a dummy).
                As far as the second part of your question is concerned:
                - -summarize-, -regress- and -mean-, each restricted by the same -if- qualifier, will give you back the same sample estimate of the mean (provided that some observations have -x- equal to zero).
                Conversely:
                Code:
                regress y x
                will use the observations at all values of the predictor x.
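
                The points above can be checked with a short sketch on the auto data. Here highprice is a hypothetical dummy standing in for y and foreign stands in for x; neither comes from the paper's do-file.

                ```stata
                sysuse auto, clear
                * Hypothetical dummy outcome standing in for y; foreign stands in for x
                generate byte highprice = price > 6000 if !missing(price)

                * Three routes to the mean of y among observations with x == 0
                summarize highprice if foreign == 0
                mean highprice if foreign == 0
                regress highprice if foreign == 0

                * Using all observations: _cons is the mean of y for x == 0, and
                * _cons plus the coefficient on x is the mean of y for x == 1
                regress highprice foreign
                lincom _cons + foreign

                * The same two group means laid out as a table
                tabulate foreign, summarize(highprice)
                ```

                When both variables are dummies, the group means are simply the proportions of 1s within each level of x, which is why the regression output resembles what -tabulate- reports.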
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  But I am still confused about what a regression without x does.

                  Code:
                  reg y if x == 0
                  When I look at this command, I am still very confused. Is the output above the same as that of reg y x, just with the constant row dropped? I checked both and found that the coefficient is the same but the SE differs.

                  When I see the word regression, I immediately picture reg y x. This is the first time I have seen reg y alone (without x), so I'm confused.


                  • #10
                    Yao:
                    I think we should consider two different scenarios, then.
                    1)
                    Code:
                    . sysuse auto
                    (1978 Automobile Data)
                    
                    . regress price if foreign==0
                    
                          Source |       SS           df       MS      Number of obs   =        52
                    -------------+----------------------------------   F(0, 51)        =      0.00
                           Model |           0         0           .   Prob > F        =         .
                        Residual |   489194801        51  9592054.92   R-squared       =    0.0000
                    -------------+----------------------------------   Adj R-squared   =    0.0000
                           Total |   489194801        51  9592054.92   Root MSE        =    3097.1
                    
                    ------------------------------------------------------------------------------
                           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                           _cons |   6072.423   429.4911    14.14   0.000     5210.184    6934.662
                    ------------------------------------------------------------------------------
                    
                    .
                    2)
                    Code:
                    . sysuse auto
                    (1978 Automobile Data)
                    
                    . regress price
                    
                          Source |       SS           df       MS      Number of obs   =        74
                    -------------+----------------------------------   F(0, 73)        =      0.00
                           Model |           0         0           .   Prob > F        =         .
                        Residual |   635065396        73  8699525.97   R-squared       =    0.0000
                    -------------+----------------------------------   Adj R-squared   =    0.0000
                           Total |   635065396        73  8699525.97   Root MSE        =    2949.5
                    
                    ------------------------------------------------------------------------------
                           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                           _cons |   6165.257   342.8719    17.98   0.000     5481.914      6848.6
                    ------------------------------------------------------------------------------
                    
                    . mean price
                    
                    Mean estimation                   Number of obs   =         74
                    
                    --------------------------------------------------------------
                                 |       Mean   Std. Err.     [95% Conf. Interval]
                    -------------+------------------------------------------------
                           price |   6165.257   342.8719      5481.914      6848.6
                    --------------------------------------------------------------
                    In the first scenario I condition the regression on level 0 of the categorical predictor (which does not mean that x=0), whereas in the second scenario I use -regress- as another way to estimate the mean over the whole sample.
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Carlo:

                      You showed that the two commands differ when y is not a dummy variable. Thanks! But consider this as well:
                      Code:
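                      sysuse auto, clear
                      * dummy outcome: 1 if price is above 6000, 0 otherwise
                      generate byte fprice = price > 6000 if !missing(price)
                      * constant-only regression on domestic cars: mean of fprice when foreign == 0
                      regress fprice if foreign == 0
                      * all cars: _cons is that same mean, foreign is the difference in means
                      regress fprice foreign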
                      Are they the same?


                      • #12
                        Code:
                        reg y x
                        This is the familiar model of y = a + bx + e where a is a constant and e is the error term.
                        Code:
                        reg y
                        Now the model is y = a + e. You're only estimating a constant here, which is the mean of y.
                        Code:
                        reg y if x ==0
                        This is the mean of y for all observations where x is 0.
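
                        A short derivation, offered as a sketch in the notation above (a, b, e), of why the constant is a mean in each case. The symbols \bar{y}, \bar{y}_0, \bar{y}_1 and n are introduced here for illustration, and x is assumed to be a 0/1 dummy.

                        ```latex
                        \begin{align*}
                        \text{reg y:}\quad & \min_a \sum_{i=1}^{n} (y_i - a)^2
                          \;\Rightarrow\; -2\sum_{i=1}^{n} (y_i - \hat a) = 0
                          \;\Rightarrow\; \hat a = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y \\
                        \text{reg y if x == 0:}\quad & \hat a = \bar y_0
                          \quad \text{(the same argument over the observations with } x_i = 0\text{)} \\
                        \text{reg y x, } x \in \{0,1\}\text{:}\quad & \hat a = \bar y_0, \qquad \hat a + \hat b = \bar y_1
                        \end{align*}
                        ```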

                        If we take the cars example:
                        Code:
                        . sysuse auto, clear
                        (1978 Automobile Data)
                        
                        . 
                        . * model: y = a + bx + e
                        . * a = mean price for domestic cars
                        . * a + b = mean price for foreign cars
                        . reg price foreign
                        
                              Source |       SS           df       MS      Number of obs   =        74
                        -------------+----------------------------------   F(1, 72)        =      0.17
                               Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
                            Residual |   633558013        72  8799416.85   R-squared       =    0.0024
                        -------------+----------------------------------   Adj R-squared   =   -0.0115
                               Total |   635065396        73  8699525.97   Root MSE        =    2966.4
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                             foreign |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
                               _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
                        ------------------------------------------------------------------------------
                        
                        . 
                        . * model: y = a + e
                        . * a = mean price for domestic cars
                        . reg price if foreign == 0
                        
                              Source |       SS           df       MS      Number of obs   =        52
                        -------------+----------------------------------   F(0, 51)        =      0.00
                               Model |           0         0           .   Prob > F        =         .
                            Residual |   489194801        51  9592054.92   R-squared       =    0.0000
                        -------------+----------------------------------   Adj R-squared   =    0.0000
                               Total |   489194801        51  9592054.92   Root MSE        =    3097.1
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                               _cons |   6072.423   429.4911    14.14   0.000     5210.184    6934.662
                        ------------------------------------------------------------------------------
                        The constant from reg price foreign equals the constant from reg price if foreign == 0, and the constant plus the coefficient on foreign equals the constant from reg price if foreign == 1. The standard errors differ because the residual variance is estimated from the whole sample in the first case and from each sub-sample separately in the latter.
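
                        To make the standard error point concrete, here is a small sketch that rebuilds both standard errors of _cons by hand from the Root MSE and the number of domestic cars. The scalar names are made up for this illustration. With a single dummy regressor, the standard error of _cons is the Root MSE divided by the square root of the number of observations with foreign == 0.

                        ```stata
                        sysuse auto, clear

                        * Pooled regression: one residual variance estimated from all 74 cars
                        quietly regress price foreign
                        scalar rmse_pooled = e(rmse)
                        scalar se_pooled = _se[_cons]
                        quietly count if foreign == 0
                        scalar n0 = r(N)
                        display "pooled RMSE / sqrt(n0)    = " rmse_pooled / sqrt(n0)
                        display "reported SE of _cons      = " se_pooled

                        * Subsample regression: residual variance from the 52 domestic cars only
                        quietly regress price if foreign == 0
                        display "subsample RMSE / sqrt(n0) = " e(rmse) / sqrt(e(N))
                        display "reported SE of _cons      = " _se[_cons]
                        ```

                        The divisor is the same in both cases (the square root of 52), so the whole difference between 411.363 and 429.4911 comes from the Root MSE: 2966.4 in the pooled fit versus 3097.1 in the domestic-only fit.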

                        Hope this helps.


                        • #13
                          Thank you so much!
