  • A simple regression with conditions


    Suppose I regress y on a, b, c, and d, and I want each regression to use only observations that satisfy certain conditions.

    (1) Current and lagged values of b are both nonzero, and current and lagged values of c are both nonzero. For this I would use:

    reg y a b c d if b!=0 & l.b!=0 & c!=0 & l.c!=0,r

    (2) Current and lagged values of b are both nonzero, and either the current or the lagged value of c is nonzero. My attempt:

    reg y a b c d if b!=0 & l.b!=0 & c!=0 | l.c !=0

    I think the problem here is that Stata might run the regression when either (b, lagged b, and c are all nonzero) OR (lagged c is nonzero), which is not my aim. My aim is to run the regression when b and lagged b are both nonzero and either c or lagged c is nonzero.



  • #2
    Replying to Mike Kraft's original post:

    You can use parentheses to group logical expressions. I believe what you are looking for is

    Code:
    reg y a b c d if (b!=0 & l.b!=0) & (c!=0 | l.c !=0)
    The first set of parentheses is not strictly necessary, but it makes your intent clearer.
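
    To see the difference the grouping makes, here is a minimal sketch on made-up data. The variable t and the five rows are assumptions for illustration only, not the poster's data. It relies on Stata evaluating & before |, so the unparenthesized condition is read as (b!=0 & l.b!=0 & c!=0) | (l.c!=0).

    ```stata
    * Hypothetical toy data: t is an assumed time variable
    clear
    input t b c
    1 1 1
    2 0 1
    3 1 0
    4 1 0
    5 1 1
    end
    tsset t

    * No parentheses: & is evaluated before |, so any row with l.c!=0
    * qualifies, even when b or l.b is zero
    count if b!=0 & l.b!=0 & c!=0 | l.c!=0

    * Parentheses: the b condition must hold in every selected row
    count if (b!=0 & l.b!=0) & (c!=0 | l.c!=0)
    ```

    On these rows the two counts should differ, because t==2 and t==3 satisfy only the unparenthesized version.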



    • #3
      Alan,
      I tried your code, and it gave me results different from:
      preserve
      keep if b!=0 & l.b!=0
      reg y a b c d if c!=0 | l.c !=0
      restore


      I understand that the two should produce identical results, but that turns out not to be the case. Do you know why, and which one is correct?



      • #4
        Because you are working with lags, a keep followed by regress (or any other command) with an if condition is not the same as putting the full condition on the regress command line.

        Consider the following example, using list instead of regress. First, we try it with the full condition:

        Code:
        . list
        
             +------------+
             | id   b   c |
             |------------|
          1. |  1   1   1 |
          2. |  2   0   1 |
          3. |  3   1   0 |
          4. |  4   1   0 |
          5. |  5   1   1 |
             |------------|
          6. |  6   1   1 |
          7. |  7   0   0 |
          8. |  8   1   0 |
          9. |  9   1   1 |
             +------------+
        
        . list if (b!=0 & l.b!=0) & (c!=0 | l.c!=0)
        
             +------------+
             | id   b   c |
             |------------|
          1. |  1   1   1 |
          5. |  5   1   1 |
          6. |  6   1   1 |
          9. |  9   1   1 |
             +------------+
        Note that id 4 did not satisfy the condition.

        Now, let's take the same data and perform the keep on it with the first part of the condition:

        Code:
        . preserve
        
        . keep if b!=0 & l.b!=0
        (4 observations deleted)
        
        . list
        
             +------------+
             | id   b   c |
             |------------|
          1. |  1   1   1 |
          2. |  4   1   0 |
          3. |  5   1   1 |
          4. |  6   1   1 |
          5. |  9   1   1 |
             +------------+
        Now, because we have eliminated the observations for id==2 and id==3, the lagged value of c for the observation with id==4 is different. Before the keep, l.c for id==4 picked up c from the observation where id==3, which is 0. Assuming the data are tsset on id, there is no longer an observation at id==3, so l.c for id==4 is now missing, and Stata treats missing as not equal to zero. Either way, l.c!=0 is now true for id==4 when it was false before.
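
        The change in the lag can be checked directly. The sketch below re-enters the nine observations listed above and assumes the data are tsset on id, which the output above does not show. The names lag_c_before and lag_c_after are made up for illustration.

        ```stata
        * Re-create the example data; tsset on id is an assumption
        clear
        input id b c
        1 1 1
        2 0 1
        3 1 0
        4 1 0
        5 1 1
        6 1 1
        7 0 0
        8 1 0
        9 1 1
        end
        tsset id

        * Lag of c computed on the full data
        gen lag_c_before = l.c

        preserve
        keep if b!=0 & l.b!=0

        * Lag of c recomputed after observations were dropped
        gen lag_c_after = l.c
        list id c lag_c_before lag_c_after, separator(0)
        restore
        ```

        Comparing the two lag columns for id==4 shows whether the keep changed the value that l.c refers to.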

        Now, when we add the second half of the condition with list, we see a different set of observations than we did originally:

        Code:
        . list if c!=0 | l.c!=0
        
             +------------+
             | id   b   c |
             |------------|
          1. |  1   1   1 |
          2. |  4   1   0 |
          3. |  5   1   1 |
          4. |  6   1   1 |
          5. |  9   1   1 |
             +------------+
        When working with lags, don't eliminate observations based on a partial condition, or the lagged values can change on you.
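
        One way to follow this advice, sketched here under the assumption that the data are already tsset, is to evaluate the full condition once on the complete data and store it in an indicator. The name insample is hypothetical.

        ```stata
        * Flag the estimation sample while every observation is still present,
        * so l.b and l.c refer to the original lagged values
        gen byte insample = (b!=0 & l.b!=0) & (c!=0 | l.c!=0)

        * Use the stored flag instead of dropping observations first
        regress y a b c d if insample
        ```

        Because the indicator is computed before anything is dropped, the lags inside it cannot shift afterwards.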
