
  • Exogeneity test for nonlinear unbalanced panel with single dummy endogenous explanatory variable using CRE.

    Hi,

    I'm essentially following Dr. Jeff Wooldridge's approach to testing whether a binary explanatory variable is endogenous when the structural equation/response function is a nonlinear Poisson model.

    I want to:
    A) Test for endogeneity of the binary y2 in an unbalanced panel
    B) After testing, obtain consistent estimates of my parameters of interest

    I have:
    1. An unbalanced panel
    2. A count response with overdispersion
    3. Potential endogeneity of y2

    I want to use the correlated random effects (CRE) approach with a control function to test for endogeneity of y2. Following the material, I've created a selection indicator that drops the observations in the unbalanced panel for which not all of the covariates are observed (in my case, only the period 1 observations for every ID). This relies on "the strict exogeneity of selection assumption" (the subject of the entire paper in the first attachment). Following the rest of the material, I've used the following two-step approach:

    1. Estimate a linear probability model (LPM) for the binary endogenous regressor by pooled OLS on all of the exogenous variables (the "instruments" in the broad sense). These are the control variables in the structural equation ("x"), the time dummies for the unbalanced panel (required for the CRE approach), and the excluded instruments (time dummies, instruments, and controls together make up "z"), plus the time averages of all of those variables ("zbar"). Note that this procedure is also described in Jeff Wooldridge's book Econometric Analysis of Cross Section and Panel Data, page 766.

    Code:
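    reg campaign_factor_a $zlist *_mean , vce(cluster household_key) // *_mean should hold zbar only (means of x, time dummies, instruments), not the mean of campaign_factor_a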
    2. Compute the residuals, which I'll call lpuhat

    Code:
    predict lpuhat, residuals
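    For reference, here is how I read steps 1 and 2 in equation form. The notation is my own, not the attachment's: s_it is a hypothetical selection indicator equal to 1 when (y, x, z) are all observed for household i in period t, T_i counts the selected periods, y_it2 is campaign_factor_a, and z_it stacks a constant, the controls x_it, the time dummies and the excluded instruments. It assumes the time averages are taken only over the selected periods, so period 1 never enters them in my data:

```latex
% Time averages over the selected periods only (s_{it} = 1 if period t is used for unit i)
T_i = \sum_{r=1}^{T} s_{ir}, \qquad
\bar{\mathbf{z}}_i = \frac{1}{T_i} \sum_{r=1}^{T} s_{ir}\,\mathbf{z}_{ir}

% Step 1: linear probability model, estimated by pooled OLS on the selected sample
y_{it2} = \mathbf{z}_{it}\boldsymbol{\pi}_2 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_2 + v_{it2}

% Step 2: the residual saved as lpuhat
\hat{v}_{it2} = y_{it2} - \mathbf{z}_{it}\hat{\boldsymbol{\pi}}_2 - \bar{\mathbf{z}}_i\hat{\boldsymbol{\xi}}_2
```

    Because the LPM includes a constant, the lpuhat residuals average to zero over the estimation sample by construction.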
    Then I ran two separate regressions to test for endogeneity, which I'll call 3a and 3b (page 17 of the second attachment).

    3a. Pooled Poisson QMLE of the count response y1 on y2, z-bar, the controls "x", lpuhat, and an offset. The coefficient on lpuhat is statistically significant, indicating endogeneity.

    Code:
    poisson visits_per_period campaign_factor_a $xlist  *_mean  c.lpuhat, offset(log_diff_days)  vce(cluster household_key) // 3a: y2-bar (campaign_factor_a_mean) must not be among the *_mean variables here
    3b. Pooled Poisson QMLE of the count response y1 on y2, y2-bar, z-bar, the controls "x", lpuhat, and an offset. The coefficient on lpuhat is again statistically significant, indicating endogeneity. I also get a statistically significant coefficient on y2-bar (at the 10% level). According to the material, 3b is a test only of idiosyncratic exogeneity, because including y2-bar controls for the unobserved heterogeneity.

    Code:
    poisson visits_per_period campaign_factor_a $xlist  *_mean  c.lpuhat, offset(log_diff_days)  vce(cluster household_key) //This includes y2_bar in *_mean
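    In my own notation, the two second-stage regressions fit the following conditional means by pooled Poisson QMLE. Here d_it stands for the days variable whose log is the offset log_diff_days (its coefficient is fixed at 1), v-hat_it2 is lpuhat, and y-bar_i2 is the time average of y2 over the selected periods; these symbol names are mine, introduced only for illustration:

```latex
% 3a
E(y_{it1} \mid \cdot) = \exp\big(\log d_{it} + \alpha_1 y_{it2} + \mathbf{x}_{it}\boldsymbol{\beta}_1
    + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + \rho_1 \hat{v}_{it2}\big)

% 3b: adds the time average of y2
E(y_{it1} \mid \cdot) = \exp\big(\log d_{it} + \alpha_1 y_{it2} + \mathbf{x}_{it}\boldsymbol{\beta}_1
    + \eta_1 \bar{y}_{i2} + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + \rho_1 \hat{v}_{it2}\big)

% Exogeneity test in both cases
H_0\colon \rho_1 = 0
```

    As I understand it, under H_0 the generated regressor lpuhat has a zero coefficient, so estimating it in the first step does not change the asymptotic distribution of the cluster-robust z statistic on lpuhat. The reported standard error is therefore valid for the test itself, but not for the other coefficients when rho_1 is nonzero.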
    Results
    Poisson regression                              Number of obs     =     14,238
                                                    Wald chi2(79)     =          .
                                                    Prob > chi2       =          .
    Log pseudolikelihood = -43841.69                Pseudo R2         =     0.4345

                            (Std. Err. adjusted for 1,584 clusters in household_key)
    --------------------------------------------------------------------------------
                   |               Robust
    visits_per_p~d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    campaign_fac~a |   .0244323   .0344804     0.71   0.479    -.0431481    .0920127
    campaign_fa~_b |   .1820243   .0277672     6.56   0.000     .1276016    .2364471
    campaign_fa~ab |   .6791518   .0192806    35.22   0.000     .6413625    .7169411
    campaign_f~abc |   .9160575   .0412163    22.23   0.000     .8352751      .99684
    campaign_f~_bc |   .5711531   .0416605    13.71   0.000     .4894999    .6528062
    campaign_fa~_c |   .0807524   .0357571     2.26   0.024     .0106698    .1508351
    campaign_fa~ac |   .6084234   .0284141    21.41   0.000     .5527328     .664114
    items_per_pe~3 |    .001461   .0000906    16.12   0.000     .0012834    .0016387
        _Iperiod_2 |   .5242697    .056334     9.31   0.000      .413857    .6346823
        _Iperiod_3 |   .3950308   .0483376     8.17   0.000     .3002909    .4897706
        _Iperiod_4 |   .4836983   .0577899     8.37   0.000     .3704323    .5969644
        _Iperiod_5 |   .4117123   .0494826     8.32   0.000     .3147283    .5086964
        _Iperiod_6 |   .4684135   .0585501     8.00   0.000     .3536574    .5831697
        _Iperiod_7 |   .3608489   .0512862     7.04   0.000     .2603298     .461368
        _Iperiod_8 |   .4051761    .056305     7.20   0.000     .2948203    .5155319
        _Iperiod_9 |   .3288929   .0513919     6.40   0.000     .2281666    .4296191
       _Iperiod_10 |   .3747675   .0585755     6.40   0.000     .2599616    .4895734
       _Iperiod_11 |   .3050915   .0544466     5.60   0.000     .1983781    .4118049
       _Iperiod_12 |   .3467143   .0571859     6.06   0.000      .234632    .4587967
       _Iperiod_13 |    .231065   .0557044     4.15   0.000     .1218863    .3402437
       _Iperiod_14 |   .3232659   .0562936     5.74   0.000     .2129325    .4335992
       _Iperiod_15 |   .2694259   .0579282     4.65   0.000     .1558886    .3829631
       _Iperiod_16 |   .2889558   .0661618     4.37   0.000     .1592811    .4186306
       _Iperiod_17 |    .306312   .0613791     4.99   0.000     .1860113    .4266128
       _Iperiod_18 |   .3917844   .0668209     5.86   0.000     .2608178    .5227509
       _Iperiod_19 |   .2412787   .0683353     3.53   0.000     .1073439    .3752135
       _Iperiod_20 |    .162989   .0681987     2.39   0.017      .029322     .296656
       _Iperiod_21 |   .2581352   .0783971     3.29   0.001     .1044797    .4117907
       _Iperiod_22 |   .3792274    .084947     4.46   0.000     .2127343    .5457206
       _Iperiod_23 |   .0598377   .0855905     0.70   0.484    -.1079167    .2275921
       _Iperiod_24 |   .1010361   .1201353     0.84   0.400    -.1344248     .336497
       _Iperiod_25 |   .3724466    .118256     3.15   0.002     .1406691    .6042241
       _Iperiod_26 |   .1775822   .2022082     0.88   0.380    -.2187385     .573903
       _Iperiod_27 |   .1494721    .096566     1.55   0.122    -.0397938     .338738
       _Iperiod_28 |  -.1009292   .2272399    -0.44   0.657    -.5463112    .3444527
       _Iperiod_29 |   .1885725   .1546619     1.22   0.223    -.1145592    .4917042
       _Iperiod_30 |    -.54734   .4384815    -1.25   0.212    -1.406748    .3120679
       _Iperiod_31 |   .1407992   .2185676     0.64   0.519    -.2875855    .5691838
       _Iperiod_32 |   -10.6318   .9994915   -10.64   0.000    -12.59077   -8.672832
       _Iperiod_33 |   .2351399   .0304283     7.73   0.000     .1755015    .2947782
       _Iperiod_34 |          0  (omitted)
    campai~_b_mean |  -.0653849   .3781024    -0.17   0.863    -.8064519    .6756821
    campai~ab_mean |  -.0158255   .2287326    -0.07   0.945    -.4641332    .4324822
    campa~abc_mean |     .35993    .591749     0.61   0.543    -.7998766    1.519737
    campa~_bc_mean |   -.637022   .3933618    -1.62   0.105    -1.407997    .1339529
    campai~_c_mean |  -.8971231   .5502648    -1.63   0.103    -1.975622    .1813762
    campai~ac_mean |   .3519183   .3269151     1.08   0.282    -.2888234    .9926601
    _Iperiod_2_m~n |  -1.005768   4.595628    -0.22   0.827    -10.01303    8.001498
    _Iperiod_3_m~n |          0  (omitted)
    _Iperiod_4_m~n |  -1.114617   2.295491    -0.49   0.627    -5.613696    3.384463
    _Iperiod_5_m~n |    2.20383   2.337211     0.94   0.346     -2.37702    6.784679
    _Iperiod_6_m~n |   1.387314   2.417485     0.57   0.566    -3.350869    6.125496
    _Iperiod_7_m~n |   .2184244   2.517606     0.09   0.931    -4.715993    5.152842
    _Iperiod_8_m~n |  -.3355481   2.619149    -0.13   0.898    -5.468986     4.79789
    _Iperiod_9_m~n |   2.600205   2.662685     0.98   0.329    -2.618562    7.818972
    _Iperiod_10_~n |  -.8957391   2.335228    -0.38   0.701    -5.472701    3.681223
    _Iperiod_11_~n |   2.475095   2.389934     1.04   0.300     -2.20909     7.15928
    _Iperiod_12_~n |   2.622049   2.534325     1.03   0.301    -2.345138    7.589236
    _Iperiod_13_~n |   .1924305    2.58857     0.07   0.941    -4.881073    5.265934
    _Iperiod_14_~n |   1.570859    2.45265     0.64   0.522    -3.236246    6.377964
    _Iperiod_15_~n |   1.055761   2.506522     0.42   0.674    -3.856931    5.968454
    _Iperiod_16_~n |  -.3317405   2.653816    -0.13   0.901    -5.533125    4.869644
    _Iperiod_17_~n |   3.802867   2.589789     1.47   0.142    -1.273027     8.87876
    _Iperiod_18_~n |   .6874144   2.660317     0.26   0.796    -4.526712    5.901541
    _Iperiod_19_~n |   1.154653   2.769009     0.42   0.677    -4.272504     6.58181
    _Iperiod_20_~n |   1.459548   2.736716     0.53   0.594    -3.904318    6.823414
    _Iperiod_21_~n |   4.222689    2.83617     1.49   0.137    -1.336101     9.78148
    _Iperiod_22_~n |  -2.280563   3.843791    -0.59   0.553    -9.814254    5.253129
    _Iperiod_23_~n |   5.494265   4.079368     1.35   0.178     -2.50115    13.48968
    _Iperiod_24_~n |  -4.046938     3.2166    -1.26   0.208    -10.35136    2.257483
    _Iperiod_25_~n |    4.73571    3.79902     1.25   0.213    -2.710233    12.18165
    _Iperiod_26_~n |   14.96615   5.598303     2.67   0.008     3.993677    25.93862
    _Iperiod_27_~n |  -4.979484   5.133265    -0.97   0.332     -15.0405     5.08153
    _Iperiod_28_~n |   .2224332   2.815017     0.08   0.937    -5.294899    5.739765
    _Iperiod_29_~n |   4.635166   3.196474     1.45   0.147    -1.629808    10.90014
    _Iperiod_30_~n |     13.139   5.703868     2.30   0.021     1.959628    24.31838
    _Iperiod_31_~n |  -7.063863   10.29276    -0.69   0.493     -27.2373    13.10957
    _Iperiod_32_~n |          0  (omitted)
    _Iperiod_33_~n |          0  (omitted)
    pre_days_be~an |   -.010314   .0016647    -6.20   0.000    -.0135767   -.0070512
    sales_per_pe~n |   .0004635   .0001722     2.69   0.007      .000126    .0008009
    pre_items_mean |  -.0005158   .0001057    -4.88   0.000     -.000723   -.0003087
    pre_sales_mean |  -.0002386   .0000549    -4.34   0.000    -.0003462   -.0001309
    campaig~a_mean |  -.0087831   .4472429    -0.02   0.984    -.8853631    .8677969
    items_per_pe~n |   .0022165   .0006399     3.46   0.001     .0009624    .0034706
    baseline_spe~n |   .0000851   .0000154     5.54   0.000      .000055    .0001152
    control_vr_m~n |   .9682009   .1835686     5.27   0.000      .608413    1.327989
            lpuhat |   .2433615   .0408389     5.96   0.000     .1633187    .3234044
             _cons |  -.9555393   2.394563    -0.40   0.690    -5.648796    3.737718
     log_diff_days |          1  (offset)
    --------------------------------------------------------------------------------
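    As a quick check on how the lpuhat row above is built: the z statistic is the coefficient divided by its cluster-robust standard error, and the 95% interval is the coefficient plus or minus the standard normal critical value (1.959964) times that standard error. The symbol rho_1 for the lpuhat coefficient is my own label:

```latex
z = \frac{\hat{\rho}_1}{\widehat{\mathrm{se}}(\hat{\rho}_1)} = \frac{0.2433615}{0.0408389} \approx 5.96

\hat{\rho}_1 \pm 1.959964 \times 0.0408389 = 0.2433615 \pm 0.0800428
    \approx [\,0.1633187,\ 0.3234043\,]
```

    This reproduces the reported z = 5.96 and interval [.1633187, .3234044] up to rounding in the last digit, so the rejection of rho_1 = 0 is far from borderline (|z| is well above 1.96).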

    This may be a lot to ask. I think the approach above is correct, but I really need clarification on a few things.

    I.) The "strict exogeneity of selection" assumption (whether a data point is observed in a time period cannot be systematically related to the idiosyncratic errors). I'm performing this analysis only on periods where (y, x) are observed for all of my IDs/panels. Is this assumption reasonable? Is there a way to test explicitly whether selection is ignorable?

    II.) Is the endogeneity detected in 3a really a "problem"? Can we really not disentangle the source of endogeneity with this approach?

    III.) What exactly are the "significant interpretations" of the means of all the variables in z in my final regression? Aren't they just controls for the CRE approach? Can they be suppressed? Most are insignificant.

    IV.) Since y2 is a binary endogenous variable, my understanding is that these estimates will not be consistent, because they rest on the "bad" assumption of a linear reduced form (the LPM) for y2, among other assumptions. Can I obtain consistent estimates using a GMM approach instead?

    Thank you all so much. This is my first post; I'd appreciate any help.
    -AJ
    Attached Files
    Last edited by AJ Williamson; 28 Jan 2019, 14:16.

  • #2
    any thoughts?
