
  • Village fixed effects entered as continuous due to otherwise uncomputable AMEs (collinear)

    Dear Statalist community,

    I am analyzing the demand determinants of three insurance products in a pooled Heckman two-step regression. My outcome variable is coded as 0 = no insurance, 1 = lower-premium insurance, 2 = middle-premium insurance, and 3 = high-premium insurance. Because I am controlling for social spillover effects, I want to add dummies for the different places of residence. However, some places of residence have only a few observations and therefore too little variance, so the marginal effects of these dummies cannot be computed because of collinearity. This is not much of a problem for me, as I only want to control for them and do not need to interpret their marginal effects. I also include another variable (insurer's presence, 0/1) that does not vary within a place of residence and thus also has uncomputable marginal effects, but these are of interest to me.

    Instead of just ignoring the non-estimable effects, I am wondering whether I can include my residency variable as a continuous one and leave the insurer's presence as a dummy. Of course, I could then no longer interpret the residency variable, but I would still control for it.

    The regression table for using the residency variable as a continuous one looks like this:

    Code:
     
                                          (1)                 (2)
    VARIABLES                             Insurance purchase  Insurance product purchase
    Peer behavior product choice (1-3)                        0.3743***
                                                              (0.0895)
    Peer behavior insurance uptake (0/1)  0.0037***
                                          (0.0007)
    Village fixed effects (continuous!)   0.0070*             -0.0107
                                          (0.0038)            (0.0070)
    Income                                0.2408              0.2728
                                          (0.2117)            (0.4392)
    Presence insurer (0/1)                -0.0578
                                          (0.0515)
    Year 1                                -0.0334             0.0446
                                          (0.0545)            (0.1029)
    Year 2                                0.0189              0.0897
                                          (0.0439)            (0.1030)
    Year 3                                0.0069              0.0959
                                          (0.0485)            (0.0908)
    Year 4                                -0.0516             0.0871
                                          (0.0537)            (0.1186)
    Gender                                -0.0831*            -0.1605*
                                          (0.0435)            (0.0905)
    Age                                   0.0000              -0.0001**
                                          (0.0000)            (0.0000)
    Educ                                  0.0023              0.0037
                                          (0.0020)            (0.0044)
    HH size                               0.0085              0.0213**
                                          (0.0073)            (0.0101)
    Risk aversion                         0.0951*             -0.0750
                                          (0.0502)            (0.1114)
                                          (0.0156)            (0.0359)
    imr1                                                      0.5364*
                                                              (0.2747)
    Observations                          574                 472
    R-squared                             0.1927              0.1679
    Standard errors in parentheses
    *** p<0.01, ** p<0.05, * p<0.1

    Or do you see any objections here?

    Thanks in advance!!!

  • #2
    Hi Kerstin,
    Your chance of getting answers will improve if you share more of your code. For example, I would want to see the first stage of the selection model and be convinced that it takes care of any endogeneity problems (which I would be very sceptical about, but who knows, you may have some randomization or source of exogenous variation that helps).

    What you have done is add a factor variable enumerating geographical locations as if it had an incremental interpretation: increasing village by 1 increases the demand for insurance by .007. You say that this does not matter since you are not interested in its effect size. However, the problem is that mathematically this estimation is different from what a within-village estimator would produce. The reason we want to include group effects is to control for time-invariant endogeneity, and this is precisely what you are not doing with the continuous approach. And if endogeneity is left unaddressed, all coefficients of interest may be biased.

    Yes, I understand that in the absence of within-village variation a proper FE estimator is infeasible. In that case I would add other variables that make sense in your setting and capture as much as possible of the variation by village, like proximity to coast, fertile land, mountains, rivers, local admin unit GDP, tax revenues, ethnic diversity (I am making these up, as I have no idea of your context). And even then you need to go the extra mile to eliminate possible alternative explanations for your findings that may be due to the remaining village effects in the error.
    Last edited by Maria Boutchkova; 10 Aug 2021, 05:33.
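
    To make the difference between the two approaches concrete, here is a small sketch of the two linear indexes. The notation is assumed for illustration and is not from the posts: v_i in {1, ..., V} is the numeric village code of observation i, and x_i collects the other regressors.

    ```latex
    \begin{align*}
    \text{code entered as continuous:}\quad
      & \beta_0 + \theta\, v_i + x_i'\beta \\
    \text{village dummies (base village 1):}\quad
      & \beta_0^{*} + \sum_{v=2}^{V} \alpha_v \,\mathbf{1}[v_i = v] + x_i'\beta
    \end{align*}
    ```

    The first line is the special case of the second with \alpha_v = \theta (v - 1) and \beta_0^{*} = \beta_0 + \theta, so it forces the village effects to be linear in an arbitrary numbering of the villages.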



    • #3
      Hey Maria,

      Thanks for your advice! In my setting, the village fixed effects are better interpreted as peer fixed effects: the idea of including them is to control for unobserved heterogeneity in one's peer group. However, because the data are highly sensitive, I do not have the village names; the villages are only assigned numerical values. Therefore, I cannot use an alternative village variable instead.

      Employing a Heckman 2-Step Approach does not allow for a proper FE estimator. Maybe having a look at my code brings more clarity and ideas on how to improve my work:

      Code:
      *1st stage Heckman model: pooled probit
      global x1 InsuranceUptake_Group Group_num Income i.PresenceInsurer i.year i.Gender Age Educ ///
                    HHsize i.RiskAversion
      *i.G_num changed to continuous format and inserted as a trend due to collinearity issues
          
      probit G_insurance $x1, vce(cluster HHID)
      margins, dydx(*) post          
      
      *calculate inverse Mills ratio (imr)
      qui probit G_insurance $x1, vce(cluster HHID)
      predict p1, xb
      replace p1=-p1
      generate phi = (1/sqrt(2*_pi))*exp(-(p1^2/2))
      generate capphi = normal(p1)
      generate imr1 = phi/(1-capphi)
      
      *2nd stage Heckman model: truncated and pooled OLS
      mean insurance
      global x2 ProductChoice_Group Group_num Income i.year i.Gender Age Educ HHsize i.RiskAversion
      *exclusion restriction: Presence insurer
      
      reg insurance $x2 imr1, vce(cluster HHID)
      
      *test exclusion restriction of presence of insurance company in 2nd stage
      reg insurance $x2 P_InsuranceC imr1, vce(cluster HHID)   
      test P_InsuranceC
      *The Wald test shows that the null hypothesis that the coefficient of P_InsuranceC is equal to zero ...
      *... cannot be rejected at any conventional significance level. = valid exclusion restriction
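
      As a check on the inverse Mills ratio step above, the generated variables can be traced through as follows. The symbols are notation assumed here: z_i'\hat\gamma is the fitted probit index returned by `predict p1, xb`, and \phi and \Phi are the standard normal density and distribution function.

      ```latex
      \begin{align*}
      \texttt{p1} &= -z_i'\hat\gamma \\
      \texttt{phi} &= \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2}\,\texttt{p1}^2\right)
                    = \phi(-z_i'\hat\gamma) = \phi(z_i'\hat\gamma) \\
      \texttt{capphi} &= \Phi(-z_i'\hat\gamma) = 1 - \Phi(z_i'\hat\gamma) \\
      \texttt{imr1} &= \frac{\texttt{phi}}{1 - \texttt{capphi}}
                     = \frac{\phi(z_i'\hat\gamma)}{\Phi(z_i'\hat\gamma)}
      \end{align*}
      ```

      So imr1 equals the density of the index divided by the probability of selection, which is the usual correction term for the selected sample.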



      • #4
        Hi Kerstin,
        thank you for sharing your code.
        Several observations:
        1. My comments about the way you have used the continuous village variable hold only for a numeric variable. Even when using a proper FE model we include the group identifier as numeric. Stata will give you an error if you attempt to include a string variable in a regression.
        2. I see that what you are doing is estimating the probability of taking insurance based on observables and then using that predicted probability in the second stage. This is precisely what I meant in saying I was sceptical. There is always unobserved heterogeneity that is left in the error, and Heckman is not a solution. However, as it might be established practice in some disciplines, it is useful for new work to show results with the same methodology so that the reader can compare to existing results.
        3. You say that "Employing a Heckman 2-Step Approach does not allow for a proper FE estimator". I am not sure this is the case. See, for example, Fernández-Val and Vella (2011), here:
        https://doi.org/10.1016/j.jeconom.2011.03.002
        4. I stick with the suggestions I gave you in #2 about how to attempt to control for the village endogeneity using other variables.
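
        Regarding point 1, the distinction is between two ways of entering the same numeric identifier. A minimal sketch, reusing variable names from the code in #3 and shortening the covariate list for illustration only:

        ```stata
        * Sketch only: names come from the code in #3; covariates are shortened
        * c. prefix: one slope on the arbitrary village code (what #3 does)
        probit G_insurance c.Group_num Income i.year, vce(cluster HHID)

        * i. prefix: one indicator per village code (group dummies)
        probit G_insurance i.Group_num Income i.year, vce(cluster HHID)
        ```

        With the `i.` prefix Stata builds the dummies itself, and it is this specification that runs into the collinearity problem described in #1 when a village has too few observations.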
