  • Dealing with Interaction Terms and Influential-Observation Removal in a Geographical Regression

    Dear Statalist,

    I am working on a university project that uses a geographical regression to test whether municipalities that were historically under central control perform better on various outcomes. To avoid bias from municipalities that were neither part of the former reign nor close to it, I restrict the sample to municipalities within a given distance of the reign's center, and I then drop influential observations, identified by the DFBETA of the variable of interest, before running the final regression with robust standard errors. Here is my initial code:
    Code:
    program dbetareg
        * Define the arguments
        args map outcome var_of_interest distance control1 control2 control3 control4 control5
        
        * Regress the desired outcome on the variable of interest and the controls, if within the desired distance from the center of government
        qui reg `outcome' `var_of_interest' `control1' `control2' `control3' `control4' `control5' if dist < `distance' $condition
        
        * Compute the cutoff based on the DFBETA
        qui dfbeta(`var_of_interest')
        qui replace _dfbeta_1=abs(_dfbeta_1)
        qui gsort -_dfbeta_1
        list _dfbeta_1 `map' `var_of_interest' `outcome' in 1/10
        
        local cutoff = 2/sqrt(e(N))
        di "Suggested cutoff value = `cutoff'"
        
        * Do the final robust regression
        eststo: reg `outcome' `var_of_interest' `control1' `control2' `control3' `control4' `control5' if dist < `distance' & _dfbeta_1 < `cutoff' $condition, r
        qui drop _dfbeta*
    end

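    After estimating my main specification, I want to understand the conditions under which the main effect holds, so I add an interaction term. In this version the DFBETA, and therefore the set of excluded observations, is computed for the interaction term rather than for the main variable of interest. Here is my updated code: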
    Code:
    program dbetareg_int
        * Define the arguments
        args map outcome var_of_interest var1 interaction distance control1 control2 control3 control4 control5
        
        * Regress the desired outcome on the interaction and other controls, if within the desired distance from the center of government
        qui reg `outcome' `var_of_interest' `var1' `interaction' `control1' `control2' `control3' `control4' `control5' if dist < `distance' $condition
        
        * Compute the cutoff based on the DFBETA
        qui dfbeta(`interaction')
        qui replace _dfbeta_1=abs(_dfbeta_1)
        qui gsort -_dfbeta_1
        list _dfbeta_1 `map' `interaction' `outcome' in 1/10
        
        local cutoff = 2/sqrt(e(N))
        di "Suggested cutoff value = `cutoff'"
        
        * Do the final robust regression
        eststo: reg `outcome' `var_of_interest' `var1' `interaction' `control1' `control2' `control3' `control4' `control5' if dist < `distance' & _dfbeta_1 < `cutoff' $condition, r
        qui drop _dfbeta*
    end
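
    To make the calling convention concrete, here is a minimal sketch of how the two programs are invoked once they are defined. Everything in it is a placeholder assumption: the data are simulated rather than my real data, the variable names (muni_id, literacy, central, urban, cen_x_urb, ctrl1 to ctrl5) are hypothetical, and the interaction is assumed to be a product variable generated beforehand. The only fixed requirements come from the programs themselves: a variable named dist must exist, and the global $condition must be either empty or start with "&", because it is appended directly to the if qualifier.

    ```stata
    * Simulated placeholder data (hypothetical, only to show the calling convention)
    clear
    set seed 12345
    set obs 500
    gen muni_id   = _n
    gen dist      = runiform()*200        // distance from the center of government
    gen central   = dist < 80             // hypothetical "under central control" dummy
    gen urban     = runiform() < 0.5      // hypothetical moderator
    gen cen_x_urb = central*urban         // interaction generated by hand (assumption)
    forvalues j = 1/5 {
        gen ctrl`j' = rnormal()           // placeholder controls
    }
    gen literacy  = 1 + 0.5*central + 0.2*urban + 0.3*cen_x_urb + rnormal()

    * Extra sample restrictions; must be empty or begin with "&"
    global condition ""

    * Rule-of-thumb cutoff used inside the programs: 2/sqrt(N)
    * For example, with N = 400 observations the cutoff is 0.1
    display 2/sqrt(400)

    * Argument order follows the args lines of the two programs
    eststo clear
    dbetareg     muni_id literacy central 150 ctrl1 ctrl2 ctrl3 ctrl4 ctrl5
    dbetareg_int muni_id literacy central urban cen_x_urb 150 ctrl1 ctrl2 ctrl3 ctrl4 ctrl5
    ```

    Note that in the second call the observations dropped before the final regression are those with a large DFBETA for cen_x_urb, so the two final samples can differ.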

    My questions are:
    • Is it meaningful to base the cutoff on the DFBETA of the interaction term, or should I keep basing it on the main variable of interest?
    • Are there alternative methods in Stata to implement this type of regression analysis?
    Thank you in advance for your assistance!