  • Difference-in-difference and CEM with panel data


    Dear everybody

    I am investigating how employees are affected when they are outsourced from public employment to private employment. My dataset is in long format with seven time periods (years). I cannot share my data as it is on a confidential server, but I have made an example dataset with 14 individuals (my real data has more than 300 treated individuals and more than 70,000 matched control individuals). Hopefully, the example data contains enough data points.

    In the example dataset, my treatment group is outsourced in time 100. I have used a Coarsened Exact Matching (CEM) procedure to identify a similar group in terms of job type who are still public employees in time 100 (and in the prior time, 99). I match on a variable with 1200 different job types, so the matches are pretty detailed.

    I have a few questions that need answering.

    First of all, my cem_weights are obtained in time 100, when the outsourcing occurred. As a consequence, the remaining time periods in each panel do not contain any weights. I could then use the following command to fill in the remaining time periods:
    Code:
    * Missing values sort last, so observation 1 within each id holds the non-missing weight
    bysort id (cem_weights): gen cem_weights_all = cem_weights[1]
    However, I am not sure whether this is the proper procedure when conducting a panel data analysis with CEM weights.

    Secondly, I want to model the effect of the treatment. I have no problem interpreting the following model:
    Code:
    xtreg salary i.treatment##i.time
    margins treatment, at(time=(97(1)103))
    marginsplot
    So hopefully, my setup is not entirely wrong. I also want to add control variables such as gender and educational background. Is there a need in this case for a first-difference estimator (or fixed effects)? And if so, how would I go about it?

    In the case of first differences, I have tried the following code:
    Code:
    xtreg d.salary i.treatment##i.time [aw = cem_weights_all], nocons
    But I receive the error “weights not allowed”, and if I remove the weights, I get “option nocons not allowed”. In all honesty, I have just today figured out how to run first-differences with the “d.-operator”, so my specification might not be correct.

    I have also tried the areg command as it allows for weights, but the results are also puzzling as the treatment variable is omitted:
    Code:
    areg d.salary i.treatment##i.time [aw = cem_weights_all], a(id)
    So, my question is: do I need the first-difference estimator? And if so, how do I obtain it?

    I hope somebody out there can help me out.


    The example data contains the following variables:
    Code:
    id
    time
    treatment (0 = control, 1 = treatment)
    cem_weights
    cem_weights_all
    outsourcing_year (1 indicates when the outsourcing occurred)
    company_id
    sector (0 = public, 1 = private)
    job_code (used to match treatment and control)
    salary
    cem_strata
    cem_matched
    matched_all

  • #2
    I have just stumbled upon the following:
    in a first differences regression the individual dummies will drop out because they do not change over time, hence the difference is zero for all the dummies and then your statistical software will omit them due to perfect collinearity
    (from: https://stats.stackexchange.com/ques...for-panel-data)

    So I guess it is not possible to do first-difference estimation in this case. Or have I misunderstood something crucial?
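
    To check my reading of that quote, here is the first difference written out. This is only a sketch: it assumes the model implied by i.treatment##i.time plus an individual effect, and the symbols are my own notation rather than anything from the quoted answer ($D_i$ is the time-invariant treatment dummy, $\alpha_i$ the individual effect, $\gamma_t$ the period effects, and $\delta_t$ the period-specific treatment effects).

```latex
\begin{align*}
y_{it}        &= \alpha_i + \beta D_i + \gamma_t + \delta_t D_i + \varepsilon_{it} \\
y_{i,t-1}     &= \alpha_i + \beta D_i + \gamma_{t-1} + \delta_{t-1} D_i + \varepsilon_{i,t-1} \\
\Delta y_{it} &= (\gamma_t - \gamma_{t-1}) + (\delta_t - \delta_{t-1})\, D_i + \Delta\varepsilon_{it}
\end{align*}
```

    If this is right, $\alpha_i$ and the treatment main effect $\beta D_i$ difference out, as the quote says, but the interaction terms remain as period-to-period changes in $\delta_t$.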

    When I add weights to the first model that I outlined:
    Code:
    xtreg salary i.treatment##i.time [aw = cem_weights_all]
    margins treatment, at(time=(97(1)103))
    marginsplot
    I get the "weights not allowed" error. And when I use the areg command on the same model, the treatment variable is omitted:
    Code:
    areg salary i.treatment##i.time [aw = cem_weights_all], a(id)
    margins treatment, at(time=(97(1)103))
    marginsplot
    I could just run a regular reg, but I do not know whether I would be violating some panel data principles:
    Code:
    reg salary i.treatment##i.time [aw = cem_weights_all]
    margins treatment, at(time=(97(1)103))
    marginsplot
    I do have the time-component in my model, so maybe it is all right?

    Can anybody clarify?
    Last edited by Gustav Egede Hansen; 04 Jun 2021, 01:19.
