Hi there, I am having some trouble understanding the weighted average command (egen's wtmean) and how to use it when producing counterfactuals.
Here is an example of my problem!
*** Trying to figure out what is going on with these weighted averages
*** Looking at car data, and the number of sales and defects per model
*** Want to answer the question: Did the share of cars with defects increase from 2000 to 2005?
*** And if so, is that due to a composition change towards the production of more cars that were already more defect prone in 2000,
*** Or is it because the car models themselves became more defect prone
** So, I expect defect_percentage_year and new_defect_percentage to be equal
** I also expect counterfactual_defect_percentage and new_countfact_defect_percentage to be equal
*** Unfortunately, the two counterfactual variables I create are NOT equal, and I don't understand why.
Code:
clear all
sysuse auto
drop price-foreign
expand 2, gen(dupindicator)
set seed 12345
sort make
gen year = 2000 if dupindicator == 0
replace year = 2005 if dupindicator == 1
gen sales = runiform()
replace sales = sales * 1000
replace sales = round(sales, 1)
set seed 54321
gen defects = runiform()
replace defects = defects * 100
replace defects = round(defects,1)

** Defect percentage by make-year
gen defect_percentage = defects / sales

** Total number of sales per year
bysort year: egen total_year_sales = sum(sales)

** make share of sales in a year
gen make_share = sales/total_year_sales

** What was the total defect share in 2000? What was it in 2005?
bysort year: egen defect_percentage_year = wtmean(defect_percentage), weight(make_share)
sort make year

gen make_share_2000 = make_share
replace make_share_2000 = . if year == 2005
bysort make: carryforward make_share_2000, replace
bysort year: egen counterfactual_defect_percentage = wtmean(defect_percentage), weight(make_share_2000)
sort make year

bysort year : egen defect_total_year = sum(defects)
gen new_defect_percentage = defect_total_year / total_year_sales

gen sales_2019 = total_year_sales
replace sales_2019 = . if year == 2020
sort make year
bysort make: carryforward sales_2019, replace
gen new_countfact_defect_percentage = defect_total_year / sales_2019
sort make year
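
To make the comparison concrete, here is a rough sketch in notation of what the four variables are meant to represent. The symbols are my own shorthand and do not appear in the code: q_{m,t} is sales of model m in year t, x_{m,t} is its defects, Q_t is total sales in year t, p_{m,t} = x_{m,t}/q_{m,t} is the model-level defect rate, and s_{m,t} = q_{m,t}/Q_t is the model's sales share. This assumes wtmean divides the weighted sum by the sum of the weights, and that new_countfact_defect_percentage is intended to be 2005 defects over 2000 total sales (the code refers to 2019 and 2020, but year only takes the values 2000 and 2005, so the `year == 2020` condition never matches).

\[
\texttt{defect\_percentage\_year}_t
  = \sum_m s_{m,t}\, p_{m,t}
  = \sum_m \frac{q_{m,t}}{Q_t}\cdot\frac{x_{m,t}}{q_{m,t}}
  = \frac{\sum_m x_{m,t}}{Q_t}
  = \texttt{new\_defect\_percentage}_t
\]

\[
\texttt{counterfactual\_defect\_percentage}_{2005}
  = \sum_m s_{m,2000}\, p_{m,2005}
  = \sum_m \frac{q_{m,2000}}{Q_{2000}}\cdot\frac{x_{m,2005}}{q_{m,2005}}
\]

\[
\texttt{new\_countfact\_defect\_percentage}_{2005}\ \text{(as intended)}
  = \frac{\sum_m x_{m,2005}}{Q_{2000}}
  = \sum_m \frac{q_{m,2005}}{Q_{2000}}\, p_{m,2005}
\]

Written this way, the first pair is the same quantity, but the two counterfactuals put different weights on the 2005 defect rates: q_{m,2000}/Q_{2000} in the wtmean version versus q_{m,2005}/Q_{2000} in the ratio version. The latter weights do not sum to one unless total sales are the same in both years, which may be where the discrepancy comes from.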