Dear all,
I'm using the oaxaca decomposition command in Stata 18 with the svy subpopulation option.
I noticed that some observations are being excluded from the models for both Group 1 and Group 2. I also observed that the decomposition model reports the total sample size as the number of observations, rather than the intended subpopulation size.
My subpopulation has 14,367 observations: Group 1 (lower education) has 2,192 and Group 2 has 12,175.
When I run the decomposition without the svy option, I get the correct subpopulation sample size in all the models.
I would greatly appreciate any guidance or advice on this issue.
Below are the numbers of observations from the separate regressions for each group and from the oaxaca decomposition.
Thanks in advance for your help.
Total sample
Code:
svy, subpop(if subpop==1): logistic self age sex income badl visit eat prot_d2 dent
Code:
Survey: Logistic regression

Number of strata =   574    Number of obs   =       90,846
Number of PSUs   = 8,027    Population size =  168,426,190
                            Subpop. no. obs =       14,367
                            Subpop. size    = 21,722,187.6
                            Design df       =        7,453
                            F(8, 7446)      =        75.68
                            Prob > F        =       0.0000
Group 1
Code:
svy, subpop(if subpop==1 & ses==0): logistic self age sex income badl visit eat prot_d2 dent
Code:
Number of strata =   457    Number of obs   =       80,899
Number of PSUs   = 7,085    Population size =  135,901,805
                            Subpop. no. obs =        2,192
                            Subpop. size    = 2,505,225.62
                            Design df       =        6,628
                            F(8, 6621)      =        22.36
                            Prob > F        =       0.0000
Group 2
Code:
svy, subpop(if subpop==1 & ses==1): logistic self age sex income badl visit eat prot_d2 dent
Code:
Number of strata =   573    Number of obs   =       90,789
Number of PSUs   = 8,022    Population size =  168,374,254
                            Subpop. no. obs =       12,175
                            Subpop. size    =   19,216,962
                            Design df       =        7,449
                            F(8, 7442)      =        55.80
                            Prob > F        =       0.0000
Here is the code I used for the decomposition:
Code:
oaxaca self age sex income badl visit eat prot_d2 dent, ///
    by(ses) logit weight(0) svy(, subpop(subpop)) noisily cformat(%4.3f)
Here is the output for the number of observations generated by the decomposition
Code:
Model for group 1
(running logit on estimation sample)
Survey: Logistic regression

Number of strata =   456    Number of obs   =       80,842
Number of PSUs   = 7,080    Population size =  135,849,869
                            Subpop. no. obs =        2,183
                            Subpop. size    = 2,497,605.89
                            Design df       =        6,624
                            F(8, 6617)      =        22.28
                            Prob > F        =       0.0000

Note: 117 strata omitted because they contain no subpopulation members.
Model for group 2
(running logit on estimation sample)
Survey: Logistic regression

Number of strata =   456    Number of obs   =       80,842
Number of PSUs   = 7,080    Population size =  135,849,869
                            Subpop. no. obs =       10,365
                            Subpop. size    =   14,703,327
                            Design df       =        6,624
                            F(8, 6617)      =        49.14
                            Prob > F        =       0.0000
Blinder-Oaxaca decomposition

Number of strata =   456    Number of obs   =       80,842
Number of PSUs   = 7,080    Population size =  135,849,869
                            Design df       =        6,624
                            Model           =        logit
Group 1: ses = 0            N of obs 1      =        7,353
Group 2: ses = 1            N of obs 2      =       73,489

explained: (X1 - X2) * b2
unexplained: X1 * (b1 - b2)
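To narrow down which observations the decomposition drops, the two estimation samples for Group 2 could be compared directly, since that is where the gap is largest (12,175 versus 10,365, i.e. 1,810 observations; for Group 1 it is 2,192 versus 2,183, i.e. 9). This is only a sketch under assumptions: the data are already svyset as above, `subpop` is coded 1 for the subpopulation and 0 otherwise, `oaxaca` stores its estimation sample in `e(sample)` as Stata estimation commands usually do, and `sep_g2` and `oax_g2` are hypothetical new variable names.

```stata
* Sketch only: assumes the data are already svyset, subpop is coded
* 1 = in subpopulation / 0 = not, and ses is coded 0/1 as above.
* sep_g2 and oax_g2 are hypothetical new variable names.

* Separate svy model for Group 2. After svy with subpop(), e(sample)
* marks the whole estimation sample, so restrict it to Group 2 explicitly.
quietly svy, subpop(if subpop==1 & ses==1): ///
    logistic self age sex income badl visit eat prot_d2 dent
generate byte sep_g2 = e(sample) & subpop==1 & ses==1

* Decomposition as run above (assumes oaxaca posts e(sample))
quietly oaxaca self age sex income badl visit eat prot_d2 dent, ///
    by(ses) logit weight(0) svy(, subpop(subpop))
generate byte oax_g2 = e(sample) & subpop==1 & ses==1

* Rows: used by the separate model; columns: used by oaxaca
tabulate sep_g2 oax_g2

* Summarize the variables for observations the decomposition drops
summarize self age sex income badl visit eat prot_d2 dent if sep_g2 & !oax_g2
```

A nonzero count in the cell where `sep_g2` is 1 and `oax_g2` is 0 would identify Group 2 subpopulation members used by the separate model but left out of the decomposition; repeating the comparison with `ses==0` would cover Group 1.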
