Hi all,
I would be very grateful for help with a weighting discrepancy.
I am analyzing a very large database that is a sample of roughly 20% of all hospitalizations in the US in each year. The database has a variable, DISCWT, which is used as the weight for producing national estimates. Applying it should make counts and totals roughly 5 times greater: for example, if I have 8 million observations/cases in my data, the national estimate should be about 5 * 8 = 40 million.
To weight the data, I use the code below in Stata, where mycases is the variable that flags the cases I am interested in:
Code:
svyset HOSPID [pw=DISCWT], strata(NIS_STRATUM)
svy: mean AGE if mycases==1
HOSPID is the variable that identifies the hospital where the procedure was performed or the patient was hospitalized.
The database provider offers its own tool that quickly returns the number of cases at the 'national estimate' (i.e. weighted) level.
After applying the code above, my weighted estimates are very close to the provider's, but unfortunately they are not EXACTLY the same. My UNWEIGHTED numbers match exactly, so I think the problem must be in the way I weight the data.
I have checked the provider's website, and it says the national estimate is calculated in SAS as follows:
Code:
PROC SURVEYMEANS DATA=mycases SUM STD MEAN STDERR;
    VAR mycases;
    WEIGHT DISCWT;
    CLUSTER HOSPID;
    STRATA NIS_STRATUM;
RUN;
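For comparison, here is a minimal sketch of how I understand that SAS call might be mirrored in Stata. This is my own assumption, not something the provider documents: SUM in PROC SURVEYMEANS is a weighted total of mycases, so the sketch uses svy: total rather than svy: mean, and it assumes mycases is coded 0/1 with no missing values, so that the total is a weighted count of cases. Using subpop() instead of if for the mean of AGE is a second assumption on my part.

```stata
* Sketch only: assumes mycases is a 0/1 indicator (1 = case of interest)

* Declare the design: hospitals as PSUs, DISCWT as the sampling weight
svyset HOSPID [pw=DISCWT], strata(NIS_STRATUM)

* Weighted total of the indicator = estimated national number of cases
svy: total mycases

* Cross-check of the point estimate: sum of the weights over the cases
quietly summarize DISCWT if mycases == 1
display %15.0fc r(sum)

* Weighted mean age among the cases, keeping the full design in the variance
svy, subpop(mycases): mean AGE
```

If the sum of DISCWT over my cases already matches the provider's number, then the difference would seem to come from comparing a mean with a total rather than from the weights themselves.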
Thank you very much!
Reza