I have about 200,000 cases in my dataset: about 40,000 in the treatment group and about 160,000 in the control group. I am predicting the participation variable (0 or 1) from about 20 demographic covariates. I set a seed, sort the data in random order, and then run psmatch2. It displays the regression output but then hangs after printing the following:
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
I left it running for a few days and it was still busy.
Is it normal for it to take this long with this many cases, or is something wrong with the data? If I randomly sample, say, 1,000 or 50 cases from the treatment group and then run psmatch2, it finishes quickly.
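For reference, the workflow described above might look roughly like the do-file sketch below. The variable names (treat, x1-x20, u), the seed value, and the 2.5% sampling fraction are hypothetical placeholders, and psmatch2 is assumed to be called with its default options; the actual command may differ.

```stata
* Hypothetical names: treat = 0/1 participation, x1-x20 = covariates, u = sort key
* Arbitrary seed so the random sort order is reproducible
set seed 12345
generate double u = runiform()
* Put the rows in random order before matching, as the psmatch2 message asks
sort u

* Timing check: keep all controls plus roughly 1,000 of the 40,000 treated cases
preserve
keep if treat == 0 | u < 0.025
psmatch2 treat x1-x20
restore

* Full run on all cases (assumed defaults: probit score, one nearest neighbor)
psmatch2 treat x1-x20
```

Running the subsample first gives a rough sense of how the run time grows with the number of treated cases before committing to the full run.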