  • Weighted VS Unweighted Advice + Help

    Good evening everyone;

    I have a quick question and need some advice.

    I am running weighted and unweighted regressions, but the output is extremely different between the two, so I am trying to decide which one would be better to use in my final paper. Unfortunately, I cannot share the output here directly from Stata because I do my analysis in a secured facility. However, I do remember the numbers.

    My dependent variable is vocabulary score and my independent variable is childcare participation.

    WEIGHTED:
    The coefficient on childcare is 2.21, with a p-value of 0.98.

    UNWEIGHTED:
    The coefficient on childcare is -1.15, with a p-value of 0.58.

    Obviously, the coefficient on childcare in the weighted regression is more desirable as it shows that childcare positively influences the outcome of the vocabulary score.

    However, in the unweighted regression, almost all of my covariates have p-values below 0.05.

    If you were in my shoes, what would you use?

    Is it better to have a lower p-value and less-desirable coefficient, or a higher p-value with a more desirable coefficient? I've spoken to other researchers who used my dataset, and some of them published the unweighted numbers because, as they wrote:

    "some observations (children) have missing data for a subset of the explanatory variables so get dropped from the regression and it would require us to have either constructed a new modified weight or assume all incomplete data is completely at random to use an existing weight. The sample sizes also change across outcomes so this would require a great deal of belief in whatever your modelling choice was and make the study harder to replicate."
    Thanks for the help

  • #2
    Is it better to have a lower p-value and less-desirable coefficient, or a higher p-value with a more desirable coefficient?
    OMG!!! If you are even asking that question, you are not doing science, my friend. What is better is to have (approximately) correct answers, or at least answers that derive from an analysis that is defensible. And neither the p-value nor the "desirability" of the coefficient, nor any other aspect of the results, sheds any light on that core issue.

    Now, the first thing to focus on is that at most one of the two approaches, weighted or unweighted, is correct. They cannot both be right. (Even if they gave similar results, only one would be correct--the other would just agree by dumb luck.) Unweighted analysis is appropriate for data that were collected by simple random sampling--that is, sampling where every member of the target population has the same probability of being included in the sample. Otherwise, weights are required, not optional. So if you have a non-simple sampling scheme, you must use weights in the analysis. Unweighted analysis of a non-simple design is simply wrong.

    Now, from the quote you provide, it sounds like, for various reasons, the missing data have invalidated the weights that were originally provided with your data.* That is most unfortunate, but it means that, like it or not, you must calculate a new modified weight. If you do not know how to do that, then you need help from somebody who does. (I do not know how to do that, but there are plenty of others here at Statalist who do and who could provide help if you post back with a clear and complete explanation of the sampling design and the modifications to the data that led you to need new sampling weights calculated. If you end up doing that, I suggest you start a new thread for the purpose.)

    *Are you sure about this? If these data were curated by professional survey statisticians, they may already come with multiple sets of weights, to be used according to which variables are included in the analysis you want to do, so that the effects of missing data are accounted for. Check the documentation that came with the survey, or contact the people who produced the survey.
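
    To make the point about unequal inclusion probabilities concrete, here is a minimal sketch in Python. It is not based on the poster's data: the two strata, their sizes, and the scores are all made-up assumptions, and a mean is used instead of a regression because it is the simplest possible case (an intercept-only model). Each sampled unit gets a weight equal to the inverse of its inclusion probability.

```python
# Hypothetical population with two strata; every number here is invented
# purely for illustration.
population_sizes = {"A": 900, "B": 100}   # stratum sizes in the population
scores = {"A": 10.0, "B": 50.0}           # everyone in a stratum has this score

true_mean = (
    sum(population_sizes[s] * scores[s] for s in scores)
    / sum(population_sizes.values())
)

# Non-simple design: stratum B is heavily oversampled.
sample_sizes = {"A": 45, "B": 50}

# Sampling weight = 1 / inclusion probability = population size / sample size.
weights = {s: population_sizes[s] / sample_sizes[s] for s in sample_sizes}

# One (score, weight) pair per sampled unit.
sample = [
    (scores[s], weights[s])
    for s in sample_sizes
    for _ in range(sample_sizes[s])
]

unweighted_mean = sum(y for y, _ in sample) / len(sample)
weighted_mean = sum(w * y for y, w in sample) / sum(w for _, w in sample)

print(f"true population mean: {true_mean:.2f}")        # 14.00
print(f"unweighted sample mean: {unweighted_mean:.2f}")  # 31.05
print(f"weighted sample mean: {weighted_mean:.2f}")      # 14.00
```

    The unweighted mean is pulled toward the oversampled stratum, while the weighted mean recovers the population value. In Stata, the analogous step for a regression is the `[pweight=...]` qualifier on the estimation command, using whichever weight variable the survey documentation specifies.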



    • #3
      Some information in addition to Clyde's comprehensive instruction: Solon et al.'s paper (https://www.nber.org/papers/w18859) suggests that, in a very limited number of cases (linear regression where the model is correctly specified, the sampling weights are exogenous, and the variables on which sampling was based are comprehensively controlled for), unweighted regression would be correct and more efficient than weighted regression. But running a weighted regression in such cases is, I think, harmless, particularly when the sample size is sufficiently large and efficiency is not a big issue.
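
      A small sketch of that condition, in Python with invented numbers (the data, weights, and the helper `wls_slope` are all hypothetical and not taken from the paper). When the outcome really is linear in x, the weighted and unweighted slopes agree; when a straight line is fitted to a curved relationship, they diverge, because each weighting scheme picks a different "best" linear approximation. The data are noise-free, so this only illustrates agreement versus disagreement of the estimates, not the efficiency argument.

```python
def wls_slope(x, y, w):
    """Slope of a weighted least-squares fit of y on x (with intercept)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return s_xy / s_xx


# Hypothetical sample: low-x units were undersampled, so they carry larger weights.
x = [0.0, 1.0, 2.0, 3.0]
sampling_weights = [4.0, 4.0, 1.0, 1.0]
equal_weights = [1.0] * len(x)   # equal weights reproduce the unweighted fit

# Case 1: the linear model is correctly specified (y = 1 + 2x exactly).
y_linear = [1.0 + 2.0 * xi for xi in x]
slope_linear_unweighted = wls_slope(x, y_linear, equal_weights)
slope_linear_weighted = wls_slope(x, y_linear, sampling_weights)

# Case 2: the truth is curved (y = x^2) but a straight line is fitted anyway.
y_curved = [xi ** 2 for xi in x]
slope_curved_unweighted = wls_slope(x, y_curved, equal_weights)
slope_curved_weighted = wls_slope(x, y_curved, sampling_weights)

print("correctly specified:", slope_linear_unweighted, slope_linear_weighted)
print("misspecified:       ", slope_curved_unweighted, slope_curved_weighted)
```

      In the first case both slopes equal 2 up to floating-point error; in the second the two fits disagree, which is the situation where the choice between weighted and unweighted estimation actually matters.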
