  • Ttest or ranksum approach for panel data

    Hi there, I was hoping you could help.

    I would like to compare the means between 2 samples for panel data.
    Context: I am comparing people with debt and without debt (0/1)

    I want to compare summary statistics across the panel for a number of variables, some of which are binary.

    I have seen that you can use, for example, "xtreg depressed hasdebt, fe" (where depressed is a 0/1 variable) and then look at the significance of the coefficient on hasdebt.

    Is this correct? Does this also apply to ranksum?

    For a larger set of variables would I have to do this for each variable?

    Thank you in advance!

  • #2
    Your proposed code and your stated goals are not in harmony.

    Your code seeks to estimate whether a single person's being depressed tends to vary from occasion to occasion according to whether or not he/she currently has debt. But your stated goal is to compare the prevalence of depression in a group of people who have debt to a group of people who do not. If your goal is really the latter, then you should not be using panel data in the first place: you need two clean groups, not people who oscillate between having and not having debt.

    So you need to first clarify for yourself what your goal is.
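
    To make the distinction concrete, here is a minimal sketch of the within-person version of the question. The variable names id (person identifier) and wave (time period) are assumptions for illustration; depressed and hasdebt are taken from the original post. The coefficient on hasdebt in the fixed-effects model is identified only by people whose debt status changes over time, so it helps to check how many such people there are.

    ```stata
    * Assumed names: id = person identifier, wave = survey period (hypothetical)
    xtset id wave

    * How often do people move between having and not having debt?
    * The fixed-effects estimate below uses only the people who switch.
    xttrans hasdebt

    * Within-person question: is the same person more likely to be depressed
    * in periods with debt than in periods without it?
    * This is a linear probability model with person fixed effects.
    xtreg depressed hasdebt, fe vce(cluster id)
    ```

    If -xttrans- shows that almost nobody changes debt status, the fixed-effects coefficient rests on very few people, and a between-group design is probably what is really wanted.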

    As for -ranksum-, it is not usable at all with panel data, because it assumes independent observations and repeated observations on the same person are not independent. It also isn't clear how -ranksum- would be relevant to the variables in question: it is used to compare ordinal-level outcomes, but your outcome is dichotomous. While a dichotomous outcome is, in a rather trivial sense, ordinal, techniques designed specifically for dichotomous variables would be better here.

    There are also issues in using a linear probability model if the probability of depression is near 0 or near 1: in that case the linear model can predict negative probabilities or probabilities > 1. This is not necessarily a problem depending on where you want to go with this, but it can be. For that reason, logistic models tend to be more popular for dichotomous outcomes. Also, if you are really doing a comparison between separate groups of people, the simplest approach is just to do a cross-tabulation of depression with debt. (See, for example, the -cs- command.)
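
    The options in the paragraph above might look roughly like the following sketch. It assumes the same variable names as the original post (depressed, hasdebt), plus a hypothetical continuous covariate called income that is used only to show how to check for out-of-range predictions. The cross-sectional commands assume one observation per person; the last command assumes the data have already been -xtset-.

    ```stata
    * Separate groups, one observation per person:
    * cross-tabulation with column percentages and a chi-squared test
    tabulate depressed hasdebt, column chi2

    * Risk difference and risk ratio (outcome first, then exposure)
    cs depressed hasdebt

    * Linear probability model with a hypothetical covariate (income),
    * then count fitted probabilities that fall outside [0, 1]
    regress depressed hasdebt income, vce(robust)
    predict double phat if e(sample), xb
    count if (phat < 0 | phat > 1) & !missing(phat)

    * Logistic alternative for the same cross-sectional comparison
    logit depressed hasdebt income

    * Within-person logistic model on panel data (requires xtset first)
    xtlogit depressed hasdebt, fe
    ```

    With hasdebt as the only predictor, the linear model's fitted values are just the two group proportions, so they cannot leave the 0 to 1 range. The problem arises once continuous covariates enter the model.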
