  • xtmelogit and panel data

    Hello Statalist,

    I am working on a model that can explain loss aversion in saving behaviour using a household survey (the Household Finance and Consumption Survey from the European Central Bank) with two waves. To do so, I use binary income variables (highincome and lowincome) that record whether income over the last 12 months was unusually high or unusually low compared with what the respondent would expect in a normal year. The dependent variable is a binary variable indicating whether the respondent saved during this period. Along with these variables, I include a set of control variables such as age, level of education, labour status, tenure status and so on.

    I would like to compare these results across countries (7 countries) using a multilevel mixed-effects logit, and my question is whether I can use a mixed-effects logistic regression with panel data. To do so, I include two random intercepts: one that varies from country to country and one that varies by wave (the time variable). I am also using multiple imputation to deal with missing values in the household data, and I set up my dataset with the command "mi xtset householdid wave". The HFCS has only two waves so far.

    mi estimate, vceok esampvaryok: xtmelogit saving20 highincome lowincome gender i.pa0200 dh0001 ra0300 ownallresidence rentedresidence employee unemployed retiree marriedstatus singlestatus highrisk || sa0100: || wave: , intpoints(15)

    Another question is about interpreting these results. What can I say about the coefficients if I want to compare them? For instance, I am interested in showing that the positive effect of high income on the probability of saving is larger than the negative effect of low income on the probability of saving. Can I use average marginal effects with xtmelogit, or is it better to use other options, such as odds ratios, to interpret the coefficients?

    Mixed-effects logistic regression               Number of obs      =     23,722

    ------------------------------------------------------------------------
                    |   No. of     Observations per Group        Integration
     Group Variable |   Groups    Minimum    Average    Maximum       Points
    ----------------+-------------------------------------------------------
            country |        7      1,209    3,388.9      7,407           15
               wave |       14        601    1,694.4      3,711           15
    ------------------------------------------------------------------------

                                                    Wald chi2(16)      =    1169.72
    Log likelihood = -14931.238                     Prob > chi2        =     0.0000

    ---------------------------------------------------------------------------------
           saving20 |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
         highincome |   .3423776    .053509     6.40   0.000     .2375018    .4472534
          lowincome |  -.6060649   .0366153   -16.55   0.000    -.6778296   -.5343002
             gender |   .0774088   .0346698     2.23   0.026     .0094573    .1453604
                    |
             pa0200 |
                  2 |   .0847355   .0489974     1.73   0.084    -.0112976    .1807686
                  3 |   .1830587   .0462057     3.96   0.000     .0924972    .2736202
                  5 |   .5877664   .0457286    12.85   0.000     .4981399    .6773928
                    |
             dh0001 |  -.1129852   .0151901    -7.44   0.000    -.1427572   -.0832132
         agebracket |    .004263    .001578     2.70   0.007     .0011702    .0073559
    ownallresidence |   .1575485   .0555991     2.83   0.005     .0485764    .2665207
    rentedresidence |  -.3119736   .0640343    -4.87   0.000    -.4374785   -.1864687
           employee |   .2644405   .0665266     3.97   0.000     .1340508    .3948301
         unemployed |  -.4973625   .1020421    -4.87   0.000    -.6973613   -.2973636
            retiree |  -.0380407   .0649416    -0.59   0.558    -.1653238    .0892425
      marriedstatus |   .1910295   .0440335     4.34   0.000     .1047254    .2773337
       singlestatus |   .0132219   .0543512     0.24   0.808    -.0933046    .1197483
           highrisk |   .1421069   .0569081     2.50   0.013      .030569    .2536448
              _cons |  -.8839132    .234231    -3.77   0.000    -1.342997   -.4248289
    ---------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate  Std. Err.      [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    country: Identity            |
                       sd(_cons) |   .4411826   .1596276      .2170895    .8965986
    -----------------------------+------------------------------------------------
    wave: Identity               |
                       sd(_cons) |   .3433583   .0941693      .2005846    .5877567
    ------------------------------------------------------------------------------
    LR test vs. logistic model: chi2(2) = 1160.88          Prob > chi2 = 0.0000

    Note: LR test is conservative and provided only for reference.
    Note: Log-likelihood calculations are based on the Laplacian approximation.

    Thank you so much!!

    Gonzalo.

  • #2
    The code you show does not look like it goes with the output you show. In your code, you have a random effect for the variable sa0100, but none for country; but the output has a random effect for country, and no mention of sa0100 anywhere.

    In any case, the model is also problematic because wave is specified as nested within country (or sa0100, whichever it really is), whereas in your data the two are crossed. But even beyond that, with only two waves it doesn't really make sense to use random effects for wave: you are not adequately sampling wave space. I recommend including i.wave as a fixed effect in the model instead.
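    A sketch of the respecification suggested above. It assumes, as in the original post, that sa0100 is the country identifier and that the data are already -mi xtset-. wave moves into the fixed part as i.wave, and only country keeps a random intercept. The binary regressors are written with i. so that -margins- will later treat them as discrete. i. works with any nonnegative integer coding, so whether gender and the other indicators are coded 0/1 does not matter here.

```stata
* Sketch only: assumes sa0100 identifies country and the data are -mi xtset-.
* wave enters as a fixed effect (i.wave); country keeps a random intercept.
* Binary regressors use factor-variable notation (i.) for -margins- later.
mi estimate, vceok esampvaryok: xtmelogit saving20                     ///
    i.highincome i.lowincome i.gender i.pa0200 dh0001 ra0300           ///
    i.ownallresidence i.rentedresidence i.employee i.unemployed        ///
    i.retiree i.marriedstatus i.singlestatus i.highrisk i.wave         ///
    || sa0100: , intpoints(15)
```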

    As for interpreting the results (once you get the model fixed), I do recommend using the -margins- command because you can get results in the probability metric, which is easier for most people to understand, and which is more directly related to decision making and policy analysis than odds and odds ratios. But to do that, you will need to use factor variable notation. (Run -help fvvarlist- and read that for information about factor variable notation.) This is crucial, because -margins- will calculate incorrect marginal effects for your discrete variables if you do not prefix them with i. in the regression command.
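    One way to follow this advice with multiply imputed data, sketched under assumptions. -margins- is not run directly after -mi estimate-, so this sketch uses the user-written -mimrgns- command from SSC, and assumes it is installed (ssc install mimrgns). It requests the average discrete change (0 to 1) in the predicted probability for the two income indicators. The predicted probability comes from the fixed portion of the model, predict(mu fixedonly), with the random effects set to zero. Check -help xtmelogit postestimation- for the prediction options available in your Stata version. Check -help mimrgns- for how it handles the vceok and esampvaryok options used in the estimation call.

```stata
* Sketch only: -mimrgns- is user-written (ssc install mimrgns); that it is
* installed is an assumption. Run after an -mi estimate: xtmelogit- fit
* in which highincome and lowincome were entered as i.highincome i.lowincome.
* dydx() on factor variables gives average discrete changes (0 -> 1) in the
* predicted probability; predict(mu fixedonly) uses the fixed portion only.
mimrgns, dydx(highincome lowincome) predict(mu fixedonly)
```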

    Finally, deciding whether one variable is a stronger predictor than another is always a hazardous undertaking. The marginal effects of those variables depend on many things, including the distributions of those variables themselves. For example, suppose the probability of highincome = 1 differs from the probability of lowincome = 1 in your estimation sample. Then the comparison between their effects is already "contaminated" by that difference in baseline frequency and is probably invalid. It is fine to estimate marginal effects of these variables, but comparing them to say that one is bigger than the other is usually ill-advised.
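    To make the point about baselines concrete, here is a stylized calculation using the highincome and lowincome coefficients from the output above. It is not an average marginal effect. Each log-odds coefficient is converted into an odds ratio and into the change in probability at three hypothetical baseline probabilities (.2, .5, .9). The random effects and the covariate distribution are ignored. The baselines are illustrative assumptions, not values from the HFCS data.

```stata
* Stylized illustration, not an average marginal effect: convert the posted
* log-odds coefficients into odds ratios and into probability changes at
* hypothetical baseline probabilities. Random effects are ignored.
scalar b_high = .3423776   // coefficient on highincome, from the output above
scalar b_low  = -.6060649  // coefficient on lowincome, from the output above

display "odds ratio, highincome: " %6.4f exp(b_high)
display "odds ratio, lowincome:  " %6.4f exp(b_low)

* Change in Pr(saving) when the indicator switches from 0 to 1,
* starting from each hypothetical baseline probability p
foreach p in .2 .5 .9 {
    scalar d_high = invlogit(logit(`p') + b_high) - `p'
    scalar d_low  = invlogit(logit(`p') + b_low)  - `p'
    display "baseline p = " `p'                 ///
        "   change(high) = " %7.4f d_high       ///
        "   change(low) = "  %7.4f d_low
}
```

    At a baseline of .5 the two changes are roughly +.085 and -.147. At .9 they shrink to roughly +.027 and -.069. So even the ratio between the two effects moves with the baseline, which is one reason a single "bigger than" comparison is fragile.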
