Missing values and collinearity error in mixed effect models

Elena Tore

Join Date: Aug 2017

Posts: 5
#1


06 Feb 2018, 08:33

Hello,

I am trying to fit a mixed-effects regression using the data given below. I have 4 measurements (conc) per person (ID). The "days" variable identifies the day, counted from the beginning of pregnancy, on which each measurement was taken.

I need a single regression for each ID. So I thought of this command (I'm using Stata 13 SE):
by ID, sort: xtmixed conc days || timepoint:
However, I have quite a few missing values, and when I run the command I get "could not calculate numerical derivatives -- discontinuous region with missing values encountered" (for ID 301, and for every other ID, even those without missing values) and "conc collinear with days _cons" (for ID 305), and then it stops.

Is there a way to overcome the problem and calculate the regression with the measurements I have?

Also, how can I save the beta coefficient and the intercept (one of each per person) in two variables? I've read about the mat and svmat commands, but I end up with 4 variables for the beta coefficient, in which all rows but the first are missing, and I couldn't find a way to save the intercept.

Thank you so much for your help!

Elena

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int ID byte timepoint int days float conc
301 1  72 24.26
301 2 136     .
301 3 211 22.65
301 4 268 22.39
305 1  48     .
305 2 132 19.21
305 3 228 19.65
305 4 288     .
309 1  87 18.65
309 2 141 20.55
309 3 220  22.2
309 4 274 21.18
319 1  76 21.46
319 2 153  22.2
319 3 223  21.1
319 4 272 19.12
end
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

06 Feb 2018, 09:12

-mixed- (formerly known as -xtmixed-) does not produce one regression per person. It would do something vaguely like that if the upper level of the model were the ID variable rather than timepoint. As it stands, your command specifies a model that will be very difficult to identify: the main fixed effect, days, is a proxy for the level variable, timepoint, so their effects are hard to separate.

Even with ID as the upper-level grouping variable, -mixed- would not produce individualized regressions. Rather, it would fit a model in which each person is assumed to have her own intercept, but those intercepts are constrained to follow a normal distribution, and only the variance of that distribution is estimated directly, not the actual individual-level intercepts. What you are asking for is more like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int ID byte timepoint int days float conc
301 1  72 24.26
301 2 136     .
301 3 211 22.65
301 4 268 22.39
305 1  48     .
305 2 132 19.21
305 3 228 19.65
305 4 288     .
309 1  87 18.65
309 2 141 20.55
309 3 220  22.2
309 4 274 21.18
319 1  76 21.46
319 2 153  22.2
319 3 223  21.1
319 4 272 19.12
end

* Program that -runby- calls once per ID, with only that ID's data in memory
capture program drop one_id
program define one_id
    capture regress conc days
    if c(rc) == 0 {
        * Regression succeeded: store intercept, slope, and their standard errors
        foreach x in _cons days {
            gen b_`x' = _b[`x']
            gen se_`x' = _se[`x']
        }
        gen n_obs = e(N)
        gen r2 = e(r2)
    }
    else if !inlist(c(rc), 2000, 2001) {
        * Anything other than no/insufficient observations is a real error
        noisily display as error `"Unexpected regression error: ID = `=ID[1]'"'
        exit c(rc)
    }
    exit
end

runby one_id, by(ID) verbose

To use this code you will need to install the -runby- command, written by Robert Picard and me, from SSC.
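Installing from SSC is a one-time step. A minimal sketch, assuming the machine has internet access and is allowed to install user-written packages:

```stata
* Install the user-written -runby- command from SSC (needed only once)
ssc install runby

* Confirm that Stata can find it, then read its documentation
which runby
help runby
```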

If you have any IDs with two missing observations, then there will not be enough data to carry out a regression for that ID. Because of the way I set up the -capture- structure, that ID's original data will be retained in the final results, with missing values for the regression results, but it will not be counted as an error. If some other problem is encountered during regression, that will count as an error and you will be notified about it.
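To see in advance which IDs are likely to be affected, one can count the usable measurements per person before calling -runby-. The sketch below is an illustration only: the variable names n_used and first_row are made up for this example, and the cutoff of 3 simply mirrors the "two missing observations" remark above, so adjust it as needed.

```stata
* Count, within each ID, the rows where both conc and days are non-missing
bysort ID: egen n_used = total(!missing(conc, days))

* Mark one row per ID so that each person is listed only once
egen byte first_row = tag(ID)

* List the IDs with fewer than 3 usable measurements (assumed cutoff)
list ID n_used if first_row & n_used < 3, noobs

* Drop the helper variables so they do not end up in the -runby- results
drop n_used first_row
```

With the example data posted above, only ID 305 (two usable measurements) should be listed.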

If your data set is very large, you may want to add the -status- option to the -runby- command so that you can see that progress is being made while you wait.
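For example, the call from the code above could be run with progress reporting switched on. This is only a sketch: it drops -verbose- on the assumption that per-ID output would be too long to read for a very large data set.

```stata
* Same per-ID regression, with periodic progress reports instead of full output
runby one_id, by(ID) status
```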