  • Missing values and collinearity error in mixed effect models

    Hello,

    I am trying to fit a mixed-effects regression using the data given below. I have 4 measurements (conc) per person (ID). The "days" variable identifies the day, counted from the beginning of pregnancy, on which each measurement was taken.

    I need a single regression for each ID. So I thought of this command (I'm using Stata 13 SE):
    by ID, sort: xtmixed conc days || timepoint:
    However, I have quite a few missing values and when running the command I get "could not calculate numerical derivatives -- discontinuous region with missing values encountered" (ID 301, and in every other case, even without missing values) and "conc collinear with days _cons" (ID 305), and it stops.

    Is there a way to overcome the problem and calculate the regression with the measurements I have?

    Also, how can I save the beta coefficient and the intercept (one per person) in two variables? I've read about the mat and svmat commands, but I end up with 4 variables for the beta coefficient, in which every row but the first is missing, and I couldn't find a way to save the intercept.


    Thank you so much for your help!

    Elena

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int ID byte timepoint int days float conc
    301 1  72 24.26
    301 2 136     .
    301 3 211 22.65
    301 4 268 22.39
    305 1  48     .
    305 2 132 19.21
    305 3 228 19.65
    305 4 288     .
    309 1  87 18.65
    309 2 141 20.55
    309 3 220  22.2
    309 4 274 21.18
    319 1  76 21.46
    319 2 153  22.2
    319 3 223  21.1
    319 4 272 19.12
    end




  • #2
    -mixed- (formerly known as -xtmixed-) does not produce one regression per person. It would do something vaguely like that if the upper level of the model were the ID variable rather than timepoint. As written, your command specifies a model that is going to be very difficult to identify: the main fixed effect, days, is a proxy for the level variable, timepoint, so their effects will be difficult to separate. And even with ID as the upper-level grouping variable, it would not produce individualized regressions. Rather, it would fit a model in which each person is assumed to have her own intercept. But those intercepts are constrained to follow a normal distribution, and only the variance of that distribution is estimated directly, not the actual individual-level intercepts. What is being asked for is more like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int ID byte timepoint int days float conc
    301 1  72 24.26
    301 2 136     .
    301 3 211 22.65
    301 4 268 22.39
    305 1  48     .
    305 2 132 19.21
    305 3 228 19.65
    305 4 288     .
    309 1  87 18.65
    309 2 141 20.55
    309 3 220  22.2
    309 4 274 21.18
    319 1  76 21.46
    319 2 153  22.2
    319 3 223  21.1
    319 4 272 19.12
    end
    
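    capture program drop one_id
    program define one_id
        * Called by -runby- once per ID, with only that ID's observations in memory
        capture regress conc days
        if c(rc) == 0 {
            * Regression ran: store intercept, slope, and their standard errors
            foreach x in _cons days {
                gen b_`x' = _b[`x']
                gen se_`x' = _se[`x']
            }
            gen n_obs = e(N)
            gen r2 = e(r2)
        }
        else if !inlist(c(rc), 2000, 2001) {
            * Anything other than rc 2000/2001 (no or insufficient observations) is a real error
            noisily display as error `"Unexpected regression error: ID = `=ID[1]'"'
            exit c(rc)
        }
        exit
    end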
    
    runby one_id, by(ID) verbose

    To use this code you will need to install the -runby- command, written by Robert Picard and me, from SSC.

    If you have any IDs with two missing observations, then there will not be enough data to carry out a regression for that ID. Because of the way I set up the -capture- structure, this will result in that ID's original data being retained in the final results, with missing values for the regression results, but it is not counted as an error. If some other problem is encountered during regression, that will count as an error and you will be notified about it.
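
    Once -runby- has finished, the slope and intercept are held in the variables b_days and b__cons (note the double underscore, which comes from the b_`x' naming with x set to _cons), constant within each ID. As a minimal sketch, assuming at least one ID's regression ran so that these variables exist, you could view one row per person like this. The variable name one_per_id is just an illustrative choice:

    Code:
    * Assumes -runby one_id, by(ID)- has already been run on the data in memory
    * Flag exactly one observation per ID (one_per_id is a hypothetical name)
    egen byte one_per_id = tag(ID)
    * Show each person's intercept, slope, and number of observations used
    list ID b__cons b_days n_obs if one_per_id, noobs

    IDs whose regression could not be run will show missing values in these columns.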

    If your data set is very large, you may want to add the -status- option to the -runby- command so that you can see that progress is being made while you wait.
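
    For instance, the call could look like the following. This is only a sketch: I have left out -verbose- here on the assumption that you would not want per-group output scrolling by on a large data set.

    Code:
    * Same program and grouping as before, with periodic progress reports
    runby one_id, by(ID) status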
