mixed not concave

Fred Lee

Join Date: Nov 2017

Posts: 473
#1


01 Jun 2019, 08:21

To reproduce the "not concave" problem, I have uploaded the data: https://www.dropbox.com/s/k45ns53rte...ample.dta?dl=0
Here is the code. How should I deal with this problem?

Code:

mixed y cl.x##c.w i.a b c d || id:cl.x, vce(robust) cov(exc)

William Lisowski

Join Date: Dec 2014
Posts: 10150
#2

01 Jun 2019, 09:11

I am not an expert in mixed models. With that said, I am concerned when I notice that your variables a, b, c, and d are constant within each id.

Code:

. tabstat a b c d, by(id) statistics(range)

Summary statistics: range
  by categories of: id (group(name))

      id |         a         b         c         d
---------+----------------------------------------
       1 |         0         0         0         0
       2 |         0         0         0         0
       3 |         0         0         0         0
...

      74 |         0         0         0         0
      75 |         0         0         0         0
      76 |         0         0         0         0
---------+----------------------------------------
   Total |         8  5.992714  10.30899         1
--------------------------------------------------
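The same check can be done programmatically. This is a minimal sketch, assuming the dataset from post #1 is in memory; the temporary variable names varies_a, varies_b, and so on are hypothetical and are dropped again at the end of each pass.

```stata
* Sketch: report whether each of a, b, c, d varies within id.
foreach v of varlist a b c d {
    * Sort by `v' within id; the first and last values agree only if `v'
    * is constant in that id (a missing value among nonmissing ones counts as varying).
    bysort id (`v'): generate byte varies_`v' = `v'[1] != `v'[_N]
    quietly count if varies_`v'
    display "`v': " r(N) " observations belong to ids in which the variable varies"
    drop varies_`v'
}
```

A count of 0 for a variable means it is constant within every id, which matches a range of 0 in the tabstat output.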

That leads me to first try the model

Code:

mixed y cl.x##c.w || id:cl.x, vce(robust) cov(exc) difficult

which converges, then four more versions, separately adding i.a, b, c, and d, all of which converge. But when I include i.a and either b or c, the model fails to converge.
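The sequence of fits described above could be scripted roughly as follows. This is only a sketch: the loop and the local macro name extra are not from the thread, and iterate(50) is an arbitrary cap chosen so that a non-concave fit stops instead of iterating indefinitely.

```stata
* Sketch: refit the base model with each set of extra fixed-effects terms
* and show the return code and the convergence flag stored by mixed.
foreach extra in "i.a" "b" "c" "d" "i.a b" "i.a c" {
    capture quietly mixed y cl.x##c.w `extra' || id:cl.x, vce(robust) cov(exc) difficult iterate(50)
    * e(converged) is 1 when the optimizer converged and 0 otherwise;
    * it displays as "." if no estimation results were stored.
    display "extra terms: `extra'" _col(30) "rc = " _rc _col(42) "converged = " e(converged)
}
```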

A common diagnostic approach in these circumstances is to limit the number of iterations and inspect the results. For the model with i.a and b we have the following results, in which I have highlighted in red a result whose significance perhaps another member with more experience in mixed models can explain.

Code:

. mixed y cl.x##c.w i.a b || id:cl.x, vce(robust) cov(exc) iterate(20)

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log pseudolikelihood = -243.52449  
Iteration 1:   log pseudolikelihood = -242.24485  
Iteration 2:   log pseudolikelihood = -242.24323  
Iteration 3:   log pseudolikelihood = -242.24323  (not concave)
Iteration 4:   log pseudolikelihood = -242.24323  (not concave)
...
Iteration 19:  log pseudolikelihood = -242.24323  (not concave)
Iteration 20:  log pseudolikelihood = -242.24323  (not concave)
convergence not achieved

Computing standard errors:

Mixed-effects regression                        Number of obs     =        474
Group variable: id                              Number of groups  =         76

                                                Obs per group:
                                                              min =          3
                                                              avg =        6.2
                                                              max =          7

                                                Wald chi2(11)     =      48.30
Log pseudolikelihood = -242.24323               Prob > chi2       =     0.0000

                                    (Std. Err. adjusted for 76 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
         L1. |   .1016916   .2058044     0.49   0.621    -.3016775    .5050607
             |
           w |   .1518304   .1564095     0.97   0.332    -.1547266    .4583873
             |
    cL.x#c.w |   -.047828   .0558256    -0.86   0.392    -.1572443    .0615882
             |
           a |
          4  |   .4307988   .1445108     2.98   0.003     .1475628    .7140347
          5  |   .6453963   .1293368     4.99   0.000     .3919009    .8988917
          6  |   .3326278   .1975582     1.68   0.092    -.0545791    .7198347
          7  |   .1581225   .1651106     0.96   0.338    -.1654883    .4817332
          8  |   .3042626   .0824021     3.69   0.000     .1427574    .4657678
          9  |   .1846456   .0963684     1.92   0.055    -.0042329    .3735242
         11  |    .256985   .1038483     2.47   0.013      .053446     .460524
             |
           b |   .0582941   .0413075     1.41   0.158    -.0226671    .1392553
       _cons |   3.357153   .6035209     5.56   0.000     2.174274    4.540032
------------------------------------------------------------------------------

------------------------------------------------------------------------------
                             |               Robust           
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Exchangeable             |
              var(L.x _cons) |   .0100354   .5708113      3.85e-51    2.61e+46
              cov(L.x,_cons) |   .0100354   .5708113     -1.108734    1.128805
-----------------------------+------------------------------------------------
               var(Residual) |    .118391   .1891595      .0051679    2.712236
------------------------------------------------------------------------------

Warning: convergence not achieved


Fred Lee

Join Date: Nov 2017

Posts: 473
#3

01 Jun 2019, 20:33

Thanks William. Could anyone explain more?
@Clyde Schechter @Weiwen Ng @Joseph Coveney

Last edited by Fred Lee; 01 Jun 2019, 20:35.
Fred Lee

Join Date: Nov 2017

Posts: 473
#4

02 Jun 2019, 06:07

@Nick Cox could you please help me? Thanks a lot!
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#5

02 Jun 2019, 07:48

Fred: I notice your requests to particular Forum participants. You should perhaps read and digest advice given in the thread at https://www.statalist.org/forums/for...ivate-messages
Fred Lee

Join Date: Nov 2017

Posts: 473
#6

02 Jun 2019, 08:09

Originally posted by Stephen Jenkins

Fred: I notice your requests to particular Forum participants. You should perhaps read and digest advice given in the thread at https://www.statalist.org/forums/for...ivate-messages

Thanks for the reminder.

William Lisowski

Join Date: Dec 2014
Posts: 10150
#7

02 Jun 2019, 08:26

Run the following experiment.

Code:

regress y b i.id
regress y i.id b

The order of the variables makes a difference when Stata chooses which variables to eliminate because of collinearity.

The first regression tells us

Code:

. regress y b i.id
note: 76.id omitted because of collinearity

The second regression tells us

Code:

. regress y i.id b
note: b omitted because of collinearity

That is, the variable b is perfectly predicted by id. Not only is b constant within each id, as I noted in post #2, each id has a different value of b. The variable b adds no information to that which is already given by id.
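Two other ways to see the same thing, offered as a sketch that assumes the data are still xtset as in the original file:

```stata
* xtsum splits each variable's variation into between-id and within-id parts;
* a "within" Std. Dev. of 0 means the variable is constant inside every id.
xtsum w a b c d

* Regress b on the id indicators; an R-squared of 1 means id fully determines b.
quietly regress b i.id
display "R-squared of b on i.id = " e(r2)
```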

The complicated formulation of the mixed model hid this fact from you, but the results I highlighted in post #2 are often indicative of problems due to collinearity.

Your model is flawed by including both b and id. You must omit one or the other.

If you omit b the model converges easily.

Code:

. mixed y cl.x##c.w i.a || id:cl.x, vce(robust) cov(exc)

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log pseudolikelihood = -245.06941  
Iteration 1:   log pseudolikelihood = -243.81957  
Iteration 2:   log pseudolikelihood = -243.81643  
Iteration 3:   log pseudolikelihood = -243.81643  

Computing standard errors:

Mixed-effects regression                        Number of obs     =        474
Group variable: id                              Number of groups  =         76

                                                Obs per group:
                                                              min =          3
                                                              avg =        6.2
                                                              max =          7

                                                Wald chi2(10)     =      35.21
Log pseudolikelihood = -243.81643               Prob > chi2       =     0.0001

                                    (Std. Err. adjusted for 76 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
         L1. |   .0922254   .2007274     0.46   0.646     -.301193    .4856439
             |
           w |   .1581071   .1494684     1.06   0.290    -.1348456    .4510597
             |
    cL.x#c.w |  -.0464169   .0544704    -0.85   0.394     -.153177    .0603432
             |
           a |
          4  |   .4534787   .1415018     3.20   0.001     .1761402    .7308171
          5  |   .6441936   .1514849     4.25   0.000     .3472888    .9410985
          6  |   .3644797   .2040079     1.79   0.074    -.0353684    .7643277
          7  |   .1954667   .1864038     1.05   0.294    -.1698781    .5608115
          8  |   .3388006   .1145218     2.96   0.003     .1143419    .5632592
          9  |    .195889   .1068757     1.83   0.067    -.0135835    .4053615
         11  |   .2596739    .101415     2.56   0.010     .0609041    .4584436
             |
       _cons |   3.486475   .5657096     6.16   0.000     2.377704    4.595245
------------------------------------------------------------------------------

------------------------------------------------------------------------------
                             |               Robust           
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Exchangeable             |
              var(L.x _cons) |   .0105453   .0033627      .0056445    .0197011
              cov(L.x,_cons) |   .0105453   .0033627      .0039545    .0171361
-----------------------------+------------------------------------------------
               var(Residual) |   .1183675   .0315033      .0702565    .1994244
------------------------------------------------------------------------------

Last edited by William Lisowski; 02 Jun 2019, 08:29.


Fred Lee

Join Date: Nov 2017

Posts: 473
#8

02 Jun 2019, 09:10

Originally posted by William Lisowski

Your model is flawed by including both b and id. You must omit one or the other.

If you omit b the model converges easily.

Thanks William! However, I find that if I drop any one of b, c, and d, the model converges, so which should I drop?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#9

02 Jun 2019, 09:52

I would not include any variable which is perfectly predicted by the id variable. But then, as I wrote in post #2, I am not an expert in mixed models. Perhaps the methodology appropriately allows the user to make something out of nothing.
Fred Lee

Join Date: Nov 2017

Posts: 473
#10

02 Jun 2019, 20:59

Originally posted by William Lisowski

I would not include any variable which is perfectly predicted by the id variable. But then, as I wrote in post #2, I am not an expert in mixed models. Perhaps the methodology appropriately allows the user to make something out of nothing.

Thanks William. I use mixed to run a hierarchical linear model, in which the level 2 variables are the same for all individuals at level 1 within a group. For example, level 1 holds the characteristics of students, and level 2 holds the characteristics of schools.

William Lisowski

Join Date: Dec 2014
Posts: 10150

#11

03 Jun 2019, 17:58

I use mixed to run the hierarchical linear model, the variables of level 2 are all the same for each individuals in level 1

Reading in your data, we see that you have declared it as panel data, with id as your panel variable and serial as your time variable.

Code:

. xtset
       panel variable:  id (strongly balanced)
        time variable:  serial, 1 to 8
                delta:  1 unit

. xtdescribe

      id:  1, 2, ..., 76                                     n =         76
  serial:  1, 2, ..., 8                                      T =          8
           Delta(serial) = 1 unit
           Span(serial)  = 8 periods
           (id*serial uniquely identifies each observation)
...

Here I display the first two panels of your data.

Code:

. list id serial y x w a b c d if id<=2, sepby(id) noobs

  +--------------------------------------------------------------------+
  | id   serial           y     x   w    a           b           c   d |
  |--------------------------------------------------------------------|
  |  1        1   4.9444444   2.2   4   11   2.3978953   3.8918203   1 |
  |  1        2   4.2222222     2   4   11   2.3978953   3.8918203   1 |
  |  1        3   4.9444444     2   4   11   2.3978953   3.8918203   1 |
  |  1        4   4.3333333     2   4   11   2.3978953   3.8918203   1 |
  |  1        5   4.7222222     2   4   11   2.3978953   3.8918203   1 |
  |  1        6           4     2   4   11   2.3978953   3.8918203   1 |
  |  1        7           4   2.4   4   11   2.3978953   3.8918203   1 |
  |  1        8           4     2   4   11   2.3978953   3.8918203   1 |
  |--------------------------------------------------------------------|
  |  2        1   4.1111111   3.8   3    6   3.0445224   6.8035053   1 |
  |  2        2   4.1666667   2.8   3    6   3.0445224   6.8035053   1 |
  |  2        3   4.5555556   1.4   3    6   3.0445224   6.8035053   1 |
  |  2        4   4.7777778     2   3    6   3.0445224   6.8035053   1 |
  |  2        5           4     1   3    6   3.0445224   6.8035053   1 |
  |  2        6   4.7777778   2.4   3    6   3.0445224   6.8035053   1 |
  |  2        7           5   1.8   3    6   3.0445224   6.8035053   1 |
  |  2        8           5   1.4   3    6   3.0445224   6.8035053   1 |
  +--------------------------------------------------------------------+

We see in these two panels what is the case for all 76: the only variables that change with time are y and x. All others are constant for the panel.

In the context of your mixed model, level 2 is the id (panel) and level 1 is serial (time). It is true as you say that the variable you have at level 2 (id) is the same for each observation within level 1 (serial).

But there are also variables in your model - w a b c and d - that you included at level 1 but that are in fact the same for each observation within level 2. This is equivalent to including school size as a characteristic of each student in that school when it is in fact a characteristic of the school (level 2), not the student (level 1). Your variables w, a, b, c, and d are characteristics of id (level 2), not of serial (level 1).

That is not my understanding of how hierarchical models work.
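To look at w, a, b, c, and d as panel-level quantities, one option is to keep a single observation per id. This is a sketch only; the flag variable name one_per_id is hypothetical.

```stata
* Tag exactly one observation in each id, then summarize the
* id-level variables over the 76 panels instead of over all observations.
egen byte one_per_id = tag(id)
summarize w a b c d if one_per_id
tabulate a if one_per_id
drop one_per_id
```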
