Model for data with two dimensions but no time

Anne Carre

Join Date: May 2018

Posts: 6
#1


26 May 2018, 07:56

Hi all,

I have a dataset with two dimensions: individuals and documents.

I have a dummy variable ("consult") indicating whether a given individual has consulted a given document or not.

I have a series of variables related to the individuals (e.g. gender) and a series of variables related to the documents (e.g. printed or numerical).

I would like to model the probability that a document is consulted using individual-related and document-related variables.

My first try was a very simple logit:
logit consult individual_var document_var

I am now wondering whether I could improve my model. I feel that the data structure is not very different from panel data, except that there is no time dimension. Could I use an "xt" model?

Any advice will be appreciated! Thanks in advance.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17700
#2

26 May 2018, 08:46

Anne:
Welcome to this forum.
If you have repeated observations per individual (that is, each individual can consult more than one document), you can cluster your standard errors on individual.
That said, please note that your chances of getting helpful replies depend on posting what you typed and what Stata gave you back (as you partially did, but without using CODE delimiters; please see the FAQ on this and other posting-related topics. Thanks) and on sharing an excerpt of your dataset (or a fake example mimicking the issue in your real dataset, if you're under a confidentiality agreement) via -dataex-. Thanks.

Kind regards,
Carlo
(Stata 19.0)

Anne Carre

Join Date: May 2018
Posts: 6

26 May 2018, 14:54

Hi Carlo,

Thanks for your answer. Sorry if my post was imprecise.

Indeed I have repeated observations for each individual (each individual can consult more than one document).

Here is an excerpt of my data set (with just a few of the variables I have):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Indiv InformationNum Consult) long i_sexe float(i_NoteMatiere_imputee d_TypActi_3 d_TypActi_Actif)
 1 1 0 2     6 1 0
 1 2 0 2     6 1 0
 1 3 0 2     6 1 0
 1 4 0 2     6 3 0
 1 5 0 2     6 3 1
15 1 1 1 16.12 1 0
15 2 0 1 16.12 1 0
15 3 1 1 16.12 1 0
15 4 1 1 16.12 3 0
15 5 1 1 16.12 3 1
end
label values i_sexe i_sexe
label def i_sexe 1 "femme", modify
label def i_sexe 2 "homme", modify

Here is the first very simple model I used:

Code:

logit Consult  i.i_sexe  i_NoteMatiere_imputee i.d_TypActi_3 i.d_TypActi_Actif

Here is the output:

Code:

Iteration 0:   log likelihood = -10475.777  
Iteration 1:   log likelihood = -8526.9932  
Iteration 2:   log likelihood = -8387.8451  
Iteration 3:   log likelihood = -8386.9422  
Iteration 4:   log likelihood = -8386.9422  

Logistic regression                             Number of obs     =     17,940
                                                LR chi2(5)        =    4177.67
                                                Prob > chi2       =     0.0000
Log likelihood = -8386.9422                     Pseudo R2         =     0.1994

---------------------------------------------------------------------------------------
              Consult |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
               i_sexe |
               homme  |  -.4023254   .0399451   -10.07   0.000    -.4806163   -.3240344
i_NoteMatiere_imputee |   .1350651   .0040581    33.28   0.000     .1271114    .1430187
                      |
          d_TypActi_3 |
                   2  |    1.44667   .0526922    27.46   0.000     1.343395    1.549944
                   3  |  -1.050469   .0606042   -17.33   0.000    -1.169251   -.9316873
                      |
    1.d_TypActi_Actif |  -.5101569   .0451286   -11.30   0.000    -.5986073   -.4217064
                _cons |  -2.694959   .0680842   -39.58   0.000    -2.828401   -2.561516
---------------------------------------------------------------------------------------

So a first suggestion would be to cluster the standard errors on the individuals. I will try that, thanks!
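A minimal sketch of that clustered version, assuming the full dataset is in memory and that Indiv identifies individuals as in the -dataex- excerpt. Only the variance estimator changes, so the coefficients should be identical to those of the plain logit above; only the standard errors, z statistics and confidence intervals move.

```stata
* Pooled logit with standard errors clustered on individual (Indiv):
* point estimates are unchanged, only the variance estimator differs
logit Consult i.i_sexe i_NoteMatiere_imputee i.d_TypActi_3 i.d_TypActi_Actif, vce(cluster Indiv)
```

Note that clustering only corrects the standard errors for within-individual correlation; it does not model differences between individuals or between documents.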

Any other suggestion?

Best regards,
Anne

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#4

26 May 2018, 15:27

At least at first glance, I would be inclined to approach this as a mixed model with crossed individual and document random effects. It seems logical to assume that different individuals differ in their inclination to consult any document at all, and different documents are identifiable (perhaps by title or by length) as being more or less likely to be useful if consulted.
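Written out, the crossed random-effects logit described here can be sketched as follows, with i indexing individuals and j documents. The notation is introduced only for illustration: $\mathbf{x}_{ij}$ collects the individual- and document-level covariates, and the two random intercepts are assumed independent of each other and of the covariates.

```latex
\operatorname{logit}\,\Pr(\mathrm{Consult}_{ij} = 1 \mid u_i, v_j)
  = \mathbf{x}_{ij}'\boldsymbol{\beta} + u_i + v_j,
\qquad u_i \sim N(0, \sigma_u^2), \quad v_j \sim N(0, \sigma_v^2)
```

Here $\sigma_u^2$ measures how much individuals differ in their general inclination to consult documents, and $\sigma_v^2$ how much documents differ in their general propensity to be consulted; in -melogit- these are the var(_cons) terms reported for each grouping factor.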

Anne Carre

Join Date: May 2018
Posts: 6

27 May 2018, 04:46

Thanks Clyde, that sounds like very good advice!

Here is what I tried. Have I specified the random effects correctly? They should be crossed (non-nested).

Code:

melogit Consult i.i_sexe i_NoteMatiere_imputee i.d_TypActi_3 i.d_TypActi_Actif || _all:R.Indiv || InformationNum:
note: crossed random-effects model specified; option intmethod(laplace) implied

Fitting fixed-effects model:

Iteration 0:   log likelihood = -8494.1795  
Iteration 1:   log likelihood = -8387.3748  
Iteration 2:   log likelihood = -8386.9423  
Iteration 3:   log likelihood = -8386.9422  

Refining starting values:

Grid node 0:   log likelihood = -6899.2873

Fitting full model:

Iteration 0:   log likelihood = -6899.2873  
Iteration 1:   log likelihood = -6886.5898  (not concave)
Iteration 2:   log likelihood =  -6885.283  (not concave)
Iteration 3:   log likelihood = -6885.1533  
Iteration 4:   log likelihood = -6879.7107  (not concave)
Iteration 5:   log likelihood = -6879.3646  (not concave)
Iteration 6:   log likelihood = -6879.3495  (not concave)
Iteration 7:   log likelihood =  -6879.348  
Iteration 8:   log likelihood = -6879.3409  (not concave)
Iteration 9:   log likelihood = -6879.3402  (not concave)
Iteration 10:  log likelihood = -6879.3402  (not concave)
Iteration 11:  log likelihood = -6879.3401  
Iteration 12:  log likelihood = -6879.3326  
Iteration 13:  log likelihood =  -6878.452  
Iteration 14:  log likelihood = -6878.3652  
Iteration 15:  log likelihood = -6878.3621  
Iteration 16:  log likelihood =  -6878.362  

Mixed-effects logistic regression               Number of obs     =     17,940

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1     17,940   17,940.0     17,940
   Informatio~m |         78        230      230.0        230
-------------------------------------------------------------

Integration method:     laplace

                                                Wald chi2(5)      =     280.77
Log likelihood =  -6878.362                     Prob > chi2       =     0.0000
---------------------------------------------------------------------------------------
              Consult |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
                      |
               i_sexe |
               homme  |  -.6479697   .1772484    -3.66   0.000    -.9953702   -.3005692
i_NoteMatiere_imputee |     .19098   .0203371     9.39   0.000     .1511199    .2308401
                      |
          d_TypActi_3 |
                   2  |   1.819509   .4183267     4.35   0.000      .999604    2.639415
                   3  |  -1.273481   .4237598    -3.01   0.003    -2.104035   -.4429266
                      |
    1.d_TypActi_Actif |  -.3608841   .2671318    -1.35   0.177    -.8844527    .1626845
                _cons |  -3.779886   .5549849    -6.81   0.000    -4.867636   -2.692135
----------------------+----------------------------------------------------------------
_all>Indiv            |
            var(_cons)|   1.336868   .1491291                      1.074326     1.66357
----------------------+----------------------------------------------------------------
InformationNum        |
            var(_cons)|   .8627758    .150751                      .6125891    1.215141
---------------------------------------------------------------------------------------
LR test vs. logistic model: chi2(2) = 3017.16             Prob > chi2 = 0.0000
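As a hedged cross-check of the specification, here is a sketch of the same crossed model with the two factors swapped, so that documents enter through _all:R. and individuals through the final level. Both orderings describe the same model, so the estimates should agree up to numerical precision. The ordering is assumed to matter only for computation time (placing the factor with more levels, here the individuals, at the last level is often suggested as faster, but treat that as an assumption to check in the [ME] manual). The `///` line continuation assumes the commands are run from a do-file.

```stata
* Same crossed random-effects model, factors swapped:
* documents via _all:R.InformationNum, individuals as the last level
melogit Consult i.i_sexe i_NoteMatiere_imputee i.d_TypActi_3 i.d_TypActi_Actif ///
    || _all:R.InformationNum || Indiv:

* Replay the stored results with fixed effects shown as odds ratios
melogit, or
```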


Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#6

27 May 2018, 10:37

Yes, that code looks correct.
Anne Carre

Join Date: May 2018

Posts: 6
#7

27 May 2018, 11:31

Thanks a lot Clyde. I think I'll go with this approach.
