  • Model for data with two dimensions but no time

    Hi all,

    I have a dataset with two dimensions: individuals and documents.

    I have a dummy variable ("consult") indicating whether a given individual has consulted a given document or not.

    I have a series of variables related to the individuals (e.g. gender) and a series of variables related to the documents (e.g. printed or numerical).

    I would like to model the probability that a document is consulted using individual-related and document-related variables.

    My first try was a very simple logit:
    logit consult individual_var document_var

    I am now wondering whether I could improve my model. I feel that the data structure is not very different from panel data, except that there is no time dimension. Could I use an "xt-model"?

    Any advice will be appreciated! Thanks in advance.



  • #2
    Anne:
    welcome to this forum.
    If you have repeated observations per individual (in that each individual can consult more than one document), you can cluster your standard errors on individual.
    That said, please note that your chances of getting helpful replies depend on posting exactly what you typed and what Stata gave you back (as you partially did, but without code delimiters; please see the FAQ on this and other posting-related topics) and on sharing an excerpt of your dataset via -dataex- (or a fake example that mimics the issue in your real dataset, if you are under a confidentiality agreement). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Hi Carlo,

      Thanks for your answer. Sorry if my post was imprecise.

      Indeed I have repeated observations for each individual (each individual can consult more than one document).

      Here is an excerpt of my data set (with just a few of the variables I have):

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(Indiv InformationNum Consult) long i_sexe float(i_NoteMatiere_imputee d_TypActi_3 d_TypActi_Actif)
       1 1 0 2     6 1 0
       1 2 0 2     6 1 0
       1 3 0 2     6 1 0
       1 4 0 2     6 3 0
       1 5 0 2     6 3 1
      15 1 1 1 16.12 1 0
      15 2 0 1 16.12 1 0
      15 3 1 1 16.12 1 0
      15 4 1 1 16.12 3 0
      15 5 1 1 16.12 3 1
      end
      label values i_sexe i_sexe
      label def i_sexe 1 "femme", modify
      label def i_sexe 2 "homme", modify
      Here is the first very simple model I used:

      Code:
      logit Consult  i.i_sexe  i_NoteMatiere_imputee i.d_TypActi_3 i.d_TypActi_Actif
      Here is the output:
      Code:
      Iteration 0:   log likelihood = -10475.777  
      Iteration 1:   log likelihood = -8526.9932  
      Iteration 2:   log likelihood = -8387.8451  
      Iteration 3:   log likelihood = -8386.9422  
      Iteration 4:   log likelihood = -8386.9422  
      
      Logistic regression                             Number of obs     =     17,940
                                                      LR chi2(5)        =    4177.67
                                                      Prob > chi2       =     0.0000
      Log likelihood = -8386.9422                     Pseudo R2         =     0.1994
      
      ---------------------------------------------------------------------------------------
                    Consult |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ----------------------+----------------------------------------------------------------
                     i_sexe |
                     homme  |  -.4023254   .0399451   -10.07   0.000    -.4806163   -.3240344
      i_NoteMatiere_imputee |   .1350651   .0040581    33.28   0.000     .1271114    .1430187
                            |
                d_TypActi_3 |
                         2  |    1.44667   .0526922    27.46   0.000     1.343395    1.549944
                         3  |  -1.050469   .0606042   -17.33   0.000    -1.169251   -.9316873
                            |
          1.d_TypActi_Actif |  -.5101569   .0451286   -11.30   0.000    -.5986073   -.4217064
                      _cons |  -2.694959   .0680842   -39.58   0.000    -2.828401   -2.561516
      ---------------------------------------------------------------------------------------
      So a first suggestion would be to cluster the standard errors on the individuals. I will try that, thanks!

      Any other suggestion?

      Best regards,
      Anne
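
      As a minimal sketch of the clustering suggestion from #2, the same logit can be refit with cluster-robust standard errors. This assumes the full dataset (not just the 10-row excerpt above) is in memory. The point estimates should be identical to the plain logit; only the standard errors change, and they account for correlation within individuals but not within documents.

      ```stata
      * Assumes the full dataset from this post is loaded
      * Same specification as above, with standard errors clustered on the individual
      logit Consult i.i_sexe i_NoteMatiere_imputee i.d_TypActi_3 i.d_TypActi_Actif, vce(cluster Indiv)
      ```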


      • #4
        At least at first glance, I would be inclined to approach this as a mixed model with crossed individual and document random effects. It seems logical to assume that different individuals differ in their inclination to consult any document at all, and different documents are identifiable (perhaps by title or by length) as being more or less likely to be useful if consulted.
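
        For readers who want the model written out, one way to express this suggestion is sketched below. The notation is assumed here and does not come from the thread: i indexes individuals, j indexes documents, x_i and z_j are the individual-level and document-level covariates, and u_i and v_j are independent random intercepts.

        ```latex
        \[
        \begin{aligned}
        \operatorname{logit}\,\Pr(\mathrm{Consult}_{ij} = 1 \mid u_i, v_j)
          &= \beta_0 + \mathbf{x}_i'\boldsymbol{\beta} + \mathbf{z}_j'\boldsymbol{\gamma} + u_i + v_j, \\
        u_i &\sim N(0, \sigma_u^2), \qquad v_j \sim N(0, \sigma_v^2).
        \end{aligned}
        \]
        ```

        The effects are crossed rather than nested because every individual can meet every document: u_i is shared by all observations on individual i, and v_j by all observations on document j.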


        • #5
          Thanks Clyde, that sounds like very good advice!

          Here is what I tried. Am I right regarding the specification of the random effects? (they should be non-nested)

          Code:
          melogit Consult i.i_sexe i_NoteMatiere_imputee i.d_TypActi_3 i.d_TypActi_Actif || _all: R.Indiv || InformationNum:
          note: crossed random-effects model specified; option intmethod(laplace) implied
          
          Fitting fixed-effects model:
          
          Iteration 0:   log likelihood = -8494.1795  
          Iteration 1:   log likelihood = -8387.3748  
          Iteration 2:   log likelihood = -8386.9423  
          Iteration 3:   log likelihood = -8386.9422  
          
          Refining starting values:
          
          Grid node 0:   log likelihood = -6899.2873
          
          Fitting full model:
          
          Iteration 0:   log likelihood = -6899.2873  
          Iteration 1:   log likelihood = -6886.5898  (not concave)
          Iteration 2:   log likelihood =  -6885.283  (not concave)
          Iteration 3:   log likelihood = -6885.1533  
          Iteration 4:   log likelihood = -6879.7107  (not concave)
          Iteration 5:   log likelihood = -6879.3646  (not concave)
          Iteration 6:   log likelihood = -6879.3495  (not concave)
          Iteration 7:   log likelihood =  -6879.348  
          Iteration 8:   log likelihood = -6879.3409  (not concave)
          Iteration 9:   log likelihood = -6879.3402  (not concave)
          Iteration 10:  log likelihood = -6879.3402  (not concave)
          Iteration 11:  log likelihood = -6879.3401  
          Iteration 12:  log likelihood = -6879.3326  
          Iteration 13:  log likelihood =  -6878.452  
          Iteration 14:  log likelihood = -6878.3652  
          Iteration 15:  log likelihood = -6878.3621  
          Iteration 16:  log likelihood =  -6878.362  
          
          Mixed-effects logistic regression               Number of obs     =     17,940
          
          -------------------------------------------------------------
                          |     No. of       Observations per Group
           Group Variable |     Groups    Minimum    Average    Maximum
          ----------------+--------------------------------------------
                     _all |          1     17,940   17,940.0     17,940
             Informatio~m |         78        230      230.0        230
          -------------------------------------------------------------
          
          Integration method:     laplace
          
                                                          Wald chi2(5)      =     280.77
          Log likelihood =  -6878.362                     Prob > chi2       =     0.0000
          ---------------------------------------------------------------------------------------
                        Consult |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ----------------------+----------------------------------------------------------------
                                |
                         i_sexe |
                         homme  |  -.6479697   .1772484    -3.66   0.000    -.9953702   -.3005692
          i_NoteMatiere_imputee |     .19098   .0203371     9.39   0.000     .1511199    .2308401
                                |
                    d_TypActi_3 |
                             2  |   1.819509   .4183267     4.35   0.000      .999604    2.639415
                             3  |  -1.273481   .4237598    -3.01   0.003    -2.104035   -.4429266
                                |
              1.d_TypActi_Actif |  -.3608841   .2671318    -1.35   0.177    -.8844527    .1626845
                          _cons |  -3.779886   .5549849    -6.81   0.000    -4.867636   -2.692135
          ----------------------+----------------------------------------------------------------
          _all>Indiv            |
                      var(_cons)|   1.336868   .1491291                      1.074326     1.66357
          ----------------------+----------------------------------------------------------------
          InformationNum        |
                      var(_cons)|   .8627758    .150751                      .6125891    1.215141
          ---------------------------------------------------------------------------------------
          LR test vs. logistic model: chi2(2) = 3017.16             Prob > chi2 = 0.0000


          • #6
            Yes, that code looks correct.
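
            A possible variation, offered as a sketch rather than a correction: in a crossed model either factor can take the `_all: R.` role, and Stata's mixed-effects documentation suggests giving that role to the factor with fewer levels to reduce the computational burden. Assuming the 17,940 observations are 230 individuals crossed with 78 documents (which the group table in #5 suggests but does not state outright), the alternative call would be:

            ```stata
            * Assumes the same dataset as in #5 is in memory
            * Same crossed model, with the smaller factor (documents) in the R. role
            melogit Consult i.i_sexe i_NoteMatiere_imputee i.d_TypActi_3 i.d_TypActi_Actif || _all: R.InformationNum || Indiv:
            ```

            This specifies the same model, so the estimates should be essentially the same as in #5; the difference, if any, would be in run time and convergence behavior.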


            • #7
              Thanks a lot Clyde. I think I'll go with this approach.
