  • Regression model for repeated measurements that are clustered across several dimensions - xtivreg, ivregress

    Dear Statalist gurus,

    We're having a problem that may be more of a statistics issue than a Stata issue. We're using Stata 16 on Windows.
    Please excuse us if we have missed any important guideline for posting and let us know how we can help you make sense of our problem.

    We are PhD students from Germany exploring how certain stylistic characteristics of questions (e.g., sentiment) influence the length of the corresponding answers, in the context of the personality traits of the answering individual.
    The data are structured as follows:
    Length of answer (qlength) | Sentiment of question (sent) | Extroversion of individual answering (extro) | Control variables on individuals asking and answering (ctrls) | ... | Conversation date (date)
    12 | 0.5 | 2.5 | x | x
    ... | ... | ... | ... | ... | ...
    The data are clustered across several dimensions:

    - There are several individuals who answer any number of questions across different conversations
    - There are several individuals who ask any number of questions across different conversations
    - The conversations take place in different settings and we are measuring the control variables at each conversation date
    - The conversations take place at different points in time, where there are multiple Q-A-pairs within each conversation

    We would like to run a regression that estimates the effect of the question sentiment on the length of the corresponding answer, and check the interaction between sent and extro.
    We are currently working with a fixed-effects model, with the fixed effect on the individual who asks the question (because we have the fewest control variables available for that person).
    We would like to use 2SLS to help us with the issue that sent might be endogenous. Let's say that we have 2 instruments for sent that are called instru1 and instru2.

    - We would describe our data as an unbalanced panel with the additional issue of repeated measurements at each conversation date. Is that right?
    - Which 2SLS estimator should we use to run the regression qlength = sent##extro ctrls, where we want to instrument sent with instru1 and instru2?
    - Does it make sense to use ivregress 2sls or xtivreg, fe? What would the command need to look like to tell Stata how to instrument the endogenous regressors in the context of the interaction term?
    - Conceptually, are there other regression models that are more suitable to handle the structure of our data?

    Yours,

    Andy and Anna

  • #2
    Anyone?



    • #3
      Andy:
      1) I'm not sure you actually have a panel dataset: from your description I cannot tell whether the same sample of individuals (provided that they represent the -panelid-) is measured at different, equally spaced, points in time;
      2) whether a panel dataset is unbalanced is simply a feature of the data at hand;
      3) you do not report the criteria/tests that led you to choose -fe- (-hausman-? -xtoverid-? literature? else?);
      4) you can go -xtivreg,fe- if you detect/suspect endogeneity.
      As you have an interaction and one of its terms may be endogenous, you can mimic the following toy-example, in which the two conditional main effects are entered separately from the interaction they contribute to (instead of using the -##- -fvvarlist- operator):
      Code:
      . use https://www.stata-press.com/data/r16/nlswork
      (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
      
      . xtivreg ln_w age c.tenure#c.age not_smsa (tenure = union south), fe first
      
      First-stage within regression
      
      Fixed-effects (within) regression               Number of obs     =     19,007
      Group variable: idcode                          Number of groups  =      4,134
      
      R-sq:                                           Obs per group:
           within  = 0.9725                                         min =          1
           between = 0.9840                                         avg =        4.6
           overall = 0.9790                                         max =         12
      
                                                      F(5,14868)        =  105075.52
      corr(u_i, Xb)  = 0.2725                         Prob > F          =     0.0000
      
      --------------------------------------------------------------------------------
              tenure |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ---------------+----------------------------------------------------------------
                 age |   -.052472   .0010373   -50.58   0.000    -.0545053   -.0504387
                     |
      c.tenure#c.age |   .0263435   .0000437   602.97   0.000     .0262578    .0264291
                     |
            not_smsa |  -.0071715   .0252084    -0.28   0.776    -.0565831    .0422402
               union |   .0709441   .0140386     5.05   0.000     .0434267    .0984616
               south |   -.055251   .0267877    -2.06   0.039    -.1077582   -.0027438
               _cons |   2.108377   .0321117    65.66   0.000     2.045434     2.17132
      ---------------+----------------------------------------------------------------
             sigma_u |  .40814343
             sigma_e |  .51363392
                 rho |    .387037   (fraction of variance due to u_i)
      --------------------------------------------------------------------------------
      F test that all u_i=0: F(4133, 14868) = 2.41                 Prob > F = 0.0000
      
      Fixed-effects (within) IV regression            Number of obs     =     19,007
      Group variable: idcode                          Number of groups  =      4,134
      
      R-sq:                                           Obs per group:
           within  =      .                                         min =          1
           between = 0.0828                                         avg =        4.6
           overall = 0.0413                                         max =         12
      
                                                      Wald chi2(4)      =  112330.20
      corr(u_i, Xb)  = -0.5194                        Prob > chi2       =     0.0000
      
      --------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ---------------+----------------------------------------------------------------
              tenure |   1.367147   .2530389     5.40   0.000     .8712001    1.863094
                 age |   .0811735   .0133768     6.07   0.000     .0549554    .1073915
                     |
      c.tenure#c.age |  -.0355936   .0066689    -5.34   0.000    -.0486645   -.0225228
                     |
            not_smsa |  -.0879237   .0355536    -2.47   0.013    -.1576075   -.0182399
               _cons |  -1.442162   .5340497    -2.70   0.007     -2.48888   -.3954437
      ---------------+----------------------------------------------------------------
             sigma_u |  .61735654
             sigma_e |  .72344279
                 rho |  .42137059   (fraction of variance due to u_i)
      --------------------------------------------------------------------------------
      F  test that all u_i=0:     F(4133,14869) =     1.11      Prob > F    = 0.0000
      --------------------------------------------------------------------------------
      Instrumented:   tenure
      Instruments:    age c.tenure#c.age not_smsa union south
      --------------------------------------------------------------------------------
      See https://www.stata.com/statalist/arch.../msg00081.html as far as the missing -within R-sq- is concerned.
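      A caveat on the toy-example, offered as a sketch under stated assumptions rather than a definitive recipe: the product of an endogenous regressor and an exogenous one is generally endogenous too, so a common alternative is to instrument the interaction as well, using the products of the excluded instruments with the exogenous variable. The variable names tenure_age, union_age and south_age below are hypothetical and are created by hand so that no factor-variable notation is needed in the endogenous list. The commented lines at the end translate the idea to the variables named in the original question; asker_id is a hypothetical identifier for the person asking, and ctrls stands in for the actual controls.

      ```stata
      * Sketch only: treats the tenure-by-age interaction as endogenous as well.
      use https://www.stata-press.com/data/r16/nlswork, clear

      * Hand-made interaction of the endogenous regressor with age
      * (tenure_age is a hypothetical name introduced for illustration).
      generate tenure_age = tenure*age

      * Hand-made interactions of the excluded instruments with age
      * (union_age and south_age are hypothetical names as well).
      generate union_age = union*age
      generate south_age = south*age

      * Two endogenous regressors, four excluded instruments.
      xtivreg ln_wage age not_smsa ///
          (tenure tenure_age = union south union_age south_age), fe first

      * Hypothetical template for the data described in the original question.
      * asker_id is an assumed panel identifier; ctrls is a placeholder.
      * xtset asker_id
      * generate sent_extro    = sent*extro
      * generate instru1_extro = instru1*extro
      * generate instru2_extro = instru2*extro
      * xtivreg qlength extro ctrls ///
      *     (sent sent_extro = instru1 instru2 instru1_extro instru2_extro), fe
      ```

      Whether this specification is preferable depends on instrument strength in both first stages, so the -first- output is worth inspecting before relying on the second-stage coefficients.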
      Kind regards,
      Carlo
      (Stata 19.0)
