Regression model for repeated measurements that are clustered across several dimensions - xtivreg, ivregress

Andy Bouta

Join Date: Oct 2021
Posts: 2


05 Oct 2021, 08:54

Dear Statalist gurus,

We're having a problem that may be more of a statistics issue than a Stata issue. We're using Stata 16 on Windows.
Please excuse us if we have missed any important guideline for posting and let us know how we can help you make sense of our problem.

We are PhD students from Germany exploring how certain stylistic characteristics of questions (e.g., sentiment) influence the length of the corresponding answers, taking into account the personality traits of the answering individual.
The data are structured as follows:

Length of answer (qlength)	Sentiment of question (sent)	Extroversion of individual answering (extro)	Control variables on individuals asking and answering (ctrls)	...	Conversation date (date)
12	0.5	2.5	x	...	x
...	...	...	...	...	...

The data are clustered across several dimensions:

- There are several individuals who answer any number of questions across different conversations
- There are several individuals who ask any number of questions across different conversations
- The conversations take place in different settings and we are measuring the control variables at each conversation date
- The conversations take place at different points in time, where there are multiple Q-A-pairs within each conversation

We would like to run a regression that estimates the effect of the question sentiment on the length of the corresponding answer, and check the interaction between sent and extro.
We are currently working with a fixed effects model, with fixed effects for the individual who asks the question (because we have the fewest control variables available for that person).
We would like to use 2SLS to help us with the issue that sent might be endogenous. Let's say that we have 2 instruments for sent that are called instru1 and instru2.

- We would describe our data as an unbalanced panel with the additional issue of repeated measurements at each conversation date. Is that right?
- Which 2SLS estimator should we use to run the regression qlength = sent##extro ctrls, where we want to instrument sent with instru1 and instru2?
- Does it make sense to use ivregress 2sls or xtivreg, fe? What would the command need to look like to tell Stata how to instrument the endogenous regressors in the context of the interaction term?
- Conceptually, are there other regression models that are more suitable to handle the structure of our data?

Yours,

Andy and Anna

Tags: fixed effects, interaction, panel, regression, statistics

Andy Bouta

Join Date: Oct 2021

Posts: 2
#2

18 Oct 2021, 09:40

Anyone?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

19 Oct 2021, 02:52

Andy:
1) I'm not sure you actually have a panel dataset: from your description I cannot tell whether the same sample of individuals (provided that they represent the -panelid-) is measured at different, equally spaced points in time;
2) panel dataset unbalancedness is simply a matter of fact conditional on the data at hand;
3) you do not report the criteria/tests that led you to go -fe- (-hausman-? -xtoverid-? literature? else?);
4) you can go -xtivreg,fe- if you detect/suspect endogeneity.
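
Points 1) and 3) can be checked with a few commands. What follows is a minimal sketch, assuming a hypothetical identifier asker_id for the person asking the question, with ctrls standing in for the actual list of control variables (both are placeholders, not names known to exist in the dataset). It is meant to be run from a do-file.

```stata
* asker_id is a hypothetical panel identifier; ctrls is a placeholder varlist

* Point 1): report how many observations share the same asker and date
duplicates report asker_id date

* Declare the panel variable only; if dates repeat within asker,
* declaring date as the time variable would be rejected by -xtset-
xtset asker_id

* Point 3): fixed vs. random effects, conventional standard errors
xtreg qlength c.sent##c.extro ctrls, fe
estimates store fe
xtreg qlength c.sent##c.extro ctrls, re
estimates store re
hausman fe re
```

-hausman- needs the default (conventional) variance estimates, which is why no -vce(cluster)- option appears here. The sketch also ignores the possible endogeneity of sent, so it is only a first look at the -fe- versus -re- choice.
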
As you have an interaction and one of its terms may be endogenous, you can mimic the following toy example, in which the two conditional main effects are entered separately from the interaction they contribute to (instead of using the -##- -fvvarlist- operator):

Code:

. use https://www.stata-press.com/data/r16/nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtivreg ln_w age c.tenure#c.age not_smsa (tenure = union south), fe first

First-stage within regression

Fixed-effects (within) regression               Number of obs     =     19,007
Group variable: idcode                          Number of groups  =      4,134

R-sq:                                           Obs per group:
     within  = 0.9725                                         min =          1
     between = 0.9840                                         avg =        4.6
     overall = 0.9790                                         max =         12

                                                F(5,14868)        =  105075.52
corr(u_i, Xb)  = 0.2725                         Prob > F          =     0.0000

--------------------------------------------------------------------------------
        tenure |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
           age |   -.052472   .0010373   -50.58   0.000    -.0545053   -.0504387
               |
c.tenure#c.age |   .0263435   .0000437   602.97   0.000     .0262578    .0264291
               |
      not_smsa |  -.0071715   .0252084    -0.28   0.776    -.0565831    .0422402
         union |   .0709441   .0140386     5.05   0.000     .0434267    .0984616
         south |   -.055251   .0267877    -2.06   0.039    -.1077582   -.0027438
         _cons |   2.108377   .0321117    65.66   0.000     2.045434     2.17132
---------------+----------------------------------------------------------------
       sigma_u |  .40814343
       sigma_e |  .51363392
           rho |    .387037   (fraction of variance due to u_i)
--------------------------------------------------------------------------------
F test that all u_i=0: F(4133, 14868) = 2.41                 Prob > F = 0.0000

Fixed-effects (within) IV regression            Number of obs     =     19,007
Group variable: idcode                          Number of groups  =      4,134

R-sq:                                           Obs per group:
     within  =      .                                         min =          1
     between = 0.0828                                         avg =        4.6
     overall = 0.0413                                         max =         12

                                                Wald chi2(4)      =  112330.20
corr(u_i, Xb)  = -0.5194                        Prob > chi2       =     0.0000

--------------------------------------------------------------------------------
       ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        tenure |   1.367147   .2530389     5.40   0.000     .8712001    1.863094
           age |   .0811735   .0133768     6.07   0.000     .0549554    .1073915
               |
c.tenure#c.age |  -.0355936   .0066689    -5.34   0.000    -.0486645   -.0225228
               |
      not_smsa |  -.0879237   .0355536    -2.47   0.013    -.1576075   -.0182399
         _cons |  -1.442162   .5340497    -2.70   0.007     -2.48888   -.3954437
---------------+----------------------------------------------------------------
       sigma_u |  .61735654
       sigma_e |  .72344279
           rho |  .42137059   (fraction of variance due to u_i)
--------------------------------------------------------------------------------
F  test that all u_i=0:     F(4133,14869) =     1.11      Prob > F    = 0.0000
--------------------------------------------------------------------------------
Instrumented:   tenure
Instruments:    age c.tenure#c.age not_smsa union south
--------------------------------------------------------------------------------

See https://www.stata.com/statalist/arch.../msg00081.html as far as the missing -within R-sq- is concerned.
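
Translated to the variables in the original post, the toy example could look like command (a) below. Note that the output above lists c.tenure#c.age among the instruments, i.e. the interaction is treated as exogenous. If the interaction with an endogenous regressor is itself regarded as endogenous, one common alternative is to instrument it with the products of the excluded instruments and the other interacted variable, as in (b). This is a sketch only: asker_id is a hypothetical identifier, ctrls is a placeholder varlist, the generated variable names are made up for illustration, and extro is assumed to be exogenous.

```stata
* asker_id (hypothetical) identifies the person asking; ctrls is a placeholder varlist
xtset asker_id

* (a) Same structure as the toy example: only sent is instrumented
xtivreg qlength extro c.sent#c.extro ctrls (sent = instru1 instru2), fe first

* (b) Treat both sent and sent*extro as endogenous; build the products by hand
generate double sent_extro    = sent*extro
generate double instru1_extro = instru1*extro
generate double instru2_extro = instru2*extro
xtivreg qlength extro ctrls ///
    (sent sent_extro = instru1 instru2 instru1_extro instru2_extro), fe first
```

In (b) the products are generated by hand because the endogenous variables and their instruments have to be listed explicitly inside the parentheses. The -first- option prints the first-stage regressions, where the strength of the instruments can be inspected.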

Kind regards,
Carlo
(Stata 19.0)
