  • Modelling temporal dependence in within-between "hybrid" models

    Dear Statalisters,

    I would like to know how to properly specify a temporal variable in within-between "hybrid" random effects models.

    My model, a logistic regression of inter-state conflict initiation, takes the following form:

    y: binary dependent variable (1 = conflict initiation)
    z1: time-variant predictor
    z2: time-variant predictor
    t t2 t3: cubic polynomial of years since conflict initiation

    Following Schunck (2013), the within-between variables are constructed:
    Code:
    by cluster, sort: center z1, prefix(w) mean(b)
    by cluster, sort: center z2, prefix(w) mean(b)
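
    For readers without the SSC center command, the same split can be sketched with egen. This is a minimal illustration on Stata's union example data, not the code used for the conflict model; the names b_age and w_age are hypothetical stand-ins for bz1 and wz1.

    ```stata
    * Sketch: cluster mean and within-cluster deviation built by hand
    webuse union, clear
    bysort idcode: egen double b_age = mean(age)   // between part: person-specific mean
    generate double w_age = age - b_age            // within part: deviation from that mean
    summarize w_age b_age
    ```

    Because w_age is a deviation from the person mean computed on the same observations, it averages to zero within each idcode.
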
    An interaction between z1 and z2 may then take the following form:
    Code:
    gen wz1Xbz2 = wz1*bz2
    by cluster, sort: center wz1Xbz2, prefix(w_) mean(b_)
    And so the model takes the following form:
    Code:
    logit y wz1 bz1 wz2 bz2  w_wz1Xbz2  b_wz1Xbz2
    The next step is to include t t2 t3 in the model to account for potential temporal dependence, but I am unsure about how to do so. Specifically, is it correct to simply include these variables in the model like so:
    Code:
    logit y wz1 bz1 wz2 bz2  w_wz1Xbz2  b_wz1Xbz2  t t2 t3
    Or do I instead need to generate mean and centred versions of these time variables as well? I am leaning toward the former, since it does not make much sense to me to model peace years as a cluster average or a cluster deviation. Yet these variables are obviously not time-invariant, and herein lies my dilemma.

    Further to this, I would also like to include an interaction between, say, wz1 and t t2 t3. But, of course, to do so I first need to know how these time variables should be specified in the model.

    Finally, I should add that t t2 t3, as a running count of the number of years since conflict, are distinct from the actual time series variable, which is simply the year of observation.
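
    To make the distinction between the duration counter and the calendar year concrete, here is a minimal sketch on Stata's union example data. It assumes union membership stands in for the conflict event and that the counter starts at zero when a unit enters the panel; both are illustrative assumptions, not part of the original model.

    ```stata
    * Sketch under assumptions: union == 1 plays the role of the conflict event
    webuse union, clear
    bysort idcode (year): generate t = 0 if _n == 1
    * Years elapsed since the previous event (or since entry). replace works
    * through the observations in order, so t[_n-1] already holds its new value.
    by idcode: replace t = cond(union[_n-1] == 1, 0, t[_n-1]) + (year - year[_n-1]) if _n > 1
    generate t2 = t^2
    generate t3 = t^3
    * year remains the calendar variable; t, t2, t3 are the duration terms
    list idcode year union t in 1/10, sepby(idcode)
    ```

    The counter restarts after each event, so it varies within clusters in a way that the year of observation does not.
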

    Any advice would be hugely appreciated.

    Matthew

    Last edited by Matthew Alexander; 31 Jul 2021, 12:38.

  • #2
    My theory is that you generate any interaction and squared terms yourself, and then let Schunck's xthybrid command (available from SSC) take it from there. Something like

    xthybrid y z1 z2 z1z2 t t2 t3, family(binomial) link(logit) clusterid(clustervar)

    That is, I think you need xthybrid to generate all the mean-centered terms in order for it to work right.

    But, I have been wrong before.

    Even if I am totally wrong about this (I have been using xthybrid for a good 2 or 3 days now), shouldn't you be using xtlogit, re?

    I'll be interested to see if anyone has a more expert opinion than I have.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam


    • #3
      Thanks so much for replying Richard.

      I was not at all aware of the xthybrid command. I have had a quick look at it and seen that there is an option, use(), "which splits between- and within-cluster effects only for selected explanatory variables". I think that I can use this option to model t t2 t3, since it does not make much sense to me to split the time duration variable. Thanks for pointing me that way.
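
      A minimal sketch of that idea on Stata's union example data, assuming xthybrid has been installed from SSC and that variables left out of use() enter the model without being split (as the help text quoted above suggests). The variable roles are illustrative: year stands in for a duration term that should stay as it is.

      ```stata
      * Sketch only: requires -ssc install xthybrid-
      webuse union, clear
      * Split age and grade into within and between parts; leave year unsplit
      xthybrid union age grade year, clusterid(idcode) family(binomial) link(logit) use(age grade)
      ```

      It is worth checking the output table to confirm which terms were decomposed before relying on the results.
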

      The question of how to treat interactions now comes to the fore. I would like to estimate substantive effects after fitting the model, but estimates from margins will not account for the contribution of the interaction z1z2 when z1 changes: because the interaction is generated manually, z1z2 does not change with z1.

      Do you perhaps know if it is possible to use standard factor variable notation (#) to specify interactions with xthybrid, and whether doing so results in correct estimates from margins?

      Once again, I really appreciate your input.
      Last edited by Matthew Alexander; 31 Jul 2021, 15:10.


      • #4
        As luck would have it, I've been playing around with xthybrid lately because I want to do marginal effects after running it. But the command does not support factor variable notation. I can see why it doesn't. Suppose a categorical variable has 3 categories. You would need to split it up into multiple indicator variables to get the means for each category. Or, if you have an interaction, there might be 8 combos in that interaction, and it seems like you'd need to have a mean for each combo.

        Then again, maybe not. ;-)

        So, basically, you are asking questions I started asking 2 or 3 days ago. I am not clear on the answers yet, especially if you start adding squared terms and interaction terms to the models. If you only had continuous vars and binary vars, I suspect it wouldn't be that hard.
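
        The point about needing a mean for each category can be sketched as follows. The three-level variable educ3 and the m_ prefix are hypothetical, built here from grade in Stata's union example data purely for illustration.

        ```stata
        * Sketch: cluster means for each non-reference category of a 3-level variable
        webuse union, clear
        xtset idcode
        * hypothetical schooling categories: below 12, exactly 12, above 12
        generate byte educ3 = 1 + (grade >= 12) + (grade > 12) if !missing(grade)
        tabulate educ3, generate(educ3_)        // creates educ3_1 educ3_2 educ3_3
        foreach v of varlist educ3_2 educ3_3 {
            bysort idcode: egen double m_`v' = mean(`v')   // share of person-years in that category
        }
        * factor notation for the original variable, hand-made means added alongside
        xtlogit union i.educ3 m_educ3_2 m_educ3_3 age, re
        ```

        One mean is needed per non-reference category, which is why the number of extra terms grows quickly once categorical variables are interacted.
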

        So, maybe ask me again in a week or two! Or better yet, maybe someone else will chime in.

        Also, I am going to take another look at this paper, which introduced the xthybrid command:

        https://journals.sagepub.com/doi/10....867X1701700106

        • #5
          This do-file shows my theory on how to proceed. The results seem reasonable, but that doesn't mean they are right. It is easier just to compute means for everything, e.g. timesquared, but if that actually does harm, the procedure could be tweaked. The big problem is that xthybrid (and mundlak) don't support factor variable notation, so I think you need to create various variables yourself, e.g. timesquared, and then let xthybrid or mundlak compute the mean variables from there.

          If somebody knows that I am doing this horribly wrong please let me know before I spend too much more time on it. ;-)

          Incidentally a new version of xthybrid just came out yesterday so if you are using it make sure you have the current version.

          Code:
          use https://www.stata-press.com/data/r17/union, clear
          
          //# Regular FE Model. Run for comparison purposes
          xtlogit union age grade i.not_smsa i.south year i.south#c.year, fe
          est store fe
          
          //# Regular RE Model. Run for comparison purposes
          xtlogit union age grade i.not_smsa i.south year i.south#c.year, re
          est store re
          
          //# Hausman test says RE is problematic
          hausman fe re
          
          //# Compute any needed variables yourself, rather than use factor variables
          // This can include interactions, squared terms
          // Also compute dummies for any categorical var with more than 2 categories
          // or binary variables not coded 0/1
          gen southyear = south*year
          
          //# Let mundlak (available from SSC) compute any mean variables.
          // Could also use xthybrid; or compute mean terms yourself
          // xi might also work. Factor variable notation not allowed.
          mundlak union age grade not_smsa south year southyear, keep
          
          //# Run xtlogit, with mean vars added on.
          // Results are similar to those given by xtlogit, fe
          xtlogit union age grade i.not_smsa i.south year i.south#c.year mean__*
          est store hybrid
          est table fe hybrid re, stats(N N_g chi2 df_m p) t(%7.2f) b(%7.4f)
          
          //# Hopefully things like margins work ok!!! But not sure
          margins not_smsa south
          margins, dydx(age grade not_smsa south year)
          
          //# If this works correctly -- An ideal program would
          // accept factor variable notation, compute the
          // mean variables correctly, and then run xtlogit.
          Last edited by Richard Williams; 01 Aug 2021, 07:36.

          • #6
            Thanks for sharing this, Richard.

            If I understand correctly, the inclusion of the cluster means in a standard xtlogit model allows one to estimate within effects and their interactions while adjusting for cluster mean between effects, or vice-versa. And this, in turn, means that you can estimate within-cluster margins which are adjusted for interactions and, if desired, adjusted at different levels of the cluster mean variables - this is brilliant stuff!
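
            In equation form (a sketch in generic notation, not taken from the thread), with cluster i, occasion t, and random intercept u_i, the model with the raw variable plus its cluster mean is a reparameterization of the within-between model:

            ```latex
            \begin{align*}
            \operatorname{logit}\Pr(y_{it}=1)
              &= \beta_0 + \beta\, x_{it} + \pi\, \bar{x}_i + u_i \\
              &= \beta_0 + \beta\,(x_{it} - \bar{x}_i) + (\beta + \pi)\, \bar{x}_i + u_i
            \end{align*}
            ```

            So the coefficient on the raw variable is the within effect, and the between effect is the sum of that coefficient and the coefficient on the cluster mean.
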

            The only reservation I have is that margins may not be able to adjust for interactions between within and between effects, since, according to Schunck (2013, p. 72), the interaction itself has to be cluster-mean centered. And, of course, I just happen to want to do that one particularly difficult thing.

            It's for this reason that I've taken a more manual, and thus more error-prone and cumbersome, approach. I'd like to hear your thoughts (and please correct me if I have gone wrong somewhere).


            Code:
            ********Construct variables/model***************
            
            use https://www.stata-press.com/data/r17/union, clear
            
            // Manually generate within-between variables on full multivariate sample
            mark nonmiss
            markout nonmiss union age grade not_smsa south year
            sort idcode
            foreach v of varlist age grade year {
                by idcode: center `v' if nonmiss==1, prefix(w_) mean(b_)
            }
            
            // Manually generate interaction between within-cluster grade and between-cluster age
            // and cluster mean center
            gen w_gradeXb_age = w_grade*b_age
            by idcode: center w_gradeXb_age , prefix(w_) mean(b_)
            
            // Run xtlogit
            xtlogit union w_age b_age w_grade b_grade w_w_gradeXb_age b_w_gradeXb_age w_year b_year i.not_smsa i.south, re
            
            
            *********** Manually estimate margins for within-cluster grade(at 0 1)*****************
            
            // Save original variable values
            foreach v of varlist w_grade w_gradeXb_age w_w_gradeXb_age b_w_gradeXb_age {
                clonevar clone_`v' = `v'
            }
            
            // Manually change predictor value, thus replicating margins procedure (loop as needed)
            replace w_grade = 0 if nonmiss == 1
            
            // Re-generate interaction
            drop w_w_gradeXb_age b_w_gradeXb_age
            replace w_gradeXb_age = w_grade*b_age if nonmiss == 1
            by idcode: center w_gradeXb_age , prefix(w_) mean(b_)
            
            // Generate predictive margins at within-cluster grade = 0
            predictnl pr_at0 = predict(pr) if e(sample)
            
            // Repeat process, this time with predictor set to 1
            replace w_grade = 1 if nonmiss == 1
            
            drop w_w_gradeXb_age b_w_gradeXb_age
            replace w_gradeXb_age = w_grade*b_age if nonmiss == 1
            by idcode: center w_gradeXb_age , prefix(w_) mean(b_)
            
            predictnl pr_at1 = predict(pr) if e(sample)
            
            // Reset variables to original values
            foreach v of varlist w_grade w_gradeXb_age w_w_gradeXb_age b_w_gradeXb_age {
                replace `v' = clone_`v'
            }
            
            
            **************** Construct matrices and feed to nlcom via e for estimation of substantive effects**************
            // Program to post results matrices to e
            capture program drop epost_bv
            program define epost_bv, eclass
              args b V
              ereturn post `b' `V'
            end
            
            // Construct estimates matrix: e(b)
            mat b = J(1,2,.)
            matrix colnames b = pr_at0 pr_at1
            sum pr_at0, meanonly
            mat b[1,1] = r(mean)
            sum pr_at1, meanonly
            mat b[1,2] = r(mean)
            
            // Construct variance-covariance matrix: e(V)
            correlate pr_at0 pr_at1, covariance
            mat V = r(C)
            matrix colnames V = pr_at0 pr_at1
            matrix rownames V = pr_at0 pr_at1
            sum pr_at0, detail
            local se0 = r(sd)
            mat V[1,1] = `se0'^2
            sum pr_at1, detail
            local se1 = r(sd)
            mat V[2,2] = `se1'^2
            
            // Post results to e
            epost_bv b V
            ereturn list
            matrix list e(b)
            matrix list e(V)
            
            // Estimate relative risk % change
            nlcom (rr:(_b[pr_at1]/_b[pr_at0]-1)*100)
            I realized after the fact that there is probably not enough variation in within-cluster grade, hence the enormous risk change.
            I should add, too, that this manual approach follows from two particularly enlightening discussions on this forum:
            https://www.statalist.org/forums/for...ithout-margins
            https://www.statalist.org/forums/for...ce-in-a-matrix
            Last edited by Matthew Alexander; 01 Aug 2021, 21:54.


            • #7
              Well, I certainly hope it is a little easier than that!

              I can't figure out what to look at. Like, where would I find the marginal effect for age, or something like that?

              • #8
                As do I! But until then, this does do what I want, I think.

                I do not know how to calculate the marginal effect of a continuous variable. But if you wanted, say, predictive margins for between-cluster age at intervals of, say, 5 years while adjusting for an interaction with within-cluster grade, this should work:
                Code:
                use https://www.stata-press.com/data/r17/union, clear
                
                // Manually generate within-between variables on full multivariate sample
                mark nonmiss
                markout nonmiss union age grade
                
                sort idcode
                foreach v of varlist age grade {
                    by idcode: center `v' if nonmiss==1, prefix(w_) mean(b_)
                }
                
                // Manually generate interaction for within-cluster grade and between-cluster age
                // and cluster mean center
                gen w_gradeXb_age = w_grade*b_age
                by idcode: center w_gradeXb_age , prefix(w_) mean(b_)
                
                // Run xtlogit
                xtlogit union w_age b_age w_grade b_grade w_w_gradeXb_age b_w_gradeXb_age, re
                
                
                *********** Estimate margins for between-cluster age*****************
                
                // Save original variable values
                foreach v of varlist b_age w_gradeXb_age w_w_gradeXb_age b_w_gradeXb_age {
                    clonevar clone_`v' = `v'
                }
                
                // Manually change predictor value by 5 year age intervals, thus replicating margins process
                forvalues i = 20(5)45 {
                    qui replace b_age =  `i' if nonmiss == 1
                
                    // Re-generate interaction
                    qui drop w_w_gradeXb_age b_w_gradeXb_age
                    qui replace w_gradeXb_age = w_grade*b_age if nonmiss == 1
                    qui by idcode: center w_gradeXb_age , prefix(w_) mean(b_)
                
                    // Generate predictions for between-cluster age adjusted for interaction with within-cluster grade
                    qui predictnl pr_at`i' = predict(pr) if e(sample)
                
                    // Predictive margin with standard error and 95% CI
                    qui sum pr_at`i', detail
                    local av: di %5.4f r(mean)
                    local se: di %5.4f r(sd)
                    local lb: di %5.4f (`av' - invnormal(.975) * `se')
                    local ub: di %5.4f (`av' + invnormal(.975) * `se')
                    di "Predictive margin for between-cluster age at `i' = `av'"
                    di "Standard error = `se'"
                    di "95% CI: lb= `lb', ub = `ub'"
                    di " "
                }
                
                // Reset variables to original values
                foreach v of varlist b_age w_gradeXb_age w_w_gradeXb_age b_w_gradeXb_age {
                    qui replace `v' = clone_`v'
                }
                How to get the marginal effect from here I do not know. Though if you subtract, say, the prediction at 25 from that at 20, you get .0037, so the marginal effect is at least in that ballpark.
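
                One possible way to approximate a marginal effect when the interaction is built by hand is a finite difference: nudge the variable by a small step, rebuild the interaction, and average the change in predictions. The sketch below is an assumption-laden illustration, not a validated procedure; the names wgXba, b_age0, p0, p1 and me are hypothetical, the step h = 0.01 is arbitrary, and no standard error is produced.

                ```stata
                * Sketch: numerical average marginal effect of b_age with a hand-made interaction
                webuse union, clear
                xtset idcode
                bysort idcode: egen double b_age   = mean(age)
                bysort idcode: egen double b_grade = mean(grade)
                generate double w_age   = age - b_age
                generate double w_grade = grade - b_grade
                generate double wgXba   = w_grade*b_age      // hand-made interaction
                xtlogit union w_age b_age w_grade b_grade wgXba, re

                local h = 0.01
                predict double p0 if e(sample), pr           // predictions at observed values
                generate double b_age0 = b_age               // keep the original values
                replace b_age = b_age0 + `h'                 // nudge b_age
                replace wgXba = w_grade*b_age                // rebuild the interaction
                predict double p1 if e(sample), pr
                replace b_age = b_age0                       // restore the data
                replace wgXba = w_grade*b_age
                generate double me = (p1 - p0)/`h'
                summarize me, meanonly
                display "Numerical AME of b_age: " %8.5f r(mean)
                ```

                Rebuilding the interaction after the nudge is the step that margins cannot do on its own when the product term is a separate variable.
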

                Of course, you could also set all other covariates at their means, or whatever values you are interested in.
                Last edited by Matthew Alexander; 02 Aug 2021, 04:01.


                • #9
                  When I estimate hybrid models, I like to compare results with an xtlogit, fe model. I’ve found the hybrid models with interactions and squared terms like I’ve described seem to work very well, producing similar coefficients. You might try estimating an fe model and see how the coefficients compare with what you think is the right approach.

                  • #10
                    The approach I show in #5 does indeed perfectly match Schunck's model 5, table 2. Thus once again proving the old adage that a few months in the laboratory can often save a few hours in the library. I think I've come up with a way to make these models less tedious and error-prone. Further, as Schunck says, the approach produces correct results when used with margins, which is what I want.

                    I don't know how to do what you want, or why exactly you want to do it. All it would take is one coding error to screw you up, so make sure you really, really, really want to do this! Good luck.