Can I generate a cross-moment matrix and then use it in repeated regressions?

Michael Anbar

Join Date: Aug 2014

Posts: 116
#1


03 Dec 2015, 13:57

Is it possible to compute the cross-moment matrix of a set of variables and then tell the -regress- command to use that matrix instead of recalculating it? This has two benefits. The first is performance; the second is that it ensures that any regression using a SUBSET of the variables in the cross-moment matrix is run over *precisely* the same sample. For example, if I build this data set:

Code:

cls
freduse UNRATE GDPC1, clear
rename UNRATE unemp
rename GDPC1 gdp
gen t = qofd(daten)
collapse (mean) unemp gdp, by(t) fast
tsset t, q
gen lgdp = log(gdp)

and run two regressions:

Code:

regress gdp L4.gdp L4.unemp
regress gdp L4.gdp L3.unemp

these two regressions repeat some calculations internally and also use slightly different samples (because L4.unemp has one fewer nonmissing observation than L3.unemp).

Note that the example in the documentation for -matrix accum- ( [P] matrix accum ) isn't helpful here because it's just the basic calculation of

Code:

syminv(XX)*Xy

not the more complex (and more useful) calculation that first calculates the cross-moment matrix and then uses it to estimate a regression involving a SUBSET of the variables in it instead of the entire matrix.

For example, in RATS, I can compute a cross-moment matrix using the CMOMENT command, and then tell LINREG to use that matrix when performing its calculation.

Code:

cmoment(noprint)
# gdp{0 to 4} unemp{1 to 4} constant
linreg(cmom, print) gdp
# gdp{4} unemp{4} constant
linreg(cmom, print) gdp
# gdp{4} unemp{3} constant

This guarantees that the regressions are run on *exactly* the same sample, and the cross-moment matrix isn't calculated multiple times.

Is this at all possible in Stata? This is a useful feature for applied time series work, especially in econometrics, where model selection methods often require running similar regressions repeatedly while guaranteeing that changes in the results are driven only by changes in the variables included, NOT by changes in the sample.

Last edited by Michael Anbar; 03 Dec 2015, 14:01.
Michael Anbar

Join Date: Aug 2014

Posts: 116
#2

04 Dec 2015, 10:13

Also posted on Stack Overflow.

statistics - Generating a cross-moment matrix in Stata and using it in repeated regressions? - Stack Overflow

http://stackoverflow.com

Richard Williams

Join Date: Apr 2014

Posts: 5010
#3

09 Dec 2015, 16:59

Michael also mentioned this in the Wish List for Stata 15 thread. But, if I understand the Q, the wish has already been fulfilled. See pages 8-10 of

http://www3.nd.edu/~rwilliam/stats2/OLS-Stata9.pdf

for a discussion of how to analyze means, correlations, and standard deviations using the corr2data command.

As Clyde pointed out in the Wish List thread, you can also use sem with ssd. Michael responded

-sem- doesn't support factor-variable notation (according to the linear regression example in [sem] intro 6, Structural models 1). I can bypass this by using -xi-, but as the documentation states, factor variables are the recommended method (unless, of course, the command doesn't support them). Since -gsem- supports them, though, maybe that's where I should look.

But if you are analyzing covariances you can't use factor variables anyway. Students who don't read my notes carefully are always asking questions like "how come gender has codes like -3.219, 2.72, etc." There are an infinite number of ways to create data that reproduce the correlation matrix; corr2data will reproduce the correlations but it won't reproduce the original data set.

In the other thread Michael also raises concerns about massive data sets. Again, not an issue once you have created the covariances. If you have lots and lots of variables in the model, that may slow sem down, but the N of the data set is not an issue if you are using the summary statistics.

Whether Michael can actually use summary statistics is another matter. If he is always changing his models, adding new variables, modifying his sample, or whatever, he may have to break down and use the original data all the time.

One other trick with corr2data: let's say you have 100 million cases. You could have corr2data create a data set with just 100 cases and then, on your regress command, specify fw=1000000. You don't need a lot of cases for corr2data to accurately reproduce a covariance matrix.
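A minimal sketch of that corr2data trick might look like the following. The covariance matrix C, the means vector m, and the variable names y, x1, and x2 are made up for illustration; in practice they would come from summary statistics stored from the real data.

```stata
* Illustrative values only: C and m are invented, not taken from real data
matrix C = (4, 1.2, 0.8 \ 1.2, 2, 0.3 \ 0.8, 0.3, 1)
matrix m = (10, 5, 3)

* Create 100 artificial cases whose means and covariances match m and C
corr2data y x1 x2, n(100) means(m) cov(C) clear

* Each artificial case stands in for 1,000,000 real ones (100 million total)
regress y x1 x2 [fweight = 1000000]
```

The coefficients depend only on the means and covariances, so they should not change with the weight; the frequency weight only changes the N that the standard errors are based on.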

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Richard Williams

Join Date: Apr 2014

Posts: 5010
#4

09 Dec 2015, 17:59

Also, I suspect you would have to create lagged vars, e.g.

Code:

gen L4gdp = L4.gdp

and then use them when creating the covariance matrix.
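As a rough sketch of how that could go (not a built-in -regress- feature), one might generate the lags, accumulate the cross-moment matrix once with -matrix accum-, and then solve for each subset regression in Mata. This assumes the tsset quarterly data from post #1 is in memory; the names gdp_l4, unemp_l3, unemp_l4, and M are illustrative.

```stata
* Explicit lag variables (names are illustrative)
gen gdp_l4   = L4.gdp
gen unemp_l3 = L3.unemp
gen unemp_l4 = L4.unemp

* matrix accum drops any observation with a missing value in any listed
* variable, so every sub-block of M is based on the same sample
matrix accum M = gdp gdp_l4 unemp_l3 unemp_l4
* Row/column order of M: gdp gdp_l4 unemp_l3 unemp_l4 _cons

mata:
M  = st_matrix("M")
y  = 1              // position of gdp in M
x1 = (2, 4, 5)      // gdp_l4, unemp_l4, _cons
x2 = (2, 3, 5)      // gdp_l4, unemp_l3, _cons
b1 = invsym(M[x1, x1]) * M[x1, y]   // coefficients, first regression
b2 = invsym(M[x2, x2]) * M[x2, y]   // coefficients, second regression
b1, b2
end
```

This only gives point estimates. Standard errors would need the residual sum of squares, which can also be built from M (y'y minus b'X'y), together with the sample size that -matrix accum- leaves in r(N).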

