Macro for cross-sectional regressions

Inigo Sanchez

Join Date: Apr 2016

Posts: 39
#1

17 May 2016, 15:48

Hi everyone,

I have a dataset containing 1,320 columns, each column representing a month from January 2003 to December 2013 for 1 independent variable and 9 dependent variables (12 months x 11 years x 10 variables = 1,320 columns). I want to run 132 cross-sectional regressions (i.e. one for each month from January 2003 to December 2013) by regressing the independent variable on the 9 dependent variables and store the coefficients in a file. My aim is to write code that will let me run all 132 regressions without performing each one individually. I would also like to get as much information as possible about the individual regressions (e.g. R^2, t-stats, etc.). One important consideration is that each variable has exactly the same number of columns under the same headings (i.e. months from January 2003 to December 2013 for each variable). Given that, should the columns be renamed to avoid an error from Stata? Thank you in advance.
Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#2

17 May 2016, 16:12

Hey,

It may be me, but I don't quite get what you're saying. Do you really have 1320 columns (= 1320 variables)? How many values (rows) are in these columns? Just one? How can the dependent/independent variables vary within one month (it's essential that there is some variation, otherwise a regression is not possible)? Or do you have micro data with several observations for different individuals? Maybe you can post a data example using the -dataex- command (available on SSC).

Best regards,
Sebastian
Inigo Sanchez

Join Date: Apr 2016

Posts: 39
#3

18 May 2016, 05:02

Hi,

I have 132 columns for each of the 10 variables (one independent variable + 9 dependent variables), each column representing a month from January 2003 to December 2013. Each column has 9,630 rows. I want to regress the January 2003 column of the independent variable on the January 2003 columns of the 9 dependent variables, and repeat this process for each of the 132 months, for 132 cross-sectional regressions in total. I also want to store the coefficients in a file and get as much information as possible about the individual regressions (e.g. R^2, t-stats, etc.). Thank you.
Nick Cox

Join Date: Mar 2014

Posts: 35681
#4

18 May 2016, 05:57

Inigo: This makes no sense to me either. In Stata, column is either an informal word for a variable in the dataset, so variables and columns are one and the same, or it refers in the usual way to a column of a matrix. So, "132 columns for each of 10 variables" makes no sense.

Similarly, row is either an informal word for an observation in the dataset, so observations and rows are one and the same, or it refers in the usual way to a row of a matrix.

The whole discussion would go a lot faster if you did what Sebastian has already specifically suggested, post an example using dataex (SSC).
Inigo Sanchez

Join Date: Apr 2016

Posts: 39
#5

18 May 2016, 07:15

It seems that I did not explain it correctly. I guess it is a semantic issue, so I will try again.

As said in #4, each column represents a variable, I agree. The point is that I have 10 sets (one set for 1 independent variable and 9 sets for the dependent variables) of 132 columns each (each column representing one month from January 2003 to December 2013) with their corresponding observations (i.e. rows). The first 132 columns represent the independent variable, the second 132 columns represent the first dependent variable, the third 132 columns represent the second dependent variable, and so on. The first regression I want to run is the January 2003 column of the independent variable (the first column in my dataset) on the January 2003 columns of the dependent variables. The second regression I would like to run is the February 2003 column of the independent variable on the February 2003 columns of the dependent variables, and so on, until I have run 132 cross-sectional regressions (one regression for each month).

I hope you now understand what I want to do.
Nick Cox

Join Date: Mar 2014

Posts: 35681
#6

18 May 2016, 07:52

What about dataex?

Ploughing through the words, I guess that by "column" you mean a block of observations within a variable, but I really can't be certain.

Using your own terminology and not Stata's raises more than a "semantic issue": unless and until we understand what you're doing, we can't give good advice.

Code:

help statsby

may be what you seek as a way of repeating the regressions.
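As a minimal sketch of how statsby could repeat a regression by month: the block below builds toy data in long layout and is not based on Inigo's actual dataset. The variable names month, dep, indep1 and indep2 are hypothetical, and the simulated values are placeholders only.

```stata
* Toy long-layout data: 4 hypothetical months, 20 observations each
clear
set seed 12345
set obs 80
generate month  = ceil(_n/20)
generate indep1 = rnormal()
generate indep2 = rnormal()
generate dep    = 1 + 0.5*indep1 - 0.3*indep2 + rnormal()

* One regression per month; the data in memory are replaced by one
* observation per month holding coefficients, standard errors, R^2 and N
statsby _b _se r2=e(r2) n=e(N), by(month) clear: regress dep indep1 indep2

* t statistics are not saved directly, but follow from _b and _se
generate t_indep1 = _b_indep1 / _se_indep1
generate t_indep2 = _b_indep2 / _se_indep2

list
```

Replacing the clear option with saving(filename, replace) would write the results to a separate file and leave the original data in memory.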
Inigo Sanchez

Join Date: Apr 2016

Posts: 39
#7

18 May 2016, 07:58

Yes, that is exactly what I mean: when I say "column" I refer to a block of observations within a variable.
Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#8

18 May 2016, 10:16

I still don't quite know what you mean, but it appears to me that your number of variables (or, as you call them, columns) is far too high. Usually you should have one Stata variable (column) for each conceptual variable to run appropriate regressions. You can reshape your dataset from the so-called wide format (which I expect your dataset is in) to the long format using the command -reshape long-. See the help file for the full syntax of this command. You can still run cross-sectional regressions in the long format using the if qualifier of the -reg- command.
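A minimal sketch of that reshape, assuming a wide layout in which each month has its own set of variables. Every name here is hypothetical, since the actual structure of the dataset has not been posted: id is an observation identifier, and dep1-dep4, indep1_1-indep1_4 and indep2_1-indep2_4 stand in for four months of one dependent and two independent variables.

```stata
* Toy wide-layout data: 20 observations, 4 hypothetical months
clear
set seed 2016
set obs 20
generate id = _n
forvalues m = 1/4 {
    generate dep`m'      = rnormal()
    generate indep1_`m'  = rnormal()
    generate indep2_`m'  = rnormal()
}

* Wide to long: one observation per id and month, one variable per stub
reshape long dep indep1_ indep2_, i(id) j(month)
rename indep1_ indep1
rename indep2_ indep2

* Cross-sectional regression for a single month via the if qualifier
regress dep indep1 indep2 if month == 1
```

After the reshape the data have 80 observations (20 ids x 4 months), and the same regress command can be repeated for any value of month.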

If I understand you correctly, you probably also need to merge the datasets (see command -merge-) as all variables need to be in one dataset to perform a regression.

I would try to provide a code for your problem, but for that I need the exact structure of the dataset(s). Again, output of the command -dataex- would help.

Last edited by Sebastian Geiger; 18 May 2016, 10:21.
Inigo Sanchez

Join Date: Apr 2016

Posts: 39
#9

18 May 2016, 13:18

I will try to explain it another way. My objective is to run 132 regressions in total. For each of them I intend to regress one dependent variable on 9 independent variables (I was wrong in the previous posts; I meant it the other way around). I don't want to run the regressions individually, as this is time consuming. My question is whether there is code that would allow me to run the 132 regressions (without running them individually) while storing the coefficients and getting information (R^2, t-stats, etc.) on each individual regression. Thank you.
Nick Cox

Join Date: Mar 2014

Posts: 35681
#10

18 May 2016, 13:22

Inigo: Already answered in #6: look at statsby. If that's not the right answer, you really will need to explain why, as despite repeated requests you won't give us examples showing what your data look like.
Inigo Sanchez

Join Date: Apr 2016

Posts: 39
#11

18 May 2016, 16:07

I have attached a sample file showing what my data look like. My original data have far more observations and variables, but the code should be analogous. So, based on the data in the attached file, I would like to store the coefficients and other statistical information (R^2, t-stats, etc.) of four regressions. The regressions that I need to run are the following:

Regress Dep1 on Indep1.1 and Indep2.1
Regress Dep2 on Indep1.2 and Indep2.2
Regress Dep3 on Indep1.3 and Indep2.3
Regress Dep4 on Indep1.4 and Indep2.4

What code should I enter? Thank you in advance. Hope this is more illustrative.
Attached Files

Statalist.xlsx (8.6 KB, 1 view)
William Lisowski

Join Date: Dec 2014

Posts: 10150
#12

18 May 2016, 18:56

Inigo -

You will more successfully communicate what you want, and at the same time provide sample data others can use to demonstrate the techniques they describe, by creating a Stata dataset with 1 independent variable and just 2 dependent variables, for just the 4 months November 2003, December 2003, January 2004, and February 2004 - a total of (1+2)x4 = 12 columns in Excel-speak - and 20 of your observations - rows in Excel-speak. Then install and use the dataex command as described in the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question and section 12 for advice on dataex, and use it to create a listing that can be copied and pasted to a Statalist post.

As it is, Nick and Sebastian are having much the same issues understanding your problem description that Clyde and I had with the earlier problem we corresponded on. Part of the problem is that your data in this problem is set up in a way that makes it difficult to work with in Stata. That was the case in the earlier problem, and while an answer eventually emerged, we didn't address the real problem, which was the unsuitability of the way you are arranging your data in Stata. That was what Robert Picard tried to communicate, although with what was perhaps too complicated a solution for someone who is apparently a new user of Stata.

With sample data, we can show you not only how to solve your problem, but how to better arrange your data to allow a simple efficient solution to your problem.

An Excel spreadsheet with column headings that are not suitable for use as Stata variable names doesn't really help. You need to start by creating Stata datasets from your Excel data - many of us do not use Excel, and of those that do, many are reluctant to open Excel workbooks that could transmit malware.
