Looping an extra explanatory variable on each iteration.

Mohiuddin Alamgir

Join Date: May 2015

Posts: 20
#1

24 May 2015, 05:39

Dear Statalist,
I am running an OLS regression with 12 explanatory variables. I want to add one more variable to the regression on each iteration and check the results each time.
For example, my dependent variable is Y, and my independent variables are x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12.
My regressions will be:

reg Y x1
reg Y x1 x2
reg Y x1 x2 x3
reg Y x1 x2 x3 x4
......
.......
reg Y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12

Now I want to do this with a single command, so that one command gives me all the regression results.
I would really appreciate it if anyone could help me.
Thanks,
Mohiuddin
daniel klein

Join Date: Mar 2014

Posts: 3859
#2

24 May 2015, 05:55

See help nestreg.

If that is not what you seek, it is probably a simple loop, basically something like

Code:

local predictors x1 x2 x3
local X
foreach x of loc predictors {
    local X `X' `x'
    reg y `X'
}
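For comparison, the nestreg route mentioned above might look like the following. This is only a sketch: it uses Stata's shipped auto dataset, with price, mpg, weight and length standing in for y, x1, x2 and x3 (an assumption made so the example runs as-is).

```stata
* Stand-in data: price plays the role of y; mpg, weight, length play x1-x3
sysuse auto, clear

* Each parenthesized block is added to the model in turn;
* nestreg reports every nested model plus a test for each added block
nestreg: regress price (mpg) (weight) (length)
```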

Best
Daniel
Mohiuddin Alamgir

Join Date: May 2015

Posts: 20
#3

24 May 2015, 06:23

Dear Daniel,

Thank you so much!!!!

Your suggestions solved my problem.

Thank you again!!!

Best regards,
Mohiuddin.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#4

24 May 2015, 10:23

One word of caution! If there are missing values for any of these x variables, then at the iteration(s) where you add that variable (those variables), the sample for your regression will change, dropping the observations for which that variable has missing values. As a result, your results are not comparable to those of the regressions done with fewer variables and more observations. If you are sure there are no missing values on any of these variables, then fine. Otherwise, you need to modify the code so that all of the regressions are carried out on only the observations that have no missing values for any of them.
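One way to check whether this applies to your data is to count the incomplete observations before running anything. A minimal sketch, assuming the shipped auto dataset as stand-in data (rep78 has missing values there) and a hypothetical helper variable nmiss:

```stata
sysuse auto, clear
local predictors mpg weight rep78

* Number of missing values per observation across outcome and predictors
egen nmiss = rowmiss(price `predictors')

* Observations that would be dropped once every predictor is in the model
count if nmiss > 0

* Compare the sample sizes of the smallest and the largest model
quietly regress price mpg
display "N with first predictor only: " e(N)
quietly regress price `predictors'
display "N with all predictors:       " e(N)
```

If the two sample sizes differ, the nested regressions are not being fitted to the same observations.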
daniel klein

Join Date: Mar 2014

Posts: 3859
#5

24 May 2015, 10:55

Clyde has an important point. nestreg takes care of missing values, but the loop outlined does not. This is, however, easily modified. Just add the line

Code:

qui reg y `predictors'
keep if e(sample)

before entering the loop. Prefix the whole thing with a preserve statement and restore after the loop has finished, if you do not want to drop cases from the dataset. Alternatively do

Code:

qui reg y `predictors'
g byte sample = e(sample)
[...]
reg y `X' if sample
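The first variant described above (dropping the incomplete cases inside preserve and restore) could be written out in full roughly as follows. It is a sketch on the shipped auto dataset, with price and three of its variables standing in for y and the predictors.

```stata
sysuse auto, clear
local predictors mpg weight rep78

preserve
    * Fit the full model once and keep only the observations it used
    quietly regress price `predictors'
    keep if e(sample)

    * Add one predictor per iteration; every model now uses the same cases
    local X
    foreach x of local predictors {
        local X `X' `x'
        regress price `X'
    }
restore
```

Run the block as a whole from a do-file, because local macros do not survive between separately executed chunks.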

Best
Daniel
Mohiuddin Alamgir

Join Date: May 2015

Posts: 20
#6

24 May 2015, 13:18

Dear Clyde,
Thank you so much. I really was losing observations, though I did not notice.

Dear Daniel,
Thanks again for the solution. I wrote the code below following your suggestions, but my number of observations dropped to almost half, though it is now equal in every regression. That is a kind of solution. But if I do not want to lose any observations, what should I do? Any suggestions?

preserve
local predictors x1 x2 x3 ...........x12
qui reg y `predictors'
g byte sample = e(sample)

local X
foreach x of loc predictors {
local X `X' `x'
reg y `X'
}
restore
Mohiuddin Alamgir

Join Date: May 2015

Posts: 20
#7

24 May 2015, 13:20

Sorry, the regression line inside the loop should read:

reg y `X' if sample
Mohiuddin Alamgir

Join Date: May 2015

Posts: 20
#8

24 May 2015, 14:08

Another problem: I want to rerun the same regression, leaving out one variable at a time, to get the contribution of each variable, as below:

1st step: reg y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
2nd step: reg y x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 (no x1 is here)
3rd step: reg y x1 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 (no x2 is here)
4th step: reg y x1 x2 x4 x5 x6 x7 x8 x9 x10 x11 x12 (no x3 is here)
...........................
..........................
13th step: reg y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 (no x12 is here)

How can I do this? Would you please help me with it? I am almost a new user of Stata and econometrics.
Thank you in advance.

Thankfully,
Mohiuddin.
daniel klein

Join Date: Mar 2014

Posts: 3859
#9

24 May 2015, 14:22

Another problem: I want to rerun the same regression, leaving out one variable at a time, to get the contribution of each variable, as below:

1st step: reg y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12
2nd step: reg y x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 (no x1 is here)
3rd step: reg y x1 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 (no x2 is here)
4th step: reg y x1 x2 x4 x5 x6 x7 x8 x9 x10 x11 x12 (no x3 is here)
...........................
..........................
13th step: reg y x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 (no x12 is here)

How can I do this? Would you please help me with it? I am almost a new user of Stata and econometrics.

In terms of Stata, there are user-written commands that help you do such things. From a scientific point of view, what you are trying to do here is very likely a bad idea and not sound in terms of econometric theory, which is why I will not point you to the respective commands.

If you tell us more about the substantive research question(s), we might be able to point to alternative approaches to tackle the underlying problem, although local face-to-face support is likely more helpful.

But if I do not want to lose any observations, what should I do?

The most appropriate approaches are multiple imputation or full information maximum likelihood. If you have not heard of these terms, you might want to start by typing help mi. However, I strongly advise against using these approaches in practice before you are sure you understand at least the basic idea behind them. I am afraid this is way beyond the scope of a forum; you need to do a lot of reading and will probably need some face-to-face support.

Best
Daniel
Mohiuddin Alamgir

Join Date: May 2015

Posts: 20
#10

25 May 2015, 07:17

Dear Daniel,
Thank you very much for the advice. I am working on my master's thesis and am still experimenting with the variables. However, my professor also told me to find out the impact of dropping a variable, so I think I should follow his advice. I would like to have a solution for these regressions. Anyway, my research topic is the impact of credit on the productivity of firms.

I really want to read materials that would help me become more proficient with Stata. I do not have many chances to meet experts face to face here in Japan. I would like your advice on how to become better at Stata. Thank you again so much.

Thankfully,
Mohiuddin.

Clyde Schechter

Join Date: Apr 2014
Posts: 30116

#11

25 May 2015, 10:30

While I agree with Daniel Klein that your thesis advisor may be giving you bad advice, I understand that you don't have much choice but to follow it. So here is some fast code that will run your model, dropping one predictor at a time:

Code:

unab predictors : x1-x12 // OR WHATEVER THEY ARE; unab EXPANDS THE RANGE SO THE LOOP SEES EACH NAME
//  MODEL WITH ALL PREDICTORS INCLUDED
reg y `predictors'
gen byte full_model_sample = e(sample) // NOTE FULL SAMPLE

// NOW LOOP DROPPING EACH PREDICTOR IN TURN
foreach p of local predictors {
    local to_use: list predictors - p
    display "Model omitting `p'"
    regress y `to_use' if full_model_sample
    display _newline(3)
}
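To compare the drop-one models side by side rather than scrolling through the output, the results can be stored and tabulated. This is a sketch under the assumption that the shipped auto dataset stands in for the real data; the stored names full and no_... are made up for illustration.

```stata
sysuse auto, clear
unab predictors : mpg weight rep78   // expand to individual variable names

* Full model defines the common estimation sample
quietly regress price `predictors'
generate byte full_model_sample = e(sample)
estimates store full
local models full

* Refit leaving out one predictor at a time, on the same observations
foreach p of local predictors {
    local to_use : list predictors - p
    quietly regress price `to_use' if full_model_sample
    estimates store no_`p'
    local models `models' no_`p'
}

* One table with coefficients, N and R-squared for every model
estimates table `models', stats(N r2) b(%9.3f)
```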


Mohiuddin Alamgir

Join Date: May 2015

Posts: 20
#12

25 May 2015, 16:41

Dear Clyde Schechter,

Thank you so much. Your code works very smoothly and fast. You are right: the whole regression result is now becoming complicated. I will talk to my supervisor about this. However, I have been losing almost half of the observations, and I am trying to figure out why. If you have any more advice, I will be grateful.
Thank you very much again.

Thankfully,
Mohiuddin.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#13

25 May 2015, 21:42

The problem is missing values for your regression variables. The -regress- command will only include those observations for which all of the variables have non-missing values. If one variable has a few observations with missing values, and another variable has a different bunch of observations with missing values, etc., by the time you account for 12 predictors and an outcome variable, it is very easy to lose large numbers of observations.
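To see which variables are responsible for the lost observations, Stata's misstable command can help. A sketch on the shipped auto dataset (an assumption; substitute your own outcome and predictors):

```stata
sysuse auto, clear

* Count of missing values for each variable that has any
misstable summarize price mpg weight rep78

* Which combinations of variables are missing together, with frequencies
misstable patterns price mpg weight rep78, frequency
```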

The best solution is to find the correct values for the missings. But that is often difficult or impossible to do. It is, however, worth checking your original data source and the steps you used to create your working data set from it to make sure you didn't do something along the way to create missing values that shouldn't be there. Or perhaps the missing values are the result of skip patterns that imply a particular value. (For example, response to a pregnancy question might be coded as missing for men, but one could confidently replace that by "no.") But if the data are simply missing and there is no way to supply them, then your sample is limited, and you have no good options.

If you can make a credible case that the missing values are missing at random, then the options of using multiple imputation or full-information maximum likelihood estimation are available. But both of these are fairly advanced procedures that you would need to invest a fair amount of time and effort into learning. Multiple imputation, in particular, is quite tricky to pull off and, even when you know how to do it, it's very time-consuming to set it up and run it. I'm not sure this is the right time and circumstances for you to go down either of these roads. (Not to mention that I am almost always highly skeptical of claims that data are missing at random: it is an assumption that, by its very nature, cannot be verified in the data itself and must be supported by some kind of hand-waving arguments about the nature of the mechanism generating missing values. In reality, we usually know far less about that mechanism than we like to think we know.)

I imagine you were hoping for a more positive response, perhaps a few lines of code that would fix the problem. But that just isn't possible here.

Last edited by Clyde Schechter; 25 May 2015, 21:44.
Mohiuddin Alamgir

Join Date: May 2015

Posts: 20
#14

26 May 2015, 11:39

I have one variable for alternative power supply to the firm and its impact on credit availability for the firm. After summarizing all these variables, I found that more than half of the observations are missing for this variable. Since it is a binary variable, I assume that the missing values belong to firms without an alternative power supply, and I replaced them with 0. Likewise, another variable has about 200 missing observations, and I also set those to 0, on the assumption that firms with missing data are unaware of the item in question. Note that the questions are not very sophisticated and the survey was conducted on small and micro firms. Now I have regained almost the whole sample.

Thank you so much for the clarification. I am now learning about multiple imputation. Hopefully, I will write a good thesis.

Gratefully,
Mohiuddin.
daniel klein

Join Date: Mar 2014

Posts: 3859
#15

26 May 2015, 13:45

A few words of caution.

Since it is a binary variable, I assume that the missing values belong to firms without an alternative power supply

If this is true, what do the original zeros indicate? I would, in any case, at least include an indicator variable, marking the cases where you replaced a missing value by zero. This is sometimes called "dummy variable adjustment". If the missing values are not by design and, thus, do not "mask" some true, but unknown value, you are probably better off with listwise deletion.
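The dummy variable adjustment mentioned above might be sketched like this. The shipped auto dataset and its rep78 variable are stand-ins, and rep78_adj and rep78_miss are hypothetical names; whether replacing missing values by zero is defensible depends entirely on your data.

```stata
sysuse auto, clear

* Flag the observations whose value is about to be filled in
generate byte rep78_miss = missing(rep78)

* Work on a copy so the original variable stays untouched
clonevar rep78_adj = rep78
replace rep78_adj = 0 if rep78_miss

* Include both the adjusted variable and the indicator in the model
regress price mpg weight rep78_adj rep78_miss
```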

Richard Williams has a nice introductory paper on missing data. Actually, you might find some of his other papers very useful, too. I highly recommend his teaching material, especially as a starting point. Richard explains the problems and solutions very intuitively, shows examples using Stata and gives you the references for further reading.

Best
Daniel