Looping regression to determine one set of control variables that make models significant

Yubo Fu

Join Date: Apr 2020

Posts: 5
#1


28 Apr 2020, 00:33

Hi everyone,

I have one dependent variable Y, four independent variables X1 X2 X3 X4, and 12 control variables C1, C2, ..., C12 (please note that the independent and control variables take arbitrary forms and are in no particular order).

Now I want to loop over regressions to select a set of control variables from my 12 that makes the coefficients of my four independent variables (X1 X2 X3 X4) significant; in my case, the p-values of all four coefficients should be less than 10% (the smaller the better).

Please note that the set of control variables I want to pick can contain anywhere from 1 to all 12 of them: it can be just C1, just C2, or just C7, or a combination like (C1, C7, C10, C12). If my calculation is correct, there should be 2^12 - 1 = 4095 combinations and correspondingly 4095 regressions to loop over.
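The count above can be checked by enumerating the subsets directly. This is only a minimal Python sketch of the arithmetic, not Stata code and not a recommendation to run the search; the names `controls` and `subsets` are made up for illustration.

```python
from itertools import combinations

# Placeholder names for the 12 control variables (C1 ... C12).
controls = [f"C{i}" for i in range(1, 13)]

# Every non-empty subset: sizes 1 through 12.
subsets = [
    combo
    for size in range(1, len(controls) + 1)
    for combo in combinations(controls, size)
]

print(len(subsets))        # number of non-empty subsets
print(2 ** len(controls) - 1)  # closed-form count, 2^12 - 1
```

Both printed numbers agree at 4095, since each of the 12 controls is either in or out of a set and the empty set is excluded.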

And when a set of control variables that satisfies the significance requirement is found, I'd love it to be output so that I can see which sets are available.

Thank you for your help
daniel klein

Join Date: Mar 2014

Posts: 3886
#2

28 Apr 2020, 00:50

A p-value of 0.1 could loosely be taken to mean that there is a 1 in 10 chance of finding a false positive result (given that the null hypothesis of no association is correct). Anyway, you are going for 1 in 4,095. What do you think is the meaning of a p-value of the selected model, if anything?
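To put a rough number on this, here is a small Python sketch of the chance of at least one false positive when a true null is tested several times at the 10% level. It assumes the tests are independent, which is an idealization: the 4,095 regressions share the same data and are strongly correlated, so the real figure would differ. The function name `prob_any_false_positive` is hypothetical.

```python
def prob_any_false_positive(alpha, n_tests):
    """Chance of at least one p < alpha across n_tests tests,
    assuming every null is true and the tests are independent
    (an idealization, not a property of nested regressions)."""
    return 1.0 - (1.0 - alpha) ** n_tests


# One test, a handful of tests, and the full search over 4,095 models.
for k in (1, 4, 12, 4095):
    print(k, prob_any_false_positive(0.10, k))
```

Under that assumption the probability rises quickly with the number of tests and is indistinguishable from 1 by the time all 4,095 specifications have been tried.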

Best
Daniel

Last edited by daniel klein; 28 Apr 2020, 01:13.
Yubo Fu

Join Date: Apr 2020

Posts: 5
#3

28 Apr 2020, 01:31

Thanks, I see your point. However, I'd like to ask: on what principles should I choose control variables, if not in this way? There are 12 that I suspect are related to my dependent variable Y, and if I put them all in and cluster on id, the model isn't significant, so I think I may need to pick some of them.

Regards
Yubo
daniel klein

Join Date: Mar 2014

Posts: 3886
#4

28 Apr 2020, 02:30

How you build your model depends on the questions you want to answer with that model. There are few if any situations where tweaking your model to fit a preferred answer would be helpful.

Edit:
The term "control variables" suggests that you are interested in the coefficients of the "independent variables". In a regression framework, you would probably want to account/control for potential confounders that are correlated with both your "dependent variable" and your "independent variables". You would probably not want to include potential mediators, though. It is hard to be more specific given the information you provide.

Best
Daniel

Last edited by daniel klein; 28 Apr 2020, 02:42.
Yubo Fu

Join Date: Apr 2020

Posts: 5
#5

28 Apr 2020, 03:02

I understand. Thank you for your advice.

Regards,
Yubo
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#6

29 Apr 2020, 10:51

To add to Daniel's helpful comment, let me note that this is data mining in the old bad sense – trying piles of stuff in a search for statistical significance and then treating the result as if you just estimated the model once.

While there is a legitimate endeavor in the new data mining, which is largely about exploratory data analysis, this kind of stepwise analysis is exactly what's wrong with old-style data mining.

Note that if you really really want to do this, there is a stepwise procedure available in Stata.
Yubo Fu

Join Date: Apr 2020

Posts: 5
#7

13 May 2020, 06:31

Thank you. I've decided to drop this method, since otherwise the results would be meaningless, which would make me feel very bad.

And sorry about the late reply; I haven't checked my notifications much recently.

Regards,
Yubo