  • Looping regression to determine one set of control variables that make models significant

    Hi everyone,

    I have one dependent variable Y, four independent variables X1, X2, X3, X4, and 12 control variables C1, C2, ..., C12. (Please note that the independent and control variables are named arbitrarily and are not stored in any particular order.)

    Now I want to loop over regressions to select a set of control variables, from my 12, that makes the coefficients of my 4 independent variables (X1, X2, X3, X4) significant. In my case, the p-values of all four coefficients should be below 10% (the smaller, the better).

    Please note that the set of control variables I want to pick out can contain anywhere from 1 to all 12 controls: it can be just C1, just C2, or just C7, and it can also be a combination such as (C1, C7, C10, C12). If my calculation is correct, there should be 2^12 - 1 = 4095 combinations and, correspondingly, 4095 regressions in the loop.
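
    The count of 2^12 - 1 = 4095 can be checked by enumerating every non-empty subset of the 12 controls. This is a minimal Python sketch; the names C1..C12 are just the placeholders from the question, and it only counts the candidate sets, it does not run any regressions.

```python
from itertools import combinations

# Placeholder control-variable names C1..C12, as in the question
controls = [f"C{i}" for i in range(1, 13)]

# Every non-empty subset: choose k of the controls, for k = 1..12
subsets = [
    subset
    for k in range(1, len(controls) + 1)
    for subset in combinations(controls, k)
]

print(len(subsets))            # number of candidate control sets
print(2 ** len(controls) - 1)  # closed-form count for comparison
```

    Each size k contributes "12 choose k" subsets, and summing over k = 1..12 gives 2^12 minus the single empty set.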

    When a set of control variables that satisfies the significance requirement is found, I'd like it to be output so that I can see which sets qualify.

    Thank you for your help.

  • #2
    A p-value of 0.1 could loosely be taken to mean that there is a 1 in 10 chance of finding a false positive result (given that the null hypothesis of no association is correct). Anyway, you are going for 1 in 4,095. What do you think is the meaning of a p-value of the selected model, if anything?

    Best
    Daniel
    Last edited by daniel klein; 28 Apr 2020, 01:13.
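
    To put rough numbers on Daniel's point: if k tests were independent and every null hypothesis were true, the chance that at least one of them comes out "significant" at level alpha is 1 - (1 - alpha)^k. The Python sketch below assumes independence, which does not hold for the 4095 regressions here (they share the same data and overlapping controls), so treat it as an illustration of how quickly the nominal 10% erodes, not as the exact error rate of the search.

```python
def prob_at_least_one_false_positive(alpha: float, k: int) -> float:
    """Chance that at least one of k independent tests rejects a true null at level alpha.

    Assumes the k tests are independent; correlated tests give a different value.
    """
    return 1.0 - (1.0 - alpha) ** k


# Nominal level of 10%, as in the question, for increasing numbers of tried models
for k in (1, 10, 100, 4095):
    print(k, round(prob_at_least_one_false_positive(0.1, k), 4))
```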


    • #3
      Thanks, I see your point. However, I'd like to ask: on what principles should I choose control variables, if not this way? There are 12 that I suspect are related to my dependent variable Y, and if I put them all in and cluster by id, the coefficients are no longer significant, so I think I may need to pick some of them.

      Regards
      Yubo


      • #4
        How you build your model depends on the questions you want to answer with that model. There are few if any situations where tweaking your model to fit a preferred answer would be helpful.

        Edit:
        The term "control variables" suggests that you are interested in the coefficients of the "independent variables". In a regression framework, you would probably want to account/control for potential confounders that are correlated with both your "dependent variable" and your "independent variables". You would probably not want to include potential mediators, though. It is hard to be more specific given the information you provide.


        Best
        Daniel
        Last edited by daniel klein; 28 Apr 2020, 02:42.


        • #5
          I understand. Thank you for your advice.

          Regards,
          Yubo


          • #6
            To add to Daniel's helpful comment, let me note that this is data mining in the old, bad sense: trying piles of stuff in a search for statistical significance and then treating the result as if you had estimated the model only once.

            While there is a legitimate endeavor in the new data mining, which is largely about exploratory data analysis, this kind of stepwise analysis is exactly what's wrong with old-style data mining.

            Note that if you really really want to do this, there is a stepwise procedure available in Stata.


            • #7
              Thank you. I've decided to drop this method; otherwise the results would be meaningless, which would make me feel very bad.

              And sorry about the late reply; I haven't checked my notifications much recently.


              Regards,
              Yubo