  • Best variables for a regression

    Hello everyone! I am new here; I hope I am posting this question in the right section.
    I have to do a logistic regression with 20 available variables, and I have to find which combination of them is the best.
    How can I do it?
    I thought of writing a loop in which I add one variable at a time and check whether the adjusted R2 is better than before, but I really don't understand the Stata language.
    Please help me, thank you.

  • #2
    Edoardo:
    welcome to this forum.
    Your question sounds ill-posed.
    The first thing you should aim for is to give a fair and true view of the data-generating process you're investigating.
    This has nothing to do with hunting for the regression specification that gives you back tons of statistically significant coefficients.
    The Stata language, like all languages, takes time to learn and use properly. No priority lane and/or hard-and-fast rules are available.
    See the wise comment on this point that William Lisowski recalls on this forum from time to time.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      I'm late to this discussion; Carlo is several time zones ahead of me.

      The comment Carlo refers to is one that I offer in any post to a member who identifies as being new to Stata.
      I'm sympathetic to you as a new user of Stata - there is quite a lot to absorb. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

      When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

      The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

      Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

      Stata also supplies YouTube videos, if that's your thing.
      Now, with that said, your problem statement

      I have to do a logistic regression with 20 available variables and i have to find which combination of them is the best.
      sounds very much like a project assignment for a course you are taking. If so, be aware that you will be graded not only on the logistic regression output you present but also on your understanding of the subject matter involved. Too many years ago, in an econometrics class, the course project was to model the supply of housing in the United States. I spent a lot of time getting the programming just right, and then effectively estimated the supply of housing in year t as the supply in year t-1 plus new construction in year t minus demolitions in year t. The results included significant coefficient estimates and an R2 in the high 0.90s. The instructor returned the paper with the note "you've estimated an identity" and a suitably modest grade. It seems you might be heading down this path.

      I will mention that logistic regression does not produce an "adjusted R2"; it produces a "pseudo R2", which fills the role of R2 for the logistic model. I do not remember offhand whether it is "adjusted" in the sense of taking into account the number of variables in the model.
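For reference, one common definition of the pseudo R2 (McFadden's) compares the maximized log likelihood of the fitted model, ln L_full, with that of an intercept-only model, ln L_0. To my knowledge this unadjusted form is what Stata's logit reports as "Pseudo R2". An adjusted variant subtracts K, the number of estimated parameters, so a new variable raises it only if it improves the log likelihood by more than 1. The notation below is mine, not Stata's.

```latex
R^2_{\mathrm{McF}} = 1 - \frac{\ln \hat{L}_{\mathrm{full}}}{\ln \hat{L}_{0}},
\qquad
\bar{R}^2_{\mathrm{McF}} = 1 - \frac{\ln \hat{L}_{\mathrm{full}} - K}{\ln \hat{L}_{0}}
```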

      Finally, with 20 independent variables, there are 2^20 (over a million) possible models that can be built. Your attempt to automate the process apparently is not informed by the 60 years of statistical research into techniques for automating it (described in the Wikipedia article on stepwise regression and the articles linked from that one), nor by the criticisms of these automated approaches, which go back just about as far. The stepwise techniques are embodied in Stata in the stepwise command, described in the output of help stepwise.
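A quick check of the "over a million" figure: each of the 20 candidate variables is either in or out of a model, so summing the number of k-variable subsets over k = 0..20 gives 2^20. The short sketch below uses Python purely for illustration (nothing in the thread uses it), and the function name count_models is hypothetical. The count includes the intercept-only model.

```python
from math import comb


def count_models(p: int) -> int:
    """Count distinct predictor subsets from p candidates, including the empty (intercept-only) one."""
    # Sum C(p, k) over every possible model size k = 0..p
    return sum(comb(p, k) for k in range(p + 1))


n_models = count_models(20)
print(f"{n_models:,}")        # 1,048,576
print(f"{comb(20, 10):,}")    # 184,756 models use exactly 10 of the 20 variables
assert n_models == 2 ** 20    # each variable is either in or out
```

Even the models of a single size number in the hundreds of thousands, which is one reason the automated approaches mentioned above search only a path through the candidates rather than fitting every subset.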
