  • Statistics of all variables (including factor variables) used in regression

    I'm running a regression on a large dataset (60m observations) with around 130 independent variables, including several factor variables. I would like a single table of the mean, SD, min, max, and 10th and 90th percentiles of all the independent variables used in the regression, including the factor variable levels.

    Example regression command:
    Code:
    reg mydep i.Gender#ib5.agegrp ib5.agegrp#c.IMDScore ib5.agegrp_wide#ib5.hholdtype ib1.propertytype noise airpollution , base
    I can get part of what I need with
    Code:
    summ i.Gender#ib5.agegrp ib5.agegrp#c.IMDScore ib5.agegrp_wide#ib5.hholdtype ib1.propertytype noise airpollution if e(sample) , base sep(0)
    But this doesn't output the percentiles or the statistics for the base factor variable category.

    Can anyone help please?

  • #2
    Code:
    help tabstat


    • #3
      I had hoped to use tabstat, but it rejects the factor-variable notation:
      Code:
      factor-variable and time-series operators not allowed
      r(101);


      I'm using Stata 15.1 MP8


      • #4
        Rob:
        Have you already considered the -detail- option available from -summarize-?
        That said, I'd prefer -tabstat- for this kind of job, but you have to create the interactions by hand so that -tabstat- does not reject the -fvvarlist- notation.
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          The individual levels of the factor variables are all constants, so their means and quantiles are all identical and their SD is 0.

          Otherwise those statistics have in my view some meaning if a factor variable is ordered, although some literature opposes such a view.

          In any case as implied you must strip the factor variable notation out of a call to tabstat.
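
          A minimal sketch of what stripping the notation might look like, using the variable names from #1. The names insample, pt_, gXa and ageXimd_ are hypothetical, and the code assumes the regression has just been run so that e(sample) is still set. It builds the indicators and interactions as ordinary variables and then passes them to tabstat; only a few of the terms are shown.

          ```stata
          * Sketch only: run in a do-file straight after the regression in #1.
          generate byte insample = e(sample)      // hypothetical marker for the estimation sample

          * One 0/1 indicator per level of propertytype, base level included: pt_1, pt_2, ...
          tabulate propertytype if insample, generate(pt_)

          * Factor-by-factor interaction: one indicator per Gender x agegrp cell.
          egen gXa = group(Gender agegrp) if insample, label
          tabulate gXa, generate(gXa_)

          * Continuous-by-factor interaction: IMDScore within each age group.
          levelsof agegrp if insample, local(ages)
          foreach a of local ages {
              generate ageXimd_`a' = (agegrp == `a') * IMDScore if insample
          }

          * One table: a row per variable, a column per statistic.
          tabstat pt_* gXa_* ageXimd_* noise airpollution if insample, ///
              statistics(mean sd min max p10 p90) columns(statistics)
          ```

          With 60m observations each generated variable costs memory, so it may be worth building and summarizing the terms in batches and dropping them afterwards.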


          • #6
            The -detail- option for -summarize- does run, but it outputs a separate mini table for each of my 130 variables, which is very unwieldy.

            I think I will have to go with -tabstat- and use a combination of -xi- and manual manipulation.
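
            A rough sketch of the -xi- route, as I understand it; the marker name touse is hypothetical, and only two of the factor variables are shown. The noomit option should make xi create an indicator for every level instead of dropping one, which covers the base category. The interactions are not handled here and would still need separate treatment.

            ```stata
            * Sketch only: run in a do-file straight after the regression in #1.
            generate byte touse = e(sample)       // hypothetical marker for the estimation sample

            * noomit keeps an indicator for every level, including the would-be base level.
            xi, noomit i.propertytype i.hholdtype

            * xi names its indicators with the _I prefix by default; check them first.
            describe _I*

            tabstat _I* noise airpollution if touse, ///
                statistics(mean sd min max p10 p90) columns(statistics)
            ```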

            Thanks
            Rob


            • #7
              Rob:
              I'm afraid manual manipulation is unavoidable in your case.
              Kind regards,
              Carlo
              (Stata 19.0)
