  • Getting clean output for regression loop over levels of a variable and with an indicator independent variable

    I have been struggling to get Stata to output my regression results, and have tried the eststo commands and outreg2 commands. My code is as follows. I also thought of using parmest but do not know how to tell Stata to put output in 1 file for all 50 states and for each of the 5 levels of the "riskfactors" variable. I want the odds ratios, confidence intervals and p-values in my tables.

    set more off
    levelsof _state, local(states)
    foreach _state of local states {
    display _newline(2) "State=`_state'"
    xi: svy, subpop(if _state==`_state' & riskfactors!=.): logistic fphlth i.riskfactors
    }

    Any help would be much appreciated! I am fairly new to Stata and data analysis.

  • #2
    The use of survey data makes this complicated. Without that, -statsby- would make this a one-liner, but -statsby- is incompatible with the -svy:- prefix. So you have to, in effect, program the fundamental logic of -statsby-, which is a wrapper for -postfile-:

    Code:
    capture postutil clear
    postfile handle str32 state float or2 or3 or4 or5 ll2 ll3 ll4 ll5 ///
        ul2 ul3 ul4 ul5 p2 p3 p4 p5 using all_states_results, replace
        
    levelsof _state, local(states)
    foreach s of local states {
        svy, subpop(if _state == "`s'"): logistic fphlth i.riskfactors
        local topost ("`s'")
        matrix M = r(table)
        forvalues j = 2/5 {
            local topost `topost' (M[1, `j'])
        }
        forvalues j = 2/5 {
            local topost `topost' (M[5, `j'])
        }
        forvalues j = 2/5 {
            local topost `topost' (M[6, `j'])
        }
        forvalues j = 2/5 {
            local topost `topost' (M[4, `j'])
        }
        post handle `topost'
    }
    postclose handle
    Notes:
    1. The above code will leave the results you want in the file all_states_results.dta, which you can then -list- or use in other ways. You might want to -reshape- it long, depending on where you're taking this.
    2. Given that you want odds ratios, one category must be the reference category with OR = 1; the above code assumes it's the lowest value of riskfactors. If you want to set a different one as the reference category use the ib. prefix (-help fvvarlist-). So the file generated contains odds ratios, confidence limits, and p-values for 4 levels of variable riskfactors.
    3. With a different ordering of the variables in the results file, this could be simplified to a single -forvalues- loop with a single -local topost- command that includes all four of the matrix elements mentioned.
    4. xi: is obsolete and unnecessary in light of factor variable notation. See -help fvvarlist-.
    5. No need to specify non-missingness of riskfactors in the svy: prefix as regression commands always automatically omit cases with missing values on any model variable.
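
A minimal sketch of the -reshape- mentioned in note 1, assuming the results file has the layout posted above (state, or2-or5, ll2-ll5, ul2-ul5, p2-p5). The two rows of data here are fabricated random numbers, generated only so the example runs on its own, and the j() name riskfactors is an illustrative choice rather than anything required:

```stata
* Fabricated stand-in for all_states_results.dta (assumption: same variable layout)
clear
set seed 12345
set obs 2
gen str32 state = cond(_n == 1, "AL", "AK")
foreach stub in or ll ul p {
    forvalues j = 2/5 {
        gen float `stub'`j' = runiform()   // placeholder values, not real estimates
    }
}

* One row per state and riskfactors level, with columns or, ll, ul and p
reshape long or ll ul p, i(state) j(riskfactors)
list, sepby(state)
```

With the real file you would replace everything above the -reshape- line with -use all_states_results, clear-.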



    • #3
      Consider the following example, which uses -estimates store- and -estimates table- to store and display results (including the ORs, SEs, and p-values for each IV) and -outreg2- (from SSC) to export the results of each regression to Excel:


      Code:
      **create fake data for MWE
      sysuse census, clear
      rename state2 _state 
      expand  50 //50 respondents per state
      g riskfactors = int(1+runiform()*5)
      g fphlth = rbinomial(1, .33)
      **svyset
      bys _state: g id = _n
      svyset id , strata(_state)
      
      *regression output
      
      cap rm test.xls //files containing results
      cap rm test.txt //files containing results
      
      levelsof _state, local(states)
      foreach _state of local states {
      display _newline(2) "State=`_state'"
      svy, subpop(if _state=="`_state'" & riskfactors!=.): logistic fphlth i.riskfactors
      **for estimates table output 
      est sto `_state'
      loc a `"`a' `_state'"'
      outreg2 using "test.xls", eform cti(odds ratio)  excel  stats(coef se pval )
      }
      est table TX, eform star //example
      est table `a', eform  b(%7.4f) se(%7.4f) stats(N F) //all states
      
      **open the test.xls  file to see results
      Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX



      • #4
        Thank you! I was able to get Clyde's suggested code to work!

        I am wondering if one of the 2 approaches above could be applied to the proportion command? For example, we have run the following code using the mean command in the past for dichotomous variables, but I am not sure how to make this work to get clean output for each state that includes the proportion, se, lower and upper ci, and number of observations for each level of riskfactor (0-5).

        svy, subpop(if riskfactors!=.): proportion riskfactors, over(_state)
        matrix obs_overall = e(_N)'
        parmest, saving("riskfactors_overall.dta", replace) rename(parm _state estimate overall_riskfactors min95 overall_lower max95 overall_upper)

        I believe I need to specify "if riskfactors!=." in this case...

        In response to item #3 from Clyde's response, could you explain how to do this? I am not sure I follow. Thanks in advance.



        • #5
          Well, a slight modification of my code in #2 will serve here. In addition, to demystify my note 3 in post #2, I have rearranged the order of the variables in the postfile so that only a single loop is required.

          Code:
          capture postutil clear
          postfile handle str32 state float ///
              n1 prop1 se1 ll1 ul1 ///
              n2 prop2 se2 ll2 ul2 ///
              n3 prop3 se3 ll3 ul3 ///
              n4 prop4 se4 ll4 ul4 ///
              n5 prop5 se5 ll5 ul5 using all_states_riskfactors, replace
              
          levelsof _state, local(states)
          foreach s of local states {
              svy, subpop(if _state == "`s'"): proportion riskfactors
              local topost ("`s'")
              matrix M = r(table)
              matrix N = e(_N)
              forvalues j = 1/5 {
                local topost `topost' (N[1,`j']) (M[1, `j']) (M[2, `j']) ///
                      (M[5, `j']) (M[6, `j'])
              }
              post handle `topost'
          }
          postclose handle
          I do not myself use -parmest-, so it may be that the approach you have taken in #4 is fixable, but I wouldn't know.

          And, as before, I see no reason to specify -if riskfactors != .- since the -proportion- command automatically deals with missing values this way.



          • #6
            Thanks Clyde. That worked for the proportions. I am wondering how to add another loop to get proportions by state and by age groups, for example? I attempted adding this, but was not sure how to specify the outcome variables (i.e. n1 prop1 se1 ll1 ul1, etc).

            My other question is similar to the ones above. How to get the output for a two-way tab/proportion table for all 50 states and for all 50 states by subpopulation groups? I attempted the following with the associated error message.

            capture postutil clear
            postfile handle str32 state float ///
            prop1 se1 ll1 ul1 ///
            prop2 se2 ll2 ul2 ///
            prop3 se3 ll3 ul3 ///
            prop4 se4 ll4 ul4 using all_states_twowaytabs, replace
            tostring _state, replace
            levelsof _state, local(states)
            foreach s of local states {
            svy, subpop(if _state == "`s'"): proportion fphlth prev3plus
            local topost ("`s'")
            matrix M = r(table)
            forvalues j = 1/5 {
            local topost `topost' (N[1,`j']) (M[1, `j']) (M[2, `j']) ///
            (M[5, `j']) (M[6, `j'])
            }
            post handle `topost'
            }
            postclose handle



            • #7
              It doesn't really require any modification to the postfile handling except to add a variable for age group, and to post the age group. And then you need to embed the existing loop over states inside a loop over age groups. So it looks like this (the changes from the code in #5 are the age_group variable in the postfile declaration, the outer loop over age groups, the extra condition in subpop(), and the posted age group):

              Code:
              capture postutil clear
              postfile handle str32 state int age_group float ///
                  n1 prop1 se1 ll1 ul1 ///
                  n2 prop2 se2 ll2 ul2 ///
                  n3 prop3 se3 ll3 ul3 ///
                  n4 prop4 se4 ll4 ul4 ///
                  n5 prop5 se5 ll5 ul5 using all_states_age_groups_riskfactors, replace
                  
              levelsof _state, local(states)
              levelsof age_group, local(age_groups)
              foreach a of local age_groups {
                  foreach s of local states {
                      svy, subpop(if _state == "`s'" & age_group == `a'): proportion riskfactors
                      local topost ("`s'") (`a')
                      matrix M = r(table)
                      matrix N = e(_N)
                      forvalues j = 1/5 {
                        local topost `topost' (N[1,`j']) (M[1, `j']) (M[2, `j']) ///
                              (M[5, `j']) (M[6, `j'])
                      }
                      post handle `topost'
                  }
              }
              postclose handle
              Note: I assumed your age group variable is an integer and is called age_group.

              Going forward, when showing code, please place it in a code block (just as I do) so that it comes out easily readable. If you don't know how to set up a code block on this Forum, you can find the instructions in FAQ #12.



              • #8
                Hi Clyde,

                I am running into an error message using a new variable, "grid," with the code you had provided. I edited the code to reflect the 4 levels of the variable "grid"; the previous code was intended for the 5 levels of the variable "riskfactors". I also substituted "l" for "s" because I was getting an error message that I had an invalid subpop() option. After running the code below, I get the following error message:

                post: 22 expressions expected and only 17 found
                r(198);


                "grid" is a float variable with 4 integer values, 1 2 3 4. "_state" is a double variable with 51 integer values (includes District of Columbia).


                Code:
                capture postutil clear
                postfile handle double _state float grid ///
                    n1 prop1 se1 ll1 ul1  ///
                    n2 prop2 se2 ll2 ul2 ///
                    n3 prop3 se3 ll3 ul3 ///
                    n4 prop4 se4 ll4 ul4 using all_states_quadrants, replace
                
                levelsof _state, local(states)
                foreach l of local states {
                    svy, subpop(if _state == `l'): proportion grid
                    local topost (`l')
                    matrix M = r(table)
                    matrix N = e(_N)
                    forvalues j = 1/4 {
                      local topost `topost' (N[1,`j']) (M[1, `j']) (M[2, `j']) ///
                            (M[5, `j'])
                    }
                    post handle `topost'
                }
                postclose handle
                Thank you,
                Laura



                • #9
                  You still need the last term, (M[6, `j']), in this line:

                  Code:
                  local topost `topost' (N[1,`j']) (M[1, `j']) (M[2, `j']) ///
                             (M[5, `j']) (M[6, `j'])
                  (N[1,`j']) -- collects n1-n4
                  (M[1, `j']) -- collects prop1-prop4
                  (M[2, `j']) -- collects se1-se4
                  (M[5, `j']) -- collects lower bound of the CI (ll1-ll4)
                  (M[6, `j']) -- collects the upper bound of the CI (ul1-ul4)
                  Stata/MP 14.1 (64-bit x86-64)
                  Revision 19 May 2016
                  Win 8.1



                  • #10
                    Thank you Carole! Something is still not adding up because I get an error message stating that 22 expressions expected and only 21 found.



                    • #11
                      You'll notice that the macro `topost' in Clyde's example is first defined as: local topost ("`s'") (`a') and contains two terms: the string state and the numeric age group. Those are the first two variables declared in his postfile call.

                      In your postfile call, you declare state (as numeric) and grid. Your macro definition local topost (`l') does not include grid (which is fine), but grid is not included anywhere else, so -post- receives 21 expressions (1 for the state plus 4 x 5 estimates) where the postfile declares 22 variables. You either need to remove grid from your postfile call, or define and include it elsewhere.

                      Code:
                      capture postutil clear
                      * grid removed from the declaration so that 21 variables match the 21 posted expressions
                      postfile handle double _state float ///
                          n1 prop1 se1 ll1 ul1  ///
                          n2 prop2 se2 ll2 ul2 ///
                          n3 prop3 se3 ll3 ul3 ///
                          n4 prop4 se4 ll4 ul4 using all_states_quadrants, replace
                      
                      levelsof _state, local(states)
                      foreach l of local states {
                          svy, subpop(if _state == `l'): proportion grid
                          local topost (`l')
                          matrix M = r(table)
                          matrix N = e(_N)
                          forvalues j = 1/4 {
                            local topost `topost' (N[1,`j']) (M[1, `j']) (M[2, `j']) ///
                                  (M[5, `j']) (M[6, `j'])
                          }
                          post handle `topost'
                      }
                      postclose handle



                      • #12
                        Thank you Carole. Your explanation helped me understand what I am telling Stata to do, and I got it to work.



                        • #13
                          Dear Members,

                          I am trying to adapt the above code to work with the svy: mean command and to include the CV or RSE in the output. Below is what I have tried, and I get the "invalid syntax" r(198) error message. I don't think Stata likes how I am specifying the r(cv) matrix.
                          Code:
                          capture postutil clear
                          postfile handle double _state ///
                              n mean se rse ll ul using all_states_mean, replace
                              
                          levelsof _state, local(states)
                          foreach s of local states {
                              svy, subpop(if cms!=. & _state == `s'): mean cms
                              estat cv
                              local topost (`s')
                              matrix M = r(table)
                              matrix N = e(_N)
                              matrix O = r(cv)
                              forvalues j = 1 {
                                local topost `topost' (N[1,`j']) (O[1, `j']) (M[1, `j']) (M[2, `j']) ///
                                      (M[5, `j']) (M[6, `j'])
                              }
                              post handle `topost'
                          }
                          postclose handle
                          }



                          • #14
                            I see a few problems here.

                            First, if you run a post-estimation command following the -svy: mean- command, it will obliterate anything that -mean- itself left in r(). In particular, r(table) no longer exists at the point in the code where you try to store it in matrix M. So you have to move that command up to immediately after -svy: mean-.

                            Next, -forvalues j = 1- is illegal syntax. If you want to do this, say, 5 times, then you need -forvalues j = 1/5 {-. If you just want to do it once, you don't need a loop at all, although you could preserve the overall syntax with -forvalues j = 1/1 {- if that would be useful in some other way.
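
Putting the two points above together, here is a sketch of how the loop from #13 might look. The fake data (three numeric states, a normally distributed cms, one PSU per row) and the -svyset- design are assumptions made only so the example runs on its own. With real data you would keep your own -svyset- and variable names. The sketch also posts the values in the order the postfile declares them (n mean se rse ll ul) and drops the unmatched closing brace at the end of the posted code.

```stata
* Fake survey data, only so the sketch runs on its own (all of this is assumed)
clear
set seed 12345
set obs 300
gen double _state = ceil(_n/100)     // three fake states, 100 rows each
gen cms = rnormal(50, 10)
gen id = _n
svyset id, strata(_state)

capture postutil clear
postfile handle double _state n mean se rse ll ul using all_states_mean, replace

levelsof _state, local(states)
foreach s of local states {
    svy, subpop(if _state == `s'): mean cms
    matrix M = r(table)              // saved before -estat cv- replaces r()
    matrix N = e(_N)
    estat cv
    matrix O = r(cv)
    * Only one estimate per state, so no -forvalues- loop is needed
    post handle (`s') (N[1,1]) (M[1,1]) (M[2,1]) (O[1,1]) (M[5,1]) (M[6,1])
}
postclose handle

use all_states_mean, clear
list
```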

