  • Removing factor variable base values from fvexpand list

    Hi everyone,

    Basically, I am building a program that does not need to calculate values for the omitted categories of a factor variable.

    webuse sysdsn1
    fvexpand i.insure ib2.site
    di r(varlist)

    Yields: "1b.insure 2.insure 3.insure 1.site 2b.site 3.site"

    What is the most efficient way to strip out "1b.insure" and "2b.site" from this list, or more generally from any varlist?



  • #2
    I needed something similar, so I wrote a utility to do it called fvstrip.ado. I put it on GitHub because I figured people would typically use it in their own code (e.g., inside their own ado files) instead of as a standalone program. You can find it here:

    https://github.com/markeschaffer/stata-utilities

    There's a help file as well with some examples.


    • #3
      Update: I've answered my own question!

      For the curious, you can drop the omitted base level by calling _ms_extract_varlist with the "noomitted" option after estimation.

      webuse sysdsn1
      regress insure i.site
      _ms_extract_varlist i.site, noomitted
      return list


      • #4
        Mark, that script is fantastic, thank you so much!


        • #5
          Matt - I'm curious ... for your purposes, what advantages does the fvstrip.ado script have over _ms_extract_varlist?


          • #6
            Fvstrip is unequivocally better. I actually made that post without having seen yours yet.

            _ms_extract_varlist worked great for what I needed in this case: in the example above it stripped out the base category, leaving me with 2.site 3.site. That's all I really wanted, as I was writing a program that did something like the following (the real code is more complex, but you'll get the idea):

            [following a three category mlogit model]

            foreach var in `r(varlist)' {
                * scalar names cannot contain ".", so turn a term such as 2.site into a legal name
                local nm = strtoname("`var'")

                * linear predictor for each non-base equation
                local k1 = [equation1]_b[`var'] + [equation1]_b[_cons]
                local k2 = [equation2]_b[`var'] + [equation2]_b[_cons]

                * common denominator; exp(0) is the base outcome
                local den = exp(`k1') + exp(`k2') + exp(0)

                return scalar `nm'eq1 = exp(`k1')/`den'
                return scalar `nm'eq2 = exp(`k2')/`den'
                return scalar `nm'eq3 = exp(0)/`den'
            }

            After that, I would construct bootstrap SEs for each of the scalars that are returned.


            Stripping out those base categories just saved me the trouble of creating estimates (and bootstrapping and ultimately reporting those estimates) for the base categories, which always equal the intercept and need not be calculated again and again by the loop.
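For reference, the quantities the loop computes can be written out as follows. This is only a sketch of the usual multinomial logit probabilities, assuming outcome 3 is the base outcome (which the exp(0) term suggests); the symbols k_j and b_j are notation introduced here, not taken from the program.

```latex
\begin{align*}
k_j &= b_j[\text{var}] + b_j[\text{\_cons}], \qquad j = 1, 2 \\
P(y = j \mid \text{var}) &= \frac{\exp(k_j)}{\exp(k_1) + \exp(k_2) + \exp(0)}, \qquad j = 1, 2 \\
P(y = 3 \mid \text{var}) &= \frac{\exp(0)}{\exp(k_1) + \exp(k_2) + \exp(0)}
\end{align*}
```

Note that the whole sum sits in the denominator, so the three probabilities add up to 1.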


            • #7
              Interesting ... after I saw your post, I went back to fvstrip and tried an alternative coding that basically uses your strategy - a call to _ms_extract_varlist to get the varlist from e(b) - but inside a little eclass program that posts e(b) using the varlist as the matrix stripe so that _ms_extract_varlist can find it. An rclass program calls this little eclass program so that the resulting varlist can be stored as an r(.) result (like the version of fvstrip now on GitHub). Would that strategy have worked for you as well? I think so but I'm not sure.


              • #8
                I think it might have. Fvstrip.ado is a bit more flexible than _ms_extract_varlist since it allows for more options. What you're describing sort of sounds like a wrapper that would extend the functions of _ms_extract_varlist a little bit while also being more efficient than the original fvstrip.


                • #9
                  I guess the difference is that as written, fvstrip has a varlist orientation, whereas _ms_extract_varlist is meant for matrix stripes. You can give _ms_extract_varlist a posted e(b) and it will pull out the varlist with the fv notation stripped out. If there are multiple equations, it will pull out the varlist of the equation you specify.

                  One limitation of _ms_extract_varlist is in the handling of the variable order: the order of variables is as it appears in the matrix stripe rather than as specified by the user. Using the example in the help file,

                  Code:
                  . regress mpg i.foreign i.rep78
                  
                  <snip>
                  
                  . _ms_extract_varlist 3.rep78 2.rep78
                  
                  . di r(varlist)
                  2.rep78 3.rep78

                  ...which could be a problem in some applications, I suppose.

                  fvstrip, on the other hand, obeys the varlist order as specified, but if the expand option is included, Stata reorders in the way you'd expect:

                  Code:
                  . fvstrip 2.rep78 1.rep78, noi
                  2.rep78 1.rep78
                  
                  . fvstrip 2.rep78 1.rep78, expand noi
                  1.rep78 2.rep78


                  • #10
                    The biggest advantage of fvstrip is that you don't need to generate the varlist matrix in advance with that regress command; I forgot about that part. It's much more efficient to skip the regress command if possible. Good point on the variable ordering as well; I didn't notice that when I was tinkering with _ms_extract_varlist. In my case I actually do want variables to be kept in a very specific order, so it would have caused problems with my syntax eventually.


                    • #11
                      I know you surely have a practical solution by now, so just out of curiosity:

                      If you really just want to get rid of specific words in a list, isn't there a much simpler (and maybe more efficient) approach?

                      Why not just loop through the list and take what you need to build a new list (leaving out what you don't want)?

                      Code:
                      webuse sysdsn1, clear
                      fvexpand i.insure ib2.site
                      
                      foreach wrd in `r(varlist)' {
                          if strpos("`wrd'", "b.") == 0 local result `result' `wrd'
                      }
                      di "`result'"

                      Best,
                      Max


                      • #12
                        That's basically what fvstrip does. But it's not just "b" that needs stripping out; there's also "n" and "o". You might or might not want the returned list to include the omitteds ("o" prefix). There are also interactions to worry about.
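A minimal sketch of how the loop from #11 might be extended along those lines. This is not how fvstrip itself is written. It assumes that a term should be dropped whenever any "#"-separated component carries a "b" or "o" flag in its operator (the part before the dot), and that "bn" levels (from ibn.) should be kept because they are not omitted. The regular expression and the local names `term`, `parts`, `part` and `drop` are illustrative choices, and time-series operators are not handled.

```stata
webuse sysdsn1, clear
fvexpand i.insure ib2.site

local result
foreach term in `r(varlist)' {
    // split an interaction such as 1b.a#2.b into its components
    local parts : subinstr local term "#" " ", all

    // flag the term if any component is a base (b) or omitted (o) level;
    // the optional c covers omitted continuous components such as co.x
    local drop 0
    foreach part of local parts {
        if regexm("`part'", "^[0-9]*c?[bo]+\.") local drop 1
    }

    if `drop' == 0 local result `result' `term'
}
di "`result'"
```

Given the fvexpand output quoted in the first post, this should leave 2.insure 3.insure 1.site 3.site. Whether "o" terms should be dropped or kept depends on the application, so that part of the pattern is the first thing to adjust.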
