  • Run different commands dependent on numeric or factor variable

    I am trying to create a programme that will do "something" on numeric variables and "something else" on factor variables. Ideally, the programme would check whether the variable was numeric or a factor. If it is numeric, then it would summarise the variable. If it was a factor variable, then it would create dummy variables for each of its levels and then perform summarise. But this requires the programme to identify whether the variable is numeric or factor (from the user specifying the "i." prefix before the variable name).

    For the factor variables, the programme would ideally create dummy variables and 'summarise' would be performed on each of the dummy variables.


    I know there is a way to use the "s(fvops)" macro but I am not sure it is the best approach in this case?

    Code:
    * Load the data
    use "http://www.stata-press.com/data/r14/cattaneo2.dta", clear
    
    
    * Define the programme
    capture program drop myprogramme
    
    program define myprogramme
        syntax varlist(numeric fv)
        local fvops = "`s(fvops)'" == "true"
        if `fvops' {
            tab variable, gen(_variable)
            local k = [number of levels of factor variable]
            forval in 1/k {
            summarize
            }
        }
        if fvops = "`s(fvops)'" == "false" {
            sum `varlist'
        }
    end   
    
    
    * Run the programme
    myprogramme mbsmoke mage i.prenatal

    In the above, "mbsmoke" and "mage" would be numeric variables and so should be summarised as is. But as "i.prenatal" is a factor variable the programme should create dummy variables and then perform summarise on each of the dummy variables.

    Any help is much appreciated!

  • #2
    I know there is a way to use the "s(fvops)" macro but I am not sure it is the best approach in this case?
    I think it makes a great deal of sense to require your user to indicate whether a variable is a factor variable and test for that in your program. Notice that Stata doesn't have a "factor" type. Factor variables are stored as integers. You might think "oh, I should just automatically detect whether or not something is a factor," but as far as Stata is concerned, there is no difference between a numeric integer and a factor integer. So instead you have your user specify whether or not something is a factor. Makes perfect sense.

    Code:
     if fvops = "`s(fvops)'" == "false"
    Are you assigning a value to a variable called fvops in this if statement? That seems really bad. Seems like you should do this:

    Code:
    if !`fvops' {
    
    }
    Or better yet:

    Code:
    else {
    
    }
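
    A minimal sketch of the if/else structure suggested above. The program name fvcheck is hypothetical. The sketch assumes, as the original post's use of it implies, that syntax leaves "true" in s(fvops) when the varlist contains factor-variable operators. If so, the flag describes the varlist as a whole, not any single variable in it.

    ```stata
    * Sketch only: fvcheck is a hypothetical name used for illustration
    capture program drop fvcheck

    program define fvcheck
        syntax varlist(numeric fv)
        * Evaluate the flag once, then branch with if/else
        local fvops = "`s(fvops)'" == "true"
        if `fvops' {
            display "factor-variable operators found in: `varlist'"
        }
        else {
            display "no factor-variable operators in: `varlist'"
        }
    end

    fvcheck mbsmoke mage
    fvcheck mbsmoke mage i.prenatal
    ```

    Testing the flag once and using else avoids repeating (and possibly mistyping) the condition in a second if statement.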


    • #3
      If what you want is indeed
      Ideally, the programme would check whether the variable was numeric or a factor. If it is numeric, then it would summarise the variable. If it was a factor variable, then it would create dummy variables for each of its levels and then perform summarise.
      then you are working too hard.
      Code:
      . summarize  mbsmoke ibn.prenatal mage
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
           mbsmoke |      4,642    .1861267    .3892508          0          1
                   |
          prenatal |
                0  |      4,642    .0150797    .1218832          0          1
                1  |      4,642    .8013787    .3990052          0          1
                2  |      4,642    .1501508    .3572577          0          1
                3  |      4,642    .0333908    .1796741          0          1
      -------------+---------------------------------------------------------
                   |
              mage |      4,642    26.50452    5.619026         13         45
      But if there might be other things you want to do besides summarize, then perhaps this code will point in a useful direction. The key is using the fvexpand command on individual variables. Your problems began with failing to loop over the separate variables in the variable list presented to the program.
      Code:
      * Load the data
      use "http://www.stata-press.com/data/r14/cattaneo2.dta", clear
      
      * Define the programme
      capture program drop myprogramme
      
      program define myprogramme
          syntax varlist(numeric fv)
          foreach var in `varlist' {
              fvexpand `var'
              if "`r(fvops)'"=="true" {
                  foreach fvar in `r(varlist)' {
                      summarize `fvar'
                  }
              }
              else {
                  summarize `var'
              }    
          }
      end   
      
      * Run the programme
      myprogramme mbsmoke ibn.prenatal mage
      Code:
      . * Run the programme
      . myprogramme mbsmoke ibn.prenatal mage
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
           mbsmoke |      4,642    .1861267    .3892508          0          1
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
        0.prenatal |      4,642    .0150797    .1218832          0          1
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
        1.prenatal |      4,642    .8013787    .3990052          0          1
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
        2.prenatal |      4,642    .1501508    .3572577          0          1
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
        3.prenatal |      4,642    .0333908    .1796741          0          1
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
              mage |      4,642    26.50452    5.619026         13         45
      
      .


      • #4
        Unlike in some other software, being a factor variable in Stata is a status conferred on the fly and ephemerally by factor variable notation in the context of a model fit command.

        If you want to devise your own criterion or criteria for different kinds of variable, that is a challenge. For example, occupation or employer is categorical by just about any standard but could take on hundreds or thousands of distinct values. Conversely, number of spouses is a quantitative variable by just about any standard but in very many cultures the possible values are just 0 or 1.

        I broadly agree with Daniel Schaefer that getting your user to say which variable is which kind is an easier path to follow.

        Statawise, a practical criterion for a factor variable is that someone bothered to define value labels!
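
        The value-label criterion could be checked along these lines. This is only a sketch: it assumes the example data are already in memory, and the thread does not say which of these variables actually carry value labels.

        ```stata
        * Sketch only: report whether each variable has a value label attached
        foreach v of varlist mbsmoke mage prenatal {
            * The extended macro function returns the label name, or nothing
            local lbl : value label `v'
            if "`lbl'" != "" {
                display "`v' has value label `lbl'"
            }
            else {
                display "`v' has no value label"
            }
        }
        ```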

        Other way round, s(fvops) is new to me but I doubt it can be examined without running something else first.

        It's an old programming maxim that a good program does just one thing well, so I would think in terms of different commands for different purposes.


        • #5
          Many thanks Daniel, William, and Nick.

          Daniel (and Nick), I certainly do want the user to specify whether they want the variable considered as a factor variable (i.e., using the "i." prefix). I'm glad I was on the right track there!

          William, many thanks for posting your solution! It is just what I needed! Is there a way I can specify that the first two variables of the varlist should be treated as the outcome and exposure variables? For example, say I wanted to summarise (along with other analysis) the remaining variables in the varlist by the second variable (i.e., the exposure), with the first variable being the outcome.

          Would "tokenize" or "gettoken" be useful in this situation?


          Code:
          * Load the data
          use "http://www.stata-press.com/data/r14/cattaneo2.dta", clear

          * Define the programme
          capture program drop myprogramme

          program define myprogramme
              syntax varlist(numeric fv)

              local macro for the first variable (i.e., the outcome)
              local macro for the second variable (i.e., the exposure)
              local macro specifying the varlist is the rest of the variables (but not including the outcome and exposure)

              foreach var in `varlist' {
                  fvexpand `var'
                  if "`r(fvops)'"=="true" {
                      foreach fvar in `r(varlist)' {
                          summarize `fvar' if [second variable]==0
                          summarize `fvar' if [second variable]==1
                      }
                  }
                  else {
                      summarize `var' if [second variable]==0
                      summarize `var' if [second variable]==1
                  }
              }
          end

          * Run the programme
          myprogramme mbsmoke ibn.prenatal mage

          Once again, any help is much appreciated
          Last edited by Matthew Smith Stata; 04 Nov 2022, 12:21.


          • #6
            I'm sorry, but I cannot figure out what you seek from what you describe abstractly and your pseudo-code.

            Instead, provide a sample command to run myprogramme with a suitable first and second variable, and with, say, a third and fourth variable, all selected from your example data. And then list all the summarize commands you would expect myprogramme to run, using those variable names.

            That said, another approach would be to write myprogramme to accept the following sort of syntax
            Code:
            myprogramme third_variable fourth_variable, outcome(first_variable) exposure(second_variable)
            which has the advantage of greater clarity, and is perhaps easier to code.
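
            A rough sketch of how that option-based syntax might be declared. The program name myprogramme2 is hypothetical, and using bweight as the outcome is an assumption about the example data rather than something specified in this thread.

            ```stata
            * Sketch only: myprogramme2 is a hypothetical name
            capture program drop myprogramme2

            program define myprogramme2
                * Options listed outside square brackets are required
                syntax varlist(numeric fv), OUTcome(varname numeric) EXPosure(varname numeric)
                display "outcome:    `outcome'"
                display "exposure:   `exposure'"
                display "covariates: `varlist'"
            end

            * bweight as the outcome is an assumed choice
            myprogramme2 mage ibn.prenatal, outcome(bweight) exposure(mbsmoke)
            ```

            With this layout, syntax places each piece in its own local macro, so the varlist does not have to be split by position.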


            • #7
              Hi William,

              This program will be part of a larger program that has already been defined and the way the user specifies their variables is as "myprogramme varlist" (i.e., without the options you have specified for the outcome and exposure).

              My apologies for the lack of clarity. Basically, I have a varlist but I don't want to run the "summarize" command over the entire varlist, because the first two variables in the varlist are the outcome and exposure variables. For example, say I wanted to check the balance of the covariates (i.e., third variable, fourth variable, etc.) between levels of the exposure variable. I would need to run the "summarize" command on the third variable (for example) for each level of the exposure variable (i.e., the second variable in the varlist). Is there a way of defining a new macro that contains the original varlist but without the outcome and exposure variables? And also a way of specifying that the first "word" in varlist is the outcome variable, and the second "word" is the exposure variable?

              The bits in red are what I suspect I will need to create code to allow for the above.

              Code:
                 
              * Load the data
              use "http://www.stata-press.com/data/r14/cattaneo2.dta", clear

              * Define the programme
              capture program drop myprogramme

              program define myprogramme
                  syntax varlist(numeric fv)

                  local macro for the first variable (i.e., the outcome)
                  local macro for the second variable (i.e., the exposure)
                  local macro specifying the varlist is the rest of the variables (but not including the outcome and exposure)

                  foreach var in `varlist' {
                      fvexpand `var'
                      if "`r(fvops)'"=="true" {
                          foreach fvar in `r(varlist)' {
                              summarize `fvar' if [second variable]==0
                              summarize `fvar' if [second variable]==1
                          }
                      }
                      else {
                          summarize `var' if [second variable]==0
                          summarize `var' if [second variable]==1
                      }
                  }
              end

              * Run the programme
              myprogramme mbsmoke ibn.prenatal mage

              I apologise if this is not clear again!


              • #8
                I understand that you are locked into the current syntax.

                The explanation you have given is essentially a repeat of the explanation in post #5 and is not what I requested you provide.

                To write and test suitable code I need to know precisely what summarize commands would be run by the command
                Code:
                myprogramme first_variable second_variable third_variable fourth_variable
                for four variables selected from the example data. Select as your first_variable one that is similar to your outcome, a second variable that is similar to your exposure, and third and fourth variables that are similar to those you want to summarize - one continuous and one categorical.

                Let me add this if you want to proceed on your own: perhaps what you want is
                Code:
                tokenize `varlist'
                local outcome `1'
                macro shift
                local exposure `1'
                macro shift
                local varlist `*'
                But since this code is trivially adapted from the example in the output of help tokenize I would expect you to have experimented before asking if tokenize would be useful. And similarly with the example from help gettoken.
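
                For readers following along, here is one way the tokenize snippet might be combined with the fvexpand loop from post #3. It is a sketch under stated assumptions, not a tested solution to the original poster's actual problem: bweight as the outcome is an assumed choice, the exposure is assumed to be coded 0/1 (as mbsmoke is in the output above), and the local macro name covars is introduced here for illustration.

                ```stata
                * Sketch only: assumes outcome first, 0/1 exposure second, covariates after
                capture program drop myprogramme

                program define myprogramme
                    syntax varlist(numeric fv min=3)

                    * Split off the first two words, as in the snippet above
                    tokenize `varlist'
                    local outcome `1'
                    macro shift
                    local exposure `1'
                    macro shift
                    local covars `*'

                    foreach var of local covars {
                        * Expand factor-variable terms; a plain variable expands to itself
                        fvexpand `var'
                        foreach term in `r(varlist)' {
                            summarize `term' if `exposure' == 0
                            summarize `term' if `exposure' == 1
                        }
                    }
                end

                use "http://www.stata-press.com/data/r14/cattaneo2.dta", clear
                * bweight as the outcome is an assumed choice
                myprogramme bweight mbsmoke mage ibn.prenatal
                ```

                Because fvexpand returns the variable itself in r(varlist) when no factor-variable operator is present, the sketch loops over r(varlist) in both cases instead of branching on r(fvops). The outcome macro is stored but not used here, since the thread does not say what analysis it feeds.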


                • #9
                  Hi William,

                  Many thanks for your help, that's just what I needed!

                  Thank you for your time
