
  • Very simple example of foreach over one string variable with several values

    Dear forum,

    I'm struggling to learn how to use a foreach loop over a string variable with several values, using each value as an indicator variable in a regression. The examples I have found so far have been very intricate and not exactly what I was looking for.

    In the example below, imagine that I just want to use each value in "drug" as an indicator variable and display each of them.

    How do I do that?

    Code:
    sysuse cancer,clear
    
    tab drug,miss
    
    *two different approaches I have tried
    
    *method 1
    foreach x of varlist drug {
        display "`x'"
    }
    
    *method 2
    foreach y in drug {
        display "`y'"
    }
    Thank you so much for any advice and tips!
    ---
    I'm using Stata 17 for Windows.

  • #2
    Loops are not needed to use "a string variable with several values as indicator var for regression". The variable drug is not a string variable:

    Code:
    . sysuse cancer
    (Patient survival in drug trial)
    
    . des drug
    
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    --------------------------------------------------------------------------------------------------
    drug            byte    %8.0g      type       Drug type
    
    . summ drug
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
            drug |         48       1.875    .8410986          1          3
    
    . tab drug
    
      Drug type |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        Placebo |         20       41.67       41.67
          Other |         14       29.17       70.83
             NA |         14       29.17      100.00
    ------------+-----------------------------------
          Total |         48      100.00
    It is a byte variable with 3 levels.

    To use such a variable in a regression, expanded into dummies, use factor-variable notation:

    Code:
    . reg died i.drug
    
          Source |       SS           df       MS      Number of obs   =        48
    -------------+----------------------------------   F(2, 45)        =      9.14
           Model |  3.17202381         2   1.5860119   Prob > F        =    0.0005
        Residual |  7.80714286        45  .173492063   R-squared       =    0.2889
    -------------+----------------------------------   Adj R-squared   =    0.2573
           Total |  10.9791667        47  .233599291   Root MSE        =    .41652
    
    ------------------------------------------------------------------------------
            died | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            drug |
          Other  |  -.5214286   .1451444    -3.59   0.001    -.8137644   -.2290928
             NA  |  -.5214286   .1451444    -3.59   0.001    -.8137644   -.2290928
                 |
           _cons |        .95   .0931375    10.20   0.000     .7624113    1.137589
    ------------------------------------------------------------------------------



    • #3
      Dear Joro,
      you are absolutely right. I am sorry: the example I chose to mirror my own data was a bad one, since drug is of course a numeric variable!

      But your pointing that out also helped me realize how to solve the problem in my own dataset, where "drug" is a string variable.
      (Instead of trying foreach on my string variable, I convert it to numeric using encode and then apply forvalues!)


      So instead of trying something like this, as if drug were a string:
      Code:
      foreach x of varlist drug {
          regress died studytime if drug=="`x'"
      }
      I should instead use:
      Code:
      *encode own_string_var, gen(drug)
      forval i=1/3 {
          regress died studytime if drug==`i'
      }
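
      A self-contained sketch of the encode route, for anyone who wants to run it. Since the cancer data has no string variable, drug_str below is a hypothetical stand-in created with decode, and drug_num is likewise an invented name. Note that encode assigns codes in alphabetical order of the strings, so the codes need not match the original numeric values:

      ```stata
      sysuse cancer, clear

      * stand-in for a real string variable (assumption: drug_str is a made-up name)
      decode drug, generate(drug_str)

      * convert the string variable to a labelled numeric variable
      encode drug_str, generate(drug_num)

      * check which code was assigned to which string
      label list drug_num

      * one regression per level
      forvalues i = 1/3 {
          regress died studytime if drug_num == `i'
      }
      ```

      The range 1/3 is hard-coded here because the example has three levels; with real data the number of levels has to be checked first.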



      • #4
        Assuming the string variable has a relatively limited number of values, the approach in #3 is practical. However, sometimes a string variable is an identifier (e.g. a 10-digit medical record number) and the number of values is too large for use with -encode-. There is also the issue of looping over the values of a numeric variable whose values cannot be neatly summarized in a short list like 1/3. So the general approach is:

        Code:
        levelsof x, local(xlist)
        foreach x of local xlist {
            display `"`x'"'
        }
        This works for both numeric and string variables. (If x is a numeric variable, the outermost `" "' in the -display- command are not needed, though they do no harm if there. But the innermost ` ' around x must still be there.)
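
        Applied to the example in #1, this might look like the sketch below. The macro names and the string variable drug_str are assumptions made for illustration; drug_str is created with decode only so that there is a string variable to loop over.

        ```stata
        sysuse cancer, clear

        * numeric variable: the macro holds the distinct values 1 2 3
        levelsof drug, local(drug_levels)
        foreach d of local drug_levels {
            display "drug == `d'"
            regress died studytime if drug == `d'
        }

        * string variable: drug_str is a hypothetical stand-in built from the value labels
        decode drug, generate(drug_str)
        levelsof drug_str, local(drug_names)
        foreach name of local drug_names {
            * compound double quotes keep values with spaces or quotes intact
            display `"`name'"'
            regress died studytime if drug_str == `"`name'"'
        }
        ```

        The same loop body works for any number of levels, because the list comes from the data rather than from a hard-coded range.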



        • #5
          That is very neat - thank you so much to both Joro and Clyde for your ideas and solutions!
