  • Shortcut for generating new variables based on many existing

    I have groups of variables contactoutcome_con_<y>_<x> and contactmethod_con_<y>_<x>, where x is a program from 1-8 and y indexes the contact attempt (from 1-30) associated with that program. For example, contactoutcome_con_30_8 is the outcome of the 30th contact attempt made by the 8th program, and contactmethod_con_30_8 is the contact method that was used for that attempt. I need to generate a new variable "contact_success" based on the values of these two variables. For example, contact_success should = 0 if contactoutcome_con_<y>_<x>==9 AND contactmethod_con_<y>_<x>==11. What is the shortcut for doing this for all 240 matching contactoutcome_con_<y>_<x> and contactmethod_con_<y>_<x> pairs? Thanks in advance!
    Last edited by Leeya Correll; 30 Jun 2022, 17:48.

  • #2
    So you want another series of variables, contact_success_<y>_<x>, corresponding to the existing 240 pairs of outcome and method variables? You aren't happy with 480 variables in your data set? You want to add another 240? OK:

    Code:
    forvalues y = 1/30 {
        forvalues x = 1/8 {
            * 1 if both conditions hold for this attempt/program pair, 0 otherwise
            gen contact_success_`y'_`x' ///
            = contactoutcome_con_`y'_`x' == 9 & contactmethod_con_`y'_`x' == 11
        }
    }
    Note: Code untested due to absence of example data, so beware of typos or other problems. But this is the gist of it.

    That said, you are starting with a data set that is unwieldy and turning it into one that is nearly impossible to work with. Let me suggest a different way to work with this data.

    Code:
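    gen long obs_no = _n    // unique row identifier for reshape's i()
    * yx holds the "<y>_<x>" suffix of each variable name as a string
    reshape long contactoutcome_con_ contactmethod_con_, i(obs_no) j(yx) string
    * split the suffix into numeric attempt (y) and program (x)
    split yx, parse("_") gen(xy) destring
    rename xy1 attempt
    rename xy2 program
    gen byte success = contactoutcome_con_ == 9 & contactmethod_con_ == 11
    drop yx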
    This layout of the data will be much easier to work with for almost anything you would want to do with it in Stata. It is also much easier to manage because it contains only 6 variables instead of 700+. If your data set is very large, the initial -reshape- may be slow. You could speed that up by using -tolong- or -greshape-, community-contributed commands available from SSC, instead. (To get -greshape- you have to install the -gtools- package from SSC.) But nearly everything will be much easier to do in Stata from this long layout.
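    To illustrate the long layout on something small, here is a minimal sketch with invented toy data: 2 observations, attempts 1-2 and programs 1-2 only. The data values and the variable n_success are made up for illustration and are not from the original poster's data set.

```stata
* Toy data (invented): 2 observations, attempts y = 1/2, programs x = 1/2
clear
input contactoutcome_con_1_1 contactmethod_con_1_1 contactoutcome_con_1_2 contactmethod_con_1_2 contactoutcome_con_2_1 contactmethod_con_2_1 contactoutcome_con_2_2 contactmethod_con_2_2
9 11 1 11 9  3 2  5
4  2 9 11 9 11 9 11
end

* Same steps as above: one row per observation x attempt x program
gen long obs_no = _n
reshape long contactoutcome_con_ contactmethod_con_, i(obs_no) j(yx) string
split yx, parse("_") gen(xy) destring
rename (xy1 xy2) (attempt program)
drop yx
gen byte success = contactoutcome_con_ == 9 & contactmethod_con_ == 11

* Summaries now take one line each instead of a loop over variable pairs
tabulate program success
bysort obs_no program: egen byte n_success = total(success)
list, sepby(obs_no)
```

    With the real data the same commands apply unchanged; only the -input- block is a stand-in.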

    Added: In the future, when asking for code, show data examples, and please use the -dataex- command to do so.* If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    *-dataex- will not be able to process 480 variables, but the problem would be adequately illustrated with just a small number of these variables, say restricting y to 3 and x to 4 or something like that.
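
    One possible way to build such a restricted example, assuming the full data set is already in memory and the variables follow the naming described in #1. The local macro name vars and the choice of the first 10 observations are arbitrary.

```stata
* Collect the outcome/method pairs for attempts 1-3 and programs 1-4 only
local vars
forvalues y = 1/3 {
    forvalues x = 1/4 {
        local vars `vars' contactoutcome_con_`y'_`x' contactmethod_con_`y'_`x'
    }
}

* Show those 24 variables for the first 10 observations; paste the output into your post
dataex `vars' in 1/10
```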

    Last edited by Clyde Schechter; 30 Jun 2022, 18:00.
