
  • Subsetting variable lists with other variable lists, and then using the result in gen/egen?

    Dear STATA Forum,

    Perhaps this is a relatively simple request and I am just not thinking about it in the correct way (most of my background is in R). The situation is as follows: there are 50 different variables, only some of which will need to be summed. For example, variables 1-10 might need to be summed. Then, the rest of the 50 variables, 11-50, will need to be summed as well.

    I've already hard-coded it and obtained the correct values, but I was wondering if there might be a way to do this by manipulating varlists in STATA's programming language?

    Let's say we have the following dataset:

    Code:
    clear
    input x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
    1 2 3 4 5 6 7 8 9 10
    4 5 6 7 8 9 10 1 2 3
    end
    And let's say we define the first varlist as follows:
    Code:
    varlist temp "x1 x2 x3"
    The main question: Is there a way to store and then subtract from the default _all varlist the `temp' variables object? Apologies as that was a bit of a wordy statement.

    Proposed pseudo-code:

    Code:
    local temp "x1 x2 x3"
    local temp2 _all-`temp' ///NB the '-' is interpreted as defining a range, need a workaround here...
    egen newvar1 = rowsum(`temp')
    egen newvar2 = rowsum(`temp2')
    The idea is to automatically define the complementary temp2 object as all 50 variables minus those defined in the original temp object. Ideally both temp and temp2 would then be summed automatically, although I was running into some issues summing many variables (as indicated in earlier posts). The motivation is to reduce user input as much as possible and so avoid errors (e.g. missing one variable out of 50 in the defined varlist).

    Thanks in advance!

    Using STATA 15 MP.
    Last edited by Jeffery Sauer; 21 Oct 2018, 13:55. Reason: Clarity, adding detail.

  • #2
    The good news is that the following code, applied to your sample data, demonstrates the generation of the lists of variable names.
    Code:
    . local temp1 "x1 x2 x10"
    
    . unab  tempn : x1-x10
    
    . local temp2 : list tempn - temp1
    
    . macro list _tempn _temp1 _temp2
    _tempn:         x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
    _temp1:         x1 x2 x10
    _temp2:         x3 x4 x5 x6 x7 x8 x9
    
    . 
    . egen nv1 = rowtotal(`temp1')
    
    . list `temp1' nv1, noobs
    
      +---------------------+
      | x1   x2   x10   nv1 |
      |---------------------|
      |  1    2    10    13 |
      |  4    5     3    12 |
      +---------------------+
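The same pattern extends to the complementary list. The following is a minimal sketch, assuming the sample data from post #1; the names nv_rest and nv_all are hypothetical and chosen only for this illustration. Because every variable in tempn lands in exactly one of the two lists, the two partial sums should add up to the full row total, which the assert checks.

```stata
* Recreate the sample data from post #1 so the sketch is self-contained
clear
input x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1 2 3 4 5 6 7 8 9 10
4 5 6 7 8 9 10 1 2 3
end

* temp1 is chosen by hand; temp2 is everything in x1-x10 that is not in temp1
local temp1 "x1 x2 x10"
unab tempn : x1-x10
local temp2 : list tempn - temp1

* Row sums over each list (nv_rest and nv_all are hypothetical names)
egen nv1     = rowtotal(`temp1')
egen nv_rest = rowtotal(`temp2')
egen nv_all  = rowtotal(`tempn')

* Sanity check: the two partial sums must equal the sum over all ten variables
assert nv1 + nv_rest == nv_all

list nv1 nv_rest nv_all, noobs
```

Note that local macros only exist within the do-file or interactive session that defines them, so the whole block should be run in one go.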
    Last edited by William Lisowski; 21 Oct 2018, 15:15.



    • #3
      William Lisowski Thank you so much! This is absolutely fantastic news! Will definitely help with the reproducibility of the results.

      I'm not sure what I was doing wrong with my original pseudo code, would you mind explaining to me the following:

      'unab tempn : x1-x10' - what is this doing? I understand unab unabbreviates, but is it unabbreviating the default _all varlist?

      'local temp2 : list tempn - temp1' - does the semicolon and list change how STATA reads the following statement (i.e. tempn - temp1)?

      Thanks again for your response!



      • #4
        'unab tempn : x1-x10' - what is this doing? I understand unab unabbreviates, but is it unabbreviating the default _all varlist?
        Well, the easy answer to "what is this doing" is that it is expanding the varlist "x1-x10" - not the "_all" varlist - into the list of individual variables it represents.

        In this case _all and x1-x10 are the same, but that need not be the case. My assumption was that along with your variables that may or may not need summing, you also might have other variables that you would exclude from both temp1 and temp2. So rather than rely on _all which would include the unsummable variables, I start with a varlist that includes just the summable variables.
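To make the difference concrete, here is a minimal sketch under an assumption that is not in the original data: a hypothetical identifier variable id sits alongside x1-x5. Starting from _all would leave id in the complementary list, whereas starting from x1-x5 does not.

```stata
* Hypothetical data: an identifier that must not be summed, plus five summable variables
clear
input id x1 x2 x3 x4 x5
101 1 2 3 4 5
102 4 5 6 7 8
end

local temp1 "x1 x2"

* Starting from _all: the complement still contains id
unab everything : _all
local wrong : list everything - temp1

* Starting from just the summable variables: the complement is clean
unab summable : x1-x5
local temp2 : list summable - temp1

display "complement of _all:   `wrong'"
display "complement of x1-x5:  `temp2'"

* Sum only the variables that should be summed (rest is a hypothetical name)
egen rest = rowtotal(`temp2')
list id rest, noobs
```

The range x1-x5 follows the order of variables in the dataset, so this relies on the summable variables being stored contiguously.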

        'local temp2 : list tempn - temp1' - does the semicolon and list change how STATA reads the following statement (i.e. tempn - temp1)?
        The colon (not semicolon) introduces a macro extended function. In the output of help macro you will notice ":extended_fcn" in the earliest lines; click on one of them for a description of the extended macro functions available for your use. Scroll down to "Macro extended functions for manipulating lists" and you'll see the "list" extended function; click on "macrolist_directive" to learn about the list operations it supports.
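As an illustration of what the list extended function offers beyond subtraction, here is a short sketch using a few of the operators described in help macrolists. The macro names a, b, and sub are hypothetical. Note that inside the list function the macros are referred to by name, without the usual `' quotes.

```stata
* Two hypothetical lists of variable names
local a "x1 x2 x3 x4"
local b "x3 x4 x5"

* Set-style operations on the two lists
local diff   : list a - b
local both   : list a & b
local either : list a | b

* Number of elements in a
local n : list sizeof a

* 1 if every element of sub also appears in a, 0 otherwise
local sub "x1 x2"
local ok : list sub in a

display "a - b      : `diff'"
display "a & b      : `both'"
display "a | b      : `either'"
display "sizeof a   : `n'"
display "sub in a   : `ok'"
```

The in test is one way to guard against a mistyped name in a hand-written list before taking the complement.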



        • #5
          Apologies for the delayed response and grammatical mistake! This is very helpful and indeed has solved my problem. I've combined your answer with Nick Cox's input here (https://www.statalist.org/forums/for...wtotal-with-by) on summing by groups to achieve the intended result.

          Thanks again and huzzah!



          • #6
            For those interested, full example code demonstrating my objective:

            Code:
            clear
            input x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
            1 2 3 4 5 6 7 8 9 10
            4 5 6 7 8 9 10 1 2 3
            10 9 8 7 6 5 4 3 2 1
            end
            
                gen grp = ""
            
                replace grp = "a" in 1
                replace grp = "b" in 2
                replace grp = "a" in 3
            
            local temp1 "x1 x2 x10"
            
            unab  tempn : x1-x10
            
            local temp2 : list tempn - temp1
             
            egen nv1 = rowtotal(`temp1')
            
            list `temp1' nv1, noobs
            
            
              +---------------------+
              | x1   x2   x10   nv1 |
              |---------------------|
              |  1    2    10    13 |
              |  4    5     3    12 |
              | 10    9     1    20 |
              +---------------------+
            
            
            
            egen nv2 = total(nv1), by(grp)
            
            list nv1 nv2, noobs
            
            
              +-----------+
              | nv1   nv2 |
              |-----------|
              |  13    33 |
              |  12    12 |
              |  20    33 |
              +-----------+
