Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Add variables if variable name is partially the same

    Hello Statalisters,
    my name is Christian Rose and I have a problem using Stata.

    It is probably the easiest way to explain my problem using a an image.

    The image shows my data situation.
    I have done some tests for the C variables.
    Based on these tests I have generated new variables named "Test_..." where the part after the "_" depends on the test and which C variables were used for the test.
    The test variables only contain 1 and 0 and indicate if the p-value of the test was above or below 0.05.
    Now my problem:
    I want to add all test variables if they have the same ending. For example I would like to add up Test1_13_13 and Test4_13_13 because the both have the ending 13_13.
    The reason I want to do this is that the 13_13 indicate that the test was done for two companies each belongs to group 13 (indicated by the variable name).
    So I would like to know how many tests between variables of group 13 and 13 resulted in a p-value less than 0.05 and how many resulted in a p-value above 0.05.
    I would like to know the same for the groups 13 and 15 and in my real sample I have many more groups.

    I hope you understand my problem. Otherwise I am happy to clearify what I mean.
    Sorry for my bad english and my lack of knowledge.
    I have tried different things and I have searched a lot on the internet but without results. Maybe, because I am looking for the wrong commands as I am not familiar with Stata.

    Thank you for your help.

    Christian

    Click image for larger version

Name:	image_883.png
Views:	3
Size:	11.2 KB
ID:	276473
    Attached Files
    Last edited by Christian Rose; 01 Oct 2014, 05:59.

  • #2
    The image is not readable, at least by me on the machine I am using.

    I advise that you contact the forum administrators to change your identifier to "Christian Rose". Use the "Contact us" button in the bottom right corner.

    If you want a sum within observations across variables, use the rowtotal() function of egen. You can specify wildcards in the varlist e.g. *13_13

    Comment


    • #3
      Mr. Cox, thank you for your help.

      Does the image work, now? I have tried to upload it again. Otherwise I will draw a table by hand.

      I have contacted the admin to change my indentifier.

      rowtotal() helps a lot and brings me to closer to a solution.
      Is there another comment which counts the variables? So that I can divide the rowtotal by the number of variables with *13_13?
      Furthermore, is it possible to use the egen and rowtotal within a loop? I have a lot of different combinations and not only *13_13 . They all look like _XX_YY and are all at the end of the varname.

      Code:
      foreach i of varlist Test*{
      
      local x = substr("`i'",-5,.)
      egen `x' = rowtotal(*`x')
      
      }
      The code will not work. For the first 13_13 variable it will work and the variable will be generated but for the next 13_13 variable Stata will tell me that this variable already exists.
      Is it possible to replace the oldvariable within an egen function?
      Replacing would not cause any problems as the rowtotal is the same (but Stata does not know it )

      Thank you for your help Mr. Cox.

      Comment


      • #4
        You need to loop over groups of variables, not individual variables.

        Suppose your suffixes are 13_13, 14_14, 15_15.

        Code:
         
        foreach s in 13_13 14_14 15_15 { 
            egen _`s' = rowtotal(*_`s') 
        }

        Comment


        • #5
          I am not sure if I am getting this right.
          I do not exactly know how many and which suffixes I have.
          So, I think I first have to extract all suffixes to a list right?

          Thank you so much.

          Comment


          • #6
            I don't understand how you would not know how many and which suffixes you have, given that you yourself created these variables by running some tests. But, in any case, if you know how many tests there were, it is not hard to create the list of suffixes. For purposes of illustration, I will assume that you did exactly 4 tests (to reflect what you show in your example data) and that they are called Test1, Test2, Test3, and Test4.

            Code:
            unab suffixes: Test*
            
            forvalues j = 1/4 { // Replace 4 by actual number of tests
                local suffixes: subinstr local suffixes "Test`j'_" "", all
            }
            
            display "`suffxies'"
            should do the trick. If you don't know how many tests were done, or if they are not consecutively numbered it will be more complicated.

            Comment


            • #7
              Thank you very much.
              I know the number of tests as I have a count variable. Sorry for my poor answer.
              I have tried your code (if someone else needs so code: in the last line it has to be suffixes) and I get a list of suffixes.

              Afterwards I tried:
              Code:
               foreach s in `suffixes' {  
                 egen _`s' = rowtotal(*_`s')  
              }
              The problem now is that I have duplicates in the suffixes and if I want to name the variable like the suffixes I get an error when the first duplicate appears (because the variable is already defined) Is it possible to delete all duplicates in the "suffixes" -list? Or should I go with capture confirm sothat the egen/rowtotal function is only applied if the variable is not defined yet. I have another question: Your answers have helped me a lot to get rowtotal. Now, I would also like to now how many variables with the same suffixes are there. The previous command only gives me the number of variables which have a value of 1 (because it is the sum) but is it also possible to get the count and include the variables which values are 0? An inelegant solution would be to set all values of the test variables to 1 after I have calculated the rowtotal with zeros and than calculate the rowtotal again. But maybe there is an easier way to get what I am looking for? Thank you for your help
              Last edited by Christian Rose; 01 Oct 2014, 09:40.

              Comment


              • #8
                Please structure your question a little e.g. by numbering or bullet points or at least more paragraphing.

                If you have a local macro suffixes then

                Code:
                local suffixes : list uniq suffixes
                removes duplicate names. To find the documentation on this, go

                Code:
                 
                help macro 
                help extended fcn
                help macrolists
                although once on the first help page you should be to click your way through.

                To count variables, the egen function rownonmiss() should do what you want.

                Comment


                • #9
                  Sorry I didn't think of that duplicates problem before. That is easy to solve. Put the following code
                  Code:
                  local suffixes: list uniq suffixes
                  after you exit the -forvalues- loop that generates the suffixes macro.

                  Concerning the number of Test variables having a given suffix, you can capture those in macros with the following (after the code above has been run):
                  Code:
                  foreach s of local suffixes {
                      unab count_`s': Test*_`s'
                      local count_`s': word count `count_`s''
                  }
                  Finally, let me ask why you want these counts. If you need them in their own right, do the above. If you need them only to calculate mean from your totals, then you can do that without explicitly counting the variables with:

                  Code:
                  foreach s of local suffixes {
                      egen mean_`s' = rowmean(*_`s')
                  }

                  Comment

                  Working...
                  X