Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging 2 variables

    I have 2 variables. Both are values between 0 and 98. I want to create 1 new variable that is the total of the 2 separate variables. The variables are survey responses asking about the number of drinks consumed daily. All I want is a total variable that incorporates all of the responses from the 2 variables. They don't need to be matched. So if var1 has the values (0, 1, 3, 1, 2, 4, 5, 1, 2, 7) and var2 has (2, 2, 4, 5, 1, 2, 1, 7, 9, 11, 2, 3) I want a new variable that is (0, 1, 3, 1, 2, 4, 5, 1, 2, 7, 2, 2, 4, 5, 1, 2, 1, 7, 9, 11, 2, 3) so that I can find the mean value of this new set.

  • #2
    This question was originally asked at https://www.statalist.org/forums/for...ining-datasets. It was not clearly explained there. An answer was given that assumed the two variables came from different data sets. It is still unclear whether these two variables are in the same data set or separate ones. I'll give an answer here that assumes they are in the same data set.

    There are two straightforward way to do this. Assuming the variables really are named var1 and var2

    Code:
    gen obs_no = _n
    reshape long var, i(obs_no) j(source)
    The resulting data set will have the values of var1 and var2 combined into a single variable, var. The variables obs_no and source will identify where they originally came from (which may or may not be useful depending on where you're going with this.)

    The other way to do this is:

    Code:
    stack var1 var2, into(combined)
    The resulting data set will have a new variable combined, which is what was asked for, and also a variable _stack indicating, for each observation, which variable the observation originally came from (which, again, you may or may not need.)

    Comment


    • #3
      OK. I'll try to be clearer. The 2 variables I wish to combine are within the same dataset and are 2 variables out of 1000. All the variables in the dataset relate to different questions in a survey. I want to combine the data for 2 of the variables that are similar questions. When I attempted the stack code above it threatened to override all the data in my dataset. I've looked at merge, joinby, append but none of them seem to do what I think should be a straightforward command. I've tried gen (newvar) = (var1) + (var2) but this does not produce a combined set of variables.

      Comment


      • #4
        Mark:
        perhaps a difefrent -egen- function can do what you're after:
        Code:
        . use "C:\Program Files (x86)\Stata14\ado\base\a\auto.dta", clear
        (1978 Automobile Data)
        
        . egen mpg_trunk=rowtotal(mpg trunk)
        
        . list mpg trunk mpg_trunk in 1
        
             +------------------------+
             | mpg   trunk   mpg_tr~k |
             |------------------------|
          1. |  22      11         33 |
             +------------------------+
        Kind regards,
        Carlo
        (Stata 18.0 SE)

        Comment


        • #5
          Thanks Carlo, but I don't want the variables added together, I want a new variable that contains the all the original variables from one group and all the ones from another group so that I can determine the mean (for example) of this new bigger group of data.

          Comment


          • #6
            To be clearer I want to form a new group of variables that combines the variables from 2 different groups but does not alter the individual variables. Eg if one group called var1 has the following variables 1, 2, 0, 9 and var2 has 1, 1, 2, 7 I want to form a new group called vartotal that would be 1,2,0,9,1,1,2,7 so that I can determine the mean of vartotal (in this case 2.875)

            Comment


            • #7
              Mark:
              you may want to try something along the following toy-example, then:
              Code:
              . use "C:\Program Files (x86)\Stata14\ado\base\a\auto.dta", clear
              (1978 Automobile Data)
              
              . g new_var=mpg if foreign==0
              (22 missing values generated)
              
              . replace new_var=trunk if foreign==1
              (22 real changes made)
              
              . su new_var
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                   new_var |         74    17.32432    5.805307          5         34
              Kind regards,
              Carlo
              (Stata 18.0 SE)

              Comment


              • #8
                Thanks Carlo, when I try that I don't get a combination of the number of observations. Var1 has around 11000 observations and var2 has about the same. I want to generate a new variable that combines all of the observations for var1 with those of var2 to achieve a newvar with 22000 observations.

                Comment


                • #9
                  Mark:
                  is your dataset in -wide- or -long- format?
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)

                  Comment


                  • #10
                    If you want two variables with 11000 observations to be put into one with 22000 observations, then as Clyde already indicated the solution is either stack or reshape.

                    So far the thread -- to experienced Stata users such as Clyde and Carlo -- is executing a perfect circle: you're (seemingly) asking for a different data layout, being advised how to get it, but then saying that it's not what you want, yet asking all over again.

                    We can break the circle if all you want is the mean for the two variables combined, as the mean of all is just the mean of the two means. If there are missing values in either variable then the calculation is more complicated and indeed you need to decide more precisely what you want to do about missings.

                    Here is a silly example.

                    Code:
                    clear
                    input y1 y2
                    1  4
                    2  5
                    3  6
                    end
                    su y1
                    scalar mean1 = r(mean)
                    su y2
                    scalar mean2 = r(mean)
                    di (mean1 + mean2)/2
                    You can run this code -- and will see 3.5 displayed.
                    Last edited by Nick Cox; 26 Sep 2017, 03:15.

                    Comment


                    • #11
                      There is another way to do it. It's not good style and if all you want is the joint mean it's overkill. This example takes 11000 literally. You need the exact number and twice that.

                      Code:
                      preserve
                      set obs 22000
                      gen var3 = cond(_n <= 11000, var1, var2[_n - 11000])
                      su var3
                      restore
                      So var3[11001] is var2[1], var3[11002] is var2[2] and so on.

                      This is in effect a temporary stack that leaves your dataset untouched.
                      Last edited by Nick Cox; 26 Sep 2017, 03:35.

                      Comment


                      • #12
                        I'm going back to basics here, starting with the var1 and var2 as given in post #1.
                        Code:
                        clear
                        input float(id var1 var2)
                        100 0  2
                        101 1  2
                        102 3  4
                        103 1  5
                        104 2  1
                        105 4  2
                        106 5  1
                        107 .  7
                        108 1  9
                        109 2  .
                        110 7 11
                        111 .  2
                        112 .  3
                        end
                        local old_N = _N
                        local new_N = _N+_N
                        set obs `new_N'
                        generate combined = var1
                        replace combined = var2[_n-`old_N'] if _n>`old_N'
                        list, clean
                        summarize combined
                        Code:
                        . list, clean
                        
                                id   var1   var2   combined  
                          1.   100      0      2          0  
                          2.   101      1      2          1  
                          3.   102      3      4          3  
                          4.   103      1      5          1  
                          5.   104      2      1          2  
                          6.   105      4      2          4  
                          7.   106      5      1          5  
                          8.   107      .      7          .  
                          9.   108      1      9          1  
                         10.   109      2      .          2  
                         11.   110      7     11          7  
                         12.   111      .      2          .  
                         13.   112      .      3          .  
                         14.     .      .      .          2  
                         15.     .      .      .          2  
                         16.     .      .      .          4  
                         17.     .      .      .          5  
                         18.     .      .      .          1  
                         19.     .      .      .          2  
                         20.     .      .      .          1  
                         21.     .      .      .          7  
                         22.     .      .      .          9  
                         23.     .      .      .          .  
                         24.     .      .      .         11  
                         25.     .      .      .          2  
                         26.     .      .      .          3  
                        
                        . summarize combined
                        
                            Variable |        Obs        Mean    Std. Dev.       Min        Max
                        -------------+---------------------------------------------------------
                            combined |         22    3.409091    2.872846          0         11
                        Is this the sort of result you are looking for?

                        Please note the presentation of sample data using the dataext command. If I've misunderstood your description of your hypothetical data, please present it using dataex. Run ssc install dataex to install it, and then run help dataex to read the simple instructions for using it.

                        Comment


                        • #13
                          Thank you William!! That is exactly what I was after! Apologies for being unclear...

                          Comment


                          • #14
                            What would therefore be the code for combining the observations from 4 variables?

                            Comment


                            • #15
                              Having shown you how to accomplish what you asked, and thus confirmed that I understand what you were asking for, I'm now going to tell you that what you're asking for makes very little sense to me. Continuing to take you at your word from post #1 that you want to calculate the mean of all the non-missing values of var1 and var2, consider the following approach.
                              Code:
                              cls
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input float(id var1 var2 year)
                              100 0  2 1995
                              101 1  2 2001
                              102 3  4 2013
                              103 1  5 1999
                              104 2  1 2000
                              105 4  2 2003
                              106 5  1 1986
                              107 .  7 1993
                              108 1  9 2005
                              109 2  . 2005
                              110 7 11 2007
                              111 .  2 2011
                              112 .  3 1996
                              end
                              reshape long var, i(id) j(varnum)
                              list if id<101, clean
                              summarize var if inlist(varnum,1,2)
                              Code:
                              . list if id<101, clean
                              
                                      id   varnum   var   year  
                                1.   100        1     0   1995  
                                2.   100        2     2   1995  
                              
                              . summarize var if inlist(varnum,1,2)
                              
                                  Variable |        Obs        Mean    Std. Dev.       Min        Max
                              -------------+---------------------------------------------------------
                                       var |         22    3.409091    2.872846          0         11
                              Doing the reshape prepares you to
                              • trivially summarize any set of variables by changing the arguments to the inlist function
                              • use multiple summarize commands, each with different arguments to the inlist function, to summarize multiple sets of varaibles with no further setup
                              Using the reshape long command requires that each observation have a distinct identifier, as the sample data does. If your data does not, you can add one that will serve the purposes of reshape long with the following command.
                              Code:
                              genereate id = _n

                              Comment

                              Working...
                              X