Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summing variables with missing values

    Hello,

    I have 3 columns (varA, varB, varC) containing dummy variables (0, 1, missing values "."). I would like to generate "varD" the sum of each row, e.g. if varA==1 varB==0 varC==1 I should have varD==2. This works clearly fine when there are no missing values (varD= varA + varB + varC). But is there an easy way to let Stata read a missing value as "0" so that
    varA==1 varB==. varC==1 would result in varD==2?

    Thank you so much in advance.

    Luisa


  • #2
    You can try
    egen <newvar> = rowtotal(varA varB varC)

    See help egen for more info.

    Comment


    • #3
      Yes, this is precisely what egen, rowtotal() does by default.

      Code:
      egen varD = rowtotal(varA varB varC)
      See the concurrent thread http://www.statalist.org/forums/foru...-egen-rowtotal

      Comment


      • #4
        Thank you so much for your help. It works perfect. The only problem is that when the three variables have missing values, the final sum of that row is 0. Is there any way to let Stata know not to replace by 0 when all the variables have missing values?

        Comment


        • #5
          Do please read either the link given in #3 or the help for egen. Either makes clear that the option missing gives the behaviour you want.

          Comment


          • #6
            I have just read it. Now everything is clear. Thank you!

            Comment


            • #7
              Nick Cox Hi.
              I would like two generate a new variable column3 which should be a value either present in column 1 or column 2 or any one of them .let suppose some values are missing in column1 but present in column2 , contrary some value are present in column 2 but missing in column 1, and some are missing in both columns, and some are present in both columns. i just want to generate column3 which is a new one and it is the value of column1 or column2 or any one of column1 or column2 (but not the sum of column1 and column 2).if missing in both it is ok if missing in column3 also.
              code year column 1 column 2 column 3
              2 2007 11 11 11
              2 2008 11 . 11
              2 2008 . 11 11
              2 2007 . . .

              Comment


              • #8
                #7 is trivial if there are no contradictions between column1 and column2. What do you want to happen if column1 is 11 and column2 is different, say 22?

                Comment


                • #9
                  column1 and column2 are from two different files both represent same variable.it is not possible that in column 1 its value is 11 and in column 2 its value is 22, if both values are present then its 100% sure that they are same but i need only one of them.but if one value is present in any one of these two columns then i just nedd that one value as well., so i just want to creat a new combine file in that no value is missing if it is present in any one of them. i dont want to creaate its sum because it can solve my missing value issue in one column but it will also add the two values if it is present in both code and year., i just want that it should be 11 not 22 in f2007 as well as in 2008 and in 2009. I just mean to do it as a conditional rowtotal or loops. i have 22000 firm year observaation from 2007 to 2016.
                  Last edited by Ayub UOM; 25 Jan 2022, 06:04.

                  Comment


                  • #10
                    I am not asking about sums. I am asking you to tell me which value you prefer if there are two different non-missing values on offer. Which you do want? Or do you want an average?


                    Here's the data example that should have been given in #7 using dataex, including an extra observation.


                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input float(code year column1 column2 column3)
                    2 2007 11 11 11
                    2 2008 11  . 11
                    2 2008  . 11 11
                    2 2007  .  .  .
                    2 2009 11 12  .
                    end
                    A solution to your example is

                    Code:
                    gen wanted = min(column1, column2)
                    and another solution is


                    Code:
                    gen wanted = max(column1, column2)
                    and another solution is

                    Code:
                    egen wanted = rowmean(column1 column2)
                    but these solutions give different answers for column1 11 and column2 12.

                    Comment


                    • #11

                      Code:
                       gen wanted = min(column1, column2)

                      is the exact code i wanted, thank you so much for your time . it saved my time.
                      I am not interested in last row 2009 11 12
                      below is the correct data set in above #7 there are some mistakes,
                      Code:
                       Code    Year        Column1           Column2            Column3
                       2        2007          11                 11                   11
                       2        2008          11                 .                    11
                       2        2009          .                  11                   11
                       2        2010          .                  .                     .
                      Thank you once again for your kind reply and suggestion
                      Last edited by Ayub UOM; 25 Jan 2022, 07:08.

                      Comment


                      • #12
                        Is there other ways to sum variables with missing and to replace the latter variable?

                        I tried:
                        replace FIELD2 = FIELD1 + FIELD2
                        but it didn't work.
                        Thank you in advance.
                        C


                        Code:
                        * Example generated by -dataex-. For more info, type help dataex
                        clear
                        input double FIELD1 long FIELD2
                          1   .
                          1   .
                          2   .
                          3   .
                          5   .
                          6   .
                          6   .
                          8   .
                         17   .
                         46  93
                        105 212
                          .  33
                          . 304
                        end
                        Last edited by Chul Lee; 01 Mar 2022, 12:17.

                        Comment


                        • #13
                          I don't understand what you mean, esp by "to replace the latter variable"; if you want to sum variables and have missing values treated as 0, use the -egen- command with the "rowtotal" function; if you do not want a new variable but want instead to have either the sum or field1 if field2 is missing or field2 if field1 is missing you want to use the "cond" function; see
                          Code:
                          help egen
                          help cond()

                          Comment

                          Working...
                          X