Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Averaging Consecutive Variable Values Based on Coupling

    Hi All,

    I would like to average the income values between head's and spouse's in my dataset. Heads and Spouses will have the same interview number within the same year. So for example, the first two observations are a coupled head and spouse, so I would like their individual incomes to to be 4114 instead of 4208 and 4020. But the third observation to stay as is - 15552. How can I do this for all observations? Additionally, is there a way to do this for many income variables at once?

    Many thanks in advance,
    Cora

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(totalincome ID) int year float(intnum head spouse)
     4208  443172 2017 1 . 1
     4020  443003 2017 1 1 .
    15552  459001 2013 2 1 .
    67000 2000030 2019 2 . 1
     7800  581003 2013 3 1 .
    13800  431002 2019 3 1 .
    41170 1459003 2005 4 1 .
    22836  750001 2007 4 1 .
    33000 6838177 2017 4 1 .
    32000 2862003 2019 4 1 .
    32883  161004 2015 5 1 .
    42400 1880003 2015 6 1 .
    17500  855171 2017 6 1 .
    20000   45173 2007 7 1 .
    18000   45003 2007 7 . 1
    20000 2722003 2013 7 . 1
    41000 2722171 2013 7 1 .
    19538  649001 2013 8 1 .
    21800 6112005 2015 8 1 .
    end

  • #2
    Code:
    by intnum year, sort: egen wanted = mean(cond(inlist(1, head, spouse), totalincome, .))
    Notes:
    1. If there are other IDs with the same intnum who are not designated as head or spouse, their incomes are not included in the average.
    2. While it is not important for the present purpose, coding yes-no variables as 1/missing is a recipe for trouble in Stata. I strongly urge you to change that to 1/0 coding before you get bitten by that mistake.
    3. I don't know what you have in mind when you say "doing this for many variables." The only variable you could meaningfully average in this data is the totalincome. None of the others make sense for averaging.

    Comment


    • #3
      See also https://www.stata-journal.com/articl...article=dm0055 Section 9

      https://www.stata-journal.com/articl...article=dm0099

      Comment


      • #4
        Or, more simply,
        Code:
        egen avginc = mean(totalincome), by(intnum year)
        which would work fine if you only have heads and their spouses in your dataset (so that the -cond- in #2 is always satisfied).

        Comment


        • #5
          Thanks Clyde, Nick and Hemanshu - these options work! Thanks so much!

          Clyde and Hemanshu, there are only heads and spouses in the dataset - thanks for these notes!

          Clyde, with respect to point #2 - noted and appreciated. Regarding point #3, I should have included a few other variables, but for simplicity in creating the average, only posted this one. There are about 100 variables in my dataset that I need to do this operation to.

          All, is there a way to do this that simply replaces totalincome instead of generating a new variable? I tried using replace, but receive and "Unknown function mean r(133)" error.


          If it is helpful, here is an extract that includes a few more income variables, so that my question on how to do many at once (if possible) is clearer.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float ID int year float(intnum head spouse totalincome intinc rentinc unemploytran vapensiontran)
           443172 2017  1 . 1   4208    0     0    0     0
           443003 2017  1 1 .   4020    0     0    0     0
           459001 2013  2 1 .  15552  624     0    0     0
          2000030 2019  2 . 1  67000    0     0    0     0
           581003 2013  3 1 .   7800    0     0 3000     0
           431002 2019  3 1 .  13800 6000  6000    0     0
          1459003 2005  4 1 .  41170    0  3300    0     0
           750001 2007  4 1 .  22836   36     0    0     0
          6838177 2017  4 1 .  33000    0     0    0     0
          2862003 2019  4 1 .  32000    0     0    0     0
           161004 2015  5 1 .  32883    0     0    0     0
          1880003 2015  6 1 .  42400    0     0    0     0
           855171 2017  6 1 .  17500    0     0    0     0
            45173 2007  7 1 .  20000    0     0    0     0
            45003 2007  7 . 1  18000    0     0    0     0
          2722003 2013  7 . 1  20000    0     0    0     0
          2722171 2013  7 1 .  41000    0     0    0 16000
           649001 2013  8 1 .  19538   75     0    0     0
          6112005 2015  8 1 .  21800    0  4400    0 12000
          1468176 2017  8 1 .  22000    0     0    0     0
           988031 2019  8 1 . 133001    1     0    0     0
          2829002 2013  9 1 .  19250    0     0    0     0
           644021 2007 10 1 .   9600    0     0    0     0
          6060001 2009 10 1 .  22032    0     0    0     0
          1475004 2011 10 1 .    260    0     0    0     0
           236001 2019 10 1 .   3266   66  2100    0     0
          1435003 2005 11 1 .  17000    0     0    0     0
          1303174 2007 11 . 1   9000    0     0    0     0
          1303002 2007 11 1 .  63000    0     0    0     0
          1438021 2013 11 . 1  94100  200     0    0     0
          1438184 2013 11 1 .  36020   20     0    0     0
          1085001 2019 11 1 .    700    0     0    0     0
             5004 2005 12 . 1   7950    0     0    0     0
          1131172 2011 12 1 . 108414    0     0    0     0
            47001 2015 12 1 .  18592   16 11100    0     0
           145170 2017 12 1 .  62250    0     0    0     0
           145003 2017 12 . 1   5250    0     0    0   250
           415002 2019 12 1 .   2509    0     0    0     0
          5048037 2011 13 1 .  21000    0     0    0     0
           750001 2005 14 1 .  26400    0     0    0     0
            40031 2009 14 . 1  31000    0     0    0     0
            40181 2009 14 1 .  28000    0     0    0     0
          1954002 2015 14 . 1   2340    0     0    0     0
          1954001 2015 14 1 .  19652    0 15600    0     0
          1482002 2017 14 1 .  27289    0     0    0     0
           577030 2019 14 1 .  75000    0     0    0     0
          1328004 2015 15 1 .  73537  500     0    0     0
          2180004 2019 15 1 .   3500    0     0    0     0
           898002 2009 16 1 .  15349  171     0    0     0
           644030 2019 16 1 .  45000    0     0    0     0
           227001 2005 17 1 .  90601   18     0    0     0
           284001 2009 17 1 .  71400    0  2000    0     0
           284002 2009 17 . 1  31000    0     0    0     0
          5459005 2011 17 1 .  50000    0     0    0     0
           722003 2013 17 1 .   3185  375     0    0     0
          2142003 2015 17 1 .   8520    0     0    0     0
          2131001 2017 17 1 .  90000    0 20000    0     0
          6125004 2005 18 1 .  10000    0     0    0     0
          2379172 2015 18 1 .  90000    0     0    0     0
          1687175 2017 18 1 .  22900    0     0    0     0
          1687030 2017 18 . 1  30684    0     0    0     0
          5299002 2009 19 1 .   1552    0     0    0     0
          2210021 2013 19 1 .  31836    0     0    0 15000
          6150004 2015 20 . 1  38000    0     0    0     0
          2483033 2017 20 . 1  48000    0     0    0     0
          6125002 2005 21 1 .   3468    0     0    0     0
          2260004 2013 21 1 .  60000    0     0    0     0
           586004 2017 21 1 .  51000    0     0    0     0
           315002 2019 21 1 .  28400 2000     0    0     0
          6212002 2005 22 1 .   8448    0     0    0     0
            60172 2007 22 1 . -10000    0     0    0     0
            40005 2011 22 1 .  83374    0     0    0 16644
           593171 2017 23 1 .  80000    0     0    0     0
           593004 2017 23 . 1 106000    0     0    0     0
          2662003 2019 23 1 .  15975   75     0    0     0
          2095002 2005 24 1 .  20980 6016     0    0     0
           980003 2015 24 1 .  51000    0     0    0     0
          2483032 2017 24 1 .  40000    0     0    0     0
          1505005 2019 24 . 1  51000 5000     0    0     0
          1505170 2019 24 1 .  60500    0  4500    0     0
           429002 2017 25 1 .  47679    0     0    0     0
          2615006 2011 26 1 .  60072   12     0    0     0
          2758004 2019 26 1 .  14000    0     0    0     0
          1213002 2005 27 1 .  22500    0 16800    0     0
           772021 2009 27 1 .  12936    0     0    0  4236
            84003 2015 27 1 .  58000    0     0    0     0
          5145002 2017 27 1 .  71988    0     0    0     0
          6838177 2019 27 1 .  45000    0     0    0     0
           459004 2019 28 1 . 163150    0     0    0     0
           459170 2019 28 . 1  46900    0     0    0     0
          2011002 2007 29 1 .   1000  400     0    0     0
            18021 2015 30 1 .  54040    0     0    0 18000
          3006001 2017 30 1 .  14400    0     0    0     0
          3006002 2017 30 . 1  39000    0     0    0     0
          2280176 2005 31 1 .   4500    0     0    0     0
          1151032 2009 31 1 .   5540    0     0    0     0
           915005 2019 31 . 1  18060    0     0    0     0
          2867030 2011 32 1 .  10000    0     0    0     0
          2005030 2017 32 . 1  29000    0     0    0     0
          2005176 2017 32 1 .  26000    0     0    0     0
          end

          Many thanks,
          Cora

          Comment


          • #6
            Code:
            sort intnum year
            foreach v of varlist totalincome-vapensiontran {
                by intnum year: egen wanted = mean(`v')
                replace `v' = wanted
                drop wanted
            }
            You cannot use -replace- directly with the -mean()- function; -mean()- is only available with -egen-. But as you can see above, that limitation is easy to work around.

            I have abbreviated the entire list totalincom intinc rentinc unemploytran vapensiontran with the wildcard expression totalincom-vapensiontran. This is possible because, in the example shown, these variables are all immediately next to each other in the data set. So you can abbreviate that by writing the first and last variables joined by a -. But if they are not all consecutive, then this code will not work as desired. In that case you need to find some way to build a list of all the variables you want to do this to.

            One possibility is if you want to do this to all of the variables other than ID, year, intnum, head, and spouse (but these variables are themselves interspersed among the variables you want to average in the real data set). In that case:
            Code:
            sort intnum year
            ds ID year intnum head spouse, not
            foreach v of varlist `r(varlist)' {
                by intnum year: egen wanted = mean(`v')
                replace `v' = wanted
                drop wanted
            }
            The worst case scenario is if the variables you want to average are all scattered and isolated throughout the data set. In that case, you may have to actually list them out. Or, not as bad, perhaps there are some groups of variables that are adjacent, but then separated from the next such group by variables not to be averaged, in which case a series of this_var-that_var the_next_var-the_other_var... will do the job.

            Comment


            • #7
              If this is just one of several things you anticipate needing to do to the income variables, it might be useful to either
              1. spend a bit of time renaming them in a more consistent pattern; in this, you might find some useful ideas in
                Code:
                help rename group
              2. or create a macro as a shortcut to referencing them later, e.g.
              Code:
              local incvarlist totalincome *inc *tran
              
              foreach v of varlist `incvarlist' {
              ...

              Comment


              • #8
                Clyde, Brilliant. Much thanks!

                Hemanshu, thanks for the extra tips!!

                -Cora
                Last edited by Cora Touchstone; 17 Aug 2022, 07:06.

                Comment

                Working...
                X