Finding the sum of a variable within an ID is not working

Vignesh Ponnusamy

Join Date: May 2018
Posts: 29

Finding the sum of a variable within an ID is not working

08 Apr 2019, 22:09

Hi all,

For the following example data set below, I am trying to create a new variable (let's call it as sum3_4), which is the sum of the variable 'votes_margin_u', for each id and for the two possible values of the variable 'order_u'. The variable called 'order_u' is the rank of each individual. It takes 2 possible values, either 3 or 4.

Initially, I create a group ID which is the combination of 3 variables - state_u, const_u and year_u, as below:

egen id = group(state_u year_u const_u )

Then, I try to create the new variable, sum3_4 which is the sum of the variable 'votes_margin_u' for 'order_u' values of 3 and 4, under each id.

bysort id (order_u): gen sum3_4 = votes_margin_u[3] + votes_margin_u[4]

But, the above set of commands are giving me missing values. I have used the same commands for a different data set, which worked perfectly. Moreover, if I create the ID variable based only on the combination of state_u and const_u, then it is working. But if use the combination of variables state_u, year_u and order_u to create the group id, and then create the variable sum3_4, then it is not working.

Any suggestion would be helpful. And the example dataset is as below.

Regards

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 state_u str24 const_u float year_u str55 candidate_u float order_u long votes double votes_margin_u
"Andhra Pradesh" "Achampet"     2004 " YAMGONDI VENKATAIAH"      3  5723 .046903301030397415
"Andhra Pradesh" "Achampet"     2004 " Y.P.KASHAIAH"             4  5535  .04536253213882446
"Andhra Pradesh" "Achanta"      2004 " JOSHIP MERIPE"            3  4053 .043371714651584625
"Andhra Pradesh" "Achanta"      2004 " K. SUJATHA KUMARI"        4  1696 .018149131909012794
"Andhra Pradesh" "Alair"        2004 " DR. ETIKALA PURUSHOTHAM"  3  2719  .02296549640595913
"Andhra Pradesh" "Alair"        2004 " THUTI SANJEEVA"           4  2026 .017112208530306816
"Andhra Pradesh" "Allavaram"    2004 " ETHAKOTA THUKKESWARA RAO" 3  4523  .04949498176574707
"Andhra Pradesh" "Allavaram"    2004 " G.RATNAMANI"              4  1454 .015911055728793144
"Andhra Pradesh" "Alur"         2004 " C.GOVINDANNA"             3  1828  .02274056151509285
"Andhra Pradesh" "Alur"         2004 " K.SEKHAPPA"               4  1391 .017304223030805588
"Andhra Pradesh" "Andole"       2004 " POTTIGARI VENKATESHAM"    3  2206  .01894017495214939
"Andhra Pradesh" "Andole"       2004 " MANNE BHARATHI"           4  1957  .01680232211947441
"Andhra Pradesh" "Asifabad"     2004 " JANGAMPELLI RAJAMALLU"    3 14847  .12754058837890625
"Andhra Pradesh" "Asifabad"     2004 " PATI SUBHADRA"            4  8053  .06917790323495865
"Andhra Pradesh" "Bhadrachalam" 2004 " SONDE VEERAIAH"           3 10074  .07486066967248917
"Andhra Pradesh" "Bhadrachalam" 2004 " GONDI NAGARJUNA"          4  3816 .028356989845633507
"Andhra Pradesh" "Boath"        2004 " MADAVI RAJU"              3  3491 .034615423530340195
"Andhra Pradesh" "Boath"        2004 " PENDUR NEELAM"            4  1853   .0183736402541399
"Andhra Pradesh" "Burgampahad"  2004 " NUPA BHASKAR"             3  7290  .05384405329823494
"Andhra Pradesh" "Burgampahad"  2004 " BHIKSHAM KUNJA"           4  3413 .025208469480276108
end

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

08 Apr 2019, 23:36

Well in your example data, each id appears in exactly two observations, one of which has order_u = 3, and the other of which has order_u = 4. If this is characteristic of your entire data set, then the simplest solution is:

Code:

by id, sort: egen sum3_4 = total(votes_margin_u)

To be sure that those characteristics apply to your whole data set, I would precede that line of code with:

Code:

by id (order_u), sort: assert _N == 2 & order_u == _n + 2

Now if it turns out that those characteristics do not apply to your whole data set and you really need to pick out specifically those observations where order_u = 3 or 4 from among other observations where that does not hold:

Code:

by id, sort: egen sum3_4 = total(cond(inlist(order_u, 3, 4), votes_margin_u, .))

will do it.

Note that the above code will simply total up the values of votes_margin_u for all of an id's observations that have order_u equal to 3 or 4, even if there are more than one of each, or if there are no observations that fit one or both descriptions. If that is not what you want, post back with details about how to handle these situations.

As for why your code gives you missing values, when you write the expression votes_margin_u[3], I imagine you are thinking this will get you the value of votes_margin_u in the observation where order_u is 3. But that is not what Stata takes it to mean. Rather Stata interprets that as asking for the value of votes_margin_u in the third observation among all those for that id. But you have only two observations per id (at least in your example data), so there is no third observation. Hence the missing value. Similar logic applies to votes_margin_u[4].
1 like
Comment
Vignesh Ponnusamy

Join Date: May 2018

Posts: 29
#3

11 Apr 2019, 20:07

Thank you, Clyde.
I just checked it and works perfectly.

Regards
Comment

Announcement

Finding the sum of a variable within an ID is not working

Comment

Comment