Strange functionality for egen rowtotal?

Nick Cox

Join Date: Mar 2014

Posts: 35811
#16

25 Oct 2016, 03:10

Different problems can call for different solutions.

Niels Henrik: By the same argument, why is it not irrational for summarize, regress, or any other statistical command to ignore missing values to the extent possible?
Comment
Niels Henrik Bruun

Join Date: Aug 2014

Posts: 555
#17

25 Oct 2016, 03:21

@Jesse: Nice to know that someone can use rowtotal (and I can too, of course). However, in your case, isn't that dangerous if there are negative values in the sum?

Kind regards

nhb
Comment
Niels Henrik Bruun

Join Date: Aug 2014

Posts: 555
#18

25 Oct 2016, 03:30

Nick Cox: Ignoring missing values is of course necessary in some commands and some functions. The commands you mention return multiple results, from which you can obtain the number of missing values as well as the sum of the nonmissing values.

However, I would say that a single-valued function such as rowtotal should return missing when there are missings, or at least have an option for doing so.
I agree that this is a matter of opinion.
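
For reference, a minimal sketch of the behaviours under discussion, using toy data with hypothetical variable names x1-x3. egen's rowtotal() has a missing option, but it returns missing only when every value in the row is missing. To get missing whenever any value is missing, one can add the variables directly or blank out the rows that rowmiss() flags.

```stata
* Toy data (hypothetical): row 2 has one missing value, row 3 is all missing
clear
input x1 x2 x3
1 2 3
4 . 6
. . .
end

* Default: missing values are treated as 0
egen total_default = rowtotal(x1 x2 x3)

* missing option: result is missing only when ALL values are missing
egen total_allmiss = rowtotal(x1 x2 x3), missing

* Missing if ANY value is missing: ordinary addition propagates missings
generate total_strict = x1 + x2 + x3

* Same result for a long varlist: blank out rows that rowmiss() flags
egen nmiss = rowmiss(x1 x2 x3)
egen total_strict2 = rowtotal(x1 x2 x3)
replace total_strict2 = . if nmiss > 0

list
```

Which variant is appropriate depends on whether a blank means "zero" or "unknown" in the data at hand.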

Kind regards

nhb
Comment
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#19

25 Oct 2016, 03:33

Originally posted by Niels Henrik Bruun

@Jesse: Nice to know that someone can use rowtotal (and I can too, of course). However, in your case, isn't that dangerous if there are negative values in the sum?

In my case, the values are either zero, missing, or positive. But if they weren't, you'd be right, of course. Another useful case is checking whether percentages add up to 100%. People generally don't fill in zeroes but leave the field blank. Here too, a simple rowtotal() gives you the answer in one or two lines.
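
A minimal sketch of the percentage check, assuming hypothetical variables pct_a, pct_b and pct_c in which blanks stand for zero. The tolerance of 0.01 is an arbitrary choice to guard against rounding in non-integer percentages.

```stata
* Toy data (hypothetical): blanks are fields the respondent left empty
clear
input pct_a pct_b pct_c
50 50 .
30 30 30
. . .
end

* Blanks count as 0 in the row total
egen pct_total = rowtotal(pct_a pct_b pct_c)

* List rows whose percentages do not add up to 100 (within a small tolerance)
list if abs(pct_total - 100) > 0.01
```

Note that a row left entirely blank gets a total of 0 and is therefore flagged as well.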
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30194
#20

25 Oct 2016, 12:24

And in my line of work, we often deal with questionnaires where there is a long series of related questions having the same response set (Yes/No, or a 1-5 rating scale, or something like that). In real life, people sometimes skip some of the items. But the desired end result is often the mean of those items to which a response was given (or, less often, but still fairly frequently, the total of those items to which a response was given), sometimes conditional on the number of non-missing responses exceeding some threshold.
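
A minimal sketch of that kind of scoring, assuming hypothetical items q1-q5 stored in that order and an arbitrary threshold of at least 3 answered items; neither the names nor the threshold come from a real instrument.

```stata
* Toy data (hypothetical): five items on a 1-5 scale, some skipped
clear
input q1 q2 q3 q4 q5
4 5 3 4 5
2 . 3 . 4
. . 5 . .
end

* Number of items actually answered in each row
egen n_answered = rownonmiss(q1-q5)

* Mean of the answered items (missings are ignored)
egen item_mean = rowmean(q1-q5)

* Assumed rule: keep the score only if at least 3 items were answered
replace item_mean = . if n_answered < 3

list
```

For a total rather than a mean, rowtotal(q1-q5) can be substituted for rowmean(q1-q5) with the same threshold rule.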

An even more compelling example is when working with data where there is a series of items headed by "check all that apply", and what is wanted is the total number of boxes checked. Ideally, one would like to distinguish not-checked meaning "no" from not-checked meaning "question not answered". But this response format does not support that distinction. So in adding this up, the convention is to treat non-response as zero-response. While one could argue that this is such a severe flaw of the check-box response set that it shouldn't be used at all (and I might agree with that argument), it is well-entrenched in popular usage and one can't escape it.
Comment
