  • Creating new variables with total() and nvals(), hoping to avoid creating missing values with "if"

    Hello,

    I've been creating new variables in my dataset of panel data to show total days of follow-up by month of follow-up. I used the following code to do this successfully:
    by month: egen followup= total(days)
    I also needed to know the number of facilities monitored, and added this variable using the nvals() function:
    by month: egen facilities=nvals(facility)

    My question arose when I tried to add further variables that summarize only the intervention sites (or only the control sites) but are filled in for all observations in the month, rather than showing missing for the other group. I initially tried the following code, which yielded missing values for controls:
    by month: egen followup_int= total(days) if interv_site==1
    by month: egen facilities_int=nvals(facility) if interv_site==1

    I discovered a useful workaround for the total() function in Stata's FAQ pages:
    by month: egen followup_int= total(days*(interv_site==1))

    I'm wondering (a) if there's another way to tell Stata to apply the total (found using the if statement) to all observations (by month) without sort of tricking it as above, and (b) if there is a way to make Stata similarly fill all cells in the new column with the number of intervention facilities monitored by month.

    I'm sure there are multiple ways of working around this, but thought I would ask the experts in case there's a simple command or option I'm not aware of. Thank you!

    Julia

  • #2
    As I understand it, the question pivots on the fact that the egen function nvals() works only with a variable name as its argument.

    Note that nvals() is part of the egenmore package from SSC and must be installed before you can use it; you are asked to explain where such commands come from.

    You've already noticed that the total() function is much more flexible, as it can feed on any expression. That's really the thread out of the maze here.
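
    A minimal sketch of that flexibility, assuming the variables month, days and interv_site from #1. The names followup_int2 and followup_ctl are made up here for illustration, and coding controls as interv_site == 0 is an assumption. cond() is just another way of writing the expression, since total() ignores the missing values it returns:

    ```stata
    * intervention days per month, filled in for every observation in the month
    * (total() treats the missings returned by cond() as zero)
    bysort month: egen followup_int2 = total(cond(interv_site == 1, days, .))

    * the same idea for controls, assuming controls are coded interv_site == 0
    bysort month: egen followup_ctl = total(days * (interv_site == 0))
    ```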

    As implied by http://www.stata-journal.com/sjpdf.h...iclenum=dm0042 (see p.563 esp.) nvals() is redundant, and probably was when first written back in 2000. (The less-than-perceptive author was me.)

    Consider this fragment:

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . egen tag = tag(foreign rep78)
    
    . egen nvals = total(tag), by(foreign)
    
    . tabdisp foreign, c(nvals)
    
    ----------------------
     Car type |      nvals
    ----------+-----------
     Domestic |          5
      Foreign |          3
    ----------------------
    In short, although I can't follow all the details of your example, the tag() function of egen can be used to create indicators, which could then be combined with other arguments to total().
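
    Applied to the variable names in #1, that idea might look like the sketch below. It assumes month, facility and interv_site exist as described there; tag_int is a hypothetical name, and facilities_int is the variable asked for in part (b).

    ```stata
    * tag one observation per month-facility pair among intervention sites only;
    * tag() returns 0 (not missing) for observations excluded by if
    egen tag_int = tag(month facility) if interv_site == 1

    * sum the tags within month: the number of distinct intervention facilities,
    * filled in for every observation in that month, controls included
    egen facilities_int = total(tag_int), by(month)
    ```

    Because the tag variable is 0 rather than missing outside the if condition, no observation in the month ends up with a missing count.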

    The Stata Journal article cited is a review of counting distinct observations.



    • #3
      Thank you for your help!
