This is my first post so I apologize in advance if I missed anything in the protocol for posting.

I am estimating a binary patient-level outcome variable, *CR*, in an interrupted time series that will be interpreted as a hospital-level effect. I am using an indicator variable for policy implementation, *HRRP*, for patient data nested by hospital (patient i at hospital j), with the following Stata code:

*glm cr i.hrrp c.elapsed_qtr i.hrrpXtime2 $season $patient $hospital $market , vce(cl new_hosp_num) family(binomial binomial_denominator_variable) link(logit)*

where *elapsed_qtr* and the interaction term are for the post-period time series, *$patient* are patient-level characteristics, *$hospital* and *$market* are hospital-level characteristics, and *new_hosp_num* is the identifier for hospital j.

My post is about this *binomial_denominator_variable*.

The Stata manual says this about the *binomial_denominator_variable*, which it calls *varnameN*:

"The binomial distribution can be specified as 1) family(binomial), 2) family(binomial #N ), or 3) family(binomial varnameN ). In case 2, #N is the value of the binomial denominator N, the number of trials. Specifying family(binomial 1) is the same as specifying family(binomial).

**In case 3, varnameN is the variable containing the binomial denominator, allowing the number of trials to vary across observations.**"

From two earlier posts, including one by Clyde Schechter on 25 May 2016 ("GLM and blogit for proportion variable:different results") and one by Nick Cox on 19 Aug 2014 ("Entropy measure DV in panel data: Best regression technique?"), I infer that in case 3, varnameN should be the denominator of the dependent variable.
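To make the question concrete, here is how I read case 3. The notation is my own and not the manual's: $k$ indexes rows of the dataset, $y_k$ is the dependent variable (a count of events), $N_k$ is the value of varnameN in that row, and $x_k$ is the covariate vector. This is a sketch of my understanding, not a statement of what Stata does internally:

$$
y_k \sim \mathrm{Binomial}(N_k,\, p_k), \qquad \operatorname{logit}(p_k) = x_k'\beta
$$

$$
\ell(\beta) = \sum_k \Big[\, y_k \log p_k + (N_k - y_k)\log(1 - p_k) \Big] + \text{const}
$$

If that is right, then with $N_k = 1$ in every row this reduces to ordinary logistic regression on a 0/1 outcome, and with $N_k > 1$ the dependent variable would have to be the number of events out of $N_k$ trials in that row, with $0 \le y_k \le N_k$.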

**First, operationally:**

1. What is Stata actually doing when varnameN is included in this manner? For example, is it taking the average within each varnameN group?

**Second, in my example:**

2. I have the numerator, CR{1}; the denominator, CR{0,1}, for the total number of patients; and the frequency of CR at each hospital (so I don't need to use fracreg or betareg, where the denominator isn't available). I would like Stata to take the hospital clusters into account in my output. Note that I am including standard errors clustered at the hospital level.

3. To use this function correctly, would I use:

varnameN=total number of hospitals in the cohort over all time periods?

varnameN=total number of patients at each hospital over all time periods?

varnameN=total number of hospitals for each quarter?

varnameN=total number of patients at each hospital for each quarter?

(Some of the results change substantially depending on which is used, mostly for the time-series and hospital-level variables.)

4. Would it be more appropriate to leave varnameN blank and interpret the final patient-level coefficients for a given hospital with a certain number of patients?

5. I have not tried a mixed-effects approach yet, as it was not something my advisers were keen on, but would that be a useful approach given the multilevel nature of the data?
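To illustrate the two setups I am choosing between in questions 3 and 4, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the data are made up, the names cr_events, n_patients, b_patient, and b_grouped are hypothetical, and I have left out hrrpXtime2 and the $season, $patient, $hospital, and $market covariates to keep it short. It compares (a) a patient-level fit with no denominator variable against (b) a fit on data collapsed to one row per hospital-quarter, where varnameN is the number of patients in that hospital-quarter.

```stata
* Simulated data: 40 hospitals, 100 patients each, spread over 8 quarters
clear
set seed 12345
set obs 4000
generate new_hosp_num = ceil(_n/100)
generate elapsed_qtr  = mod(_n - 1, 8) + 1
generate hrrp         = elapsed_qtr > 4          // hypothetical policy start
generate cr = runiform() < invlogit(-1 + 0.3*hrrp + 0.05*elapsed_qtr)

* (a) Patient-level rows: cr is 0/1, so the binomial denominator is 1
glm cr i.hrrp c.elapsed_qtr, family(binomial) link(logit) ///
    vce(cluster new_hosp_num)
scalar b_patient = _b[1.hrrp]

* (b) One row per hospital-quarter: events out of patients in that cell
collapse (sum) cr_events = cr (count) n_patients = cr (first) hrrp, ///
    by(new_hosp_num elapsed_qtr)
glm cr_events i.hrrp c.elapsed_qtr, family(binomial n_patients) ///
    link(logit) vce(cluster new_hosp_num)
scalar b_grouped = _b[1.hrrp]

* Compare the policy coefficient from the two fits
display b_patient, b_grouped
```

Because this sketch has no patient-level covariates, I would expect the two fits to give essentially the same coefficients. With patient-level covariates in the model, setup (b) would not be available without dropping or aggregating them, which is part of why I am asking which approach is appropriate.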

Thank you!
