Using complex sampling structure with fmlogit command

Ben Scharadin

Join Date: Jun 2016

Posts: 3
#1

02 Jun 2016, 09:06

Hello,
I am currently working in Stata 14 with the user-written command fmlogit, by Maarten Buis. I know that fweights and pweights are allowed; however, I have stratified sample data, so I would like to include that structure. fmlogit allows for a clustering structure, but I am unsure how to include the strata structure. Unfortunately, I don't think the svy prefix supports fmlogit.

With this setup, does anyone know how to include the strata structure and replicate weights while estimating with fmlogit? Both would be very useful when working with share-type data from the American Time Use Survey. I worry that my standard errors and test statistics will be inaccurate without them.

If you need more detail, I can try to explain further; I read the FAQ section and tried to be as detailed as possible.
Thank you for your help,
Ben Scharadin

Last edited by Ben Scharadin; 02 Jun 2016, 09:14.
Tags: fmlogit, weights
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

02 Jun 2016, 10:04

Welcome to Statalist, Ben! I don't know much about the Survey, but pp. 37 ff. of the User's Guide show that the fractions are ratio estimates: (weighted numerator total)/(weighted denominator total). The ratios are easily calculated with svy: ratio, and their total over categories will be 1. For example, I've used this method with asset sampling to estimate the fractional contribution of different equipment categories to total equipment cost. Thus there is no need for fmlogit, which, as you fear, will give entirely wrong standard errors (much too small).

However, you can't do regression analysis with svy: ratio. I think I've seen posts about using svy: glm with a logit link and an offset equal to the ratio denominator. I suggest that you search for something like "glm fractional analysis" or start a new topic about this.

To use Stata's survey commands, you'll need to svyset your data with the weight and replicate weight information. If you tell us the type of replication and the names of the weight variables, we can help with the svyset statement, though you might be able to figure it out for yourself from the Manual examples.
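
As a rough illustration of the general pattern, here is a minimal sketch on simulated data. Every variable name in it (stratum, psu, sampwt, m1-m3, mtot, repwt1-repwt160) is made up for the example and is not an ATUS name. The replicate-weight lines are left commented out, because which option applies depends on the replication method documented for the survey.

```stata
* Simulated data; all names are hypothetical
clear
set seed 2016
set obs 200
gen stratum = ceil(_n/50)            // 4 strata
gen psu     = ceil(_n/10)            // 20 PSUs, 5 per stratum
gen sampwt  = ceil(runiform()*10)    // made-up sampling weight
gen m1   = rpoisson(4)               // numerators of the three shares
gen m2   = rpoisson(3)
gen m3   = rpoisson(2)
gen mtot = m1 + m2 + m3              // common denominator

* Stratified, clustered design with linearized standard errors
svyset psu [pw = sampwt], strata(stratum)

* Shares as ratios of weighted totals; the three estimates sum to 1
svy: ratio (share1: m1/mtot) (share2: m2/mtot) (share3: m3/mtot)

* With replicate weights, one of these forms would be used instead
* (assumes replicate weights repwt1-repwt160 exist in the data):
* svyset [pw = sampwt], sdrweight(repwt1-repwt160) vce(sdr)
* svyset [pw = sampwt], brrweight(repwt1-repwt160) vce(brr)
* svyset [pw = sampwt], jkrweight(repwt1-repwt160) vce(jackknife)
```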

http://www.bls.gov/tus/atususersguide.pdf

Last edited by Steve Samuels; 02 Jun 2016, 10:48. Reason: added section on svy glm

Steve Samuels
Statistical Consulting

Stata 14.2
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#3

02 Jun 2016, 11:17

Actually you can analyze the fractions directly with svy: glm. The great advantage is that you need assume nothing about the distribution of the fractions, because the standard errors are functions of variation in survey replicates. For the non-survey setups, see:

http://www.stata.com/statalist/archi.../msg00504.html

http://www.statalist.org/forums/foru...interpretation

http://www.ats.ucla.edu/stat/stata/faq/proportion.htm

For each fraction (f_i for category i), the statement would apparently be:

Code:

svy: glm f_i <predictors>, link(logit) family(binomial)
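
To make the placeholder concrete, here is a minimal sketch on simulated data. The names f1, z, psu and sampwt are invented for the example, and the fraction is just a uniform random number, so the fitted coefficients mean nothing. glm will note that the dependent variable has noninteger values, which is expected for a fraction.

```stata
* Simulated data; all names are hypothetical
clear
set seed 12345
set obs 200
gen psu    = ceil(_n/10)             // 20 clusters of 10 observations
gen sampwt = ceil(runiform()*10)     // made-up sampling weight
gen byte z = _n > _N/2               // binary predictor
gen f1     = runiform()              // made-up fraction in (0,1)

svyset psu [pw = sampwt]

* One equation per fraction; standard errors come from the survey design
svy: glm f1 z, family(binomial) link(logit)
```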

Ben Scharadin

Join Date: Jun 2016

Posts: 3
#4

03 Jun 2016, 08:39

Steve, thank you for the response and the welcome to Statalist.

I didn't go into enough detail in my original post about my estimation process, so there was no way to know this, but I would like to estimate a fractional multinomial model. I am estimating the share of time spent on multiple activities. Since these shares are naturally correlated, I want to allow the error terms to be correlated. From a quick reading of the links you supplied, it seems that glm is a single-equation estimator. Estimating the equations separately wouldn't allow for the error correlation, which is why I originally opted for fmlogit.

I have considered using the time allocations in level form rather than share form and estimating the system with sureg; however, the svy prefix doesn't support sureg either. That also wouldn't be a full solution, because for the second part of my research I need to estimate multiple food group expenditure shares, which also calls for a fractional multinomial model.

Ben

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

03 Jun 2016, 09:46

Below is code for a survey multinomial model with fractional data. It's based on a logistic regression setup from 2005 by Joseph Coveny, who showed how to reshape data and use the observed proportions as sampling weights: http://www.stata.com/statalist/archi.../msg00504.html. I've adapted Joseph's code to the survey setting. I've interspersed three results sections, which you'll have to remove to run the code.

Code:

clear
set seed 801134
set obs 200
gen id = _n

*****************************************
* Generate sample design variables      *
*****************************************
gen psu = ceil(_n/10)
gen sampwt = ceil(uniform()*10)

***********************************
* Generate binary predictor       *
***********************************
generate byte z = _n > _N / 2

******************************
* Generate fraction          *
* numerators                 *
******************************
gen m1 = rpoisson(4)
gen m2 = rpoisson(3)
gen m3 = rpoisson(2)
gen mtot = m1 + m2 + m3

******************************
* Fractional data            *
******************************
gen float frac1 = m1/mtot
gen float frac2 = m2/mtot
gen float frac3 = m3/mtot
list id frac* in 1/2
 
     +-------------------------------------+
     | id      frac1      frac2      frac3 |
     |-------------------------------------|
  1. |  1   .5555556   .2222222   .2222222 |
  2. |  2   .6363636   .2727273   .0909091 |
     +-------------------------------------+

********************************
* Reshape data to long format  *
********************************
quietly reshape long frac , i(id) j(category)
list id  frac in 1/6, sepby(id)

     +---------------+
     | id       frac |
     |---------------|
  1. |  1   .5555556 |
  2. |  1   .2222222 |
  3. |  1   .2222222 |
     |---------------|
  4. |  2   .6363636 |
  5. |  2   .2727273 |
  6. |  2   .0909091 |
     +---------------+

***********************************
* Final Analysis weight          *
***********************************
gen fracwt =  frac*sampwt

******************************
* Final survey analysis      *
******************************
svyset psu [pw = fracwt]
svy: mlogit category z

Survey: Multinomial logistic regression

Number of strata   =         1                  Number of obs     =        600
Number of PSUs     =        20                  Population size   =      1,159
                                                Design df         =         19
                                                F(   2,     18)   =       0.96
                                                Prob > F          =     0.4032

------------------------------------------------------------------------------
             |             Linearized
    category |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |  (base outcome)
-------------+----------------------------------------------------------------
2            |
           z |  -.0026907   .1066719    -0.03   0.980    -.2259576    .2205761
       _cons |  -.3112928   .0687509    -4.53   0.000      -.45519   -.1673955
-------------+----------------------------------------------------------------
3            |
           z |   .1404985   .1267871     1.11   0.282    -.1248701     .405867
       _cons |  -.7519432   .0820974    -9.16   0.000     -.923775   -.5801114
------------------------------------------------------------------------------
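
The example above has PSUs but no strata or replicate weights. A minimal, untested sketch of how the same weighting trick might extend to both follows. The stratum variable, the eight fake replicate weights, and the choice of sdrweight() are all assumptions made only so the code runs; real replicate weights come with the survey, and the matching svyset option depends on the documented replication method. The one idea being shown is that each replicate weight is multiplied by the fraction in the same way as the main weight.

```stata
* Simulated data; stratum and repwt1-repwt8 are hypothetical
clear
set seed 801134
set obs 200
gen id      = _n
gen stratum = ceil(_n/50)             // 4 strata
gen psu     = ceil(_n/10)             // 20 PSUs, 5 per stratum
gen sampwt  = ceil(runiform()*10)     // made-up sampling weight
gen byte z  = _n > _N/2               // binary predictor

* Fake replicate weights, present only so the example runs
forvalues r = 1/8 {
    gen double repwt`r' = sampwt*(0.5 + runiform())
}

* Fractions from simulated counts, then reshape to long format
gen m1 = rpoisson(4)
gen m2 = rpoisson(3)
gen m3 = rpoisson(2)
gen mtot = m1 + m2 + m3
drop if mtot == 0                     // fractions undefined when the total is zero
forvalues k = 1/3 {
    gen double frac`k' = m`k'/mtot
}
quietly reshape long frac, i(id) j(category)

* (a) Strata and PSUs, linearized standard errors
gen double fracwt = frac*sampwt
svyset psu [pw = fracwt], strata(stratum)
svy: mlogit category z

* (b) Replicate weights: scale each one by the fraction as well
forvalues r = 1/8 {
    gen double fracrw`r' = frac*repwt`r'
}
svyset [pw = fracwt], sdrweight(fracrw1-fracrw8) vce(sdr)
svy: mlogit category z
```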

Last edited by Steve Samuels; 03 Jun 2016, 10:43.


Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

03 Jun 2016, 11:52

I apologize to Joseph for misspelling his last name: it is Coveney.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

05 Jun 2016, 12:31

I should have added the obvious: the code above is to be used at your own risk. I have not compared it with fmlogit, for example, which could be done by setting sampwt = 1 and psu = id.


Steve Samuels

Join Date: Mar 2014

Posts: 1786
#8

06 Jun 2016, 06:24

My last post was incorrect, because I hadn't noticed that fmlogit takes pweights and has a cluster() option. Using the same pweights and clusters, the svy: mlogit approach and fmlogit give the same coefficients and standard errors, with one difference: fmlogit uses normal (z) and chi-square approximations for confidence intervals, p-values, and tests, while svy: mlogit uses t and F with the design degrees of freedom.

Code:

 . fmlogit frac1 frac2 frac3 [pw = sampwt], eta(z) cluster(psu)

Iteration 0:   log pseudolikelihood = -1230.5735  
Iteration 1:   log pseudolikelihood = -1230.0563  
Iteration 2:   log pseudolikelihood = -1230.0561  
Iteration 3:   log pseudolikelihood = -1230.0561  

ML fit of fractional multinomial logit            Number of obs   =        200
                                                  Wald chi2(2)    =       2.02
Log pseudolikelihood = -1230.0561                 Prob > chi2     =     0.3646

                                   (Std. Err. adjusted for 20 clusters in psu)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eta_frac2    |
           z |  -.0026907   .1066719    -0.03   0.980    -.2117638    .2063823
       _cons |  -.3112927   .0687509    -4.53   0.000     -.446042   -.1765434
-------------+----------------------------------------------------------------
eta_frac3    |
           z |   .1404985   .1267871     1.11   0.268    -.1079997    .3889967
       _cons |  -.7519432   .0820974    -9.16   0.000    -.9128511   -.5910352
------------------------------------------------------------------------------

.
.
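
The difference in reference distributions can be checked by hand. This small sketch takes the coefficient and standard error for z in the frac3 equation from the two outputs in this thread, together with the design df of 19, and computes the interval and p-value both ways. Up to rounding, it should reproduce the intervals and p-values printed in the two tables.

```stata
* Coefficient and standard error for z, category 3 (identical in both outputs)
scalar b  = .1404985
scalar se = .1267871

* svy: mlogit: t distribution with 19 design degrees of freedom
display "t(19) CI: " b - invttail(19, 0.025)*se "  " b + invttail(19, 0.025)*se
display "t(19) p : " 2*ttail(19, abs(b/se))

* fmlogit with cluster(): standard normal
display "z CI    : " b - invnormal(0.975)*se "  " b + invnormal(0.975)*se
display "z p     : " 2*normal(-abs(b/se))
```

With only 20 PSUs the t-based interval is noticeably wider; with many PSUs the two would be nearly identical.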

Last edited by Steve Samuels; 06 Jun 2016, 06:45. Reason: Added CI to list of differences


Ben Scharadin

Join Date: Jun 2016

Posts: 3
#9

06 Jun 2016, 07:35

Thank you for all this help Steve. I really appreciate it.