
  • Using complex sampling structure with fmlogit command

    Hello,
    I am currently working, in Stata 14, with the user-written command fmlogit by Maarten Buis. I know that fweights and pweights are allowed; however, I have stratified sample data, so I would like to include that structure. fmlogit allows for a clustering structure, but I am unsure how to include the strata. Unfortunately, I don't think the svy prefix (set up with svyset) supports fmlogit.

    With this setup, does anyone know how to include the strata and replicate weights while estimating with fmlogit? Both would be very useful for working with share-type data from the American Time Use Survey. I worry that my standard errors and test statistics will be inaccurate without them.

    If you need more detail, I can try to explain further; I read the FAQ section and tried to be as detailed as possible.
    Thank you for your help,
    Ben Scharadin
    Last edited by Ben Scharadin; 02 Jun 2016, 09:14.

  • #2
    Welcome to Statalist, Ben! I don't know much about the survey, but pp. 37 ff. of the ATUS User's Guide (linked below) show that the fractions are ratio estimates: (weighted numerator total)/(weighted denominator total). The ratios are easily calculated with svy: ratio, and they sum to 1 across categories. For example, I've used this method with asset sampling to estimate the fractional contribution of different equipment categories to total equipment cost. Thus there is no need for fmlogit, which, as you fear, will give entirely wrong standard errors (much too small).
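A minimal sketch of the ratio approach, assuming simulated data shaped like the example later in this thread: the PSU variable psu, the weight sampwt, the category numerators m1 to m3, and their total mtot are all hypothetical stand-ins for the real ATUS time variables and weights.

```stata
* Sketch: category shares as survey ratio estimates
* (simulated data; all variable names are hypothetical)
clear
set seed 801134
set obs 200
gen psu = ceil(_n/10)                 // 20 PSUs of 10 observations each
gen sampwt = ceil(runiform()*10)      // integer sampling weights 1 to 10
gen m1 = rpoisson(4)                  // category numerators
gen m2 = rpoisson(3)
gen m3 = rpoisson(2)
gen mtot = m1 + m2 + m3               // common denominator

svyset psu [pw = sampwt]

* Each share = (weighted numerator total)/(weighted denominator total)
svy: ratio (share1: m1/mtot) (share2: m2/mtot) (share3: m3/mtot)

* Because m1 + m2 + m3 = mtot, the estimated shares sum to 1
matrix b = e(b)
display b[1,1] + b[1,2] + b[1,3]
```

The standard errors come from the survey design, but, as noted next, svy: ratio gives no regression coefficients.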

    However, you can't do regression analysis with svy: ratio. I think I've seen posts about using svy: glm with a logit link and an offset equal to the ratio denominator. I suggest that you search for something like "glm fractional analysis" or start a new topic about this.

    To use Stata's survey commands, you'll need to svyset your data with the weight and replicate-weight information. If you tell us the type of replication and the names of the weight variables, we can help with the svyset statement, though you might be able to figure it out yourself from the manual examples.
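For illustration only, a svyset statement for successive-difference replicate weights might look like the line below. The names finalwt and repwt1 to repwt160 are hypothetical placeholders. The replication type (vce(sdr) with the mse option) and the count of 160 replicates are assumptions that should be checked against the ATUS User's Guide for your extract.

```stata
* Hypothetical names: finalwt = final person weight,
* repwt1-repwt160 = replicate weights stored as consecutive variables.
* Replication method (SDR) and mse are assumptions; verify against the ATUS guide.
svyset [pweight = finalwt], sdrweight(repwt1-repwt160) vce(sdr) mse

* After this, prefix supported estimation commands with svy:
```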

    http://www.bls.gov/tus/atususersguide.pdf
    Last edited by Steve Samuels; 02 Jun 2016, 10:48. Reason: added section on svy glm
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Actually, you can analyze the fractions directly with svy: glm. The great advantage is that you need not assume anything about the distribution of the fractions, because the standard errors are computed from the survey design (for example, from the variation across survey replicates). For the non-survey setups, see:

      http://www.stata.com/statalist/archi.../msg00504.html

      http://www.statalist.org/forums/foru...interpretation

      http://www.ats.ucla.edu/stat/stata/faq/proportion.htm

      For each fraction (f_i for category i), the statement would apparently be:

      Code:
      svy: glm f_i  <predictors>, link(logit) family(binomial)
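A runnable sketch of that statement, assuming simulated data like the multinomial example later in this thread (psu, sampwt, and the binary predictor z are hypothetical). Each share gets its own equation; the binomial family with a logit link accepts a response between 0 and 1, and svy supplies the design-based standard errors.

```stata
* Sketch: one svy: glm per share (simulated data; names hypothetical)
clear
set seed 801134
set obs 200
gen psu = ceil(_n/10)
gen sampwt = ceil(runiform()*10)
gen byte z = _n > _N/2                // binary predictor
gen m1 = rpoisson(4)
gen m2 = rpoisson(3)
gen m3 = rpoisson(2)
gen mtot = m1 + m2 + m3
forvalues i = 1/3 {
    gen frac`i' = m`i'/mtot           // share in [0,1]; missing if mtot == 0
}

svyset psu [pw = sampwt]
forvalues i = 1/3 {
    * fractional logit for share i with design-based standard errors
    svy: glm frac`i' z, family(binomial) link(logit)
}
```

As Ben notes below, fitting the shares one at a time does not tie the equations together.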
      Steve Samuels


      • #4
        Steve, thank you for the response and the welcome to Statalist.

        I didn't go into enough detail in my original post about my estimation process, so there was no way to know this, but I would like to estimate a fractional multinomial model. I am estimating the shares of time spent on multiple activities. Since these shares are naturally correlated, I want to allow the error terms to be correlated. From a quick read of the links you supplied, it seems that glm is a single-equation estimator. Estimating the shares separately wouldn't allow for the error correlation, which is why I originally opted for fmlogit.

        I have considered using the time allocations in levels rather than shares and estimating them with sureg; however, the svy prefix doesn't support sureg either. This also isn't a full solution, because for the second part of my research I need to estimate multiple food-group expenditure shares, which also calls for a fractional multinomial model.

        Ben



        • #5
          Below is code for a survey multinomial model with fractional data. It's based on a logistic regression setup from 2005 by Joseph Coveny, who showed how to reshape the data and use the observed proportions as sampling weights: http://www.stata.com/statalist/archi.../msg00504.html. I've adapted Joseph's code to the survey setting. I've interspersed three results sections, which you'll have to remove to run the code.
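Why the observed proportions can serve as weights is visible in the log pseudolikelihoods. In notation introduced here for illustration, w_i is respondent i's sampling weight, f_ij the observed share in category j, and p_j(x_i) the multinomial-logit probability. After the reshape, each respondent contributes one row per category j, with outcome j and weight w_i f_ij:

```latex
\ell_{\text{svy: mlogit}}
  = \sum_{i}\sum_{j=1}^{3} \bigl(w_i f_{ij}\bigr)\,\log p_j(x_i)
  = \sum_{i} w_i \sum_{j=1}^{3} f_{ij}\,\log p_j(x_i)
  = \ell_{\text{fmlogit}},
\qquad
p_j(x_i) = \frac{\exp(x_i'\beta_j)}{\sum_{k=1}^{3}\exp(x_i'\beta_k)},
\quad \beta_1 = 0 .
```

The two objective functions coincide, so the two approaches produce the same point estimates.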

          Code:
          clear
          set seed 801134
          set obs 200
          gen id = _n
          
          *****************************************
          * Generate sample design variables      *
          *****************************************
          gen psu = ceil(_n/10)
          gen sampwt = ceil(runiform()*10)
          
          ***********************************
          * Generate binary predictor       *
          ***********************************
          generate byte z = _n > _N / 2
          
          ******************************
          * Generate fraction         *
          * numerators                *
          *****************************
          gen m1 = rpoisson(4)
          gen m2 = rpoisson(3)
          gen m3 = rpoisson(2)
          gen mtot = m1 + m2 + m3
          
          *******************************
          * fractional data                *
          ******************************
          gen float frac1 = m1/mtot
          gen float frac2 = m2/mtot
          gen float frac3 = m3/mtot
          list id frac* in 1/2
           
               +-------------------------------------+
               | id      frac1      frac2      frac3 |
               |-------------------------------------|
            1. |  1   .5555556   .2222222   .2222222 |
            2. |  2   .6363636   .2727273   .0909091 |
               +-------------------------------------+
          
          ********************************
          * Reshape data to long format  *
          * ******************************
          quietly reshape long frac , i(id) j(category)
          list id  frac in 1/6, sepby(id)
          
               +---------------+
               | id       frac |
               |---------------|
            1. |  1   .5555556 |
            2. |  1   .2222222 |
            3. |  1   .2222222 |
               |---------------|
            4. |  2   .6363636 |
            5. |  2   .2727273 |
            6. |  2   .0909091 |
               +---------------+
          
          ***********************************
          * Final Analysis weight          *
          ***********************************
          gen fracwt =  frac*sampwt
          
          *******************************
          * Final survey analysis     *
          ******************************
          svyset psu [pw = fracwt]
          svy: mlogit category z
          
          Survey: Multinomial logistic regression
          
          Number of strata   =         1                  Number of obs     =        600
          Number of PSUs     =        20                  Population size   =      1,159
                                                          Design df         =         19
                                                          F(   2,     18)   =       0.96
                                                          Prob > F          =     0.4032
          
          ------------------------------------------------------------------------------
                       |             Linearized
              category |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          1            |  (base outcome)
          -------------+----------------------------------------------------------------
          2            |
                     z |  -.0026907   .1066719    -0.03   0.980    -.2259576    .2205761
                 _cons |  -.3112928   .0687509    -4.53   0.000      -.45519   -.1673955
          -------------+----------------------------------------------------------------
          3            |
                     z |   .1404985   .1267871     1.11   0.282    -.1248701     .405867
                 _cons |  -.7519432   .0820974    -9.16   0.000     -.923775   -.5801114
          ------------------------------------------------------------------------------
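A quick check of the weighting step, sketched under the same simulated setup (names as in the code above). Each respondent's shares sum to 1, so that respondent's three fractional weights should add back to the original sampwt. This is also why the reported population size equals the total of sampwt over respondents with nonmissing shares.

```stata
* Sketch: fractional weights preserve each respondent's sampling weight
clear
set seed 801134
set obs 200
gen id = _n
gen sampwt = ceil(runiform()*10)
gen m1 = rpoisson(4)
gen m2 = rpoisson(3)
gen m3 = rpoisson(2)
gen mtot = m1 + m2 + m3
forvalues i = 1/3 {
    gen float frac`i' = m`i'/mtot
}
quietly reshape long frac, i(id) j(category)
gen fracwt = frac*sampwt

* Sum the fractional weights within respondent and compare with sampwt
bysort id: egen double wsum = total(fracwt)
gen double gap = abs(wsum - sampwt) if mtot > 0
summarize gap
```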
          Last edited by Steve Samuels; 03 Jun 2016, 10:43.
          Steve Samuels


          • #6
            I apologize to Joseph for misspelling his last name: it is Coveney.
            Steve Samuels


            • #7
              I should have added the obvious: the code above is to be used at your own risk. I have not, for example, compared it to fmlogit; that comparison can be made by setting sampwt = 1 and psu = id.
              Steve Samuels


              • #8
                My last post was incorrect, because I hadn't noticed that fmlogit takes pweights and has a cluster() option. Using the same pweights and clusters, the svy: mlogit approach and fmlogit give the same coefficients and standard errors, with one exception: fmlogit uses normal (z) and chi-squared approximations for confidence intervals, p-values, and tests, whereas svy: mlogit uses t and F distributions with the design degrees of freedom (19 here).
                Code:
                 . fmlogit frac1 frac2 frac3 [pw = sampwt], eta(z) cluster(psu)
                
                Iteration 0:   log pseudolikelihood = -1230.5735  
                Iteration 1:   log pseudolikelihood = -1230.0563  
                Iteration 2:   log pseudolikelihood = -1230.0561  
                Iteration 3:   log pseudolikelihood = -1230.0561  
                
                ML fit of fractional multinomial logit            Number of obs   =        200
                                                                  Wald chi2(2)    =       2.02
                Log pseudolikelihood = -1230.0561                 Prob > chi2     =     0.3646
                
                                                   (Std. Err. adjusted for 20 clusters in psu)
                ------------------------------------------------------------------------------
                             |               Robust
                             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                eta_frac2    |
                           z |  -.0026907   .1066719    -0.03   0.980    -.2117638    .2063823
                       _cons |  -.3112927   .0687509    -4.53   0.000     -.446042   -.1765434
                -------------+----------------------------------------------------------------
                eta_frac3    |
                           z |   .1404985   .1267871     1.11   0.268    -.1079997    .3889967
                       _cons |  -.7519432   .0820974    -9.16   0.000    -.9128511   -.5910352
                ------------------------------------------------------------------------------
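The numerical differences between the two tables can be reproduced from the reported coefficients and standard errors. This is a sketch, assuming the 19 design degrees of freedom in the svy: mlogit header (20 PSUs minus 1 stratum) and Stata's adjusted Wald conversion F = (d - k + 1)W/(dk), with d = 19 and k = 2 tested coefficients.

```stata
* Lower 95% limits for the coefficient on z in equation 2
local b  = -.0026907
local se = .1066719
display "t-based (svy: mlogit): " (`b' - invttail(19, .025)*`se')
display "z-based (fmlogit):     " (`b' - invnormal(.975)*`se')

* Two-sided p-values for the equation-3 coefficient on z
local t3 = .1404985/.1267871
display "t(19) p-value:  " (2*ttail(19, abs(`t3')))
display "normal p-value: " (2*normal(-abs(`t3')))

* Adjusted Wald: convert fmlogit's chi2(2) = 2.02 to svy's F(2, 18)
local W = 2.02
local F = (19 - 2 + 1)*`W'/(19*2)
display "F = " `F' "   p = " Ftail(2, 18, `F')
```

These calculations should match the tables up to rounding; the F p-value differs slightly from 0.4032 only because 2.02 is itself rounded.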
                Last edited by Steve Samuels; 06 Jun 2016, 06:45. Reason: Added CI to list of differences
                Steve Samuels


                • #9
                  Thank you for all this help Steve. I really appreciate it.
