All standard errors missing in output table for mlogit using survey data

Rouslan Karimov

Join Date: Apr 2023
Posts: 8


11 Apr 2023, 12:05

I am using Stata 16 on both Mac and Windows (the problem occurs on both).

DATA
My dataset looks like this (these are fake data made to look like the proprietary data I'm working with):

Code:


	* Example generated by -dataex-. For more info, type help dataex
clear
input byte(identity age empl sex marstat urban educ) long country double wgt int WP12258 double(WP12258A WP12259)
2    17    5    2    4    3    1    9    0.918483471    1    2017100001    7
1    21    6    2    2    2    2    9    1.063579888    7    2017100007    7
2    22    6    1    2    2    2    9    0.337893372    1    2017100001    10
1    23    6    2    2    2    2    9    1.567769236    18    2017100018    10
1    27    2    1    2    6    1    9    0.475945213    11    2017100011    16
1    28    6    2    2    2    3    9    1.056964223    19    2017100019    16
2    31    1    1    2    2    2    9    0.265830759    7    2017100007    24
2    31    1    2    1    6    1    9    1.072376993    5    2017100005    25
2    32    6    2    3    1    2    9    0.458895125    6    2017100006    33
2    35    6    2    2    3    1    9    1.478014368    1    2017100001    44
1    36    2    1    2    6    1    9    0.43331561    15    2017100015    45
1    37    6    2    1    2    2    9    1.008920356    8    2017100008    53
3    39    1    1    2    2    1    9    0.293730707    1    2017100001    61
1    41    2    1    2    2    2    9    1.298432642    10    2017100010    61
1    41    1    1    2    1    2    9    1.511035436    9    2017100009    80
2    45    2    2    2    1    2    9    0.398746139    1    2017100001    81
1    60    1    1    2    6    1    9    0.553875541    14    2017100014    87
1    60    1    1    1    2    2    9    0.280035342    9    2017100009    91
2    61    2    1    2    2    2    9    2.464917433    5    2017100005    92
1    67    6    1    1    6    1    9    1.056964223    19    2017100019    96


end
label values identity WP22091
label def WP22091 1 "Being a part of the city or area where you live", modify
label def WP22091 2 "Being a part of this country", modify
label def WP22091 3 "Being a part of the world", modify
label values age WP1220
label values empl EMP_2010
label def EMP_2010 1 "Employed full time for an employer", modify
label def EMP_2010 2 "Employed full time for self", modify
label def EMP_2010 5 "Employed part time want full time", modify
label def EMP_2010 6 "Out of workforce", modify
label values sex WP1219
label def WP1219 1 "Male", modify
label def WP1219 2 "Female", modify
label values marstat WP1223
label def WP1223 1 "Single/Never been married", modify
label def WP1223 2 "Married", modify
label def WP1223 3 "Separated", modify
label def WP1223 4 "Divorced", modify
label values urban WP14
label def WP14 1 "A rural area or on a farm", modify
label def WP14 2 "A small town or village", modify
label def WP14 3 "A large city", modify
label def WP14 6 "A suburb of a large city", modify
label values educ WP3117
label def WP3117 1 "Completed elementary education or less (up to 8 years of basic education)", modify
label def WP3117 2 "Secondary - 3 year TertiarySecondary education and some education beyond secondary education (9-15 years of educatio", modify
label def WP3117 3 "Completed four years of education beyond high school and/or received a 4-year college degree.", modify
label values country country
label def country 9 "Andorra", modify

The variables of interest are:
- Identity: the place that respondents associate themselves with (e.g., city, country, world)
- Age: in years
- Empl: employment status, categorical
- Sex: male/female
- Marstat: marital status, categorical
- Educ: educational status, categorical
- Country: country name

These are the variables I used to declare my survey design:

Code:

. codebook wgt WP12258 WP12258A WP12259

-----------------------------------------------------------------------------
wgt                                                                    Weight
-----------------------------------------------------------------------------

                  type:  numeric (double)

                 range:  [.16155396,5.6427716]        units:  1.000e-11
         unique values:  32,654                   missing .:  0/58,146

                  mean:         1
              std. dev:   .716748

           percentiles:        10%       25%       50%       75%       90%
                           .299751    .48121    .81263   1.30616     1.967

-----------------------------------------------------------------------------
WP12258                                               Sampling Stratification
-----------------------------------------------------------------------------

                  type:  numeric (int)

                 range:  [1,9902]                     units:  1
         unique values:  198                      missing .:  0/58,146

                  mean:   400.069
              std. dev:   1175.67

           percentiles:        10%       25%       50%       75%       90%
                                 3         6        21       131       901

-----------------------------------------------------------------------------
WP12258A                                            Sampling Stratification 2
-----------------------------------------------------------------------------

                  type:  numeric (double)

                 range:  [1.017e+09,1.970e+11]        units:  10
         unique values:  852                      missing .:  0/58,146

                  mean:   6.0e+10
              std. dev:   5.0e+10

           percentiles:        10%       25%       50%       75%       90%
                           8.0e+09   2.4e+10   4.8e+10   7.9e+10   1.5e+11

-----------------------------------------------------------------------------
WP12259                                                Sampling Stage 1 (PSU)
-----------------------------------------------------------------------------

                  type:  numeric (double)

                 range:  [1,1.721e+13]                units:  1
         unique values:  10,023                   missing .:  0/58,146

                  mean:   2.7e+12
              std. dev:   6.2e+12

           percentiles:        10%       25%       50%       75%       90%
                                12        29        62        96   1.7e+13

Note: WP12258 has strata IDs unique to each country and WP12258A has IDs for the same strata that are unique globally.

The survey design then is:

Code:

svyset [pweight = wgt], strata(WP12258A) psu(WP12259)

PROBLEM
I run the following model:

Code:

svy: mlogit identity age i.empl i.sex i.marstat i.urban i.educ i.country

(The country dummies are supposed to be country fixed effects.)

The output this produces includes the coefficients but no standard errors for any of the independent variables. I know this can sometimes happen because the model fits the data perfectly, as in this thread: https://www.statalist.org/forums/for...-in-regression. I doubt that's the case here. I also know that standard errors can be missing if the variance matrix is nonsymmetric or highly singular, as here: https://www.stata.com/statalist/arch.../msg00980.html. But I do not get an error message about the variance matrix, and when I tried, just in case, to locate sparse indicators by dropping each one in turn and re-running the model, the problem remained.

I noticed that if I omit declaring strata or psu in svyset, then I do get standard errors in output. I know it's not a solution but perhaps it will help locate the problem.

Last edited by Rouslan Karimov; 11 Apr 2023, 12:35.


Andrew Musau

Join Date: Oct 2014

Posts: 10188
#2

11 Apr 2023, 12:35

You are probably fitting too many parameters relative to the number of observations. You can verify this by excluding some indicators. If you want to estimate a fixed effects multinomial logit model, see

Code:

help xtmlogit

introduced in Stata 17.
Rouslan Karimov

Join Date: Apr 2023

Posts: 8
#3

11 Apr 2023, 12:53

Thanks so much, Andrew. There are 58,146 observations in this dataset and I'm using, by my count, 68 indicators (if we count each category from i.var as an indicator, minus 1 for baseline). I don't have access to Stata 17, unfortunately, but isn't what I'm doing here essentially equivalent to what xtmlogit would do? Thanks again.

Last edited by Rouslan Karimov; 11 Apr 2023, 13:32.
Andrew Musau

Join Date: Oct 2014

Posts: 10188
#4

11 Apr 2023, 14:25

In survey analysis with stratification, the degrees of freedom can differ significantly from the actual number of observations in the dataset. See https://notstatschat.rbind.io/2019/0...om-brief-note/.
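For reference, with a stratified cluster design the design degrees of freedom are normally the number of PSUs minus the number of strata, not anything derived from the observation count. A minimal sketch for inspecting these quantities after a svy estimation, assuming the svyset declaration from #1 is already in effect (the intercept-only model is only an illustration):

```stata
* Assumes the survey design from #1 has already been declared with svyset.
* Intercept-only model, used here just to populate the e() results.
svy: mlogit identity

* Stored results left behind by svy estimation commands
display "Strata:    " e(N_strata)
display "PSUs:      " e(N_psu)
display "Design df: " e(df_r)
```

Comparing e(df_r) with the number of estimated parameters gives a quick sense of whether the design, rather than the sample size, is the binding constraint.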

"but isn't what I'm doing here essentially equivalent to what xtmlogit would do"

No, the xtmlogit FE estimator is a conditional maximum likelihood estimator (so the fixed effects are conditioned out of the likelihood and not explicitly estimated).
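For readers with Stata 17 or later, a minimal sketch of the conditional fixed-effects fit described above, treating country as the panel variable. This setup is an assumption about how one might apply it to these data; it was not run in this thread, and it ignores the survey design (weights, strata, PSUs) entirely:

```stata
* Requires Stata 17 or later.
* Country serves as the panel (group) identifier.
xtset country

* Conditional FE multinomial logit: the country effects are conditioned
* out of the likelihood, so no i.country terms appear in the model.
xtmlogit identity age i.empl i.sex i.marstat i.urban i.educ, fe
```

By contrast, the svy: mlogit call in #1 estimates one set of country coefficients per non-base outcome.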
Rouslan Karimov

Join Date: Apr 2023

Posts: 8
#5

11 Apr 2023, 17:19

Thanks again, Andrew. My design degrees of freedom are 13,433. I ran a bunch of univariate regressions for each indicator in turn but the problem persisted. I also ran the model without predictors; just the DV. Same problem. What do you think?
Clyde Schechter

Join Date: Apr 2014

Posts: 30075
#6

11 Apr 2023, 17:38

When I run your code using the example data, I do replicate your problem. However, underneath the output, it also says "Note: Missing standard errors because of stratum with single sampling unit." Do you receive that same message? If so, the solution is to identify all strata with a single sampling unit and then merge those strata with other strata that are, with respect to things that matter for your problem, as similar as possible.

If you do not receive any notes or warnings from Stata, then the problem is more obscure.
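A minimal sketch of how one might locate the single-unit strata, assuming the design variables from #1. The variable names psu_tag and n_psu are hypothetical helpers introduced here, and the svyset line uses the documented syntax with the PSU variable first, which is intended to declare the same design as in #1. The singleunit() option at the end is an alternative to merging strata; whether any of its methods is appropriate depends on the design.

```stata
* Declare the design (PSU variable first, then weights and strata)
svyset WP12259 [pweight = wgt], strata(WP12258A)

* List only the strata that contain a single sampling unit
svydescribe, single

* Manual check: count distinct PSUs within each stratum
egen psu_tag = tag(WP12258A WP12259)
bysort WP12258A: egen n_psu = total(psu_tag)
list WP12258A WP12259 if n_psu == 1 & psu_tag, noobs

* Alternative to merging strata: tell svyset how to handle singletons.
* Available methods: missing (the default), certainty, scaled, centered.
svyset WP12259 [pweight = wgt], strata(WP12258A) singleunit(centered)
svy: mlogit identity age i.empl i.sex i.marstat i.urban i.educ i.country
```

Merging strata, as suggested above, changes the declared design itself, whereas singleunit() only changes how the variance contribution of those strata is computed.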

Added: Had you, in #1, followed the guidance in the FAQ to show the exact complete output of commands that need troubleshooting, you probably would have had your problem resolved in a matter of minutes rather than hours.

Last edited by Clyde Schechter; 11 Apr 2023, 17:41.
Rouslan Karimov

Join Date: Apr 2023

Posts: 8
#7

11 Apr 2023, 18:42

Thanks so much, Clyde. This is my first posting; I tried to follow the FAQ but obviously missed a very important part. I do get the same message about single sampling units, so now I know what the problem is. Thanks again to you and Andrew for your time and guidance.