How to interpret "Prob > F = ."?

Janet Dolot

Join Date: Jun 2014

Posts: 3
#1

19 Jun 2014, 20:07

Hello,

I am a novice Stata user. I am performing regression analyses within the survey function. My output for one of the equations includes "Prob > F = ." with an R-squared = 0.1608 and P>|t| values listed for each variable. I do not know how to interpret "Prob > F = ." or why it might be appearing. I would appreciate any insight that can be offered. Thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

19 Jun 2014, 21:43

You need to show us more. Give us the exact command you typed and show us the full regression output that Stata gave you.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#3

19 Jun 2014, 21:57

If you google

F value missing stata

you get lots of hits, but also lots of ideas as to what the problem may be. The phrase "singleton dummy" shows up a few times, as do references to multicollinearity. Maybe others can explain the singleton dummy problem if indeed that is what is plaguing you. As Clyde says, you'll have more of a fighting chance of getting help if you show your output. Or maybe you can figure it out yourself if you read some of the links that Google gives you.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
daniel klein

Join Date: Mar 2014

Posts: 3860
#4

20 Jun 2014, 02:26

The missing p-value for the F-statistic should be displayed in blue, right? That means you can click on it, and Stata will refer you to an explanation. For example, this happens when you have clustered standard errors and only a few clusters. Similar issues arise with survey data, but this is all documented.

Best
Daniel
Janet Dolot

Join Date: Jun 2014

Posts: 3
#5

21 Jun 2014, 13:51

Thank you for your responses. Clicking on the blue text was not very helpful. It stated that "your estimation results show an F or chi2 model statistic reported to be missing. Stata has done that so as to not be misleading, not because there is something necessarily wrong with your model." See below for my command and output. Thank you for any guidance you can provide!
Janet Dolot

Join Date: Jun 2014

Posts: 3
#6

21 Jun 2014, 13:54

The commands and output seem not to be displayed in my browser. Here they are again:

svy linearized, subpop(NSLBPyesPT) : reg PTVISITSPEREPISODE PCS MCS i.keeper MEANPTCOPAYPERVISIT2ln i.sex i.marry i.AGEINS i.RACEETH i.EDUCATION i.FAMINCOME i.msa i.region i.CONDSEV if !missing(PTVISITSPEREPISODE, PCS, MCS, keeper, MEANPTCOPAYPERVISIT2ln, sex, marry, AGEINS, RACEETH, EDUCATION, FAMINCOME, msa, region, CONDSEV)
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =       117                  Number of obs      =       992
Number of PSUs     =       992                  Population size    =  13668703
                                                Subpop. no. of obs =       262
                                                Subpop. size       = 3317647.4
                                                Design df          =       875
                                                F(  19,    857)    =         .
                                                Prob > F           =         .
                                                R-squared          =    0.2703

---------------------------------------------------------------------------------------
| Linearized
PTVISITSPEREPISODE | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
PCS | -.3985977 .120923 -3.30 0.001 -.6359307 -.1612646
MCS | -.0659048 .1260552 -0.52 0.601 -.3133108 .1815011
1.keeper | .6395931 3.736371 0.17 0.864 -6.693703 7.972889
MEANPTCOPAYPERVISIT~n | -3.665622 1.246143 -2.94 0.003 -6.1114 -1.219844
|
sex |
female | 3.717134 1.824751 2.04 0.042 .1357334 7.298535
|
marry |
otherwise | 6.436666 2.997961 2.15 0.032 .552632 12.3207
|
AGEINS |
18-64 yo with publ.. | -2.35874 3.464055 -0.68 0.496 -9.157568 4.440087
>= 65 yo | 8.23284 7.013279 1.17 0.241 -5.531974 21.99765
|
RACEETH |
black non-hispanic | -12.32245 3.905989 -3.15 0.002 -19.98865 -4.656245
white | -5.71143 2.97011 -1.92 0.055 -11.5408 .1179419
other non-hispanic | -3.783289 3.460577 -1.09 0.275 -10.57529 3.008713
|
EDUCATION |
12 years | 8.458131 7.329881 1.15 0.249 -5.928071 22.84433
>12 years | 9.40206 8.387894 1.12 0.263 -7.060682 25.8648
|
FAMINCOME |
middle | .275204 3.437917 0.08 0.936 -6.472324 7.022732
high | -2.189801 2.351007 -0.93 0.352 -6.804072 2.424471
|
msa |
rural | 1.713613 4.394905 0.39 0.697 -6.912174 10.3394
|
region |
midwest | -3.955822 5.477475 -0.72 0.470 -14.70635 6.794702
south | -3.806898 5.064443 -0.75 0.452 -13.74677 6.132978
west | -5.631807 4.28008 -1.32 0.189 -14.03223 2.768616
|
CONDSEV |
severe | -15.33442 7.930896 -1.93 0.053 -30.90022 .2313849
_cons | 39.5405 12.15188 3.25 0.001 15.69026 63.39075
---------------------------------------------------------------------------------------
Note: 46 strata omitted because they contain no subpopulation members.
Note: Strata with single sampling unit centered at overall mean.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#7

21 Jun 2014, 15:38

Note: Strata with single sampling unit centered at overall mean.

The above is the source of your problem. In these circumstances, it is impossible to estimate variance within that stratum, and you will get no F statistic.

See http://www.stata.com/support/faqs/st...-with-one-psu/ for some possible ways to solve this problem.

Note, by the way, that you may not have any singleton strata in your overall sample, but you are estimating on a subpopulation, and within that subpopulation you will find a stratum with only one PSU. That is the offending configuration.
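
A minimal sketch of how one might look for such strata, assuming the data are already svyset and the model from #6 has just been run, so that e(sample) marks the estimation sample. svydescribe with the single option lists only the strata that contain a single sampling unit:

```stata
* Sketch only: assumes the survey design has been declared with svyset
* and that the svy: regress command from #6 was the last estimation run.

* Strata with a single sampling unit in the full data set
svydescribe, single

* The same check restricted to the observations used in the estimation
svydescribe if e(sample), single
```

Strata listed by the second command but not by the first become singletons only after the sample is restricted.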

Last edited by Clyde Schechter; 21 Jun 2014, 15:40. Reason: The link did not appear properly in the original. Not sure why.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#8

21 Jun 2014, 17:26

Clyde, I'm not sure that's the problem. My understanding is that if PSU is not specified then each observation is its own PSU, and Stata understands that means there were no primary sampling units. I've done examples where it sets the number of PSUs equal to the number of observations but still reports an F p-value. I'm wondering if maybe some of the 117 strata have only one observation.

Frankly, I'm not sure what Stata is computing as F, anyway. It can't be either the usual sum of squared residuals or even a weighted version; after all, the weighting here is not to solve heteroskedasticity. Plus, that would not take advantage of the stratification. So I assume this is really the Wald test using the proper asymptotic variance matrix that is valid with p-weights and stratification.

I'm hardly an expert on svyset, though.

Jeff

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#9

22 Jun 2014, 09:50

Well, you may well be correct on a theoretical basis. But Stata does not seem to distinguish the situation where no PSU is specified:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. gen brand = word(make, 1)

. svyset, strata(brand)

      pweight: <none>
          VCE: linearized
  Single unit: missing
     Strata 1: brand
         SU 1: <observations>
        FPC 1: <zero>

. svy: regress mpg headroom weight
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =        23                  Number of obs      =        74
Number of PSUs     =        74                  Population size    =        74
                                                Design df          =        51
                                                F(   0,     51)    =         .
                                                Prob > F           =         .
                                                R-squared          =    0.6523

------------------------------------------------------------------------------
             |             Linearized
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    headroom |  -.2103507          .        .       .            .           .
      weight |   -.005898          .        .       .            .           .
       _cons |   39.73567          .        .       .            .           .
------------------------------------------------------------------------------
Note: Missing standard errors because of stratum with single sampling unit.

In any case, whether the sampling units were specified, or simply defaulted to the observation level, the problem Janet Dolot is experiencing is, I believe, that one of her strata contains only a single observation or specified psu, at least when restricted to the estimation sample.

Last edited by Clyde Schechter; 22 Jun 2014, 10:03. Reason: Hit Save by accident when aiming to hit Preview


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2174
#10

22 Jun 2014, 14:21

Clyde: I'm a bit confused. The problem with the example you generated is that some of the strata (7, to be exact) have only one observation. In my comment I hypothesized that this was the problem Janet was encountering, and your example provides support for that. In fact, when I drop the 7 strata the problem goes away. The point I was making was that having all of the PSUs with size one is not the problem. That becomes a problem if some PSUs have more than one unit and some don't. But that isn't the case for Janet's problem. Therefore, I still think she has at least one stratum with only a single observation. Maybe Janet could weigh in.
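
For anyone who wants to repeat that check, here is a rough sketch built on the example in #9. The variable names n_brand and first are made up for illustration, and dropping strata is only meant to reproduce the demonstration, not as a recommendation for real survey data:

```stata
* Sketch of the check described above, using the auto data from #9.
sysuse auto, clear
gen brand = word(make, 1)

* Number of observations in each brand (stratum)
bysort brand: gen n_brand = _N

* Mark one observation per stratum, then count strata of size one
by brand: gen byte first = (_n == 1)
count if first & n_brand == 1

* Drop the single-observation strata and refit the model
drop if n_brand == 1
svyset, strata(brand)
svy: regress mpg headroom weight
```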
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#11

22 Jun 2014, 18:08

Jeff, I think we're saying the same thing. Janet's result indicates that she has both 992 observations and 992 PSUs, so each observation is its own PSU. Whether this arose because she specified the observation to be the PSU or left PSU unspecified doesn't really affect anything. And what I was trying to say, if perhaps not clearly enough, is that when one or more strata have only a single PSU or only a single observation, then the variance within that stratum, and hence overall, cannot be estimated. The problem is not that each PSU is a singleton, nor that every stratum is a singleton. The problem is the existence of at least one singleton stratum, which is the same as what you said.
Javed Mohmand

Join Date: Oct 2018

Posts: 5
#12

06 Oct 2018, 11:29

How can I interpret these results in the context of a regression model? Also, what do the results below indicate about whether the regression model fits my data? Thanks a lot.

Source | SS df MS Number of obs = 52
-------------+------------------------------ F( 14, 37) = 2.12
Model | 9.75690518 14 .696921799 Prob > F = 0.0342
Residual | 12.1661717 37 .328815452 R-squared = 0.4451
-------------+------------------------------ Adj R-squared = 0.2351
Total | 21.9230769 51 .429864253 Root MSE = .57342

---------------------------------------------------------------------------------------------
Productivity | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
LOC | .0746918 .0461836 1.62 0.114 -.018885 .1682686
FP | -.1915477 .1016172 -1.88 0.067 -.3974438 .0143484
selforg | .0960156 .1738346 0.55 0.584 -.2562067 .4482379
agilemeth | -.0120987 .012228 -0.99 0.329 -.0368749 .0126775
methused | .0240235 .0958956 0.25 0.804 -.1702795 .2183266
largestproject | -.0835862 .1029788 -0.81 0.422 -.2922411 .1250688
kindofproject | -.1777672 .1987665 -0.89 0.377 -.5805064 .224972
reqclarity | -.0869767 .1070006 -0.81 0.421 -.3037805 .129827
agilecopechangingreq | .127376 .0764227 1.67 0.104 -.0274712 .2822232
agileratherothermethodology | .0855318 .0735312 1.16 0.252 -.0634564 .2345201
competency | -.0812772 .0970277 -0.84 0.408 -.2778741 .1153196
documentationusedinagile | -.0190196 .1103519 -0.17 0.864 -.2426138 .2045746
agileeffectivness | .2465873 .0858929 2.87 0.007 .0725518 .4206228
AGILEtool | -.1178061 .0430454 -2.74 0.009 -.2050245 -.0305878
_cons | 3.78272 .9562477 3.96 0.000 1.845179 5.720262
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#13

06 Oct 2018, 12:02

Javed:
welcome to this forum.
Please, start a new thread and use CODE delimiters to share what you typed and what Stata gave you back (see the FAQ on this and other posting-related topics). Thanks.

Kind regards,
Carlo
(Stata 19.0)
Javed Mohmand

Join Date: Oct 2018

Posts: 5
#14

07 Oct 2018, 04:21

Hi Carlo,
I have typed the following code:

regress Productivity LOC FP selforg agilemeth methused largestproject kindofproject reqclarity agilecopechangingreq agileratherothermethodology competency documentationusedinagile agileeffectivness AGILEtool

The result given by Stata:

Source | SS df MS Number of obs = 52
-------------+------------------------------ F( 14, 37) = 2.12
Model | 9.75690518 14 .696921799 Prob > F = 0.0342
Residual | 12.1661717 37 .328815452 R-squared = 0.4451
-------------+------------------------------ Adj R-squared = 0.2351
Total | 21.9230769 51 .429864253 Root MSE = .57342

---------------------------------------------------------------------------------------------
Productivity | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
LOC | .0746918 .0461836 1.62 0.114 -.018885 .1682686
FP | -.1915477 .1016172 -1.88 0.067 -.3974438 .0143484
selforg | .0960156 .1738346 0.55 0.584 -.2562067 .4482379
agilemeth | -.0120987 .012228 -0.99 0.329 -.0368749 .0126775
methused | .0240235 .0958956 0.25 0.804 -.1702795 .2183266
largestproject | -.0835862 .1029788 -0.81 0.422 -.2922411 .1250688
kindofproject | -.1777672 .1987665 -0.89 0.377 -.5805064 .224972
reqclarity | -.0869767 .1070006 -0.81 0.421 -.3037805 .129827
agilecopechangingreq | .127376 .0764227 1.67 0.104 -.0274712 .2822232
agileratherothermethodology | .0855318 .0735312 1.16 0.252 -.0634564 .2345201
competency | -.0812772 .0970277 -0.84 0.408 -.2778741 .1153196
documentationusedinagile | -.0190196 .1103519 -0.17 0.864 -.2426138 .2045746
agileeffectivness | .2465873 .0858929 2.87 0.007 .0725518 .4206228
AGILEtool | -.1178061 .0430454 -2.74 0.009 -.2050245 -.0305878
_cons | 3.78272 .9562477 3.96 0.000 1.845179 5.720262
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#15

07 Oct 2018, 05:35

Javed:
the advice to start a new thread with an informative title and use CODE delimiters (as recommended in my previous reply) still holds.
My first take about your (confused) outcome is that you have far too many predictors for such a scant number of observations (52) (see https://projecteuclid.org/euclid.aos/1176346793).
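
As a rough illustration of what that costs, the adjusted R-squared in the output can be reproduced by hand with the standard formula, using only the numbers reported in #14 (R-squared = 0.4451, 52 observations, 14 predictors):

```stata
* Adjusted R-squared = 1 - (1 - R2)*(n - 1)/(n - k - 1)
* with R2 = 0.4451, n = 52 observations, k = 14 predictors
display 1 - (1 - 0.4451)*(52 - 1)/(52 - 14 - 1)
```

The result is about 0.235, in line with the reported Adj R-squared of 0.2351 and well below the unadjusted 0.4451.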

Kind regards,
Carlo
(Stata 19.0)