Reporting coefficients with four decimal places

John Adler

Join Date: Apr 2017
Posts: 173

26 Jun 2018, 12:09

I have what I think is quite a simple problem. I want to report the coefficient from a regression to 3 decimal places, but in one case this would be -0.000, and I don't know whether I should report that result or increase the number of decimal places I am using.

I have panel data on mothers across waves and in my analysis I run several regressions for several outcome variables. For the most part this appears as follows:

Code:

. * LPM: Linear Probability Model

. xtreg bin_moderate_ex_y unemployed_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord_age_y if has_y0_questionnaire==1 &  has_y5_questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire==1 | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1, cluster (current_county_y1) re robust

Random-effects GLS regression                   Number of obs     =        907
Group variable: id                              Number of groups  =        549

R-sq:                                           Obs per group:
     within  = 0.0059                                         min =          1
     between = 0.0455                                         avg =        1.7
     overall = 0.0338                                         max =          2

                                                Wald chi2(21)     =          .
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .

                                                                    (Std. Err. adjusted for 28 clusters in current_county_y1)
-----------------------------------------------------------------------------------------------------------------------------
                                                            |               Robust
                                          bin_moderate_ex_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------------+----------------------------------------------------------------
                               unemployed_y |   .0013806   .0106166     0.13   0.897    -.0194276    .0221888
                                                            |
                                            own_education_y |
                                     Some secondary school  |   .5404594   .0552997     9.77   0.000     .4320739    .6488449
                              Complete secondary education  |   .5676648   .0719571     7.89   0.000     .4266314    .7086981
    Some third level education at college, university, RTC  |   .4815662   .0918043     5.25   0.000     .3016332    .6614993
Complete third level education at college, university, RTC  |   .6093938   .0787441     7.74   0.000     .4550582    .7637295
                                                            |
                                            maritalstatus_y |
                                                Cohabiting  |  -.0327028   .0594565    -0.55   0.582    -.1492354    .0838298
                                                 Separated  |   .2635846      .0243    10.85   0.000     .2159575    .3112117
                                                  Divorced  |   -.273516   .2795078    -0.98   0.328    -.8213412    .2743092
                                                   Widowed  |   .2973203   .1757149     1.69   0.091    -.0470746    .6417152
                                      Single/Never married  |   .0112464   .0702769     0.16   0.873    -.1264938    .1489866
                                                            |
                                             medical_card_y |
                                                       Yes  |  -.0808472   .0429042    -1.88   0.060    -.1649379    .0032435
                                                            |
                                               employment_y |
                                                Unemployed  |   .0044467   .0967135     0.05   0.963    -.1851082    .1940017
  Unable to work owing to permanent sickness or disability  |  -.1092915   .1556065    -0.70   0.482    -.4142747    .1956917
                                         At school/student  |  -.0883897   .1619609    -0.55   0.585    -.4058272    .2290479
                           Seeking work for the first time  |  -.3878985   .1322674    -2.93   0.003    -.6471378   -.1286593
                                                  Employed  |  -.0172194   .0427601    -0.40   0.687    -.1010277    .0665889
                                             Self Employed  |   .0026582   .0751358     0.04   0.972    -.1446053    .1499217
                             Wholly retired from paid work  |  -.8036656   .0461205   -17.43   0.000    -.8940601    -.713271
                                                            |
                                                  ord_age_y |
                                                     20-23  |  -.0874091   .1138813    -0.77   0.443    -.3106123    .1357941
                                                     24-27  |  -.0415483   .1230809    -0.34   0.736    -.2827825    .1996859
                                                     28-32  |  -.0277465   .1284665    -0.22   0.829    -.2795362    .2240433
                                                      33 +  |  -.0214883   .1288702    -0.17   0.868    -.2740693    .2310927
                                                            |
                                                      _cons |   .1077595   .1387142     0.78   0.437    -.1641153    .3796343
------------------------------------------------------------+----------------------------------------------------------------
                                                    sigma_u |  .26838849
                                                    sigma_e |  .40199987
                                                        rho |  .30830992   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------------------------------------------

My interest in this analysis is the relationship between unemployment and health, so I report the coefficient on unemployed_y (.0013806) to 3 decimal places as 0.001.

This approach has been fine in all except one regression which is as follows:

Code:

. * LPM: Linear Probability model
. 
. xtreg bin_strenous_ex_y unemployed_y i.own_education_y i.maritalstatus_y i.medical_card_y i.employment_y i.ord
> _age_y if has_y0_questionnaire==1 &  has_y5_questionnaire==1 | has_y0_questionnaire==1 & has_y10_questionnaire==1 | has_y0_que
> stionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1, cluster (current_county_y1) re robust

Random-effects GLS regression                   Number of obs     =        845
Group variable: id                              Number of groups  =        534

R-sq:                                           Obs per group:
     within  = 0.0128                                         min =          1
     between = 0.0431                                         avg =        1.6
     overall = 0.0376                                         max =          2

                                                Wald chi2(21)     =          .
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .

                                                                    (Std. Err. adjusted for 28 clusters in current_county_y1)
-----------------------------------------------------------------------------------------------------------------------------
                                                            |               Robust
                                          bin_strenous_ex_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------------+----------------------------------------------------------------
                               unemployed_y |  -.0004498    .007705    -0.06   0.953    -.0155512    .0146517
                                                            |
                                            own_education_y |
                                     Some secondary school  |    .202266   .0499905     4.05   0.000     .1042864    .3002456
                              Complete secondary education  |   .2243809   .0565841     3.97   0.000     .1134782    .3352836
    Some third level education at college, university, RTC  |   .2616223   .0763402     3.43   0.001     .1119982    .4112464
Complete third level education at college, university, RTC  |   .3276807   .0533222     6.15   0.000     .2231711    .4321902
                                                            |
                                            maritalstatus_y |
                                                Cohabiting  |  -.0058312   .0691985    -0.08   0.933    -.1414578    .1297954
                                                 Separated  |   .1906868   .1619257     1.18   0.239    -.1266817    .5080553
                                                  Divorced  |  -.1970662   .0509941    -3.86   0.000    -.2970129   -.0971196
                                                   Widowed  |   -.089271   .1622361    -0.55   0.582     -.407248    .2287059
                                      Single/Never married  |   .0790173   .0845919     0.93   0.350    -.0867797    .2448143
                                                            |
                                             medical_card_y |
                                                       Yes  |    .006966   .0395142     0.18   0.860    -.0704805    .0844124
                                                            |
                                               employment_y |
                                                Unemployed  |  -.0128092   .0750195    -0.17   0.864    -.1598446    .1342262
  Unable to work owing to permanent sickness or disability  |  -.3152528   .0300622   -10.49   0.000    -.3741736   -.2563319
                                         At school/student  |  -.1386874    .106164    -1.31   0.191     -.346765    .0693903
                           Seeking work for the first time  |  -.0341262   .0552043    -0.62   0.536    -.1423246    .0740721
                                                  Employed  |  -.0501776    .032376    -1.55   0.121    -.1136333    .0132781
                                             Self Employed  |    .115105   .0463495     2.48   0.013     .0242618    .2059483
                             Wholly retired from paid work  |  -.2090443   .0397977    -5.25   0.000    -.2870463   -.1310422
                                                            |
                                                  ord_age_y |
                                                     20-23  |  -.1228133   .2341136    -0.52   0.600    -.5816675     .336041
                                                     24-27  |  -.1311621   .1730877    -0.76   0.449    -.4704078    .2080835
                                                     28-32  |  -.0779729   .1748741    -0.45   0.656    -.4207198     .264774
                                                      33 +  |  -.1150126   .1751525    -0.66   0.511    -.4583052    .2282801
                                                            |
                                                      _cons |    .114522   .1798594     0.64   0.524    -.2379959    .4670398
------------------------------------------------------------+----------------------------------------------------------------
                                                    sigma_u |  .26173946
                                                    sigma_e |  .34170305
                                                        rho |  .36977434   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------------------------------------------

Again, as my interest in this analysis is the relationship between unemployment and health, I want to report the coefficient on unemployed_y (-.0004498) to 3 decimal places, as with my other coefficients. However, this would be -0.000.

Would it be acceptable to report -0.000 in the results table of a journal article, or would it be better to report 4 decimal places, i.e. -0.0004? What would a coefficient of -0.000 even imply? I know that this coefficient is not statistically significant, but how would I describe -0.000 as an effect of unemployment on health?
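For readers comparing the two display choices, here is a minimal sketch of how the rounding plays out in Stata. It uses the auto dataset that ships with Stata rather than the panel data above, and the model is purely illustrative; cformat() is the standard display option for the coefficient table.

```stata
* Show how the same number prints at 3 and at 4 decimal places
display %9.3f -.0004498
display %9.4f -.0004498

* Illustrative only: the auto data stand in for the panel data in this thread
sysuse auto, clear

* Coefficients, standard errors and confidence limits shown to 3 decimals
regress price weight mpg, cformat(%9.3f)

* Replay the same fit, this time shown to 4 decimals
regress, cformat(%9.4f)
```

The estimates themselves are unchanged by cformat(); only the displayed precision differs.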

I apologize for what is a seemingly simple question, but I am unsure as to what is the accepted norm for reporting in journal articles in this situation so any support would be greatly appreciated.

Tags: coefficients, panel data, regression, results, syntax

Richard Williams

Join Date: Apr 2014

Posts: 5025
#2

26 Jun 2018, 16:02

I don’t know how your variables are measured but often it is a good idea to rescale them, e.g. use income in thousands of dollars rather than dollars. That will shift the decimal point and make things easier to read.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Richard Williams

Join Date: Apr 2014

Posts: 5025
#3

26 Jun 2018, 16:05

Also leaving a variable as is isn’t so bad if it is not statistically significant. What looks weird is if you have something like -.0000***.

John Adler

Join Date: Apr 2017

Posts: 173
#4

26 Jun 2018, 16:32

Dear Richard,

Thank you for your response (I am a big fan of your panel work by the way),

My outcome variable here is binary. In the first regression reported, it is whether the respondent usually engages in 20 minutes of moderate exercise during a typical week (yes or no).

As this is a linear probability model, for the first regression I would report that a one percentage point (pp) increase in unemployment increases the probability of engaging in moderate exercise in a typical week by a statistically insignificant 0.1 percentage points, since I report the coefficient on unemployed_y (.0013806) to 3 decimal places as 0.001.

For the second regression my outcome variable is still binary: it is whether the respondent usually engages in 20 minutes of strenuous exercise during a typical week (yes or no).

But rounding the coefficient on unemployed_y (-.0004498) to 3 decimal places in the second regression gives -0.000, and I am not really sure what to do with that. As it is not statistically significant, is it all right to report it as -0.000 in my results table? Or would you expect this to raise questions as being a weird result?
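As a quick check on the percentage-point reading, the two coefficients from post #1 can be converted by hand. This is only arithmetic on the reported estimates, assuming the usual linear probability model reading in which a coefficient times 100 is a change in percentage points.

```stata
* Convert LPM coefficients (probability scale) to percentage points
* Moderate exercise: coefficient on unemployed_y from the first table
display %6.3f 100 * (.0013806)

* Strenuous exercise: coefficient on unemployed_y from the second table
display %6.3f 100 * (-.0004498)
```

On that reading the estimates are roughly 0.14 and -0.045 percentage points per one-point rise in local unemployment, both far smaller than their standard errors.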

I am not sure how I could rescale these as they are binary outcome variables in a linear probability model.

All the best,

John
Richard Williams

Join Date: Apr 2014

Posts: 5025
#5

26 Jun 2018, 17:27

Don't rescale the binary dependent variable. Rescale the independent variable that is causing the aesthetic output problems. For example, if income is measured in dollars,

gen xincome = income/1000

would create a new variable measured in thousands of dollars. If you then used xincome, its coefficient would be multiplied by 1000. Substantively, nothing would change, but the aesthetics and the interpretation could be easier.
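A minimal sketch of that point, using the auto dataset shipped with Stata as a stand-in (weight plays the role of income here, and weight_k is a name made up for the illustration):

```stata
sysuse auto, clear

* Original scale: weight in pounds
regress price weight
scalar b_lbs = _b[weight]

* Rescaled regressor: weight in thousands of pounds (hypothetical variable name)
generate double weight_k = weight/1000
regress price weight_k
scalar b_klbs = _b[weight_k]

* The coefficient is 1000 times larger; the fit and test statistics are unchanged
display b_lbs*1000
display b_klbs
assert reldif(b_lbs*1000, b_klbs) < 1e-6
```

The t statistic and p-value for the rescaled variable should match the original regression, since the standard error is rescaled by the same factor.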

In my example, rescaling makes substantive sense. But in other cases, the scaling of the rescaled variable may seem weird, in which case you might not want to do it.

If the coefficients are clearly insignificant, reporting .0000 may be fine. If they were significant at the .06 level but not .05, all those 0s might bother me.

William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

26 Jun 2018, 17:35

So is unemployment not a characteristic of the individual? Before reading post #4 I had assumed it too was a 0/1 variable. Naively, if you divide unemployment by 1000 you will multiply the coefficient estimate by 1000. But if unemployment is a 0/1 variable, a 1% increase in unemployment is meaningless, is it not? In that case, you probably want to use the margins command to report something more realistic.
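For the 0/1 case, a rough sketch of what the margins approach might look like, again on the auto dataset. The variable heavy is invented for the example, and the model is not meant to resemble the one in post #1.

```stata
sysuse auto, clear

* Hypothetical 0/1 regressor, created only for this illustration
generate byte heavy = weight > 3000

* Linear probability model for the 0/1 outcome foreign
regress foreign i.heavy mpg, vce(robust)

* Predicted probability at heavy = 0 and at heavy = 1
margins heavy

* Difference in predicted probability between the two groups
margins, dydx(heavy)
```

Reporting the two predicted probabilities, or their difference, avoids describing a "1% increase" in a variable that can only be 0 or 1.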

So I guess what I'm saying is that we need to know somewhat more about your independent variable before we can recommend a strategy for rescaling or redirecting your reporting.
John Adler

Join Date: Apr 2017

Posts: 173
#7

27 Jun 2018, 03:42

Dear all,

Thank you for your feedback. Of course I should have been clearer about my independent variable: unemployment in my analysis is the percentage of the labor market in the respondent's local area that is unemployed. The respondent's local area is defined as the county they live in, but you could also think of this as the State they live in if this were an American dataset.

It is recorded at the beginning of each wave and can be described as follows:

Code:

. sum unemployed_total_2002

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 unemp_~2002 |      2,132    7.781173    1.835612       5.71       15.6

. sum unemployed_total_2006

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 unemp_~2006 |      2,132     7.54015    1.849402       5.41      12.94

. sum unemployed_total_2011

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 unemp_~2011 |      2,132    16.86173    3.456228      11.23      26.15

I also attach a screenshot of my output to further inform our discussion:

I was thinking that I could probably drop the minus sign when rounding my original coefficient on strenuous exercise (-.0004498), reporting 0.000 rather than -0.000, as 0.000 generally wouldn't be negative anyway.

In my article I describe these results as follows:

Table 2 describes the results of the initial random effects regressions; the estimation sample comprises mothers who appeared in at least 2 waves. Coefficients reflect the change in the outcome that is associated with a unit increase in percentage unemployed at the local area level. A one percentage point (pp) increase in unemployment increases the probability of being obese by a statistically significant 0.6 percentage points in waves 1, 2 and 3, 0.6 percentage points in waves 2 and 3, and 0.8 percentage points in waves 1 and 3. Overweight is similarly positive and significant when analysed in waves 1 & 3, where a one-unit increase in local area unemployment is associated with a 1.6 percentage point increase in the probability of mothers being overweight, but of insignificant effect in other waves. Although self-reporting BMI in wave 1 suggests that these results should be interpreted with caution, analysis of objectively measured BMI only (i.e. results for waves 2 and 3) supports the magnitude and direction of effect seen across other waves. For objectively measured BMI in waves 2 and 3 each increase in local area unemployment causes the obesity probability to increase by 0.6 percentage points, as above.

A one percentage point increase in unemployment is associated with a statistically significant 1.8 percentage point reduction in the probability of at least 20 minutes of mild exercise per week. By contrast, the unemployment coefficient is positive for moderate exercise, and then negative again for strenuous exercise. Since these estimates are not statistically significant, this may reflect chance variation (Ruhm, 2000). Tobacco consumption is also negatively associated with increases in unemployment. A one-unit increase in unemployment lowers the probability of being a smoker by a statistically significant 2.5 percentage points, of self-reporting being a regular (i.e. frequent) smoker by 2.4 percentage points and of consuming more than the national daily average of cigarettes by 1.2 percentage points. The unemployment coefficient is negative for alcohol use, but not statistically significant.

The association between increased levels of unemployment and mental well-being appears to be negative, with both measures of mental well-being suggesting that as unemployment rises, the probability of poor mental well-being falls (a statistically significant decrease of 2.1 percentage points for the GHQ-12 linked to SWEMWBS and 0.9 percentage points for the CES-D linked to the SWEMWBS). Self-rated health is statistically insignificant across all waves bar 2 and 3, where an additional unit of unemployment increases the probability of self-reporting excellent or very good health by 0.4 percentage points.

As I am happy with how my other coefficients look, would it be possible to rescale just this one coefficient or would I be better to rescale all?

Rather than complicate things too much, if a coefficient of 0.000 is not likely to raise eyebrows on review of this article I would be happiest to take the simplest approach and leave it as is. Particularly as I have significant tasks ahead of me in dealing with attrition in this paper.

All the very best,

John
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

27 Jun 2018, 11:19

A few random thoughts that do not address rounding.

In post #1 we see as part of your xtreg command (broken onto multiple lines for clarity)

Code:

    ///
    if has_y0_questionnaire==1 & has_y5_questionnaire==1 ///
    | has_y0_questionnaire==1 & has_y10_questionnaire==1 ///
    | has_y0_questionnaire==1 & has_y5_questionnaire==1 & has_y10_questionnaire==1

In post #7 you write

the estimation sample comprises mothers who appeared in at least 2 waves.

For that to be true, I would have expected the if clause to include

Code:

| has_y5_questionnaire==1 & has_y10_questionnaire==1 ///

And if that is the case, the if clause could be rewritten as

Code:

if (has_y0_questionnaire==1) + (has_y5_questionnaire==1) + (has_y10_questionnaire==1) >=2

or if the three variables in question take only 0/1 values

Code:

if has_y0_questionnaire + has_y5_questionnaire + has_y10_questionnaire >=2
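A small sketch that checks these forms agree, assuming the three indicators only ever take the values 0 and 1. The names y0, y5 and y10 are short stand-ins for the has_*_questionnaire variables.

```stata
clear
set obs 8

* All 8 combinations of three 0/1 indicators (stand-ins for has_*_questionnaire)
generate byte y0  = mod(_n - 1, 2)
generate byte y5  = mod(floor((_n - 1)/2), 2)
generate byte y10 = mod(floor((_n - 1)/4), 2)

* "At least two waves" written three ways
generate byte pairwise = (y0==1 & y5==1) | (y0==1 & y10==1) | (y5==1 & y10==1)
generate byte counted  = (y0==1) + (y5==1) + (y10==1) >= 2
generate byte summed   = y0 + y5 + y10 >= 2

* Both shorter forms match the spelled-out version on every combination
assert pairwise == counted
assert pairwise == summed
list, noobs
```

If any indicator can be missing, the summed form behaves differently from the ==1 form, so the first rewrite is the safer of the two.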

I am also concerned that you have a categorical variable for the amount of exercise - strenuous, moderate, mild, and I'm assuming "no exercise" as the omitted category - but you model each category separately, ignoring their interrelationship. That seems more open to criticism than the presentation of .000 as a coefficient estimate for a coefficient that fails to be significant. As presented in your table, it is clear that the standard error dwarfs the estimate, and the t statistic is essentially 0. Increased accuracy in the numerator of that t statistic (the estimate) doesn't add anything.
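The last point can be checked by hand from the numbers in post #1. The sketch below only redoes the arithmetic on the reported estimate and standard error; the scalar names are arbitrary.

```stata
* Values copied from the unemployed_y row of the strenuous-exercise table
scalar coef   = -.0004498
scalar stderr = .007705

* The test statistic is the estimate divided by its standard error
scalar z = coef/stderr
display "z = " %6.2f z

* Two-sided p-value from the standard normal distribution
display "p = " %5.3f 2*normal(-abs(z))
```

These should reproduce the -0.06 and 0.953 shown in the output, and adding a fourth decimal place to the estimate would not change either figure in any way that matters.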
John Adler

Join Date: Apr 2017

Posts: 173
#9

28 Jun 2018, 03:00

Dear William,

Thank you for your response and for your feedback on other parts of my analysis, I very much appreciate you sharing your views on this piece.

I think that is a very elegant approach to inclusion, and certainly something I will consider employing in the future, the more parsimonious the .do file is the better.

Although it seems logical to include

Code:

| has_y5_questionnaire==1 & has_y10_questionnaire==1 ///

I was not entirely clear when first describing my sample's inclusion criterion: I want to include only those mothers who were observed in year 0 as a baseline and then in at least one other wave. I had a number of mothers in year 5 and year 10 but not year 0, and I purposely exclude these mothers from my analysis.

In terms of exercise, each variable was measured in the questionnaire as follows, the respondent was asked in question 1

"Considering a 7-day period (a week), how many times on average do you do the following kinds of exercise for more than 20 minutes during your free/leisure time? (Strenuous exercise, number of times per week)"

They would answer this question with a number, 0 would mean that they didn't engage in strenuous exercise for more than 20 minutes in a typical week and anything greater than zero would mean that they engaged in strenuous exercise for more than 20 minutes that many times in a typical week, i.e. 3 = they engaged in strenuous exercise for more than 20 minutes 3 times in an average week.

After completing this the respondent would move on to question 2 as follows:

"Considering a 7-day period (a week), how many times on average do you do the following kinds of exercise for more than 20 minutes during your free/leisure time? (Moderate exercise, number of times per week)"

They would answer this question with a number, 0 would mean that they didn't engage in moderate exercise for more than 20 minutes in a typical week and anything greater than zero would mean that they engaged in moderate exercise for more than 20 minutes that many times in a typical week.

Finally, the respondent would move on to question 3 as follows:

"Considering a 7-day period (a week), how many times on average do you do the following kinds of exercise for more than 20 minutes during your free/leisure time? (Mild exercise, number of times per week)"

They would answer this question with a number, 0 would mean that they didn't engage in mild exercise for more than 20 minutes in a typical week and anything greater than zero would mean that they engaged in mild exercise for more than 20 minutes that many times in a typical week.

There is another question in the questionnaire that the respondent can answer that is equal to 1 if they engage in any physical activity enough to build up a sweat and 0 if they engage in no physical activity to build up a sweat, at least once a week, but this is a separate variable again, recorded as its own answer to a separate question in the questionnaire.

Based on the way that this data was recorded, mild, moderate and strenuous exercise, each recorded as a separate variable, I thought it ok to model each variable separately, do you feel this approach was incorrect? What approach would be best? It is worth noting that none of these exercise variables exclude the other, a respondent could report engaging in mild exercise 0 times in a typical week but engaging in moderate exercise 2 times a week and strenuous exercise 3 times a week. Which is reasonable when you consider someone whose exercise regimen is a certain amount of high intensity training at the gym, coupled with a leisurely cycle to and from this gym, every week.

In terms of the presentation of .000 as a coefficient estimate for a coefficient that fails to be significant, my biggest concern is that this is ok as presented in my table, i.e. that presenting .000 as a coefficient estimate is not inappropriate. I am an early stage researcher and my concern is for doing something that is generally not the "done thing" and thus might look "weird" in an article submitted to a journal for review.

Again I appreciate your input,

Kindest regards,

John
William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

28 Jun 2018, 05:18

The tips on writing the compound if clause were because it took me a while to sort out that the if clause was inconsistent with the description in the note on the table. I now understand that the note is where the inconsistency needs to be corrected. But in general, my subsequent comments on the syntax were directed at making the code clearer so that it will be clear to you when you return to this code as you revise and resubmit. Even splitting the pieces of the if clause onto separate lines makes its meaning clearer. With my new understanding of your criteria I would write

Code:

if has_y0_questionnaire & ( has_y5_questionnaire | has_y10_questionnaire )

since you don't need to separately check for having both questionnaires.
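A sketch that checks the shorter clause against the original one, again with y0, y5 and y10 as stand-ins for the three indicators, and with one caveat about missing values that is an addition here rather than something raised in the thread:

```stata
clear
set obs 8

* All 8 combinations of three 0/1 indicators (stand-ins for has_*_questionnaire)
generate byte y0  = mod(_n - 1, 2)
generate byte y5  = mod(floor((_n - 1)/2), 2)
generate byte y10 = mod(floor((_n - 1)/4), 2)

* Original clause from post #1 and the simplified clause
generate byte original   = (y0==1 & y5==1) | (y0==1 & y10==1) | (y0==1 & y5==1 & y10==1)
generate byte simplified = y0 & (y5 | y10)
assert original == simplified

* Caveat: without ==1, a missing value is nonzero and so counts as true
scalar m = .
display (m & 1)
display (m==1 & 1)
```

So the two clauses select the same observations as long as the indicators are strictly 0/1 with no missing values; if missing values are possible, keeping the ==1 tests is the more cautious choice.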

I understand your exercise variables now. The fact that they are not mutually exclusive 0/1 indicators resolves my concern.

With regard to the -0.000, I would include that as-is.

And a general note: no matter what you do, the reviewers will find something to be critical about. If it's only tabular presentation, that's easy to deal with downstream, the equivalent of a typo. And if criticizing the tabular presentation distracts them from criticizing the methodology, you're ahead of the game. Good luck!