Simple graphic question

Rasmus Tolstrup

Join Date: Sep 2016

Posts: 31
#1

13 Nov 2018, 10:00

Hi Statalisters,

This is probably a simple question. I have searched the "STATA GRAPHICS REFERENCE MANUAL RELEASE 15" for hours without finding an answer, so now I am asking you.

The y values are absolute percentage errors, so they cannot be negative, and I would like the y axis to stop at 0. I cannot manage that, probably because of the blue fitted line.

Here is my command:

Code:

twoway (scatter absolute_perc_error Truestep, mcolor(black) msize(vsmall)) ///
    (lfitci absolute_perc_error Truestep if rollator_1==1, range(. .) lcolor(red) clwidth(medium) fcolor(%40) alcolor(%0)) ///
    (lfitci absolute_perc_error Truestep if rollator_1==0, lcolor(blue) clwidth(medium) fcolor(%40) alcolor(%0)) ///
    if HM1_HG2_HU3_WM4_WG5_WU_6==1, ysc(r(0 100)) title(Hip-worn Misfit Shine) legend(off) saving(HM,replace)

Can you help me with this (probably) easy fix?

I thank you in advance
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

13 Nov 2018, 12:15

No; it's difficult as you have set up the problem. The fitted lines go negative and the graph must accommodate that.

The real problem is statistical. The model is a poor choice. It seems that your response variable must be positive (or just possibly zero) and so its mean must be positive in every part of the support. In these circumstances I'd fit a generalized linear model with log link or logit link (depending on whether in principle the response could exceed 100%).
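
A sketch of why the link matters, in notation that is not from the thread: write y for the response in percent, x for Truestep, and ignore the rollator indicator for simplicity. The three mean functions then behave as follows.

```latex
\begin{align*}
\text{straight line:}\quad & E(y \mid x) = \beta_0 + \beta_1 x,
  && \text{negative whenever } \beta_0 + \beta_1 x < 0,\\
\text{log link:}\quad & E(y \mid x) = \exp(\beta_0 + \beta_1 x),
  && \text{always} > 0,\\
\text{logit link:}\quad & E(y \mid x) = \frac{100}{1 + \exp\{-(\beta_0 + \beta_1 x)\}},
  && \text{always in } (0, 100).
\end{align*}
```

Only the straight line can predict a negative mean, which is what forces the y axis below 0 in the original graph.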
Rasmus Tolstrup

Join Date: Sep 2016

Posts: 31
#3

13 Nov 2018, 12:48

Hi Nick, thank you for your response.

Okay, I am sorry for asking. Why is the model a poor choice and how do I fit a GLM to my scatterplot?
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

13 Nov 2018, 12:54

You don't have to be sorry for asking. Expect frank comments if people think the analysis can be improved (greatly) as I do.

I explained. The principle has been described as respecting the range of the response. Also look at the pattern of points. A convex downward curve is surely a better fit than straight lines. If you post the data people can experiment.

Rasmus Tolstrup

Join Date: Sep 2016
Posts: 31

13 Nov 2018, 13:18

Okay. I can see that a convex downward curve could be a better fit.

Here is my data.

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double Truestep int Count double(difference absolute_perc_error) byte rollator_1
  229  50  -179  78.16593886462883 1
  294  32  -262   89.1156462585034 1
  313  99  -214    68.370607028754 0
315.5 595 279.5  88.58954041204437 0
  387  53  -334  86.30490956072352 1
  397 125  -272  68.51385390428212 0
  398 331   -67 16.834170854271356 0
  427 346   -81 18.969555035128806 1
  427  34  -393  92.03747072599532 0
  435 365   -70 16.091954022988507 1
  437 387   -50 11.441647597254006 1
  449 369   -80 17.817371937639198 1
  451 185  -266  58.98004434589801 1
  455 468    13  2.857142857142857 0
  458 349  -109 23.799126637554586 0
  465 372   -93                 20 1
  467 445   -22  4.710920770877944 1
  467 416   -51 10.920770877944326 1
  478  88  -390  81.58995815899581 1
  482  55  -427  88.58921161825725 1
  486 450   -36 7.4074074074074066 0
  489 457   -32    6.5439672801636 0
  490 498     8 1.6326530612244898 0
  498 511    13  2.610441767068273 1
  505 516    11  2.178217821782178 0
  506 447   -59   11.6600790513834 1
  510 462   -48  9.411764705882353 1
  513 466   -47  9.161793372319687 1
  517 512    -5  .9671179883945842 1
  518 472   -46  8.880308880308881 0
  520 493   -27 5.1923076923076925 1
  520 502   -18 3.4615384615384617 0
  521 517    -4  .7677543186180422 0
  527 555    28  5.313092979127135 0
  528 126  -402  76.13636363636364 1
  529 540    11 2.0793950850661624 0
  531 524    -7 1.3182674199623352 1
  532  94  -438  82.33082706766918 0
  533 551    18  3.377110694183865 1
  537 386  -151  28.11918063314711 1
  550 609    59 10.727272727272727 0
  551 580    29  5.263157894736842 0
  555 612    57  10.27027027027027 0
560.5 558  -2.5 .44603033006244425 0
  569 602    33  5.799648506151142 0
  570 477   -93 16.315789473684212 1
  574 576     2 .34843205574912894 1
  575 598    23                  4 1
  576 590    14  2.430555555555556 0
  581 541   -40  6.884681583476763 1
  582 554   -28  4.810996563573884 0
  584 562   -22  3.767123287671233 0
  586 593     7 1.1945392491467577 0
  589 523   -66 11.205432937181664 0
  589 600    11 1.8675721561969438 0
  589 583    -6 1.0186757215619695 0
  591 599     8  1.353637901861252 0
  592 579   -13  2.195945945945946 0
  593 604    11  1.854974704890388 1
  593 568   -25  4.215851602023609 0
  601 562   -39    6.4891846921797 1
  603 590   -13  2.155887230514096 0
  607 576   -31  5.107084019769357 0
  611 618     7 1.1456628477905073 0
  612 612     0                  0 0
  619 621     2 .32310177705977383 0
  624 672    48 7.6923076923076925 0
  624 590   -34  5.448717948717949 0
  630 646    16 2.5396825396825395 0
  632 626    -6   .949367088607595 0
  633 631    -2   .315955766192733 0
  635 665    30  4.724409448818897 0
  635 590   -45  7.086614173228346 1
  636 655    19 2.9874213836477987 0
  637 681    44  6.907378335949764 0
  638 620   -18 2.8213166144200628 1
  639 650    11 1.7214397496087637 1
  639 668    29  4.538341158059469 0
  640 640     0                  0 1
  641 670    29 4.5241809672386895 0
  642 649     7 1.0903426791277258 1
  643 652     9 1.3996889580093312 0
  645 657    12 1.8604651162790697 0
  648 649     1 .15432098765432098 0
  655 664     9 1.3740458015267176 0
  656 671    15 2.2865853658536586 1
  658 640   -18  2.735562310030395 0
  661 551  -110  16.64145234493192 0
664.5 659  -5.5  .8276899924755455 0
  669 679    10 1.4947683109118086 0
  674 709    35  5.192878338278932 1
  676 713    37  5.473372781065089 0
  677 694    17  2.511078286558346 0
  680 694    14 2.0588235294117645 0
  680 706    26  3.823529411764706 0
  689 724    35  5.079825834542816 0
  697 706     9  1.291248206599713 0
  706 682   -24   3.39943342776204 0
  767 803    36   4.69361147327249 0
  802 832    30 3.7406483790523692 0
end

------------------ copy up to and including the previous line ------------------

Listed 100 out of 202 observations
Use the count() option to list more


Nick Cox

Join Date: Mar 2014
Posts: 35698

13 Nov 2018, 15:06

Thanks for the data example. This is indicative, and certainly not definitive.

Code:

set scheme s1color
* rescale the percentage to a proportion in [0, 1] for the logit link
gen absolute_pr = absolute_perc_error/100
glm absolute_pr Truestep i.rollator_1, link(logit) family(binomial) vce(robust)
* predict after glm gives the fitted mean; convert it back to percent
predict logit
replace logit = logit * 100
* data shown as 0/1 marker labels; fitted curves: orange dashed (1), blue (0)
scatter absolute_perc_error Truestep if rollator_1, ms(none) mla(rollator_1) mlabpos(0) mlabc(orange) || ///
scatter absolute_perc_error Truestep if !rollator_1, ms(none) mla(rollator_1) mlabpos(0) mlabc(blue) || ///
mspline logit Truestep if rollator_1, lp(dash) lc(orange) sort || ///
mspline logit Truestep if rollator_1 == 0, sort lc(blue) ///
legend(order(3 "logit fit, rollator_1 == 1" 4 "logit fit, rollator_1 == 0") ring(0) pos(1) col(1)) ytitle(absolute_perc_error)

[Figure: logit_predictions.png]


Rasmus Tolstrup

Join Date: Sep 2016

Posts: 31
#7

14 Nov 2018, 11:33

Hi Nick, thank you for your help! These are the models for each device.

Thx
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

14 Nov 2018, 11:38

Seems to work quite well?
Rasmus Tolstrup

Join Date: Sep 2016

Posts: 31
#9

15 Nov 2018, 08:48

Yes, they work very well.

Do you know how to get the R squared value for each model?
Nick Cox

Join Date: Mar 2014

Posts: 35698
#10

15 Nov 2018, 08:52

Some resources:

https://www.stata.com/support/faqs/s...ics/r-squared/

. ssc desc glmcorr

-------------------------------------------------------------------------------
package glmcorr from http://fmwww.bc.edu/repec/bocode/g
-------------------------------------------------------------------------------

TITLE
'GLMCORR': module for correlation measure of predictive power for GLMs

DESCRIPTION/AUTHOR(S)

glmcorr calculates the correlation and the jackknifed correlation
between the response and the fitted or predicted response
together with the root mean square error after glm. Zheng and
Agresti (Statistics in Medicine 2000) discuss this correlation as
a general measure of predictive power for GLMs.

Distribution-Date: 20040804

Author: Nicholas J. Cox, University of Durham
Support: email [email protected]

INSTALLATION FILES (type net install glmcorr)
glmcorr.ado
glmcorr.hlp
-------------------------------------------------------------------------------
(type ssc install glmcorr to install)
Rasmus Tolstrup

Join Date: Sep 2016

Posts: 31
#11

15 Nov 2018, 10:01

Okay, thank you.

3. Why is R-squared not supplied?

If Stata refuses to give you an R-squared, there may be a good explanation other than that the developers never got around to implementing it. Perhaps the R-squared does not seem to be a good measure for this model, on some technical grounds. You have to consult the literature or an expert to take this further, unless you are an expert, in which case you probably disagree with the other experts.

So maybe I should not try to calculate the R2? I interpret this as R2 being valid only for linear models and not the best measure of goodness of fit for logit models. Is that correct?

Best regards
Nick Cox

Join Date: Mar 2014

Posts: 35698
#12

15 Nov 2018, 10:33

It's neither correct nor incorrect. It depends on how you think of R-square and what's fundamental in your conceptualisation (ugly word, but I think it's key). If you think it's in essence the square of the correlation between observed and predicted, you can always calculate it. If you think R-square only makes sense thought of in terms of sums of squared deviations, then that is utterly irrelevant to logit models as usually construed. What is a danger zone: there is no sense in which a generalized linear model is bound to maximize R-square, unless it happens to reduce to a linear regression. That's my understanding.
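
A minimal sketch of the first reading (the square of the correlation between observed and predicted), assuming the same model as in the earlier code. The ten observations are copied from the dataex listing above purely so the example runs on its own, and `mu_hat` is a made-up variable name; on the real data you would skip the `input` block.

```stata
* Sketch only: a few observations copied from the dataex listing in this thread
clear
input double Truestep double absolute_perc_error byte rollator_1
229 78.16593886462883 1
313 68.370607028754 0
398 16.834170854271356 0
427 18.969555035128806 1
467 4.710920770877944 1
505 2.178217821782178 0
551 5.263157894736842 0
593 1.854974704890388 1
640 0 1
689 5.079825834542816 0
end

* proportion in [0, 1], as in the earlier glm call
generate double absolute_pr = absolute_perc_error/100
glm absolute_pr Truestep i.rollator_1, link(logit) family(binomial) vce(robust)

* predict after glm gives the fitted mean (mu) by default
predict double mu_hat

* correlation between observed and fitted, then its square
correlate absolute_pr mu_hat
display "squared correlation = " %5.3f r(rho)^2
```

This is a descriptive summary only. As noted above, the model is not fitted by maximizing this quantity, so it should not be read like R-square from a linear regression. The glmcorr command mentioned earlier in the thread reports a correlation of this kind after glm.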
Rasmus Tolstrup

Join Date: Sep 2016

Posts: 31
#13

15 Nov 2018, 11:24

Okay, thank you again!

It is just a visualisation and only exploratory, so I won't include the R2 for this.

Best