  • Square root variable conversion

    Hi,

    I tried to convert my variable inactive_total into a variable that I could easily plot, as in this thread:
    https://www.statalist.org/forums/for...r-on-same-axes

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    
    clear
    
    input double(sqrt_inactive_pf sqrt_inactive_tk) float(inactive_total procedure)
    
    87.88155092047192                 . 7723.167 1
    
                    . 90.17482477630605 8131.499 2
    
    88.91053372808253                 . 7905.083 1
    
                    . 81.45755940595691 6635.334 2
    
    86.23080062577698                 . 7435.751 1
    
                    .                 .  7667.25 .
    
                    . 81.59197264437232  6657.25 2
    
    89.88325761786786                 .     8079 1
    
    89.20715774077493                 . 7957.917 1
    
                    . 86.89984469383418 7551.583 2
    
    82.84423330677579                 . 6863.167 1
    
    82.89852842858551                 . 6872.166 1
    
                    . 85.05634621384814 7234.582 2
    
    81.81330571604781                 . 6693.417 1
    
    88.98313898395303                 . 7917.999 1
    
    87.87348860720166                 .  7721.75 1
    
                    . 86.03633521113332 7402.251 2
    
    84.60644760632904                 . 7158.251 1
    
                    .  88.5009434536576 7832.417 2
    
    81.20857718130826                 . 6594.833 1
    
                    .    87.57996346197  7670.25 2
    
    86.5158888496067                 . 7484.999 1
    
    88.99063986945481                 . 7919.334 1
    
                    . 84.64583269238658 7164.917 2
    
                    . 77.71958559060258 6040.334 2
    
    80.86150510479322                 . 6538.583 1
    
                    . 82.38325072488946     6787 2
    
                    . 90.36731143816607 8166.251 2
    
                    . 86.28537544465458 7445.166 2
    
    86.8106157552894                 . 7536.083 1
    
    86.08280329681126                 . 7410.249 1
    
    end
    
    label values procedure procedurelbl
    
    label def procedurelbl 1 "PF (proximal femur", modify
    
    label def procedurelbl 2 "TK (Total Knee)", modify

    I generated the square root variables by using this code:

    Code:
    generate double sqrt_inactive_pf = sqrt(inactive_total) if procedure==1
    generate double sqrt_inactive_tk = sqrt(inactive_total) if procedure==2

    Does the output make sense?

    I ask because, some months ago, I generated another graph but hadn't saved the code.

    I got the following graph; as you can see, it's on a different scale:

    [Image: Screenshot 2025-09-08 at 19.12.32.png]

  • #2
    I'm honestly a bit confused by what this is asking. There are multiple issues:

    First, the code below is redundant:

    Code:
    generate double sqrt_inactive_pf = sqrt(inactive_total) if procedure==1
    generate double sqrt_inactive_tk = sqrt(inactive_total) if procedure==2
    Instead, you can just use one step:

    Code:
    gen sqrt_inactive = sqrt(inactive_total)
    graph box sqrt_inactive, over(procedure)
    Second, I am not sure whether you are (i) trying to recreate the first graph or (ii) asking whether taking the square root makes sense. I can address both briefly.
    • I'm pretty sure this graph has problems. A week has 7 * 24 * 60 = 10,080 minutes in total. The median on the so-called "square root scaled" mins/week axis is about 1,700, which squared is 2,890,000 minutes, or about 287 weeks' worth of time.
    • And I do not think it makes sense to take the square root of the total number of minutes of activity per week. If the variable is skewed, a box plot is a good start; if numbers in the thousands feel overwhelming, convert them to hours. Taking the square root of this variable only makes interpretation more difficult.
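
    The arithmetic in the first point is easy to check in Stata. A minimal sketch; the median of about 1,700 is read off the graph in #1 by eye, so the result is only approximate:

    Code:
```stata
* Minutes in one week
scalar mins_per_week = 7 * 24 * 60
display scalar(mins_per_week)

* Square the approximate median read off the graph in #1 (about 1,700 on its axis)
scalar implied_minutes = 1700^2
display %12.0fc scalar(implied_minutes)

* The same quantity expressed in weeks (roughly 287)
display scalar(implied_minutes) / scalar(mins_per_week)
```

    Converting to hours, as suggested in the second point, would then be something like generating inactive_total / 60 as a new variable (inactive_hours would be a hypothetical name for it).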
    Last edited by Ken Chui; 08 Sep 2025, 13:28.



    • #3
      I wanted to check whether the output of the square root function makes sense in relation to the dataex data provided. When I plotted it, I got a different result compared to the supplied graph.

      In my opinion, the supplied graph is incorrect, but since I hadn’t saved my code in a do-file, I couldn’t go back and verify exactly what went wrong.

      My question was simply to check that the data in the dataex matches what my code would have produced.




      • #4
        Originally posted by Tara Boyle
        I wanted to check whether the output of the square root function makes sense in relation to the dataex data provided. When I plotted it, I got a different result compared to the supplied graph.

        In my opinion, the supplied graph is incorrect, but since I hadn’t saved my code in a do-file, I couldn’t go back and verify exactly what went wrong.

        My question was simply to check that the data in the dataex matches what my code would have produced.
        Then probably "no"; otherwise the graph would show some outliers at the lower end, hovering around 85.
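
        To check the data example directly, one can recompute the square roots from inactive_total and compare them with the stored variables. A minimal sketch, shown on a few rows of the dataex in #1 (with the full data example loaded, skip the input step); check and stored are hypothetical variable names, and the tolerance of 1e-6 is an arbitrary choice that allows for inactive_total being stored as float:

        Code:
```stata
* A few rows of the data example in #1 (skip this step if the full dataex is loaded)
clear
input double(sqrt_inactive_pf sqrt_inactive_tk) float(inactive_total procedure)
87.88155092047192                 . 7723.167 1
                . 90.17482477630605 8131.499 2
88.91053372808253                 . 7905.083 1
                .                 .  7667.25 .
end

* Recompute the square root, and pick whichever stored variable applies to each row
generate double check = sqrt(inactive_total)
generate double stored = cond(procedure == 1, sqrt_inactive_pf, sqrt_inactive_tk)

* Runs silently if the stored values agree with sqrt(inactive_total)
assert reldif(check, stored) < 1e-6 if inlist(procedure, 1, 2)

* Values on this scale
summarize check
```

        If the assert runs silently on the full data, the dataex values are what the code in #1 would produce. Across the full data example these square roots lie roughly between 77 and 91, nowhere near the values around 1,700 on the axis of the graph in #1.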



        • #5
          Revisiting the linked thread

          https://www.statalist.org/forums/for...r-on-same-axes

          and in turn a previous thread

          https://www.statalist.org/forums/for...-of-this-graph

          helps resolve some but not all of the puzzles and even contradictions here.

          The graph in #1 I can't explain if you can't find the code. It's not on a square root scale. Nor does it seem to show the original data on any plausible scale. You don't need to show zero when all values are so far from zero. I agree with Ken Chui's comments generally.

          The code below repeats the code from #6 of 176181 with one extra detail and one crucial correction. At a minimum, it explains the square root scale that puzzled Ken.

          The four activity variables have units of minutes/day -- not minutes/week, for which I share the blame -- and indeed a check shows that they add to 1440 with some minor rounding error.
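
          That the four variables add to 1440 minutes, one day, can be checked along these lines. A minimal sketch using only the first two rows of the data example below; the tolerance of 0.01 minutes is an arbitrary allowance for the values being rounded to three decimal places:

          Code:
```stata
* First two rows of the data example below
clear
input float(inactive light_activity mod_to_vigorous_activity vigorous_activity)
1301.083 95.667 43.167 .083
    1345  72.25 22.583 .167
end

* Total minutes across the four activity categories
egen double total = rowtotal(inactive-vigorous_activity)

* A day has 24 * 60 = 1440 minutes; allow for rounding in the data
assert abs(total - 24 * 60) < 0.01
list total
```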

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float(inactive light_activity mod_to_vigorous_activity vigorous_activity procedure)
          1301.083  95.667  43.167  .083 1
              1345   72.25  22.583  .167 2
              1280 122.583  37.333  .083 1
          1082.417 192.667   164.5  .417 2
          1222.083   172.5  45.417     0 1
          1337.583  65.167   37.25     0 .
           1297.75 108.583  33.667     0 2
            1361.5  62.833      15  .667 1
          1302.917  99.833   37.25     0 1
            1238.5  125.75  75.083  .667 2
          1097.917 215.833 124.667 1.583 1
           1162.25 167.667 108.833  1.25 1
          1214.167 184.667  41.167     0 2
           1078.25 142.833 216.083 2.833 1
          1326.583  86.833  26.583     0 1
          1263.417 122.167  53.917    .5 1
              1368  44.583  26.417     1 2
          1185.667 159.667  93.667     1 1
            1294.5   122.5      23     0 2
            1308.5 113.083  18.417     0 1
          end

          rename inactive inactive_activity
          egen double total = rowtotal(*activity)
          su total
          gen id = _n if procedure < .
          reshape long @activity, string i(id) j(WHICH)
          replace WHICH = trim(subinstr(WHICH, "_", " ", .))
          replace WHICH = subinstr(WHICH, "mod", "moderate", .)
          label define which 1 inactive 2 light 3 "moderate to vigorous" 4 vigorous
          encode WHICH, gen(which) label(which)
          graph box activity, over(procedure) by(which, b1title(procedure) row(1) note(""))
          gen sqrt_activity = sqrt(activity)
          separate sqrt_activity, by(which) veryshortlabel
          graph box sqrt_activity?, over(procedure) nofill by(which, b1title(procedure) legend(off) row(1) note("")) ///
          yla(0 10 "100" 20 "400" 30 "900" 40 "1600") ytitle("Activity (min/day)" "square root scale")
          [Image: tara1.png]



          The rationale for a square root scale is just empirical: any display is dominated by the first variable's values otherwise, and there are some zeros, so a logarithmic scale is ruled out without some further complication such as adding an arbitrary constant.
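
          The contrast with a logarithmic scale, and the way the axis labels in the code above are derived, can be sketched as follows. Stata returns missing for ln(0), and a tick at position t on the square root scale is labelled with t^2 minutes; ticks is a hypothetical macro name:

          Code:
```stata
* Zeros are no problem on a square root scale...
display sqrt(0)
* ...but the logarithm of zero is undefined, so Stata returns missing
display ln(0)

* Tick positions 0, 10, ..., 40 on the square root scale, labelled with their squares (minutes)
local ticks
forvalues t = 0(10)40 {
    local ticks `ticks' `t' "`=`t'^2'"
}
display `"`ticks'"'
```

          Passing the macro as yla(`ticks') should reproduce the labels used above, since 0, 10, 20, 30 and 40 on the square root scale correspond to 0, 100, 400, 900 and 1600 minutes.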

          I note that inactive time in this example dataset varies from about 75% to about 95% of the total time. (Evidently it includes all sleep.)
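
          The percentages come from dividing the smallest and largest values of inactive in the data example, 1078.25 and 1368, by the 24 * 60 = 1440 minutes in a day. A minimal check:

          Code:
```stata
* Minutes in one day
scalar mins_per_day = 24 * 60

* Smallest and largest values of inactive in the data example, as % of a day
display 100 * 1078.25 / scalar(mins_per_day)
display 100 * 1368 / scalar(mins_per_day)
```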

          Now back to this thread. I first take the data example from #1 and tidy it up by making value labels consistent and removing the blank lines:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float(inactive_total procedure)
          7723.167 1
          8131.499 2
          7905.083 1
          6635.334 2
          7435.751 1
           7667.25 .
           6657.25 2
              8079 1
          7957.917 1
          7551.583 2
          6863.167 1
          6872.166 1
          7234.582 2
          6693.417 1
          7917.999 1
           7721.75 1
          7402.251 2
          7158.251 1
          7832.417 2
          6594.833 1
           7670.25 2
          7484.999 1
          7919.334 1
          7164.917 2
          6040.334 2
          6538.583 1
              6787 2
          8166.251 2
          7445.166 2
          7536.083 1
          7410.249 1
          end
          label values procedure procedurelbl
          label def procedurelbl 1 "PF (Proximal Femur)", modify
          label def procedurelbl 2 "TK (Total Knee)", modify
          Various comments now follow.

          The units are not minutes per day. Are they minutes per week? As a percent of the 10,080 minutes in a week, they vary from about 60% to 81%, so that is not impossible, but evidently the patients are, on the whole, different from those in the previous data example.
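
          The percentages above assume a week as the base. A minimal sketch of the check, using the smallest and largest values of inactive_total in the tidied data example (6040.334 and 8166.251); pct_week in the final comment is a hypothetical variable name:

          Code:
```stata
* Minutes in one week, the base assumed here
scalar mins_per_week = 7 * 24 * 60

* Smallest and largest values of inactive_total in the data example, as % of a week
display 100 * 6040.334 / scalar(mins_per_week)
display 100 * 8166.251 / scalar(mins_per_week)

* With the tidied data in memory, the same for every observation:
* generate pct_week = 100 * inactive_total / scalar(mins_per_week)
```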

          If you are focusing on this variable alone, zeros are not an issue and indeed it's not obvious that a transformed scale is needed at all.

          I'd start with a quantile plot. Here I use qplot from the Stata Journal.

          Code:
          egen mean = mean(inactive), by(procedure)
          bysort procedure (inactive) : gen x = cond(_n == 1, 0, cond(_n == _N, 1, .))
          
          qplot inactive, by(procedure, note("horizontal lines show means") legend(off)) ms(O) msize(large) xla(0 0.25 "0.25" 0.5 "0.5" 0.75 "0.75" 1) xtitle(Fraction of data) ytitle(Inactive total (minutes)) addplot(line mean x)
          [Image: tara2.png]
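
          For anyone without qplot installed, a rough substitute is to compute plotting positions directly. A minimal sketch, assuming the common convention (i - 0.5)/n for the i-th smallest of n values (see help qplot for the convention it actually uses); pp is a hypothetical variable name, and the four observations are a toy example, not data from this thread:

          Code:
```stata
* Toy example with n = 4 values: positions are 0.125, 0.375, 0.625, 0.875
clear
set obs 4
generate i = _n
generate pp = (i - 0.5) / _N
list i pp
```

          With the tidied data example in memory, the same calculation done within each procedure (sorting on inactive_total), followed by a scatter plot of inactive_total against pp with by(procedure), gives a display much like the quantile plot above.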



          If this were my problem, I would use a dual scale -- % of time and minutes per clearly stated base.
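
          One way to get such a dual scale is to give a single plot two y axes and label the second in percent. A minimal sketch, assuming the base is one week of 7 * 24 * 60 = 10,080 minutes (the thread has not confirmed the base); week and pctlabels are hypothetical macro names:

          Code:
```stata
* Base assumed here: one week, in minutes
local week = 7 * 24 * 60

* Axis 2 labels: positions (in minutes) of 60%, 70% and 80% of a week, labelled in %
local pctlabels
foreach p in 60 70 80 {
    local pos = `p' * `week' / 100
    local pctlabels `pctlabels' `pos' "`p'"
}
display `"`pctlabels'"'

* With the tidied data in memory, something like this would add the % axis on the right:
* scatter inactive_total procedure, yaxis(1 2) ylabel(`pctlabels', axis(2)) ytitle("% of week", axis(2))
```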

          For quantile plots and box plots, see e.g. https://www.statalist.org/forums/for...ercentile-sets
          Last edited by Nick Cox; 09 Sep 2025, 03:09.



          • #6
            I readily admit that square root scale may seem exotic or at least puzzling. (Some of us have experienced hostility or incomprehension over logarithmic scales....)

            But here I revisit the first data example in the previous post and just apply different scales to outcomes of quite different magnitude. (Naturally I have no idea whether these are all the data or just a small subset.)

            Code:
            egen median = median(activity), by(which procedure)
            
            scatter activity median procedure, by(which, legend(pos(12)) yrescale b1title(procedure) row(1) note("")) ms(o Dh) msize(medium large) xla(1 2) legend(order(2 3) row(1)) ytitle(minutes/day)
            As usual, whether using medians, means or some other summary is best, and whether box plots show enough detail to be ideal, remain open questions.
            [Image: taranew.png]
