
  • How to make categories using standardized variable and its standard deviation

    I am interested in making categories (small, medium, large) based on the log of the market value of equity (z_ln_mve).
    The variable (z_ln_mve) is standardized using these commands:
    Code:
    * mean() is an egen function, so the mean is stored in a variable first
    egen double mean_ln_mve = mean(ln_mve)
    gen z_ln_mve = ln_mve - mean_ln_mve
    I want the categories to be:
    small = less than or equal to -1 standard deviation of the market value variable (z_ln_mve)
    medium = greater than -1 and less than +1 standard deviation
    large = greater than or equal to +1 standard deviation
    The sample data are given below, where ln_mve is the log of market value and z_ln_mve is the standardized log of market value.

    Code:
    sum z_ln_mve

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
        z_ln_mve |     24,605    1.09e-06    2.304831  -6.974789   6.907081


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double ln_mve float z_ln_mve
    17.565108747112916  -3.331611
     17.40258981761514   -3.49413
      17.8344416808965  -3.062278
    17.688934097225264  -3.207786
    16.811052813465977  -4.085667
    17.639541341895686 -3.2571785
     17.57640830236685  -3.320312
     17.46979856730859 -3.4269214
    17.856760203852197   -3.03996
    17.865213339563255  -3.031507
    18.012122858133953  -2.884597
     18.10107034415045   -2.79565
    18.137627939884247  -2.759092
    17.875263675416754  -3.021456
    18.181522133441472  -2.715198
    17.993046711073138  -2.903673
     17.86689542574624 -3.0298245
    17.788252298427125  -3.108468
    17.692942118622803  -3.203778
    17.875263675416754  -3.021456
    17.753096041442546  -3.143624
    17.830963416520177 -3.0657566
    17.788252298427125  -3.108468
    17.764332114709475  -3.132388
     17.87025113359321  -3.026469
    17.955306383090292 -2.9414136
    18.029414355244015  -2.867306
     17.88356247823145 -3.0131576
     17.93980219655433  -2.956918
     17.93980219655433  -2.956918
    17.891792977367967  -3.004927
    17.875263675416754  -3.021456
     18.34004536068134  -2.556675
     18.86851544842704 -2.0282044
    18.907854862987072  -1.988865
    18.865424255857366 -2.0312958
    18.922582669697317 -1.9741373
     18.92841359000811 -1.9683064
    18.856717733197325 -2.0400023
     18.84667370651199 -2.0500462
     18.87773210353196  -2.018988
    18.959889722110127 -1.9368303
    18.869749254875966 -2.0269709
    18.589030143179436   -2.30769
    18.641506411047253 -2.2552135
     18.78419405288462  -2.112526
    19.458357612511254 -1.4383624
    20.288196825579668  -.6085232
    19.792186287598817 -1.1045337
      19.4210515607744 -1.4756684
     19.36027150947268 -1.5364485
    19.174546659547016 -1.7221733
    19.179081814712408  -1.717638
     19.00127493827298  -1.895445
    18.920240748252233 -1.9764793
    19.044127302629125 -1.8525927
     19.05032246957805 -1.8463975
     18.74376374345456 -2.1529562
     18.75765285561523 -2.1390672
    18.974431365376567 -1.9222887
    18.615612457460994 -2.2811074
    18.291778619711504 -2.6049414
    18.386089299182746 -2.5106306
     18.46859052069449 -2.4281294
     18.43487946335218 -2.4618406
    18.605224829099416  -2.291495
    18.636069504450514 -2.2606504
    18.648453563650236 -2.2482665
    18.481399478987072 -2.4153206
       18.335848082746  -2.560872
     18.28072878352492  -2.615991
     18.28072878352492  -2.615991
     18.28072878352492  -2.615991
    18.345267304662492 -2.5514526
     18.35563009169804   -2.54109
    18.386089299182746 -2.5106306
    18.258255927672863  -2.638464
     18.42049072590008  -2.476229
    18.481399478987072 -2.4153206
     18.43487946335218 -2.4618406
    18.258255927672863  -2.638464
    18.386089299182746 -2.5106306
    18.308127757713034  -2.588592
     18.39702923922108  -2.499691
     18.49941798448975  -2.397302
     18.43487946335218 -2.4618406
     18.49941798448975  -2.397302
     18.55160373766032 -2.3451164
     18.55160373766032 -2.3451164
    18.481399478987072 -2.4153206
    18.391076840693785  -2.505643
    18.391076840693785  -2.505643
    18.385088798849164  -2.511631
     18.33374281881054  -2.562977
     18.33374281881054  -2.562977
    22.984427670946943  2.0877078
    22.984427670946943  2.0877078
     22.99255779703019  2.0958378
    22.833960674210953  1.9372407
     22.92828914500774  2.0315692
    end
    Please provide me with the necessary commands for these operations.
    Thank you.

  • #2
    Zulfiqar:
    you may want to consider:
    Code:
    . sum z_ln_mve
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
        z_ln_mve |        100   -2.281381    1.132245  -4.085667   2.095838
    
    . g wanted=0 if z_ln_mve<=-r(sd)
    
    . replace wanted=1 if z_ln_mve>-r(sd) & z_ln_mve<r(sd)
    
    . replace wanted=2 if z_ln_mve>=r(sd) & z_ln_mve<.
    
    . label define wanted 0 Small 1 Medium 2 Large
    
    . label val wanted wanted
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      I guess #1 did not mean what it said, although Carlo Lazzaro helpfully gave what was asked for.
      z_ln_mve is already standardized. I imagine zulfiqar wants values less than mean - SD, more than mean + SD, and so on.
      Code:
      gen wanted = cond(z < -1, 1, cond(z >= 1, 3, 2)) if z < .
      will map standardized values less than -1 to 1, those more than or equal to 1 to 3 and others to 2. Then define and apply value labels.

      But why do this -- as it's throwing away useful detail? I don't understand the enthusiasm for coarse binning of measured variables.
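
      A minimal sketch of the labelling step mentioned above ("define and apply value labels"). One caveat, read from the -summarize- output in #1: z_ln_mve has an SD of about 2.30 in the full sample, so it appears to be mean-centred rather than scaled to SD 1. This sketch therefore standardizes from ln_mve first. The names z_std and size3 are hypothetical, and the cutoffs follow the wording in #1 (<= -1 is small, >= +1 is large).

      ```stata
      * Sketch only: z_std and size3 are hypothetical names, not from the thread.
      * Standardize ln_mve to mean 0 and SD 1 using the results stored by -summarize-.
      summarize ln_mve
      generate double z_std = (ln_mve - r(mean)) / r(sd)

      * Three size groups with cutoffs at -1 and +1 SD; missing values stay missing.
      generate byte size3 = cond(z_std <= -1, 1, cond(z_std >= 1, 3, 2)) if !missing(z_std)

      * Define the value labels once, attach them, and check the result.
      label define size3 1 "Small" 2 "Medium" 3 "Large"
      label values size3 size3
      tabulate size3, missing
      ```

      The -tabulate- call with the missing option is there to confirm that no observations were silently placed in the top group because of missing values.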



      • #4
        Thank you for the response.
        It worked, but in further steps I want to interact z_ln_mve with oil price returns (z_rBrent) to check whether size (z_ln_mve) moderates the relationship between z_rBrent and the dependent variable "X". This is problematic using dummy variables. Please explain.
        The relationship is like:
        Code:
        X = B1(z_ln_mve) + B2(z_rBrent) + B3[(z_ln_mve)*(z_rBrent)]
        The data sample is:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(z_ln_mve z_rBrent)
         -3.331611     .0829995
          -3.49413    .06447985
         -3.062278    .05643204
         -3.207786   -.15744352
         -4.085667   .005509714
        -3.2571785    .04209122
         -3.320312    -.0456385
        -3.4269214    .09851485
          -3.03996   .010852398
         -3.031507   .033469226
         -2.884597    .09852401
          -2.79565    .06467648
         -2.759092    .10219084
         -3.021456    .04923337
         -2.715198   .070861176
         -2.903673   -.07484613
        -3.0298245  -.036389347
         -3.108468    .03787261
         -3.203778   -.01562351
         -3.021456   -.11053196
         -3.143624   .064774975
        -3.0657566   .009422956
         -3.108468  -.028123755
         -3.132388    .03367486
         -3.026469     .1007651
        -2.9414136   .002490769
         -2.867306  -.027444176
        -3.0131576     -.158669
         -2.956918   -.04007411
         -2.956918   .070972376
         -3.004927    .08868664
         -3.021456  -.018512225
         -2.556675   -.03268437
        -2.0282044   .023707135
         -1.988865  -.000380628
        -2.0312958    .03988143
        -1.9741373  -.036056757
        -1.9683064   -.01158681
        -2.0400023   -.07136966
        -2.0500462    -.0188323
         -2.018988    .01817641
        -1.9368303    .05350817
        -2.0269709    .05763538
          -2.30769   -.05003607
        -2.2552135   .005026416
         -2.112526   .008478091
        -1.4383624   .010767369
         -.6085232    -.0398224
        -1.1045337     .0254831
        -1.4756684  -.011384546
        -1.5364485   .003571433
        -1.7221733   .013021928
         -1.717638    .02730451
         -1.895445   -.05738145
        -1.9764793  -.026357006
        -1.8525927   -.08547599
        -1.8463975    -.0969803
        -2.1529562   -.20138346
        -2.1390672   -.20111296
        -1.9222887   -.07802203
        -2.2811074     .1670413
        -2.6049414   -.12641574
        -2.5106306    .19277124
        -2.4281294   -.01773908
        -2.4618406   -.02981073
         -2.291495   -.19648337
        -2.2606504   .037182726
        -2.2482665   -.11217938
        -2.4153206    .02500307
         -2.560872    -.1045272
         -2.615991   -.17880227
         -2.615991   -.06986643
         -2.615991   .035492297
        -2.5514526    .09684266
          -2.54109    .19577536
        -2.5106306   .032596823
         -2.638464   .000497532
         -2.476229    -.1563412
        -2.4153206    .10313465
        -2.4618406    .04274454
         -2.638464  -.014913678
        -2.5106306    .04464634
         -2.588592    .11920808
         -2.499691   -.01920943
         -2.397302  -.001278018
        -2.4618406   -.05022532
         -2.397302  -.020342527
        -2.3451164   -.02713522
        -2.3451164   -.04797211
        -2.4153206    .09483209
         -2.505643 -.0044425996
         -2.505643    .09465432
         -2.511631    .06513956
         -2.562977   .035919346
         -2.562977    .05130757
         2.0877078     .0829995
         2.0877078    .06447985
         2.0958378    .05643204
         1.9372407   -.15744352
         2.0315692   .005509714
        end



        • #5
          Nick Cox Sir, I am actually interested in the size-moderated effect of oil prices on my dependent variable. I have already mentioned it in my reply to Carlo Lazzaro Sir; please check it. You are absolutely right: it results in the loss of many observations and the results are very confusing. What would be suitable so that it does not affect the number of observations?



          • #6
            #3
            Nick Cox Sir,
            I want to make three different categories on the basis of z_ln_mve (small, medium and large) and examine the impact of z_rBrent on the X variable for the different sizes.



            • #7
              #4 #5 #6 make the context a little clearer. You can be interested in the interaction of predictors without needing to degrade predictors to categorical.
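
              One way to keep both predictors continuous is factor-variable notation, sketched below. This is an illustration under assumptions, not a recommended model: X stands for the unnamed dependent variable from #4, plain OLS via -regress- is assumed because the thread does not say which estimator is appropriate, and the evaluation points (one SD below the mean, the mean, one SD above) are a choice made here.

              ```stata
              * Sketch only: X is the placeholder outcome from #4; OLS is an assumption.
              * Store the SD of the size variable before fitting the model.
              summarize z_ln_mve
              local lo = -r(sd)
              local hi = r(sd)

              * c.a##c.b includes both main effects and their product, as in the equation in #4.
              regress X c.z_ln_mve##c.z_rBrent

              * Slope of z_rBrent at low, average and high size (z_ln_mve has mean about 0).
              margins, dydx(z_rBrent) at(z_ln_mve = (`lo' 0 `hi'))
              ```

              Because no observation is recoded into a group, this uses every observation that has non-missing values on all three variables.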



              • #8
                Nick Cox, I am interested in both. First I am looking at the impact of the overall interaction of the predictors on the dependent variable, like
                Code:
                (z_ln_mve)*(z_rBrent)
                then at the impact of the interaction for the different categories based on z_ln_mve (size1, size2, size3) on the concerned variable, like:
                Code:
                (size1)*(z_rBrent)
                (size2)*(z_rBrent)
                (size3)*(z_rBrent)
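
                The category-by-return interactions listed above can be sketched with a single three-level variable instead of three hand-made dummies. The name sizegrp is hypothetical, X is the placeholder outcome from #4, OLS is an assumption, and the cutoffs are taken at plus and minus one SD of z_ln_mve as computed by -summarize-.

                ```stata
                * Sketch only: sizegrp is a hypothetical name; X is the placeholder outcome.
                * Cutoffs at -1 SD and +1 SD of z_ln_mve; missing values stay missing.
                summarize z_ln_mve
                generate byte sizegrp = cond(z_ln_mve <= -r(sd), 1, cond(z_ln_mve >= r(sd), 3, 2)) if !missing(z_ln_mve)
                label define sizegrp 1 "Small" 2 "Medium" 3 "Large"
                label values sizegrp sizegrp

                * i.sizegrp##c.z_rBrent fits group intercepts plus a separate z_rBrent slope per group.
                regress X i.sizegrp##c.z_rBrent

                * Report the z_rBrent slope within each size group.
                margins sizegrp, dydx(z_rBrent)
                ```

                Fitting one model on the full sample, rather than three separate regressions, keeps all observations in the estimation.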

