  • How to rescale a variable by range

    Hello Statalist members,
    I want to rescale my continuous variables (x, y) by their range (maximum - minimum), because my other variables are dichotomous. The rescaled values should have a maximum range of 1. Variable x is scaled by the range of x by code and year, and variable y by the range of y in the focal company.
    My data set is at the individual level, i.e. age, tenure, race.
    Could you please guide me on how to do this?
    Best regards.

  • #2
    Here is one technique, but I really doubt whether this makes sense for race or even tenure.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . su
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
            make |          0
           price |         74    6165.257    2949.496       3291      15906
             mpg |         74     21.2973    5.785503         12         41
           rep78 |         69    3.405797    .9899323          1          5
        headroom |         74    2.993243    .8459948        1.5          5
    -------------+---------------------------------------------------------
           trunk |         74    13.75676    4.277404          5         23
          weight |         74    3019.459    777.1936       1760       4840
          length |         74    187.9324    22.26634        142        233
            turn |         74    39.64865    4.399354         31         51
    displacement |         74    197.2973    91.83722         79        425
    -------------+---------------------------------------------------------
      gear_ratio |         74    3.014865    .4562871       2.19       3.89
         foreign |         74    .2972973    .4601885          0          1
    
    * this is the core part 
    foreach v of var price-gear {
        su `v', meanonly
        gen `v'_s = (`v' - r(min)) / (r(max) - r(min))
        label var `v'_s "`: var label `v'', scaled"
    }
    
    
    . su *_s
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
         price_s |         74    .2278444    .2338086          0          1
           mpg_s |         74    .3205965    .1995001          0          1
         rep78_s |         69    .6014493    .2474831          0          1
      headroom_s |         74    .4266409    .2417128          0          1
         trunk_s |         74    .4864865    .2376336          0          1
    -------------+---------------------------------------------------------
        weight_s |         74    .4089154    .2523356          0          1
        length_s |         74     .504752    .2446851          0          1
          turn_s |         74    .4324324    .2199677          0          1
    displaceme~s |         74    .3418997    .2654255          0          1
    gear_ratio_s |         74    .4852146    .2684042          0          1
    
    . d *_s
    
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------------------------------
    price_s         float   %9.0g                 Price, scaled
    mpg_s           float   %9.0g                 Mileage (mpg), scaled
    rep78_s         float   %9.0g                 Repair Record 1978, scaled
    headroom_s      float   %9.0g                 Headroom (in.), scaled
    trunk_s         float   %9.0g                 Trunk space (cu. ft.), scaled
    weight_s        float   %9.0g                 Weight (lbs.), scaled
    length_s        float   %9.0g                 Length (in.), scaled
    turn_s          float   %9.0g                 Turn Circle (ft.) , scaled
    displacement_s  float   %9.0g                 Displacement (cu. in.), scaled
    gear_ratio_s    float   %9.0g                 Gear Ratio, scaled
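
    The loop above scales each variable over the whole sample, whereas the question asks for scaling within groups (x by code and year). A minimal sketch of a by-group version follows, assuming the group minimum and maximum are what is wanted. Here foreign from the auto data stands in for the poster's code and year variables, and price_min, price_max and price_gs are hypothetical names.

    ```stata
    sysuse auto, clear

    * foreign is a stand-in for the poster's grouping variables (code year)
    bysort foreign: egen double price_min = min(price)
    by foreign: egen double price_max = max(price)

    * within each group the result runs from 0 to 1
    gen double price_gs = (price - price_min) / (price_max - price_min)
    label var price_gs "Price, scaled within foreign"

    drop price_min price_max
    bysort foreign: su price_gs
    ```

    A group in which all values are equal has range 0, so the division returns missing for that group.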


    • #3
      Thank you so much, sir. It works for my continuous variables.


      • #4
        I think you might be interested in -nscale-, an SSC command that does exactly what you want. It uses the same technique described in #2, but also supports options for dealing with missing values and can rescale and reverse-code in one step. Try
        Code:
        ssc describe nscale

        For these kinds of tasks (managing variables with simple criteria), this post might be useful: https://blog.stata.com/2018/10/09/ho...-common-tasks/
        Last edited by JeongHoon Min; 30 Jan 2020, 15:29.


        • #5
          JeongHoon Min, thank you so much for your kind reply.
          Normally we standardize (normalize) a variable by its z-score, giving mean 0 and SD 1, using an egen function. But in this case the paper I am following used the range (maximum - minimum) for scaling, in R. Furthermore, they standardized nominal variables by SD (1/square root of 2) and continuous variables by their range, so as to give equal weight to all variables. For example, age is a continuous variable, so we divide it by the range of age, and then we use weights for all variables. I don't know the technical reason: in standardization we divide by the standard deviation, but in my case I want to use the range. As Nick Cox suggested above, I will use his code for scaling, or I will wait for other suggestions.
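
          For comparison, the two scalings mentioned here could be sketched as follows, using price from the auto data as a stand-in for a continuous variable such as age. This is only an illustration, not the procedure of the paper being followed. price_z and price_r are hypothetical names.

          ```stata
          sysuse auto, clear

          * z-score: (price - mean) / SD, giving mean 0 and SD 1
          egen double price_z = std(price)

          * divide by the range only: the result has range 1, but its minimum is not 0
          su price, meanonly
          gen double price_r = price / (r(max) - r(min))

          su price_z price_r
          ```

          Dividing by the range alone differs from the code in #2, which also subtracts the minimum first so that the result lies in [0, 1].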


          • #6
            Hi Nick,
            Thank you for your code for min-max scaling.

            I was trying to convert the variables back to the original scale, because we used imputation for some of the original values, which is why I need the inverted data. It should be simple, but I only get the same values back.

            I used the code below.

            Thank you,
            Anne


            *Code from post #2:
            sysuse auto, clear
            summarize
            brow
            foreach v of var price-gear {
                su `v', meanonly
                gen `v'_s = (`v' - r(min)) / (r(max) - r(min))
                label var `v'_s "`: var label `v'', scaled"
            }
            su *_s
            d *_s


            *My attempt to invert the variables back to the original scale:
            foreach v of var price_s-gear_ratio_s {
                su `v', meanonly
                gen n_`v' = `v'*(r(max) - r(min))+ r(min)
            }
            Last edited by Anne Ahrens; 30 Nov 2023, 01:49.


            • #7
              #6 Anne Ahrens

              Your *_s variables were all created to have bounds [0, 1], and so minimum 0 and range 1. Hence the recipe

              new = old * range + minimum

              as you say boils down to

              new = old * 1 + 0

              or more simply

              new = old.

              A standardized variable whether it's a matter of

              [min, max] -> [0, 1]

              or

              (value - mean) / SD

              has the limitation that the original values have been discarded, so it can be inverted or reversed only with the original values or their summary statistics. So you need access to the original variables, and if you have them you can go straight there.

              Imputation may add some twist to this, especially if imputation can produce values not exactly consistent with the original summaries.

              I fear that I am missing the question here.
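
              To make the point concrete, here is a minimal sketch of an inversion that does work, assuming the original minimum and maximum are saved before scaling. The scalar names pmin and pmax and the variable price_back are hypothetical.

              ```stata
              sysuse auto, clear

              * save the ORIGINAL summaries before scaling
              su price, meanonly
              scalar pmin = r(min)
              scalar pmax = r(max)

              gen double price_s = (price - pmin) / (pmax - pmin)

              * invert with the saved summaries, not with those of price_s
              gen double price_back = price_s * (pmax - pmin) + pmin

              * price_back agrees with price up to rounding error
              assert reldif(price_back, price) < 1e-12
              ```

              If imputed values fall outside the original [min, max], the same formula still applies, but the scaled values will then lie outside [0, 1].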


              • #8
                Thank you for your quick reply, Nick.

                You are the wizard of Stata, but even you have to follow the rules of math.


                • #9
                  Thanks for the compliment!

                  I'd phrase it this way: no one can escape the rules of math, or maths, as British people say.
