  • Re-scaling a variable's mean, st. dev., min and max

    I have a problem. I have a variable, let’s call it “open”, with 500 observations, a mean of 5, a standard deviation of 1.5, a minimum of 1 and a maximum of 8.

    However, what I want to do is rescale “open” using a variable from another dataset, called “closed”, which has a mean of 7, st. dev. of 2, min of 1 and max of 12.

    I want “open” to have the same mean, st. dev., min and max as “closed”, while the ranking of observations on “open” remains the same.

    I can scale so that “open” and “closed” have the same mean values but that is all.

    Is there any way of recoding a variable so that you can set the mean, standard deviation, minimum and maximum equal to a particular value?

    Regards,
    Paul Kilgar.

  • #2
    Your problem, as I understand it, is intractable in general. In your example, the ranges are 7 and 11, which are not in the same ratio as your standard deviations of 1.5 and 2. So, even if you adjust for different means, adjustment to the same scale or spread requires different multipliers depending on whether the target is "same range" or "same SD".
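
    To make the arithmetic concrete, here is a short worked check using the figures quoted in #1. It assumes the rescaling is linear with a positive slope, which is the simplest transformation that keeps the ranking; the symbols a and b are introduced here only for illustration.

    ```latex
    \begin{align*}
    y &= a + b\,x, \qquad b > 0 \\
    \text{same SD:}\quad b &= \frac{2}{1.5} = \frac{4}{3} \approx 1.33 \\
    \text{same range:}\quad b &= \frac{12 - 1}{8 - 1} = \frac{11}{7} \approx 1.57 \\
    \text{same mean and SD:}\quad a &= 7 - \tfrac{4}{3}\cdot 5 = \tfrac{1}{3}
      \;\Rightarrow\; \min y = \tfrac{1}{3} + \tfrac{4}{3}\cdot 1 = \tfrac{5}{3}, \quad
      \max y = \tfrac{1}{3} + \tfrac{4}{3}\cdot 8 = 11
    \end{align*}
    ```

    So a linear rescaling that matches the mean and SD gives a minimum of about 1.67 and a maximum of 11, not 1 and 12.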

    More and better advice depends on knowing why you want to do this. It doesn't sound like anything needed for any mainstream statistical purpose, and it could not be, because the requirement is impossible to satisfy.

    • #3
      The mean, sd, min and max values I gave were just to illustrate my problem; sorry if they were misleading. What I am trying to do is replicate another dataset. I have values for a variable rainfall, but my mean is significantly different from the one in the dataset I am trying to reproduce. I want to have the same mean, sd, min and max values as the dataset I am replicating, but at the same time keep the observations in the same order as they are under my definition of rainfall.

      I am thinking of ranking my observations (1-500), then generate random numbers but set the mean, sd, min and max to the values I want. Rank the new randomly generated variable 1-500 and match based on rank. That way I keep the same ranking order of observations but also have the mean, sd, min and max of the rainfall variable in the dataset I am trying to replicate.

      • #4
        I dunno about the wisdom of what you want to do, but I think this roughly does what you seem to want, assuming 1000 observations. Adjust the cutpoints to suit your number of observations.


        Code:
        * Bin open into 8 ordered groups (0-7) by sort position;
        * the cutpoints assume 1000 observations
        sort open
        gen open2 = 0
        foreach bin of numlist 50 150 300 500 700 850 950 {
            replace open2 = open2 + 1 if _n > `bin'
        }

        * Same binning for closed
        sort closed
        gen closed2 = 0
        foreach bin of numlist 50 150 300 500 700 850 950 {
            replace closed2 = closed2 + 1 if _n > `bin'
        }

        • #5
          BTW -- amongst other problems, the code I gave ignores ties. So you would presumably need to bootstrap or something if you ever used these variables in anything complex -- the ties will bounce around, which would mess up anything but the crudest descriptives. On the other hand, you didn't mention if you have a scale or continuous variable. With a fine-grained continuous variable, I suppose it just might work. You're throwing away information, but with seven or maybe ten bins, not throwing away *that* much. You're also losing skewness and kurtosis, for good or for bad.
          Last edited by ben earnhart; 13 Nov 2014, 13:23.

          • #6
            I don't think your examples were misleading, in that the further details haven't changed my mind. I have worked a fair amount with hydrological and climatological data, including rainfall data, and what you propose doesn't seem to match any standard procedure. Naturally that doesn't mean it's wrong, but I'd start with a quantile-quantile plot for the two sets and consider the minimum simple transformation that matches the empirical quantiles.
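
            As a rough illustration of that first step, the sketch below draws a quantile-quantile plot for two simulated series. The variables mine and theirs are hypothetical stand-ins, not the real rainfall data, and the gamma parameters are arbitrary; it also assumes both series can be held in one dataset with the same number of observations.

            ```stata
            * Sketch only: simulated stand-ins for the two rainfall series
            clear
            set seed 2014
            set obs 500
            gen mine   = rgamma(4, 1.25)    // hypothetical: own rainfall variable
            gen theirs = rgamma(3, 2.3)     // hypothetical: variable being replicated

            * Points close to a straight line suggest a linear rescaling may be enough;
            * systematic curvature suggests a nonlinear transformation is needed
            qqplot mine theirs
            ```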

            • #7
              Target for variable open: mean = 5, st. dev. = 1.5, min = 1, max = 8

              Code:
              set seed 1234567
              set obs 500
              * Normal(5, 1.5) draws, clipped to the interval [1, 8]
              gen open = max(min(8, invnormal(runiform())*1.5 + 5), 1)


              I can now rank the variable open, rank the variable closed from my own dataset and match on rank. I keep the same order as my old dataset but get mean, st. dev, max and min values that are re-scaled to the values I want.
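
              The matching step described here could be sketched roughly as follows. This is only one possible way to do it, on simulated data: rainfall, target, r_rain and matched are hypothetical names, ties are broken by the current sort order, and both variables are assumed to sit in the same dataset.

              ```stata
              * Sketch only: simulated data, hypothetical variable names
              clear
              set seed 1234567
              set obs 500
              gen rainfall = rgamma(2, 2.5)                    // stand-in for the existing variable
              gen target   = max(min(8, rnormal(5, 1.5)), 1)   // simulated values with the wanted limits

              * Rank of each observation on rainfall (ties broken by current order)
              sort rainfall, stable
              gen long r_rain = _n

              * With the data sorted on target, target[k] is the k-th smallest value
              sort target, stable
              gen matched = target[r_rain]

              * matched has the same values as target, ordered like rainfall
              summarize target matched
              spearman rainfall matched
              ```

              Because matched is just a reshuffling of target, its summary statistics are whatever the simulated target happens to have, which need not be exactly the requested mean and standard deviation.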

              • #8
                There is only half the code here but in effect you got what you asked for only by sheer brute force, by truncating values at 1 and 8. It is hard to see how that will match any other variable satisfactorily unless it was produced in some equivalent manner. Also, although you start out with specified mean and standard deviation, are they preserved after your truncation?
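
                One way to look into that question is to compare the draws before and after truncation. The sketch below is illustrative only; raw and clipped are hypothetical names, and it uses rnormal() rather than the expression in #7. Clipping can never increase the standard deviation, and it shifts the mean whenever the two tails are cut unequally.

                ```stata
                * Sketch only: compare summary statistics before and after truncation
                clear
                set seed 1234567
                set obs 500
                gen raw     = rnormal(5, 1.5)        // draws before truncation
                gen clipped = max(min(8, raw), 1)    // values outside [1, 8] pulled to the limits

                summarize raw clipped
                count if raw < 1 | raw > 8           // number of observations that were moved
                ```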

                I stand by my comments in #2 and #6.

                Please note our preference for full real names. You signed your first post but it is easy for members to miss that fact as a thread lengthens.
