Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • INTERVALS OF TWO VARIABLES

    Hello,

    I want to ask something easy. If I have a variable (var1) and I computed two more variables like: var2=var1 - 5, var3=var1 + 5. How can I create an interval of these two vars, where var2 is the lower limit and var3 is the upper limit?
    An easy example:
    var1 var2 var3
    10 5 15
    20 15 25
    30 25 35
    40 35 45
    50 45 55
    60 55 65
    70 65 75
    80 75 85
    90 85 95
    100 95 105

    So I want to create a new var which will be the interval of var2, var3. That is:
    var4
    5-15
    15-25
    25-35
    35-45
    45-55
    55-65
    65-75
    75-85
    85-95
    95-105

    I tried concat but it did not work the way I wanted. I need var4 so later I can count the number of observations which are observed in every interval.

  • #2
    Please do not refer to problems as "easy" (see the Forum FAQ on this). Also, your question is not very clear to me. Or, put differently, are you really sure that you want to produce a string variable to describe the interval (band) that var2 observations lie in? (I ask because you refer to concat.) Why not use a numerical variable, and show the interval values using labels? In addition, there is an inherent ambiguity in your description of what you want, because there are overlapping values in your definitions of interval boundaries. For example, an observation with var2 = 15 cannot be in both the first and second intervals. So you need a rule about which interval such an observation should be in (and I implement one below). Below I show one (of the many) ways of creating something like your var2 variable


    Code:
    . input var1 var2 var3
    
    . input var1 var2 var3
    
              var1       var2       var3
      1.  10 5 15
      2.  20 15 25
      3.  30 25 35
      4.  40 35 45
      5.  50 45 55
      6.  60 55 65
      7.  70 65 75
      8.  80 75 85
      9.  90 85 95
     10.  100 95 105
     11. 
    .  end
    
    . 
    .  ge byte grouped_var2 = .
    (10 missing values generated)
    
    .  replace grouped_var2 = 1 if inrange(var2, 5, 14)
    (1 real change made)
    
    .  replace grouped_var2 = 2 if inrange(var2, 15, 24)
    (1 real change made)
    
    .  replace grouped_var2 = 3 if inrange(var2, 25, 34)
    (1 real change made)
    
    .  replace grouped_var2 = 4 if inrange(var2, 35, 44)
    (1 real change made)
    
    .  replace grouped_var2 = 5 if inrange(var2, 45, 54)
    (1 real change made)
    
    .  replace grouped_var2 = 6 if inrange(var2, 55, 64)
    (1 real change made)
    
    .  replace grouped_var2 = 7 if inrange(var2, 65, 74)
    (1 real change made)
    
    .  replace grouped_var2 = 8 if inrange(var2, 75, 84)
    (1 real change made)
    
    .  replace grouped_var2 = 9 if inrange(var2, 85, 94)
    (1 real change made)
    
    .  replace grouped_var2 = 10 if inrange(var2, 95, 150)
    (1 real change made)
    
    . 
    . 
    . label def gvar2_labels ///
    >         1 "5-14" 2 "15-24" 3 "25-34" 4 "35-44" 5 "45-54" ///
    >         6 "55-64" 7 "65-74" 8 "75-84" 9 "85-94" 10 "95+"
    
    . label val grouped_var2 gvar2_labels
    
    .
    . list
    
         +-------------------------------+
         | var1   var2   var3   groupe~2 |
         |-------------------------------|
      1. |   10      5     15       5-14 |
      2. |   20     15     25      15-24 |
      3. |   30     25     35      25-34 |
      4. |   40     35     45      35-44 |
      5. |   50     45     55      45-54 |
         |-------------------------------|
      6. |   60     55     65      55-64 |
      7. |   70     65     75      65-74 |
      8. |   80     75     85      75-84 |
      9. |   90     85     95      85-94 |
     10. |  100     95    105        95+ |
         +-------------------------------+
    
    . list, nolabel
    
         +-------------------------------+
         | var1   var2   var3   groupe~2 |
         |-------------------------------|
      1. |   10      5     15          1 |
      2. |   20     15     25          2 |
      3. |   30     25     35          3 |
      4. |   40     35     45          4 |
      5. |   50     45     55          5 |
         |-------------------------------|
      6. |   60     55     65          6 |
      7. |   70     65     75          7 |
      8. |   80     75     85          8 |
      9. |   90     85     95          9 |
     10. |  100     95    105         10 |
         +-------------------------------+
    
    
    . tab grouped_var2
    
    grouped_var |
              2 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           5-14 |          1       10.00       10.00
          15-24 |          1       10.00       20.00
          25-34 |          1       10.00       30.00
          35-44 |          1       10.00       40.00
          45-54 |          1       10.00       50.00
          55-64 |          1       10.00       60.00
          65-74 |          1       10.00       70.00
          75-84 |          1       10.00       80.00
          85-94 |          1       10.00       90.00
            95+ |          1       10.00      100.00
    ------------+-----------------------------------
          Total |         10      100.00
    
    . tab grouped_var2, nolabel
    
    grouped_var |
              2 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          1       10.00       10.00
              2 |          1       10.00       20.00
              3 |          1       10.00       30.00
              4 |          1       10.00       40.00
              5 |          1       10.00       50.00
              6 |          1       10.00       60.00
              7 |          1       10.00       70.00
              8 |          1       10.00       80.00
              9 |          1       10.00       90.00
             10 |          1       10.00      100.00
    ------------+-----------------------------------
          Total |         10      100.00

    Comment


    • #3
      <enter grumpy guru mode>

      Stephen is right as usual.

      I want to ask something easy.
      If it is easy, you should be able to solve it. If it is easy, no experienced user is much interested in writing about it. If you can't solve it, you can't be sure it is easy.

      You have a good question, and no need to qualify what it is.

      <exit grumpy guru mode, temporarily>

      Now to be more constructive:

      Code:
      egen interval = concat(var2 var3), p(-)
      is a way to do what you want.

      Code:
      gen interval = string(var2) + "-" + string(var3)
      is another way, but the question of ambiguous intervals is more important.


      Last edited by Nick Cox; 29 Aug 2014, 05:55.

      Comment


      • #4
        If there is a pattern to the ranges, then, yes there are efficient ways to do it. If the ranges are defined by a long sequence of seemingly arbitrary numbers, no. So, if there is some rule that defines what the boundaries of the ranges are, tell us what it is and someone will probably show you how to program it in a few lines of code. For example, to program the specific intervals that Stephen Jenkins showed you above:

        Code:
        gen grouped_var2 = floor((var2+5)/10)
        replace grouped_var2 = . if var2 < 5
        replace grouped_var2 = 10 if var2 > 95
        The first line takes care of most of the cases, based on its algebraic pattern. The last two lines clean up the details on the boundary.

        Comment


        • #5
          Also, to generate the labels automatically for this example:

          Code:
          capture label drop grouped_var2
          forvalues j = 1/9 {
              local lb = 10*`j'-5
              local ub = `lb' + 9
              label define grouped_var2 `j' `"`lb'-`ub'"', modify
          }
          label define grouped_var2 10 "95+", modify

          Comment


          • #6
            In my dataset I have thousands of observations so I cannot proceed manually for each interval.
            Why does the number of observations matter? Remember that the code shown by Clyde and me works on all observations in the active data set. The hard part is working out what intervals you want and then deciding how to implement code, trading off being concise and potentially faster (note well Clyde's suggestions) and verbosity (which may help transparency and understanding -- see my code). If the issue is more "automation", then you might profit from looking at some of the tools already provided. Type search cut and follow the relevant links.

            Comment


            • #7
              The boundaries of each range should always be (var2-var3) for each observation
              Your statement doesn't address the issue of how to handle "boundary" values that could be in two intervals. For example, in what band -- "5-15" and "15-25" -- should the value 15 be placed?

              It appears to me that part of your problem is arising because you are thinking of your boundary definitions in terms of variables (var2, var3). It is possible to do it that way, but conceptually cleaner to separate out the issue of consistent definition of the boundaries. Observe that both my and Clyde's "solutions" take this latter approach.

              Perhaps you could tell us more about your ultimate purpose for creating the banded variable, and we might be able to advise on that directly.

              Comment


              • #8
                You have tried to explain this several times, and clearly nobody has understood what you want. You refer to "these intervals," but we don't know what intervals you mean. You refer to your "case variable," but we don't know what variable that is. Clearly attempts to explain this in words are failing.

                Please calculate by hand the results you would like to see from the data above, and then repost the data, with an additional column showing the corresponding desired results.

                Also, when you repost the results, please put them in a CODE block so that they will line up better and be easier to read.

                Comment

                Working...
                X