  • Continuous to Interval format

    How do I convert an age variable that is in continuous format to interval format for creating a table?

  • #2
    See https://journals.sagepub.com/doi/pdf...867X1801800311 for a basic overview. For example, a good systematic method with integers binned to taste or custom is to represent each bin by its lower limit (or its upper limit if you prefer) and to use value labels customised for the purpose.


    Code:
    clear
    set seed 2803
    set obs 100
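    * 100 random integer ages from 0 to 99
    gen age = runiformint(0, 99)
    
    * represent each 5-year bin by its lower limit: 0, 5, 10 and so on up to 95
    gen age2 = 5 * floor(age/5)
    * label each lower limit with its bin, e.g. 0 as "0-4"
    forval x = 0(5)95 {
        local X = `x' + 4
        label def age2 `x' "`x'-`X'", add
    }
    
    label val age2 age2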
    
    tab age2
    
    
    
    
           age2 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            0-4 |          3        3.00        3.00
            5-9 |          4        4.00        7.00
          10-14 |          6        6.00       13.00
          20-24 |         11       11.00       24.00
          25-29 |          6        6.00       30.00
          30-34 |         11       11.00       41.00
          35-39 |          2        2.00       43.00
          40-44 |          5        5.00       48.00
          45-49 |          6        6.00       54.00
          50-54 |          1        1.00       55.00
          55-59 |          5        5.00       60.00
          60-64 |          4        4.00       64.00
          65-69 |          3        3.00       67.00
          70-74 |          4        4.00       71.00
          75-79 |          6        6.00       77.00
          80-84 |          5        5.00       82.00
          85-89 |          6        6.00       88.00
          90-94 |          4        4.00       92.00
          95-99 |          8        8.00      100.00
    ------------+-----------------------------------
          Total |        100      100.00
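    Each bin can instead be represented by its upper limit. A minimal sketch, assuming the same simulated integer ages from 0 to 99 and bins of width 5; the variable and value label name age3 is hypothetical.

    ```stata
    clear
    set seed 2803
    set obs 100
    gen age = runiformint(0, 99)

    * hypothetical variable age3: each bin represented by its upper limit
    * (assumes integer ages, so a bin starting at 5*floor(age/5) ends 4 higher)
    gen age3 = 5 * floor(age/5) + 4
    * label each upper limit with its bin, e.g. 4 as "0-4"
    forval x = 4(5)99 {
        local X = `x' - 4
        label def age3 `x' "`X'-`x'", add
    }

    label val age3 age3

    tab age3
    ```

    The frequencies are the same as for age2, bin for bin; only the value that stands for each bin changes (4, 9, 14 and so on, rather than 0, 5, 10).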
    The term "format" is heavily overloaded in computing. In Stata, display format and file format might be the leading senses. The term is best not used for:

    variable or storage type (e.g. float or double)

    whether values are held as integers or have decimal parts (which would be how I would use the term "continuous")

    whether values are original data, or binned, classed, grouped or rounded in some sense.
    Last edited by Nick Cox; 05 Jul 2023, 00:25.



    • #3
      You can also explore the cut() function of the -egen- command.
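      For comparison with the floor() method in #2, a minimal sketch assuming the same simulated ages; age_cut is a hypothetical name. The last break, 100, lies above the largest possible age, so that ages 95-99 still fall in the bin with lower limit 95.

      ```stata
      clear
      set seed 2803
      set obs 100
      gen age = runiformint(0, 99)

      * lower limits via floor(), as in #2
      gen age2 = 5 * floor(age/5)

      * hypothetical variable age_cut: the same bins via egen's cut() function;
      * breaks are 0, 5, 10 and so on up to 100, above the largest possible age
      egen age_cut = cut(age), at(0(5)100)

      * for these integer ages the two methods agree
      assert age_cut == age2
      ```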



      • #4
        Hemanshu Kumar made an interesting suggestion. In practice I never use this cut() function, because:

        1. To know what it does at bin boundaries -- do values get binned upwards or downwards -- you have to consult the documentation or the code. This may seem a weak objection -- surely in the limit it applies to any method! -- but it's important for most users of Stata (unless they use it so often that they are familiar with what it does) and even more important for non-users of Stata looking at a script and wondering about replication in some other language. In contrast, floor and ceiling functions are, or at least should be, widely known across applied mathematical sciences and can be Googled.

        2. What it can do at the upper end is documented, but it is not what I ever want. If a value is above the upper bin limit specified, it is binned to missing! Consider this example: the maximum value of mpg is 41, but specifying lower limits for bins of 0 10 20 30 40 is not sufficient for all values to be binned, even though cut() otherwise advertises that bins are defined by their lower limits. In other words, to get 41 binned to 40 I need to specify a bin limit (say 50) that I know will never be used!

        Code:
        . sysuse auto, clear
        (1978 automobile data)
        
        . egen mpg2 = cut(mpg), at(0(10)40)
        (1 missing value generated)
        
        
        . tab mpg mpg2, missing
        
           Mileage |                    mpg2
             (mpg) |        10         20         30          . |     Total
        -----------+--------------------------------------------+----------
                12 |         2          0          0          0 |         2
                14 |         6          0          0          0 |         6
                15 |         2          0          0          0 |         2
                16 |         4          0          0          0 |         4
                17 |         4          0          0          0 |         4
                18 |         9          0          0          0 |         9
                19 |         8          0          0          0 |         8
                20 |         0          3          0          0 |         3
                21 |         0          5          0          0 |         5
                22 |         0          5          0          0 |         5
                23 |         0          3          0          0 |         3
                24 |         0          4          0          0 |         4
                25 |         0          5          0          0 |         5
                26 |         0          3          0          0 |         3
                28 |         0          3          0          0 |         3
                29 |         0          1          0          0 |         1
                30 |         0          0          2          0 |         2
                31 |         0          0          1          0 |         1
                34 |         0          0          1          0 |         1
                35 |         0          0          2          0 |         2
                41 |         0          0          0          1 |         1
        -----------+--------------------------------------------+----------
             Total |        35         32          6          1 |        74
        3. cut() can be used for irregular bins (those with varying width), which may seem a feature, especially if they are standard in your field for some perverse or idiosyncratic reason. Researchers make choices, and coders set examples. I always want to encourage people (starting with my students or colleagues) to use regular bins (which could be regular on some transformed scale, say logarithm), and I don't want to encourage anyone to use arbitrary bins. Indeed, I want to discourage binning unless it helps.
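        The upper-end behaviour in point 2 can be checked directly. A minimal sketch using the auto data; mpg3 and mpg4 are hypothetical names. floor() bins 41 to 40 with no extra limit, and cut() agrees once a limit above the maximum (here 50) is supplied.

        ```stata
        sysuse auto, clear

        * hypothetical variable mpg3: lower limits via floor(); 41 goes to 40
        gen mpg3 = 10 * floor(mpg/10)

        * hypothetical variable mpg4: cut() needs the extra limit 50, which no
        * value reaches, before 41 is binned to 40 rather than to missing
        egen mpg4 = cut(mpg), at(0(10)50)

        * the two methods now agree and nothing is missing
        assert mpg3 == mpg4
        assert !missing(mpg3)

        tab mpg3
        ```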



        • #5
          My doubt is resolved. Thank you for this help.

