  • encode strings

    Hi, I'm having some trouble encoding a string variable (months) as a numeric variable (I need Jan to be 1 and so on).

    I've used this code:

    enc month, gen(month_num)
    label define label_mon 1 "Jan" 2 "Feb" 3 "Mar" 4 "Apr" 5 "May" 6 "Jun" 7 "Jul" 8 "Aug" 9 "Sep" 10 "Oct" 11 "Nov" 12 "Dec"
    label values month_num label_mon
    tab month_num

    but it gives me seemingly random numbers, even though I define Jan as 1.

    Could you help me?

  • #2
    If you ran the lines in the order shown, -encode- will already have assigned codes in alphabetical order before you defined the labels. Put the -label define- command before the -encode- command. If that doesn't solve the problem, show a data example using -dataex- (see the FAQ).


    • #3
      More specifically, type

      Code:
      label define label_mon 1 "Jan" 2 "Feb" 3 "Mar" 4 "Apr" 5 "May" 6 "Jun" 7 "Jul" 8 "Aug" 9 "Sep" 10 "Oct" 11 "Nov" 12 "Dec"
      encode month , generate(month_num) label(label_mon)
      Not crucial to the question but perhaps useful in general: c(Mons) contains the list of abbreviated month names.
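
      As an illustration of the c(Mons) remark, the value label could be built in a loop rather than typed out. This is only a sketch: the toy data are made up, and the names month, month_num and label_mon simply follow the posts above.

      ```stata
      * Toy data (hypothetical); the variable name month follows the original post
      clear
      input str3 month
      "Mar"
      "Jan"
      "Dec"
      "Aug"
      end

      * Define 1 "Jan" ... 12 "Dec" from c(Mons) instead of typing the names
      capture label drop label_mon
      local i = 0
      foreach m in `c(Mons)' {
          local ++i
          label define label_mon `i' "`m'", modify
      }

      * Encode against the predefined label, then show the underlying codes
      encode month, generate(month_num) label(label_mon)
      list month month_num, nolabel
      ```

      With the label defined first and passed through label(), strings spelled exactly as in the label take the codes defined there instead of alphabetical codes.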


      • #4
        I prefer Daniel's approach to the following because the resulting numeric variable is created with a value label, but someone might find this approach useful in limited circumstances. It takes advantage of the fact that Stata monthly dates begin at 0 in January 1960. It also is robust to capitalization or its lack. I guess it's also robust to spelling out months other than May in full ... .
        Code:
        . generate month_num = monthly(var1+"1960","MY")+1
        
        . list, clean noobs abbreviate(12)
        
            var1   month_num  
             Jan           1  
             FEB           2  
             mar           3  
             Apr           4  
             May           5  
             Jun           6  
             Jul           7  
             Aug           8  
             Sep           9  
             Oct          10  
             Nov          11  
             Dec          12
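
        If a value label is wanted as well, it can be attached after the numbers are generated. A minimal sketch, assuming toy data in var1 and borrowing the label name label_mon from #3:

        ```stata
        * Toy data (hypothetical) with mixed capitalization, as in the listing above
        clear
        input str3 var1
        "Jan"
        "FEB"
        "mar"
        "Dec"
        end

        * Monthly dates are 0 in January 1960, so adding 1 gives 1 to 12
        generate month_num = monthly(var1 + "1960", "MY") + 1

        * Attach the month names as a value label to the numeric variable
        label define label_mon 1 "Jan" 2 "Feb" 3 "Mar" 4 "Apr" 5 "May" 6 "Jun" 7 "Jul" 8 "Aug" 9 "Sep" 10 "Oct" 11 "Nov" 12 "Dec"
        label values month_num label_mon

        * Show labeled values, then the underlying numbers
        list, clean noobs abbreviate(12)
        list, clean noobs abbreviate(12) nolabel
        ```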


        • #5
          Originally posted by William Lisowski (#4)
          OK, this works! Thank you.


          • #6
            Originally posted by daniel klein (#3)
            With this code I also get seemingly random values for each month.
            For example, aug is 24 instead of 8 while jul is something like 13, and I can't find a linear transformation to fix it.


            • #7
              Indeed. Stata is literal. With those definitions, it won't map jan or JAN or January or JANUARY or anything other than Jan to 1. So, you need to clean up before this approach can possibly work.

              Showing the results of

              Code:
              tab month
              would show you and us the extent of the mess, but it should be that

              Code:
              gen better = proper(month) 
              tab better
              improves all the problems you mention and perhaps some others.
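
              Putting the cleanup together with the earlier -encode- call might look like the sketch below. The toy data are invented, and trimming spaces and keeping only the first three letters are extra assumptions (they suppose every entry is an English month name or abbreviation). By default -encode- adds any string not found in the label as a new code, which would be consistent with codes above 12; the noextend option makes it stop with an error instead.

              ```stata
              * Toy data (hypothetical) with the kinds of variants mentioned above
              clear
              input str9 month
              "jan"
              "JAN"
              "January"
              "aug"
              "JULY"
              end

              * Standardize: trim spaces, proper case, keep the first three letters
              generate better = substr(proper(trim(month)), 1, 3)
              tab better

              * Same label definition as in #3
              label define label_mon 1 "Jan" 2 "Feb" 3 "Mar" 4 "Apr" 5 "May" 6 "Jun" 7 "Jul" 8 "Aug" 9 "Sep" 10 "Oct" 11 "Nov" 12 "Dec"

              * noextend: error out rather than adding unmatched strings to the label
              encode better, generate(month_num) label(label_mon) noextend
              list month better month_num, nolabel
              ```

              If the -encode- line fails, tab better should show which spellings still need fixing.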
