
  • Strange Results When Dividing Numeric Values

    Dear Statlist community,

    I ran into a strange issue with the following dataset. I converted a string variable to numeric and then wanted to create a new variable by division; my code and data are below. The problem is that the new variable (visit_ratio) is simply wrong, even though this should be simple arithmetic. Any help would be appreciated.

    Code:
    *count number of observations per year and state
    bysort year state_name : gen visit = _n
    
    *create a numeric value for number of physicians
    encode Number_of_Active_Physicians, gen(num_phy)
    
    *create a ratio
    gen visit_ratio = visit/num_phy
    And here is the data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year str20 state_name str27 Number_of_Active_Physicians
    2000 "alaska" "1,764"
    2013 "alaska" "1,764"
    2013 "alaska" "1,764"
    2014 "alaska" "1,764"
    2014 "alaska" "1,764"
    2014 "alaska" "1,764"
    2014 "alaska" "1,764"
    2015 "alaska" "1,764"
    2017 "alaska" "1,764"
    2017 "alaska" "1,764"
    end
    The visit ratio should be a decimal, and I'm confused about why I can't get the right results. Thanks.

  • #2
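    The issue is that encode creates a labelled categorical variable: it assigns the integer codes 1, 2, ... to the distinct strings and attaches the original text as value labels. So num_phy displays as "1,764" but actually holds the value 1, which you can confirm by running: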

    Code:
    tab num_phy, nolab
    To resolve this, replace the encode line with:

    Code:
    destring Number_of_Active_Physicians, gen(num_phy)
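    To see exactly what encode stored, you can list the value label it attached (by default the label gets the same name as the new variable) and show the codes with labels switched off. A minimal sketch, using three rows from the posted example:

    Code:
    clear
    input int year str20 state_name str27 Number_of_Active_Physicians
    2013 "alaska" "1,764"
    2013 "alaska" "1,764"
    2014 "alaska" "1,764"
    end
    
    * encode maps each distinct string to an integer code, starting at 1
    encode Number_of_Active_Physicians, gen(num_phy)
    
    * the mapping between codes and the original strings
    label list num_phy
    
    * the stored codes next to the original strings
    list Number_of_Active_Physicians num_phy, nolabel

    Because every row holds the same string "1,764", each observation gets code 1, so visit/num_phy simply returns visit.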



    • #3
      Thanks for your response, Mike, but running your code gives me the following error message:

      "Number_of_Active_Physicians: contains nonnumeric characters; no generate"

      What do you suggest I should do? Thanks.



      • #4
        Ah, sorry: by default destring can't handle the comma. Just add:

        Code:
        replace Number_of_Active_Physicians = subinstr(Number_of_Active_Physicians,",","",.)
        on the line above the destring command.
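        Alternatively, destring can skip the comma itself through its ignore() option, and real() combined with subinstr() does the same conversion in a single gen. A short sketch on three rows of the posted data; num_phy2 is just an illustrative name:

        Code:
        clear
        input int year str20 state_name str27 Number_of_Active_Physicians
        2013 "alaska" "1,764"
        2013 "alaska" "1,764"
        2014 "alaska" "1,764"
        end
        
        * remove every "," while converting the string to a number
        destring Number_of_Active_Physicians, gen(num_phy) ignore(",")
        
        * same result without destring: strip the commas, then convert with real()
        gen num_phy2 = real(subinstr(Number_of_Active_Physicians, ",", "", .))

        Unlike encode, both store the actual number 1764, so the division behaves as intended. One difference: real() silently returns missing for any string it cannot convert, while destring stops with an error like the one above.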



        • #5
          Note as well that _n gives a running number for each observation's position (here within each year-state group), whereas _N gives the total number of observations in the group. I'm not sure exactly what you're calculating, but I suspect you want to use _N or collapse the data in some way.
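          If the goal is the number of observations per year and state divided by the number of physicians (an assumption about what visit_ratio is meant to measure), a sketch could look like this; the three rows come from the posted example, and one and visits are illustrative names:

          Code:
          clear
          input int year str20 state_name str27 Number_of_Active_Physicians
          2013 "alaska" "1,764"
          2013 "alaska" "1,764"
          2014 "alaska" "1,764"
          end
          destring Number_of_Active_Physicians, gen(num_phy) ignore(",")
          
          * _N is the size of each year-state group (_n would count 1, 2, ...)
          bysort year state_name: gen visit = _N
          gen visit_ratio = visit / num_phy
          
          * alternatively, reduce the data to one row per year and state
          preserve
          gen one = 1
          collapse (sum) visits = one (first) num_phy, by(year state_name)
          gen visit_ratio = visits / num_phy
          list
          restore

          With _N every row in a group carries the same count; collapse gives the same ratio but keeps a single row per year and state.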



          • #6
            Thank you so much, Mike. The issue is resolved, and I appreciate your last comment as well.
