Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculations with if statements

    Hi,

    If this is my data-
    day totalpay hours
    Sunday 500 5
    Monday 750 50
    Tuesday 325 25
    Wednesday 900 100
    And I'm trying to create a new variable "dayavg" where "dayavg = totalpay/hours" on Monday, Tuesday and Wednesday, else "dayavg = totalpay/50."

    This is the code I'm using -

    Code:
    encode totalpay, generate(totalpay2)
    encode hours, generate(hours2)
    
    gen dayavg = 0
    replace dayavg = totalpay2/hours2 if day == "Monday" | day == "Tuesday" | day == "Wednesday"
    else replace dayavg = totalpay2/50
    And this is the output I'm getting-
    day totalpay hours totalpay2 hours2 dayavg
    Sunday 500 5 500 5 0.04
    Monday 750 50 750 50 0.06
    Tuesday 325 25 325 25 0.02
    Wednesday 900 100 900 100 0.08
    Where am I going wrong?




  • #2
    You're going wrong in a lot of places.

    Apparently you are starting from string variables that look like numbers. The correct way to convert these to numbers is with the -tostring- command. That converts string variables that look like numbers into their actual numeric values. -encode- does something completely different and leads to nonsensical results like the ones you are getting.

    Next, the -if...else-- syntax you are using also incorrect, both syntactically and semantically. It is syntactically incorrect because what you have is a hybrid of the -if- condition (which follows a command) and the -else- command, which is used in conjunction with the -if- command--not the -if- condition. While you can accomplish what you want with a -gen ... if...- and -replace ... if ...- syntax, you can more compactly do this with a single command using the -cond()- function:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 day str3(totalpay hours)
    "Sunday"    "500" "5"  
    "Monday"    "750" "50"
    "Tuesday"   "325" "25"
    "Wednesday" "900" "100"
    end
    
    
    destring totalpay hours, replace
    
    gen dayavg = totalpay/cond(inlist(day, "Monday", "Tuesday", "Wednesday"), hours, 50)
    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.


    Added: https://www.stata-journal.com/articl...article=dm0099 has a good explanation of the -encode- and -destring- commands and what sorts of situations each is appropriate for.
    Last edited by Clyde Schechter; 02 May 2019, 21:56.

    Comment


    • #3
      Thank you so much! The explanation is extremely clear and well detailed! I will use -dataex- next time.

      The code worked perfectly! Also for the if condition, if I have more than 20 options for the variable 'days' due to spelling errors, how would the code change?
      I'm currently getting "expression too long r(130);" error!


      Comment


      • #4
        Simply, I would fix the spelling errors first. Otherwise any amount of code downstream of here will need to build in complications. If you show us the results of

        Code:
        tab day
        then it may well be that there is fairly short code to fix the errors. For example, although there might be various small variations on Friday, most of them will start with F, or so I hope, so

        Code:
        replace day = "Friday" if substr(day, 1, 1) == "F"
        might help. Also trim any leading or trailing spaces first:

        Code:
        replace day = trim(day)

        Comment


        • #5
          Thank you so much! I have Tuesday and Thursday in my dataset, with spellings like Taesady and Taursday. Otherwise it's pretty straightforward. What should I do to deal cases such as this?

          Comment


          • #6
            I didn't promise it would all be easy. Some changes will apply to single observations. Once it's done it's done.

            Comment


            • #7
              Thank you for taking time to help me out!

              Comment

              Working...
              X