  • If command not working

    Hello everyone,

    I have a dataset with airline carriers. I want to write an if command that runs if the name of the airline carrier (a string variable) in a row is the same as the name of the airline carrier six rows above it. After much trying, I have come to believe that you cannot use an if command to compare strings. However, I believe you can generate an indicator variable that shows whether the names are the same, and then write an if command based on that indicator variable. When I do this, however, the if command does not execute when it is supposed to. Below is my code, with more explanation after.

    Code:
    gen stderr = 0
    gen name1 = unique_carrier_name 
    gen name2 = unique_carrier_name[_n-6] 
    gen firmcheck = name1 != name2   
    
    if firmcheck == 0 { 
        replace stderr = 1
    }
    The first line creates my variable stderr, which may be changed by the if statement. I then create name1 and name2 so that, when browsing the data, I can make sure I am looking at the correct names. Then "gen firmcheck" outputs a zero if the names are the same (it does this correctly when I browse the data). However, the if command never runs, even where firmcheck is 0 in the browser. I understand this could be done more easily with replace stderr = 0 if [expression]; however, I have simplified the commands within the braces, which will eventually do a lot more. I just need to get the if command to do what it is supposed to.

    Thanks for all of the help,
    -David

  • #2
    David,

    Actually, you can compare strings with an if command. However, I think what you want is:

    Code:
    replace stderr=1 if name1==name2
    You said you were trying to avoid doing this (for reasons that are not obvious), but this is really the right way. When an if command refers to a variable, it looks only at that variable's value in the first observation, which is not what you want. The if command is typically reserved for evaluating conditional expressions involving macros.

    Regards,
    Joe
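A minimal sketch of the difference Joe describes, on made-up data (the carrier names "Alpha Air" and "Beta Jet" are hypothetical). The if command sees only firmcheck in observation 1, which is 1 because observation 1 has nothing six rows above it, so the block never runs; the if qualifier tests every observation.

```stata
clear
set obs 14
* Hypothetical data: 7 rows for one carrier, then 7 rows for another
gen str10 unique_carrier_name = cond(_n <= 7, "Alpha Air", "Beta Jet")
gen stderr = 0
* 1 if the name differs from the name six rows above; rows 1-6 compare against ""
gen firmcheck = unique_carrier_name != unique_carrier_name[_n-6]

* if COMMAND: firmcheck here means firmcheck[1], which is 1, so nothing happens
if firmcheck == 0 {
    replace stderr = 1
}
assert stderr == 0

* if QUALIFIER: the condition is evaluated separately in each observation
replace stderr = 1 if firmcheck == 0
list unique_carrier_name firmcheck stderr if firmcheck == 0
```

Only observations 7 and 14 (each compared with a row of the same carrier six rows earlier) end up with stderr = 1.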



    • #3
      In addition to Joe's good advice, note the FAQ "I have an if or while command in my program that only seems to evaluate the first observation. What's going on?"

      http://www.stata.com/support/faqs/pr...-if-qualifier/


      I have to say that, in my experience, users' questions only rarely match the title of this FAQ. More commonly they are something like "What's going on with my if command?"

      Often the user has an identity as a past or present SAS user.....



      • #4
        Originally posted by DTHORNBLAD:
        I have a dataset with airline carriers. I want to write an if command that runs if the name of the airline carrier (string variable) on the row is the same as the name of the airline carrier six rows above it.
        Why do you need name1 and name2?

        Code:
        replace stderr=1 if unique_carrier_name==unique_carrier_name[_n-6]
        For everyone starting with Stata, understand that the following:
        Code:
        if (firmcheck == 0) {
            do something
        }
        means "do something if the value of variable firmcheck in the first observation is equal to zero". The action may or may not involve the first observation, or any observation at all: for example, a file can be saved or Stata can be closed.

        Best, Sergiy Radyakin
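Two points from this post can be checked on hypothetical data: the qualifier needs no helper variables, and an if command that names a variable tests exactly the same thing as one that names its first observation explicitly. The carrier names and the local macro ran (a counter added only for the demonstration) are assumptions.

```stata
clear
set obs 8
* Hypothetical data: one carrier in rows 1-7, another in row 8
gen str10 unique_carrier_name = cond(_n <= 7, "Alpha Air", "Beta Jet")
gen stderr = 0

* Qualifier with an explicit lag; no name1/name2 needed
replace stderr = 1 if unique_carrier_name == unique_carrier_name[_n-6]

* stderr[1] is 0, and both if commands below test that same single value
local ran = 0
if stderr == 0 {
    local ran = `ran' + 1
}
if stderr[1] == 0 {
    local ran = `ran' + 1
}
display "blocks run: `ran'"
```

Both blocks run even though stderr is 1 in observation 7, because neither if command ever looks past observation 1.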



        • #5
          Wow, thanks for the quick replies, everyone. I did not know that Stata only looks at the first observation with an if command; perhaps that is why comparing strings was not working well.

          Joe: I was trying to avoid that because I have to run a lot of commands within the if. First I need to run a regression on only the last 6 observations within an airline, and then I need to get the standard error of one of the regression's beta coefficients. I thought this multi-line code would best be nested within an if command.
          Nick: Yes, I am coming from SAS. Good call.
          Sergiy: I knew I could do that, but I created the name1 and name2 variables so that I could check, when browsing my data, that they were doing the correct thing, since comparing strings in an if statement was not working.

          Do you have any advice on how to loop over observations so that I can generate a separate standard error for each observation based on the last 6 observations, provided they belong to the same airline? Previously I tried the following (yes, I know this is not how you calculate standard errors; I just want to understand how looping works):
          Code:
          local N = _N
          forvalues i = 1/`N' {
              if unique_carrier_name == unique_carrier_name[_n-6] {
                  replace name1 = unique_carrier_name
                  replace name2 = unique_carrier_name[_n-6]
                  replace stderr = 1
              }
          }
          However, this ended up replacing every value of name1 and name2 with the last airline name in my dataset. I figured each replace was rewriting the whole variable on every pass, so the values from the last observation won out.



          • #6
            I wrote the code above incorrectly: _n should be replaced with `i', as shown below.

            Code:
            local N = _N
            forvalues i = 1/`N' {
                if unique_carrier_name[`i'] == unique_carrier_name[`i'-6] {
                    replace name1 = unique_carrier_name[`i']
                    replace name2 = unique_carrier_name[`i'-6]
                    replace stderr = 1
                }
            }
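Why the loop overwrote everything: replace without an if or in qualifier changes every observation, on every pass of the loop. A minimal corrected sketch, shown only to illustrate how explicit looping works (later replies advise against it), limits each replace to observation `i' with in. The carrier names are hypothetical, and the variable check is added only to compare against the one-line qualifier version.

```stata
clear
set obs 14
* Hypothetical data: 7 rows for one carrier, then 7 rows for another
gen str10 unique_carrier_name = cond(_n <= 7, "Alpha Air", "Beta Jet")
gen str10 name1 = ""
gen str10 name2 = ""
gen stderr = 0

forvalues i = 1/`=_N' {
    * Subscripts below 1 return "", so rows 1-6 never match
    if unique_carrier_name[`i'] == unique_carrier_name[`i'-6] {
        * "in `i'" restricts each replace to the current observation
        quietly replace name1 = unique_carrier_name[`i'] in `i'
        quietly replace name2 = unique_carrier_name[`i'-6] in `i'
        quietly replace stderr = 1 in `i'
    }
}

* The same result in one line, with no loop
gen byte check = unique_carrier_name == unique_carrier_name[_n-6]
assert stderr == check
```

The loop performs up to three replace calls per observation, while the qualifier version is a single pass over the data, which is the inefficiency discussed further down the thread.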



            • #7
              Absolutely, don't even think of looping here. Rather, learn today about the by: command as a framework. In addition to the usual help and manual entries, there is a discursive tutorial at http://www.stata-journal.com/sjpdf.h...iclenum=pr0004

              Assuming a panel structure of identifier and date, the last 6 observations for each airline are indicated by

              Code:
              bysort id (date) : gen inlast6 = (_N - _n) < 6
              Then you can run regressions on each airline by

              Code:
              keep if inlast6 
              statsby <stuff>, by(id): regress <something>
              All that said, my statistical persona feels that 6 observations is not much on which to base a regression.
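Nick's recipe can be sketched end to end on made-up data. The variables id, year and revenue, and the choice to store e(N) and the standard error of the year slope, are assumptions for illustration; statsby with the clear option replaces the data in memory with one row per airline.

```stata
clear
set obs 20
* Hypothetical panel: two airlines (id 1 and 2), ten years each
gen id = cond(_n <= 10, 1, 2)
bysort id: gen year = 1990 + _n
* Deterministic "revenue": a trend plus a little periodic noise
gen revenue = 100 + 2*(year - 1990) + mod(7*_n, 5)

* Flag the last 6 observations within each airline
bysort id (year): gen inlast6 = (_N - _n) < 6
keep if inlast6

* One regression per airline; keep N and the standard error of the slope
statsby n=e(N) se_year=_se[year], by(id) clear: regress revenue year
list
```

The result is one standard error per airline, computed on that airline's last six years only.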



              • #8
                Thank you, Nick. Yes, it is a small N, but the volatility of an airline's revenues is how an uncertainty construct is operationalized in my field. Thanks.



                • #9
                  Regarding the first observation: when you write
                  Code:
                  if (vector==scalar)
                  what else should Stata do? Remember that variables in Stata are columns of values, and you are comparing a column to a scalar (a number or a string).

                  Things are different when if is used not as a command but as a qualifier. You absolutely need to understand this FAQ and carefully read the help for each of them. Doing so before asking more questions would help us speak the same language.
                  http://www.stata.com/help.cgi?if
                  http://www.stata.com/manuals13/pif.pdf
                  http://www.ats.ucla.edu/stat/stata/modules/if.htm

                  Both Joe and I recommended that you use the qualifier, because that is what is appropriate in your case.
                  You insist on if as a command, which makes the program tremendously inefficient. Explicit looping over observations, like the loop you quote above, should be avoided at all costs in Stata. I think it is a legacy of the SAS DATA step. Joe may be able to explain more about adjusting to Stata's world, since he also has a SAS background.

                  Best, Sergiy Radyakin
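A hypothetical sketch of the two uses side by side: the qualifier restricts which observations count and replace work on, while the command tests a single scalar, r(N), so the first-observation issue never arises. The variable names carrier and rows_ok, and the threshold of 6 rows, are made up for the example.

```stata
clear
set obs 9
* Hypothetical data: 7 rows for one carrier, 2 rows for another
gen str10 carrier = cond(_n <= 7, "Alpha Air", "Beta Jet")
gen rows_ok = 0

levelsof carrier, local(carriers)
foreach c of local carriers {
    * if QUALIFIER: count only this carrier's observations
    quietly count if carrier == "`c'"
    * if COMMAND: tests one scalar, r(N), not a variable
    if r(N) >= 6 {
        quietly replace rows_ok = 1 if carrier == "`c'"
    }
}
list
```

Only "Alpha Air", with 7 rows, passes the check, so rows_ok is 1 in its rows and 0 elsewhere.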



                  • #10
                    Thank you, Sergiy and Nick. I think I am beginning to understand the use of if as a command vs. a qualifier. One last question, hopefully.
                    Nick, your code seems to regress the data only for the airline's last six years of data; is that correct? What I am seeking is code that iteratively obtains the standard error for the previous six years (assuming six years of data). For example, in 2000 I want the standard error for the six years 1995-2000; in 2001 it would be 1996-2001, etc. Do I not need a loop for this? From my understanding and reading of it, the by command would seem to give me a result per by-group, not per observation.

                    -David



                    • #11
                      You mean the previous 6 observations....

                      Check out rolling.
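A minimal sketch of the rolling approach on made-up data (id, year and revenue are hypothetical names). rolling needs the data tsset or xtset; with panel data it runs within each panel, and with clear it replaces the data with one row per window, where start and end hold the first and last year of each six-year window.

```stata
clear
set obs 20
* Hypothetical panel: two airlines, years 1991-2000
gen id = cond(_n <= 10, 1, 2)
bysort id: gen year = 1990 + _n
gen revenue = 100 + 2*(year - 1990) + mod(7*_n, 5)
xtset id year

* Standard error of the year slope over each six-year window, per airline
rolling se_year=_se[year], window(6) clear: regress revenue year
list
```

To attach each result to the last year of its window (e.g. the 1995-2000 result to 2000), one could rename end to year and merge 1:1 on id and year with a saved copy of the original data.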

