
  • How to filter data?

    I am working with airline data. Each observation in the data is a ticket, which records the route, carrier, and price, among other things.
    I want to capture and keep all tickets on routes that were serviced by a certain airline (let's say United).
    So if United services a certain route, let's say LAX-SFO, then I want to keep all tickets from all airlines from LAX-SFO.
    And then I want to discard all tickets from routes that aren't serviced by United.
    How can I do so?

  • #2
    see the help for if.
    e.g.
    Code:
    help if
    keep if route=="LAX-SFO" & airline=="United"



    • #3
      But I don't think that she wants to discard the other airlines. Something more like:
      Code:
      gen trash=1 if airline=="United"
      egen goodroute=max(trash), by(route)
      keep if goodroute==1



      • #4
        Irina (as per FAQ, please note the strong preference on this forum for family name, too. Thanks).
        As far as your query is concerned, I share the previous suggestions and propose something more minimal, that makes dropping variables unnecessary:
        Code:
        sum price if airline=="United" // descriptive statistics on the price of United tickets; assumes -airline- is stored as a string variable in your dataset
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          I have a similar problem, and the suggestion above hasn't fixed it.

          I have used the splitsample command to create two sub-samples (in-sample and out-of-sample). I then have some code that creates and maximizes a maximum likelihood function, but I only want to do this for the in-sample. I then want to go back to the entire sample and run the predict command to create a forecast for both samples. I'm doing the same thing for a GMM model as well.

          The if command doesn't work with ml and gmm, and I don't want to use drop/keep because I want to temporarily suppress the out-of-sample observations before "unsuppressing" them. It's what the SPSS filter command would do.

          Any ideas how to do this? Thanks



          • #6
            There is a difference between the if command and the -if- qualifier, see https://journals.sagepub.com/doi/abs...urnalCode=stja. What is discussed here is the latter. If you are talking about the gmm command in Stata, it allows the -if- qualifier, see the syntax below.

            Syntax

            Interactive version

            gmm ([reqname1:]rexp_1) ([reqname2:]rexp_2) ... [if] [in] [weight] [, options]


            Moment-evaluator program version

            gmm moment_prog [if] [in] [weight], {equations(namelist)|nequations(#)} {parameters(namelist)|nparameters(#)} [options] [program_options]
            You should probably start a new thread and properly explain your issue, preferably showing Stata code and output. If possible, enclose a reproducible example of your problem. See FAQ Advice #12 for advice on posting.
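For what it's worth, here is a minimal sketch of the in-sample/out-of-sample workflow using the -if- qualifier rather than drop/keep. Everything in it is an assumption for illustration only: the auto dataset, the seed, the variable names insample, resid_all and mpg_hat, and the simple linear moment condition. It assumes Stata 16 or later for -splitsample-. After the interactive version of -gmm-, -predict- returns residuals and is not restricted to the estimation sample unless you add if e(sample), so the fitted value is recovered as the outcome minus the residual.

```stata
* Illustrative data only; substitute your own dataset and moment conditions
sysuse auto, clear
set seed 12345

* Split into two subsamples: insample takes the values 1 and 2
splitsample, generate(insample) nsplit(2)

* Estimate on subsample 1 only; the -if- qualifier goes before the comma
gmm (mpg - {b1}*weight - {b0}) if insample == 1, instruments(weight)

* predict is not limited to e(sample), so this fills in both subsamples
predict double resid_all
generate double mpg_hat = mpg - resid_all

* Compare fitted values in and out of sample
tabulate insample, summarize(mpg_hat)
```

No observations are dropped at any point, so nothing needs to be "unsuppressed" afterwards. The same pattern should carry over to -ml-, where the -if- qualifier goes on the -ml model- statement.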



            • #7
              Something like this might work, assuming all United routes are in the sample.
              Code:
              g dunited = carrier=="United"  // mark all United observations
              egen unitedroute = max(dunited), by(route)  // variable marking any route that United served in sample
              keep if unitedroute
              You could do this for all carriers and then "if" your estimates by the dummy.
              Last edited by George Ford; 13 Dec 2023, 08:45.
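To see what the flag-and-keep approach does, here is a small self-contained sketch on made-up data. The five tickets, the routes, and the prices are invented purely for illustration, and the variable names follow the code in #7.

```stata
* Toy ticket data, invented for illustration only
clear
input str7 route str8 carrier price
"LAX-SFO" "United"  120
"LAX-SFO" "Delta"   110
"JFK-BOS" "Delta"    90
"JFK-BOS" "JetBlue"  85
"ORD-DEN" "United"  150
end

* 1 for United tickets, 0 for all others
generate byte dunited = carrier == "United"

* Within each route, the maximum is 1 if any ticket on that route is United
egen byte unitedroute = max(dunited), by(route)

* Using the flag as a qualifier leaves the data intact
summarize price if unitedroute

* Or keep every ticket, from any carrier, on routes that United serves
keep if unitedroute
list route carrier price, sepby(route)
```

Both LAX-SFO tickets and the ORD-DEN ticket survive, while the two JFK-BOS tickets are dropped because no United ticket appears on that route.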

