How to filter data?

Irina Z

Join Date: Apr 2015

Posts: 4
#1

06 Jul 2015, 15:04

I am working with airline data. Each observation in the data is a ticket, which provides the route, carrier, price, among other things.
I want to capture and keep all tickets on routes that were serviced by a certain airline (let's say United).
So if United services a certain route, let's say LAX-SFO, then I want to keep all tickets from all airlines from LAX-SFO.
And then I want to discard all tickets from routes that aren't serviced by United.
How can I do so?
E. David Aja

Join Date: Mar 2014

Posts: 58
#2

06 Jul 2015, 15:15

See the help for the if qualifier, e.g.:

Code:

help if
keep if route=="LAX-SFO" & airline=="United"
ben earnhart

Join Date: May 2014

Posts: 1027
#3

06 Jul 2015, 15:21

But I don't think that she wants to discard the other airlines. Something more like:

Code:

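gen trash=1 if airline=="United"          // flag United tickets (missing otherwise)
egen goodroute=max(trash), by(route)      // 1 on every ticket of a route United serves
keep if goodroute==1                      // keep all airlines on those routes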
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

08 Jul 2015, 00:08

Irina (as per FAQ, please note the strong preference on this forum for family name, too. Thanks).
As far as your query is concerned, I agree with the previous suggestions and propose something more minimal that makes dropping variables unnecessary:

Code:

sum price if airline=="United" // this chunk of code assumes that you want descriptive statistics on the price of United tickets, and that airline is stored in -string- format in your dataset

Kind regards,
Carlo
(Stata 19.0)
Richard Blades

Join Date: Dec 2023

Posts: 1
#5

13 Dec 2023, 07:53

I have a similar problem, and the suggestion above hasn't fixed it.

I have used the splitsample command to create two sub-samples (in-sample and out-of-sample). I then have some code that creates and maximizes a maximum likelihood function, but I only want to do this for the in-sample. I then want to go back to the entire sample and run the predict command to create a forecast for both samples. I'm doing the same thing for a GMM model as well.

The if command doesn't work with ml and gmm, and I don't want to use drop/keep because I want to temporarily suppress the out-of-sample observations and then "unsuppress" them. It's what the SPSS filter command would do.

Any ideas how to do this? Thanks
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#6

13 Dec 2023, 08:33

There is a difference between the if command and the -if- qualifier; see https://journals.sagepub.com/doi/abs...urnalCode=stja. What is discussed here is the latter. If you are talking about the gmm command in Stata, it allows the -if- qualifier; see the syntax below.

Syntax

Interactive version

gmm ([reqname1:]rexp_1) ([reqname2:]rexp_2) ... [if] [in] [weight] [, options]

Moment-evaluator program version

gmm moment_prog [if] [in] [weight], {equations(namelist)|nequations(#)} {parameters(namelist)|nparameters(#)} [options] [program_options]

You should probably start a new thread and properly explain your issue, preferably showing Stata code and output. If possible, enclose a reproducible example of your problem. See FAQ Advice #12 for advice on posting.
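
A minimal sketch of the pattern asked about in #5, assuming Stata 16 or later (for splitsample) and using the shipped auto dataset as a stand-in: the -if- qualifier restricts estimation to the in-sample, and predict without a qualifier then computes values for every observation. The variable names sample, resid_all and xb_all, the program name mynormal_lf, and the simple linear specifications are illustrative assumptions, not the poster's actual models.

```stata
sysuse auto, clear
set seed 12345

* Split into two random subsamples; sample == 1 is treated as in-sample here
splitsample, generate(sample) nsplit(2)

* GMM on the in-sample only, via the -if- qualifier
gmm (mpg - {b0} - {b1}*weight) if sample == 1, instruments(weight)

* No qualifier on predict, so residuals are computed for both subsamples
predict double resid_all

* Same idea with ml: a hypothetical normal log-likelihood evaluator
capture program drop mynormal_lf
program define mynormal_lf
    args lnf mu lnsigma
    quietly replace `lnf' = ln(normalden($ML_y1, `mu', exp(`lnsigma')))
end

ml model lf mynormal_lf (mu: mpg = weight) /lnsigma if sample == 1
ml maximize

* Linear prediction for all observations, in-sample and out-of-sample
predict double xb_all, xb
```

Nothing is dropped, so the out-of-sample observations remain in memory for comparison, for example with summarize resid_all if sample == 2.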
George Ford

Join Date: Aug 2014

Posts: 3152
#7

13 Dec 2023, 08:42

Something like this might work, assuming all United routes are in the sample.

Code:

gen dunited = carrier=="United" // mark all United observations
egen unitedroute = max(dunited), by(route) // variable marking any route that United served in sample
keep if unitedroute

You could do this for all carriers and then restrict your estimates with the -if- qualifier on the dummy instead of dropping observations.

Last edited by George Ford; 13 Dec 2023, 08:45.
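
A small sketch of the flag-instead-of-drop variant, on a made-up four-ticket dataset (the JFK-BOS route, the Delta and JetBlue carriers, and all prices are invented for illustration). The indicator unitedroute stays in the data as a variable, so later commands can be restricted with the -if- qualifier while every ticket remains in memory.

```stata
clear
* Toy data: invented tickets, for illustration only
input str7 route str10 carrier price
"LAX-SFO" "United"  120
"LAX-SFO" "Delta"   110
"JFK-BOS" "Delta"    95
"JFK-BOS" "JetBlue"  90
end

* 1 for United tickets, 0 otherwise
gen byte dunited = carrier == "United"

* 1 on every ticket of a route that United serves in the sample
egen byte unitedroute = max(dunited), by(route)

list, sepby(route)

* Restrict a command to United-served routes without dropping anything
summarize price if unitedroute

* The other routes are still available
summarize price if !unitedroute
```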