Keeping different observations with different ranges

Beste KAYGISIZ

Join Date: Jul 2018

Posts: 7
#1


13 Jul 2018, 12:11

Hi Everyone,

I am super new to Stata (about 2 weeks), and I have a pretty big data set in which I need to keep different observations defined over different ranges. I was wondering how I can do it in one go rather than for every single one of them separately, because if I use the keep command everything else gets dropped and I do not want to start again for each observation. The data set looks like this:

id . num_date sales
1 . 179
1 . 180
1 . 181
1 . 182
1 . 183
1 . 184
2 . 179
2 . 180
2 . 181
2 . 182

So I wanna keep the observations with num_date 180-184 for id 1, but 179-181 for the id 2. Is there a loop or anything I can create to do everything in one go? If not, what is the most convenient way of doing it?

Thanks

Last edited by Beste KAYGISIZ; 13 Jul 2018, 12:14.
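One way to avoid re-running keep from scratch is to flag the observations to keep first and drop everything else only once, at the end. This is a minimal sketch, assuming the variables are named id and num_date as in the listing above; the flag variable tokeep is a hypothetical name, and the data are typed in with input only so the example runs on its own:

```stata
* Example data typed in from the listing in #1 (sales left out)
clear
input id num_date
1 179
1 180
1 181
1 182
1 183
1 184
2 179
2 180
2 181
2 182
end

* Flag the rows to keep, one rule per id; nothing is dropped yet
generate byte tokeep = 0
replace tokeep = 1 if id == 1 & inrange(num_date, 180, 184)
replace tokeep = 1 if id == 2 & inrange(num_date, 179, 181)

* Inspect the flags, then drop everything else in a single step
tabulate id tokeep
keep if tokeep == 1
drop tokeep
```

Because the flags can be checked with tabulate before keep runs, a wrong range can be fixed by correcting one replace line instead of reloading the data. It still needs one line per id, though.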
Bruce Weaver

Join Date: May 2014

Posts: 1133
#2

13 Jul 2018, 13:15

Hello Beste. There would have to be a general rule that applies the same way for all values of id to achieve this with one command. Is there such a general rule? If so, I can't work it out from what you've said in #1.

HTH.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
Beste KAYGISIZ

Join Date: Jul 2018

Posts: 7
#3

13 Jul 2018, 13:27

General rule? You mean something like another variable that is consistent across the board for all observations?
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#4

13 Jul 2018, 13:29

Without the sort of rule Bruce suggested in #2, you can issue the drop command instead of keep:

Code:

drop if inrange(num_date, 180, 184) & id==1
drop if inrange(num_date, 179, 181) & id==2

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#5

13 Jul 2018, 13:32

By rule, I mean I would need to know which values of num_date you want to drop for each id. Is it based on the values of another variable? Is it a particular sequence based on a formula?

Bruce Weaver

Join Date: May 2014

Posts: 1133
#6

13 Jul 2018, 14:49

Thanks for jumping in, Carole. I think you had a clearer grasp of the original question than I did. Using drop rather than keep will indeed solve the problem.

In #1, Beste said:

So I wanna keep the observations with num_date 180-184 for id 1, but 179-181 for the id 2.

So I think you need !inrange in #4, like this:

Code:

drop if !inrange(num_date, 180, 184) & id==1
drop if !inrange(num_date, 179, 181) & id==2

Beste: An exclamation mark (!) is the symbol for 'not' in Stata code.
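The two drop commands can also be collapsed into a single keep by joining the id-specific conditions with | (Stata's "or"). This is a sketch using the example data from #1; note that the /// line continuation works in a do-file but not when typed into the Command window:

```stata
* Example data from #1
clear
input id num_date
1 179
1 180
1 181
1 182
1 183
1 184
2 179
2 180
2 181
2 182
end

* Keep an observation only if it falls inside its own id's range
keep if (id == 1 & inrange(num_date, 180, 184)) | ///
        (id == 2 & inrange(num_date, 179, 181))
```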

Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#7

13 Jul 2018, 15:24

Bruce is correct: the “!” is needed before inrange().

Beste KAYGISIZ

Join Date: Jul 2018

Posts: 7
#8

14 Jul 2018, 00:11

Thank you all for the replies. I have figured out the missing !, so no worries. The only rule is that each range spans 4, like 175-179 or 182-186; however, there is no rule for what determines the starting number. The last number in each range corresponds to the patent expiration date of a particular drug, so unfortunately there is no rule. So what I am wondering is: in such cases, do we need to write everything out one by one for each id?

Is there a way I can enter the patent expiration dates in order, like 179 (for id 1), 186 (for id 2), and so on for each id, and have Stata apply the range (t-4, t) to the corresponding id and drop everything else?

Last edited by Beste KAYGISIZ; 14 Jul 2018, 00:30.
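If the expiration point t can be typed in (or imported) as a small table with one row per id, merge can attach it to every observation, and a single keep then applies the window t-4 to t to all ids at once. This is a sketch under assumptions: the lookup is entered with input here, the variable name expiry is hypothetical, and the values 184 and 181 are chosen only so that the result matches the ranges wanted in #1. Run it as a do-file so the tempfile macro persists between lines:

```stata
* Hypothetical lookup table: one row per id with its expiration num_date (t)
clear
input id expiry
1 184
2 181
end
tempfile expiries
save `expiries'

* Panel data: the example listing from #1
clear
input id num_date
1 179
1 180
1 181
1 182
1 183
1 184
2 179
2 180
2 181
2 182
end

* Attach t to every row of the matching id; ids with no t are dropped
merge m:1 id using `expiries', keep(match) nogenerate

* Keep the window t-4, ..., t for every id in one command
keep if inrange(num_date, expiry - 4, expiry)
```

In real use the lookup would more likely be read from a spreadsheet or text file (for example with import excel or import delimited) than typed with input.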
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#9

14 Jul 2018, 07:22

Yes, it is likely that you can use that information to get what you want in a few lines. Please include a sample of your data with the patent date info using the dataex command (see section 12.2 of the FAQ).

Beste KAYGISIZ

Join Date: Jul 2018

Posts: 7
#10

18 Jul 2018, 12:38

product expiredate launchdate quarter num_date std_unit_collapsed
CAMPRAL 07/01/2009 12/01/2004 2008 194 1.16e+07
CAMPRAL 07/01/2009 12/01/2004 2008 195 1.07e+07
CAMPRAL 07/01/2009 12/01/2004 2009 196 1.06e+07
CAMPRAL 07/01/2009 12/01/2004 2009 197 1.01e+07
CAMPRAL 07/01/2009 12/01/2004 2009 198 9774540
CAMPRAL 07/01/2009 12/01/2004 2009 199 9193120
CAMPRAL 07/01/2009 12/01/2004 2010 200 9127080
CAMPRAL 07/01/2009 12/01/2004 2010 201 8730160
CAMPRAL 07/01/2009 12/01/2004 2010 202 8366260
CAMPRAL 07/01/2009 12/01/2004 2010 203 8074960
CAMPRAL 07/01/2009 12/01/2004 2011 204 8252240
CAMPRAL 07/01/2009 12/01/2004 2011 205 7518680
CAMPRAL 07/01/2009 12/01/2004 2011 206 7932660
CAMPRAL 07/01/2009 12/01/2004 2011 207 7063880
CAMPRAL 07/01/2009 12/01/2004 2012 208 6484240
CAMPRAL 07/01/2009 12/01/2004 2012 209 5996180
CAMPRAL 07/01/2009 12/01/2004 2012 210 5950620
CAMPRAL 07/01/2009 12/01/2004 2012 211 5809860
CAMPRAL 07/01/2009 12/01/2004 2013 212 5889420
CAMPRAL 07/01/2009 12/01/2004 2013 213 5704020
SEMPREX-D 03/01/2008 04/01/1994 2007 188 1154700
SEMPREX-D 03/01/2008 04/01/1994 2007 189 1162500
SEMPREX-D 03/01/2008 04/01/1994 2007 190 1045000
SEMPREX-D 03/01/2008 04/01/1994 2007 191 1004400
SEMPREX-D 03/01/2008 04/01/1994 2008 192 1036500
SEMPREX-D 03/01/2008 04/01/1994 2008 193 1016300
SEMPREX-D 03/01/2008 04/01/1994 2008 194 910000
SEMPREX-D 03/01/2008 04/01/1994 2008 195 944300
SEMPREX-D 03/01/2008 04/01/1994 2009 196 920500
SEMPREX-D 03/01/2008 04/01/1994 2009 197 853100
SEMPREX-D 03/01/2008 04/01/1994 2009 198 795200
SEMPREX-D 03/01/2008 04/01/1994 2009 199 835400
SEMPREX-D 03/01/2008 04/01/1994 2010 200 849500
SEMPREX-D 03/01/2008 04/01/1994 2010 201 844200

Each repeated year in the quarter variable corresponds to one quarter, and num_date is the code for each quarter. As you can see, SEMPREX-D has expiry date 03/01/2008, which is in 2008q1, thus num_date 192, whereas for CAMPRAL it is 2009q3, num_date 198. I want to keep the values in the last four quarters, so for SEMPREX-D I want to keep the rows with num_date 189-192, and for CAMPRAL 195-198.

I know that the "drop if !" code you suggested works; however, I am wondering whether there are other ways, because in cases with millions of products it would be impossible to write a line for each one of them separately.
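With the data in #10, the expiration quarter may not need to be typed in at all. The codes quoted there (192 for 2008q1, 198 for 2009q3) appear to match Stata's own quarterly date coding, in which 1960q1 is 0, so the expiry quarter t can be computed from expiredate, and one keep then covers every product. This is a sketch assuming that coding holds and that expiredate is either a string in MM/DD/YYYY form or a numeric Stata daily date (dataex output would settle which). The name expq is hypothetical, and the window t-3 to t follows the four quarters described in #10; use expq - 4 for the t-4 window mentioned in #8:

```stata
* Excerpt of the data in #10 (expiredate stored as a string here)
clear
input str10 product str10 expiredate num_date
"CAMPRAL"   "07/01/2009" 194
"CAMPRAL"   "07/01/2009" 195
"CAMPRAL"   "07/01/2009" 196
"CAMPRAL"   "07/01/2009" 197
"CAMPRAL"   "07/01/2009" 198
"CAMPRAL"   "07/01/2009" 199
"SEMPREX-D" "03/01/2008" 188
"SEMPREX-D" "03/01/2008" 189
"SEMPREX-D" "03/01/2008" 190
"SEMPREX-D" "03/01/2008" 191
"SEMPREX-D" "03/01/2008" 192
"SEMPREX-D" "03/01/2008" 193
end

* Expiry quarter t as a Stata quarterly date (2008q1 = 192)
capture confirm string variable expiredate
if _rc == 0 {
    * string date: convert to a daily date first, then to a quarter
    generate expq = qofd(date(expiredate, "MDY"))
}
else {
    * already a numeric daily date
    generate expq = qofd(expiredate)
}
format expq %tq

* Keep the last four quarters up to and including expiry: t-3, ..., t
keep if inrange(num_date, expq - 3, expq)
```

Because nothing in the code refers to a particular product, the same steps should work unchanged however many products the data contain.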
Beste KAYGISIZ

Join Date: Jul 2018

Posts: 7
#11

01 Aug 2018, 09:43

No further suggestions?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#12

01 Aug 2018, 10:12

I expect the reason you did not get further suggestions is that you did not provide your sample data using the dataex command as requested in post #9, so many questions about your data remain.

Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question.

If you are running version 15.1 or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
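As an illustration, the following sketch checks whether dataex is available, installs it from SSC only if it is not, and then lists a few observations in the form dataex produces. It uses Stata's shipped auto dataset so that it runs anywhere; with the data in #10 the last line would instead be something like dataex product expiredate num_date std_unit_collapsed in 1/20.

```stata
* Install dataex from SSC only if it is not already available
capture which dataex
if _rc ssc install dataex

* Demonstration on a shipped dataset: the first 5 observations of 3 variables
sysuse auto, clear
dataex make price mpg in 1/5
```

The output is a block of commands, starting with input, that readers can paste into their own Stata to recreate the example exactly.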

When asking for help with code, always show example data. When showing example data, always use dataex.

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.