  • Keeping different observations with different ranges

    Hi Everyone,

    I am new to Stata (about two weeks in), and I have a fairly large data set in which I need to keep different observations defined by different ranges. How can I do this in one go rather than separately for each id? If I use the keep command, everything else gets dropped, and I do not want to start over for each id. The data set looks like this:

    id . num_date sales
    1 . 179
    1 . 180
    1 . 181
    1 . 182
    1 . 183
    1 . 184
    2 . 179
    2 . 180
    2 . 181
    2 . 182

    So I wanna keep the observations with num_date 180-184 for id 1, but 179-181 for the id 2. Is there a loop or anything I can create to make everything in one go? If not what is the most convenient way of doing it?

    Thanks
    Last edited by Beste KAYGISIZ; 13 Jul 2018, 13:14.

  • #2
    Hello Beste. There would have to be a general rule that applies the same way for all values of id to achieve this with one command. Is there such a general rule? If so, I can't work it out from what you've said in #1.

    HTH.
    --
    Bruce Weaver
    Email: bweaver@lakeheadu.ca
    Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
    Stata version: 15.1 IC (Windows)

    • #3
      General rule? You mean something like another variable that is consistent across the board for all observations?

      • #4
        Without the kind of rule Bruce asks about in #2, you can use the drop command instead of keep:

        Code:
        drop if inrange(num_date, 180, 184) & id==1
        drop if inrange(num_date, 179, 181) & id==2
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1

        • #5
          By rule, I mean I would need to know which values of num_date you want to drop for each id. Is it based on the values of another variable? Is it a particular sequence based on a formula?
          Stata/MP 14.1 (64-bit x86-64)
          Revision 19 May 2016
          Win 8.1

          • #6
            Thanks for jumping in, Carole. I think you had a clearer grasp of the original question than I did. Using drop rather than keep will indeed solve the problem.

            In #1, Beste said:
            "So I wanna keep the observations with num_date 180-184 for id 1, but 179-181 for the id 2."
            So I think you need !inrange in #4, like this:

            Code:
            drop if !inrange(num_date, 180, 184) & id==1
            drop if !inrange(num_date, 179, 181) & id==2
            Beste: An exclamation mark (!) is the symbol for 'not' in Stata code.
            --
            Bruce Weaver
            Email: bweaver@lakeheadu.ca
            Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
            Stata version: 15.1 IC (Windows)
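
            For comparison, the two drop commands above can also be folded into a single keep. This is only a sketch on the toy data from #1 (the sales column is omitted because no values were posted), and it behaves differently in one respect: keep removes every id that is not listed in the condition, whereas the drop commands leave other ids untouched.

            ```stata
            * Toy data from #1 (id and num_date only)
            clear
            input id num_date
            1 179
            1 180
            1 181
            1 182
            1 183
            1 184
            2 179
            2 180
            2 181
            2 182
            end

            * One command: keep each id's own range; any id not listed here is dropped
            keep if (id == 1 & inrange(num_date, 180, 184)) | (id == 2 & inrange(num_date, 179, 181))

            list, sepby(id)
            ```

            This still needs one condition per id, so it only suits a handful of ids.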

            • #7
              Bruce is correct: the "!" is needed before inrange().
              Stata/MP 14.1 (64-bit x86-64)
              Revision 19 May 2016
              Win 8.1

              • #8
                Thank you all for the replies. I have figured out the missing !, so no worries. The only rule is that the range is going to be 4 for each, like 175-179 or 182-186; however, there is no rule for what determines the starting number. The last number in each range corresponds to the patent expiration date of a certain drug, so unfortunately there is no rule. So what I am wondering is: in such cases, do we need to write everything out one by one for each id?

                Is there a way I can enter the patent expiration dates in order, like 179 (for id 1), 186 (for id 2), and so on for each of the ids, so that Stata will directly apply the range (t-4, t) to the corresponding id and drop everything else?
                Last edited by Beste KAYGISIZ; 14 Jul 2018, 01:30.
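
                One way this could look, as a hedged sketch rather than a tested solution for the real data: store one expiration value per id in a small lookup data set, merge it onto the main data, and apply the (t-4, t) window in a single keep. The variable name expire_q, the tempfile name expiry, and the expiration values 184 and 181 are assumptions chosen so that the result matches the ranges requested in #1.

                ```stata
                * Hypothetical lookup table: one expiration value per id
                clear
                input id expire_q
                1 184
                2 181
                end
                tempfile expiry
                save `expiry'

                * Toy data from #1 (id and num_date only)
                clear
                input id num_date
                1 179
                1 180
                1 181
                1 182
                1 183
                1 184
                2 179
                2 180
                2 181
                2 182
                end

                * Attach each id's expiration value, then keep the window (t-4, t)
                merge m:1 id using `expiry', keep(match) nogenerate
                keep if inrange(num_date, expire_q - 4, expire_q)

                list, sepby(id)
                ```

                With this layout, adding another id means adding one row to the lookup table; the keep command itself does not change.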

                • #9
                  Yes, it is likely that you can use that information to get what you want in a few lines. Please include a sample of your data with the patent date info using the dataex command (see section 12.2 of the FAQ).
                  Stata/MP 14.1 (64-bit x86-64)
                  Revision 19 May 2016
                  Win 8.1

                  • #10
                    product expiredate launchdate quarter num_date std_unit_collapsed
                    CAMPRAL 07/01/2009 12/01/2004 2008 194 1.16e+07
                    CAMPRAL 07/01/2009 12/01/2004 2008 195 1.07e+07
                    CAMPRAL 07/01/2009 12/01/2004 2009 196 1.06e+07
                    CAMPRAL 07/01/2009 12/01/2004 2009 197 1.01e+07
                    CAMPRAL 07/01/2009 12/01/2004 2009 198 9774540
                    CAMPRAL 07/01/2009 12/01/2004 2009 199 9193120
                    CAMPRAL 07/01/2009 12/01/2004 2010 200 9127080
                    CAMPRAL 07/01/2009 12/01/2004 2010 201 8730160
                    CAMPRAL 07/01/2009 12/01/2004 2010 202 8366260
                    CAMPRAL 07/01/2009 12/01/2004 2010 203 8074960
                    CAMPRAL 07/01/2009 12/01/2004 2011 204 8252240
                    CAMPRAL 07/01/2009 12/01/2004 2011 205 7518680
                    CAMPRAL 07/01/2009 12/01/2004 2011 206 7932660
                    CAMPRAL 07/01/2009 12/01/2004 2011 207 7063880
                    CAMPRAL 07/01/2009 12/01/2004 2012 208 6484240
                    CAMPRAL 07/01/2009 12/01/2004 2012 209 5996180
                    CAMPRAL 07/01/2009 12/01/2004 2012 210 5950620
                    CAMPRAL 07/01/2009 12/01/2004 2012 211 5809860
                    CAMPRAL 07/01/2009 12/01/2004 2013 212 5889420
                    CAMPRAL 07/01/2009 12/01/2004 2013 213 5704020
                    SEMPREX-D 03/01/2008 04/01/1994 2007 188 1154700
                    SEMPREX-D 03/01/2008 04/01/1994 2007 189 1162500
                    SEMPREX-D 03/01/2008 04/01/1994 2007 190 1045000
                    SEMPREX-D 03/01/2008 04/01/1994 2007 191 1004400
                    SEMPREX-D 03/01/2008 04/01/1994 2008 192 1036500
                    SEMPREX-D 03/01/2008 04/01/1994 2008 193 1016300
                    SEMPREX-D 03/01/2008 04/01/1994 2008 194 910000
                    SEMPREX-D 03/01/2008 04/01/1994 2008 195 944300
                    SEMPREX-D 03/01/2008 04/01/1994 2009 196 920500
                    SEMPREX-D 03/01/2008 04/01/1994 2009 197 853100
                    SEMPREX-D 03/01/2008 04/01/1994 2009 198 795200
                    SEMPREX-D 03/01/2008 04/01/1994 2009 199 835400
                    SEMPREX-D 03/01/2008 04/01/1994 2010 200 849500
                    SEMPREX-D 03/01/2008 04/01/1994 2010 201 844200

                    Each of the repeated years corresponds to a quarter. num_date is the code for each quarter. As you can see, SEMPREX-D has expiry date 03/01/2008, which is in 2008q1, thus num_date 192, whereas for CAMPRAL it is 2009q3, num_date 198. I want to keep the values in the last four quarters, so for SEMPREX-D I want to keep the rows with num_date 189-192, and for CAMPRAL 195-198.

                    I know that the "drop if !" code you suggested works; however, I am wondering whether there are other ways, because in cases with millions of products it would be impossible to write a line for each one of them separately.
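
                    Since the expiry date is already in every row, the window can in principle be computed instead of typed per product. The sketch below rests on assumptions the thread does not confirm: that expiredate is a string in MM/DD/YYYY form and that num_date is a Stata quarterly date (consistent with 2008q1 = 192 above). The variable name expire_q is hypothetical, and only a few rows of the posted table are reproduced.

                    ```stata
                    * A few rows from the table above; expiredate is ASSUMED to be a string
                    clear
                    input str10 product str10 expiredate num_date
                    "CAMPRAL"   "07/01/2009" 194
                    "CAMPRAL"   "07/01/2009" 195
                    "CAMPRAL"   "07/01/2009" 196
                    "CAMPRAL"   "07/01/2009" 197
                    "CAMPRAL"   "07/01/2009" 198
                    "CAMPRAL"   "07/01/2009" 199
                    "SEMPREX-D" "03/01/2008" 188
                    "SEMPREX-D" "03/01/2008" 189
                    "SEMPREX-D" "03/01/2008" 190
                    "SEMPREX-D" "03/01/2008" 191
                    "SEMPREX-D" "03/01/2008" 192
                    "SEMPREX-D" "03/01/2008" 193
                    end

                    * Convert the string to a daily date, then to the quarter it falls in
                    gen expire_q = qofd(daily(expiredate, "MDY"))
                    format expire_q %tq

                    * Keep the expiry quarter and the three quarters before it
                    keep if inrange(num_date, expire_q - 3, expire_q)

                    list product num_date expire_q, sepby(product)
                    ```

                    If expiredate is already a numeric daily date, the daily() call would be unnecessary and qofd(expiredate) alone should do.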

                    • #11
                      No further suggestions?

                      • #12
                        I expect the reason you did not get further suggestions is that you did not provide your sample data using the dataex command as requested in post #9, and thus there remain many questions about your data.

                        Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question.

                        If you are running version 15.1 or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                        When asking for help with code, always show example data. When showing example data, always use dataex.

                        The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
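
                        As a rough illustration of what a dataex call looks like, here is a sketch using the auto data set that ships with Stata instead of the poster's data; the choice of variables and the number of rows are arbitrary.

                        ```stata
                        * Load a data set shipped with Stata
                        sysuse auto, clear

                        * Print the first five observations of three variables as a pasteable example
                        dataex make price mpg in 1/5
                        ```

                        For the data in this thread, the analogous call would list product, expiredate and num_date, which would also reveal how expiredate is stored.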
