Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem with quarterly data

    Hi all,
    I am sorry about writing that often today. I really appreciate your help!

    The problem is the following: I have a dataset like this:

    id quarter salesmnf Year
    1 2004q1 0 2004
    1 2004q2 0 2004
    1 2004q3 9 2004
    ...
    1 2015 q1 0 2015
    1 2015 q2 9 2015
    1 2015 q3 0 2015
    1 2015 q4 5 2015

    The same structure is repeated over all products. Now, what I would like to do is to find the first positive occurrence for each year and drop the rows with the 0 above the first occurrence, so that for instance the example will result in:

    id quarter salesmnf Year
    1 2004q3 9 2004
    ...

    1 2015 q2 9 2015
    1 2015 q3 0 2015
    1 2015 q4 5 2015

    Notice that this is just for the 0s above the first positive occurrence.

    many thanks again!!!

  • #2
    This is an FAQ: https://www.stata.com/support/faqs/d...t-occurrences/

    The first positive occurrence is at

    Code:
    egen firstpos = min(cond(salesmnf > 0, quarter, .)), by(id Year)
    so that you then seem to want

    Code:
    drop if quarter < firstpos
    I am assuming that you want this done separately for each id, although you don't say so.

    Comment


    • #3
      Many thanks for your prompt reply. Yes I have read the FAQ but actually the problem is that I have data quarterly separated and I would like the minimum within the quarters.
      I have generated a sequence of 1,2,3,4 repeated along all the panel for each product and each year and trying to repeat the procedure suggested in the FAQ but actually it is not working...

      Comment


      • #4
        Sorry, but I don't understand #3. I suggested code in #2. Did you try it? "not working" doesn't allow any help: what code did you try? what happened?

        Please give a data example using dataex as requested (FAQ Advice #12).

        (It seems to me that you want the first quarter with positive sales within each year and for each panel.)

        Comment


        • #5
          Yes, you are right, sorry. I have tried the code in #2 but seems not to do the job I would like to do. I'll try to be clearer, sorry again.
          So, the dataset looks like the following:

          Code:
          input float(idproduct trimestre Year) double salesmnf
          1 176 2004   0
          1 177 2004   0
          1 178 2004   34
          1 179 2004   67
          1 180 2005   789
          1 181 2005   0
          1 182 2005   0
          1 183 2005   0
          1 184 2006   0
          1 185 2006   0
          1 186 2006   35466
          1 187 2006   0
          1 188 2007   0
          1 189 2007   0
          1 190 2007   0
          1 191 2007   0
          1 192 2008   3245
          1 193 2008   0
          1 194 2008   0
          1 195 2008   0
          1 196 2009   0
          1 197 2009   0
          1 198 2009   0
          1 199 2009   0
          1 200 2010   0
          1 201 2010   0
          1 202 2010   0
          1 203 2010   0
          1 204 2011   0
          1 205 2011   0
          1 206 2011   0
          1 207 2011   32324
          1 208 2012   0
          1 209 2012   0
          1 210 2012   0
          1 211 2012   34556
          1 212 2013   0
          1 213 2013   0
          1 214 2013   0
          1 215 2013   0
          1 216 2014   0
          1 217 2014   9875
          1 218 2014   0
          1 219 2014   0
          1 220 2015   0
          1 221 2015   0
          1 222 2015 395
          1 223 2015   0
          2 176 2004   0
          2 177 2004   0
          2 178 2004   0
          2 179 2004   0
          2 180 2005   0
          2 181 2005   34678
          2 182 2005   908
          2 183 2005   567
          2 184 2006   0
          2 185 2006   0
          2 186 2006   0
          2 187 2006   0
          2 188 2007   0
          2 189 2007   0
          2 190 2007   0
          2 191 2007   0
          2 192 2008   0
          2 193 2008   0
          2 194 2008   0
          2 195 2008   0
          2 196 2009   0
          2 197 2009   0
          2 198 2009   575
          2 199 2009   0
          2 200 2010   0
          2 201 2010   0
          2 202 2010   0
          2 203 2010   0
          2 204 2011   2
          2 205 2011   0
          2 206 2011   0
          2 207 2011   0
          2 208 2012  12
          2 209 2012   0
          2 210 2012   0
          2 211 2012   0
          2 212 2013   0
          2 213 2013   0
          2 214 2013   0
          2 215 2013   0
          2 216 2014   0
          2 217 2014   0
          2 218 2014   0
          2 219 2014   0
          2 220 2015   5
          2 221 2015   3456
          2 222 2015   958
          2 223 2015   0
          2 176 2004   0
          2 177 2004   0
          2 178 2004   0
          2 179 2004   0
          where trimestre is the quarter and is in the format %tq. What I would like to do i the following: as you can see each product is observed in more years. Specifically from 2004 to 2015. Each year is divided into 4 quarters. My aim is:
          1) for each slot of 4 quarters in a year, look for the first positive occurrence of salesmnf Then: if all sales are positive for all the 4 quarters do nothing. If instead the positive occurrence is in the 2nd, 3rd or 4th quarter, drop the rows above the first positive occurrence.
          As an example, iff I have:
          id quarter Year sales
          2 2005q1 2005 0
          2 2005q2 2005 0
          2 2005q3 2005 3
          2 2005q4 2005 4

          I would like this to turn into:

          id quarter Year sales
          2 2005q3 2005 3
          2 2005q4 2005 4

          2) As a second step I would like to aggregate by year but performing a mean whenever I dropped the rows?
          Now what I would like to do is the following:
          whenever I have four quarters, simply sum the sales together. Whenever I have less than 4 quarters (as above we dropped some of the rows), do the mean with the remaining rows. So if for instance I have:
          id quarter Year sales
          2 2005q3 2005 4
          2 2005q4 2005 5

          The result should be: yearly_sales = 5+4/2...

          Thanks again...sorry but I am quite new in STATA...
          Last edited by Federico Nutarelli; 31 Oct 2018, 03:27.

          Comment


          • #6
            P.s. for the task 1) take for instance also the year 2009. Here I have:
            2 196 2009 0
            2 197 2009 0
            2 198 2009 575
            2 199 2009 0

            For this year the first positive occurrence is in 198 (which I suppose is the encoding for 2009q3). Therefore, what I would like to obtain is:

            2 198 2009 575
            2 199 2009 0

            Hence, in task 2) this should result in year_sales = 575/2.

            Or, again as another example:

            2 220 2009 5
            2 221 2009 3456
            2 222 2009 958
            2 223 2009 0
            Here, I do not have 0 before the first positive occurrence hence we do nothing and in task 2 the aggregation will be simply the sum of the components: yearly_sales = 5+3456+958+0. The mean in the aggregation should be taken only whereby rows have been dropped.
            Finally, if I have a complete year wit 0:
            2 220 2013 0
            2 221 2013 0
            2 222 2013 0
            2 223 2013 0

            I would like to leave it (because no first positive occurrence happens) and in step 2) this will result in yearly_sales = 0.
            Code in #2 had the following problems to me:
            a) sometimes it produced a position of first positive occurrence whenever there were not positive occurrences; for example, it gave me:
            firstpos idproduct quarter Year salesmnf
            194 1 2009q1 2009 0
            194 1 2009q2 2009 0
            194 1 2009q3 2009 0
            194 1 2009q4 2009 0

            2) once I drop the rows, it made confusion with trimestre and Year (for example it appeared twice the year 2008 with the 4 quarters for product 1 and the second occurrence went right after year 2011...)

            Hope to have been clear enough...many thanks again!
            Last edited by Federico Nutarelli; 31 Oct 2018, 03:54.

            Comment


            • #7
              Originally posted by Federico Nutarelli View Post

              2) once I drop the rows, it made confusion with trimestre and Year (for example it appeared twice the year 2008 with the 4 quarters for product 1 and the second occurrence went right after year 2012...)
              Code:
              input float(idproduct trimestre Year) double salesmnf
              1 222 2015    395
              1 223 2015      0
              2 194 2008      0
              2 195 2008      0
              2 196 2009      0
              2 197 2009      0
              2 198 2009      0
              2 199 2009      0
              2 200 2010      0
              2 201 2010      0
              2 202 2010      0
              2 203 2010      0
              2 204 2011      2
              2 205 2011      0
              2 206 2011      0
              2 207 2011      0
              2 208 2012     12
              2 209 2012      0
              2 210 2012      0
              2 211 2012      0
              2 194 2008   8872
              2 195 2008  69068
              2 196 2009  96190
              2 197 2009 126118
              2 198 2009  83918
              2 199 2009  78896
              2 200 2010  89414
              2 201 2010  98711
              2 202 2010  76112
              2 203 2010 120427
              2 204 2011   1785
              2 205 2011      3
              2 206 2011      0
              2 207 2011      0
              2 208 2012      8
              2 209 2012      0
              2 210 2012      0
              2 211 2012      0
              2 194 2008   8872
              2 195 2008  69068
              2 196 2009  96190
              2 197 2009 126118
              2 198 2009  83918
              2 199 2009  78896
              2 200 2010  89414
              2 201 2010  98711
              2 202 2010  76112
              2 203 2010 120427
              2 204 2011   1785
              2 205 2011      3
              2 206 2011      0
              2 207 2011      0
              2 208 2012      8
              2 209 2012      0
              2 210 2012      0
              2 211 2012      0
              2 194 2008      0
              2 195 2008      0
              2 196 2009      0
              2 197 2009      0
              2 198 2009      0
              2 199 2009      0
              2 200 2010      0
              2 201 2010      0
              2 202 2010      0
              2 203 2010      0
              2 204 2011      2
              2 205 2011      0
              2 206 2011      0
              2 207 2011      0
              2 208 2012     12
              2 209 2012      0
              2 210 2012      0
              2 211 2012      0

              Comment


              • #8
                Given https://www.statalist.org/forums/for...y-hard-reshape and your use of a very different data layout please advise on whether this is still an open question.

                Comment


                • #9
                  Actually no more since I think that once the reshape problem is solved also this question will be.
                  Thank you very very much Professor.

                  Comment

                  Working...
                  X