Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Very hard reshape

    Hi all,

    I have a dataset with an id that it is not unique and would like to do a reshape long. Now, dataex cannot be used since the columns are too much.
    So I report here a sample also of the column which expand til 2015 with the same shape (four quarters each year):
    Code:
    input double idproduct long(salesmnf2004Q1 salesmnf2004Q2 salesmnf2004Q3 salesmnf2004Q4 salesmnf2005Q1 salesmnf2005Q2 salesmnf2005Q3 salesmnf2005Q4)
    1      0      0      0      0      0      0      0      0
    2      0      0      0      0      0      0      0      0
    2      0      0      0      0      0      0      0      0
    2      0      0      0      0      0      0      0      0
    2      0      0      0      0      0      0      0      0
    3    244    342   5966   9472    629      0      0      0
    3 339329 324989 426281 283110  24306      0      0      0
    3     36      0      0      0      0      0      0      0
    3  68188  37382  28741  18225    533      0      0      0
    3    124      0      0      0      0      0      0      0
    3 246552 152074 182095 174629  14832      0      0      0
    3    227    174      0      0      0      0      0      0
    3   7659   5855   1930   1788    122      0      0      0
    3     56     11      0      0      0      0      0      0
    3  37894  40017  51435  42338   4143      0      0      0
    3      0    272    282    235      0      0      0      0
    3    252    180      0      0      0      0      0      0
    3  27520  30257  23095  13895    771      0      0      0
    3    528   1009    588    805    121      0      0      0
    3 108352 127944 123135 112263  31098      0      0      0
    3  16695   8061   9343   9790   2425      0      0      0
    3     87      0      0      0      0      0      0      0
    3   1263   1197   1253   1383     41      0      0      0
    3    128     63    370    195      0      0      0      0
    3    234      0    329    338     40      0      0      0
    3   1678    628    307    203      9      0      0      0
    3    780    535    948      0      0      0      0      0
    3    662    634    533    591      0      0      0      0
    3      0      0     31      0      0      0      0      0
    3  88284 216147  70070  62426   9134      0      0      0
    3 138154 148580 163605 133421   7557      0      0      0
    3    104    172    558    415      9      0      0      0
    3  31695  33263  19931   9743    492      0      0      0
    3 572550 605649 784501 664120  53733      0      0      0
    4      0      0      0      0      0      0      0      0
    5  16911  26506  16190  16224  19013  25620  32119  50588
    5  15687  19615  12401  12910  15598  12736  13489  12955
    5      0      0      0      0      0      0      0      0
    5   6124   4219   4345   4317   4537   4846   5973   7038
    5     76      0      0      0      0      0      0    110
    5   3224   2896   2545   1539   2503   2319   3962   5368
    5      0      0      0      0      0      0      0      0
    5    244    247    230    212    261    204    125    534
    5      0      0      0      0      0      0      0      0
    6     29     54     29      0      5     43     14     31
    7    175      0      0      0      0      0      0      0
    7    175      0      0      0      0      0      0      0
    7    175      0      0      0      0      0      0      0
    8      0      0      0      0      0      0      0      0
    8      0      0      0      0      0      0      0      0
    8      0      0      0      0      0      0      0      0
    9      0      0      0      0      0      0      0      0
    9    386    302    172    340    649   2230    202    385
    9   7334   6198   5362   6881   6297   3189   3342   3822
    9      0      0      0      0      0      0      0      0
    9 363775 361731 246456 265171 333133 336635 445020 404845
    9 398457 292293 232842 252645 368193 308239 247530 257376
    9      0      0      0      0      0      0      0      0
    9      0      0      0      0      0      0      0      0
    9 398457 292293 232842 252645 368193 308239 247530 257376
    9    913    895   1109   1715   1898   1060    884   1407
    9      0      0      0      0      0      0      0      0
    9 278705 278967 223250 317347 331716 278549 269238 348433
    9   2883   2478   2800   3450   2545    354    503    456
    9      0      0      0      0      0      0      0      0
    9      0      0      0      0      0      0      0      0
    9 398457 292293 232842 252645 368193 308239 247530 257376
    9   7334   6198   5362   6881   6297   3189   3342   3822
    9   2883   2478   2800   3450   2545    354    503    456
    9   7334   6198   5362   6881   6297   3189   3342   3822
    9 278705 278967 223250 317347 331716 278549 269238 348433
    9 308966 162855  74246  87758 220874 180348 384623 161786
    9 152529 137185 130706 183804  14027  29956  37088  49404
    9  82716  55264  35327  51997  53842  28729  26095  56379
    9   2883   2478   2800   3450   2545    354    503    456
    9 308966 162855  74246  87758 220874 180348 384623 161786
    9  31757  16536  13854  14973   4774   2396   3008   4219
    9 363775 361731 246456 265171 333133 336635 445020 404845
    9      0      0      0      0      0      0      0      0
    9  82716  55264  35327  51997  53842  28729  26095  56379
    9    386    302    172    340    649   2230    202    385
    9   2883   2478   2800   3450   2545    354    503    456
    9    386    302    172    340    649   2230    202    385
    9 363775 361731 246456 265171 333133 336635 445020 404845
    9   7334   6198   5362   6881   6297   3189   3342   3822
    9    913    895   1109   1715   1898   1060    884   1407
    9      0      0      0      0      0      0      0      0
    9 278705 278967 223250 317347 331716 278549 269238 348433
    9 363775 361731 246456 265171 333133 336635 445020 404845
    9      0      0      0      0      0      0      0      0
    9    913    895   1109   1715   1898   1060    884   1407
    9 308966 162855  74246  87758 220874 180348 384623 161786
    9 308966 162855  74246  87758 220874 180348 384623 161786
    9   5987   4611   4213   3631   1947   2438   3259   2532
    9      0      0      0      0      0      0      0      0
    9    386    302    172    340    649   2230    202    385
    9      0      0      0      0      0      0      0      0
    9 398457 292293 232842 252645 368193 308239 247530 257376
    9 278705 278967 223250 317347 331716 278549 269238 348433
    9  82716  55264  35327  51997  53842  28729  26095  56379
    I HAVE TRIED THE FOLLOWING

    Code:
    reshape long salesmnf, i(idproduct) j(quarter) string 
    
    gen trimestre = quarterly(quarter, "YQ")
    format trimestre %tq 
    gen Year = year(dofq(trimestre))
    without success since of course idproduct does not uniquely identify each observation. Also the creation of a unique identifier like

    Code:
    gen id_numeric = _n
    seems to me unfair since there is no unique identifier actually in my variables.

    How can I reshape the data long quarterly in this case?


    Many thanks for the help!!!

  • #2
    Whether or not you think creating a unique identifier with the generate command is "unfair" is beside the point, the advice in the documentation for the reshape command (found in the Stata Data Management Reference Manual PDF included in your Stata installation and accessible through Stata's Help menu) tells you to do exactly that when there is no unique identifier in the data.

    Technical note

    If your data are in wide form and you do not have a group identifier variable (the i(varlist) required option), you can create one easily by using generate; see [D] generate. For instance, in the last example, if we did not have the id variable in our dataset, we could have created it by typing

    . generate id = _n
    This is the advice you were given yesterday, and for anyone new to the discussion, that advice includes an explanation of why the reshape command requires a distinct identifier.

    https://www.statalist.org/forums/for...reshaping-long

    Code:
     generate id = _n
    
    . reshape long salesmnf, i(id) j(quarter) string
    (note: j = 2004Q1 2004Q2 2004Q3 2004Q4 2005Q1 2005Q2 2005Q3 2005Q4)
    
    Data                               wide   ->   long
    -----------------------------------------------------------------------------
    Number of obs.                      100   ->     800
    Number of variables                  10   ->       4
    j variable (8 values)                     ->   quarter
    xij variables:
    salesmnf2004Q1 salesmnf2004Q2 ... salesmnf2005Q4->salesmnf
    -----------------------------------------------------------------------------
    
    .
    . gen trimestre = quarterly(quarter, "YQ")
    
    . format trimestre %tq
    
    . gen Year = year(dofq(trimestre))
    
    .
    . list if idproduct==6, clean noobs
    
        id   quarter   idprod~t   salesmnf   trimes~e   Year  
        45    2004Q1          6         29     2004q1   2004  
        45    2004Q2          6         54     2004q2   2004  
        45    2004Q3          6         29     2004q3   2004  
        45    2004Q4          6          0     2004q4   2004  
        45    2005Q1          6          5     2005q1   2005  
        45    2005Q2          6         43     2005q2   2005  
        45    2005Q3          6         14     2005q3   2005  
        45    2005Q4          6         31     2005q4   2005
    Last edited by William Lisowski; 31 Oct 2018, 12:25.

    Comment


    • #3
      Hi Federico,

      EDITED: This crossed with William's post #2
      Why doesn't idproduct uniquely identify each observation? Is it because this is product sales by region / store / salesperson? (i.e. there is another variable not listed above such that i(idproduct region) would uniquely identify each observation?

      If there is not an actual variable region but each observation above is for an idproduct region sales (and assuming the sort order is correct. In other words, the $244 in sales for idproduct==3 (in 2004q1) (i.e. the first observation for idproduct==3) and the $16,911 for 2004q1 for idproduct==5 are for the same region, then you could do something like:

      Code:
      by idproduct: gen region_id = _n
      Alternatively, if the current order does not mean anything, and you would rather collapse all of product#3's sales per quarter (regardless of region), then you could use Stata's collapse command, and then reshape the new collapsed dataset (because idproduct would uniquely identify each observation at that point).
      Last edited by David Benson; 31 Oct 2018, 13:00.

      Comment


      • #4
        After looking at your topic at

        https://www.statalist.org/forums/for...quarterly-data

        I think you perhaps misunderstand what reshaping is all about, and I agree with David's suggestion that perhaps the following is what you want.
        Code:
        . collapse salesmnf*, by(idproduct)
        
        . list if idproduct==3, clean
        
               idprod~t   salesmn~4Q1   salesmn~4Q2   salesmn~4Q3   salesmn~4Q4   salesmn~5Q1   sale~5Q2   sale~5Q3   sale~5Q4  
          3.          3   58250.86207   59842.58621   65356.10345   53082.24138   5172.241379          0          0          0  
        
        .
        . reshape long salesmnf, i(idproduct) j(quarter) string
        (note: j = 2004Q1 2004Q2 2004Q3 2004Q4 2005Q1 2005Q2 2005Q3 2005Q4)
        
        Data                               wide   ->   long
        -----------------------------------------------------------------------------
        Number of obs.                        9   ->      72
        Number of variables                   9   ->       3
        j variable (8 values)                     ->   quarter
        xij variables:
        salesmnf2004Q1 salesmnf2004Q2 ... salesmnf2005Q4->salesmnf
        -----------------------------------------------------------------------------
        
        .
        . gen trimestre = quarterly(quarter, "YQ")
        
        . format trimestre %tq
        
        . gen Year = year(dofq(trimestre))
        
        .
        . list if idproduct==3, clean noobs
        
            idprod~t   quarter      salesmnf   trimes~e   Year  
                   3    2004Q1   58250.86207     2004q1   2004  
                   3    2004Q2   59842.58621     2004q2   2004  
                   3    2004Q3   65356.10345     2004q3   2004  
                   3    2004Q4   53082.24138     2004q4   2004  
                   3    2005Q1   5172.241379     2005q1   2005  
                   3    2005Q2             0     2005q2   2005  
                   3    2005Q3             0     2005q3   2005  
                   3    2005Q4             0     2005q4   2005
        The key is that you have not, in any of your posts, shown us what you expect the result of your reshape to be. If you wanted just one observation per product/quarter, reshape alone is not enough to accomplish that - it does not "add up" your data.
        Last edited by William Lisowski; 31 Oct 2018, 12:45.

        Comment


        • #5
          One quick clarifying note, collapse defaults to means, so the $58250.86 that William shows above represents the average sales (over the 29 observations of product #3 listed in post#1.) To get the total, just modify the code to:

          Code:
          collapse (sum) salesmnf*, by(idproduct)

          Just to verify, I copied the data into Excel (see screenshot)
          Click image for larger version

Name:	Stata - screen clip.png
Views:	1
Size:	31.7 KB
ID:	1468407

          Comment


          • #6
            David is correct, my code gave means rather than totals. Of course, as I pointed out in post #4, we're just guessing at what is actually required.

            One other point to make, so that others who read this are not misled about the capabilities of the dataex command. From post #1

            dataex cannot be used since the columns are too much
            As the output of help dataex tells us, the dataex command accepts a list of variables to be output, so the data shown in post #1 could have been created with
            Code:
            dataex idproduct salesmnf2004Q1 salesmnf2004Q2 salesmnf2004Q3 salesmnf2004Q4 salesmnf2005Q1 salesmnf2005Q2 salesmnf2005Q3 salesmnf2005Q4
            or
            Code:
            dataex idproduct salesmnf2004* salesmnf2005*
            or, if all the salesmnf variables appeared in order with no other variables intervening
            Code:
            dataex idproduct salesmnf2004Q1-salesmnf2005Q4

            Comment


            • #7
              Hi David and William,

              Many thanks to all of you for the help.

              So: idproduct does not uniquely identify each observation because it is constructed on the name of the product. Actually there are much more string variables in the dataset and I think that a combination of all of them would result in a unique identifier. The problem is that they are too many to be included into i() of the reshape command.

              William, you are right in saying that I did not show what I expected the reshape result to be. The point in your code is that you are first collapsing with the mean and then performing the reshape. In doing that however, the mean is performed for a "complete" quarter if I understand correctly, so that all the sums of sales are divided by 4 (a complete year). The point is that I would like to reshape first, drop all the 0s and then perform the mean on the remaining quarters. Maybe I should try your code after having removed the 0s and see what happens...I will let you know.

              I mean, I would also like to do biennial aggregations and threeyearly ones...I don't know if this flexibility is allowed by doing the collapse first...

              Thank you very much again and sorry for my ignorance!

              Federico
              Last edited by Federico Nutarelli; 01 Nov 2018, 05:39.

              Comment


              • #8
                I did not show what I expected the reshape result to be.
                And you still have not done so. Across this topic and your two earlier topics, it has never been clear what your objective is, nor have you shown what results you expect from the process you describe.

                Here is made-up data for one product. It has six observations of 8 variables.
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input double idproduct long(salesmnf2004Q1 salesmnf2004Q2 salesmnf2004Q3 salesmnf2004Q4 salesmnf2005Q1 salesmnf2005Q2 salesmnf2005Q3 salesmnf2005Q4)
                9999 16911 26506 16190 16224 19013 25620 32119 50588
                9999 15687 19615 12401     0     0 12736 13489 12955
                9999     0     0     0     0     0     0     0     0
                9999     0     0  4345  4317  4537  4846  5973  7038
                9999    76     0     0     0     0     0     0   110
                9999  3224  2896     0     0     0     0     0     0
                end
                Here is the made-up data reshaped into a long layout as I described in post #2 above.
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float id str6 quarter double idproduct long salesmnf float(trimestre Year)
                1 "2004Q1" 9999 16911 176 2004
                1 "2004Q2" 9999 26506 177 2004
                1 "2004Q3" 9999 16190 178 2004
                1 "2004Q4" 9999 16224 179 2004
                1 "2005Q1" 9999 19013 180 2005
                1 "2005Q2" 9999 25620 181 2005
                1 "2005Q3" 9999 32119 182 2005
                1 "2005Q4" 9999 50588 183 2005
                2 "2004Q1" 9999 15687 176 2004
                2 "2004Q2" 9999 19615 177 2004
                2 "2004Q3" 9999 12401 178 2004
                2 "2004Q4" 9999     0 179 2004
                2 "2005Q1" 9999     0 180 2005
                2 "2005Q2" 9999 12736 181 2005
                2 "2005Q3" 9999 13489 182 2005
                2 "2005Q4" 9999 12955 183 2005
                3 "2004Q1" 9999     0 176 2004
                3 "2004Q2" 9999     0 177 2004
                3 "2004Q3" 9999     0 178 2004
                3 "2004Q4" 9999     0 179 2004
                3 "2005Q1" 9999     0 180 2005
                3 "2005Q2" 9999     0 181 2005
                3 "2005Q3" 9999     0 182 2005
                3 "2005Q4" 9999     0 183 2005
                4 "2004Q1" 9999     0 176 2004
                4 "2004Q2" 9999     0 177 2004
                4 "2004Q3" 9999  4345 178 2004
                4 "2004Q4" 9999  4317 179 2004
                4 "2005Q1" 9999  4537 180 2005
                4 "2005Q2" 9999  4846 181 2005
                4 "2005Q3" 9999  5973 182 2005
                4 "2005Q4" 9999  7038 183 2005
                5 "2004Q1" 9999    76 176 2004
                5 "2004Q2" 9999     0 177 2004
                5 "2004Q3" 9999     0 178 2004
                5 "2004Q4" 9999     0 179 2004
                5 "2005Q1" 9999     0 180 2005
                5 "2005Q2" 9999     0 181 2005
                5 "2005Q3" 9999     0 182 2005
                5 "2005Q4" 9999   110 183 2005
                6 "2004Q1" 9999  3224 176 2004
                6 "2004Q2" 9999  2896 177 2004
                6 "2004Q3" 9999     0 178 2004
                6 "2004Q4" 9999     0 179 2004
                6 "2005Q1" 9999     0 180 2005
                6 "2005Q2" 9999     0 181 2005
                6 "2005Q3" 9999     0 182 2005
                6 "2005Q4" 9999     0 183 2005
                end
                format %tq trimestre
                Using one or the other of these, show us what you want the final result to be. Show your intermediate steps. Explain your calculations and show your intermediate results so we can compare our understanding of your explanation to the results you obtain. If we don't get the same results as you, it tells us we misunderstood you. If you don't show us numbers to compare to, we have no way of knowing if we correctly understand you.

                Comment


                • #9
                  Let me add, with regard to posts #3-#6, that my use of mean was a mistake that Daniel pointed out and I acknowledged. I meant to use sum. But it is not clear what you want, and as you suggest in post #7, maybe collapse is not correct, either. Who can tell?
                  Last edited by William Lisowski; 01 Nov 2018, 07:00.

                  Comment


                  • #10
                    So since I have solved the issue for the annual aggregation, I'll post the result for the annual:

                    Code:
                    input float(idproduct Year) double salesmnf str18 internationalproduct str33 atc4 str60 molecule str18(crp prd)
                      1 2015                395 "2-OXOGLUTARIC MEDJ" "V3X0 (ALL OTH.THERAPEUTIC PRDS) "  "2-OXOGLUTARIC ACID"            "MEDISCA INC"        "ALPHA-KETOGLUTARIC"
                      2 2008              38970 "2-PROP/BENZ   L.U." "D8A0 (ANTISEPTICS+DISINFECTANT) "  "BENZOYL PEROXIDE"              "LAB UNKNOWN"        "ISOPROPYL ALC/BENZ"
                      3 2004  72204.44210526315 "2-PROPANOL    L.U." "M2A0 (TOP A-RHEUMATICS & ANALG) "  "ETHANOL"                       "LAB UNKNOWN"        "ALCOHOL RUBBING"  
                      4 2015            1617.25 "2-PROPANOL    MEDJ" "V3X0 (ALL OTH.THERAPEUTIC PRDS) "  "2-PROPANOL"                    "MEDISCA INC"        "ISOPROPYL ALCOHOL"
                      5 2004  7936.285714285715 "2-PROPANOL    PERR" "V5A0 (ANTISEPTIC NON-HUMAN USE) "  "2-PROPANOL"                    "PERRIGO"            "ALCOHOL"          
                      6 2004 37.333333333333336 "20/20 EYE GLSS CLN" "S1L0 (CONTACT LENS SOLUTIONS  ) "  "OTHER CONTACT LENS SOLUTIONS"  "SSS COMPANY"        "20/20 EYE GLSS CLN"
                      7 2004                175 "20/20 REWETTING"    "S1L0 (CONTACT LENS SOLUTIONS  ) "  "POTASSIUM"                     "SSS COMPANY"        "20/20 REWETTING"  
                      8 2007  87640.11111111111 "360 OTC EXTRA STR"  "N2B0 (NON-NARCOTIC ANALGESICS ) "  "PARACETAMOL"                   "THREE SIXTY"        "360 OTC EXTRA STR"
                      9 2004 117664.93589743589 "4-WAY"              "R1A7 (NASAL DECONGESTANTS     ) "  "PHENYLEPHRINE"                 "NOVARTIS"           "4-WAY"            
                     10 2004 1551.1813186813188 "666"                "R5F0 (OTHER COUGH & COLD PREPS) "  "PHENYLPROPANOLAMINE"           "MONTICELLO"         "666"              
                     11 2009              13353 "7-KETO DEHYDR HAWK" "V3X0 (ALL OTH.THERAPEUTIC PRDS) "  "7-KETO DEHYDRANDROSTERONE"     "HAWKINS CHEM INC."  "7-KETO DHEA"      
                     12 2008  609.9166666666666 "7-KETO DEHYDR LETC" "V3X0 (ALL OTH.THERAPEUTIC PRDS) "  "7-KETO DEHYDRANDROSTERONE"     "LETCO MEDICAL"      "7-KETO DHEA"      
                     13 2013  558.7142857142857 "7-KETO DHEA"        "V3X0 (ALL OTH.THERAPEUTIC PRDS) "  "7-KETO DEHYDRANDROSTERONE"     "MEDISCA INC"        "7-KETO DHEA"      
                     14 2004                3.4 "A & D PERSONAL CAR" "D3A9 (OTH WOUND HEALING AGENTS) "  "COMPOSITION UNKNOWN"           "BAYER"              "A & D PERSONAL CAR"
                     15 2004  89.91666666666667 "A SATI/C PETR MAJR" "C10B0 (ANTI-ATHEROMA NATRL ORIG) " "CARUM PETROSELINUM"            "MAJOR PHARM CORP"   "GARLIC/PARSLEY"    
                     16 2004  6662.416666666667 "A SATI/C PETR NBTY" "C10B0 (ANTI-ATHEROMA NATRL ORIG) " "ALLIUM SATIVUM"                "NBTY"               "GARLIC/PARSLEY"    
                     17 2004  8242.666666666666 "A SATI/C PETR OTSU" "C10B0 (ANTI-ATHEROMA NATRL ORIG) " "CARUM PETROSELINUM"            "OTSUKA"             "GARLIC/PARSLEY"    
                     18 2008              114.6 "A&D CRKD SKIN RLF"  "D2A0 (EMOLLIENTS & PROTECTIVES) "  "COMPOSITION UNKNOWN"           "BAYER"              "A&D CRKD SKIN RLF"
                     19 2007          233261.25 "A+D FIRST AID"      "D2A0 (EMOLLIENTS & PROTECTIVES) "  "EMOLLIENT"                     "BAYER"              "A+D FIRST AID"    
                     20 2007 2448.3636363636365 "A-COF DH"           "R5D2 (ANTITUSSIVES COMB       ) "  "HYDROCODONE"                   "AMNEAL PHARM"       "A-COF DH"          
                     21 2004               3151 "A-FIL"              "D2A0 (EMOLLIENTS & PROTECTIVES) "  "METHYLANTHRANILATE"            "NOVARTIS"           "A-FIL"            
                     22 2004  276.3333333333333 "A-HYDROCORT"        "H2A1 (INJ CORTICOSTEROIDS PLN ) "  "HYDROCORTISONE"                "PFIZER"             "A-HYDROCORT"      
                     23 2008 183.66666666666666 "A-SOY"              "V6D0 (OTHER NUTRIENTS         ) "  "COLECALCIFEROL"                "PERRIGO"            "A-SOY"            
                     24 2006  5933.727272727273 "A-TAN 12X"          "R5C0 (EXPECTORANTS            ) "  "MEPYRAMINE"                    "AMNEAL PHARM"       "A-TAN 12X"        
                     25 2004 1080.5925925925926 "A/T/S"              "D10A0 (TOPICAL ANTI-ACNE PREPS ) " "ERYTHROMYCIN"                  "TARO PHARMA"        "A/T/S"
                    which I have obtained with:
                    Code:
                    drop if salesmnf==0
                    collapse (mean) salesmnf (first) internationalproduct atc4 molecule crp prd, by(idproduct Year)
                    sort idproduct Year
                    by idproduct: keep if _n==1
                    Now what I would like to obtain is something similar but with biennia. The steps I need to follow are:
                    1) Reshape long the dataset in #1
                    2) throw away from the sample the first quarters with sales = 0. I mean: if I have something like:
                    2004q1 0
                    2004q2 0
                    2004q3 4
                    2004q4 5
                    I would like to obtain:
                    2004q3 4
                    2004q4 5
                    3) Keep in the sample just the first period by product (biennium in my case)
                    4) compute the mean with a collapse and possibly, within the collapse report also the number of quarters which should be less than 4 for the first year of positive sales and 4 for the next one
                    5) collapse again the biennia in order to obtain something like the annual result above.


                    The point is that in the annual aggregation I was not interested in the 0s in successive years. In the biennial aggregation, instead, I would lose 0s that are relevant for the denominator when I collapse the mean.

                    Have I been clearer?

                    p.s. for the reshape, starting from dataset in #1, I have used the following code to reshape:

                    Code:
                    gen id_numeric = _n
                    reshape long salesmnf, i(id_numeric) j(quarter) string
                    
                    gen trimestre = quarterly(quarter, "YQ")
                    format trimestre %tq
                    gen Year = year(dofq(trimestre))
                    and then dropped id_numeric and left idproduct. My question was: if I have to to leave the id as the idproduct which is nothing else but the id divided per product name, is it ok to reshape long in the way I did? In that way, reshaping I of course obtain that for a single product id I would have many "structures of the type:
                    2004q1
                    2004q2
                    2004q3
                    2004q4
                    ...
                    2015q1
                    2015q2
                    2015q3
                    2015q4
                    instead of a single one. So it could happen that for product 2 I have:
                    2004q1
                    2004q2
                    2004q3
                    2004q4
                    ...
                    2015q1
                    2015q2
                    2015q3
                    2015q4
                    2004q1
                    2004q2
                    2004q3
                    2004q4
                    ...
                    2015q1
                    2015q2
                    2015q3
                    2015q4
                    2004q1
                    2004q2
                    2004q3
                    2004q4
                    ...
                    2015q1
                    2015q2
                    2015q3
                    2015q4

                    Instead of just a "single structure"

                    What I would have liked to do, is the describable with the reshape using "not unique" idproduct:
                    Code:
                    egen idproduct = group(internationalproduct)
                    where "international product" is a string with the name of the products (which is not unique because the same product could have different packages, molecules...) and I could not make it become unique because single products differ for different combinations of variables. There is not a single variable which combined with idproduct makes it unique and we do not know which is the optimal combination.

                    Thank you again!!!!
                    Last edited by Federico Nutarelli; 01 Nov 2018, 08:02.

                    Comment

                    Working...
                    X