How to assign a maximum value to a year

Natalia Remel

Join Date: Mar 2017

Posts: 23
#1

How to assign a maximum value to a year

18 Mar 2017, 08:41

Hello everyone,
Stata is new to me and my question may seem to be easy but I would be greatful if you could help me.
I have quarterly sales data from 2003 until 2012. I also have a variable maxval which shows the maximum sales values in each observation. I need to create a variable which shows in which year the sales were at maximum. Thank you very much in advance.
maxval year
1284.08 ex.2003
1005.84 …
13282.51
1626.06
831.32
3503.8

Kind regards,
NR
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

18 Mar 2017, 09:09

Welcome to Stata and to Statlist, Natalia.

I think we need to know more about your data and your problem.

You tell us you have quarterly data, and you need a variable that tells us what year the sales were at a maximum. Do you mean that the sales for one quarter in that year were larger than any quarter in any other year, or do you mean that the total sales in the four quarters of the year were larger than the total sales in any other year?

Furthermore, do you have sales data on a number of different companies, so that your problem requires the new variable to take a different value for each company, but the same value for all the observations within that company?

Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

It would be particularly helpful to post a small hand-made example, perhaps with just a few observations, showing the data before the process and how you expect it to look after the process. You can use the Stata Data Editor to fill in by hand in your sample the values would want to calculate in the whole dataset. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#3

18 Mar 2017, 09:15

The problem is ill-posed. There could be two or more years that all have the same value of maxval, and that value happens to be the highest. You don't specify how you would decide which of those years to pick.

Here's code that will simply give you a list of all of them:

Code:

summ maxval, meanonly local target = r(max) list year if float(maxval) == float(`target')

Note: Because your variable maxval is not an integer, there are precision issues here. Do read -help precision- for a good discussion of the potential problems. By using -float(maxval) == float(`target')- I am reducing the risk that you will fail to find any match here. If this code fails, we could perhaps assume that maxval, being a sales figure, need not be accurate beyond two decimal places. In that case:

Code:

gen long new_maxval = round(100*maxval, 1) summ new_maxval local target = r(max) list year if new_maxval == `target'

will be bullet-proof.

Added: Crossed with William's post. William noticed even more ambiguities in the original question than I did!
Comment
Natalia Remel

Join Date: Mar 2017

Posts: 23
#4

18 Mar 2017, 10:08

Dear William, thank you for your answer.

I use Stata 12.1. Attached is a hand-made example in an excel format with a few observations. Two last columns is the result I would like to have.
I have in total 700 observations. This is an analysis on a brand level. Company or country are not taken into account. Variables (su303 stands for March 2003, su603 – June 2003 etc; su stands for standard units) from su303 to su1214 show a peak during their life cycle. The variable maxval represents the maximum value. I also have a variable ldate (launch date). I want to know in how many years a brand reaches its peak. To do that, I wanted to generate first a new variable which indicates in which year a brand reached its peak and then calculate the difference between this year and the launch year. I did not post a second part of my question at the very beginning. Thank you.

NR

Attached Files

sample of the dataset.xlsx (12.0 KB, 1 view)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#5

18 Mar 2017, 10:20

Natalia,

Thank you for the additional explanations. Let me call your attention to a part of William Lisowski's post (#2) that you apparently missed:

In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.

Had you read FAQ, not only would you have posted your example with -dataex- you also would have learned that Forum members are explicitly asked not to attach Excel files as data. There are three reasons for this. 1. Some of the most frequent responders here do not use Microsoft Office. 2. Even among those who do, there is reluctance to download Office documents from strangers because they can contain active malware. 3. Importing data from Stata to Excel is a process in its own right that make it just that much more work for those who want to help you, and can result in a Stata data set that is different from your Stata data set, depending on options chosen. Using -dataex- gets around all of these problems: it enables those who want to help you to make a completely faithful replica of your data example with a simple copy paste operation, it requires no software other than Stata which (presumably) we all have and use, and it carries no risk of transporting malware.

Please read the FAQ, and especially #12, and re-post your example using -dataex-.
Comment

Natalia Remel

Join Date: Mar 2017
Posts: 23

18 Mar 2017, 11:45

Thank you for your answer. I appologize for not posting my answer in a proper way. I did read #12 in FAQ but since I have external access to stata from the university I was mistakenly looking for dataex at my desktop... I this example I just posted a few sales data (ex. su303) , not for all years.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str30 country str21 intname float(su303 su603 su903 su1203 su304 su604 su904 su1204 su305 su605 su905 su1205 su312 su612 su912 su1212 ldate lorder maxval)
"Italy"  "ZEFFIX"        599.2  610.78  622.72   631.7  664.44  700.54  704.37  724.87  711.64  768.06   754.9  758.25 1144.96 1017.89  935.91  837.72 480 3  1284.08
"Spain"  "ZIAGEN"       848.55  835.03  788.91  770.04  771.79  860.97  832.94  902.86  927.23 1005.84  868.05  765.01  286.13  255.87  243.11  237.68 480 4  1005.84
"Spain"  "COAPROVEL"   4716.45 5201.05 5219.18 5445.35 6056.49 6381.87 6431.01 6711.11 7032.46 7358.27 7096.02 7160.24 9229.92 8776.24 6684.97 7570.63 480 5 13282.51
"UK"     "AROMASIN"     250.37  277.23  285.56   347.1  330.35  385.71   420.3  460.94  459.03  511.71  571.92  626.75   863.1  861.09  800.04  788.79 480 6  1626.06
"France" "HUMALOG MIX"  786.33  831.32  820.31  767.41  750.62  743.24  733.51  715.22  694.04  715.94  681.52  667.06  543.25  544.35  521.85   532.7 480 7   831.32
"France" "ASMASAL"      3503.8  3041.4  2637.6  2914.4  2506.4  2493.6  2207.4  2403.4  2462.6    2320  1778.2  2087.8  1159.4  1115.4   995.2  1162.2 480 7   3503.8
end
format %tm ldate

Last edited by Natalia Remel; 18 Mar 2017, 11:48.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#7

18 Mar 2017, 12:38

OK. I notice that in your data, sometimes there is no date where su matches the maxval variable. For those observations, the code below returns missing value.

A still unresolved ambiguity in your problem is what to do if there are several dates where su = maxval for a given product. The code below gives you the first such date. If you want the last such date, change -min- to -max-. If you want some other way to break the tie, post back with an explanation of what you want.

Code:

reshape long su, i(intname) j(pseudo_date) by intname, sort: egen peak_date = min(cond(float(maxval) == float(su), pseudo_date, .))

Notes:

1. This will leave your data in long layout. I don't know where you're headed with this data from this point. In general, analyses are easier and work better in Stata when the data are in long layout. But if you have a compelling reason to go back to wide layout, just add -reshape wide- to the end of that code.

2. I assume that the variable intname is a unique identifier for products.

3. If you do stay in long layout, you will probably be better off changing pseudo_date and ldate into real Stata internal date variables. This will make it possible to do calculations (such as the interval from launch date to peak sales date!). In current form you cannot do that correctly.

Again, do read -help precision- for a deeper understanding of why the -float()- functions are needed here.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

18 Mar 2017, 12:57

Added in edit: This crossed with Clyde's post, above. Same ideas; I carried it farther along the lines Clyde indicated, to leave a dataset useful for further analysis.

Thank you for the data.

The first thing we notice is that your data is in what Stata would call a "wide" layout with the values of su for different months in different variables. We will transform it to a "long" layout, where each observation has just one value of su for one given month. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. It certainly does make things much easier here.

In the code below, I follow Clyde's advice and do the calculations in integers, multiplying your su values by 100. But once the calculations are done, I drop the integer variables. Even following Clyde's advice, it is possible that the exact same peak will occur twice for a brand, just by chance, and this code will find the second peak if that does happen. It seems unlikely, though.

I discarded your maxval variable because, in the first brand at least, it doesn't correspond to any of the values of su in the data.

So here's my code which I hope will point you in a useful direction.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str30 country str21 intname float(su303 su603 su903 su1203 su304 su604 su904 su1204 su305 su605 su905 su1205 su312 su612 su912 su1212 ldate lorder maxval)
"Italy"  "ZEFFIX"        599.2  610.78  622.72   631.7  664.44  700.54  704.37  724.87  711.64  768.06   754.9  758.25 1144.96 1017.89  935.91  837.72 480 3  1284.08
"Spain"  "ZIAGEN"       848.55  835.03  788.91  770.04  771.79  860.97  832.94  902.86  927.23 1005.84  868.05  765.01  286.13  255.87  243.11  237.68 480 4  1005.84
"Spain"  "COAPROVEL"   4716.45 5201.05 5219.18 5445.35 6056.49 6381.87 6431.01 6711.11 7032.46 7358.27 7096.02 7160.24 9229.92 8776.24 6684.97 7570.63 480 5 13282.51
"UK"     "AROMASIN"     250.37  277.23  285.56   347.1  330.35  385.71   420.3  460.94  459.03  511.71  571.92  626.75   863.1  861.09  800.04  788.79 480 6  1626.06
"France" "HUMALOG MIX"  786.33  831.32  820.31  767.41  750.62  743.24  733.51  715.22  694.04  715.94  681.52  667.06  543.25  544.35  521.85   532.7 480 7   831.32
"France" "ASMASAL"      3503.8  3041.4  2637.6  2914.4  2506.4  2493.6  2207.4  2403.4  2462.6    2320  1778.2  2087.8  1159.4  1115.4   995.2  1162.2 480 7   3503.8
end
format %tm ldate
// assign a uniqe ID to each row for use by reshape
generate id = _n
// reshape into long layout
reshape long su, i(id) j(my)
// generate a Stata Internal Format date from the month-year variable my
generate date = ym(2000 + mod(my,100), floor(my/100))
format %tm date
order id country intname date
drop my
// following Clyde's advice, work with integers
drop maxval
generate long su100 = round(100*su,1)
sort id date
by id: egen long p_su100  = max(su100)
by id: egen pdate = max(cond(su100==p_su100,date,.))
format %tm pdate
generate peak_su = p_su100/100
generate peak_mon = pdate - ldate
label variable peak_mon "Months from Inception to Peak"
// no longer need these
drop su100 p_su100

This is what the data now looks like for one of your brands, in a long layout:

Code:

. list country intname ldate date su pdate peak_su peak_mon if intname=="ZIAGEN", clean

       country   intname    ldate      date        su    pdate   peak_su   peak_mon  
 17.     Spain    ZIAGEN   2000m1    2003m3    848.55   2005m6   1005.84         65  
 18.     Spain    ZIAGEN   2000m1    2003m6    835.03   2005m6   1005.84         65  
 19.     Spain    ZIAGEN   2000m1    2003m9    788.91   2005m6   1005.84         65  
 20.     Spain    ZIAGEN   2000m1   2003m12    770.04   2005m6   1005.84         65  
 21.     Spain    ZIAGEN   2000m1    2004m3    771.79   2005m6   1005.84         65  
 22.     Spain    ZIAGEN   2000m1    2004m6    860.97   2005m6   1005.84         65  
 23.     Spain    ZIAGEN   2000m1    2004m9    832.94   2005m6   1005.84         65  
 24.     Spain    ZIAGEN   2000m1   2004m12    902.86   2005m6   1005.84         65  
 25.     Spain    ZIAGEN   2000m1    2005m3    927.23   2005m6   1005.84         65  
 26.     Spain    ZIAGEN   2000m1    2005m6   1005.84   2005m6   1005.84         65  
 27.     Spain    ZIAGEN   2000m1    2005m9    868.05   2005m6   1005.84         65  
 28.     Spain    ZIAGEN   2000m1   2005m12    765.01   2005m6   1005.84         65  
 29.     Spain    ZIAGEN   2000m1    2012m3    286.13   2005m6   1005.84         65  
 30.     Spain    ZIAGEN   2000m1    2012m6    255.87   2005m6   1005.84         65  
 31.     Spain    ZIAGEN   2000m1    2012m9    243.11   2005m6   1005.84         65  
 32.     Spain    ZIAGEN   2000m1   2012m12    237.68   2005m6   1005.84         65

You could now use reshape wide to return your data to wide format, but as I discuss above, that would almost certainly be a mistake.

Last edited by William Lisowski; 18 Mar 2017, 13:00.

Comment

Natalia Remel

Join Date: Mar 2017

Posts: 23
#9

19 Mar 2017, 09:01

Dear William and Clyde, thank you! You have been such a great help to me. I have my data also in a long format but I mistakenly thought that with a wide format I will have more overview. I got the result I wanted. Concerning the question if there are many peak sales, I am still thinking how to adress this issue because I want to sort out all observations according to the principle so that in the last year under review sales should have fallen to at least 70% of the peak. I want to make sure that a brand has already reached its peak within 2003-2012.. Thank you again for all the codes and help.
Comment

Natalia Remel

Join Date: Mar 2017
Posts: 23

#10

19 Mar 2017, 12:20

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str30 country str21 intname str4 neu float(su ldate lorder time_su pdate peak_su peak_mon peak_year)
"Japan" "5 FU KYOWA KIRIN" "1203" 3884.88 93 3 1203 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1204" 3080.88 93 3 1204 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1205" 2780.18 93 3 1205 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1206" 2682.55 93 3 1206 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1207" 2675.67 93 3 1207 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1208" 2737.76 93 3 1208 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1209" 2635.79 93 3 1209 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1210"  2358.6 93 3 1210 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1211" 1452.65 93 3 1211 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1212" 1183.79 93 3 1212 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1213"  974.03 93 3 1213 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "1214"  930.27 93 3 1214 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "2002"       . 93 3 2002 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "303"  3636.28 93 3  303 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "304"  2972.64 93 3  304 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "305"  2471.51 93 3  305 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "306"  2308.47 93 3  306 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "307"  2272.57 93 3  307 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "308"  2316.92 93 3  308 603 3921.88 510 42.5
"Japan" "5 FU KYOWA KIRIN" "309"  2449.19 93 3  309 603 3921.88 510 42.5
end

keep if lorder<10
gen time_su = real(neu)
format %tm time_su
generate long su100 = round(100*su,1)
sort intname time_su
by intname: egen long p_su100 = max(su100)
by intname: egen pdate = max(cond(su100==p_su100,time_su,.))
format %tm pdate
generate peak_su = p_su100/100
generate peak_mon = pdate - ldate
generate peak_year = peak_mon/12
drop su100 p_su100

sort intname neu
by intname: gen su1214_gr = (1214 in neu/peak_su)*100

format %tm neu

I also have another question. I want to format the variable “neu” (time when sales occurred) into time format (ex. 1203 into 2003m12) with a command format %tm neu, then I got an error 120. After reading waht this error means I used the command gen time_su = real(neu) to convert string into numeric variable but the result is not correct (when I compare variables neu and time_su).

I also want to generate a variable “su1214_gr” which shows the percentage of the peak that a product reached in the last year under review. I do not know how to express (1214 in neu = sales in last year under review which is 2014 December) in the command.
sort intname neu
by intname: gen su1214_gr = (su1214 in neu/peak_su)*100
Thank you.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#11

19 Mar 2017, 13:41

So your variable neu is a string variable which, to human eyes, reads like 303 304 305 ... 1213 1214. Your intent is that the last two digits are the calendar year (with 2000 as base) and the first one or two digits are the month. So to get this to a Stata internal date you could do:

Code:

gen neu_year = mod(real(neu), 100) + 2000 gen neu_month = floor(real(neu)/100) gen neu_sif = mofd(mdy(neu_month, 1, neu_year)) format neu_sif %tm list neu*, noobs clean

There are other ways to do this. And it is, in principle, possible to reduce this all to a single line of code, but it would be completely opaque.
Note that there is one value of neu in your example data for which this code fails: 2002. It doesn't fit your general scheme and I don't know what it's supposed to represent. I presume it's an error in your data.

Stata date variables are complicated and even the most experienced users have to refer to the manuals to remember all of the details.

Also, to work with chronological data, you will just get hopelessly lost using anything but Stata date variables to do that. So, converting neu to a Stata internal format date variable, as above, is just the first step. Do it also for your peak date, and any other date variables that might be in your real data. The calculations for peak date are a little shorter because it starts out as a numeric variable instead of a string.

Code:

drop peak_mon peak_year // CALCULATED INCORRECTLY IN EXAMPLE DATA gen peak_year = mod(pdate, 100) + 2000 gen peak_month = floor(pdate/100) gen pdate_sif = mofd(mdy(peak_month, 1, peak_year)) format pdate_sif %tm

I also want to generate a variable “su1214_gr” which shows the percentage of the peak that a product reached in the last year under review. I do not know how to express (1214 in neu = sales in last year under review which is 2014 December) in the command.

Code:

// VERIFY EACH PRODUCT HAS ONLY A SINGLE OBSERVATION // WITH NEU = 2014M12 by intname, sort: egen check_1214 = total(neu_sif == tm(2014m12)) assert check_1214 <= 1 // CALCULATE RATIO OF SU IN 2014M12 TO PEAK SU by intname, sort: egen su1214_gr = max(cond(neu_sif == tm(2014m12), su/peak_su, .)) // CONVERT RATIO TO PERCENTAGE replace su1214_gr = 100*su1214_gr
1 like
Comment
Natalia Remel

Join Date: Mar 2017

Posts: 23
#12

21 Mar 2017, 04:21

Thank you very much, Clyde! You have been a huge help to me.
Comment

Natalia Remel

Join Date: Mar 2017
Posts: 23

#13

02 Apr 2017, 02:44

Dear Statalisters,

I would like to generate multiple variables price303, price603, price903 etc which I get by dividing (leumnfqtr172, leumnfqtr173, leumnfqtr174 etc) by (su303, su 603, su903 etc).
Is there is a command how to generate them without having to write each command (gen price303=leumnfqtr172/su303) many times? Thank you very much in advance!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str21 intname float(leumnfqtr172 leumnfqtr173 leumnfqtr174 leumnfqtr175 su303 su603 su903 su1203)
"HUMULIN NPH"   5336.92  5533.8 5456.98 4913.61  799.18  831.58  822.24  738.15
"HUMALOG MIX"   5014.22 5303.46 5234.07 4897.36  786.33  831.32  820.31  767.41
"INSULINE DCI"        0       0       0       0       0       0       0       0
"INSUMAN KOMB"   619.98  662.34  643.56   598.6  107.71   114.8  111.53  103.38
"ACTRAPHANE HM" 7862.06 7965.43 7719.52 7352.18 1315.22 1331.28 1290.42 1228.94
"ACTRAPHANE HM"   341.3  340.84  316.29  277.23   28.21   28.17   26.14   22.91
"NOVOMIX"       1359.82 1822.29 2054.29 2385.09  219.73   294.3  331.46  384.77
"NOVOMIX"       1196.03 1617.19 1845.87 2150.54  191.49  258.94  295.54  344.34
end

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35779

#14

02 Apr 2017, 03:32

Your example implies 4 commands one by one and about as many in a loop. No obvious gain, except practice with technique:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str21 intname float(leumnfqtr172 leumnfqtr173 leumnfqtr174 leumnfqtr175 su303 su603 su903 su1203)
"HUMULIN NPH"   5336.92  5533.8 5456.98 4913.61  799.18  831.58  822.24  738.15
"HUMALOG MIX"   5014.22 5303.46 5234.07 4897.36  786.33  831.32  820.31  767.41
"INSULINE DCI"        0       0       0       0       0       0       0       0
"INSUMAN KOMB"   619.98  662.34  643.56   598.6  107.71   114.8  111.53  103.38
"ACTRAPHANE HM" 7862.06 7965.43 7719.52 7352.18 1315.22 1331.28 1290.42 1228.94
"ACTRAPHANE HM"   341.3  340.84  316.29  277.23   28.21   28.17   26.14   22.91
"NOVOMIX"       1359.82 1822.29 2054.29 2385.09  219.73   294.3  331.46  384.77
"NOVOMIX"       1196.03 1617.19 1845.87 2150.54  191.49  258.94  295.54  344.34
end

forval j = 1/4 {
    local L = 171 + `j'
    local P = 300 * `j' + 3
    gen price`P' = leumnfqtr`L' / su`P'
}

You could squeeze the loop like this (here's one step of two), but code like this is harder to read:

Code:

forval j = 1/4 {
    local P = 300 * `j' + 3
    gen price`P' = leumnfqtr`=171 + `j'' / su`P'
}

There is technique like this

Code:

local whatever 303 603 903 1203 
forval j = 172/175 { 
      gettoken this whatever : whatever 
      gen price`this' = leuumnqtr`j'  / su`this' 
}

Still harder to think up than four direct statements.

Last edited by Nick Cox; 02 Apr 2017, 03:45.

Comment

Natalia Remel

Join Date: Mar 2017

Posts: 23
#15

02 Apr 2017, 04:33

Thank you Nick very much! It is very helpful.
Comment

Announcement