If you're curious, here is background on the kind of graph I'm making: Wikipedia link.
But the more general issue is how to extract the exact value of where two lines cross. Ideally, I'd like to have the values for all four points.
Below is code that generates fake data and makes a pretty graph. When applied to real data, the marketing people I'm working with are happy to guesstimate based on printing it out and using a ruler. But darnit, I want exact values. The code I came up with will find the nearest point, but doesn't give an exact solution. The nearest point in the observed data is fine with simulated data with enough data points. But with real data, you don't get curves, since the data is more sparse, and the point closest to the "optimal price" might also be the point closest to the "minimum acceptable price," thus having an exact solution is needed.
I thank you in advance for any help you might give, please peruse the code below.
But the more general issue is how to extract the exact value of where two lines cross. Ideally, I'd like to have the values for all four points.
Below is code that generates fake data and makes a pretty graph. When applied to real data, the marketing people I'm working with are happy to guesstimate based on printing it out and using a ruler. But darnit, I want exact values. The code I came up with will find the nearest point, but doesn't give an exact solution. The nearest point in the observed data is fine with simulated data with enough data points. But with real data, you don't get curves, since the data is more sparse, and the point closest to the "optimal price" might also be the point closest to the "minimum acceptable price," thus having an exact solution is needed.
I thank you in advance for any help you might give, please peruse the code below.
Code:
cd "C:\Users\earnharb\Documents\Van Westendorp" clear set more off set seed 1971 capture log close set obs 200 gen id=_n expand 4 gen price=int(runiform()*100) sort id price by id: gen id2=_n reshape wide price, i(id) j(id2) *====================CHECKING FOR VALID DATA *====================WITH SIMULATED DATA, ALL SHOULD BE GOOD, *====================AND WITH CHECKS BUILT INTO SURVEY, SHOULD BE GOOD *====================BUT CHECK ANYWAY gen badcase=0 replace badcase=1 if price1>price2 | price2>price3 | price3> price4 drop if badcase==1 *====================GENERATING FREQUENCY TABLES log using prices1.log, replace tab price1 log close log using prices2.log, replace tab price2 log close log using prices3.log, replace tab price3 log close log using prices4.log, replace tab price4 log close *========================splicing into Excel dataset import delimited using prices1.log, clear keep if _n>8 forvalues x=1/10 { drop if trim(v1[_n-1])=="------------+-----------------------------------" } drop if trim(v1)=="------------+-----------------------------------" split v1, gen(vw) keep vw1 vw5 destring vw1, replace destring vw5, replace rename vw1 price rename vw5 too_low replace too_low=100-too_low save too_low, replace *===================== import delimited using prices2.log, clear keep if _n>8 forvalues x=1/10 { drop if trim(v1[_n-1])=="------------+-----------------------------------" } drop if trim(v1)=="------------+-----------------------------------" split v1, gen(vw) keep vw1 vw5 destring vw1, replace destring vw5, replace rename vw1 price rename vw5 bargain replace bargain=100-bargain save bargain, replace *===================== import delimited using prices3.log, clear keep if _n>8 forvalues x=1/10 { drop if trim(v1[_n-1])=="------------+-----------------------------------" } drop if trim(v1)=="------------+-----------------------------------" split v1, gen(vw) keep vw1 vw5 destring vw1, replace destring vw5, replace rename vw1 price rename vw5 expensive save expensive, replace *===================== import delimited using prices4.log, clear keep if _n>8 forvalues x=1/10 { drop if trim(v1[_n-1])=="------------+-----------------------------------" } drop if trim(v1)=="------------+-----------------------------------" split v1, gen(vw) keep vw1 vw5 destring vw1, replace destring vw5, replace rename vw1 price rename vw5 too_expensive save too_expensive, replace *===================== *========================================================================== use too_low, clear merge 1:1 price using bargain drop _merge merge 1:1 price using expensive drop _merge merge 1:1 price using too_expensive drop _merge sort price *=========================================================================== export excel using vw3.xlsx, replace erase prices1.log erase prices2.log erase prices3.log erase prices4.log erase too_low.dta erase bargain.dta erase expensive.dta erase too_expensive.dta *================================================= gen priceDiff1=abs(too_low-too_expensive) egen minDiff=min(priceDiff1) gen optimalPrice=price if priceDiff1==minDiff local optimal="" local obs=_N forvalues i=1/`obs' { if optimalPrice[`i']!=. { local optimal=optimalPrice[`i'] } } display "`optimal'" gen priceDiff2=abs(too_low-expensive) egen minDiff2=min(priceDiff2) gen lowPrice=price if priceDiff2==minDiff2 local lowprice="" local obs=_N forvalues i=1/`obs' { if lowPrice[`i']!=. { local lowprice=lowPrice[`i'] } } display "`lowprice'" gen priceDiff3=abs(bargain-too_expensive) egen minDiff3=min(priceDiff3) gen highPrice=price if priceDiff3==minDiff3 local highprice="" local obs=_N forvalues i=1/`obs' { if highPrice[`i']!=. { local highprice=highPrice[`i'] } } display "`highprice'" local sub="Optimal price="+"`optimal'"+" Range "+"`lowprice'"+"-"+"`highprice'" display "`sub'" line too_low bargain expensive too_expensive price, title("Van Westendorp for Product X") subtitle(`sub') graph export "VW.png", as(png) replace
Comment