If you're curious, here is background on the kind of graph I'm making: Wikipedia link.
But the more general issue is how to extract the exact value of where two lines cross. Ideally, I'd like to have the values for all four points.
Below is code that generates fake data and makes a pretty graph. When applied to real data, the marketing people I'm working with are happy to guesstimate based on printing it out and using a ruler. But darnit, I want exact values. The code I came up with will find the nearest point, but doesn't give an exact solution. The nearest point in the observed data is fine with simulated data with enough data points. But with real data, you don't get curves, since the data is more sparse, and the point closest to the "optimal price" might also be the point closest to the "minimum acceptable price," thus having an exact solution is needed.
I thank you in advance for any help you might give, please peruse the code below.
But the more general issue is how to extract the exact value of where two lines cross. Ideally, I'd like to have the values for all four points.
Below is code that generates fake data and makes a pretty graph. When applied to real data, the marketing people I'm working with are happy to guesstimate based on printing it out and using a ruler. But darnit, I want exact values. The code I came up with will find the nearest point, but doesn't give an exact solution. The nearest point in the observed data is fine with simulated data with enough data points. But with real data, you don't get curves, since the data is more sparse, and the point closest to the "optimal price" might also be the point closest to the "minimum acceptable price," thus having an exact solution is needed.
I thank you in advance for any help you might give, please peruse the code below.
Code:
cd "C:\Users\earnharb\Documents\Van Westendorp"
clear
set more off
set seed 1971
capture log close
set obs 200
gen id=_n
expand 4
gen price=int(runiform()*100)
sort id price
by id: gen id2=_n
reshape wide price, i(id) j(id2)
*====================CHECKING FOR VALID DATA
*====================WITH SIMULATED DATA, ALL SHOULD BE GOOD,
*====================AND WITH CHECKS BUILT INTO SURVEY, SHOULD BE GOOD
*====================BUT CHECK ANYWAY
gen badcase=0
replace badcase=1 if price1>price2 | price2>price3 | price3> price4
drop if badcase==1
*====================GENERATING FREQUENCY TABLES
log using prices1.log, replace
tab price1
log close
log using prices2.log, replace
tab price2
log close
log using prices3.log, replace
tab price3
log close
log using prices4.log, replace
tab price4
log close
*========================splicing into Excel dataset
import delimited using prices1.log, clear
keep if _n>8
forvalues x=1/10 {
drop if trim(v1[_n-1])=="------------+-----------------------------------"
}
drop if trim(v1)=="------------+-----------------------------------"
split v1, gen(vw)
keep vw1 vw5
destring vw1, replace
destring vw5, replace
rename vw1 price
rename vw5 too_low
replace too_low=100-too_low
save too_low, replace
*=====================
import delimited using prices2.log, clear
keep if _n>8
forvalues x=1/10 {
drop if trim(v1[_n-1])=="------------+-----------------------------------"
}
drop if trim(v1)=="------------+-----------------------------------"
split v1, gen(vw)
keep vw1 vw5
destring vw1, replace
destring vw5, replace
rename vw1 price
rename vw5 bargain
replace bargain=100-bargain
save bargain, replace
*=====================
import delimited using prices3.log, clear
keep if _n>8
forvalues x=1/10 {
drop if trim(v1[_n-1])=="------------+-----------------------------------"
}
drop if trim(v1)=="------------+-----------------------------------"
split v1, gen(vw)
keep vw1 vw5
destring vw1, replace
destring vw5, replace
rename vw1 price
rename vw5 expensive
save expensive, replace
*=====================
import delimited using prices4.log, clear
keep if _n>8
forvalues x=1/10 {
drop if trim(v1[_n-1])=="------------+-----------------------------------"
}
drop if trim(v1)=="------------+-----------------------------------"
split v1, gen(vw)
keep vw1 vw5
destring vw1, replace
destring vw5, replace
rename vw1 price
rename vw5 too_expensive
save too_expensive, replace
*=====================
*==========================================================================
use too_low, clear
merge 1:1 price using bargain
drop _merge
merge 1:1 price using expensive
drop _merge
merge 1:1 price using too_expensive
drop _merge
sort price
*===========================================================================
export excel using vw3.xlsx, replace
erase prices1.log
erase prices2.log
erase prices3.log
erase prices4.log
erase too_low.dta
erase bargain.dta
erase expensive.dta
erase too_expensive.dta
*=================================================
gen priceDiff1=abs(too_low-too_expensive)
egen minDiff=min(priceDiff1)
gen optimalPrice=price if priceDiff1==minDiff
local optimal=""
local obs=_N
forvalues i=1/`obs' {
if optimalPrice[`i']!=. {
local optimal=optimalPrice[`i']
}
}
display "`optimal'"
gen priceDiff2=abs(too_low-expensive)
egen minDiff2=min(priceDiff2)
gen lowPrice=price if priceDiff2==minDiff2
local lowprice=""
local obs=_N
forvalues i=1/`obs' {
if lowPrice[`i']!=. {
local lowprice=lowPrice[`i']
}
}
display "`lowprice'"
gen priceDiff3=abs(bargain-too_expensive)
egen minDiff3=min(priceDiff3)
gen highPrice=price if priceDiff3==minDiff3
local highprice=""
local obs=_N
forvalues i=1/`obs' {
if highPrice[`i']!=. {
local highprice=highPrice[`i']
}
}
display "`highprice'"
local sub="Optimal price="+"`optimal'"+" Range "+"`lowprice'"+"-"+"`highprice'"
display "`sub'"
line too_low bargain expensive too_expensive price, title("Van Westendorp for Product X") subtitle(`sub')
graph export "VW.png", as(png) replace

Comment