Von Westendorp graph, more general, how to compute intersection of two lines?

ben earnhart

Join Date: May 2014
Posts: 1027

Von Westendorp graph, more general, how to compute intersection of two lines?

03 Nov 2016, 20:16

If you're curious, here is background on the kind of graph I'm making: Wikipedia link.

But the more general issue is how to extract the exact value of where two lines cross. Ideally, I'd like to have the values for all four points.

Below is code that generates fake data and makes a pretty graph. When applied to real data, the marketing people I'm working with are happy to guesstimate based on printing it out and using a ruler. But darnit, I want exact values. The code I came up with will find the nearest point, but doesn't give an exact solution. The nearest point in the observed data is fine with simulated data with enough data points. But with real data, you don't get curves, since the data is more sparse, and the point closest to the "optimal price" might also be the point closest to the "minimum acceptable price," thus having an exact solution is needed.

I thank you in advance for any help you might give, please peruse the code below.

Code:

cd "C:\Users\earnharb\Documents\Van Westendorp"
clear
set more off
set seed 1971
capture log close

set obs 200
gen id=_n
expand 4
gen price=int(runiform()*100)
sort id price 
by id: gen id2=_n 
reshape wide price, i(id) j(id2)


*====================CHECKING FOR VALID DATA
*====================WITH SIMULATED DATA, ALL SHOULD BE GOOD,
*====================AND WITH CHECKS BUILT INTO SURVEY, SHOULD BE GOOD
*====================BUT CHECK ANYWAY

gen badcase=0
replace badcase=1 if price1>price2 | price2>price3 | price3> price4
drop if badcase==1

*====================GENERATING FREQUENCY TABLES

log using prices1.log, replace
tab price1
log close
log using prices2.log, replace
tab price2
log close
log using prices3.log, replace
tab price3
log close
log using prices4.log, replace
tab price4
log close

*========================splicing into Excel dataset
import delimited using prices1.log, clear
keep if _n>8
forvalues x=1/10 {
drop if trim(v1[_n-1])=="------------+-----------------------------------"
}
drop if trim(v1)=="------------+-----------------------------------"
split v1, gen(vw)
keep vw1 vw5
destring vw1, replace
destring vw5, replace
rename vw1 price
rename vw5 too_low
replace too_low=100-too_low
save too_low, replace
*=====================
import delimited using prices2.log, clear
keep if _n>8
forvalues x=1/10 {
drop if trim(v1[_n-1])=="------------+-----------------------------------"
}
drop if trim(v1)=="------------+-----------------------------------"
split v1, gen(vw)
keep vw1 vw5
destring vw1, replace
destring vw5, replace
rename vw1 price
rename vw5 bargain
replace bargain=100-bargain
save bargain, replace
*=====================
import delimited using prices3.log, clear
keep if _n>8
forvalues x=1/10 {
drop if trim(v1[_n-1])=="------------+-----------------------------------"
}
drop if trim(v1)=="------------+-----------------------------------"
split v1, gen(vw)
keep vw1 vw5
destring vw1, replace
destring vw5, replace
rename vw1 price
rename vw5 expensive
save expensive, replace
*=====================
import delimited using prices4.log, clear
keep if _n>8
forvalues x=1/10 {
drop if trim(v1[_n-1])=="------------+-----------------------------------"
}
drop if trim(v1)=="------------+-----------------------------------"
split v1, gen(vw)
keep vw1 vw5
destring vw1, replace
destring vw5, replace
rename vw1 price
rename vw5 too_expensive
save too_expensive, replace
*=====================

*==========================================================================
use too_low, clear
merge 1:1 price using bargain
drop _merge
merge 1:1 price using expensive
drop _merge
merge 1:1 price using too_expensive
drop _merge
sort price


*===========================================================================
export excel using vw3.xlsx, replace

erase prices1.log
erase prices2.log
erase prices3.log
erase prices4.log
erase too_low.dta
erase bargain.dta
erase expensive.dta
erase too_expensive.dta

*=================================================

gen priceDiff1=abs(too_low-too_expensive)
egen minDiff=min(priceDiff1)
gen optimalPrice=price if priceDiff1==minDiff

local optimal=""
local obs=_N

forvalues i=1/`obs' {
    if optimalPrice[`i']!=. {
    local optimal=optimalPrice[`i'] 
    }
    
}
display "`optimal'"

gen priceDiff2=abs(too_low-expensive)
egen minDiff2=min(priceDiff2)
gen lowPrice=price if priceDiff2==minDiff2

local lowprice=""
local obs=_N

forvalues i=1/`obs' {
    if lowPrice[`i']!=. {
    local lowprice=lowPrice[`i'] 
    }
    
}
display "`lowprice'"

gen priceDiff3=abs(bargain-too_expensive)
egen minDiff3=min(priceDiff3)
gen highPrice=price if priceDiff3==minDiff3

local highprice=""
local obs=_N

forvalues i=1/`obs' {
    if highPrice[`i']!=. {
    local highprice=highPrice[`i'] 
    }
    
}
display "`highprice'"

local sub="Optimal price="+"`optimal'"+" Range "+"`lowprice'"+"-"+"`highprice'"
display "`sub'"
line too_low bargain expensive too_expensive price, title("Van Westendorp for Product X") subtitle(`sub')

graph export "VW.png", as(png) replace

Tags: None

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

04 Nov 2016, 09:43

Ben,

I hesitate to comment on your posting, but let me note that it is extremely long - it would take most of us a long time to figure out the real issue. We also don't have the data necessary to test anything. You might have a better response with a snippet of the program and some data that highlighted the issue.

I assume your problem is that you're plotting discrete points rather than a function. As far as I can tell, you are in two dimensions. So it would seem to me that you want to approximate the lines between the points on either side of the intersection and then calculate the intersection point. Is this correct?

If you are in two dimensions and can identify those points, then you should be able to write down the equations for the lines between the points and the intersection.

So I assume there is some complexity I am missing.

Phil
Comment
ben earnhart

Join Date: May 2014

Posts: 1027
#3

04 Nov 2016, 10:05

Phil --

Thanks for your response. The first chunk of code generates fake data to graph, so it's all self-contained, no need for data. It needs to plot discreet points, because of the way the pricing model works.

The outline of my code is this:
1) Generate fake data.
2) Generate four frequency tables, saved as four separate log files.
3) Read log files back into Stata, chop them up to get cumulative percentages, save as Stata datasets.
4) Merge the four back into one dataset.
5) Attempt to find points closest to intersections.
6) Graph it.

But the points closest to the intersections are not good enough; with real data, the points are sparse and clumpy (people like prices in round numbers at $5 or $10 increments).

So I need to do exactly what you propose, identify the points, get the equations for lines, and determine where they cross, And it is at this stage that I am at a loss.

Hope this makes more sense?

Thanks,

Ben
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10482
#4

07 Nov 2016, 04:20

#1

When applied to real data, the marketing people I'm working with are happy to guesstimate based on printing it out and using a ruler. But darnit, I want exact values.

There are no "exact" values here. Stata is simply joining dots (data points) when plotting, and "points" on the line between any two dots are just approximations.

#3

So I need to do exactly what you propose, identify the points, get the equations for lines, and determine where they cross, And it is at this stage that I am at a loss.

The reason why automation is difficult is that your data is in wide form and therefore Stata plots each line individually. Whereas both you and I see that the lines intersect, Stata does not. Therefore, you will have to locate each intersection manually. Also, these are not straight lines and the slope changes from point to point. Equations therefore have to be defined locally, but this can be done by picking out points provided by markers (see below).

In summary, I think that printing and using a ruler although crude might not be such a bad approximation technique weighing the costs and benefits. My only input is to show that one need not print to obtain the points. Note that like other approximations, visual identification suffers if the data points are too sparse. Using the preserve command, you can combine a line with a scatter plot with marker labels. A start can be looking at the full graph with intersections to locate the point to zoom in.

Original graph:

Intersection between "bargain" and "expensive": ≈ (y=48, x=31.5)

Code:

preserve *FOLLOWING YOUR CODE drop if price <`optimal'-1 drop if price >`optimal'+1 *EASIER TO VISUALIZE IF ONE OF THE GRAPHS IS A SCATTER PLOT WITH MARKER LABELS scatter bargain price, msize(small) mlabel(price)|| line expensive price restore
Comment

Announcement

Von Westendorp graph, more general, how to compute intersection of two lines?

Comment

Comment

Comment