Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Von Westendorp graph, more general, how to compute intersection of two lines?

    If you're curious, here is background on the kind of graph I'm making: Wikipedia link.

    But the more general issue is how to extract the exact value of where two lines cross. Ideally, I'd like to have the values for all four points.

    Below is code that generates fake data and makes a pretty graph. When applied to real data, the marketing people I'm working with are happy to guesstimate based on printing it out and using a ruler. But darnit, I want exact values. The code I came up with will find the nearest point, but doesn't give an exact solution. The nearest point in the observed data is fine with simulated data with enough data points. But with real data, you don't get curves, since the data is more sparse, and the point closest to the "optimal price" might also be the point closest to the "minimum acceptable price," thus having an exact solution is needed.

    I thank you in advance for any help you might give, please peruse the code below.

    Code:
    cd "C:\Users\earnharb\Documents\Van Westendorp"
    clear
    set more off
    set seed 1971
    capture log close
    
    set obs 200
    gen id=_n
    expand 4
    gen price=int(runiform()*100)
    sort id price 
    by id: gen id2=_n 
    reshape wide price, i(id) j(id2)
    
    
    *====================CHECKING FOR VALID DATA
    *====================WITH SIMULATED DATA, ALL SHOULD BE GOOD,
    *====================AND WITH CHECKS BUILT INTO SURVEY, SHOULD BE GOOD
    *====================BUT CHECK ANYWAY
    
    gen badcase=0
    replace badcase=1 if price1>price2 | price2>price3 | price3> price4
    drop if badcase==1
    
    *====================GENERATING FREQUENCY TABLES
    
    log using prices1.log, replace
    tab price1
    log close
    log using prices2.log, replace
    tab price2
    log close
    log using prices3.log, replace
    tab price3
    log close
    log using prices4.log, replace
    tab price4
    log close
    
    *========================splicing into Excel dataset
    import delimited using prices1.log, clear
    keep if _n>8
    forvalues x=1/10 {
    drop if trim(v1[_n-1])=="------------+-----------------------------------"
    }
    drop if trim(v1)=="------------+-----------------------------------"
    split v1, gen(vw)
    keep vw1 vw5
    destring vw1, replace
    destring vw5, replace
    rename vw1 price
    rename vw5 too_low
    replace too_low=100-too_low
    save too_low, replace
    *=====================
    import delimited using prices2.log, clear
    keep if _n>8
    forvalues x=1/10 {
    drop if trim(v1[_n-1])=="------------+-----------------------------------"
    }
    drop if trim(v1)=="------------+-----------------------------------"
    split v1, gen(vw)
    keep vw1 vw5
    destring vw1, replace
    destring vw5, replace
    rename vw1 price
    rename vw5 bargain
    replace bargain=100-bargain
    save bargain, replace
    *=====================
    import delimited using prices3.log, clear
    keep if _n>8
    forvalues x=1/10 {
    drop if trim(v1[_n-1])=="------------+-----------------------------------"
    }
    drop if trim(v1)=="------------+-----------------------------------"
    split v1, gen(vw)
    keep vw1 vw5
    destring vw1, replace
    destring vw5, replace
    rename vw1 price
    rename vw5 expensive
    save expensive, replace
    *=====================
    import delimited using prices4.log, clear
    keep if _n>8
    forvalues x=1/10 {
    drop if trim(v1[_n-1])=="------------+-----------------------------------"
    }
    drop if trim(v1)=="------------+-----------------------------------"
    split v1, gen(vw)
    keep vw1 vw5
    destring vw1, replace
    destring vw5, replace
    rename vw1 price
    rename vw5 too_expensive
    save too_expensive, replace
    *=====================
    
    *==========================================================================
    use too_low, clear
    merge 1:1 price using bargain
    drop _merge
    merge 1:1 price using expensive
    drop _merge
    merge 1:1 price using too_expensive
    drop _merge
    sort price
    
    
    *===========================================================================
    export excel using vw3.xlsx, replace
    
    erase prices1.log
    erase prices2.log
    erase prices3.log
    erase prices4.log
    erase too_low.dta
    erase bargain.dta
    erase expensive.dta
    erase too_expensive.dta
    
    *=================================================
    
    gen priceDiff1=abs(too_low-too_expensive)
    egen minDiff=min(priceDiff1)
    gen optimalPrice=price if priceDiff1==minDiff
    
    local optimal=""
    local obs=_N
    
    forvalues i=1/`obs' {
        if optimalPrice[`i']!=. {
        local optimal=optimalPrice[`i'] 
        }
        
    }
    display "`optimal'"
    
    gen priceDiff2=abs(too_low-expensive)
    egen minDiff2=min(priceDiff2)
    gen lowPrice=price if priceDiff2==minDiff2
    
    local lowprice=""
    local obs=_N
    
    forvalues i=1/`obs' {
        if lowPrice[`i']!=. {
        local lowprice=lowPrice[`i'] 
        }
        
    }
    display "`lowprice'"
    
    gen priceDiff3=abs(bargain-too_expensive)
    egen minDiff3=min(priceDiff3)
    gen highPrice=price if priceDiff3==minDiff3
    
    local highprice=""
    local obs=_N
    
    forvalues i=1/`obs' {
        if highPrice[`i']!=. {
        local highprice=highPrice[`i'] 
        }
        
    }
    display "`highprice'"
    
    local sub="Optimal price="+"`optimal'"+" Range "+"`lowprice'"+"-"+"`highprice'"
    display "`sub'"
    line too_low bargain expensive too_expensive price, title("Van Westendorp for Product X") subtitle(`sub')
    
    graph export "VW.png", as(png) replace

  • #2
    Ben,

    I hesitate to comment on your posting, but let me note that it is extremely long - it would take most of us a long time to figure out the real issue. We also don't have the data necessary to test anything. You might have a better response with a snippet of the program and some data that highlighted the issue.

    I assume your problem is that you're plotting discrete points rather than a function. As far as I can tell, you are in two dimensions. So it would seem to me that you want to approximate the lines between the points on either side of the intersection and then calculate the intersection point. Is this correct?

    If you are in two dimensions and can identify those points, then you should be able to write down the equations for the lines between the points and the intersection.

    So I assume there is some complexity I am missing.

    Phil

    Comment


    • #3
      Phil --

      Thanks for your response. The first chunk of code generates fake data to graph, so it's all self-contained, no need for data. It needs to plot discreet points, because of the way the pricing model works.

      The outline of my code is this:
      1) Generate fake data.
      2) Generate four frequency tables, saved as four separate log files.
      3) Read log files back into Stata, chop them up to get cumulative percentages, save as Stata datasets.
      4) Merge the four back into one dataset.
      5) Attempt to find points closest to intersections.
      6) Graph it.

      But the points closest to the intersections are not good enough; with real data, the points are sparse and clumpy (people like prices in round numbers at $5 or $10 increments).

      So I need to do exactly what you propose, identify the points, get the equations for lines, and determine where they cross, And it is at this stage that I am at a loss.

      Hope this makes more sense?

      Thanks,

      Ben

      Comment


      • #4
        #1
        When applied to real data, the marketing people I'm working with are happy to guesstimate based on printing it out and using a ruler. But darnit, I want exact values.
        There are no "exact" values here. Stata is simply joining dots (data points) when plotting, and "points" on the line between any two dots are just approximations.

        #3
        So I need to do exactly what you propose, identify the points, get the equations for lines, and determine where they cross, And it is at this stage that I am at a loss.
        The reason why automation is difficult is that your data is in wide form and therefore Stata plots each line individually. Whereas both you and I see that the lines intersect, Stata does not. Therefore, you will have to locate each intersection manually. Also, these are not straight lines and the slope changes from point to point. Equations therefore have to be defined locally, but this can be done by picking out points provided by markers (see below).

        In summary, I think that printing and using a ruler although crude might not be such a bad approximation technique weighing the costs and benefits. My only input is to show that one need not print to obtain the points. Note that like other approximations, visual identification suffers if the data points are too sparse. Using the preserve command, you can combine a line with a scatter plot with marker labels. A start can be looking at the full graph with intersections to locate the point to zoom in.


        Original graph:
        Click image for larger version

Name:	original.png
Views:	1
Size:	32.8 KB
ID:	1363204




        Intersection between "bargain" and "expensive": ≈ (y=48, x=31.5)


        Code:
        preserve
        
        *FOLLOWING YOUR CODE
        drop if price <`optimal'-1
        drop if price >`optimal'+1
        
        *EASIER TO VISUALIZE IF ONE OF THE GRAPHS IS A SCATTER PLOT WITH MARKER LABELS
        scatter bargain price, msize(small) mlabel(price)|| line expensive price
        
        restore
        Click image for larger version

Name:	approx.png
Views:	1
Size:	9.9 KB
ID:	1363205

        Comment

        Working...
        X