Kaplan Meier Yes answer in Contingent Valuation

Henrik Andersson

12 Oct 2020, 14:45

Dear all,

I tried to illustrate the proportion of yes answers for different bid levels from a contingent valuation survey. The version is Stata 15.1.

The summary statistics below give the proportion of yes answers for 4 different bid levels. The graph (provided below), based on -sts graph-, seems correct at first glance. However, a closer look reveals that the proportions are not correct: at bid=50 the proportion should be 0.96; at bid=100 it should be 0.749, not above 0.75 as it is now; at bid=150 it is hard to tell whether the proportion is the correct 0.541; and at bid=200 it is obvious that the proportion of yes answers is higher than the correct 0.304.

I tried to trick Stata by adding bid=0 with proportion yes = 1, i.e. no one says no to a bid (price) equal to zero. Same result. I would be very grateful for any help, either on what is wrong with my code or on an alternative way to illustrate my survival curve with the right proportions.

Code:

. su yes if bid==50

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         yes |        375         .96     .196221          0          1

. su yes if bid==100

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         yes |        375    .7493333    .4339759          0          1

. su yes if bid==150

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         yes |        375    .5413333    .4989543          0          1

. su yes if bid==200

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         yes |        375        .304    .4605971          0          1

. 
. stset bid, failure(yes)

     failure event:  yes != 0 & yes < .
obs. time interval:  (0, bid]
 exit on or before:  failure

------------------------------------------------------------------------------
      1,875  total observations
        375  observations end on or before enter()
------------------------------------------------------------------------------
      1,500  observations remaining, representing
        958  failures in single-record/single-failure data
    187,500  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =       200

. sts graph

         failure _d:  yes
   analysis time _t:  bid

Thank you in advance,

Henrik

Clyde Schechter

#2

12 Oct 2020, 15:59

You are misunderstanding the Kaplan-Meier (K-M) estimator. The values it produces are not the proportions of yes answers at each bid level. I also suspect that your data are not organized appropriately for -sts graph-. I say that because it strikes me as odd that there are exactly 375 observations at each value of bid. I suppose that's possible, but it seems an odd coincidence (or a strange design). I'm guessing that you do not actually have 1875 people in your data; rather, you have 375 people who were repeatedly offered increasing bids until either they said yes, or they reached 200 and still said no. If I'm right about that, your data are not laid out in a way that represents it. You should have a single observation for each person that simply records the bid at which they first said yes, or 200, whichever came first, along with a variable yes which is 1 if they said yes at that bid and 0 if it was their final bid and they didn't say yes (either because that was the end of the series at 200 or because they didn't complete the series beyond that point for some reason). So you probably need to do some data management to trim your data set to include, for each participant, only the lowest bid at which they said yes, or, if there is no such bid, the highest bid at which they participated, and you need a variable to distinguish which of those cases applied.

Then once you have the data properly organized, bear in mind what the K-M estimator gives you. It is not the proportion of people who say yes at a given level. It is the proportion who have still not said yes up to that level. And it is only defined for the analysis time > 0 (in your case, bid > 0) so including observations with bid = 0 will not only not trick Stata into anything, it won't affect the results at all (as you observed).
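
The point about bid = 0 can be checked directly. The sketch below rebuilds a data set with the counts implied by the summary statistics in #1: 375 records per bid, with yes counts of 360, 281, 203 and 114 inferred from the posted means, plus 375 bid = 0 records with yes = 1. The ordering of records within each bid is arbitrary and is an assumption made only for illustration. -stset- flags the bid = 0 records with _st = 0, so they play no part in -sts list- or -sts graph-.

```stata
* Sketch only: counts inferred from the means posted in #1
clear
input bid n_yes
  0 375
 50 360
100 281
150 203
200 114
end
expand 375                                     // 375 records per bid level
bysort bid: generate byte yes = (_n <= n_yes)  // first n_yes records in each bid say yes
stset bid, failure(yes)
tabulate bid _st        // bid = 0 records have _st = 0, i.e. they are excluded
sts list                // the survivor function that -sts graph- plots
```

If the inferred counts are right, the -stset- summary should match the one posted in #1: 1,875 total observations, 375 ending on or before enter(), and 958 failures among the remaining 1,500.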

Clyde Schechter

#3

12 Oct 2020, 16:08

Correction to #2. Looking at the results you show more carefully, I see that the proportions saying yes at each level decrease as the levels go up. That would be unlikely in the interpretation of the data that I have given in #2. So I think it would be better if I stopped trying to imagine your study and instead asked you to explain how these data were generated (study design and method) and to use the -dataex- command to show an example. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Henrik Andersson

#4

13 Oct 2020, 08:18

Thank you Clyde for your very thoughtful answer.

Stata SE 16.1 used below.

The reason I have exactly 375 observations at each bid level is that I created the data for teaching purposes. So yes, each respondent answers a single yes/no question at one bid level (i.e. single-bounded). If I post the data around the point where the bid variable changes from 50 to 100, it looks like this:

Code:

. dataex bid yes in 360/390

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(bid yes)
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
 50 1
100 1
100 0
100 1
100 1
100 1
100 1
100 1
100 0
100 1
100 1
100 0
100 1
100 1
100 1
100 1
end

The Kaplan-Meier has often been used in contingent valuation to show the survival function for yes answers over the bid levels. The higher the bid the less likely respondents are to say yes, and hence the "proportion of survivors" will decrease with the bid levels. The Kaplan-Meier is related to the Turnbull lower bound estimator often used (sometimes called Kaplan-Meier-Turnbull), which works well with my data (after adding some observations as can be seen below).

Code:

. local plus1 = _N + 375 // add observations with bid=0 and yes=1 to get turnbull to work

. set obs `plus1'
number of observations (_N) was 1,875, now 2,250

. 
. replace yes=1 if yes==. 
(375 real changes made)

. replace bid=0 if bid==.
(375 real changes made)

. 
. turnbull bid yes


-----------------------------------------------------------------------------------------------------
          |                                    Turnbull Estimates                                    
      Bid |     Nj        Tj        Fj       Nj*       Tj*       Fj*       fj*       Elb      V(Elb) 
----------+------------------------------------------------------------------------------------------
       0p |                         0.000                         0.000               0.000          
       50 |    15.000   375.000     0.040    15.000   375.000     0.040     0.040    10.533     0.256
      100 |    94.000   375.000     0.251    94.000   375.000     0.251     0.211    20.800     1.252
      150 |   172.000   375.000     0.459   172.000   375.000     0.459     0.208    35.600     1.655
      200 |   261.000   375.000     0.696   261.000   375.000     0.696     0.237    60.800     1.411
      600 |                         1.000                         1.000     0.304                    
    Total |   542.000 2,250.000             542.000 2,250.000                       127.733     4.574
-----------------------------------------------------------------------------------------------------

---------------------
          | Turnbull 
          | Estimates
      Bid |    Eub   
----------+----------
       0p |     2.000
       50 |    21.067
      100 |    31.200
      150 |    47.467
      200 |   182.400
      600 |          
    Total |   284.133
---------------------
Note: (p) pooled category. Last bid value was arbitrarily chosen. Pval(Elb) =  0.00000.


tipo not unique within _v1;
there are multiple observations at the same tipo within _v1.
Type "reshape error" for a listing of the problem observations.
r(9);
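
The Elb column can be reproduced by hand, which may help show how the Turnbull lower bound relates to the no shares Fj. This is only a sketch: the counts are copied from the Nj and Tj columns above, the variable names are made up for illustration, and no pooling step is included because Fj already increases with bid here.

```stata
* Sketch only: n_no and n_total are copied from the Nj and Tj columns above
clear
input bid n_no n_total
  0   0 375
 50  15 375
100  94 375
150 172 375
200 261 375
end
sort bid
generate double F = n_no / n_total                          // share of no answers, Fj
* probability mass between this bid and the next; the last row gets 1 - F
generate double f_next   = cond(_n < _N, F[_n+1] - F, 1 - F)
generate double elb_part = bid * f_next                     // lower end of each interval times its mass
list bid F f_next elb_part, noobs
quietly summarize elb_part
display "Lower-bound mean WTP: " %9.3f r(sum)
```

Each interval's probability mass is assigned to the lower end of the interval, which is why the result is a lower bound on mean willingness to pay.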

To summarize, what I want to do is to illustrate the proportions shown in my original post, or in the Turnbull output above, in a graph like the one in my original post. This has been referred to as "Kaplan Meier" in the contingent valuation literature, but apparently not accurately. But what confuses me is why the Kaplan Meier estimate in my output is almost correct.

Clyde Schechter

#5

13 Oct 2020, 09:53

I'm not really sure why what you got from -sts graph- is almost correct; I think it is more or less coincidence.

In the example data you show, thinking of it in survival analysis terms, you start out with 16 people, all of whom are saying yes at bid = 50. So the K-M estimator is 1.0 out to 50. After saying yes at 50, one person drops out with no further participation: so that person is censored at 50. The remaining 15 continue on to the condition where bid = 100, and 12 say yes, the remaining 3 say no. So immediately after bid = 50, 13/16 = 0.8125 are still in the game and might say either yes or no at the next bid. The example data doesn't say what happens after that, so the Kaplan-Meier graph for this data is a horizontal line at 1 from bid = 0 to bid = 50, and then steps down to 0.8125 at that point. The graph ends there because no further information is available.

Henrik Andersson

#6

13 Oct 2020, 10:51

Hi Clyde,

I only posted a part of the data set with -dataex- to show the structure. The proportions of no answers are shown in the -turnbull- output in my last post and are bid(50)=0.04, bid(100)=0.251, bid(150)=0.459, and bid(200)=0.696. I therefore expected the K-M estimator to drop to 0.96 at bid(0) and remain at that level to bid(50), then drop to 0.749 at bid(50) and remain at that level to bid(100), then drop to 0.541 at bid(100) and remain at that level to bid(150), and finally drop to 0.304 at bid(150), remain at that level to bid(200) and then drop to 0 beyond that.

Thanks again for your help. I will opt for another way to illustrate the same distribution.

Clyde Schechter

#7

13 Oct 2020, 14:00

Well, again, that's not how the K-M estimator works. At each time (bid) where somebody has the failure event (as you have -stset- the data, that means saying yes), the curve steps down. The level after each step is always the level before it multiplied by the conditional probability that a person still at risk at that bid does not fail there.
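
As a rough check of this arithmetic, here is a sketch using the counts implied by the summary statistics in #1 (375 respondents per bid; the yes counts 360, 281, 203 and 114 are inferred from the posted means, so treat them as assumptions). With the data -stset- as in #1, yes is the failure event and everyone whose bid is at least as large as the current one is still at risk.

```stata
* Sketch only: counts inferred from the means posted in #1
clear
input bid n_yes n_total
 50 360 375
100 281 375
150 203 375
200 114 375
end
sort bid
generate double cum     = sum(n_total)
generate double at_risk = cum[_N] - cum + n_total   // 1500, 1125, 750, 375
generate double stay    = 1 - n_yes/at_risk         // P(no failure at this bid | at risk)
generate double km      = exp(sum(ln(stay)))        // running product = K-M survivor function
generate double p_yes   = n_yes/n_total             // raw share of yes at this bid
format stay km p_yes %6.3f
list bid at_risk n_yes stay km p_yes, noobs
```

Under these assumptions the curve sits at 0.760 just after bid = 50, 0.570 after 100, 0.416 after 150 and 0.289 after 200. The first two values happen to lie close to the raw yes shares at the next bid level (0.749 and 0.541), which may be why the graph looked almost right; the third (0.416 against 0.304) does not.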

I'm not familiar with the -turnbull- program, nor with the Turnbull-Kaplan-Meier estimator. It is possible they are related in some way, or it is possible that Turnbull, Kaplan, and Meier developed it jointly and it is something different. I really don't know. Anyway, for what you want, -sts graph- is not the right tool. It does something different.
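
One possible alternative for plotting the raw proportions is sketched below, assuming the yes counts are those implied by the summary statistics in #1. With respondent-level data, -collapse (mean) p_yes = yes, by(bid)- would produce the same four points; the variable name p_yes is made up for illustration.

```stata
* Sketch only: counts inferred from the means posted in #1
clear
input bid n_yes n_total
 50 360 375
100 281 375
150 203 375
200 114 375
end
generate double p_yes = n_yes / n_total

* Step plot of the raw proportion of yes answers at each bid level
twoway connected p_yes bid, connect(stairstep) ///
    ylabel(0(.25)1) xlabel(0(50)200)           ///
    ytitle("Proportion yes") xtitle("Bid")
```

Here connect(stairstep) holds each proportion flat until the next bid. Plain markers or a bar chart would avoid implying anything about bids that were never offered.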