  • Kaplan Meier Yes answer in Contingent Valuation

    Dear all,

    I am trying to illustrate the proportion of yes answers at different bid levels in a contingent valuation survey. I am using Stata 15.1.

    The summary statistics below give the proportion of yes answers at each of the 4 bid levels. The graph produced by -sts graph- (attached below) seems correct at first glance. However, a closer look reveals that the proportions are not correct: at bid=50 the proportion should be 0.96; at bid=100 it should be 0.749, not above 0.75 as it is now; at bid=150 it is hard to tell whether the proportion is the correct 0.541; but at bid=200 the proportion of yes answers is clearly higher than the correct 0.304.

    I tried to trick Stata by adding observations with bid=0 and yes=1, i.e. no one says no to a bid (price) of zero. Same result. I would be very grateful for any help, either on what is wrong with my code or on an alternative way to illustrate my survival curve with the right proportions.

    Code:
    . su yes if bid==50
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             yes |        375         .96     .196221          0          1
    
    . su yes if bid==100
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             yes |        375    .7493333    .4339759          0          1
    
    . su yes if bid==150
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             yes |        375    .5413333    .4989543          0          1
    
    . su yes if bid==200
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             yes |        375        .304    .4605971          0          1
    
    . 
    . stset bid, failure(yes)
    
         failure event:  yes != 0 & yes < .
    obs. time interval:  (0, bid]
     exit on or before:  failure
    
    ------------------------------------------------------------------------------
          1,875  total observations
            375  observations end on or before enter()
    ------------------------------------------------------------------------------
          1,500  observations remaining, representing
            958  failures in single-record/single-failure data
        187,500  total analysis time at risk and under observation
                                                    at risk from t =         0
                                         earliest observed entry t =         0
                                              last observed exit t =       200
    
    . sts graph
    
             failure _d:  yes
       analysis time _t:  bid
    [Attachment: Graph.png, the -sts graph- output]

    Thank you in advance,

    Henrik

  • #2
    You are misunderstanding the Kaplan-Meier (K-M) estimator. The values it produces are not the proportions saying yes at each bid level. I also suspect that your data are not organized appropriately for -sts graph- either. I say that because it strikes me as odd that there are exactly 375 observations at each value of bid. I suppose that's possible, but it seems an odd coincidence (or a strange design).

    I'm guessing that you do not actually have 1,875 people in your data; rather, you have 375 people who were repeatedly offered increasing bids until either they said yes or they reached 200 and still said no. If I'm right about that, your data do not represent that at all. You should have a single observation for each person that records the bid at which they first said yes, or 200, whichever came first, along with a variable yes that is 1 if they said yes at that bid and 0 if it was the final bid but they didn't say yes (either because the series ended at 200 or because they didn't complete the series beyond that point for some reason). So you probably need some data management to trim your data set to include, for each participant, only the lowest bid at which they said yes or, if there is no such bid, the highest bid at which they participated, plus a variable that distinguishes which of those cases applies.
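A minimal sketch of that trimming in Stata, assuming long data with one row per bid offered. The identifier id and the four toy respondents below are hypothetical, made up only to show the mechanics; yes = 1 means the respondent accepted that bid.

```stata
* Hypothetical long-format data: one row per bid offered to a respondent
clear
input id bid yes
1  50 0
1 100 1
2  50 1
3  50 0
3 100 0
3 150 0
3 200 0
4  50 0
4 100 0
end

* Lowest bid at which each respondent said yes (missing if never)
bysort id: egen first_yes = min(cond(yes == 1, bid, .))
* Highest bid each respondent reached
by id: egen last_bid = max(bid)

* One row per respondent: the first yes, otherwise the last bid reached
keep if bid == first_yes | (missing(first_yes) & bid == last_bid)

* On the kept row, yes = 1 is the event and yes = 0 a censored respondent
stset bid, failure(yes)
```

In this toy example respondent 3 is kept at bid 200 with yes = 0 (end of the series) and respondent 4 at bid 100 with yes = 0 (dropped out), so both enter the K-M estimate as censored.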

    Once you have the data properly organized, bear in mind what the K-M estimator gives you. It is not the proportion of people who say yes at a given level; it is the proportion who have still not said yes up to that level. And it is only defined for analysis time > 0 (in your case, bid > 0), so including observations with bid = 0 will not trick Stata into anything; it won't affect the results at all (as you observed).
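For reference, the standard product-limit form of the K-M estimator, with a yes answer as the failure event (as in the -stset- call in the original post). Here t_j are the distinct bids at which at least one yes occurs, d_j is the number of yes answers at t_j, and n_j is the number still at risk just before t_j:

```latex
\hat S(t) \;=\; \prod_{j \,:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right),
\qquad \hat S(t) = 1 \ \text{for}\ t < t_1 .
```

Each factor is the conditional probability of not saying yes at t_j given no yes before t_j, which is why the estimate is not the share of yes answers among respondents offered bid t.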

    • #3
      Correction to #2. Looking at the results you show more carefully, I see that the proportions saying yes at each level decrease as the levels go up. That would be unlikely under the interpretation of the data that I gave in #2. So I think it would be better if I stopped trying to imagine your study and instead asked you to explain how these data were generated (study design and method), and to use the -dataex- command to show an example. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      • #4
        Thank you, Clyde, for your very thoughtful answer.

        I used Stata SE 16.1 for the output below.

        I have exactly 375 observations at each bid level because I created the data for teaching purposes. Each respondent answers a single yes/no question at one bid (i.e., single-bounded). Here is what the data look like where bid changes from 50 to 100:

        Code:
        . dataex bid yes in 360/390
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(bid yes)
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
         50 1
        100 1
        100 0
        100 1
        100 1
        100 1
        100 1
        100 1
        100 0
        100 1
        100 1
        100 0
        100 1
        100 1
        100 1
        100 1
        end
        The Kaplan-Meier estimator has often been used in contingent valuation to show the survival function for yes answers over the bid levels. The higher the bid, the less likely respondents are to say yes, and hence the "proportion of survivors" decreases with the bid level. The Kaplan-Meier estimator is related to the commonly used Turnbull lower-bound estimator (sometimes called Kaplan-Meier-Turnbull), which works well with my data (after adding some observations, as can be seen below).

        Code:
        . local plus1 = _N + 375 // add observations with bid=0 and yes=1 to get turnbull to work
        
        . set obs `plus1'
        number of observations (_N) was 1,875, now 2,250
        
        . 
        . replace yes=1 if yes==. 
        (375 real changes made)
        
        . replace bid=0 if bid==.
        (375 real changes made)
        
        . 
        . turnbull bid yes
        
        
        -----------------------------------------------------------------------------------------------------
                  |                                    Turnbull Estimates                                    
              Bid |     Nj        Tj        Fj       Nj*       Tj*       Fj*       fj*       Elb      V(Elb) 
        ----------+------------------------------------------------------------------------------------------
               0p |                         0.000                         0.000               0.000          
               50 |    15.000   375.000     0.040    15.000   375.000     0.040     0.040    10.533     0.256
              100 |    94.000   375.000     0.251    94.000   375.000     0.251     0.211    20.800     1.252
              150 |   172.000   375.000     0.459   172.000   375.000     0.459     0.208    35.600     1.655
              200 |   261.000   375.000     0.696   261.000   375.000     0.696     0.237    60.800     1.411
              600 |                         1.000                         1.000     0.304                    
            Total |   542.000 2,250.000             542.000 2,250.000                       127.733     4.574
        -----------------------------------------------------------------------------------------------------
        
        ---------------------
                  | Turnbull 
                  | Estimates
              Bid |    Eub   
        ----------+----------
               0p |     2.000
               50 |    21.067
              100 |    31.200
              150 |    47.467
              200 |   182.400
              600 |          
            Total |   284.133
        ---------------------
        Note: (p) pooled category. Last bid value was arbitrarily chosen. Pval(Elb) =  0.00000.
        
        
        tipo not unique within _v1;
        there are multiple observations at the same tipo within _v1.
        Type "reshape error" for a listing of the problem observations.
        r(9);
        To summarize, what I want is to plot the proportions shown in my original post (or implied by the Turnbull output above) as a step curve like the graph in my original post. This kind of graph has been referred to as "Kaplan-Meier" in the contingent valuation literature, though apparently not accurately. What confuses me is why the Kaplan-Meier estimate in my output is almost correct.
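The Elb and Eub columns in the -turnbull- output appear to follow the usual Turnbull lower- and upper-bound mean formulas; the arithmetic below reproduces the Total row. Here t_0 = 0 and t_1, ..., t_4 = 50, 100, 150, 200; F_j is the Fj column (share of no answers at t_j) with F_0 = 0 and F_5 = 1; and t_5 = 600 is the arbitrarily chosen last bid mentioned in the note. The share saying yes at t_j is 1 - F_j, i.e. 0.960, 0.749, 0.541 and 0.304, the proportions in the original post.

```latex
\begin{aligned}
E_{lb} &= \sum_{j=0}^{4} t_j\,(F_{j+1}-F_j)
        = \frac{0\cdot 15 + 50\cdot 79 + 100\cdot 78 + 150\cdot 89 + 200\cdot 114}{375}
        = \frac{47900}{375} \approx 127.733 \\
E_{ub} &= \sum_{j=0}^{4} t_{j+1}\,(F_{j+1}-F_j)
        = \frac{50\cdot 15 + 100\cdot 79 + 150\cdot 78 + 200\cdot 89 + 600\cdot 114}{375}
        = \frac{106550}{375} \approx 284.133
\end{aligned}
```

The numerators use the counts behind the Nj column: 15 no answers at 50, then 94 - 15 = 79, 172 - 94 = 78 and 261 - 172 = 89, and 375 - 261 = 114 yes answers at 200.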

        • #5
          I'm not really sure why what you got from -sts graph- is almost correct; I think it is more or less a coincidence.

          In the example data you show, thinking of it in survival analysis terms, you start out with 16 people, all of whom are saying yes at bid = 50. So the K-M estimator is 1.0 out to 50. After saying yes at 50, one person drops out with no further participation: so that person is censored at 50. The remaining 15 continue on to the condition where bid = 100, and 12 say yes, the remaining 3 say no. So immediately after bid = 50, 13/16 = 0.8125 are still in the game and might say either yes or no at the next bid. The example data doesn't say what happens after that, so the Kaplan-Meier graph for this data is a horizontal line at 1 from bid = 0 to bid = 50, and then steps down to 0.8125 at that point. The graph ends there because no further information is available.
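Rather than reading the curve off the graph, the numbers behind -sts graph- can be listed or saved. A minimal sketch, assuming data reconstructed from the -summarize- output in the original post (375 answers per bid; the yes counts 360, 281, 203 and 114 follow from the reported means). The variable names freq and km are illustrative, and the bid = 0 rows are omitted because -stset- excludes them anyway.

```stata
* Data reconstructed (assumption) from the -summarize- output in #1
clear
input bid yes freq
 50 1 360
 50 0  15
100 1 281
100 0  94
150 1 203
150 0 172
200 1 114
200 0 261
end
expand freq              // one observation per respondent
drop freq

stset bid, failure(yes)  // same setup as in #1
sts list                 // K-M estimates as numbers
sts generate km = s      // K-M survivor function at each observation's bid
tabstat km, by(bid)
```

With these counts, -tabstat- should report km of about 0.760, 0.570, 0.416 and 0.289 at bids 50, 100, 150 and 200, which are not the shares of yes answers.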

          • #6
            Hi Clyde,

            I only posted part of the data set with -dataex- to show the structure. The proportions of no answers are shown in the -turnbull- output in my last post: bid(50)=0.04, bid(100)=0.251, bid(150)=0.459, and bid(200)=0.696. I therefore expected the K-M estimator to drop to 0.96 at bid(0) and remain at that level to bid(50), then drop to 0.749 at bid(50) and remain at that level to bid(100), then drop to 0.541 at bid(100) and remain at that level to bid(150), and finally drop to 0.304 at bid(150), remain at that level to bid(200), and then drop to 0 beyond that.

            Thanks again for your help. I will opt for another way to illustrate the same distribution.

            • #7
              Well, again, that's not how the K-M estimator works. At each time (bid) where a failure occurs, the curve steps down; people who are lost from the group (censored) shrink the risk set but do not by themselves produce a step. The level at the next step is always the level at the preceding step multiplied by the probability that a person who survived the present step continues on to the next step.
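Applied to the full data in the original post (yes as the failure event; counts reconstructed from the -summarize- output, so this is a hand check rather than Stata output), the recursion gives the values below. The risk set shrinks from 1,500 to 1,125, 750 and 375 because everyone whose bid has passed leaves it, whether they said yes or no:

```latex
\begin{aligned}
\hat S(50)  &= 1 - \tfrac{360}{1500} = 0.760 \\
\hat S(100) &= \hat S(50)\left(1 - \tfrac{281}{1125}\right) \approx 0.570 \\
\hat S(150) &= \hat S(100)\left(1 - \tfrac{203}{750}\right) \approx 0.416 \\
\hat S(200) &= \hat S(150)\left(1 - \tfrac{114}{375}\right) \approx 0.289
\end{aligned}
```

Read against the next bid, the plotted levels 1.000, 0.760, 0.570 and 0.416 sit near 0.960, 0.749, 0.541 and 0.304, which may explain why the graph looked almost right; the match is loose and breaks down at bid 200.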

              I'm not familiar with the -turnbull- program, nor with the Turnbull-Kaplan-Meier estimator. It is possible that they are related in some way, or that it is something Turnbull, Kaplan, and Meier developed jointly that is different. I really don't know. Anyway, for what you want, -sts graph- is not the right tool. It does something different.
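One alternative, sketched under the same assumption (data reconstructed from the -summarize- output in the original post; pyes and freq are illustrative names): plot the observed share of yes answers directly as a step curve, which gives the shape described in #6. The zero-bid point where everyone says yes is the assumption proposed in the original post, not data.

```stata
* Data reconstructed (assumption) from the -summarize- output in #1
clear
input bid yes freq
 50 1 360
 50 0  15
100 1 281
100 0  94
150 1 203
150 0 172
200 1 114
200 0 261
end
expand freq
drop freq

* Share of yes answers at each bid (0.960, 0.749, 0.541, 0.304)
collapse (mean) pyes = yes, by(bid)

* Assumed anchor point: everybody says yes to a zero bid
set obs `=_N + 1'
replace pyes = 1 if missing(bid)
replace bid  = 0 if missing(bid)
sort bid
list bid pyes, noobs

* stepstair: drop vertically at each bid, then stay flat up to the next bid
twoway line pyes bid, connect(stepstair) ///
    ytitle("Proportion answering yes") xtitle("Bid") ylabel(0(.25)1)
```

Run it from a do-file: the /// line continuation is not recognized when commands are typed interactively.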
