  • Sequence of data in a discrete choice experiment

    Hi,

    I'm currently trying to enter data for a discrete choice experiment in Stata; this data was given to me in Excel, so I'm having a little trouble importing it. The data in Excel tells me which alternative a respondent chose in each choice set. The respondents face 10 choice sets, each with three alternatives, so there are 30 observations per individual in total. My main problem is how to generate the dependent variable "y"; this variable indicates the choice a respondent makes and takes the value 1 if the individual chose an alternative and 0 if the person did not choose it. I was thinking that I would create ten variables, choiceset1 through choiceset10, that take on values of 1, 2, or 3, depending on whether the 1st, 2nd, or 3rd alternative within the choice set is chosen. My question is, how can I translate that information into the variable y? In words, I'm trying to write a series of commands that does something like "for respondent i, if choicesetk=j, let the variable y take on the value 1 for the jth observation among the kth set of three observations and 0 elsewhere within that set of observations."

    I know this is a complicated request, so if anyone can point me in the direction of commands that would be helpful to construct something like this, I would be grateful. I have attached an example of what I want the final data to look like, just for the first respondent in order to keep it short.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id cs1 cs2 cs3 cs4 cs5 cs6 cs7 cs8 cs9 cs10 y)
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 0
    1 1 2 3 1 1 1 2 3 3 2 1
    1 1 2 3 1 1 1 2 3 3 2 0
    end

  • #2
    Well, you shouldn't do it quite this way. I'm imagining that the Excel data you have basically looks like the id and cs1-cs10 variables, but without the y. What you need to do is not stick y on the end there, but actually get rid of cs1-cs10. That's wide-layout data, and it will be very difficult to do anything with it analytically. Also, the "variables" cs1-cs10 do not actually vary within id, so you are repeating the same 10 numbers 30 times. So to start with, I would just import id and cs1-cs10 from Excel once and not make 30 replicates. The -reshape- command is your friend:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id cs1 cs2 cs3 cs4 cs5 cs6 cs7 cs8 cs9 cs10)
    1 1 2 3 1 1 1 2 3 3 2
    end
    
    reshape long cs, i(id) j(choice_set)
    rename cs selection

    At this point you have a data set with 10 observations per person; each observation identifies which of the 10 choice sets it corresponds to and which of the options was selected in that choice set. Now you are ready to analyze the data (once you bring in whatever predictor variables you are studying) using -mlogit selection predictor variables-.

    Now, if you need to study each of the three choices separately, so you really need 3 corresponding indicators of whether that option was selected, you can follow the above code with:

    Code:
    assert inlist(selection, 1, 2, 3) if !missing(selection)
    forvalues i = 1/3 {
        gen byte selected`i' = `i'.selection if !missing(selection)
    }

    Then if you just want to explore factors influencing selection 2, you can do things like -logit selected2 predictor variables-.
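
    A quick sanity check on those indicators, as a sketch: exactly one of the three should equal 1 in every observation with a non-missing selection. The block repeats the example data so that it runs on its own, writes the indicator as (selection == `i'), which is meant to be equivalent to the factor-variable form above, and uses the illustrative name n_selected, which is not part of the original data:

    ```stata
    * Rebuild the long layout from the example respondent
    clear
    input float(id cs1 cs2 cs3 cs4 cs5 cs6 cs7 cs8 cs9 cs10)
    1 1 2 3 1 1 1 2 3 3 2
    end

    reshape long cs, i(id) j(choice_set)
    rename cs selection

    * One 0/1 indicator per option, missing where selection is missing
    forvalues i = 1/3 {
        gen byte selected`i' = (selection == `i') if !missing(selection)
    }

    * n_selected (illustrative name) counts how many indicators are 1
    egen byte n_selected = rowtotal(selected1 selected2 selected3)
    assert n_selected == 1 if !missing(selection)
    ```

    If the -assert- passes silently, each choice set has exactly one option flagged as selected.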

    I think these are the most useful layouts for this kind of data analysis.
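
    If the one-row-per-alternative layout from the original question is needed after all (for example, for a conditional logit such as -clogit-, where each alternative is its own observation), the long data can be expanded to three rows per choice set. A minimal sketch, assuming three alternatives in every choice set; the variable name alternative is illustrative and not part of the original data:

    ```stata
    * Start from one row per respondent, as imported from Excel
    clear
    input float(id cs1 cs2 cs3 cs4 cs5 cs6 cs7 cs8 cs9 cs10)
    1 1 2 3 1 1 1 2 3 3 2
    end

    * One row per respondent and choice set
    reshape long cs, i(id) j(choice_set)
    rename cs selection

    * Three rows per choice set, numbered 1 to 3 within the set
    expand 3
    bysort id choice_set: gen byte alternative = _n

    * y is 1 on the row of the chosen alternative and 0 on the other two
    gen byte y = (alternative == selection) if !missing(selection)

    sort id choice_set alternative
    list id choice_set alternative selection y, sepby(choice_set)
    ```

    With the example respondent this gives 30 observations, and y is 1 only in the row whose alternative number matches selection, which reproduces the y column in the original post.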

    Thank you for using -dataex-


    • #3
      Replying to Clyde Schechter (#2):
      Wow, great! Thank you so much!


      • #4
        Hi,
        I have 6 choice scenarios, each with 3 unlabelled alternatives (A, B, C). Each alternative has 6 categorical attributes with 3 levels each and one payment attribute with 4 levels. I am trying to enter the survey data in Excel. What data format do I need in order to estimate an MNL model? Alternative A in scenario 1 is not the same as Alternative A in scenario 2, because the attribute levels differ. How can I enter the attributes for each scenario and alternative? Please help if anyone has done a similar study.
