  • Problem with logit model

    Dear everyone,
    I am trying to learn about the logit model on my own. However, I have run into some problems that I can't find answers to anywhere.
    I am interested in the factors that influence whether a person chooses to buy product A or product B. The outcome (response) variable Y is binary (0/1): product A or product B. The predictor variables of interest are the amount of money spent on that product per year (X1) and some characteristic variables such as age and gender (X2, X3, ...).
    The problem is that some people bought both products. For example, the person coded P0001 has two lines in the dataset, which differ only in Y and X1:
    Line 1: Y = 1, X1 = 1000, same X2, X3, ...
    Line 2: Y = 0, X1 = 3000, same X2, X3, ...
    I tried to run 3 models on 3 different datasets:
    Model 1: I kept all observations.
    Model 2: I dropped the people who bought both products.
    Model 3: For people who bought both products, I kept only the observation with the larger value of X1. For example, for the person coded P0001, I drop line 1.
    Could you please tell me which model is correct?
    I would really appreciate your help! Sorry for my bad English.
    Best regards,
    Vinh
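
To make the three datasets concrete, here is a minimal Python sketch of how they could be built from the long-format lines described above. The person IDs other than P0001, their values, and the variable names (`rows`, `model1`, `model2`, `model3`) are hypothetical, and the characteristic variables X2, X3, ... are left out for brevity.

```python
from collections import defaultdict

# Hypothetical long-format rows: (person_id, Y, X1).
# Y = 1 means product A, Y = 0 means product B; X1 is the amount spent.
rows = [
    ("P0001", 1, 1000),
    ("P0001", 0, 3000),
    ("P0002", 1, 500),
    ("P0003", 0, 800),
]

# Group the rows by person so that buyers of both products can be found.
by_person = defaultdict(list)
for row in rows:
    by_person[row[0]].append(row)

# Model 1: keep every row, including both lines of people who bought A and B.
model1 = list(rows)

# Model 2: drop every person whose rows contain both Y = 1 and Y = 0.
model2 = [r for r in rows if len({y for _, y, _ in by_person[r[0]]}) == 1]

# Model 3: per person, keep only the row with the larger X1.
model3 = [max(group, key=lambda r: r[2]) for group in by_person.values()]
```

In Model 1 the two lines for P0001 are not independent observations, which is one reason the replies below treat it differently from the other two.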

  • #2
    A binary dependent variable indicates which of two mutually exclusive choices an individual has made: one (Y=1) or the other (Y=0). Your first specification, Yi=1 if individual i purchases "A" and Yi=0 if individual i purchases "B", does not describe an exclusive choice because, as you note, some individuals purchase both A and B. Therefore, you have the following options:

    1) Yi=1 if individual i purchases "A", and 0 otherwise (individuals who purchase only A and those who purchase both products fall in the positive category).
    2) Yi=1 if individual i purchases "B", and 0 otherwise (individuals who purchase only B and those who purchase both products fall in the positive category).
    3) Your model 2 above.

    Now, as in any model that you specify, "correctness" depends on your objectives. All models are valid and cannot be said to be "incorrect". Do you want to predict the probability of an individual purchasing product A (B) without any regard to their alternative purchases? Go for 1 (2). If you believe that for the subsample of individuals who purchase either A or B, this is an exclusive choice (those who buy A will generally never buy B, and vice-versa), go for 3.
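
The three codings above can be sketched in Python as follows. This is only an illustration under stated assumptions: the rows are hypothetical long-format data in the form (person_id, Y, X1) with Y = 1 for product A and Y = 0 for product B, and the names `option1`, `option2`, `option3` are made up for this sketch.

```python
# Hypothetical long-format rows: (person_id, Y, X1).
rows = [
    ("P0001", 1, 1000),
    ("P0001", 0, 3000),
    ("P0002", 1, 500),
    ("P0003", 0, 800),
]

# Collect, for each person, the set of products they bought.
bought = {}
for pid, y, _ in rows:
    bought.setdefault(pid, set()).add("A" if y == 1 else "B")

# Option 1: Y = 1 if the person bought A (alone or together with B), else 0.
option1 = {pid: int("A" in prods) for pid, prods in bought.items()}

# Option 2: Y = 1 if the person bought B (alone or together with A), else 0.
option2 = {pid: int("B" in prods) for pid, prods in bought.items()}

# Option 3: keep only people who bought exactly one product; Y = 1 for A.
option3 = {pid: int("A" in prods) for pid, prods in bought.items()
           if len(prods) == 1}
```

Each option yields one outcome per person rather than one per line, so a person who bought both products no longer contributes two rows.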


    • #3
      You can also try a bivariate probit model if you believe that the choices of these 2 products are linked. It will also allow you to take into account the probability that some people choose both products, only one of them, or neither of them!

      Best


      • #4
        How exactly are the data set up? Could there also be, say, cases that bought product A multiple times? Based on what you say, instead of having multiple lines I would have had 1 line where the variables were bought A and bought B each having possible values of 0/1. But if, say, these were multiple shopping trips, then it would be more like a panel data problem.
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam
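
The one-line-per-person layout suggested in this post could look like the Python sketch below. The rows are hypothetical long-format data in the form (person_id, Y, X1) with Y = 1 for product A and Y = 0 for product B. The field names `bought_A` and `bought_B` follow the wording of the post, while `spend_A` and `spend_B` are an added assumption about where the spending amounts would go.

```python
# Hypothetical long-format rows: (person_id, Y, X1).
rows = [
    ("P0001", 1, 1000),
    ("P0001", 0, 3000),
    ("P0002", 1, 500),
    ("P0003", 0, 800),
]

# Build one record per person with a 0/1 indicator for each product.
wide = {}
for pid, y, x1 in rows:
    rec = wide.setdefault(
        pid, {"bought_A": 0, "bought_B": 0, "spend_A": 0, "spend_B": 0}
    )
    product = "A" if y == 1 else "B"
    rec["bought_" + product] = 1   # indicator: bought this product at all
    rec["spend_" + product] += x1  # total spent on this product

# wide["P0001"] now holds both indicators and both amounts on a single line.
```

With this layout each person appears once, and the two indicators can serve as separate outcomes or as a joint outcome.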


        • #5
          Originally posted by Andrew Musau (#2)
          Thank you for your quick reply. Your suggestions help me a lot. However, as I mentioned, my objective is to determine the factors that influence whether a person chooses to buy product A or product B. In my opinion, if I follow your option 1 or 2, the estimated model can only explain why people "buy or don't buy A", which is different from "choose A over B".
          What if I change my objective to determining the factors that influence whether a customer's favorite product is A or B? Would my model 3 (keeping the observation with the larger value of X1) then be suitable?


          • #6
            Originally posted by Williams Ahouakan (#3)
            This is a new approach for me to try. However, I think its results would differ from my objective, because I don't need to compare people who choose both products with people who choose only one of the two.


            • #7
              Originally posted by Richard Williams (#4)
              Thank you for your reply. As you mentioned, it would be more logical if the data had been collected per shopping trip: because a customer buys only A or B on a given trip, the dataset would then be panel data. However, my dataset comes from a survey of customers in one country in 2012, and the variable X2 (the amount of money spent on that product in 2012) is the sum of the money spent on product A (or B) over all shopping trips in 2012. Therefore, I think that in 2012 there were times when a person bought product A and other times when that person bought product B. This is why one person may have 2 lines in my dataset.


              • #8
                The predictor variables of interest are the amount of money spent on that product per year (X1)
                This confuses me. You also say

                Line 1: Y = 1, X1 = 1000, same X2, X3, ...
                Line 2: Y = 0, X1 = 3000, same X2, X3, ...
                If Y = 0, how are you spending any money on the product?
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam


                • #9
                  Sorry, my mistake: in my previous post (#7), the variable holding the amount of money spent on that product in 2012 should be X1, not X2.
                  Originally posted by Richard Williams (#8)
                  This confuses me. You also say [...] If Y = 0, how are you spending any money on the product?
                  Sorry for my bad English. My dataset covers only one year, 2012, so maybe using "per" here is not correct.
                  About the variables:
                  Y = 0/1: Y = 1 means the customer bought product A and Y = 0 means the customer bought product B
                  X1: the amount of money spent on that product in 2012
                  X2, X3, ...: characteristic variables
                  Therefore,
                  Line 1: Y = 1, X1 = 1000, same X2, X3, ... means that person P0001 bought product A in 2012 and spent 1000 on that product in that year
                  Line 2: Y = 0, X1 = 3000, same X2, X3, ... means that person P0001 bought product B in 2012 and spent 3000 on that product in that year
