Problem with logit model

Vinh Nguyen

Join Date: Dec 2015

Posts: 7
#1

13 Dec 2015, 08:56

Dear everyone,
I am trying to learn about the logit model by myself. However, I have run into some problems that I can't find answers to anywhere.
I am interested in the factors that influence whether a person chooses to buy product A or product B. The outcome (response) variable Y is binary (0/1): product A or product B. The predictor variables of interest are the amount of money spent on that product per year (X1) and some characteristic variables such as age, gender, ... (X2, X3, ...).
The problem is that some people bought both products. For example, the person coded P0001 has two lines in the dataset, which differ only in Y and X1:
Line 1: Y = 1, X1 = 1000, same X2, X3, ...
Line 2: Y = 0, X1 = 3000, same X2, X3, ...
I tried to run 3 models with 3 different datasets:
Model 1: I kept all observations.
Model 2: I dropped the people who bought both products.
Model 3: For people who bought both products, I kept only the observation with the larger value of X1. For example, for person P0001, I drop line 1.
Could you please tell me which model is correct?
I would really appreciate your help! Sorry for my bad English.
Best regards,
Vinh
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#2

13 Dec 2015, 09:50

A binary dependent variable indicates which of two mutually exclusive choices an individual has made: one (Y=1) or the other (Y=0). Your first specification, Y_i=1 if individual i purchases "A" and Y_i=0 if individual i purchases "B", does not describe an exclusive choice because, as you note, some individuals purchase both A and B. Therefore, you have the following options:

1) Y_i=1 if individual i purchases "A", and 0 otherwise (include all individuals who purchase A and both products in the positive category).
2) Y_i=1 if individual i purchases "B", and 0 otherwise (include all individuals who purchase B and both products in the positive category).
3) Your model 2 above.

Now, as in any model that you specify, "correctness" depends on your objectives. All models are valid and cannot be said to be "incorrect". Do you want to predict the probability of an individual purchasing product A (B) without any regard to their alternative purchases? Go for 1 (2). If you believe that for the subsample of individuals who purchase either A or B, this is an exclusive choice (those who buy A will generally never buy B, and vice-versa), go for 3.
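As a rough illustration of the three codings above, here is a minimal Python sketch. The row layout `(person_id, y, x1)`, the sample values for P0002 and P0003, and the function names are assumptions made up for this example; only the P0001 rows come from the original post. Since every person in the data bought at least one product, a 0 under option 1 here simply means "bought only B".

```python
# Hypothetical long-format rows: (person_id, y, x1), where y = 1 means
# "bought A" and y = 0 means "bought B", as in the original post.
rows = [
    ("P0001", 1, 1000),
    ("P0001", 0, 3000),
    ("P0002", 1, 500),
    ("P0003", 0, 800),
]


def purchases_by_person(rows):
    """Map each person to the set of products ("A", "B") they bought."""
    bought = {}
    for person, y, _x1 in rows:
        bought.setdefault(person, set()).add("A" if y == 1 else "B")
    return bought


def code_outcomes(rows):
    """Return the outcome per person under options 1, 2 and 3."""
    bought = purchases_by_person(rows)
    # Option 1: 1 if the person bought A (including those who bought both).
    option1 = {p: int("A" in s) for p, s in bought.items()}
    # Option 2: 1 if the person bought B (including those who bought both).
    option2 = {p: int("B" in s) for p, s in bought.items()}
    # Option 3: drop people who bought both; 1 = only A, 0 = only B.
    option3 = {p: int("A" in s) for p, s in bought.items() if len(s) == 1}
    return option1, option2, option3


option1, option2, option3 = code_outcomes(rows)
```

Each dictionary has one entry per person, so the outcome can then be merged with the person-level characteristics (X2, X3, ...) before fitting a logit.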
Williams Ahouakan

Join Date: Mar 2015

Posts: 32
#3

13 Dec 2015, 18:46

You can also try a bivariate probit model if you believe that the choices of these 2 products are linked. It will also allow you to take into account the probability that some people choose both products, one of them, or none of them!

Best
Richard Williams

Join Date: Apr 2014

Posts: 5008
#4

13 Dec 2015, 19:30

How exactly are the data set up? Could there also be, say, cases that bought product A multiple times? Based on what you say, instead of having multiple lines I would have had 1 line where the variables were bought A and bought B each having possible values of 0/1. But if, say, these were multiple shopping trips, then it would be more like a panel data problem.
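The one-line-per-person layout described above could be built along these lines. This is only a sketch in Python: the row layout `(person_id, y, x1)`, the P0002 row, and the field names `bought_a`, `bought_b`, `spend_a`, `spend_b` are assumptions for illustration, not names from the actual dataset.

```python
# Hypothetical long-format rows: (person_id, y, x1), where y = 1 means
# "bought A" and y = 0 means "bought B".
rows = [
    ("P0001", 1, 1000),
    ("P0001", 0, 3000),
    ("P0002", 1, 500),
]


def to_wide(rows):
    """Collapse multiple lines per person into one record per person."""
    wide = {}
    for person, y, x1 in rows:
        record = wide.setdefault(
            person,
            {"bought_a": 0, "bought_b": 0, "spend_a": 0, "spend_b": 0},
        )
        suffix = "a" if y == 1 else "b"
        record["bought_" + suffix] = 1   # 0/1 indicator for the product
        record["spend_" + suffix] += x1  # total spent on that product
    return wide


wide = to_wide(rows)
```

With this layout, a person who bought both products is a single record with both indicators set to 1, rather than two lines that differ only in Y and X1.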

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Vinh Nguyen

Join Date: Dec 2015

Posts: 7
#5

14 Dec 2015, 10:41

Originally posted by Andrew Musau

A binary dependent variable indicates which of two mutually exclusive choices an individual has made: one (Y=1) or the other (Y=0). Your first specification, Y_i=1 if individual i purchases "A" and Y_i=0 if individual i purchases "B", does not describe an exclusive choice because, as you note, some individuals purchase both A and B. Therefore, you have the following options:

1) Y_i=1 if individual i purchases "A", and 0 otherwise (include all individuals who purchase A and both products in the positive category).
2) Y_i=1 if individual i purchases "B", and 0 otherwise (include all individuals who purchase B and both products in the positive category).
3) Your model 2 above.

Now, as in any model that you specify, "correctness" depends on your objectives. All models are valid and cannot be said to be "incorrect". Do you want to predict the probability of an individual purchasing product A (B) without any regard to their alternative purchases? Go for 1 (2). If you believe that for the subsample of individuals who purchase either A or B, this is an exclusive choice (those who buy A will generally never buy B, and vice-versa), go for 3.

Thank you for your quick reply. Your suggestion helps me a lot. However, as I mentioned, my objective is to determine the factors that influence whether a person chooses to buy product A or product B. In my opinion, if I follow your model 1 or 2, the estimated model can only explain why people "buy or don't buy A", which is different from "choose A over B".
What if I change my objective to determining the factors that influence whether a customer's favorite product is A or B? Would my model 3 (keeping the observation with the larger value of X1) then be suitable?
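To make the "favorite product" rule of model 3 concrete, here is a minimal Python sketch that keeps, for each person, only the line with the larger X1. The row layout `(person_id, y, x1)`, the rows for P0002 and P0003, and the function name are assumptions for illustration. How to handle ties is not specified in the thread; this sketch arbitrarily keeps the first line it sees.

```python
# Hypothetical long-format rows: (person_id, y, x1), where y = 1 means
# "bought A" and y = 0 means "bought B".
rows = [
    ("P0001", 1, 1000),
    ("P0001", 0, 3000),
    ("P0002", 1, 500),
    ("P0003", 0, 800),
]


def favorite_rows(rows):
    """Keep one row per person: the one with the largest x1.

    Assumption: on a tie, the first row encountered is kept.
    """
    best = {}
    for row in rows:
        person, _y, x1 = row
        if person not in best or x1 > best[person][2]:
            best[person] = row
    return list(best.values())


kept = favorite_rows(rows)
```

For P0001 this drops line 1 (Y = 1, X1 = 1000) and keeps line 2 (Y = 0, X1 = 3000), matching the example in the first post.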
Vinh Nguyen

Join Date: Dec 2015

Posts: 7
#6

14 Dec 2015, 10:49

Originally posted by Williams Ahouakan

You can also try a bivariate probit model if you believe that the choices of these 2 products are linked. It will also allow you to take into account the probability that some people choose both products, one of them, or none of them!

Best

This is a new approach for me to try. However, I think the result would differ from my objective, because I don't need to compare people who choose both products with people who choose only one of the two.
Vinh Nguyen

Join Date: Dec 2015

Posts: 7
#7

14 Dec 2015, 11:11

Originally posted by Richard Williams

How exactly are the data set up? Could there also be, say, cases that bought product A multiple times? Based on what you say, instead of having multiple lines I would have had 1 line where the variables were bought A and bought B each having possible values of 0/1. But if, say, these were multiple shopping trips, then it would be more like a panel data problem.

Thank you for your reply. As you mentioned, it would be more logical if the data were collected per shopping trip. Because a customer would buy only A or B on one shopping trip, the dataset would then be panel data. However, my dataset comes from a survey of customers in one country in 2012, and the variable X2 (the amount of money spent on that product in 2012) is the sum of money spent on product A (or B) over all shopping trips in 2012. Therefore, I think that in 2012 there were times when a person bought product A and other times when that person bought product B. This is why one person may have 2 lines in my dataset.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#8

14 Dec 2015, 11:39

The predictor variables of interest are the amount of money spent on that product per year (X1)

This confuses me. You also say

Line 1: Y = 1, X1 = 1000, same X2, X3, ...
Line 2: Y = 0, X1 = 3000, same X2, X3, ...

If Y = 0, how are you spending any money on the product?

Vinh Nguyen

Join Date: Dec 2015

Posts: 7
#9

15 Dec 2015, 17:50

Thank you for your reply. As you mentioned, it would be more logical if the data were collected per shopping trip. Because a customer would buy only A or B on one shopping trip, the dataset would then be panel data. However, my dataset comes from a survey of customers in one country in 2012, and the variable X2 (the amount of money spent on that product in 2012) is the sum of money spent on product A (or B) over all shopping trips in 2012. Therefore, I think that in 2012 there were times when a person bought product A and other times when that person bought product B. This is why one person may have 2 lines in my dataset.

Sorry, my mistake. The variable holding the amount of money spent on that product in 2012 should be X1, not X2.

Originally posted by Richard Williams

This confuses me. You also say

If Y = 0, how are you spending any money on the product?

Sorry for my bad English. My dataset covers only one year, 2012, so maybe using "per" here is not correct.
About the variables:
Y = 0/1: Y = 1 means a customer bought product A and Y = 0 means a customer bought product B
X1: the amount of money spent on that product in 2012
X2, X3, ...: characteristic variables
Therefore:
Line 1: Y = 1, X1 = 1000, same X2, X3, ... means that person P0001 bought product A in 2012 and spent 1000 on it in that year
Line 2: Y = 0, X1 = 3000, same X2, X3, ... means that person P0001 bought product B in 2012 and spent 3000 on it in that year