Include panel data xtset in bivariate probit

Matthias Enichlmayr

Join Date: Jun 2014

Posts: 31
#1

28 Jul 2014, 06:56

Dear all,

I have already run a static bivariate probit model with coverage and claims as the two dependent variables, regressed on a number of observables such as sex, type_of_car, age, age_of_the_car etc.
It looks like this: biprobit coverage claims sex type_of_car age age_of_the_car ... for the year 2011

Since my data set covers the years 2005 - 2011, I want to use this information in a panel data regression.
I want to focus on data from 2008 - 2011. Thus, I dropped all observations before 2008 and then I used the command "xtset id_of_insuree year" to tell Stata that I want to make use of the panel data option.
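The preparation steps described above can be sketched as follows. This is only a minimal sketch, assuming the variables are named year and id_of_insuree as in the post. Note that xtset merely declares the panel structure for the xt commands; biprobit itself does not use it.

Code:

```stata
* Keep only the 2008-2011 waves (equivalent to dropping the earlier years)
keep if inrange(year, 2008, 2011)

* Declare the panel structure: insuree is the panel unit, year the time variable
xtset id_of_insuree year

* Inspect how many years each insuree is observed
xtdescribe
```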
But how do I implement this now in my bivariate probit model?

Do I just repeat the same command as before or do I have to include xtset in the analysis?

Thanks for your help

Cheers
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#2

28 Jul 2014, 11:14

Hi,

first note that it is customary here to use full names, not user names, and it would be very much appreciated if you would click on "Contact us" at the bottom right of the page and ask for your username to be changed to your full name.

I'm not aware of an official Stata command that does panel bivariate probit like you want (someone please correct me if I'm wrong here). Having said that, running the bivariate probit as you suggest on the 2008 - 2011 data is basically a pooled bivariate probit that doesn't account for any grouping. This is fine if there are no insuree-specific effects.

If you want to run a bivariate probit with random effects at the id_of_insuree level, you can use the user-written cmp command (available from SSC). The help file contains an example of a bivariate probit, as well as examples of how to account for random effects, so you can work out from the help how to go about it.
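A rough sketch of what the random-effects call could look like with cmp. The regressor list is abbreviated and the variable names are taken from the first post, so treat both as assumptions; the cmp help file remains the authority on the syntax and options.

Code:

```stata
* One-time installation of the user-written commands from SSC
* (cmp relies on ghk2)
ssc install cmp
ssc install ghk2

* Define the $cmp_* macros used in indicators()
cmp setup

* Bivariate probit with a random intercept per insuree in each equation
cmp (coverage = sex age age_of_the_car || id_of_insuree:) ///
    (claims   = sex age age_of_the_car || id_of_insuree:), ///
    indicators($cmp_probit $cmp_probit)
```

Models of this kind are estimated by simulation or quadrature, so they can take a long time to run on a large panel.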

For a fixed-effects estimation, I don't think that demeaning the variables by group (insuree in your case) works when the dependent variables are binary. The only suggestion that comes to mind is to create dummy (binary) variables for id_of_insuree and run the model with them included. It may not converge easily, depending on the number of insurees in your data, i.e. the number of dummies you would have to include. You can do this either with the regular bivariate probit estimation you already ran or with cmp. Both would work.

As for a between estimation of the model I do not know what to suggest.

I hope this helps,

Alfonso.

Alfonso Sanchez-Penalver
Matthias Enichlmayr

Join Date: Jun 2014

Posts: 31
#3

28 Jul 2014, 15:28

Hi Alfonso,

thanks for your reply.
I am not quite sure that I understood you correctly.
Enclosed you will find an excerpt of the paper by Chiappori and Salanié (2000).
One of their proposed models is "A pair of probits". With panel data I can easily regress the two separate probits on page 2.
The next step would be to see if the error terms are correlated. Shouldn't it be possible with the predicted value option?
Or do you have a different idea how this might work? The theory is in the attachment; I simply can't figure out how to implement the steps with my data.

Really appreciate your help

Matthias

Matthias Enichlmayr
Attached Files

A pair of probits.pdf (76.0 KB, 1 view)
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#4

28 Jul 2014, 16:24

Matthias, I have told you how to do the bivariate probit estimation with three of the four methods of estimation using panel data: pooled, fixed effects, and random effects. When you say you want to estimate the bivariate probit with panel data, my question now is which of these estimation methods you think is the most appropriate to use with your data, and why. I personally would run all three and compare the coefficients.

The extract you include does not deal with panel data, simply with the decision between estimating two individual probits and a bivariate one, which as you mention comes down to whether the correlation of the errors across equations is different from zero. Any bivariate probit estimation, regardless of which of the three methods you choose, will include an estimate of the correlation of the errors in the two equations. Thus the test can be read directly from the results.
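As an illustration for the pooled case, a sketch of how the correlation and its test could be retrieved after biprobit. The regressor list is abbreviated and the variable names come from the first post, so both are assumptions.

Code:

```stata
* Pooled bivariate probit on the estimation sample
biprobit coverage claims sex age age_of_the_car

* Estimated correlation of the errors across the two equations
display "rho = " e(rho)

* Test of rho = 0, also reported at the foot of the biprobit output
display "chi2(1) = " e(chi2_c) ",  p = " chi2tail(1, e(chi2_c))
```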

Alfonso Sanchez-Penalver
Matthias Enichlmayr

Join Date: Jun 2014

Posts: 31
#5

29 Jul 2014, 03:43

Okay, I think I will start with the fixed effects method. Since I have almost 7k insurees, do I have to include a dummy for each individual insuree? If so, I don't know how this should work.
You are right, the extract I attached does not include panel data. But shouldn't it work nevertheless?
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#6

29 Jul 2014, 09:10

The easiest way to generate the dummies is with the tabulate command:

Code:

tabulate id_of_insuree, generate(insuree)

This will generate a series of dummy variables named insuree#, where # is a consecutive number running from 1 to the number of distinct insurees you have (say 50 if you were to have 50 different insurees). Then you can do your estimation using insuree1-insuree49 as explanatory (independent) variables (you need to leave one out to avoid collinearity with the constant term).
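Put together, the dummy-variable estimation could look like the sketch below. The regressor list is abbreviated and the variable names are taken from the first post, so treat them as assumptions. Time-invariant regressors such as sex are left out here because they are collinear with the insuree dummies.

Code:

```stata
* Create one indicator per insuree: insuree1, insuree2, ...
tabulate id_of_insuree, generate(insuree)

* Option 1: pass the whole set; collinear indicators are omitted automatically
biprobit coverage claims age age_of_the_car insuree*

* Option 2: factor-variable notation, with no need to create the dummies first
biprobit coverage claims age age_of_the_car i.id_of_insuree
```

With only a few years per insuree, the coefficients from a probit with one dummy per individual are subject to the incidental parameters problem, so the results are best compared against the pooled and random-effects estimates.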

Let me know if this helps.

Alfonso Sanchez-Penalver
Matthias Enichlmayr

Join Date: Jun 2014

Posts: 31
#7

29 Jul 2014, 10:32

Your command works.

Now I have 141 dummies for 141 individual policyholders.
So if I stick to the bivariate probit model, I use the following command:

1) biprobit type_of_coverage accidents_per_year sex type_of_car age_of_polidyholder age_of_the_car (...) insuree1 insuree2 .... insuree140

Is it possible to estimate panel data via biprobit, or is this not possible since I am not using xtprobit or xtlogit in this first specification?

Or alternatively, as you suggested in my other post, I use the xtlogit command

2) xtlogit type_of_coverage accidents_per_year sex type_of_car age_of_polidyholder age_of_the_car (...) insuree1 insuree2 .... insuree140

I assume that I have to use the "xtset id_of_insuree year" command before the xtlogit regression.
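For reference, a sketch of the single-equation fixed-effects logit, assuming type_of_coverage is binary and using the variable names from this post. With the fe option, xtlogit fits a conditional fixed-effects logit, so the insuree dummies do not need to be included.

Code:

```stata
* Declare the panel before any xt command
xtset id_of_insuree year

* Conditional fixed-effects logit: no insuree dummies are needed
xtlogit type_of_coverage accidents_per_year age_of_the_car, fe
```

Insurees whose outcome never changes over the period drop out of this estimation, as do regressors that are constant within insuree, such as sex.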

Always keep in mind: I want to estimate the effect increased insurance coverage has on the probability of having an accident.
The idea is to empirically estimate the effects of moral hazard and adverse selection on the German automobile insurance market.
Alfonso Sánchez-Peñalver

Join Date: Mar 2014

Posts: 432
#8

29 Jul 2014, 10:59

Matthias, I've just responded in the other post. Read what I suggested there and try it. I hope it will clear up what to do somewhat, and then we can tackle the specific estimations. For convenience, when including the new dummies as explanatory variables you can write insuree1-insuree140 instead of typing them all in, or insuree* if no other variable that you want to exclude starts with insuree. The latter would include all 141 dummies, but Stata is clever enough to drop one of them when you do that.

Alfonso Sanchez-Penalver
