Collecting marginal effects after probit estimation with a series of dummy variables

Andy Hallot

Join Date: Jan 2019
Posts: 2


11 Feb 2019, 15:47

Hi Statalist,

I'm attempting to estimate a probit model of the likelihood that a patent is accepted (based on only a small handful of firm/patent characteristics). Presenting the probit results from this regression is relatively straightforward, but because most of my variables are dummy variables, I'm having a few problems. Here is a sample of my data:

Sealed	logavgmktcap	patents_on_issue	ipcA	ipcB	ipcC	ipcD	ipcE	ipcF	ipcG	ipcH
1	23.73145	4	0	0	0	0	0	0	1	0
0	26.00386	6	0	0	1	0	0	0	0	0
1	26.2179	1	0	0	0	0	0	1	0	0
1	24.43442	10	0	0	1	0	0	0	0	0
1	26.354576	1	0	0	1	0	0	0	0	0
0	26.117344	8	0	0	1	0	0	0	0	0
0	25.99842	11	0	0	1	0	0	0	0	0
1	25.34848	4	0	0	1	0	0	0	0	0
1	26.00386	6	0	0	0	0	0	0	0	1
1	26.117344	8	0	0	0	0	0	1	0	0
1	25.68289	7	0	0	0	0	1	0	0	0
0	25.68289	7	0	0	1	0	0	0	0	0
0	25.99842	11	0	0	0	0	0	0	1	0
1	25.34848	4	0	0	1	0	0	0	0	0
1	25.99842	11	0	0	0	0	0	1	0	0
1	24.97608	6	0	0	1	0	0	0	0	0
1	24.97608	6	0	0	1	0	0	0	0	0

The dependent variable is an indicator that takes the value 1 if a patent is granted, and ipcA through ipcH are dummy variables identifying the patent's technology field (physics, materials, etc.). Naturally, I had to drop one of the IPC class variables to avoid perfect collinearity in the probit regression.

How would I go about estimating the (conditional) probability for each unique combination of the above variables and saving it as a new variable to include in a separate regression? The complication is that the dropped dummy does not appear in the list of covariates during post-estimation.

Thanks,

Andy

Last edited by Andy Hallot; 11 Feb 2019, 15:50.

Tags: average effects, conditional probability, Marginal Effects, probit

Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#2

11 Feb 2019, 17:10

So I imagine you spent a fair amount of time hand-coding those ipc* variables to use as indicators ("dummies") in your probit regression. That just made things harder for you. The code below takes those variables and synthesizes them into a single variable, called ipc, that takes on 8 levels (1 through 8) corresponding to which of ipca-ipch is 1. (You may well have started out with such a variable in the first place, in which case just bring it back; don't bother re-calculating.) Then we let Stata's factor-variable notation do the work for you:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte sealed float logavgmktcap byte(patents_on_issue ipca ipcb ipcc ipcd ipce ipcf ipcg ipch)
1 23.73145   4 0 0 0 0 0 0 1 0
0 26.00386   6 0 0 1 0 0 0 0 0
1 26.2179    1 0 0 0 0 0 1 0 0
1 24.43442  10 0 0 1 0 0 0 0 0
1 26.354576  1 0 0 1 0 0 0 0 0
0 26.117344  8 0 0 1 0 0 0 0 0
0 25.99842  11 0 0 1 0 0 0 0 0
1 25.34848   4 0 0 1 0 0 0 0 0
1 26.00386   6 0 0 0 0 0 0 0 1
1 26.117344  8 0 0 0 0 0 1 0 0
1 25.68289   7 0 0 0 0 1 0 0 0
0 25.68289   7 0 0 1 0 0 0 0 0
0 25.99842  11 0 0 0 0 0 0 1 0
1 25.34848   4 0 0 1 0 0 0 0 0
1 25.99842  11 0 0 0 0 0 1 0 0
1 24.97608   6 0 0 1 0 0 0 0 0
1 24.97608   6 0 0 1 0 0 0 0 0
end

// VERIFY THAT IPCA-IPCH ARE MUTUALLY EXCLUSIVE AND EXHAUSTIVE
egen checksum = rowtotal(ipc*)
assert checksum == 1
drop checksum
foreach v of varlist ipc* {
    assert inlist(`v', 0, 1)
}

// CONVERT IPC TO A MULTI-LEVEL CATEGORY VARIABLE
ds ipc*
local ipcs `r(varlist)'
local nvars: word count `ipcs'
capture label drop ipc
gen ipc = .
forvalues i = 1/`nvars' {
    local varname: word `i' of `ipcs'
    replace ipc = `i' if `varname' == 1
    label define ipc `i' "`varname'", add
}
label values ipc ipc

probit sealed logavgmktcap i.ipc
margins ipc

The output of the -margins- command gives you the predicted probability of sealed in each category of ipc (i.e., corresponding to each of the variables ipca-ipch being 1), adjusted for logavgmktcap by averaging the predictions over its observed values. -margins- is a very powerful and complicated command, but Richard Williams' excellent handout gives a really lucid introduction to it: https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.
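The original question also asked how to save the predicted probabilities as a new variable for use in a separate regression. A minimal sketch, assuming the ipc variable has been built as in the code above (here it is typed in directly from the example data, coded 1 = ipca through 8 = ipch, so the block runs on its own): -predict- with the -pr- option stores each observation's fitted probability, and since observations sharing the same covariate values get the same prediction, tagging one row per distinct (logavgmktcap, ipc) combination lists the probability for each unique combination. The names phat and first_combo are hypothetical. On this tiny example, expect -probit- to drop the ipc categories that predict success perfectly, so phat will be missing for those observations.

```stata
* Sketch: save predicted Pr(sealed = 1) as a new variable.
* Assumption: ipc already built (1 = ipca ... 8 = ipch); typed in here
* from the 17 example observations so the block is self-contained.
clear
input byte sealed float logavgmktcap byte ipc
1 23.73145  7
0 26.00386  3
1 26.2179   6
1 24.43442  3
1 26.354576 3
0 26.117344 3
0 25.99842  3
1 25.34848  3
1 26.00386  8
1 26.117344 6
1 25.68289  5
0 25.68289  3
0 25.99842  7
1 25.34848  3
1 25.99842  6
1 24.97608  3
1 24.97608  3
end
label define ipc 1 "ipca" 2 "ipcb" 3 "ipcc" 4 "ipcd" 5 "ipce" 6 "ipcf" 7 "ipcg" 8 "ipch"
label values ipc ipc

probit sealed logavgmktcap i.ipc

// phat (hypothetical name): fitted probability for each estimation-sample obs
predict phat if e(sample), pr

// Tag one observation per distinct covariate combination and list them
egen first_combo = tag(logavgmktcap ipc)
list ipc logavgmktcap phat if first_combo & !missing(phat), sepby(ipc) noobs
```

phat can then enter the separate regression like any other variable, e.g. -regress y phat- for some outcome y (hypothetical here).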

In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Added: This code will not produce satisfying results in the example data set because the example is too limited. For instance, it contains no observations with ipca, ipcb, or ipcd equal to 1. Consequently those "variables" are constant, so they get omitted. But assuming that your full data covers the spectrum, you won't encounter that problem. You will still have one level of ipc omitted from the probit output (probably ipca), but that is normal and necessary, and, as you can see from the -margins- output, it does not in any way prevent you from getting predictions for ipca.
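Since the thread is about marginal effects, here is a further sketch under the same assumptions (ipc typed in from the example data). For a factor variable, -margins, dydx(ipc)- reports the discrete change in Pr(sealed = 1) for each ipc category relative to the base category. Refitting with a different base through the ib#. prefix (ib7, i.e. ipcg, is an arbitrary choice for illustration) should leave the predictive margins from -margins ipc- unchanged, which illustrates why the omitted level is no obstacle. The matrix names are hypothetical.

```stata
* Sketch: marginal effects for ipc, and a check that the base category
* does not change the predictive margins. Data typed in as before.
clear
input byte sealed float logavgmktcap byte ipc
1 23.73145  7
0 26.00386  3
1 26.2179   6
1 24.43442  3
1 26.354576 3
0 26.117344 3
0 25.99842  3
1 25.34848  3
1 26.00386  8
1 26.117344 6
1 25.68289  5
0 25.68289  3
0 25.99842  7
1 25.34848  3
1 25.99842  6
1 24.97608  3
1 24.97608  3
end
label define ipc 1 "ipca" 2 "ipcb" 3 "ipcc" 4 "ipcd" 5 "ipce" 6 "ipcf" 7 "ipcg" 8 "ipch"
label values ipc ipc

// Default base: lowest ipc level present in the estimation sample
probit sealed logavgmktcap i.ipc
margins ipc
matrix pm_default = r(b)

// Discrete change in Pr(sealed = 1) for each category vs. the base
margins, dydx(ipc)

// Refit with ipcg (level 7) as the base category instead
probit sealed logavgmktcap ib7.ipc
margins ipc
matrix pm_base7 = r(b)

// Predictive margins should agree up to numerical precision
display mreldif(pm_default, pm_base7)
```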

Last edited by Clyde Schechter; 11 Feb 2019, 17:13.
Andy Hallot

Join Date: Jan 2019

Posts: 2
#3

11 Feb 2019, 18:22

Exactly what I needed Clyde. Much appreciated.

I did use -dataex- for that output, but thought it would read better in table form. I will post the raw -dataex- output in the future.

Thanks

Andy
