
  • Collecting marginal effects after probit estimation with a series of dummy variables

    Hi Statalist,

    I'm attempting to estimate a probit model on the likelihood of a patent being accepted (based on only a small handful of firm/patent characteristics). Presenting the probit results from this regression is relatively straightforward, but because the majority of my variables are dummy variables, I'm having a few problems. Here is a copy of my data.

    Sealed logavgmktcap patents_on_issue ipcA ipcB ipcC ipcD ipcE ipcF ipcG ipcH
    1 23.73145 4 0 0 0 0 0 0 1 0
    0 26.00386 6 0 0 1 0 0 0 0 0
    1 26.2179 1 0 0 0 0 0 1 0 0
    1 24.43442 10 0 0 1 0 0 0 0 0
    1 26.354576 1 0 0 1 0 0 0 0 0
    0 26.117344 8 0 0 1 0 0 0 0 0
    0 25.99842 11 0 0 1 0 0 0 0 0
    1 25.34848 4 0 0 1 0 0 0 0 0
    1 26.00386 6 0 0 0 0 0 0 0 1
    1 26.117344 8 0 0 0 0 0 1 0 0
    1 25.68289 7 0 0 0 0 1 0 0 0
    0 25.68289 7 0 0 1 0 0 0 0 0
    0 25.99842 11 0 0 0 0 0 0 1 0
    1 25.34848 4 0 0 1 0 0 0 0 0
    1 25.99842 11 0 0 0 0 0 1 0 0
    1 24.97608 6 0 0 1 0 0 0 0 0
    1 24.97608 6 0 0 1 0 0 0 0 0

    The dependent variable is an indicator that takes the value 1 if a patent is granted, and ipcA through ipcH are dummy variables identifying the patent's technology field (physics, materials, etc.). Naturally, I had to drop one of the IPC class variables to avoid perfect collinearity in the probit regression.

    How would I go about estimating the (conditional) probability for each unique combination of the above variables and saving the results as a new variable to include in a separate regression? One of the above dummies will not appear in the list of covariates during post-estimation.

    Thanks,

    Andy
    Last edited by Andy Hallot; 11 Feb 2019, 15:50.

  • #2
    So I imagine you spent a fair amount of time hand-coding those ipc* variables to use as indicators ("dummies") in your probit regression. That just made things harder for you. The code below takes those variables and synthesizes them into a single variable, called ipc, that takes on 8 levels (1 through 8), one for each of ipca-ipch being 1. (You may well have started out with such a variable in the first place, in which case just bring it back; don't bother recalculating it.) Then we let Stata's factor-variable notation do the work for you:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte sealed float logavgmktcap byte(patents_on_issue ipca ipcb ipcc ipcd ipce ipcf ipcg ipch)
    1  23.73145  4 0 0 0 0 0 0 1 0
    0  26.00386  6 0 0 1 0 0 0 0 0
    1   26.2179  1 0 0 0 0 0 1 0 0
    1  24.43442 10 0 0 1 0 0 0 0 0
    1 26.354576  1 0 0 1 0 0 0 0 0
    0 26.117344  8 0 0 1 0 0 0 0 0
    0  25.99842 11 0 0 1 0 0 0 0 0
    1  25.34848  4 0 0 1 0 0 0 0 0
    1  26.00386  6 0 0 0 0 0 0 0 1
    1 26.117344  8 0 0 0 0 0 1 0 0
    1  25.68289  7 0 0 0 0 1 0 0 0
    0  25.68289  7 0 0 1 0 0 0 0 0
    0  25.99842 11 0 0 0 0 0 0 1 0
    1  25.34848  4 0 0 1 0 0 0 0 0
    1  25.99842 11 0 0 0 0 0 1 0 0
    1  24.97608  6 0 0 1 0 0 0 0 0
    1  24.97608  6 0 0 1 0 0 0 0 0
    end
    
    
    //    VERIFY THAT IPCA-IPCH ARE MUTUALLY EXCLUSIVE AND EXHAUSTIVE
    egen checksum = rowtotal(ipc*)
    assert checksum == 1
    drop checksum
    foreach v of varlist ipc* {
        assert inlist(`v', 0, 1)
    }
    
    //    CONVERT IPC TO A MULTI-LEVEL CATEGORY VARIABLE
    ds ipc*
    local ipcs `r(varlist)'
    local nvars: word count `ipcs'
    capture label drop ipc
    gen ipc = .
    forvalues i = 1/`nvars' {
        local varname: word `i' of `ipcs'
        replace ipc = `i' if `varname' == 1
        label define ipc `i' "`varname'", add
    }
    label values ipc ipc
    
    probit sealed logavgmktcap i.ipc
    margins ipc

    The output of the -margins- command gives you the predicted probabilities of sealed, adjusted for logavgmktcap, in each category of ipc (i.e., the category in which the corresponding one of ipca-ipch equals 1). -margins- is a very powerful and complicated command, but you can get a really lucid introduction to it from the excellent Richard Williams at https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.
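
    The original question also asks about saving the probabilities as a new variable. One possible way to do that is -predict- after -probit-. The sketch below is only illustrative: it uses the lbw dataset that ships with Stata as a stand-in for the patent data (low, age, and race play the roles of sealed, logavgmktcap, and ipc), and phat is a hypothetical variable name.

    ```stata
    * Stand-in data shipped with Stata (needs internet access for -webuse-).
    webuse lbw, clear

    * Same structure as the model in this thread: a binary outcome, one
    * continuous covariate, and one multi-level factor variable.
    probit low age i.race

    * Store each observation's predicted probability in a new variable.
    * phat is a hypothetical name; pr is the default statistic after -probit-.
    predict double phat if e(sample), pr

    * Observations with identical covariate values get identical phat values.
    summarize phat
    ```

    With the patent data, the analogous calls would presumably be -probit sealed logavgmktcap i.ipc- followed by -predict-, after which the new variable can be used as a regressor elsewhere.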

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
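
    As a minimal sketch of what a -dataex- call looks like, the lines below use the auto dataset that ships with Stata; the variable list and the observation range are arbitrary choices made for illustration only.

    ```stata
    * Load a dataset that ships with Stata.
    sysuse auto, clear

    * Print a copy-and-paste-ready -input- block for three variables,
    * restricted to the first five observations.
    dataex make price mpg in 1/5
    ```

    The block that -dataex- prints between its [CODE] delimiters can be pasted directly into a post.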



    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Added: This code will not produce satisfying results in the example data set because the example is too limited. For instance, it contains no observations with ipca, ipcb, or ipcd = 1. Consequently those "variables" are constant, so they get omitted. But assuming that your full data covers the spectrum, you won't encounter that problem. You will still have one level of ipc omitted from the probit output (probably ipca), but that is normal and necessary and, as you can see from the -margins- output, it does not in any way prevent you from getting predictions for ipca.
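
    The point about the omitted level can be checked on data where every category is populated. The sketch below again uses Stata's lbw dataset as a stand-in (an assumption, not the patent data), and M is a hypothetical matrix name: race = 1 is the base level and has no coefficient of its own, yet -margins- reports a prediction for it alongside the other levels.

    ```stata
    * Stand-in data shipped with Stata (needs internet access for -webuse-).
    webuse lbw, clear

    * race has three levels; level 1 is the base and is shown with no estimate.
    probit low age i.race

    * Predictive margins are reported for all three levels, base included.
    margins race

    * Copy the margins into a matrix (M is a hypothetical name) for later use.
    matrix M = r(b)
    matrix list M
    ```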
    Last edited by Clyde Schechter; 11 Feb 2019, 17:13.



    • #3
      Exactly what I needed Clyde. Much appreciated.

      I did use -dataex- for that output, but thought it made more sense in table form. I will post raw output code in the future.

      Thanks

      Andy
