logit model with very large odds ratio

Alice Yang

Join Date: Mar 2022
Posts: 69

16 Aug 2022, 09:40

Hello!

I'm running a logit regression. The DV, action, is a dummy variable equal to 1 if a firm conducts a certain action and 0 otherwise. The IV, L.return, is a continuous variable for a firm's stock return, lagged to year t-1. I also have some control variables, some continuous and some dummies, all lagged to year t-1. I get a very large coefficient and odds ratio for L.return. I think this is probably because only 6.8% of the observations have action = 1 and the rest have action = 0, so the data are extremely unbalanced (?). How can I work around this problem? Thanks a lot for any help!

Code:

------------------------------------------------------------------------------------
                   |               Robust
             action|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
             return|
               L1. |   3.348689   1.099598     3.05   0.002     1.193517    5.503862
                   |

Code:

------------------------------------------------------------------------------------
                   |               Robust
             action| Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
             return|
               L1. |    28.4654    31.3005     3.05   0.002     3.298662    245.6387

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

16 Aug 2022, 09:56

Recall how to interpret the regression: a one-unit increase in an independent variable multiplies the odds of the firm doing whatever you modeled by the odds ratio, here about 28.5. Now, think about how your explanatory variable is scaled. I understand that stock returns are typically quoted in percent. Is your return variable coded in percentage points, or is it coded as a fraction, e.g. a return of 100 percentage points is coded as 1?

I think you can see where I’m going with this. If you coded it as 1 = a 100 percentage point return, then a one-unit change in the explanatory variable is a very large change, which is pretty rare. That’s one possible explanation.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

16 Aug 2022, 09:59

Tell us something about the values that return takes. Run

Code:

summarize return, detail

and copy the output into a new reply on this topic.

Alice Yang

Join Date: Mar 2022
Posts: 69

16 Aug 2022, 10:07

Hi Weiwen and William,

Thanks a lot for your quick reply. Here are the details for the return variable. If a firm's stock return is 1.2% then in my sample it is expressed as 0.012.

Code:

-------------------------------------------------------------
      Percentiles      Smallest
 1%      .025641              0
 5%     .0487013              0
10%     .0589971              0       Obs              27,900
25%     .0818505              0       Sum of Wgt.      27,900

50%     .1084337                      Mean           .1325732
                        Largest       Std. Dev.      .0777148
75%     .1666667       .4266667
90%     .2477876       .4266667       Variance       .0060396
95%     .2916667       .4266667       Skewness        1.32929
99%     .3870968       .4266667       Kurtosis       4.612329


William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

16 Aug 2022, 10:41

My post crossed with post #2 by Weiwen Ng, who correctly anticipated the results you showed us.

The estimated odds ratio tells you that if the return is 1, the odds of the action being taken will be about 28 times larger than they would be if the return is 0.

The problem is that the values of your return are small relative to 1 (in your data they range from 0 to about 0.43), so the effect inferred from the coefficient looks unrealistically large: returns are never going to differ by 1.

If you were to report return in percentage points, so that .012 becomes 1.2, then your coefficient estimate will be reduced to 0.03348689 (with the standard error and confidence limits divided by 100 as well) and your odds ratio will change from e^3.348689 = 28.47 to e^0.03348689 = 1.03.
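To make the rescaling concrete, here is a minimal sketch rather than the exact model from post #1. It assumes the data have already been declared as a panel (for example with xtset) so that the L. operator works. The name return_pct is made up for illustration, the control variables are left out, and vce(robust) is only a stand-in for whatever option produced the robust standard errors in the original output.

```stata
* Option 1: rescale return to percentage points and refit.
* return_pct is a hypothetical variable name; add the controls back in.
generate double return_pct = 100 * return
logit action L.return_pct, vce(robust) or

* Option 2: keep the original model and rescale the estimate afterwards.
logit action L.return, vce(robust)

* Odds ratio for a 0.01 (one percentage point) increase in lagged return.
display exp(0.01 * _b[L.return])

* The same quantity with a confidence interval.
lincom 0.01*_b[L.return], or
```

Both routes describe the same fitted model. Only the unit in which the change in return is expressed differs, so the z statistic and p-value for the return term should be unchanged.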

Alternatively, you can learn about the margins command and use it to present a more meaningful interpretation of your results.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

16 Aug 2022, 17:39

I don't know how margins works with lagged variables. But anyway, William's point about using margins is a good one. If you use margins, remember that after logit it reports on the probability scale by default, with probability running from 0 to 1. A difference between two such predicted probabilities is what epidemiologists call a risk difference.

The Stata forum's Richard Williams has a nice explainer here. The manual for margins is also good. Specific to your command (remember I don't know if the syntax works exactly with lagged variables), you might type something like:

Code:

margins, at(return = (0(0.01)0.4))
marginsplot

This means give me the probability of the event, setting the past-year returns at 0%, 1%, 2% ... 40%. And then plot them.
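Since it is unclear whether at() accepts the lag operator directly, one way to sidestep the question is to store the lag in an ordinary variable before fitting. This is only a sketch under assumptions: the data are already xtset, lag_return is a made-up name, the controls are omitted, and vce(robust) stands in for the option used in the original model.

```stata
* Store the lag as a plain variable so margins can refer to it by name.
* lag_return is a hypothetical variable name.
generate double lag_return = L.return

* Refit the model with the stored lag; add the controls back in.
logit action lag_return, vce(robust)

* Predicted probability of action at lagged returns of 0, 0.05, ..., 0.40.
margins, at(lag_return = (0(0.05)0.4))

* Plot those predicted probabilities with their confidence intervals.
marginsplot
```

The coarser 0.05 step just keeps the margins table short. Any step that stays inside the observed range of return (0 to about 0.43 in post #4) would do.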

Alice Yang

Join Date: Mar 2022

Posts: 69
#7

16 Aug 2022, 19:44

Hi William and Weiwen,

Thanks a lot for your suggestions! After changing return to percentage points the results look more reasonable. I will try margins as well.