Three dimensional Logit model

Chris Cha

Join Date: Jul 2016

Posts: 4
#1

10 Jul 2016, 13:28

Good afternoon,

As part of my dissertation, I need to run a Logit model in order to calculate the probability that an analyst will revise and change his forecast should news surface during the period between his initial forecast and the actual earnings announcement.

I have a dataset with information on each analyst for each firm, at each point in time.

More specifically, I have the variables:
Independent: Probability of Revision ( where the revision variable is a dummy that is 1 if a revision takes place, 0 otherwise)
Dependent: value of a news-related sentiment index that I created
the existence of news on that particular day
number of analysts making forecasts for the firm
age and market cap

I have very little knowledge of Stata and I was wondering how I could use it to test this.
One of the main problems, besides running the actual test, is merging the data. The dates are the same for all firms, but every other piece of information is different (even the revisions variable takes different values for each firm, and I need to somehow get one result on the probability of revisions for all firms).

Any input would be greatly appreciated and for any questions/clarifications I will try to answer as soon as possible.
Joseph Coveney

Join Date: Apr 2014

Posts: 4396
#2

10 Jul 2016, 18:14

Your description of your problem (and your data) isn't very clear. I guess that you have two questions. The first relates to merging two (or more?) datasets into one for analysis. The second appears to relate to "how . . . to test this", where "this" seems to refer to whether the probability of a financial analyst's changing a forecast is zero (null hypothesis) or greater than zero (alternative hypothesis) over a period of time. I'll answer that one for you now: it's greater than zero if the analyst is alive; the point null is true otherwise.

Regarding the first, it would be helpful if you list a representative sample of your datasets and an example of what you'd like to see after merging them so that others on the list can see exactly what it is that you're trying to accomplish.
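
Until such a listing is available, a generic merge might be sketched as follows. Everything here is an assumption made for illustration: two files keyed on firm and date, the variable names firm, date, rev, si and n, and the tempfile standing in for a saved dataset. If the data also vary by analyst, an analyst identifier would have to be part of the merge key.

```stata
* Illustrative sketch only: file layout and variable names are assumed.
* First dataset: revisions by firm and day
clear
input int firm long date byte rev
1 19950102 0
1 19950103 1
2 19950102 0
end
tempfile revisions
save `revisions'

* Second dataset: news indicator and sentiment index by firm and day
clear
input int firm long date double si byte n
1 19950102 0.27 1
1 19950103 0.45 1
2 19950102 0    0
end

* One row per firm-day in both files, so a 1:1 merge on that key
merge 1:1 firm date using `revisions'
tabulate _merge

* Keep only the firm-days found in both files
keep if _merge == 3
drop _merge
```

Inspecting _merge before dropping it shows how many rows failed to match, which is usually the first thing to check after a merge.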

Once that becomes clear, then others might also be able to help you with your second question, the answer to which I'm guessing will involve something like melogit with random effects for at least firm, and possibly for analyst, as well, cross-classified.
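
For orientation, the syntax for such a model might look like the sketch below. It assumes a dataset already in memory with one row per analyst-firm-day and the hypothetical variable names REV, N, SI, firm and analyst; none of these is confirmed at this point in the thread. The intmethod(laplace) option is a choice made here to keep the crossed model computationally manageable, not a recommendation from the thread.

```stata
* Assumed variables (hypothetical names): REV (0/1 revision), N (0/1 news),
* SI (sentiment index), firm and analyst (numeric identifiers).

* Random intercept for firm only
melogit REV i.N c.SI || firm:

* Crossed random intercepts for firm and analyst;
* the Laplace approximation is an assumption made here for speed
melogit REV i.N c.SI || _all: R.firm || analyst:, intmethod(laplace)
```

With many firms and millions of observations, the crossed specification can be slow to fit.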

P.S. In your dissertation, I recommend describing forecast revision as the dependent (or outcome or response) variable, and the others as the independent (explanatory or predictor) variables. Your committee will be less confused that way.

Last edited by Joseph Coveney; 10 Jul 2016, 18:16.
Chris Cha

Join Date: Jul 2016

Posts: 4
#3

11 Jul 2016, 08:39

I made some mistakes when I described the problem; I apologise.

The revisions variable is indeed the DEPENDENT variable
and the rest are the independent variables.

My Excel file, after I'm done cleaning up the data, will look something like this:

(EXAMPLE)

FIRM NAME   DATE       Revisions (REV)   Value of News Sentiment Index (SI)   News (N)
1           19950102   0                  0.27                                1
1           19950103   1                  0.45                                1
2           19950102   0                  0                                   0
3           19970711   0                  0.12                                1
3           19970712   1                 -0.35                                1

My model looks something like: Prob(REVijt) = a0 + d0 * N + b1 * N * SI + {Controls}, where i = firm, t = time
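
Written out as a logit, the model would presumably take the form below. This is only a sketch: the logistic link is implied by the word "Logit" rather than stated, the index j is not defined in the post, and N and SI are left unindexed exactly as in the post.

```latex
\Pr(REV_{ijt} = 1) = \frac{1}{1 + \exp\left(-\eta_{ijt}\right)},
\qquad
\eta_{ijt} = a_0 + d_0\, N + b_1\, N \cdot SI + \{\text{Controls}\}
```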

I do not know how to run the Logit model in Stata and whether this format is appropriate/readable by the programme in order to give me the results I need.

What I seek is to figure out whether the existence of news, and the sentiment value that results from it, affects the probability that an analyst would consider revising a forecast he has already made.
Joseph Coveney

Join Date: Apr 2014

Posts: 4396
#4

11 Jul 2016, 21:35

Your description leaves a lot of unanswered questions whose answers could be pertinent. For example:

1. Do you plan to fit a separate logistic regression model to data for each of your analysts? (I assume that analyst is the j in Prob(REVijt).) I ask, because it looks from your example data listing that you have a separate Excel worksheet for each analyst (there is no column for analyst).

2. If there is no news and no forecast revision, then how do you assign a date for the nonevent, as in the sole row of data for Firm 2? Is it the date of "actual earnings announcement"?

3. You show time as a date-like long integer, but wouldn't it be more relevant to render time as elapsed time (from the analyst's original forecast)?

4. There is no time variable in your proposed model. Wouldn't elapsed time, itself, be pertinent in predicting the occurrence of a forecast revision? (See my comment above about analysts alive and otherwise.) Is elapsed time among the "{Controls}"? If so, wouldn't you want a term for interaction of elapsed time and news or sentiment index in order to capture any acceleration of the temporal trajectory by the event of news and its sentiment?

5. Your proposed model has a term for news and a term for news × sentiment index interaction. But there is no term for sentiment index, one of the constituents of the interaction term. Is that intentional?

6. I assume that there are news items with a sentiment index of exactly zero, so that the news indicator variable is not redundant.

7. Have you considered modeling forecast revision with time-to-event estimation commands (help st)?

8. In your example, you show at most a single forecast revision for each firm. Is that correct? (You can still use survival analysis even with multiple forecast revisions.)
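
To make points 7 and 8 concrete, a time-to-event setup might be sketched like this. The data are invented purely for illustration, and the layout is an assumption: one row per analyst-firm forecast spell, with days from the original forecast to the first revision (or to the earnings announcement if no revision occurred). The names spell, days, revised and anynews are hypothetical.

```stata
* Made-up illustrative data, not from the thread.
* days    = days from original forecast to revision or to announcement
* revised = 1 if the spell ended in a revision, 0 if censored
* anynews = 1 if any news appeared during the spell
clear
input int spell int days byte revised byte anynews
1 12 1 1
2 30 0 0
3  5 1 1
4 21 0 1
5  9 1 0
end

* Declare the data as survival-time data
stset days, failure(revised)

* Summarize time at risk and list the survivor function by news status
stsum
sts list, by(anynews)

* Cox proportional hazards model for the hazard of a revision
stcox i.anynews
```

Multiple revisions per spell, or covariates that change from day to day, would need a multiple-record layout with the id() option of stset rather than this single-record form.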

As another postscript, you seem to have decided in the interim to do your data wrangling in Excel workbooks. Be careful of the pitfalls in going that route. You might want to reconsider using Stata for your data management as well as for fitting the model.
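
As a rough idea of what the data management could look like in Stata, here is a sketch built on the five example rows from the listing above. The variable names are assumed, and elapsed is computed from each firm's first row only because the date of the analyst's original forecast is not in the example.

```stata
* The five example rows from the listing; variable names are assumed
clear
input int FIRM_NAME long DATE byte REV double SI byte N
1 19950102 0  0.27 1
1 19950103 1  0.45 1
2 19950102 0  0    0
3 19970711 0  0.12 1
3 19970712 1 -0.35 1
end

* Turn the yyyymmdd integer into a Stata daily date
generate day = date(string(DATE, "%8.0f"), "YMD")
format day %td

* Days since the firm's first row, a stand-in for elapsed time
* since the original forecast (which is not in the example data)
bysort FIRM_NAME (day): generate elapsed = day - day[1]

* Consistency check: the sentiment index should be zero on days without news
assert SI == 0 if N == 0

list, sepby(FIRM_NAME)
```

Keeping steps like these in a do-file leaves a record of every transformation, which is harder to get from manual edits in a spreadsheet.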
Chris Cha

Join Date: Jul 2016

Posts: 4
#5

15 Jul 2016, 11:07

Thank you very much for your questions.

1) I might run a separate logistic regression for each analyst, I have not decided yet.

2) That date was just an example. In my sample there are a lot of dates on which "nothing happens" with regard to these two variables, but they contain information about the control variables.

3,4) The elapsed time is not relevant yet. It might indeed be one of the Control Variables.

5,6) The News x SI interaction might actually be redundant, considering the fact that SI has a value only on days with news, so it will just be b1 * SI.

7) I am not sure what you mean, sorry.

8) The Revisions shown is not the number of revisions, but instead: 1: existence of revisions, 0: no revisions at all

What are the problems with data manipulation in Excel? I am actually also using R for the more complicated variables, considering this is a very large sample (over 3 million observations).
Joseph Coveney

Join Date: Apr 2014

Posts: 4396
#6

15 Jul 2016, 18:49

You could consider something like the following for a start.

Code:

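* Assumes the sheet's header row holds the variable names FIRM_NAME, REV, N and SI
import excel Data.xlsx, sheet(EXAMPLE) firstrow
* Conditional fixed-effects logit, grouped by firm
xtlogit REV i.N c.SI, i(FIRM_NAME) fe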
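
Two variations on that command may be worth knowing about. They are a sketch under the same assumptions (data already imported, variables named FIRM_NAME, REV, N and SI). The conditional fixed-effects estimator drops any firm whose REV never changes, as with Firm 2 in the example listing, whereas a random-effects fit keeps such firms.

```stata
* Same fixed-effects model, reported as odds ratios
xtlogit REV i.N c.SI, i(FIRM_NAME) fe or

* Random-effects alternative, which retains firms with no variation in REV
xtlogit REV i.N c.SI, i(FIRM_NAME) re
```

Which of the two is appropriate depends on whether the firm effects can plausibly be treated as uncorrelated with news and sentiment, something the thread does not settle.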
Chris Cha

Join Date: Jul 2016

Posts: 4
#7

18 Jul 2016, 12:49

Thank you! I will try it out as soon as I'm done with data manipulation.