Hi everyone,
I am studying group behaviour such as herding and the impact of these “biases” on the decision-making of individuals. My data set consists of 360 daily observations of investments made by individuals in firms (the same individual can invest in multiple firms, so the relationship is many-to-many rather than one-to-many). I extended the data set to include not only the realized investments (~24,000) but also unrealized ones, representing potential alternative investments that individuals decided not to pursue (~1,000,000).
My data looks as follows: I look at realized (tie = 1) and unrealized (tie = 0) investments made by investors (invest_id) into companies (name) that exhibit certain characteristics at t-1, such as the number of already committed investors (lag_inv_co~t). To restrict the sample size, I randomly selected 5 unrealized ties for each realized one (following prior literature).
As I want to study effects such as herding, which are best measured with lagged variables, I want to control as well as possible for any unobserved time-invariant heterogeneity across firms (“name”) and investors (“invest_id”).
Naturally, I thought of using two sets of fixed effects, at the firm and at the investor level. This is, however, very problematic in a logistic regression because of the incidental parameters problem: it would be fine for the “name” FE, as T > ~40, but for the investor FE, with T = 6, it very likely leads to biased coefficients.
A natural alternative would be to estimate a conditional logit, but there does not seem to be any practical or even theoretical implementation to date that handles two or more FE rather than one (in addition to the time FE). Additionally, I want to cluster the standard errors on both FE dimensions, which is also complicated to implement for a conditional logit (the clus_nway package exists, but does not work with clogit).
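To make the setup concrete, the two-way fixed-effects logit I have in mind can be written roughly as follows. The notation is only my own shorthand and an assumption for illustration: i indexes investors (invest_id), j indexes firms (name), t indexes days, and x_{j,t-1} collects the lagged firm characteristics.

```latex
\Pr\left(\mathit{tie}_{ijt} = 1 \mid x_{j,t-1}, \alpha_i, \gamma_j, \delta_t\right)
  = \Lambda\!\left(x_{j,t-1}'\beta + \alpha_i + \gamma_j + \delta_t\right),
\qquad
\Lambda(z) = \frac{e^{z}}{1 + e^{z}}
```

Here \alpha_i and \gamma_j are the investor and firm fixed effects and \delta_t the time effects. The incidental parameters concern comes from estimating each \alpha_i with only about 6 observations per investor.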
In a perfect world I would be looking for a command like the following:
clogit tie lag_raised_amount lag_inv_count lag_target i.day i.date, group(name invest_id) vce(cluster name invest_id)
-> clogit only allows for one fixed effect
or
clogit tie lag_raised_amount lag_inv_count lag_target i.day i.date i.name, group(invest_id) vce(cluster name invest_id)
-> clogit works, assuming the coefficient estimates are not biased because T > 40 for i.name; but clustered standard errors are only allowed for variables that also appear in group()
Ideally, this would also account for the rareness of the events (e.g., via relogit), since only about 2.4% of the observations in the whole data set are realized events.
Does anyone have an idea how I could solve this problem? Is there perhaps a completely different way to model this data that would still let me draw conclusions about how individuals' investment decisions are influenced by the number of committed investors at t-1 (i.e., herding)?
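As a rough check of the event shares, assuming the approximate counts given earlier (~24,000 realized and ~1,000,000 unrealized ties) and exactly 5 sampled unrealized ties per realized one:

```latex
\text{full data: } \frac{24{,}000}{24{,}000 + 1{,}000{,}000} \approx 0.023,
\qquad
\text{1:5 sample: } \frac{1}{1 + 5} \approx 0.167
```

With these rounded counts, the share of realized ties is a little over 2% in the full data set but about 17% in the 1:5 sample, so the sampled data are far less unbalanced than the full data.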
Any hints are appreciated!
Jan