Hi all,
I am running fracreg in Stata on two subsamples defined by a dummy variable that indicates whether an observation is above or below the mean. Here the dummy is meanippr, based on the mean IP protection among the countries in which the companies in the sample operate.
I used the following Stata command:
by meanippr, sort: fracreg logit turnmar_1 breadth breadthsq depth depthsq breadthxdepth size_2 lnrrd_rat co empud gp fun enm
> gdpc a_dum b_dum d_dum e_dum f_dum g_dum h_dum i_dum j_dum k_dum l_dum m_dum n_dum p_dum q_dum r_dum s_dum i.country_id
where:
turnmar_1 is the innovation performance of the company
breadth is on a 0-11 scale
depth is on a 0-11 scale
depthsq is the square of depth
breadthsq is the square of breadth
breadthxdepth is the interaction effect of both
size_2 is a dummy variable (above or below a certain number of employees)
lnrrd_rat is the ln(r&d expenses)
co is a dummy indicating collaboration agreements by firms
empud: on a 0-6 scale this defines % intervals for the number of highly educated staff within the organization
gp: dummy that indicates whether the company is part of a group
fun: dummy that indicates recent financial support by the government
enm: dummy that indicates recent mergers or acquisitions by the company
gdpc: the GDP per capita of the country in which the firm is headquartered (ranging from approximately 30000 to 65000)
After issuing the command, Stata reports the following:
Iteration 0: log pseudolikelihood = -2306.4939 (not concave)
....
and keeps on going.
I tried several tips, such as rescaling gdpc (since it is on a much larger scale than the other variables), looking for unreasonable coefficients and standard errors, and fitting models that exclude certain variables.
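For concreteness, the rescaling and a couple of quick checks on gdpc could be sketched as below. This is only an illustration, not my exact code: the names gdpc_k, gdpc_std and sd_gdpc are hypothetical, and the within-country check assumes that gdpc may be constant within each country_id, which I have not verified.

```stata
* Rescale gdpc to thousands and build a standardized variant
* (gdpc_k and gdpc_std are hypothetical names used for illustration)
generate double gdpc_k = gdpc / 1000
egen double gdpc_std = std(gdpc)

* Compare the distribution of gdpc in the two subsamples
bysort meanippr: summarize gdpc gdpc_k gdpc_std

* See which countries fall into each subsample
tabulate country_id meanippr

* Check whether gdpc varies within country; a standard deviation of 0
* everywhere would mean gdpc is a linear combination of the
* i.country_id dummies (assumption to be checked, not a known fact)
bysort country_id: egen double sd_gdpc = sd(gdpc)
summarize sd_gdpc
```

The by-group command can then be rerun with gdpc_k or gdpc_std in place of gdpc to compare the iteration logs.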
Note that the model works perfectly in the following cases:
Not doing the regression by subset, i.e.,
fracreg logit turnmar_1 breadth breadthsq depth depthsq breadthxdepth size_2 lnrrd_rat co empud gp fun enm gdpc a_dum b_dum d_dum e_dum f_dum g_dum h_dum i_dum j_dum k_dum l_dum m_dum n_dum p_dum q_dum r_dum s_dum i.country_id
Or doing the subsetting, but excluding gdpc (or its rescaled or standardized variant), i.e.,
by meanippr, sort: fracreg logit turnmar_1 breadth breadthsq depth depthsq breadthxdepth size_2 lnrrd_rat co empud gp fun enm a_dum b_dum d_dum e_dum f_dum g_dum h_dum i_dum j_dum k_dum l_dum m_dum n_dum p_dum q_dum r_dum s_dum i.country_id
So the combination of splitting the sample by meanippr and including gdpc seems troublesome.
What might cause this problem, and how can I solve it?
Rgrds,
Maarten