I am conducting an event study and have to show abnormal returns of 100 different companies in a graph. Companies are identified by the variable firmid. The firms have to be divided into quartiles, and the event takes place at t=0. The estimation plus event window (eventtime) runs from t=-79 to t=10. The quartiles are based on earnings surprise (ES). After using the command xtile eq_4 = ES, nq(4), we divide the firms into four quartiles and then create a new variable that shows, per firm (firmid), whether the earnings surprise is in the first, second, third, or fourth quartile.

The problem is that ES only has a value at t=0, while the total eventtime runs from -79 to 10 per company. I need a command that, for each company, fills eventtime -79 to 10 with the same value as at t=0, i.e. fills the missing quartile number per firmid for all event times. For more details, see the attachments. I hope someone can help me out.
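A minimal sketch of one way to do this, assuming the variable names above and that eq_4 is non-missing only at eventtime == 0: spread the t=0 quartile value across all event times within each firm.

```stata
* eq_4 is non-missing only at eventtime == 0 (since ES is missing elsewhere);
* sorting within firmid puts that one non-missing value first,
* so copying eq_4[1] fills every observation of the same firm
bysort firmid (eq_4): replace eq_4 = eq_4[1]
```

An equivalent, more explicit alternative is egen's max() with a cond() that picks out the t=0 value: egen eq4_all = max(cond(eventtime == 0, eq_4, .)), by(firmid).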

Kind regards,

Johan


1) Controlling for gender, regress years of education (yedu) and age on addwor. Note: To interpret the intercept in a meaningful manner, you need to center metric independent variables.

2) Which has the stronger effect on addwor, controlling for the other variables: years of education or age?

Here you can find the dataset I use:

https://workupload.com/file/eeSAb4SK

And I could already achieve the following results:

__________________________________________________ ___

//Part 1//

use data1

//Part 2//

egen addwor = anycount(wor*), values(1)

tab addwor, mis

//Part 3//

egen addwor_m = rowmiss(wor*)

tabulate addwor if addwor_m == 0

//Part4//

mean addwor //The mean for the Addwor variable is 3.356866//

//Part5//

recode addwor (0/4 = 1) (5/8 = 2) (9/12 = 3), gen(addwor_r)

tab addwor_r

//Part 6//

tab addwor sex

*The highest value on overall worries corresponds to female respondents; they are as such more worried about the future*

//Part 7//

reg addwor sex

replace addwor = 0 if sex == 1

replace addwor = 1 if sex == 2

reg addwor sex

__________________________________________________ __
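For part 1, centering the metric predictors could look like this (a sketch; the variable names yedu, age, sex, and addwor are taken from the post):

```stata
* center the metric independent variables at their means
summarize yedu, meanonly
generate yedu_c = yedu - r(mean)
summarize age, meanonly
generate age_c = age - r(mean)

* regress addwor on the centered predictors, controlling for gender
regress addwor yedu_c age_c i.sex
```

With centered predictors, the intercept is the expected addwor for a respondent of average education and average age in the base gender category, which is what makes it interpretable.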

Could someone help me answer the questions?

Thank you very much!


Old 1 | Old 2 | New
FE    | 32    | FE32
ML    | 21    | ML21
XC    | ub    | XCub
12    | AP    | 12AP

Thank you!!
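Assuming the table shows two string variables (call them old1 and old2, placeholder names) to be combined into a third, concatenation is just the + operator:

```stata
* concatenate two string variables into a new one
generate new = old1 + old2
```

If one of the variables is numeric rather than string, convert it first, e.g. generate new = old1 + string(old2).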


I have the following variables in string format:

x1      | x2
aaa [1] | ddd [xy]
bbb [2] | eee [1]
ccc [3] | fff [2]
bbb [2] | eee [1]
aaa [1] | ...

How can I remove [...] (and the space) from each observation?
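One way (a sketch, assuming Stata 14 or newer for the Unicode regex functions) is to strip the space and the bracketed part with ustrregexra():

```stata
* remove a trailing " [...]" (the space plus the bracketed text) from each value
foreach v of varlist x1 x2 {
    replace `v' = ustrregexra(`v', " \[[^\]]*\]", "")
}
```

Note that the older regexr() function replaces only the first match per string, while ustrregexra() replaces all of them.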

Thanks

For my master's thesis I am conducting research about the effect of tax exemptions and subsidies on car purchase on the adoption rate of electric vehicles. For this I have formed a panel dataset containing data of 15 EU countries for 8 years.

My issue at the moment is how to test for unit roots in the data. I have used the Im-Pesaran-Shin test and found that the null hypothesis can be rejected, so at least one panel is stationary. However, according to my supervisor, all my panels have to be stationary, and I do not know how to test this. The IPS test seems the better choice for me because it allows the autoregressive parameter to vary between panels. I have also considered the Breitung test, but because I use data from several countries, with the cultural and institutional differences that brings, I do not think the Breitung test is appropriate.

Is there a way to be sure about the (non-)stationarity by using the IPS test together with some other test? Or is there another test that is appropriate for a small panel dataset with macro- and micro-economic variables in it?
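For reference, the IPS test and a Fisher-type test can be run side by side as a cross-check; a sketch with placeholder names country, year, and y:

```stata
* set the panel structure (country identifier and year assumed)
xtset country year

* Im-Pesaran-Shin: H0 = all panels contain unit roots
xtunitroot ips y, trend

* Fisher-type test (ADF-based) as a cross-check, with 1 lag
xtunitroot fisher y, dfuller lags(1)
```

Both tests share the same null (all panels are non-stationary), so rejection only establishes that some panels are stationary; neither test by itself can confirm that every panel is.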

Thanks in advance for your time and advice.

I am using PISA data of 2006 and 2012; the aim is to transform this somehow so I get school-level panel data for 2 periods. However, "SCHOOLID", the unique code for identifying different schools went from a 5-digits code in 2006 to a 7-digits code in 2012. Additionally, I cannot find information on how the codes are assigned to schools so (for now) I have no way to be sure that I can identify the same school in 2006 and 2012 in order to treat them as panel data.

First question: I realize this is not a PISA data forum, but the other questions might be superfluous if someone knows the answer. Does anyone know whether the "SCHOOLID" codes of 2006 and 2012 are in any way linked, so that I can identify the same school in both years by, e.g., transforming the codes? From what I have found, the OECD does not provide any additional information on the "SCHOOLID" code.

Second question: Is there a different way to identify schools? Can I, for example, check all other variables (600+ of them) to see whether they behave like "SCHOOLID", i.e. differ between students of different schools but are identical for students of the same school? Any way to check this?
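For the second question, a quick way to check whether a candidate variable is constant within schools but varies across them (a sketch; SCHOOLID and candidate are placeholder names):

```stata
* is candidate constant within each SCHOOLID?
* sorting by candidate within school means first != last iff it varies
bysort SCHOOLID (candidate): gen byte differs = candidate[1] != candidate[_N]
count if differs
* a count of 0 means the variable never varies within a school
```

Looping this over the full variable list would flag every variable that behaves like a school identifier.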

Third question: if it is not possible to do this, what would be an alternative grouping strategy? I reckon that if I use some sort of clustering, I can't treat the data as panel data anymore?

Last question: suppose I am able to identify the schools in some way, how then should I aggregate the PISA values on a school level with the plausible values? Someone suggested to average each of the 5 plausible values for every student of a specific school so that I still have 5 plausible values on school level to continue with. This seems counterintuitive to me because I read averaging plausible scores is never the way to go.

I would gladly accept any solutions, suggestions, remarks or criticism.

xtset uid year

quietly xtreg lnTR_ETB $inputs

The uid is unique for each observation (5,707 in total), but the result is an error message (insufficient observations). I have checked for missing values and there are none.
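A few diagnostics worth running (a sketch): if uid really is unique per observation, then each panel contains only a single time period, and xtreg has no within-panel variation to estimate from, which would explain the error.

```stata
* how many periods does each panel contribute?
xtset uid year
xtdescribe

* does any uid appear more than once?
duplicates report uid
```

If xtdescribe shows one observation per panel, the fix is to use a panel identifier that groups repeated observations of the same unit (e.g. a firm or household id) rather than a row-unique uid.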

Anyone who can advise me to sort out the problem ?

Thank you


I've got a question regarding a problem with three variables: "Treatment" (control/treatment), "Gender", and "# of desserts bought". How can I get the mean, the standard deviation, and a significance test (like the chi2 test) across the conditions? That is: is there a significant difference between the numbers of desserts bought by female respondents under treatment versus control?
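A sketch of one approach, with placeholder variable names treatment, gender, and desserts: tabulate cell means and SDs, then test the treatment effect within women.

```stata
* cell means and SDs for every treatment-by-gender combination
table treatment gender, contents(mean desserts sd desserts n desserts)

* treatment vs control among female respondents only
* (assuming gender is coded 1 = male, 2 = female)
ttest desserts if gender == 2, by(treatment)
```

Since desserts bought is a count rather than a dichotomous outcome, a t-test (or a count model) fits the question better than a chi2 test on a contingency table.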

Thank you for your help!

I have the following question:

I would like to examine what factors determine IPO size. But it is a two-step question: step 1, what determines whether the IPO is successful (in my sample there is a probability of IPO failure); step 2, conditional on IPO success, explore the factors that affect the money raised in the IPO.

Can I ask what type of model I can use to tackle this question? Thank you.
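One common candidate for this kind of two-step structure (a sketch only, not necessarily the right model for your data) is a Heckman selection model: a selection equation for IPO success and an outcome equation for proceeds, observed only when the IPO succeeds. All variable names below are placeholders:

```stata
* outcome: log proceeds, observed only for successful IPOs
* selection: success (1 = IPO completed, 0 = withdrawn/failed)
heckman ln_proceeds firm_size leverage, ///
    select(success = firm_size leverage underwriter_rep) twostep
```

Identification works best when the selection equation contains at least one variable (here, hypothetically, underwriter_rep) that affects success but not the amount raised.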

best

Joe

I'm trying to understand the difference between cell means and treatment-effects estimation on discrete variables (exact matching), similar to here: https://blog.stata.com/2016/08/16/ex...on-adjustment/

The difference between cell means on a single "treatment" variable is the same as the coefficient from regressing the outcome on that variable, as can be seen here:

Code:

cls
clear all
sysuse auto
drop if inlist(rep78,1,2)

. reg price i.foreign

      Source |       SS           df       MS      Number of obs   =        64
-------------+----------------------------------   F(1, 62)        =      0.08
       Model |  701899.735         1  701899.735   Prob > F        =    0.7772
    Residual |   538613171        62  8687309.21   R-squared       =    0.0013
-------------+----------------------------------   Adj R-squared   =   -0.0148
       Total |   539315071        63  8560556.68   Root MSE        =    2947.4

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   220.4913   775.7051     0.28   0.777    -1330.121    1771.104
       _cons |    6164.19   454.7974    13.55   0.000     5255.063    7073.318
------------------------------------------------------------------------------

. margins r.foreign

Contrasts of adjusted predictions
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
     foreign |          1        0.08     0.7772
             |
 Denominator |         62
------------------------------------------------

------------------------------------------------------------------------
                        |            Delta-method
                        |   Contrast   Std. Err.     [95% Conf. Interval]
------------------------+------------------------------------------------
                foreign |
(Foreign vs Domestic)   |   220.4913   775.7051     -1330.121    1771.104
------------------------------------------------------------------------

. table foreign, c(mean price)

-----------------------
Car type  | mean(price)
----------+------------
 Domestic |     6,164.2
  Foreign |     6,384.7
-----------------------

. di 6384.7 - 6164.2
220.5

Code:

. reg price i.foreign##i.rep78

      Source |       SS           df       MS      Number of obs   =        59
-------------+----------------------------------   F(5, 53)        =      0.44
       Model |  19070228.2         5  3814045.63   Prob > F        =    0.8204
    Residual |   462156727        53  8719938.25   R-squared       =    0.0396
-------------+----------------------------------   Adj R-squared   =   -0.0510
       Total |   481226956        58  8297016.48   Root MSE        =      2953

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      foreign |
     Foreign  |  -1778.407   1797.111    -0.99   0.327    -5382.955     1826.14
              |
        rep78 |
           4  |  -725.5185   1136.593    -0.64   0.526    -3005.235    1554.198
           5  |  -2402.574   2164.008    -1.11   0.272    -6743.024    1937.876
              |
foreign#rep78 |
   Foreign#4  |   2158.296   2273.185     0.95   0.347    -2401.136    6717.728
   Foreign#5  |   3866.574   2925.484     1.32   0.192    -2001.204    9734.352
              |
        _cons |   6607.074   568.2963    11.63   0.000     5467.216    7746.932
-------------------------------------------------------------------------------

. margins r.foreign

Contrasts of predictive margins
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
     foreign |          1        0.13     0.7172
             |
 Denominator |         53
------------------------------------------------

------------------------------------------------------------------------
                        |            Delta-method
                        |   Contrast   Std. Err.     [95% Conf. Interval]
------------------------+------------------------------------------------
                foreign |
(Foreign vs Domestic)   |  -399.0574   1095.717     -2596.787    1798.672
------------------------------------------------------------------------

. teffects nnmatch (price) (foreign), ematch(rep78) vce(iid)

Treatment-effects estimation                   Number of obs      =         59
Estimator      : nearest-neighbor matching     Matches: requested =          1
Outcome model  : matching                                     min =          2
Distance metric: Mahalanobis                                  max =         27
----------------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATE                    |
               foreign |
(Foreign vs Domestic)  |  -399.0574   943.6028    -0.42   0.672    -2248.485     1450.37
----------------------------------------------------------------------------------------

Code:

. table foreign rep78, c(mean price) col

----------------------------------------------
          |        Repair Record 1978
 Car type |       3        4        5    Total
----------+-----------------------------------
 Domestic | 6,607.1  5,881.6  4,204.5  6,308.8
  Foreign | 4,828.7  6,261.4  6,292.7  6,070.1
----------------------------------------------

. di 6070.1 - 6308.8
-238.7

Seems like after almost a year I need some help with Stata again.

So, I am doing a

1) If my endogenous explanatory variable is continuous, which Stata command do I have to use? If the endogenous explanatory variable is binary, which Stata command should be used?

2) Will I get consistent and unbiased standard errors after running the model?

3) To get the marginal effects do I have to use the

Any sort of help will be really appreciated. Thanks!

I am fairly new to Stata and am facing the following problem. I have multiple versions of ad campaigns and want to compare them with each other. The data look as follows:

Array

Clicks and views are cumulated values from the individual campaigns. How do I test, e.g., whether the facebook campaign performed significantly differently from the premium campaign? I feel like a chi2 test is the right approach, as we are comparing dichotomous values (success/failure, facebook/premium), but I don't know how to approach the testing procedure in Stata, as I have never worked with cumulated data. Does anyone have any hints on how to do this?
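With cumulated counts, Stata's immediate command tabi takes the cell counts directly, so no observation-level data are needed. The numbers below are placeholders for clicks and non-click views of each campaign:

```stata
* rows = campaigns (facebook, premium); columns = click / no click
* e.g. facebook: 120 clicks out of 1000 views; premium: 95 out of 900
tabi 120 880 \ 95 805, chi2
```

The chi2 statistic then tests whether the click-through rates of the two campaigns differ.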

Kindest regards,

Lucas


I would be glad if you could help me.

Thank you in advance.

Regards,

Array

Array

Array


I am using Stata 14.2 on Windows and am currently working with some financial data.

I want to generate a variable displaying the sum of the return on assets from period t to t+2 for each company (identified by ds_code), stored as a new variable in the row of year t.

I first aggregated roa across each company using the command below, but it does not restrict the sum to the time frame I am interested in:

egen roa3y = sum(roa), by(ds_code)

I am struggling to include the condition that sums only the years t to t+2 relative to the year of the observation. My data look somewhat like this:

ds_code    year   roa
"130062"   2004   .1783172
"130062"   2005   6.661182
"130062"   2006   2.339188
"130062"   2007   .16718867
"130062"   2008   .12236715
"130088"   2004   .22295973
"130088"   2005   .04354694
"130088"   2006   .3157642
"130088"   2007   .11244648
"130088"   2008   5.147346

So, e.g., in row 1 I want to generate a new variable roa3y that is the sum of roa for firm "130062" over 2004, 2005, and 2006 (9.1786872 in this case).
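A sketch of one way to do this using time-series lead operators (this assumes the years are consecutive within each firm; since ds_code is a string, it must be encoded to numeric before tsset):

```stata
* numeric panel id from the string company code
encode ds_code, generate(firm)
tsset firm year

* sum of roa in years t, t+1, and t+2;
* missing if either lead is unavailable (e.g. the last two years of a firm)
generate roa3y = roa + F1.roa + F2.roa
```

A nice property of F1./F2. is that gaps in the years correctly produce missing values rather than silently pulling in the wrong year's roa.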

I really appreciate any help you may be able to provide!
