Is there any test of endogeneity after running xtreg, fe or re?
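If the question is about choosing between fixed and random effects (i.e., whether the regressors are correlated with the panel-level effect), the standard check is a Hausman test. A minimal sketch, with hypothetical variable names y, x1, x2:

Code:

xtreg y x1 x2, fe
estimates store fe
xtreg y x1 x2, re
estimates store re
hausman fe re

A significant statistic favours the fixed-effects specification. The user-written -xtoverid- (SSC) offers a version of this test that is robust to clustering.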

Thank you.

Can we use xtreg, fe and xtreg, re when T>N in our panel data?

Many Thanks.

When running -sbknown- with two given dates, does it identify whether there is a structural break in the trend BETWEEN the dates? For example, I'm identifying breaks in 17 years of monthly population counts: if I feed it two separate dates and it gives a test statistic with a p value < .05, does that mean there is a structural break in the series between the two dates that I use in the syntax?

Also, is there any reason why one would get different results when using -sbsingle- (searches a time series for the first structural break from a given point in time) versus the aforementioned -sbknown-? As far as I can see from searching the help file and PDF documentation, the underlying math is the same - Wald test. Or it should be.
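For what it is worth, assuming these refer to official Stata's -estat sbknown- and -estat sbsingle-, my reading of the documentation is that -estat sbknown- with two dates jointly tests whether the coefficients change at those exact dates, whereas -estat sbsingle- computes a supremum (sup-Wald) statistic over a range of candidate dates, so the two statistics are generally not comparable even though both are Wald-based. A sketch, with hypothetical monthly time variable mdate and series pop:

Code:

tsset mdate
regress pop L.pop
* joint Wald test of breaks at the two specified dates
estat sbknown, break(tm(2005m1) tm(2012m6))
* sup-Wald search for a single unknown break date
estat sbsingle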

Previously, when I defined the paths as:

cd "F:\360CloudUp\Research\Projects\Research_Projects\!Accept -- SC\Empirical"

global dofiles= "/dofiles"

global rawdata= "/rawdata"

global workingdata= "/workingdata"

global tables= "/tables"

And I run IV regression as:

xi: xtivreg2 capreturn (ln_mean_4_total = ln_spend_popu ln_admin_income_popu) L6.ln_mean_4_total $xlist i.year, fe first cluster(id)

outreg2 using tables/Table_8.xls, replace label addstat(First stage F-stat, e(widstat), Hansen, e(j), Hansen p value, e(jp))

The Hansen test and its p value are reported in the Excel file.

Afterwards, when I run another project with the paths defined as:

global dofiles="F:\360CloudUp\Research\Projects\Research_Projects\!!_Data_and_Empirics\CFPS\DFC\dofiles"

global rawdata="F:\360CloudUp\Research\Projects\Research_Projects\!!_Data_and_Empirics\CFPS\rawdata"

global workingdata="F:\!!_Data_and_Empirics\CFPS\DFC\workingdata"

global tables="F:\360CloudUp\Research\Projects\Research_Projects\!!_Data_and_Empirics\CFPS\DFC\tables"

and run the regression:

xi: xtivreg2 lnconsumer_exp (cpre_total_index=IV_city_Hangzhou1 IV_city_Hangzhou2 IV_city_Hangzhou3) i.year, fe first cluster(city)

outreg2 using $tables/Table_7_cpre_Hangzhou_IV.xls, replace addstat(First stage F-stat, e(widstat), Hansen, e(j), Hansen p value, e(jp))

it says:

check eret list for the existence of e(jp)

invalid syntax

r(198);

I don't know why this error appears; the two blocks of code look the same to me.
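One thing worth ruling out (a guess, not a confirmed diagnosis): the second $tables path contains special characters such as !!, and the filename is passed to -outreg2- without quotes, whereas the working run used a plain relative path. Quoting the expanded macro is a cheap check:

Code:

outreg2 using "$tables/Table_7_cpre_Hangzhou_IV.xls", replace addstat(First stage F-stat, e(widstat), Hansen, e(j), Hansen p value, e(jp))

It is also worth running -ereturn list- right after -xtivreg2- in the second project, as the message suggests, to confirm that e(jp) actually exists in that run.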

Thank you.


Trying to write an .ado file. This is how it looks.

When debugging this line by line, I get two errors:

1. xc, invalid name, r(198)

2. y' invalid name, r(198)

When I delete (`xc'==`k'), error #1 disappears, and when I remove the extra " ' " after y in scalar asf = (`y''*ind), error #2 goes away. However, both the conditional check and the transpose of y are crucial to my code. How do I get rid of these errors?

program func, rclass
    version 15.1
    syntax varlist(numeric min=4 max=4), i(integer) j(integer) k(integer)
    args y xa xb xc
    //indicator
    gen ind = (`xa' == `i')*(`xb' == `j')*(`xc'==`k')
    //some calculations
    qui sum ind, det
    scalar asf = (`y''*ind)
    return scalar asf = asf
    return scalar I = `i'
    return scalar J = `j'
    return scalar K = `k'
    drop ind
end
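For reference, a sketch of how the two problem lines could be rewritten, assuming the intent described above. -args- re-parses the whole argument string (including the comma and options), so `xc' ends up holding an invalid token; using -tokenize- on the already-parsed varlist avoids that. And a variable cannot be transposed with the matrix operator '; the inner product y'ind is just the sum of y over observations where ind==1:

Code:

program func, rclass
    version 15.1
    syntax varlist(numeric min=4 max=4), i(integer) j(integer) k(integer)
    tokenize `varlist'
    local y  `1'
    local xa `2'
    local xb `3'
    local xc `4'
    // indicator for the (i, j, k) cell
    tempvar ind prod
    gen byte `ind' = (`xa' == `i')*(`xb' == `j')*(`xc' == `k')
    // y' * ind = sum of y over observations with ind == 1
    gen double `prod' = `y'*`ind'
    qui sum `prod', meanonly
    scalar asf = r(sum)
    return scalar asf = asf
    return scalar I = `i'
    return scalar J = `j'
    return scalar K = `k'
end

Using tempvars also removes the need for the final drop, which in the original ("dropind") was missing a space anyway.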

I have GDP data for each state for the last 10 years, and I want to apply the HP filter to each state's time series. Suppose "state" is the state-name variable and "year" is the time variable; is there a concise way to use the "tsfilter hp" command to detrend each state's series?
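If I recall the -tsfilter- documentation correctly, it filters each panel separately once the data are declared as a panel, so no explicit loop is needed. A sketch, where gdp is a hypothetical variable name:

Code:

egen state_id = group(state)
xtset state_id year
tsfilter hp gdp_cycle = gdp, trend(gdp_trend)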

Thanks!

I am looking for suggestions on choosing an appropriate model for my research. I am trying to estimate the probability that unemployed individuals use different job search methods. I have five different outcome variables (search channels) that individuals used to find jobs: (1) contacted potential employer(s) directly (yes/no), (2) through friend(s)/relative(s) (yes/no), (3) placed or answered newspaper ad(s) (yes/no), (4) consulted with an employment agency (yes/no), (5) searched the Internet (yes/no). On the right-hand side, I am using demographic variables (age, sex, education, ethnicity) and other neighborhood characteristics as predictors (IVs). My data are longitudinal, and I will estimate the model in both cross-sectional and panel setups.

I have a few options for estimating the model. First, I can collapse the outcomes into two variables, informal networks (friends and relatives) and formal networks (Internet or other institutional methods), and use a simple logit/probit model. But I am also interested in the probability of using different formal methods, such as the Internet versus an employment agency. In that case, I could use a multinomial logit/probit model. The problem with multinomial logit/probit, however, is that it assumes each individual selects only one alternative, whereas in my case individuals used three or four methods simultaneously. If I restrict the sample to individuals who used only one method, I lose more than half of my sample.

Second, I can use -gsem- to estimate the model. The advantage of -gsem- is that it allows five separate but correlated binary outcome variables and gives a separate set of coefficients for each DV. But I am not sure whether -gsem- is the only available option, or the best one, for my problem. I was also wondering whether I could use a multivariate logit model as an alternative.
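As a sketch of the -gsem- route: five binary equations sharing the covariates, with a common latent factor F (names beginning with a capital letter are latent in -gsem-) inducing correlation across the outcomes. The outcome and covariate names here are hypothetical:

Code:

gsem (direct friends ads agency internet <- c.age i.sex i.educ i.ethnic F, logit)

By default -gsem- constrains the first loading on F to 1 for identification. Whether a single shared factor is an adequate substitute for the full correlation structure of a multivariate probit is a modelling judgement, not a given.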

Any suggestions and advice will be greatly appreciated.


I would be very grateful if someone could confirm whether my understanding is correct.

I ran a panel regression, then asked Stata for the adjusted R-squared; can I assume this is the within R-squared? My panels are local authorities.

"The adjusted r squared is 0.361 for dry recycling and 0.699 for compost, telling us only 36.1% of the variation in dry recycling rates is explained by the independent variables, and 69.9% of the variation in compost recycling rates."

Is it the variation in dry recycling rates, or the variation within local authorities in dry recycling rates?

Thank you in advance!

Darcy

I'm fairly new to Stata. I've just manually run a RESET test on my panel data, and it failed. I understand this points to an incorrect functional form in my linear model, but I'm not sure which variables in the model need to be changed. Is there a way to tell?

Thanks

Code:

sysuse auto
egen price_q5 = xtile(price), n(5)
egen weightmedian_priceq5 = median(weight), by(price_q5)
levelsof weightmedian_priceq5
gen weightmedian_binary = .
replace weightmedian_binary = 0 if price_q5==1 & weight<2640
replace weightmedian_binary = 1 if price_q5==1 & weight>=2640
replace weightmedian_binary = 0 if price_q5==2 & weight<2650
replace weightmedian_binary = 1 if price_q5==2 & weight>=2650
replace weightmedian_binary = 0 if price_q5==3 & weight<2670
replace weightmedian_binary = 1 if price_q5==3 & weight>=2670
replace weightmedian_binary = 0 if price_q5==4 & weight<3280
replace weightmedian_binary = 1 if price_q5==4 & weight>=3280
replace weightmedian_binary = 0 if price_q5==5 & weight<3890
replace weightmedian_binary = 1 if price_q5==5 & weight>=3890
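As an aside, the hard-coded cutoffs (2640, 2650, ...) can be avoided by comparing weight directly with the by-group median already computed, which is less error-prone if the data change (new variable name is arbitrary):

Code:

gen weightmedian_binary2 = weight >= weightmedian_priceq5 if !missing(weight, weightmedian_priceq5)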

I was hoping someone could help me.

I ran the command

loneway recycling acode

Where recycling is the recycling rate within a local authority, and acode is the code for each local authority.

Below are my results. What does 'intraclass correlation' mean here? Am I correct in thinking that around two thirds of the variation in recycling rates comes from between local authorities, i.e. that recycling rates are highly correlated within each local authority because of authority-specific variation?

Thank You

One-way Analysis of Variance for recycling:

                                         Number of obs =      6,350
                                             R-squared =     0.6220

    Source              SS         df      MS          F     Prob > F
-------------------------------------------------------------------------
Between acode       103750.6      317   327.28895    31.31    0.0000
Within acode       63046.518    6,032   10.452009
-------------------------------------------------------------------------
Total              166797.12    6,349     26.2714

         Intraclass       Asy.
         correlation      S.E.       [95% Conf. Interval]
         ------------------------------------------------
            0.60287     0.02016      0.56336     0.64237

     Estimated SD of acode effect      3.983316
     Estimated SD within acode         3.232957
     Est. reliability of a acode mean  0.96806
          (evaluated at n=19.97)
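As a check on how to read the output, the intraclass correlation is the between-authority share of the total variance implied by the two SD estimates:

    ICC = var(between) / [var(between) + var(within)]
        = 3.983316^2 / (3.983316^2 + 3.232957^2)
        = 15.8668 / 26.3188
        ≈ 0.60287

so about 60% (a little under two thirds) of the variance in recycling rates is attributable to differences between local authorities.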


I’m estimating a mvprobit for 5 search actions (outcomes) of the unemployed, to see the effect of individual characteristics and structural variables on those actions. The output I have so far is in the attached PDF. First there is some descriptive material for the outcomes and covariates (the variable labels are in Italian, but that does not matter for my issue). Second, there is the mvprobit estimation with 5 equations (Y1, ..., Y5; one equation for each job search action):

Y1 (0,1) =f(x)

Y2 (0,1) =f(x)

Y3 (0,1) =f(x)

Y4 (0,1) =f(x)

Y5 (0,1) =f(x)

Marginal effects on each outcome are obtained using posterior simulation (10,000 simulated coefficient vectors drawn from the posterior distribution of the estimated model parameters) and are not shown in the PDF for the sake of brevity.

I’m interested in calculating marginal effects for combinations of outcomes (joint marginal effects, to see the joint effect of search actions, as some unemployed use combinations of actions), such as:

pr(Y1=1, Y2=1, Y3=1, Y4=1, Y5=1)

pr(Y1=0, Y2=1, Y3=1, Y4=1, Y5=1)

pr(Y1=0, Y2=0, Y3=1, Y4=1, Y5=1)

….

pr(Y1=1, Y2=0, Y3=1, Y4=1, Y5=1)

……

….

pr(Y1=0, Y2=0, Y3=0, Y4=0, Y5=1)

we won’t have the case of no search actions:

pr(Y1=0, Y2=0, Y3=0, Y4=0, Y5=0)

I read in the Stata manual, your help file, and the SJ article that it is easy to calculate these marginal effects for M = 3, but I was not able to find a way to calculate them for M > 3, which is my case with 5 equations.
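One route I am aware of (a sketch only; syntax from memory, so check the help files): after -mvprobit-, obtain the five linear indexes with -predict-, flip the sign of the index for each outcome fixed at 0 (together with the sign of every estimated correlation involving that equation), and evaluate the resulting 5-dimensional normal probability by GHK simulation, e.g. with the same authors' -mdraws- and -egen, mvnp()- tools:

Code:

* after something like:
* mvprobit (y1 = xvars) (y2 = xvars) (y3 = xvars) (y4 = xvars) (y5 = xvars), dr(100)
forvalues m = 1/5 {
    predict xb`m', eq(#`m') xb
}
* e.g. for pr(Y1=0, Y2=1, Y3=1, Y4=1, Y5=1): negate equation 1's index,
* adjust the correlation matrix accordingly, then evaluate the 5-dimensional
* normal probability by simulation (-mdraws- plus -egen, mvnp()-)
replace xb1 = -xb1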

Do you have some suggestions on this?

Thank you very much in advance,

Chiara


I’m hoping to solicit some advice from those of you who are familiar with survival analysis. I’m still a bit new to the subject, but would like to get my feet wet with some firm-level data. I’ve read through a couple of helpful resources, but am still a bit puzzled as to how to properly -stset- this particular setup. The closest example related to my setup that I can find on Statalist is this thread (https://www.statalist.org/forums/for...re-using-stset).

Below is a sample of my data. It's confidential, so it's been heavily altered. The data are firm-level data from Q1 2008 to Q3 2014. I have a firm id (firm), a quarterly variable (quarter), the employment count (emp), when the firm came into business (register_date), when the firm went out of business (termination_date), and the major industry (naics). I've removed the other covariates for simplicity.

Here’s what I’ve done – I’m not sure if this is right, but would appreciate any feedback!

Code:

gen failure = 0
replace failure = 1 if quarter == termination_date
bys firm (quarter) : replace failure = . if failure[_n-1] == 1
bys firm (quarter) : replace failure = . if failure[_n-1] == . & quarter >= termination_date
bys firm (quarter) : replace failure = . if employment == . & quarter < termination_date
stset quarter, failure(failure == 1)
replace _t = _t - 191

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(firm quarter) float employment str6 naics int(register_date termination_date)
1 192   . "54" 204 207
1 193   . "54" 204 207
1 194   . "54" 204 207
1 195   . "54" 204 207
1 196   . "54" 204 207
1 197   . "54" 204 207
1 198   . "54" 204 207
1 199   . "54" 204 207
1 200   . "54" 204 207
1 201   . "54" 204 207
1 202   . "54" 204 207
1 203   . "54" 204 207
1 204  38 "54" 204 207
1 205  55 "54" 204 207
1 206   6 "54" 204 207
1 207  66 "54" 204 207
1 208   . "54" 204 207
1 209   . "54" 204 207
1 210   . "54" 204 207
1 211   . "54" 204 207
1 212   . "54" 204 207
1 213   . "54" 204 207
1 214   . "54" 204 207
1 215   . "54" 204 207
1 216   . "54" 204 207
1 217   . "54" 204 207
1 218   . "54" 204 207
2 192   . "54" 204   .
2 193   . "54" 204   .
2 194   . "54" 204   .
2 195   . "54" 204   .
2 196   . "54" 204   .
2 197   . "54" 204   .
2 198   . "54" 204   .
2 199   . "54" 204   .
2 200   . "54" 204   .
2 201   . "54" 204   .
2 202   . "54" 204   .
2 203   . "54" 204   .
2 204  27 "54" 204   .
2 205  39 "54" 204   .
2 206  21 "54" 204   .
2 207  20 "54" 204   .
2 208  66 "54" 204   .
2 209   5 "54" 204   .
2 210   2 "54" 204   .
2 211  29 "54" 204   .
2 212  26 "54" 204   .
2 213  24 "54" 204   .
2 214   6 "54" 204   .
2 215  10 "54" 204   .
2 216  22 "54" 204   .
2 217   5 "54" 204   .
2 218  20 "54" 204   .
3 192   . "51" 204 215
3 193   . "51" 204 215
3 194   . "51" 204 215
3 195   . "51" 204 215
3 196   . "51" 204 215
3 197   . "51" 204 215
3 198   . "51" 204 215
3 199   . "51" 204 215
3 200   . "51" 204 215
3 201   . "51" 204 215
3 202   . "51" 204 215
3 203   . "51" 204 215
3 204 220 "51" 204 215
3 205 237 "51" 204 215
3 206 215 "51" 204 215
3 207 361 "51" 204 215
3 208 225 "51" 204 215
3 209 219 "51" 204 215
3 210 338 "51" 204 215
3 211 398 "51" 204 215
3 212 123 "51" 204 215
3 213  37 "51" 204 215
3 214  37 "51" 204 215
3 215   0 "51" 204 215
3 216   . "51" 204 215
3 217   . "51" 204 215
3 218   . "51" 204 215
4 192   . "53" 204   .
4 193   . "53" 204   .
4 194   . "53" 204   .
4 195   . "53" 204   .
4 196   . "53" 204   .
4 197   . "53" 204   .
4 198   . "53" 204   .
4 199   . "53" 204   .
4 200   . "53" 204   .
4 201   . "53" 204   .
4 202   . "53" 204   .
4 203   . "53" 204   .
4 204   6 "53" 204   .
4 205  15 "53" 204   .
4 206  35 "53" 204   .
4 207  34 "53" 204   .
4 208  49 "53" 204   .
4 209  31 "53" 204   .
4 210   7 "53" 204   .
end
format %tq quarter
format %tq register_date
format %tq termination_date

I am running a regression on logged hourly pay and was wondering how to test whether I should include higher-power terms of the continuous variables. The continuous variables I have, experience and tenure, are both measured in years. I have noticed it is common to include age and age squared to account for decreasing marginal returns, and was wondering if there is a way to check whether to include tenure and/or experience as squared terms too. I have plotted two-way scatter graphs of hourly pay against tenure and against experience, but can't see a clear relationship.
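A common check is to add the squared terms via factor-variable notation and test them jointly; the variable names here are hypothetical:

Code:

regress lnpay c.exper##c.exper c.tenure##c.tenure i.educ
* joint test of the two quadratic terms
test c.exper#c.exper c.tenure#c.tenure

A significant joint test supports keeping the quadratics; -estat ovtest- (the RESET test) is a complementary omnibus check for omitted higher-order terms.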

Thank you for your time, I really appreciate it!

]]>

b is a dummy (0 and 1).

c = a*b

c is the interaction term.

I applied -winsor2- to the variables a and b, but I did not apply it to c.

Is that OK?