Will there be a problem of spurious regression in a large-T, small-N logistic regression (with individual fixed effects),

if the dependent variable is binary (0 or 1) and the independent variables are non-stationary?

Can nonlinear models (like logit and probit models) suffer from spurious regression?


I am using -nlsur- to estimate a system of equations. However, I find the Residual SS calculated in each iteration stays the same. Is this indicative of a problem in estimation? The following is what I get.

Code:

Calculating NLS estimates...
Iteration 0:  Residual SS = 12537.91
Iteration 1:  Residual SS = 12537.91
Iteration 2:  Residual SS = 12537.91
Iteration 3:  Residual SS = 12537.91
Calculating FGNLS estimates...
Iteration 0:  Scaled RSS = 136761.4
Iteration 1:  Scaled RSS = 136761.4
Iteration 2:  Scaled RSS = 136761.4

Q1) Is T=30 large enough to ignore the endogeneity bias issue? Or do I need to address it via instrumentation?

Q2) More generally when are T and N considered "large/small"?

Q3) Can/should time fixed effects be used in a dynamic model?

Q4) How can I decide whether a single lag of the dependent variable is enough? My dataset may be too small to support multiple lags, but I'd still like to know.

Perhaps a dynamic model isn't the way to go. Alternatively, I could use first differencing or a time polynomial to alleviate nonstationarity issues.

Q5) How do I know which combination of these modeling techniques is most appropriate (dynamic methods, first-differencing, detrending, inclusion/exclusion of year dummies)?

Q6) At least one message board I found suggested that stationarity isn't a major concern when using panel data. I can't imagine how this could be true, as I think nonstationarity would lead to spurious results. Am I correct or am I missing something?

I've been reading message boards, online lecture notes, and academic papers for days but can't find practical answers to these questions.

If you can address ANY of these questions, I would greatly appreciate it. When doing so, please bear in mind that I'm looking for practical approaches and don't have the ability to understand highly technical/theoretical papers. Thank you!


My code is attached:

Code:

clear all

cd "...."

capture program drop weibullsim
program weibullsim
    tempname sim
    postfile `sim' mean var using results, replace
    quietly {
        forvalues i = 1/20000 {
            // start each replication from an empty dataset
            drop _all
            set obs 20
            gen u = runiform()
            // Weibull (scale 1, shape 0.5) draw via the inverse transform
            gen weibull = (-ln(u))^2
            gen weibullpower = weibull^0.5
            summarize weibullpower
            post `sim' (r(mean)) (r(Var))
        }
    }
    postclose `sim'
end

clear

set seed 123456

weibullsim

use results, clear

gen estimate = mean^2

summarize estimate

Thanks!

I am currently analysing a survey about the role of the state in Arab countries. I have 3 different questions which all refer to the role of the state. For example the first one is:

"If you have to choose only one, which one of the following statements would you choose as the most essential characteristics of a democracy?

1. Government narrows the gap between the rich and the poor.

2. People choose the government leaders in free and fair election.

3. Government does not waste any public money.

4. People are free to express their political views openly."

The problem is that the other two questions ask exactly the same thing but with different answer options. What I would like is to combine the three questions and end up with a general ranking of the most important roles of the state within the population.

Each individual in the dataset answered the three questions.

Therefore I would like to know if there is a trick or a statistical method to solve this?

Best


I have the following 2 variables:

1. tab pdlast

     pdlast |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         28        1.02        1.02
        1.5 |          9        0.33        1.35
          2 |        252        9.20       10.55
        2.5 |        126        4.60       15.15
          3 |      1,263       46.09       61.24
        3.5 |        231        8.43       69.67
          4 |        429       15.66       85.33
        4.5 |         57        2.08       87.41
          5 |        170        6.20       93.61
        5.5 |         22        0.80       94.42
          6 |        111        4.05       98.47
        6.5 |          1        0.04       98.50
          7 |         23        0.84       99.34
          8 |          7        0.26       99.60
          9 |         10        0.36       99.96
         12 |          1        0.04      100.00
------------+-----------------------------------

2. tab bop_deepest

bop_deepest |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        351       87.31       87.31
          1 |         51       12.69      100.00
------------+-----------------------------------

I would like to create a new variable with 2 levels and with the following conditions:

0 if pdlast is less than 5 (regardless of bop_deepest), OR if pdlast is 5 or 5.5 and bop_deepest == 0

1 if pdlast is 5 or 5.5 and bop_deepest == 1, OR if pdlast is 6 or higher (regardless of bop_deepest)
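A minimal sketch of these conditions in Stata (the name newvar is just a placeholder, and you may want to decide explicitly how missing values should be handled):

Code:

gen byte newvar = .
replace newvar = 0 if pdlast < 5
replace newvar = 0 if inlist(pdlast, 5, 5.5) & bop_deepest == 0
replace newvar = 1 if inlist(pdlast, 5, 5.5) & bop_deepest == 1
replace newvar = 1 if pdlast >= 6 & !missing(pdlast)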

Any help would be appreciated.

Thank you,

Nikos

]]>Thank you,

Nikos

I am trying to use the SSC command mimrgns to compute predicted probabilities for eprobit models. The models are ERMs using endogenous treatment effects, if that matters, and are estimated using:

Code:

mi estimate, saving(filename) esample(sampname) cmdok: svy: eprobit ...

Code:

mimrgns using filename, esample(sampname) dydx(*) predict(pr)

This produces the error:

inconsistent estimation sample levels 0 and 1 of factor VARNAME
an error occurred when mi estimate executed mimrgns_estimate on m=2
r(459);

Any advice will be most appreciated. Thanks.


Can someone tell me the difference between these two unit-root tests (xtfisher and xtunitroot fisher), or at least confirm for me whether xtfisher is a second-generation URT?

I've been trying to use the supermodularity approach to check the pairwise complementarities between innovations.

My question is about testing joint inequality in STATA.

In the supermodularity approach, there is complementarity if C(1,1) – C(0,1) ≥ C(1,0) – C(0,0), where the binary variables indicate whether a firm conducts a given type of innovation.

To test the joint inequality, recent studies follow Kodde and Palm's (1986) approach and use a Wald test. They state the inequality in their null hypothesis: C(11XX) – C(01XX) – C(10XX) + C(00XX) ≥ 0

The problem is that Stata's test command, which performs a Wald test, doesn't allow joint inequalities, so I can only test joint equality in my null hypothesis: C(11XX) – C(01XX) – C(10XX) + C(00XX) = 0

My Stata code looks like:

Code:

test (C1111 - C0111 - C1011 + C0011 = 0) (C1110 - C0110 - C1010 + C0010 = 0) (C1101 - C0101 - C1001 + C0001 = 0) (C1100 - C0100 - C1000 + C0000 = 0)

chi2(4)     =     1.57
Prob > chi2 =   0.8137

I'm looking for a way to do this:

Code:

test (C1111 - C0111 - C1011 + C0011 >= 0) (C1110 - C0110 - C1010 + C0010 >= 0) (C1101 - C0101 - C1001 + C0001 >= 0) (C1100 - C0100 - C1000 + C0000 >= 0)

Also, I've checked the FAQ: "How can I perform a one-sided test?"

https://www.stata.com/support/faqs/s...-coefficients/

But this only applies to a single hypothesis, not a joint hypothesis.
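One partial workaround, sketched under the assumption that C1111 etc. are coefficient names in the fitted model: the FAQ's one-sided logic can at least be applied restriction by restriction via lincom. This is not the joint inequality test itself.

Code:

lincom C1111 - C0111 - C1011 + C0011
* one-sided p-value for H0: contrast >= 0 vs. Ha: contrast < 0,
* using the normal approximation as in the Stata FAQ
display normal(r(estimate)/r(se))

Repeating this for each of the four contrasts gives per-restriction one-sided tests; a genuinely joint one-sided test still requires comparing the Wald statistic against Kodde and Palm's (1986) critical-value bounds rather than the usual chi-squared values.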

How can I generate a new variable "countyindicator" such that:

if, for a person, all values of county equal 25, then countyindicator = 1

if, for a person, at least one (but not all) of the county values equals 25, then countyindicator = 2

if, for a person, no value of county equals 25, then countyindicator = 3

Here is the example below,

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id county)
1 25
1 25
1 25
1 25
2 78
2 29
3 64
3 25
3 97
4 25
end

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id county countyindicator)
1 25 1
1 25 1
1 25 1
1 25 1
2 78 3
2 29 3
3 64 2
3 25 2
3 97 2
4 25 1
end
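A possible sketch using egen within person (the helper names any25 and all25 are just illustrative):

Code:

bysort id: egen any25 = max(county == 25)
bysort id: egen all25 = min(county == 25)
gen countyindicator = cond(all25, 1, cond(any25, 2, 3))
drop any25 all25

This reproduces the desired countyindicator in the example above: ids where every county is 25 get 1, ids with a mix get 2, and ids with no 25 get 3.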

Thanks

Jack Liang


rdplot wynik_gm_m_std2015 dist_rus_aus_bord if abs(dist_rus_aus_bord) < 50 & ordinary == 1 & (ktory_zabor == 2 | ktory_zabor == 3), p(1) covs(perc_boys l_uczG perc_dysl log_popul proc_pom_spol_2008 log_doch_wlas_pc_2015 perc_higher) weights(l_uczG) h(50)


This is what I get when I run icd9p check pr1, any:

 1. Invalid placement of period          0
 2. Too many periods                     0
 3. Code too short                       0
 4. Code too long                        0
 5. Invalid 1st char (not 0-9)      27,472
 6. Invalid 2nd char (not 0-9)           0
 7. Invalid 3rd char (not 0-9)           0
 8. Invalid 4th char (not 0-9)           0
                               -----------
                         Total      27,472

]]>

I am analysing childhood cancer data. I want to study 5-year survival (cohort and period) in different time periods. In order to take into account the different age structure in different periods, I want to standardise to my own data in the latest period.

I know how to calculate the weights:

Code:

egen agegr_st = cut(age) if age < 15, at(0,5,10,15,100)
table agegr_st if age < 15, c(min age max age)
tab agegr_st if age < 15
gen standwei = 0.423 if agegr_st == 0
replace standwei = 0.245 if agegr_st == 5
replace standwei = 0.332 if agegr_st == 10

But can someone tell me how I have to write my sts command in order to use the weights? (I know how to do it with strs, but I need to use sts.)

I am using Stata 14.2.

Thank you in advance!