I have been trying to impute missing values in my longitudinal dataset using the -ice- package, developed by Patrick Royston, in Stata 13.

I have data from 1988 to 2014 for 153 countries, and 9 variables (2 dependent, 7 predictors). Roughly 15% of rows have one or more of the variables missing.

I have been following every rule in the book so far:

1. The data are reshaped to wide to accommodate their longitudinal nature during imputation;

2. I have defined a custom equation for every variable Y that lists all 8 other variables and Y's values from other years on the right-hand side;

A dry run shows no sign of any problem, and the actual imputation command runs smoothly too.

And now the problem: when I open the imputed dataset, I see many empty rows in between the different imputations. It seems that -ice- runs from row #1 (Afghanistan) to #65 (Iran) without any problem, imputing valid non-missing values for all the missing values, then stops, leaving all variables missing in the remaining rows. I am quite sure this cannot be due to misspecification, because after imputation even the string variable "country" is missing for the rows/countries following Iran.

Has anyone dealt with a similar situation before? I know Patrick Royston is not a member of Statalist, so I emailed him directly for help. Any advice is highly appreciated!

I'd like to use margins and marginsplot to generate conditional parallel-trends graphs from the linear predictions of margins following a linear regression:

Code:

reg ln_wage i.immiyear##ethn ///
    yoe ftexp_after c.ftexp_after#c.ftexp_after ///
    1.male ///
    1.married ///
    1.marr_male ///
    age c.age#c.age ///
    ysm ///
    1.good_german_now ///
    5.lfs ///
    2.firmsize ///
    ue_rate ///
    1.bula_neu ///
    2.regtyp ///
    1.poland 1.romania 1.ussr ///
    if inrange(immiyear,1993,1998) & inrange(immiage, 18,64), r

margins immiyear, at( ethn==1 ethn==0 )
marginsplot, x(immiyear) recast(line) xline(1996)

Predictive margins                            Number of obs =        542
Model VCE    : Robust

Expression   : Linear prediction, predict()
1._at        : ethn_ger = 1
2._at        : ethn_ger = 0

--------------------------------------------------------------------------
              |            Delta-method
              |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+-----------------------------------------------------------
 _at#immiyear |
      1 1993  |          .  (not estimable)
      1 1994  |          .  (not estimable)
      1 1995  |          .  (not estimable)
      1 1996  |          .  (not estimable)
      1 1997  |          .  (not estimable)
      1 1998  |          .  (not estimable)
      2 1993  |          .  (not estimable)
      2 1994  |          .  (not estimable)
      2 1995  |          .  (not estimable)
      2 1996  |          .  (not estimable)
      2 1997  |          .  (not estimable)
      2 1998  |          .  (not estimable)
--------------------------------------------------------------------------

When I use the -noestimcheck- option, margins does work and I get predictions that look reasonable.

So, finally, here is my question: is it OK to turn off the estimability check of the -margins- command in my case, with a linear model and quite a few factor variables (i.e., dummies)?

Thanks a lot and best regards,

Boris Ivanov

Code:

/* Example #1 */
di %20.0fc 123456789

/* Example #2 */
local x 123456789
di `x'

/* Example #3 */
local x `=123456789'
di %20.0fc `x'

/* Example #4 */
local x : di %20.0fc `=123456789'
di `x'

Code:

. /* Example #1 */
. di %20.0fc 123456789
         123,456,789

. /* Example #2 */
. local x 123456789
. di `x'
1.235e+08

. /* Example #3 */
. local x `=123456789'
. di %20.0fc `x'
         123,456,789

. /* Example #4 */
. local x : di %20.0fc `=123456789'
. di `x'
123 456 789
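If it helps to see why: in Example #2, -di- re-parses the macro's contents as a numeric expression and prints it with the default %9.0g format (hence the scientific notation), and in Example #4 the commas make -di- see a list of three separate expressions. A sketch of a workaround: quote the macro so -display- treats it as a literal string rather than an expression.

```stata
local x : di %20.0fc 123456789   // x now holds the formatted text
di "`x'"                         // quoted: the commas survive intact
```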

Best wishes,

Alan

I am attempting to run some simulations, incrementing sample size and stratification factors to test for balance. The below code does most of what I need it to do.

1. It creates datasets of different sample sizes based on the values in -- local sample --

2. It creates covariate variables based on the categories in -- local cats --

3. It conducts stratified randomisation at different sizes of the dataset

The bit I'm struggling with is highlighted in red below. If you run the code you can see on the first loop that it creates what I need, iteratively adding "obs1", then "obs1 obs2", then "obs1 obs2 obs3", etc., until it is complete. But it then keeps going because it is a nested loop. What I want is, after each iteration, to stratify based on just obs1, then on obs1 and obs2, etc., until I'm stratifying my randomisation on all the obs* variables generated in the first loop.

I've tried different things, such as moving both of the bottom two snippets into the top part of the code, but to no avail. If anyone has any ideas, I'd welcome them!

Code:

```
** Clear memory
clear
local sample 100 200 400 800 1600 3200 6400 10000 // set up local for different sample sizes
local strata = ""
foreach size of local sample {
    preserve
    qui set obs `size' // set different sample sizes
    qui gen id = _n // generate a unique id
    local cats "`"2"' `"2"' `"2"' `"2"' `"2"' `"2"'" // set up local for category numbers of strat vars
    forval i = 1/6 {
        local catind : word `i' of `cats'
        qui gen obs`i' = mod(_n, `catind') // generate 6 strat vars based on values of local cats
    }
    qui ds obs*, skip(1)
    local stratlist "`r(varlist)'" // store all strat vars in a local
    *di `"`stratlist'"'
    forval stratnum = 1/6 {
        local strat : word `stratnum' of `stratlist'
        local strata `"`strata' `strat'"'
        di `"`strata'"'
    }
    qui egen strata = group(`strata') // gen a variable that makes unique groups based on strat vars
    set seed 31540 // setting a seed for replicability
    qui gen randomnum = runiform() // generating a random number
    qui bysort strata: egen order = rank(randomnum) // generating a rank order var based on the random number
    qui bysort strata: gen treat = (order <= _N/2) // assigning condition based on rank
    foreach var of local strata {
        tab `var' treat, r
    }
    des, s
    restore
}
```
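For what it's worth, one possible restructuring (an untested sketch reusing the names from the code above): reset the strata local for each sample size, and move the grouping, randomisation, and assignment inside the loop over stratnum, so that each cumulative set of obs* variables gets its own stratified randomisation.

```stata
clear
set seed 31540                              // seed for replicability
local sample 100 200 400 800 1600 3200 6400 10000
foreach size of local sample {
    preserve
    qui set obs `size'
    qui gen id = _n
    forval i = 1/6 {
        qui gen obs`i' = mod(_n, 2)         // strat vars, 2 categories each
    }
    local strata ""                         // reset for every sample size
    forval stratnum = 1/6 {
        local strata `strata' obs`stratnum' // obs1, then obs1 obs2, ...
        qui egen strata`stratnum' = group(`strata')
        qui gen rnd`stratnum' = runiform()
        qui bysort strata`stratnum' (rnd`stratnum'): ///
            gen treat`stratnum' = (_n <= _N/2)
        foreach var of local strata {
            tab `var' treat`stratnum', r
        }
    }
    restore
}
```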

Code:

mata:
result = 0
for (i = 1; i <= 100; i++) {
    result_i = // Operations using the i-th observation
    result = result + result_i
}
end

Code:

mata:
result_thread1 = result_thread2 = 0

// Thread 1
waitfor_thread1 = 1
for (i = 1; i <= 50; i++) {
    result_i = // Operations using the i-th observation
    result_thread1 = result_thread1 + result_i
}
waitfor_thread1 = 0

// Thread 2
waitfor_thread2 = 1
for (i = 51; i <= 100; i++) {
    result_i = // Operations using the i-th observation
    result_thread2 = result_thread2 + result_i
}
waitfor_thread2 = 0

// Collect
while (waitfor_thread1 & waitfor_thread2) {}
result = result_thread1 + result_thread2
end

Thanks!

I am creating random variables over 100 observations at the beginning, as follows:

Code:

set obs 100
gen z1 = rnormal(6,1)
gen a = 3
gen B = 2
gen gama = 1
gen y = a + B*z1 + rnormal(0,1)
gen ym = y + gama*rnormal(0,1)

I am trying to run the regression of ym on z1 1,000 times. I want to record the average, min, and max of a and B (the constant and the coefficient) and R^2. I also want to record the t statistic for each regression.
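One standard way to do this kind of thing (a sketch, not a definitive answer; the program and result names below are mine) is to wrap the data generation and regression in an rclass program, call it with -simulate-, and then -summarize- the saved results:

```stata
capture program drop mysim
program define mysim, rclass
    clear
    set obs 100
    gen z1 = rnormal(6,1)
    gen y  = 3 + 2*z1 + rnormal(0,1)        // a = 3, B = 2
    gen ym = y + 1*rnormal(0,1)             // gama = 1
    regress ym z1
    return scalar b_z1   = _b[z1]
    return scalar b_cons = _b[_cons]
    return scalar t_z1   = _b[z1]/_se[z1]   // t statistic for z1
    return scalar r2     = e(r2)
end

simulate b_z1=r(b_z1) b_cons=r(b_cons) t_z1=r(t_z1) r2=r(r2), ///
    reps(1000) seed(12345): mysim
summarize b_z1 b_cons t_z1 r2               // mean, min, max across reps
```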

Your help will be appreciated.

Thanks in advance.

Ulas

I would like to export statistics (mean, min, max, sd, p50) from Stata to Word. I am using:

Code:

univar SIZE FFLOAT

Also, I would like to break down the descriptive statistics by a dummy variable named CRDELIST (which equals 1 for cross-delisting and 0 for cross-listing).
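In case it is useful, one possible route (a sketch, assuming the community-contributed estout package is installed via ssc install estout; an .rtf file opens in Word) is -estpost tabstat- with the by() option, fed to -esttab-:

```stata
estpost tabstat SIZE FFLOAT, by(CRDELIST) ///
    statistics(mean min max sd p50) columns(statistics)
esttab using descriptives.rtf, cells("mean min max sd p50") ///
    noobs nonumber nomtitle replace
```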

Thanks

b) I have never done path analysis

c) I'm a sociologist, so I suck at math

I downloaded pathreg via the -findit- command and I've been using this FAQ http://www.ats.ucla.edu/stat/stata/faq/pathreg.htm. I tried a very simple model with just 3 variables and I think it worked.

But I need to be able to do this classic model by Blau and Duncan http://dspace.library.uu.nl/bitstrea...802/image2.gif

What happens if some cases have missing data?

Any other advice on path analysis or Stata in general?

THANKS!

I prepared the following example. Consider this dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte S002EVS double S017 byte(F050 F051 F052 F053 F054 F055 F059)
1 1.0042262823081847 0 .a 0 0 0 0 0
1 1.2452805990773763 1 1 1 1 1 1 1
1 .9072044203720419 .a .a .a .a .a .a .a
1 1.0972472427211808 0 0 0 0 0 0 0
1 .9072044203720419 0 1 0 0 0 0 0
1 1.2772878112625108 1 .a 1 0 1 0 0
1 .6291417645137987 1 .a 1 0 1 .a 1
1 .6811534843146201 1 0 0 0 0 0 0
2 .3732121389114244 1 .a 1 0 0 1 0
2 .3732121389114244 1 1 1 0 1 1 1
2 2.3170253624084234 0 1 0 0 0 1 0
2 .3732121389114244 1 1 1 0 0 1 0
2 .3732121389114244 .a 0 0 0 0 0 0
2 2.3170253624084234 .a 0 0 0 0 0 0
2 .3732121389114244 1 .a 1 1 1 1 1
2 .3732121389114244 .a 0 1 0 0 0 0
2 .3732121389114244 1 0 1 .a .a 1 1
2 .3732121389114244 0 0 0 0 0 0 0
2 .3732121389114244 1 0 0 0 0 0 0
2 1.555050578797592 1 1 1 1 1 1 1
2 1.5083990614336737 .a 0 0 0 0 0 0
2 .6686717488829684 0 0 0 0 0 0 0
2 1.5083990614336737 1 .a 1 .a .a .a .a
2 1.897161706133059 0 0 0 0 0 0 0
2 .6686717488829684 1 0 .a 0 0 0 0
2 .1244040463038087 1 1 1 0 0 1 0
3 1.4629639878194398 .a .a .a .a .a 1 .a
3 1.212610284928649 0 .a .a 0 0 0 .a
3 1.0152899045483108 0 0 .a 0 0 0 .a
3 1.1133530127472218 0 0 .a 0 0 0 .a
3 2.009412926773146 1 .a .a 0 0 1 .a
3 1.119732314482523 0 0 .a 0 0 1 .a
3 .8222739038680567 1 1 .a 0 0 1 .a
3 1.316188761651862 0 0 .a 0 0 0 .a
3 .8831216549330272 0 0 .a .a 0 0 .a
3 1.119732314482523 1 0 .a 0 0 1 .a
3 1.316188761651862 1 1 .a 0 0 0 .a
3 .8831216549330272 1 1 .a 1 1 1 .a
3 .44226835369296097 1 1 .a 1 1 1 .a
4 .7997490250066603 0 0 .a 0 0 0 .a
4 .7836737805734815 1 1 .a 1 1 1 .a
4 .9710811465300937 1 1 .a 1 1 1 .a
4 1.1524775830534932 0 0 .a 0 0 0 .a
4 1.4225175893045623 1 1 .a 0 1 0 .a
4 .7997490250066603 1 0 .a 0 0 0 .a
4 1.4225175893045623 1 1 .a 1 1 1 .a
4 1.3632131789935933 1 1 .a 0 0 0 .a
4 .7997490250066603 0 0 .a 0 0 0 .a
4 .7644363603754695 1 .a .a 0 0 0 .a
4 .665211916477448 1 0 .a 0 1 1 .a
end
label values S002EVS S002EVS
label def S002EVS 1 "1981-1984", modify
label def S002EVS 2 "1990-1993", modify
label def S002EVS 3 "1999-2001", modify
label def S002EVS 4 "2008-2010", modify

It represents the values of seven dummy variables (F050–F055 and F059) for a single country over four survey waves.

My goal is to reproduce this output:

[attached image: the target output table]

Note that the picture represents one country in one year, but I want to show the values of the variables across the four survey waves.

I started to implement this code:

Code:

sort S002EVS S003
by S002EVS S003: tabulate S002 F050 [iw=S017]

(Example)

S002EVS = 1981-198, S003 = Belgium

no observations

How can I reproduce that table?
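Not necessarily the exact table in the picture, but since the variables are 0/1 dummies, one way to get their weighted shares for each of the four waves (a sketch using only the variables in the -dataex- extract above) is to -collapse- on the weighted mean:

```stata
* weighted proportion of 1s for each dummy, by survey wave (.a is ignored)
preserve
collapse (mean) F050 F051 F052 F053 F054 F055 F059 [iw=S017], by(S002EVS)
list, noobs
restore
```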

I hope I was clear enough.

Thanks for the attention.


I would like to ask you for help regarding my thesis. I am writing about FDI and its determinants. I have data for 38 countries over a period of 18 years, and my initial fixed-effects model shows a very low R2: only 6%.

So I decided to take three-year averages of the same data and run the FE model again. The significance of the variables is almost the same, but this time the R2 is almost 80%. Could you advise me on what the reason could be?

I have to admit I am not very skilled in econometrics, and I am not sure FE is even the right model; I was only able to test that it is better than RE.

Second, how can I test for heteroscedasticity or autocorrelation after running FE? And is it even necessary?
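On the second question, two community-contributed tests are commonly used after -xtreg, fe- (a sketch with hypothetical variable names; both commands are from SSC):

```stata
* ssc install xttest3   // modified Wald test for groupwise heteroskedasticity
* ssc install xtserial  // Wooldridge test for first-order serial correlation
xtset country year
xtreg fdi x1 x2, fe
xttest3                  // H0: error variance constant across panels
xtserial fdi x1 x2       // H0: no first-order autocorrelation
```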

Thank you very much for your help.

I have been running panel-data fixed-effects models with robust standard errors but do not get the intuition behind why I am doing this. I understand clustered standard errors for pooled OLS models (because of correlation in the error across time), but I am confused in the fixed-effects case, so it would be great if someone could clarify why I am doing this. Essentially, all I know is that they are robust to heteroskedasticity and autocorrelation, but I do not see how that applies to my panel data. Thanks in advance!
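For what it's worth on the fixed-effects question above: in recent Stata versions, -xtreg, fe- with vce(robust) is implemented as clustering on the panel variable, so the two calls below should report identical standard errors (hypothetical variable names):

```stata
xtset id year
xtreg y x1 x2, fe vce(robust)       // for xtreg, fe, robust implies clustering
xtreg y x1 x2, fe vce(cluster id)   // ... on the panel variable id
```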

I had previously seen from Austin Nichols's slides here that -biprobit- can be used for endogenous switching (self-selection) with a binary treatment and binary response, but I did not know it could be extended to a panel-data context. For the cross-sectional case, the command would simply be something like:

Semykina & Wooldridge (2015) suggest that the above command can be modified for panel data, explaining briefly in a footnote:

in Stata estimating treatment effects can be implemented by **pooling the data and estimating the augmented equation (with time averages) using the “biprobit” command**. Standard errors robust to serial dependence can be obtained using “**cluster**” option.

Can anyone here provide more details on how to implement this in Stata? For example, what do they mean by "the augmented equation (with time averages)"? What are these time averages? Should I include year dummies? Is the panel structure dealt with via random effects in this method?
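My reading, which is an assumption rather than a verified reproduction of the paper, is that "time averages" refers to the Mundlak/Chamberlain device: for each time-varying covariate, add its within-panel mean as an extra regressor, then pool. A sketch with hypothetical variables y (response), d (treatment), x1 x2 (covariates), z (exclusion restriction), and panel identifier id:

```stata
foreach v of varlist x1 x2 {
    bysort id: egen tbar_`v' = mean(`v')   // panel-level time averages
}
biprobit (y = d x1 x2 tbar_x1 tbar_x2 i.year) ///
         (d = z x1 x2 tbar_x1 tbar_x2 i.year), vce(cluster id)
```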

Also, are there ways to get

ALTERNATIVES:

As an alternative to the -biprobit- command, I think there must be a way to do this with the -cmp- command by David Roodman in a manner similar to that discussed in posts such as this or this. Any guidance on how to exactly implement the self selection case and calculate ATE and ATET with the -cmp- command would also be appreciated.

The -biprobit- command looks more attractive to me at this point because my panel data has a survey structure with probability weights and -biprobit- works with the -svy:- prefix. Although the requirement to use vce(cluster) noted by Semykina & Wooldridge (2015) will probably not let me use the -svy:- prefix anyway.

Another approach, based on control functions, is outlined in Murtazashvili & Wooldridge (2016), but I can't find Stata code for that either. I know the control function approach is what is used by Stata's -eteffects- command, but that command also does not handle panel data.

Murtazashvili & Wooldridge (2016) - A control function approach to estimating switching regression models with endogenous explanatory variables and endogenous switching

Semykina & Wooldridge (2015) - Binary response panel data models with sample selection and self selection

Is there an equivalent command to -esttab- that I can use to create a table of -margins- results?
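One approach I believe works (a sketch using a built-in dataset): -margins- with the post option leaves its results in e(b) and e(V), which -esttab- (from the estout package on SSC) can then tabulate like any stored estimation result:

```stata
sysuse auto, clear
regress price i.foreign c.weight
margins foreign, post
estimates store mymargins
esttab mymargins, cells("b se") label
```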

Thanks,

Anat

I am dealing with a panel data set of 31 provinces over 16 years, examining the amount of development finance as the logged Y variable against a set of X variables. I am using Stata 12.1 and using both -reg- and -xtreg- to examine the effect (I use -reg- only to justify an FE analysis; it is not the centre of my research).

The problem is: province number 31 does not receive development finance for 11 of the 16 years, so the logged Y variable is 0 for those 11 years. My question is how to deal with this province in the fixed-effects analysis, because it changes significant variables to insignificant in the results.

I have tried both excluding the province and including the dummy i.province31 in -reg-, but the dummy is omitted in -xtreg-. So I excluded the province and compared -xtreg- with and without province 31; the R-squareds are the following:

with province31: R-sq: within = 0.0375

without province31: R-sq: within = 0.1105

The two codes I used are:

Code:

xtreg lnFINPP_PY2 GRPpc_L rSTBDGT_L TOPOP_L rFRBDGT_L EAST , cluster(province_id) fe

Code:

xtreg lnFINPP_PY2 GRPpc_L rSTBDGT_L TOPOP_L rFRBDGT_L EAST if(province_id < 31), cluster(province_id) fe

[attached: regression outputs with and without province 31]

I will attach both outputs; I hope you can see them and they are readable. I am new to Stata and thankful for any suggestions. Thank you

Mei

Does any of you know how to code a calculation of person-time on an MI dataset?

I have tried:

mi estimate: by(varname) per(1000) dd(4)

as I would if the data were not mi-set.

But it returns error r(198): invalid "by".

Thanks.

Kind regards

Marie
