I tried to figure out a problem that came up when I used xtsum yesterday to inspect a new panel data set.

The within variation for the variable "c_id" should be 0, and it is. But for the variable "year" the between-variation is not 0, which is what it should be. A value like 2008.5 is not allowed - only full years. Also, my dummy variable for the year 2004 does not look right...

Do you have an explanation for this? T-bar also looks odd: I am analysing an 8-year period. I have already visually inspected the data and checked for year values that are not integers, but maybe there is another way to check for such deviations?

Thank you!

Stata 11.2

. sort c_id year
. xtset c_id year
. xtsum c_id year dummy04

Variable         |      Mean   Std. Dev.        Min        Max |    Observations
-----------------+---------------------------------------------+----------------
c_id     overall |  1319.154    1049.926          1       4611 |     N =   64340
         between |              1048.901          1       4611 |     n =    4611
         within  |                     0   1819.384   1819.384 | T-bar = 6.67492
                 |                                             |
year     overall |  2007.994    2.026894       2004       2011 |     N =   64340
         between |              .4408717     2006.5     2008.5 |     n =    4611
         within  |               1.97871   2004.137   2011.851 | T-bar = 6.67492
                 |                                             |
dummy04  overall |  .0438763     .204824          0          1 |     N =   64340
         between |              .0592206          0         .2 |     n =    4611
         within  |              .1954326  -.1561237   .9188763 | T-bar = 6.67492
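For what it's worth, one quick way to confirm that every year value is a whole number (a sketch, not from the original post) is to compare year with its truncated value:

```stata
* flag and inspect any non-integer year values
count if year != floor(year)
list c_id year if year != floor(year)

* stop with an error if any remain
assert year == floor(year)
```

Note that with an unbalanced panel (T-bar below 8 here), non-zero between-variation in year is expected even when all year values are integers, because each panel's mean year can differ.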

I'm trying to run an analysis with plausible values. The dependent variable is students' English skills. Because the study design used incomplete booklets, 5 plausible values were created for the test results.

To make things more complicated, it is a multilevel data structure (Students nested in classes, nested in schools, nested in school types).

Now for the start I wanted to calculate a simple model with only gender, age and migration status as independent variables on the individual level and no variables on the other levels.

It looks like it worked (at least there was no error message), but the output looks quite different from what I'm used to, and I would be very thankful for a few hints from some of you.

This is the command I used:

And this is the output I got:

Estimates for pv2 complete
Estimates for pv3 complete
Estimates for pv4 complete
Estimates for pv5 complete

Number of observations: 598
Average R-Squared: .

                   Coef         Std Err     t            t Param   P>|t|
pv5: Age          -.00253257    .0023581   -1.0739901    .         .
pv5: Sex           .16583687    .06449874   2.5711647    .         .
pv5: Deutsch       .05954325    .06045505   .98491764    .         .
pv5: ISEI          .00362554    .00172722   2.0990664    .         .
pv5:_cons         -.19917917    .11831648  -1.683444     .         .
lns1_1_1:_cons    -1.8443343    55.885289  -.03300214    .         .
lns2_1_1:_cons    -1.8443525    55.917522  -.03298344    .         .
lns3_1_1:_cons    -1.8443384    55.922589  -.0329802     .         .
lnsig_e:_cons     -.59434564    .03554486  -16.721       .         .

Is this what it is supposed to look like? If it is, why is there no indication of significance in the last two columns?

I'm quite confused. I wasn't able to find a useful description anywhere.

Could someone help me out here?

Thanks in advance

Minka

P.S.: If there is already a thread on this topic, please let me know.


I run the following 2 commands:

gsort var1 var2 -var3

by var1 var2 -var3: gen n=_n

Now Stata tells me that the data are not sorted, so I can't generate _n (I guess -by- does not accept the minus sign used in the gsort command).

My question is now whether there is still a possibility to generate _n with var3 sorted in descending order (so var3 descending while n ascending)?
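One possible workaround (a sketch using the variable names above): keep -gsort- for the descending order, but drop the minus-sign variable from the -by- list, since gsort has already arranged var3 in descending order within each var1 var2 group:

```stata
gsort var1 var2 -var3
by var1 var2: gen n = _n
```

Within each var1 var2 group, n then counts upward while var3 runs downward.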

Thank you!

I have generated 30 year dummy variables using

tabulate year, gen(y)

and renamed the first of them with

rename y1 y1971

I would now like to rename the remaining dummies with a foreach loop.
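A loop along these lines might handle the renaming, assuming y1-y30 correspond to the years 1971-2000 (that mapping is an assumption extrapolated from the single rename above; if y1 has already been renamed by hand, run the loop over 2/30 instead):

```stata
* hypothetical mapping: y1 = 1971, ..., y30 = 2000
forvalues i = 1/30 {
    local yr = 1970 + `i'
    rename y`i' y`yr'
}
```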

Thank you for your help.

I thought there was a command for mixed logit models (mixlogit) in previous versions, but it seems I can't use mixlogit anymore in Stata 13. Does anybody know what command is now used for mixed logit?
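In case it helps: -mixlogit- is a user-written command distributed through SSC rather than part of official Stata, so after an upgrade it may simply need to be (re)installed. A possible check:

```stata
* reinstall the user-written command from SSC
ssc install mixlogit, replace
help mixlogit
```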

Thank you very much in advance.

Woohyun

1. How can I access the 3 data sets used in this FAQ?

2. Do you have any insights/suggestions on approaches to use when the missing variable is a group-level variable? (I think the examples in the FAQ are all based on scenarios in which the missing values are in variables that vary within group.)

Thank you

Ian Dohoo

I am using Stata v10. I am performing a meta-analysis of diagnostic test accuracy. When there are 4 or more studies, I've been performing a bivariate analysis using the 'midas' command without difficulty. However, when that model fails to converge, or when I have only three studies, I've been using 'metan' to determine the logit sensitivities and specificities with their respective 95% CIs. This works very well, but I also need to calculate the LR+ / LR- / DOR and I2 along with their 95% CIs. I can calculate the LR+/LR- and DOR manually, but I am not sure how to calculate their respective 95% confidence intervals.

Here is an example for the sensitivity point estimate:

. list studyid tp fp fn tn if parameter2==4

     +-----------------------------+
     | studyid   tp   fp   fn   tn |
     |-----------------------------|
 20. |      20    9    8   17   31 |
 21. |      21    4    2    6    9 |
 22. |      22   50   42   15   43 |
     +-----------------------------+

. metan b1 se_b1 if parameter2==4, randomi nograph z    /* summary logit sensitivity */

               Study |       ES   [95% Conf. Interval]   % Weight
---------------------+-------------------------------------------
                  20 |   -0.636     -1.444       0.172      34.34
                  21 |   -0.405     -1.671       0.860      29.09
                  22 |    1.204      0.627       1.781      36.57
---------------------+-------------------------------------------
     D+L pooled ES   |    0.104     -1.239       1.447     100.00
---------------------+-------------------------------------------

Heterogeneity chi-squared = 15.25 (d.f. = 2) p = 0.000
I-squared (variation in ES attributable to heterogeneity) = 86.9%
Estimate of between-study variance Tau-squared = 1.1972
Test of ES=0 : z = 0.15  p = 0.879

. di invlogit(.104)     /* sensitivity point estimate */
.52597659

. di invlogit(-1.239)   /* sens LCI */
.2246101

. di invlogit(1.447)    /* sens UCI */
.8095363

The same is easily done with the specificity. I can then use the sensitivity and specificity to generate the LR+ and LR- and diagnostic odds ratio.

However, I do not know how to calculate the 95% confidence intervals for the LR+ / LR- or DOR. Furthermore, I do not know how I can calculate the 95%CI for the I-squared statistic.
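For a single 2x2 table, a standard delta-method approach puts the interval on the log scale; for the DOR, se(ln DOR) = sqrt(1/tp + 1/fp + 1/fn + 1/tn). A sketch using study 20 from the listing above (the choice of study is purely illustrative):

```stata
* 2x2 counts for study 20 (tp fp fn tn from the listing)
local tp 9
local fp 8
local fn 17
local tn 31

* log diagnostic odds ratio and its delta-method standard error
local lndor = ln((`tp'*`tn')/(`fp'*`fn'))
local se    = sqrt(1/`tp' + 1/`fp' + 1/`fn' + 1/`tn')

di "DOR    = " exp(`lndor')
di "95% CI = " exp(`lndor' - 1.96*`se') " to " exp(`lndor' + 1.96*`se')
```

Analogous log-scale standard errors exist for LR+ and LR- (e.g. se(ln LR+) = sqrt(1/tp - 1/(tp+fn) + 1/fp - 1/(fp+tn))). Pooled versions, and a confidence interval for I-squared, are a different matter and may require additional user-written routines.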

Any help would be greatly appreciated.

Don Griesdale


I'm trying to match a pattern in a plain text file containing line breaks, e.g. from...

Authors

Mueller A. Candrian G. Kropotov JD. Ponomarev VA. Baschera GM.

Authors Full Name

Mueller, Andreas. Candrian, Gian. Kropotov, Juri D. Ponomarev, Valery A.

... I'm trying to extract the authors, i.e. the second line of the text. The word "Authors" is followed by a line break, then two blanks (which you probably can't see because they've been stripped off by the forum software), then come the authors, and the line ends with another line break.

I've tried to match this using:

loc rgx (Authors[^!-~] )(.*)([^!-~])
replace authors = regexs(2) if regexm(abstract, "`rgx'")

How can I match the CR/LF character? I've searched far and wide, but there doesn't seem to be any dedicated regex symbol in Stata?
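One approach that seems to work with Stata's regex functions is to splice the end-of-line characters into the pattern as literals via char(), since there is no dedicated escape sequence (this sketch assumes the file uses CR/LF or plain LF line endings, and the two-blank indent described above):

```stata
* newline characters have no regex escape in Stata, but can be
* inserted into the pattern as literal characters via char()
local cr = char(13)
local lf = char(10)
local rgx "Authors`cr'?`lf'  ([^`cr'`lf']*)"
replace authors = regexs(1) if regexm(abstract, "`rgx'")
```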

Alex

Equipment: Mac OS 10.10, Stata 13

Please help me to address the following issue. I will keep my example simple:

1. DV - rating score (0-100) - the data for which were collected in March 2014; the rating itself was published in May 2014.

2. IV - number of Twitter followers (0 to infinity) - data collected at the end of July 2014.

I am afraid that the reviewers may point out that the IV was measured months after the DV.

My initial idea was to (A) re-collect the IV data (which I did today for a randomly selected 20% of entities), and (B) run a paired t-test to see whether there is any significant increase (or decrease) over time. The null was rejected, that is, the increase in the mean over time is significant. However, I am not sure this is the correct approach, because there are really many possible factors that could account for this significant increase.
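For reference, the paired comparison on the re-collected subsample can be run directly with -ttest- (the variable names here are hypothetical):

```stata
* followers_old: original July 2014 count; followers_new: re-collected count
* paired t-test across the re-sampled entities
ttest followers_old == followers_new
```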

Thank you in advance,

Anton

The independent variables are financial performance measures = ROA, ROE etc.

I have 16 banks in each of the two samples.

The years are from 2002 to 2013.

Can someone please recommend a model that I can build in Stata?
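As a sketch only (the outcome variable and identifiers below are hypothetical, since they are not named above), a fixed-effects panel regression for bank-year data might look like:

```stata
* declare the panel structure: banks observed over 2002-2013
xtset bank_id year

* fixed-effects model of a hypothetical outcome on the performance measures
xtreg outcome roa roe i.year, fe
```

Whether fixed effects, random effects, or a dynamic specification is appropriate depends on the research question; a Hausman test is one common way to compare the first two.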

Many thanks,

JC