Thank you so much

Luke Brown

Code:

xtreg riskavers hhsex age educ race logsaving crisis logincome, vce(cluster YY1)

Random-effects GLS regression                   Number of obs      =     31879
Group variable: YY1                             Number of groups   =      6551

R-sq:  within  = 0.2245                         Obs per group: min =         1
       between = 0.2259                                        avg =       4.9
       overall = 0.2260                                        max =         6

                                                Wald chi2(7)       =  11592.78
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

                                 (Std. Err. adjusted for 6551 clusters in YY1)
------------------------------------------------------------------------------
             |               Robust
   riskavers |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hhsex |  -.0983859    .007495   -13.13   0.000    -.1130758    -.083696
         age |  -.0046486   .0001839   -25.27   0.000     -.005009   -.0042881
        educ |   .0414245   .0010909    37.97   0.000     .0392863    .0435627
        race |  -.0267025   .0031038    -8.60   0.000    -.0327859   -.0206191
   logsaving |   .0056858   .0005938     9.57   0.000     .0045219    .0068497
      crisis |  -.0392017   .0055675    -7.04   0.000    -.0501137   -.0282897
   logincome |   .0899202   .0021053    42.71   0.000     .0857939    .0940464
       _cons |   .6533855   .0272177    24.01   0.000     .6000397    .7067313
-------------+----------------------------------------------------------------
     sigma_u |  .05883202
     sigma_e |  .48710283
         rho |  .01437794   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Code:

mlogit riskavers hhsex age educ race logsaving crisis logincome, baseoutcome(1) vce(cluster YY1)

Iteration 0:   log pseudolikelihood = -26224.432
Iteration 1:   log pseudolikelihood =  -21636.01
Iteration 2:   log pseudolikelihood = -21376.921
Iteration 3:   log pseudolikelihood = -21374.942
Iteration 4:   log pseudolikelihood = -21374.941

Multinomial logistic regression                 Number of obs      =     31879
                                                Wald chi2(14)      =   5973.95
                                                Prob > chi2        =    0.0000
Log pseudolikelihood = -21374.941               Pseudo R2          =    0.1849

                                 (Std. Err. adjusted for 6551 clusters in YY1)
------------------------------------------------------------------------------
             |               Robust
   riskavers |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |  (base outcome)
-------------+----------------------------------------------------------------
2            |
       hhsex |  -.3511427   .0340069   -10.33   0.000     -.417795   -.2844904
         age |   -.020097   .0008849   -22.71   0.000    -.0218314   -.0183627
        educ |   .2007731   .0058042    34.59   0.000      .189397    .2121492
        race |   -.173934   .0142654   -12.19   0.000    -.2018938   -.1459743
   logsaving |   .0406524   .0033108    12.28   0.000     .0341633    .0471415
      crisis |  -.2012526   .0283804    -7.09   0.000    -.2568771   -.1456281
   logincome |   .5681068    .015723    36.13   0.000     .5372903    .5989234
       _cons |   -6.04118     .18116   -33.35   0.000    -6.396247   -5.686113
-------------+----------------------------------------------------------------
3            |
       hhsex |  -.3277998   .0828562    -3.96   0.000    -.4901949   -.1654046
         age |  -.0365087   .0021477   -17.00   0.000    -.0407182   -.0322993
        educ |   .1516283   .0124127    12.22   0.000     .1272998    .1759568
        race |  -.0236554    .026387    -0.90   0.370    -.0753731    .0280622
   logsaving |  -.0086072   .0064052    -1.34   0.179    -.0211612    .0039468
      crisis |   -.183439   .0579029    -3.17   0.002    -.2969265   -.0699515
   logincome |   .7799633   .0246435    31.65   0.000     .7316628    .8282637
       _cons |   -9.82149   .2998162   -32.76   0.000    -10.40912   -9.233861
------------------------------------------------------------------------------

The question is, how do I approach analysis of this data? Do I reshape the data to wide with i(caseid) and j(time)? Is it possible to compare the same individuals through time while comparing the villages to each other?
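For reference, the long-to-wide reshape mentioned would look like this (a sketch; y stands for whatever time-varying variables are in the data, and only caseid and time are taken from the post):

```stata
* wide: one row per caseid, with columns y1, y2, ... for each time period
reshape wide y, i(caseid) j(time)
```

That said, most of Stata's panel commands (xtset, xtreg) expect long format, so reshaping to wide is usually only needed for specific calculations.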

Any guidance would be appreciated.


I have a dataset of weekly sales from a farmers market (there are many gaps in the data, especially as the market is closed several months each year). I also have sales by individual vendor type. I have one time-series analysis that looks at aggregated sales (across all vendor types), and it is clear there is an autoregressive process that must be included. With the multilevel analysis, I want to model the random effects of vendor type. When I do this and include the residual structure, convergence is very slow (in fact, I have not been able to obtain any results from the analysis). Here is the code for that, followed by sample data. Does anybody have any suggestions for improving the convergence? Maybe there is something I am overlooking.

Thank you!

-Steve

Code:

tsset vend_id date3
mixed lnsales_type10v lnspec_index ///
    || vend_type: , mle residuals(ar 1, by(vend_type) t(date3))

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(lnsales_type10v lnspec_index) str9 vend_type float date3
3.505557 1.6897603 "spec" 19230
3.987673 1.6897603 "nonedible" 19230
4.1076307 1.6897603 "plants" 19230
4.795667 1.776106 "plants" 19181
4.824479 1.609438 "nonedible" 19811
4.922168 1.670123 "nonedible" 19573
4.928991 1.7769105 "nonedible" 19552
5.108971 1.684198 "nonedible" 19608
5.161642 1.7881165 "plants" 19972
5.166727 1.609438 "produce" 19811
5.192368 1.5377253 "produce" 19762
5.200264 1.8057457 "nonedible" 19895
5.201256 1.7337543 "produce" 19321
5.20396 1.609438 "plants" 19811
5.228194 1.764295 "plants" 19209
5.232668 1.7796197 "plants" 19202
5.232712 1.5377253 "nonedible" 19762
5.236389 1.7176515 "produce" 19790
5.243749 1.6897603 "value" 19230
5.296916 1.609438 "spec" 19811
5.327488 1.5145186 "produce" 19398
5.336576 1.6397433 "produce" 19426
5.339828 1.789192 "plants" 19216
5.380818 1.7403235 "nonedible" 19615
5.400657 1.764015 "plants" 19272
5.414137 1.7466246 "plants" 19188
5.414655 1.7120484 "nonedible" 19937
5.43238 1.728827 "plants" 19223
5.452424 1.79868 "nonedible" 19916
5.457228 1.7917595 "plants" 19195
5.487449 1.670238 "nonedible" 19601
5.488313 1.6897603 "meat" 19230
5.498344 1.8590926 "plants" 19545
5.50289 1.7026595 "produce" 19447
5.505159 1.771281 "nonedible" 19174
5.510217 1.7940716 "nonedible" 19517
5.514417 1.754488 "plants" 19643
5.521911 1.7149657 "nonedible" 19496
5.530543 1.7574022 "nonedible" 19559
5.544161 1.628456 "produce" 19454
5.547513 1.7149657 "plants" 19496
5.570927 1.78413 "plants" 19251
5.574079 1.688008 "produce" 19818
5.577326 1.74204 "plants" 19958
5.579881 1.8174162 "nonedible" 19888
5.588661 1.7403235 "plants" 19615
5.596269 1.7881165 "nonedible" 19972
5.600256 1.7897793 "plants" 19944
5.618566 1.655423 "plants" 19657
5.626764 1.628456 "nonedible" 19454
5.6287 1.704262 "nonedible" 19587
5.645447 1.686756 "nonedible" 19489
5.646227 1.684198 "plants" 19608
5.658646 1.776106 "nonedible" 19181
5.663717 1.7120484 "plants" 19937
5.677672 1.704262 "plants" 19587
5.677943 1.754019 "plants" 19930
5.678123 1.6464264 "nonedible" 19580
5.683178 1.6945958 "nonedible" 19909
5.686351 1.7278557 "nonedible" 19244
5.691782 1.7412496 "plants" 19951
5.709225 1.7013754 "plants" 19636
5.710659 1.754019 "nonedible" 19930
5.714527 1.609438 "meat" 19811
5.721739 1.7721362 "plants" 19650
5.721868 1.7278557 "nonedible" 19265
5.744672 1.5377253 "plants" 19762
5.746185 1.7796197 "nonedible" 19202
5.747989 1.762772 "plants" 19279
5.748619 1.6945958 "plants" 19909
5.760365 1.688008 "plants" 19293
5.762837 1.7574022 "plants" 19559
5.765787 1.754488 "nonedible" 19643
5.766636 1.699146 "plants" 19160
5.767956 1.6897603 "produce" 19230
5.770072 1.789192 "nonedible" 19216
5.771064 1.727043 "plants" 19622
5.778542 1.7562047 "nonedible" 19860
5.786503 1.781418 "nonedible" 19146
5.789114 1.764295 "nonedible" 19209
5.792663 1.670123 "plants" 19573
5.793485 1.7769105 "plants" 19552
5.800488 1.670238 "nonedible" 19461
5.805037 1.7512915 "nonedible" 19139
5.806499 1.6028227 "nonedible" 19965
5.812849 1.776106 "plants" 19566
5.821338 1.7647308 "spec" 19832
5.827877 1.4319825 "nonedible" 19797
5.831261 1.6464264 "plants" 19580
5.844341 1.6846718 "plants" 19629
5.845448 1.670238 "plants" 19601
5.846124 1.7466246 "nonedible" 19188
5.857612 1.6479865 "plants" 19034
5.861122 1.7104533 "nonedible" 19104
5.869641 1.79868 "plants" 19916
5.870623 1.720624 "plants" 19111
5.872681 1.6672574 "produce" 19825
5.876374 1.7647308 "nonedible" 19832
5.891682 1.7940716 "plants" 19517
5.892911 1.7278557 "plants" 19265
end
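For what it's worth, convergence of mixed models with AR(1) residuals can sometimes be helped by simplifying the model first and by the difficult and emiterate() options. A sketch, not a guaranteed fix:

```stata
* step 1: fit without the AR(1) structure to check the basic model converges
mixed lnsales_type10v lnspec_index || vend_type: , mle

* step 2: full model, with extra EM iterations and the alternative stepping
* algorithm for difficult likelihoods
mixed lnsales_type10v lnspec_index || vend_type: , mle ///
    residuals(ar 1, t(date3)) emiterate(100) difficult
```

Dropping by(vend_type) inside residuals(), as in step 2, estimates a single AR(1) parameter rather than one per vendor type, which reduces the burden on the optimizer; whether that restriction is substantively acceptable is a modeling judgment. Note also that with only a handful of vendor types there are very few level-2 units, which itself can make the random-effect variance hard to estimate.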

I have a population of people with diagnosed end-stage renal disease (ESRD). The data comes from a national registry of people with ESRD, established in 1960.

I am estimating annual rates of amputations in this population between 2000 and 2015, age-standardised.

Below I provide the code I have used to obtain this data.

stset dox1, fail(lea1==1) origin(born) entry(entry) scale(365.25) id(usrds_id)

stsplit _year, after(time=d(1/1/2000)) at(0(1)15) trim

replace _year=2000 + _year

gen pop = 1

gen agecat = .

replace agecat = 1 if _t0>17 & _t0<45

replace agecat = 2 if _t0>=45 & _t0<65

replace agecat = 3 if _t0>=65 & _t0<75

replace agecat = 4 if _t0>=75 & _t0<.

collapse (sum) pop lea1, by(_year agecat)

where dox1 is the date of exit (date of amputation, date of death, or 31 December 2015, whichever occurred first)

lea1 = amputation event

entry = 1 January 2000, or date of ESRD registration if later

Once I have counts of amputations and 'population at risk' by year and age group, I run the following command to obtain age-standardised results by year:

foreach x of varlist lea1 {

set more off

qui dstdize `x' pop agecat, by(_year) using("2000_pop")

putexcel set lea.xlsx, sheet("`x'", replace) modify

matrix C = r(Nobs)', r(crude)'*1000, r(adj)'*1000, r(lb_adj)'*1000, r(ub_adj)'*1000, r(se)'*1000

matrix rowname C = 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015

matrix list C

putexcel A1=("year") B1=("pop") C1=("`x'crude") D1=("`x'Rate") E1=("`x'LL") F1=("`x'UL") G1=("`x'SE") A2=matrix(C, rownames)

}

Overall, I show that between 2000 and 2009 rates of amputation declined, but thereafter they did not change. I am trying to explore possible reasons for this. One element I want to explore is disease duration: with increased survival in this population (leading to increased disease duration), is the 'population at risk' in more recent years different from that in earlier years, such that amputations are less likely?

My question: people in my dataset have varying disease durations (from date of ESRD diagnosis to dox1). On average, disease duration increased by 2 years between 2000 and 2015. Is there a way to adjust/standardise for disease duration in this dataset?
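On the standardisation question, one hedged option in the spirit of the dstdize workflow already shown is to standardise over the joint age-by-duration distribution. A sketch (esrd_date and the duration cutoffs are hypothetical; given origin(born) and scale(365.25), the calendar date at the start of each split interval is born + _t0*365.25):

```stata
* years from ESRD diagnosis to the start of each risk interval
gen double dstart   = born + _t0*365.25
gen double duration = (dstart - esrd_date)/365.25   // esrd_date: hypothetical diagnosis date
gen byte durcat = cond(duration < 2, 1, cond(duration < 5, 2, cond(duration < 10, 3, 4)))

egen stdcat = group(agecat durcat)
collapse (sum) pop lea1, by(_year stdcat)
dstdize lea1 pop stdcat, by(_year) using("2000_pop_dur")
```

where 2000_pop_dur would hold the year-2000 joint age-duration distribution, analogous to the existing 2000_pop file.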

Many thanks

Jess

I am attempting to fit a generalised structural equation model (gsem) to panel data, with y as a categorical response.

I wanted to constrain the error term in the structural equation to 0, as follows:

var(e.y@0)

Stata returns the message that "invalid covariance specification; e.y does not identify an error for an uncensored gaussian response with the identity link"

Does gsem include an error term by default? sem estimates error terms. I read the manual and couldn't work out whether gsem estimates error terms by default. Should I include a latent variable as the disturbance?
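For what it's worth: in gsem, additive error terms such as e.y exist only for Gaussian-family responses, which is what the error message is pointing at; for a categorical (e.g. ologit) response there is no e.y to constrain. If a disturbance is genuinely needed, it can be introduced explicitly as a latent variable with its loading and variance fixed. A sketch with hypothetical names x and E:

```stata
* E plays the role of a disturbance on the latent index of y
gsem (y <- x E@1, ologit), var(E@1)
```

Whether such a model is identified depends on the rest of the specification; a free disturbance on a logit-type response is generally not identified on its own.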

Thank you very much for your very kind help!!!

Boshuo

PhD student, Imperial College London

Code:

local vars "i.female i.aframer i.asian i.latino i.other c.pared i.fedgrant i.sf_loansyest1"
noi mi estimate, or saving(miest, replace): ologit arts `vars', cluster(schoolid)
qui mi query
local M = `r(M)'
scalar r2 = 0
scalar cstat = 0
qui mi xeq 1/`M': ologit arts `vars'; scalar r2 = r2 + e(r2_p); lroc, nog; scalar cstat = cstat + r(area)
scalar r2 = r2/`M'
scalar cstat = cstat/`M'
noi di "Pseudo R-squared over imputed data = " r2
noi di "C statistic over imputed data = " cstat

Code:

qui mi xeq 1/`M': ologit arts `vars'; scalar r2 = r2 + e(r2_p); lroc, nog; scalar cstat = cstat + r(area)

last estimates not found

I had this command structure working for me last week on a series of logits, but for some reason the ologit version is not producing output.

Thanks for any help!

I'm using a corruption index as an explanatory variable. This index ranks countries from 0 to 100, where 100 is the least corrupt. Instead of using the index in its current form, I'm interested in creating a dummy variable based on the 25th percentile. The dummy would take the value 1 if the country is ranked among the most corrupt (bottom 25%) and 0 otherwise.
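For what it's worth, a compact way to do this (cpi is a hypothetical name for the corruption index variable):

```stata
* 25th percentile of the index; countries at or below it are 'most corrupt'
_pctile cpi, p(25)
gen byte mostcorrupt = cpi <= r(r1) if !missing(cpi)
```

If the data are a panel and the index varies over time, the percentile may need to be computed within year, e.g. with egen's pctile() function combined with bysort year.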

I've found different codes online, but they all seem to be more complex than needed and don't quite work with my data.

I'd highly appreciate any help.

Best wishes,

Henry


Each hospital (cluster) has its own random intercept; the variance of all the random intercepts is reported in the regression results as a random effect, the variance of the constant. In this example, the dependent variable is operative time (the duration of surgical procedures): a large random intercept represents longer-duration surgeries, while a small random intercept represents shorter-duration surgeries. The formula applied to estimate reliability for each cluster (hospital) is r = y/(y + q), where y is the variance of the random intercepts and q is the squared standard error of the cluster's predicted random intercept,

which is equation 2.4 from Rabe-Hesketh and Skrondal, Multilevel Modeling, vol. 1.

An example: the standard error of the individual random intercept for hospital A = 0.0502928, and the variance of all the random intercepts for the 20 hospitals = 0.0058763.

Reliability = 0.0058763/(0.0058763 + 0.0502928^2) = 0.70
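The arithmetic can be checked directly in Stata:

```stata
display 0.0058763/(0.0058763 + 0.0502928^2)
```

which returns roughly .699, i.e. 0.70 after rounding.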

The reliability for all of the surgeon clusters can be represented graphically using a lowess curve:


[Attachment: lowess curve of reliability across surgeon clusters]

I appreciate constructive comments.


Code:

svyset [pw=pwgtp]
gen ca_yoe02 = 0
replace ca_yoe02 = 1 if nativity==2 & yoep==2002 & st==06
svy: tab ca_yoe02, count format(%15.2g)

Second, how can I efficiently construct my desired variable (described above)? I figured I would use a sequence of the state codes and create a loop using code similar to that above. I'm not sure if this is the right way to go about it, or if I am on the wrong track. I am out of my league at the moment.
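One hedged sketch of the loop idea, using levelsof to avoid typing all the state codes by hand (variable names other than those in the post are unchanged):

```stata
svyset [pw=pwgtp]
levelsof st, local(states)
foreach s of local states {
    gen byte yoe02_st`s' = (nativity==2 & yoep==2002 & st==`s')
    svy: tab yoe02_st`s', count format(%15.2g)
}
```

Depending on the table wanted, a single tabulation such as svy: tab st, count restricted with if nativity==2 & yoep==2002 might yield the same counts in one command.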

P.S.

I combed through previous threads on the subject and found some relevant posts, but could not break through my own problem. I saw a similar problem concerning the differences between storing survey data and nonsurvey data, but not for the counts.

I have a theoretical question. Suppose I have Y (dependent variable), X1 and X2 (independent variables), plus a categorical variable RACE in long format, with 3 categories (white, black, other).

What is the difference in running:

Y X1 X2 RACE

and

Y X1 X2 i.RACE

I understand the meaning of i.RACE (providing estimates for 2 categories against the benchmark category), but I don't know what the coefficient for RACE alone means or how to interpret it. Which is the correct approach, and why? I am a little confused, since I found mixed opinions on the internet.
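To make the contrast concrete (regress is used purely for illustration; the point is the same for any estimation command):

```stata
* RACE entered as if it were continuous: a single slope, which assumes the
* "distance" from white (1) to black (2) equals that from black (2) to other (3)
regress Y X1 X2 RACE

* RACE entered as a factor variable: one coefficient per non-base category
regress Y X1 X2 i.RACE
```

With three unordered categories, i.RACE is the correct approach: the single coefficient on raw RACE has no meaningful interpretation, because it treats arbitrary category codes as if they were a measured quantity.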

Thx

Jack

Yet, I am pretty sure that I do not have 2048 variables in the file that I want to open. Is there any other reason why I might be getting this error?
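For what it's worth, the variable count of a .dta file can be inspected without loading it (myfile.dta is a placeholder):

```stata
describe using myfile.dta
```

If the count really is below the limit, note that the limit depends on the Stata edition: Stata/IC allows at most 2,047 variables, while SE and MP allow more via set maxvar.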

I have data that looks like the following:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(y x1 x2)
 23 23  42
123  3 324
  3 23   .
 21 32  32
  3  4   3
212 32   2
  2  3  32
end

In the above data, I have a regressand y, which I regress on covariates x1 and x2. In constructing the data, I need to work with the projections of y on x1 and x2 (I will use these fitted values as a regressor in an auxiliary regression). However, the constraint I face is the usual one: the prediction (y-hat) is not generated for row 3, where x2 is missing. While I understand why this is reasonable, I still need a fitted value for that row, so I would like to compute the predicted value despite the missing x2. One way to do this would be, for instance:

Code:

replace x2 = 0 if missing(x2)
predict yhat, xb
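A variant that produces the same fitted values without permanently overwriting x2 (yhat and x2_raw are illustrative names):

```stata
clonevar x2_raw = x2              // keep the original values
replace x2 = 0 if missing(x2)     // zero-fill only for prediction
predict yhat, xb
replace x2 = x2_raw               // restore the original variable
drop x2_raw
```

Bear in mind that zero-filling makes the fitted value for such rows equal to the prediction at x2 = 0, which may or may not be the fill-in actually wanted (the mean of x2 is another common choice).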

Thanks!

Chinmay

Date 1   | Variable 1 | Date 2   | Variable 2
1-1-2001 |          2 | 1-1-2001 |         13
2-1-2001 |          3 | 2-1-2001 |         23
3-1-2001 |          6 | 4-1-2001 |         12
4-1-2001 |          8 | 5-1-2001 |          8
5-1-2001 |         11 | 7-1-2001 |          5


As part of my thesis I look at the influence of size (the natural logarithm of total assets) and leverage (debt/equity) on the choice for a Big Four auditor (dummy variable: 1 = Big 4, 0 = non-Big 4).

I changed the log-odds into odds ratios, because I read on the internet that the interpretation would be easier. But could someone explain to me how I should interpret the odds ratio of 2.10 for size, given that size is a natural logarithm, and the odds ratio for leverage? I also read something about using margins to explain the effect, but then I get only one outcome for all the variables.

So: logit Big4 size leverage, followed by my control variables:

Table 6. Logit regression (dependent variable: Big4)

Variable    | Predicted sign | Coef.       | Odds ratio | z-value
CONSTANT    |                | -13.2069*** | 1.84e-06   |  -21.64
SIZE        | +              |   0.7430*** | 2.1023     |   26.72
LEV         | +              |   1.7508*** | 5.7594     |    7.86
ROA         | ?              |  -1.8180*** | 0.1623     |   -2.95
ASSETSTRUC  | ?              |  -1.0201*** | 0.3605     |   -6.31
CASHASSET   | ?              |  -0.5796    | 0.5601     |   -1.56
INDUSTRY FE | YES
YEAR FE     | YES
OBS.        | 7743
PSEUDO R2   | 0.0971
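On the margins question: odds ratios from logit are multiplicative effects on the odds, whereas margins can translate the results to the probability scale. A sketch, assuming lowercase variable names corresponding to the table (size, lev, roa, assetstruc, cashasset are assumptions) and factor variables for the fixed effects:

```stata
logit big4 size lev roa assetstruc cashasset i.industry i.year
margins, dydx(size lev)                   // average marginal effects on Pr(Big 4)
margins, dydx(lev) at(size=(10(2)20))     // effect of leverage at different sizes
```

On the interpretation itself: because size is ln(total assets), the odds ratio of 2.1023 means that a one-unit increase in ln(assets), i.e. multiplying total assets by e ≈ 2.72, multiplies the odds of choosing a Big 4 auditor by about 2.10, holding the other covariates fixed; the leverage odds ratio of 5.7594 is the multiplier on the odds for a one-unit increase in debt/equity.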