To give some background: I have a comparable (sampling design-wise) dataset for 30 countries. For each of these, I am estimating proportions and associated standard errors for the same set of K=20 CATEGORICAL variables. My goal is to create, for every country, an Excel sheet (I must stick to this program for this project) with 3 columns (category, estimated proportion, estimated standard error) and 20 consecutive blocks of rows (one block for each of the 20 variables). Each of these 20 blocks is made up of 1+q_k rows:

1) 1 row indicating the label of the k-th variable (column 1) and nothing in columns 2 and 3;

2) q_k additional rows showing the names of the categories of this k-th variable (column 1) and the associated proportions (column 2) and standard errors (column 3).

I am sticking to the resultssets approach of Newson, R. (2004), From datasets to resultssets in Stata (http://www.rogernewsonresources.org....4/overhed2.pdf), which basically consists of turning the dataset into the desired statistical table. So, for each of the 30 countries, I apply a nested loop over the 20 variables, which gives me the variable-specific proportions and standard errors for each category using -parmby- (the q_k rows above). After this, I use -insob- (from SSC) to insert a new observation which will hold the variable label at the beginning of the block (its first row). Iterating this procedure I get each of the 20 blocks described above. After appending these, I export them with -export excel- with the sheet() option. A MWE of this is

Code:

local countries_list country_1 country_2 ... country_30
local vars_list var_1 var_2 ... var_20
foreach country of local countries_list {
    use "Dataset `country'.dta", clear
    foreach var of local vars_list {
        local var_label: variable label `var'
        preserve
        parmby "proportion `var'", label rename(parm category estimate proportion stderr se) norestore
        insob 1 1
        replace category="Variable `var': `var_label'" in 1
        save "`country' `var'.dta", replace
        restore
    }
    clear
    foreach var of local vars_list {
        append using "`country' `var'.dta"
    }
    export excel category proportion se using "Descriptives.xlsx", sheet("`country'") firstrow(variables) sheetmodify
}


My problem is that, for each block of each of these country-specific sheets, I want to apply italics to the first row (which contains the label of the k-th variable) and indentation in the first column to the q_k rows (which show the categories of the k-th variable). The complication is that the range of cells I want to format changes in a non-uniform way between countries. This is because the number of categories of a given variable (say, var_k) might differ across countries, owing to country-specific codifications (e.g. geographical regions differ between countries); and even when a variable theoretically has a unique codification, a given category might be absent in particular countries. Hence, row i on a given sheet (country) might not refer to the same thing (label or category) as on other sheets. I am pretty sure that specifying these cells manually is not the way to go.

If I were exporting to LaTeX instead, I could simply indent the category rows with

replace category = "\hspace{3bp} " + category

which would give me what I want. Is there a similar way to do this in Excel? Just for the record, I am aware that the -putexcel- command allows formatting cells. However, this solution turns out to be very inefficient, since it requires me to specify the range of the cells that I wish to format.
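That said, since the label rows are exactly the rows where proportion is missing, the cell ranges do not have to be typed by hand: they can be computed from the data in memory and fed to -putexcel- cell by cell. A rough sketch (untested; it assumes the stacked resultsset for the current country is still in memory and that -export excel- has already written a header row, hence the +1 offset):

Code:

putexcel set "Descriptives.xlsx", sheet("`country'") modify
quietly count
local N = r(N)
forvalues i = 1/`N' {
    local row = `i' + 1                 // +1 for the header row from firstrow(variables)
    if missing(proportion[`i']) {       // label row of a block
        putexcel A`row', italic
    }
    else {                              // category row of a block
        putexcel A`row', txtindent(1)
    }
}

Cell-by-cell -putexcel- calls can be slow on large sheets; collecting the row numbers first and issuing one call per contiguous run would speed this up.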

Thanks for your time.

I am estimating the following model:

y = b0 + b1*m + b2*d + b3*(d*m) + X'g + u

where y and m are two continuous variables, d is a binary indicator =1 if m>0 (0 otherwise), d*m is an interaction term and X includes several covariates.

I run the following command to get the graph of the regression line:

but I believe -lfit- is graphing the regression line from the model without controls. Is there a way to tell -lfit- to take the covariates into account? I tried the alternative command binscatter, but it didn't really capture the exact relation between y and m that I estimated.
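-lfit y m- does indeed fit a simple bivariate regression of y on m, ignoring the other regressors. One workaround is to plot adjusted predictions from the full model with -margins- and -marginsplot-. A sketch, with hypothetical control names x1 x2 and an illustrative grid for m:

Code:

regress y c.m##i.d x1 x2
margins d, at(m=(0(1)10))    // average adjusted predictions over the sample, by d
marginsplot, noci

The at() grid and covariate names are placeholders; the point is that -margins- evaluates the fitted model, covariates included, rather than refitting y on m alone.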

Regards,

Egidio


Senior Research Associate/Research Associate

UCL Department / Division: MRC Clinical Trials Unit at UCL

Location of position: London

Grade: 7-8

Hours: Full Time

Salary (inclusive of London allowance): Grade 7: £34,635 - £41,864 per annum; Grade 8: £43,023 - £48,023 per annum

Duties and Responsibilities:

The MRC Clinical Trials Unit at UCL (MRC CTU) is an international centre of excellence for clinical trials and associated research, undertaking the design, conduct, analysis and reporting of large-scale, multi-centre studies, resolving internationally important questions in infectious diseases and cancer, and delivering swifter and more effective translation of scientific research into patient benefits. It does this by carrying out challenging and innovative studies, and developing and implementing methodological advances in study design, conduct and analysis. It hosts one of the MRC's five regional Hubs for Trials Methodology Research. The MRC CTU at UCL is a UKCRC registered trials unit and is part of the Institute for Clinical Trials and Methodology at UCL, within the Faculty of Population Health Sciences in the UCL School of Life and Medical Sciences.

Writing user-friendly statistical software is central to the dissemination of our methodology. Most of the Unit's statistical software is written for the statistical package Stata, with the aim of disseminating methodology developed in the Unit nationally and internationally.

The methodology theme has expanded considerably over recent years, enabling us to recruit to a new position to work alongside statistical methodologists in developing the Unit's statistical software. The position will involve writing new software to implement newly developed methodology; enhancing existing software by adding new functionality, correcting errors, improving documentation and supporting users; and improving the methodology theme's approach to validating its statistical software.

Key Requirements

The successful candidate will have a degree in a scientific subject, preferably a higher degree related to statistics; interest in methodological development in a biomedical environment; preferably experience or knowledge of medical statistics, experience of writing and documenting software (ideally statistical software in Stata); ability to write good code; and will work effectively with methodologists and users.

Further Details

Candidates may also be interested in three other roles currently being advertised:

Ref: 1732436: Research Associate/Assistant 'Medical Statistician - (Systematic Reviews & Meta-analysis)'

Ref: 1732458: Medical Statistician (Clinical Trials) x2

Ref: 1732462: Medical Statistician (Clinical Trials)

If you have any queries regarding the vacancy or the application process, please contact Carole Booth (c.peel@ucl.ac.uk) or, for an informal discussion regarding the post, please call Prof. Ian White (Tel: +44 (0)20 7670 4715).

For more information, please click HERE

Closing Date: 23 Jul 2018

Latest time for the submission of applications: 23:59

Our department holds an Athena SWAN Bronze award, which illustrates our commitment to addressing gender equality.


Context:

I'm working with health-system data that changes over time, both in added and dropped patients and in status changes (e.g. "most recent pain assessment rating"). The example code below runs through roughly 25 datasets, each with 3000 observations and 30 variables. For variables like the pain rating above, the value is sometimes coded as a "3" on a Likert-type scale and sometimes as "Moderate" pain; therefore some databases are numeric/double while others are string/str3. I'm looking for a way to generate a new variable in the middle of the merge process, something like "gen newvar = oldvar if _merge==5", so that the values of the old variable are not overwritten/turned into missing data and the new variable's values are retained in their own column.

Code:

use "DatabaseA.dta"
merge 1:1 studyId using "DatabaseB.dta", replace update force
drop _merge
save "DatabaseAB.dta"
clear

use "DatabaseAB.dta"
merge 1:1 studyId using "DatabaseC.dta", replace update force
drop _merge
save "DatabaseABC.dta"
clear

use "DatabaseABC.dta"
merge 1:1 studyId using "DatabaseD.dta", replace update force
drop _merge
save "DatabaseABCD.dta"
clear
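One way to keep the overwritten values is to copy the variable before the merge and then blank the copy everywhere except where -merge- flags a nonmissing conflict (_merge==5 when the update option is used). A sketch for one step of the chain, using a hypothetical variable pain:

Code:

use "DatabaseA.dta", clear
clonevar pain_old = pain                      // preserve the master's values
merge 1:1 studyId using "DatabaseB.dta", update replace force
replace pain_old = . if _merge != 5           // keep them only where they were overwritten
* (if pain is a string variable, use "" instead of . above)
drop _merge
save "DatabaseAB.dta", replace

Doing this per merge step keeps one audit variable per source dataset, instead of trying to create it mid-merge, where _merge does not yet exist.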

https://www.statalist.org/forums/for...udinal-dataset

Thanks!


I want to calculate the average treatment effect in panel data with 3 time points and about 178 individuals, for ordinal outcomes (4- or 5-point scales). The main reason I consider fixed effects is that the treatment and control groups were not selected at random, so I am concerned about unobserved heterogeneity.

My first approach is the hybrid model, using the Stata command 'xthybrid' after 'xtset id wave', following the paper by Schunck and Perales (2017). In general that worked pretty well, but I had problems with 'xtsum [...] if e(sample)', which produced a table without any numbers, and with 'xtgraph [...] if e(sample)', where the error message "__000002 not found" occurred.

Therefore my second approach is Allison's (2009) recommendation to generate the within- and between-cluster effect variables myself, e.g.

Code:

egen M_a = mean(a), by(id)
gen F_a = a - M_a

Afterwards I ran the regression with the generated variables with meglm using Stata/SE 15.1.
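Put together, the by-hand within-between ("hybrid") specification might look like the following (hypothetical outcome y and time dummies i.wave; F_a is the within effect, M_a the between effect):

Code:

xtset id wave
egen M_a = mean(a), by(id)                 // between-cluster component (cluster mean)
gen  F_a = a - M_a                         // within-cluster component (deviation)
meglm y F_a M_a i.wave, family(ordinal) link(logit) vce(robust) || id: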

So my main question is:

1) What are the main differences between the calculation with

xthybrid[...], family(ordinal) link(logit) vce(robust) clusterid(id) full

and

meglm[...], family(ordinal) link(logit) vce(robust) ||id:

because the results are very close to each other but somewhat different (see screenshot). What are the reasons/issues I have to consider if I want to decide which calculation is more appropriate in a given case?

2) Are there any other recommendations on how I should calculate fixed effects for ordinal variables with a panel of 3 time points and about 178 individuals? The only suggestion apart from xthybrid and meglm I found so far

Thank you very much in advance and all the best,

Sigi

P.S: Sorry for the Screenshot. "W_" and "F_" represent the within effect and "B_" and "M_" the between effect.

Literature:

Allison, P. D. (2009). Fixed Effects Regression Models. Thousand Oaks, CA: Sage.

Schunck, R., & Perales, F. (2017). Within- and between-cluster effects in generalized linear mixed models: A discussion of approaches and the xthybrid command. The Stata Journal, 17(1), 89-115.


At the moment I am trying to create a variable that captures, for each city and year, the code of the industry sector with the most jobs. I am using a dataset of company information, such as the city a company is located in, the year, the number of jobs, and the industry the company operates in, with the corresponding industry code. I have created a sample of the dataset below.

I already created the variable MostJobsSector, which gives the highest number of jobs held by any one industry, per city, per year. The following code was used:

Code:

egen MostJobsSector= max(SBIBanenGEM), by (plaats year)

Code:

 1 Chicago 1997 01 5 8 8 01
 2 Chicago 1997 01 3 8 8 01
 3 Chicago 1997 02 2 3 8 01
 4 Chicago 1997 02 1 3 8 01
 5 Chicago 1998 01 4 7 9 02
 6 Chicago 1998 01 2 7 9 02
 7 Chicago 1998 01 1 7 9 02
 8 Chicago 1998 02 9 9 9 02
 9 Chicago 1998 03 6 6 9 02
10 Miami   1997 01 3 9 9 01
11 Miami   1997 01 4 9 9 01
12 Miami   1997 01 2 9 9 01
13 Miami   1997 02 8 8 9 01
14 Miami   1998 01 1 1 8 03
15 Miami   1998 02 5 5 8 03
16 Miami   1998 03 4 8 8 03
17 Miami   1998 03 4 8 8 03
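Note that -egen max()- returns the maximum of SBIBanenGEM itself, not the sector code attaining it. A sketch of one way to get the code (assuming the sector variable is named sector; since several companies can share a sector, their jobs are totalled first):

Code:

bysort plaats year sector: egen sector_jobs = total(SBIBanenGEM)  // jobs per sector in a city-year
bysort plaats year (sector_jobs): gen top_sector = sector[_N]     // sector code with the most jobs

The second line relies on by-group subscripts: after sorting each city-year by sector_jobs, observation _N within the group is the one with the largest total, and its sector code is copied to every row of the group. Ties are broken arbitrarily.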


For some reason, when I run this code on the dataset, egen creates a variable containing the observation number of each row instead of the row maximum. I just can't seem to figure out what's going on. Thank you in advance!

The code is:

Code:

egen maxv = rowmax(__*)
sum maxv

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(__BS111400_p __BS111A00_p __BS112500_p maxv)
    .02986717    .002714447  0              1
   .072100684     .06976562  .000021618145  2
 4.318047e-06 .000016268317  .013628376     3
  .0019975295     .02660001  .01942118      4
 .00011598477   .0001928249  .00017276      5
1.0046353e-10   3.17521e-10  .02495834      6
 .00027324088  .00020779375  .00002767016   7
   .006443177    .012688478  .008952809     8
   .025992706    .003201397  .00483725      9
 .00007730682  .00003245989  .00002056536  10
end
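One thing worth checking is whether the __* wildcard picks up variables other than the three intended ones, since rowmax() over an unintended varlist is a common source of surprises. A sketch that spells the varlist out to remove the ambiguity:

Code:

describe __*                          // confirm exactly which variables the wildcard matches
drop maxv                             // discard the suspect result
egen double maxv = rowmax(__BS111400_p __BS111A00_p __BS112500_p)
sum maxv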

I used a Cox-Snell residual graph, but I need a quantitative measure.

Any idea?

Thanks


I'm writing a master's thesis on the effect of a policy change on tax avoidance. The rule only applied to about 500 observations (companies) in my dataset, which in total contains around 80,000. I would like to identify a control group of about equal size.

Relevant matching variables are the continuous variables

Is it possible to perform an exact match on the binary variables and nearest-neighbour matching (or equivalent) on the continuous variables? I tried teffects nnmatch but couldn't figure out how to match without replacement.
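-teffects nnmatch- supports exact matching on a subset of variables via its ematch() option, but it always matches with replacement; for one-to-one matching without replacement, the community-contributed -psmatch2- (SSC) offers a noreplacement option. A sketch with hypothetical variable names (outcome tax_avoid, treatment treated, continuous c1 c2, binary b1 b2):

Code:

* exact on the binaries, nearest neighbour on the continuous variables (with replacement)
teffects nnmatch (tax_avoid c1 c2) (treated), ematch(b1 b2) nneighbor(1)

* propensity-score alternative without replacement
ssc install psmatch2
psmatch2 treated c1 c2 b1 b2, outcome(tax_avoid) neighbor(1) noreplacement

Note that psmatch2 matches on the estimated propensity score rather than exactly on b1 b2; to force exact agreement there, one would run the match separately within each stratum of the binary variables.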

Any help is greatly appreciated.