two groups of samples, TOP (9) and BOTTOM (10), which I have classified with the dummy variable GRP (0 and 1 respectively). I have performed experiments on these samples to measure the outcome/dependent variable OUT, measuring it at baseline and then at 6 further timepoints (CONC, for concentration), with the drug concentration increased incrementally at each timepoint.

I am interested to know the following:

i. is there a difference overall between groups TOP and BOTTOM? (As these are the only possible groups a sample can belong to, I assume GRP is an explanatory variable rather than a level... so level 1 is the repeated measures, and level 2 is the individual sample.)

ii. is there a difference in either group in the outcome variable with increasing concentration of the drug? i.e. is there a significant increase/decrease in OUT with increasing CONC (drug concentration)?

I have done quite a bit of reading, and as far as I can see, I would begin with something like

xtmixed OUT CONC

What I am not sure about is:

i. the precise order of the next steps - do I need to build the command layer by layer, e.g. first adding a random intercept, then a random slope, then GRP as a fixed explanatory variable?

ii. how to test the significance of the differences I am interested in (above)

iii. how to generate mean predicted values for each group (TOP and BOTTOM) at each drug concentration (CONC) so I can plot the two predicted group lines
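For what it is worth, a typical build-up might look like the sketch below. This is only a sketch: -xtmixed- was renamed -mixed- in Stata 13, ID is an assumed variable identifying each sample, and CONC is assumed to be coded 0-6 and treated as continuous.

```stata
* a sketch -- ID is an assumed sample identifier
xtmixed OUT c.CONC || ID:                            // random intercept
estimates store ri
xtmixed OUT c.CONC || ID: CONC, cov(unstructured)    // add a random slope
estimates store rs
lrtest ri rs                                         // is the random slope needed?

* add GRP and its interaction with CONC
xtmixed OUT i.GRP##c.CONC || ID: CONC, cov(unstructured)
testparm i.GRP i.GRP#c.CONC      // (i) any group difference at all
testparm i.GRP#c.CONC            // (ii) group difference in the CONC slope

* (iii) predicted group means at each concentration, then plot them
margins GRP, at(CONC=(0(1)6))
marginsplot
```

The -margins-/-marginsplot- pair at the end produces exactly the two predicted group lines asked about in (iii).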

Any help would be gratefully received.

Jem]]>

As mentioned in one of my earlier posts, I am quite new to Stata and advanced econometrics. Currently I am looking for interesting/creative ways to present my regression results (coefficients) graphically. Displaying the results in tabular form is of course still the standard procedure, but it would be great to display some of them graphically as well, just as an additional way to make the data more intuitively understandable. It would be great if you could recommend some research papers or Stata commands that give an idea of what such an approach could look like. Maybe we can all collect some nice ideas together here.
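One common option is -coefplot- (Ben Jann, available from SSC), which turns a stored estimation result into a dot-and-CI plot of the coefficients. A minimal sketch using the auto data:

```stata
ssc install coefplot
sysuse auto, clear
regress price mpg weight foreign
coefplot, drop(_cons) xline(0)   // coefficients with confidence intervals
```

The vertical line at zero makes it easy to see at a glance which coefficients are distinguishable from zero.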

Thank you very much in advance! Looking forward to your replies!

Best]]>

I am running a Feasible GLS -xtgls- model with panel heteroskedasticity and AR(1) correlation. Additionally, in order to include fixed effects, I add in a dummy variable for each company I am examining.

My problem is that after I run the model, I receive incredibly large Wald Chi-Square values (on the order of 50,000) and Log-likelihood of around -700 or so. Is this even possible? These are some of the highest Chi-square values I have ever seen.

Initially I thought this was a problem of sample size (too many variables, not enough observations), but running a reduced equation gives me more or less the same results. (On a side note, does FGLS make any assumptions about or require any considerations of sample size? I have 79 degrees of freedom and around 450 observations...)
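For reference, the kind of call described above can be written with factor-variable notation instead of hand-made company dummies. A sketch only, using the variable names from the output further down (the year variable is an assumption):

```stata
* a sketch -- company fixed effects via i.compid rather than manual dummies
xtset compid year
xtgls loginvscaled loggh logta logcogs i.compid, ///
    panels(heteroskedastic) corr(ar1)
```

With 86 estimated coefficients, a joint Wald test across all of them can legitimately be very large, so a chi2 in the tens of thousands is not by itself a sign of error.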

Any info or help on the previous questions would be really useful (reduced output included below. Not sure why, but this last model didn't give me a LL number).


Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        heteroskedastic
Correlation:   common AR(1) coefficient for all panels  (0.4707)

Estimated covariances      =        71          Number of obs      =      1116
Estimated autocorrelations =         1          Number of groups   =        71
Estimated coefficients     =        86          Obs per group: min =         2
                                                               avg =  15.71831
                                                               max =        36
                                                Wald chi2(85)      =  65657.28
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
loginvscaled |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       loggh |  -.2849315   .0356504    -7.99   0.000    -.3548049    -.215058
       logta |   .5194881   .0240318    21.62   0.000     .4723866    .5665896
     logcogs |   .0753178   .0153522     4.91   0.000     .0452281    .1054076
             |
      compid |
          2  |   1.951157   .1397242    13.96   0.000     1.677302    2.225011
          3  |  -.2511948    .199553    -1.26   0.208    -.6423115    .1399219
          4  |    .713452    .141207     5.05   0.000     .4366914    .9902125
          5  |   .3644441   .1644973     2.22   0.027     .0420353    .6868529
          6  |   .5342415   .1845942     2.89   0.004     .1724436    .8960395
         .......

Panos]]>

I want to create a variable whose values are taken from other variables when those values are meaningful. So I wrote a -foreach- loop to pass those values, which, however, did not work. My problem and code follow; please take a look for me.

I have a variable named "domain" which contains web addresses, such as "yahoo.com", "news.yahoo.com", "mail.google.com", "travel.yahoo.com" etc.

In order to extract the main domain name, such as "yahoo" and "google", I first split the string variable "domain" into different parts. For example, "news.yahoo.com" would be split into three new string variables, whose values are "news", "yahoo" and "com".

/* three new variables are generated, domain1, domain2 and domain3 */

Then, I want to create a new variable "domainname" that takes meaningful domain names from "domain1", "domain2" or "domain3".

replace domainname="`name'" if domain1=="`name'" | domain2=="`name'" | domain3=="`name'"

}

I cannot see why the above code didn't work. (Nothing happened!) Thanks for any help.
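For reference, a complete sketch of such a loop is below; the list of names is an assumption. Two things commonly cause the "nothing happened" symptom: -domainname- must exist before the -replace-, and if the loop is run line by line rather than as one block, the local macro `name' is empty by the time -replace- executes, so the condition never matches.

```stata
* a sketch -- the list of main domains is assumed
generate domainname = ""
foreach name in yahoo google {
    replace domainname = "`name'" if domain1 == "`name'" | ///
        domain2 == "`name'" | domain3 == "`name'"
}
```

Run the -foreach- line through the closing brace as a single selection so the local macro survives.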

Claire]]>

To compute the rolling 60-month standard deviation of returns, I ran this code but got an error:

tsset permno date

quietly rolling sd_ret = r(sd), window(60) step(1) : summarize ret

error: eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-

********************

permno   date        cusip      ret
10008    31-Dec-85   36547310
10008    31-Jan-86   36547310
10008    28-Feb-86   36547310    0.11965812
10008    31-Mar-86   36547310    0.083969466
10008    30-Apr-86   36547310    0.028169014
10008    30-May-86   36547310   -0.01369863
10008    30-Jun-86   36547310   -0.048611112
10008    31-Jul-86   36547310   -0.153284669
10008    29-Aug-86   36547310   -0.060344826
10008    30-Sep-86   36547310   -0.155963302
10008    31-Oct-86   36547310    0.086956523
10008    28-Nov-86   36547310    0.100000001
10008    31-Dec-86   36547310    0.090909094

********************

could you tell me what I did wrong?
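One likely issue, judging from the listing above: date appears to be a string, so -tsset permno date- has no proper monthly time variable to work with. A hedged sketch of the setup (the format mask is an assumption based on the listing, and note that -rolling- advances the window one period at a time by default, so step(1) may itself be the invalid option triggering the error):

```stata
* a sketch, assuming date is a string such as "31-Dec-85"
gen mdate = mofd(date(date, "DMY", 2050))   // daily date -> monthly date
format mdate %tm
tsset permno mdate
rolling sd_ret = r(sd), window(60) : summarize ret
```

The topyear argument 2050 makes two-digit years such as 85 resolve to 1985 rather than 2085.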

Thanks,

Rochelle

]]>

I am aware of these old (but still quite interesting) discussions about the relative speed of -collapse when one uses large datasets: http://www.stata.com/statalist/archi.../msg00498.html

I face the same problem today. I must collapse very large datasets several times, and the good old -collapse- works, but takes ages to complete.

I tried the Mata route, but I am fairly new to Mata.

I downloaded -moremata- but I am not able to run a collapse with a two-key identifier using mm_collapse().

For instance, following the mm_collapse official documentation I tried to replicate


sysuse auto
collapse price turn, by(make rep78)


sysuse auto
mata: X = st_data(., ("price", "turn"))
mata: ID = st_data(., ("make", "rep78"))
mata: mm_collapse(X, 1, ID)

Is there a mistake in my code?
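One likely culprit: st_data() reads numeric variables, and make is a string, so the ID matrix cannot be built that way. A sketch that encodes the string key first (keeping the weight argument of 1 from the code above):

```stata
sysuse auto, clear
encode make, generate(makeid)   // numeric key for the string variable
mata:
X  = st_data(., ("price", "turn"))
ID = st_data(., ("makeid", "rep78"))
mm_collapse(X, 1, ID)
end
```

This mirrors the official-documentation example while supplying mm_collapse() the numeric two-key identifier it expects.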

Many thanks!]]>

I found which variables have the problem, but I cannot understand why.

Any ideas? What can I do?]]>

I'm running a linear regression and planned to use -outreg2- to organize the results. I am on Stata 13, fully updated, with the latest version of outreg2, and I was able to run outreg2 for several months with no issues, so this is a recent problem as of the last couple of weeks.

This is my code:

outreg2 using myreg.doc, replace ctitle(Model1)

The error is that

Any thoughts/aid would be much appreciated.

Josh Alley]]>

I have a variable, primarycarephysicianstandard, and several variables like it, whose observations I want to pull apart into separate variables while still keeping part of each as it currently exists. Specifically, the variable tabulates as:

$10

$10 copay before deductible

$10 copay before deductible and 15% coinsurance after deductible

$15 copay after deductible

$30 copay and 20% coinsurance after deductible

30%

No charge after deductible

*Note: the variable continues in this fashion with different numbers, and there are several variables like this one*

What I want to do is split the variable up into three variables so that it looks something like the following:

Each observation will have one of these rows:

Primarycarephysicianstandard (original variable)                         | Dollar (new) | Percent (new)
-------------------------------------------------------------------------+--------------+--------------
Dollar amount                                                            | 10           |
Dollar copay before deductible                                           | 10           |
Dollar copay before deductible and percent coinsurance after deductible  | 10           | 15
Dollar copay after deductible                                            | 15           |
Dollar copay and percent coinsurance after deductible                    | 30           | 20
Percent                                                                  |              | 30
No charge after deductible                                               |              |
No charge                                                                |              |

For example:

Obs | Primarycarephysicianstandard                                            | Dollar | Percent
----+-------------------------------------------------------------------------+--------+---------
1   | Dollar copay before deductible and percent coinsurance after deductible | 115    | 15
2   | Dollar copay after deductible                                           | 20     |

I have tried using 'split' and concatenating the pieces I need back together, which does not work because the text does not line up on spaces, and I cannot figure out how to add characters to delimit the text the way I want. I have also tried 'regexm' with 'generate' and 'replace' for the new variables.
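A regular-expression sketch can sidestep the alignment problem entirely: the dollar amount is whatever digits follow "$", and the percent is whatever digits precede "%". The new variable names here are hypothetical.

```stata
* a sketch -- pcp_dollar / pcp_percent are hypothetical names
gen pcp_dollar  = ""
gen pcp_percent = ""
replace pcp_dollar  = regexs(1) if regexm(primarycarephysicianstandard, "\$([0-9]+)")
replace pcp_percent = regexs(1) if regexm(primarycarephysicianstandard, "([0-9]+)%")
destring pcp_dollar pcp_percent, replace
```

Rows with no dollar or percent figure (e.g. "No charge after deductible") are simply left missing after -destring-.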

Any help on this would be so very much appreciated!]]>

I have two datasets that contain industry codes, and I want to see how much overlap there is between them. Below are the first few observations from each dataset.

data one

industryA

1111A0

1111B0

111200

111335

1113A0

111400

111910

111920

1119A0

1119B0

112100

data two

industryB

111200

111335

111400

111910

111920

112100

112300

113300

114100

114200

115000

The number of observations is 470 for data one and 450 for data two.

How should I join the two datasets?

After that, should I use

gen flag=0

replace flag=1 if industryB~=industryA
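One way is to rename the codes to a common variable name and -merge-; the _merge variable then flags the overlap directly, which can take the place of the flag logic above. A sketch, assuming the files are saved as one.dta and two.dta and each code appears at most once per file:

```stata
* a sketch -- file names are assumptions
use two, clear
rename industryB industry
tempfile two
save `two'

use one, clear
rename industryA industry
merge 1:1 industry using `two'
tab _merge                 // _merge == 3 marks codes present in both datasets
gen flag = (_merge == 3)
```

Note that after a merge on the key itself, a comparison like industryB~=industryA cannot work, because the matched codes are by construction equal.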

thanks,

Rochelle

]]>