My data looks as follows:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int unitid str18 firmid int time float Investment
1 "0876711D LN Equity" 2003 .
1 "0876711D LN Equity" 2004 .
1 "0876711D LN Equity" 2005 .
1 "0876711D LN Equity" 2006 .
1 "0876711D LN Equity" 2007 .
1 "0876711D LN Equity" 2008 .
1 "0876711D LN Equity" 2009 .04298013
1 "0876711D LN Equity" 2010 -.016369443
1 "0876711D LN Equity" 2011 -.012404328
1 "0876711D LN Equity" 2012 -.05723596
1 "0876711D LN Equity" 2013 .
1 "0876711D LN Equity" 2014 .
1 "0876711D LN Equity" 2015 .
1 "0876711D LN Equity" 2016 .
1 "0876711D LN Equity" 2017 .
1 "0876711D LN Equity" 2018 .
2 "1218069D LN Equity" 2003 .
2 "1218069D LN Equity" 2004 .4360955
2 "1218069D LN Equity" 2005 .26635048
2 "1218069D LN Equity" 2006 .23388498
2 "1218069D LN Equity" 2007 .21420796
2 "1218069D LN Equity" 2008 .2275067
2 "1218069D LN Equity" 2009 .16295853
2 "1218069D LN Equity" 2010 .11736838
2 "1218069D LN Equity" 2011 .0952867
2 "1218069D LN Equity" 2012 .
2 "1218069D LN Equity" 2013 .
2 "1218069D LN Equity" 2014 .
2 "1218069D LN Equity" 2015 .
2 "1218069D LN Equity" 2016 .
2 "1218069D LN Equity" 2017 .
2 "1218069D LN Equity" 2018 .
3 "1334987D LN Equity" 2003 .
3 "1334987D LN Equity" 2004 .005009345
3 "1334987D LN Equity" 2005 .016525032
3 "1334987D LN Equity" 2006 .009007737
3 "1334987D LN Equity" 2007 .0027638024
3 "1334987D LN Equity" 2008 .025354864
3 "1334987D LN Equity" 2009 .007856831
3 "1334987D LN Equity" 2010 .01920615
3 "1334987D LN Equity" 2011 .
3 "1334987D LN Equity" 2012 .
3 "1334987D LN Equity" 2013 .
3 "1334987D LN Equity" 2014 .
3 "1334987D LN Equity" 2015 .
3 "1334987D LN Equity" 2016 .
3 "1334987D LN Equity" 2017 .
3 "1334987D LN Equity" 2018 .
4 "1561649D LN Equity" 2003 .
4 "1561649D LN Equity" 2004 .2383473
4 "1561649D LN Equity" 2005 .23322147
4 "1561649D LN Equity" 2006 .2747023
4 "1561649D LN Equity" 2007 .
4 "1561649D LN Equity" 2008 .
4 "1561649D LN Equity" 2009 .16537003
4 "1561649D LN Equity" 2010 .4768854
4 "1561649D LN Equity" 2011 .14333837
4 "1561649D LN Equity" 2012 .13682278
4 "1561649D LN Equity" 2013 .11177154
4 "1561649D LN Equity" 2014 .2387057
4 "1561649D LN Equity" 2015 .
4 "1561649D LN Equity" 2016 .
4 "1561649D LN Equity" 2017 .
4 "1561649D LN Equity" 2018 .
5 "1638414D LN Equity" 2003 .
5 "1638414D LN Equity" 2004 .12095035
5 "1638414D LN Equity" 2005 .1120581
5 "1638414D LN Equity" 2006 .10006714
5 "1638414D LN Equity" 2007 .11736046
5 "1638414D LN Equity" 2008 -.1074682
5 "1638414D LN Equity" 2009 .10434808
5 "1638414D LN Equity" 2010 .0954181
5 "1638414D LN Equity" 2011 .14149284
5 "1638414D LN Equity" 2012 .28253892
5 "1638414D LN Equity" 2013 .15810315
5 "1638414D LN Equity" 2014 .03957232
5 "1638414D LN Equity" 2015 .
5 "1638414D LN Equity" 2016 .
5 "1638414D LN Equity" 2017 .
5 "1638414D LN Equity" 2018 .
6 "1655637D LN Equity" 2003 .
6 "1655637D LN Equity" 2004 .002285661
6 "1655637D LN Equity" 2005 .0007066662
6 "1655637D LN Equity" 2006 .0010206974
6 "1655637D LN Equity" 2007 .001707869
6 "1655637D LN Equity" 2008 .010133424
6 "1655637D LN Equity" 2009 .05300895
6 "1655637D LN Equity" 2010 .003342496
6 "1655637D LN Equity" 2011 .0040739654
6 "1655637D LN Equity" 2012 .0010755167
6 "1655637D LN Equity" 2013 .002353301
6 "1655637D LN Equity" 2014 .0014642834
6 "1655637D LN Equity" 2015 .003463925
6 "1655637D LN Equity" 2016 -.1074682
6 "1655637D LN Equity" 2017 .
6 "1655637D LN Equity" 2018 .
8 "3572335Q LN Equity" 2003 .
8 "3572335Q LN Equity" 2004 .09133787
8 "3572335Q LN Equity" 2005 .1582893
8 "3572335Q LN Equity" 2006 .18906696
end

I have a database like this one:

FID  Time       C   UGM    t
0    1/1/2015   25  72749  02/3634
0    1/2/2015   16  72749  09/3636
0    1/3/2015   12  72749  01/3639
0    1/4/2015   19  72749  08/3641
0    1/5/2015   14  72749  02/3644
0    1/6/2015   15  72749  09/3646
0    1/7/2015   19  72749  03/3649
0    1/8/2015   17  72749  10/3651
0    1/9/2015   18  72749  05/3654
0    1/10/2015  24  72749  11/3656
0    1/11/2015  19  72749  06/3659
0    1/12/2015  73  72749  12/3661
0    1/1/2016   79  72749  07/3664
0    1/2/2016   37  72749  02/3667
0    1/3/2016   26  72749  07/3669
0    1/4/2016   42  72749  02/3672
0    1/5/2016   35  72749  08/3674
0    1/6/2016   37  72749  03/3677
0    1/7/2016   28  72749  09/3679

My code is

Code:

gen t = date(Time, "DMY")
format %tmNN/CCYY t
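The garbled years (e.g. 3634) arise because date() returns a daily (%td) value, which is then displayed with a monthly (%tm) format. A sketch of the likely intended fix, assuming Time holds first-of-month strings such as "1/12/2015" in day/month/year order:

```stata
* Sketch: convert the daily value to a monthly date before applying a %tm format
gen d = date(Time, "DMY")     // daily (%td) date value
gen mdate = mofd(d)           // convert days to months
format mdate %tmNN/CCYY       // displays e.g. 12/2015
```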

Thanks

I have daily data in the following shape:

date day city_1_price_A ... city_70_price_A city_1_price_B ... city_70_price_B Crude1 Crude2 avg_price_A

04jun2007 monday 115.4 ... 115.4 114.4 ... 113.9 44.05 47.20 112.51

05jun2007 tuesday 115.4 ... 115.4 114.4 ... 113.4 43.90 47.73 112.02

06jun2007 wednesday 115.4 ... 115.4 114.4 ... 113.4 43.89 47.57 111.69

where the variables city_[i]_price_A and city_[i]_price_B are the average daily gas price before and after taxes and margins for city i; Crude[i] is the daily price of crude i; and avg_price_A is the daily average of city_1_price_A through city_70_price_A.

I am trying to reshape my data so that it becomes compatible with the xtset command. I tried reshape long, but it does not work because I do not have a common price_A variable for all cities; instead there is a separate price variable for each city. Likewise, I have no ID variable specific to each city. Could any of you give me some advice?
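One possible approach, assuming the variables really are named city_1_price_A through city_70_price_A: reshape long accepts an @ placeholder inside the stub, which marks where the city number sits, so a separate city ID variable is not needed beforehand. A sketch:

```stata
* Sketch: the @ marks where the city number sits inside each variable name;
* reshape creates the j() variable (city) from those numbers
reshape long city_@_price_A city_@_price_B, i(date) j(city)

* reshape names the new variables by deleting the @ from the stub
rename city__price_A price_A
rename city__price_B price_B

* one row per (city, date) pair: now the panel can be declared
xtset city date
```

Note that Crude1, Crude2, and avg_price_A vary only by date, so after the reshape they are simply repeated within each day, which is fine for panel estimation.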

Thank you!

Julien

I have an FMM model with three classes. Because the dependent variable is binary, logistic regression is used. Say I have five variables in my main model:

Code:

fmm 3, lcprob(X Y Z): logit dv V1 V2 V3 V4 V5

Thanks

I am new to this forum and want to ask your opinion on a regression over 2006-2018 that I want to perform in Stata, using panel data on publicly traded technology companies with both endogenous and exogenous factors. The dependent variable is binary: a company is coded 0 while it is listed and assigned a 1 when it is bought out. I want to regress this binary outcome on firm-level variables such as profitability, the company's rank in its sector, and leverage, and on macroeconomic variables such as GDP growth and dry powder in the market. My data table will be in approximately this format.

Year | Company Code | Profitability | Rank | GDP growth | Dry powder | Binary output | Leverage
2008 | GICX         | 0.5           | 150  | 1%         | 5bn        | 0             | 45%
2009 | GICX         | 0.4           | 120  | 2%         | 10bn       | 0             | 40%
2008 | PRO          | 0.5           | 160  | 1.8%       | 15bn       | 0             | 38%
2009 | PRO          | 0.7           | 170  | 2.3%       | 20bn       | 1             | 35%

I hope it is clear what I want to achieve and appreciate your help. Many thanks in advance.
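With firm-year data in this long format, one route is to declare the panel and fit a binary-outcome panel model. A sketch only; all variable names below are assumptions based on the table, and the right estimator depends on the research design:

```stata
* Sketch (variable names assumed): build a numeric panel id from the string code
encode company_code, gen(firm_id)
xtset firm_id year

* Pooled logit with standard errors clustered by firm
logit buyout profitability rank gdp_growth dry_powder leverage, vce(cluster firm_id)

* Or a random-effects panel logit
xtlogit buyout profitability rank gdp_growth dry_powder leverage, re
```

A fixed-effects logit (xtlogit, fe) would drop every firm that is never bought out, which may discard most of the sample, so the choice between pooled, random-effects, and fixed-effects specifications matters here.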

Best,

Anna

I want to create indicator variables combining two or more categorical variables. For example, I want to combine gender (female = 1, male = 0) and race (white = 1, black = 2, Hispanic = 3, other = 4) to create eight indicator variables, one per gender-race combination. When I create indicator variables from a single categorical variable, I usually do one of three things: generate each variable manually, use tabulate with its generate() option, or write a loop around the generate command. I tried the second and third methods with two variables, but I kept getting an error from Stata. Is there an easy way to generate indicator variables from two or more categorical variables?
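Two common ways to do this, as a sketch (assuming the variables are named gender and race):

```stata
* Option 1: one categorical code per gender-by-race cell, then dummies from it
egen cell = group(gender race), label   // values 1..8, labeled by combination
tab cell, gen(cellvar)                  // creates cellvar1 ... cellvar8

* Option 2: skip the dummies entirely and use factor-variable notation
* in estimation, which builds the interaction cells on the fly
regress y i.gender#i.race
```

Factor-variable notation is usually preferable, since the indicators never need to exist as variables and margins/contrasts work afterwards.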


Hi. This is the first time I have posted a question on the forum. I apologize if my post is not sufficiently informative or not in the best format.

I would like to estimate a model with both sample selection bias and an endogenous treatment variable. I am concerned with three variables: Y, the outcome variable; I, the indicator of being selected into the treatment group; and Z, a continuous treatment variable. My goal is to evaluate the effect of Z on Y, but I encountered two problems.

(1) Not everyone in my sample received the treatment Z. Some respondents self-select to the treatment group (I == 1) but the others do not (I == 0). In other words, I have something like this:

Code:

Z = .        if I == 0
Z ∈ [1,10]   if I == 1

I am a bit puzzled about which methods I should use to account for these problems. In particular, I would like to deal with problem (1), because I want to know what the effects of Z are for those who are not treated at all (I == 0). I do not want to use I as the treatment variable, nor do I want to exclude everyone with I == 0, because I believe that I is associated with Y in some way. But problem (1) is not the typical sample selection issue that the Heckman selection model is designed for, because the outcome variable Y is observed for the group with I == 0 as well. Could I use the Heckman selection model in this case? If so, could I just include the inverse Mills ratio in 2SLS as an adjustment?
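A control-function sketch of the inverse-Mills-ratio-in-2SLS idea, with every variable name hypothetical (x a vector of controls, w1 an excluded selection shifter, zinstr an instrument for Z); it estimates the effect of Z among the treated while adjusting for selection, which is only part of the question asked above:

```stata
* Step 1 (sketch): selection equation for I; w1 is a hypothetical
* exclusion restriction that shifts selection but not Y
probit I x w1
predict xb_sel, xb
gen mills = normalden(xb_sel)/normal(xb_sel) if I == 1

* Step 2 (sketch): 2SLS for Y on Z among the treated, with the Mills
* ratio added as a control; zinstr is a hypothetical instrument for Z
ivregress 2sls Y x mills (Z = zinstr) if I == 1, vce(robust)
```

Note that the second-stage standard errors do not account for the estimated Mills ratio; bootstrapping both steps together would be needed for valid inference.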

For a real-life example, I could stand for childlessness, Z could stand for the relationship with children, and Y could stand for mental health. People self-select into being childless, but I am wondering what the effect of the relationship with children would have been for the childless had they had children.

I would appreciate it if anyone could give me advice on what models or theoretical frameworks I should use to understand my research question.

Sincerely

Boyan


I wish to loop n times, say:

forvalues i = 1(1)n{

and n is defined by the largest value of id, which was created by:

egen id = group(firm)

May I kindly ask how to write this code please?
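One way to do this, as a sketch: forvalues cannot read a variable directly, so store the number of groups in a local macro first and loop over that.

```stata
egen id = group(firm)
summarize id, meanonly
local n = r(max)            // number of distinct firms
forvalues i = 1/`n' {
    * ... work with the i-th firm, e.g.:
    * list firm if id == `i'
}
```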


I have estimated a VAR(1) model of the form

z_{t+1} = a + Γ z_t + u_{t+1}

where z_{t+1} is the state vector with the log excess market return as its first element, a is a vector of constants, Γ is the coefficient matrix, and u_{t+1} is the vector of shocks. From the estimates I compute the discount-rate and cash-flow news terms

N_dr,t+1 = e1'λ u_{t+1}

and

N_cf,t+1 = (e1' + e1'λ) u_{t+1}

where e1' is a selection vector that picks the first element of u_{t+1}, and λ = ρΓ(I - ρΓ)^(-1).

The state vector variables are R_Me (log excess market return), TY (term yield), PE (price-earnings ratio), and VS (value spread).

How can I obtain the error terms for each monthly observation separately? Is there a way to back out the u_{t+1} vector from the estimated VAR?

Any help will be much appreciated!
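If the VAR is fit with Stata's var command, predict with the residuals option recovers one element of u_{t+1} per equation. A sketch, assuming Date is stored as YYYYMM as in the dataex listing below:

```stata
* Sketch: build a monthly time variable from Date stored as YYYYMM
gen mdate = ym(floor(Date/100), mod(Date, 100))
format mdate %tm
tsset mdate

* Fit the VAR(1) and back out the residual series equation by equation
var R_Me TY PE VS, lags(1)
predict u_rme, residuals equation(R_Me)   // first element of u_t+1
predict u_ty,  residuals equation(TY)
predict u_pe,  residuals equation(PE)
predict u_vs,  residuals equation(VS)
```

The four residual variables together give the u_{t+1} vector for each monthly observation, from which the news terms can be computed row by row.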

* Example generated by -dataex-. To install: ssc install dataex
clear
input long Date double(R_Me TY PE VS EtR_Me N_dr N_cf Rrf FFS2BM1 FFS3BM1 FFS4BM1 FFS5BM1 FFS5BM5 RISK10 RISK14 RISK17 RISK19)
199011 .058061626 .67 2.9222078 1.6413188 .0035540038 .062132986 -.0076253644 .005849983 .1183 .1051 .0897 .0646 .0435 .12173959 .10203556 .036494877 .095742293
199012 .023693474 .75 2.9608154 1.6537005 .0090106973 .0178459 -.003163123 .005433334 .0435 .0645 .042 .0325 .0079 .05887135 .048686188 .023024607 .04431314
199101 .042781392 .96 2.9476627 1.664684 .0055701994 -.0051041534 .042315346 .005116646 .0977 .0935 .0841 .0516 .0395 .11749612 .099675956 .01321588 .09292419
199102 .068041102 1.06 3.0515163 1.6708693 .0087658785 .097320626 -.038045402 .004933343 .1229 .0924 .0889 .0815 .08 .11978292 .099244337 .077034147 .094222122
199103 .023502177 1.11 3.0757451 1.6804807 .010210728 -.0010602499 .014351699 .004883342 .0761 .0743 .0481 .0455 .012 .049943459 .028751036 .030019418 .032099285
199104 -.0013861052 1.23 3.0923817 1.6590837 .0058615677 -.0052015584 -.0020461145 .004741713 -.0132 -.0289 -.0078 -.0001 -.0019 -.01046825 -.016978515 -.015338632 .011341767
199105 .035248183 1.5 3.084884 1.67436 .004309962 .0082276908 .02271053 .004549997 .0544 .0376 .0658 .0397 .0686 .058022122 .056415447 .043939377 .067192412
199106 -.049887627 1.42 3.0835875 1.7203005 .0094085029 -.027222457 -.032073673 .004649961 -.0574 -.0516 -.0528 -.0428 -.0395 -.079378644 -.07320167 -.038958906 -.047176416
199107 .041101192 1.71 3.0866172 1.7329534 .00031467686 .042206667 -.0014201516 .00466666 .0424 .0598 .0591 .0643 .0081 .073272477 .050325015 .076738999 .056630268
199108 .022064351 1.61 3.1083668 1.7328576 .010546596 .0053573976 .0061603572 .004500037 .0387 .0546 .0166 .0438 .0176 .039886239 .021745287 .03410797 .034028991
199109 -.015428971 1.23 3.1014802 1.7581476 .007799295 -.014220606 -.0090076597 .004316644 .0064 -.0012 -.0037 -.021 -.0382 -.0010723043 -.029284393 -.0073710169 -.0080010347
199110 .013373658 1.65 3.0994341 1.7707362 .0016168025 .010040765 .0017160908 .004199968 .0239 .0297 .0253 .0147 .005 .0012886541 .0027192137 .024965763 .040451829
199111 -.041675451 1.65 3.0957319 1.7764372 .0068914496 -.027300574 -.021266327 .003699999 -.0502 -.0355 -.0244 -.0121 -.0831 -.051966509 -.039022418 -.01342243 -.046044798
199112 .09856959 1.68 3.1021296 1.801241 .0016835556 .069642376 .027243658 .00312507 .1093 .1452 .1337 .1534 .1286 .14410748 .11005604 .14353534 .12990558
199201 -.0048462536 1.89 3.1703972 1.7699284 .01469567 -.0038516738 -.01569025 .0032 .0842 .015 -.0026 -.0325 .0675 .030558833 .041570478 -.040628776 .021325619
199202 .0099079047 1.68 3.1616104 1.7086906 .0056947576 -.019679887 .023893034 .003299991 .0072 .0007 -.0053 .0022 .1012 .036302385 .04335575 .0057724954 .04176327
199203 -.027359047 1.66 3.1482634 1.6515203 .0066343946 -.046236251 .012242809 .003399936 -.0849 -.0526 -.0563 -.0287 .008 -.041113085 -.028939616 -.026796741 -.022945689
199204 .010674801 2.12 3.147724 1.6043993 .0038999019 -.0046238379 .011398737 .003075019 -.0707 -.0527 -.0191 -.0011 .0704 -.024273608 -.0060293406 .0116734 -.0067946755
199205 .0033786935 2.05 3.1650627 1.5754549 .01102512 -.0028647191 -.0047817072 .003125016 -.0035 .002 .0006 .0131 -.0003 .0028963673 -.00037492673 .011265314 -.0079230652
199206 -.022481149 2.2 3.1478011 1.4847344 .010000813 -.059766014 .027284052 .003058318 -.0539 -.0488 -.0182 -.0308 .0072 -.036795002 -.026061295 -.030658052 -.014513251
199207 .036516521 2.54 3.1629036 1.4758222 .0099050674 .023913187 .0026982671 .002650003 .031 .0523 .0535 .0486 -.0033 .042440025 .026477158 .051032197 .042972537
199208 -.023591751 2.47 3.1684525 1.4582289 .017535883 -.037485448 -.003642186 .00261665 -.0401 -.0441 -.0128 -.0058 -.088 -.035956176 -.039910077 -.010938171 -.036942995
199209 .0099214541 2.72 3.1676465 1.4569487 .011560726 .0012279816 -.0028672533 .002424986 .0205 .0302 .0033 .0083 .0019 .035946488 .029197342 -.0065920437 .031592421
199210 .008369042 2.67 3.1511369 1.4829317 .016347348 -.020301042 .012322736 .002475039 .0479 .0597 .0529 .0201 -.0013 .061445302 .021498778 .0060441428 .030091503
199211 .03665324 2.62 3.1737824 1.5110869 .015794764 .027320823 -.006462348 .002725014 .0948 .0946 .0717 .0368 .0852 .085926641 .079620172 .02495823 .060305642
199212 .014727906 2.61 3.2005716 1.4884234 .017462875 -.0056763363 .0029413675 .002683292 .0159 .0341 .0168 -.0071 .0284 .016971998 .033786524 .0077716184 .040448992
199301 .0097941412 1.98 3.196606 1.4406005 .015228246 -.024033194 .018599089 .002483298 -.0059 -.0212 .0123 -.0241 .0928 .029637817 .053976633 -.02022988 .024467618
199302 .0029814606 2 3.2083474 1.3626069 .011387599 -.022881827 .014475688 .002466725 -.0829 -.0805 -.0529 -.0306 .0209 -.035884624 .010879687 -.0027664923 -.0019350573
199303 .022255393 2.02 3.2238624 1.3478493 .011678315 .0085172658 .0020598123 .002449972 .0113 .0466 .0301 .0067 .0344 .020475056 .028704573 .0081915352 .047276357
199304 -.028227883 2.18 3.2045645 1.3356704 .013583876 -.052764174 .010952415 .002399998 -.0309 -.0451 -.0484 -.0599 .0115 -.038069128 -.046358244 -.03604462 -.017833153
199305 .026448772 2.03 3.2060167 1.3699759 .01027856 .027506062 -.01133585 .002550011 .0732 .0761 .0706 .0313 -.0053 .064019802 .058474132 .020602644 .02753563
end

Perhaps this is too obvious a question, but when running the 2sls command on my model I get the error "[my vars] not endogenous in original model". I take this to mean that Stata does not consider endogeneity to be an issue in my OLS regression?

Is this what Stata means?

thanks!

Donovan

I am using version 14.2. The following example illustrates:

sysuse lifeexp, clear

keep if lexp>70 & lexp<79 & region==1

rename lexp byvar

rename country catvar

graph hbar, over(catvar) by(byvar)

By default, the category axis (unlike the y-axis) is repeated across subplots. There is no


I want to estimate centiles for a single observation after fitting a finite mixture model (a mixture of 4 Gaussian/normal distributions).

The aim is to graph those values later on.

Below I do the same for an OLS regression and a Quantile Regression.

First I expand the data set by 99 to calculate 99 centiles for each observation

The variable id2 is the identifier for an observation and idq the quantile.

For the OLS regression I simply take the point estimate and add the centile of the residual.

For the Quantile regression I use the predicted value for the given quantile directly.

My question is: how do I proceed in the same way when I have estimated class probabilities, class posterior probabilities, and point estimates given a latent class for each observation? As in OLS, the assumption of normality within each class holds.

I'm stuck but somehow I should be able to

1. estimate the parameters of the normal distributions

2. calculate the centiles for a given observation by knowing the classprobabilities for the given observation and the parameters of the normal distributions.

Code:

regress $y3 $x1 $x2 $x3 $x4
predict OLS
predict OLSres, res
gen id2 = _n
expand 99
gen QRden = .
gen OLSden = .
gen FRMden = .
bysort id2: gen idq = _n
forvalues q = 1/99 {
    local fq = `q'/100
    qui qreg $y3 $x1 $x2 $x3 $x4 if idq==1, q(`q')
    capture drop aux
    predict aux
    replace QRden = aux if idq==`q'
    qui centile OLSres, centile(`q')
    replace OLSden = OLS + r(c_1) if idq==`q'
}
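For the mixture analogue, the two steps listed above can be sketched as follows: recover the class-specific conditional means and posterior class probabilities from the fitted model, then find the p-th centile of the observation's mixture distribution numerically. All names below are assumptions for illustration, not a tested recipe:

```stata
* Sketch: 4-component Gaussian mixture of linear regressions
fmm 4: regress $y3 $x1 $x2 $x3 $x4

* Class-specific predicted means and posterior class probabilities,
* one variable per latent class
predict mu*, mu                   // mu1 ... mu4
predict pr*, classposteriorpr     // pr1 ... pr4

* The class-specific error SDs are in the stored estimates, e(b)
* (the var(e.depvar) parameters, one per class).
*
* The p-th centile q for one observation then solves, numerically
* (e.g. by bisection over q):
*   pr1*normal((q-mu1)/s1) + ... + pr4*normal((q-mu4)/s4) = p/100
```

The mixture CDF is monotone in q, so a simple bisection between, say, min(mu) - 6*max(s) and max(mu) + 6*max(s) converges quickly for each (observation, centile) pair.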

Any help is appreciated.

Kind regards

Steffen Plützke

I wish to calculate a sample size based on the comparison of means between two groups, adjusting for a baseline measure, i.e. using an ANCOVA.

I used to do this with sampsi, but power is now the recommended alternative. I do not get the same answer from sampsi with the pre() argument as from power repeated. I show some example code below. Any guidance on where I am going wrong or have misunderstood the power repeated command would be much appreciated.

Code:

** Using Stata 15.1
** Say I had two groups, with means 0.74 at baseline (in both groups) and SD=0.18 (again in both groups)
** At follow-up, we wish to calculate the sample size based on a change in the means to 0.76 and 0.74 for the two groups
** The correlation between baseline and follow-up measures is 0.5
** For 80% power, alpha=0.05, using the pre(1) argument to specify 1 baseline measure
sampsi 0.76 0.74, sd(0.18) power(0.8) pre(1) r01(0.5)
** This gives the following answers: POST 1,272 per group, CHANGE 1,272 per group, ANCOVA 954 per group

** If I try to use the power repeated command I get a very different answer
** that does not correspond with any of the three analyses above.
matrix M = (0.74, 0.76 \ 0.74, 0.74)
power repeated M, power(0.8) corr(0.5) varerror(0.0324) factor(between)
** 3,816 per group

** Furthermore, if I set the correlation between the baseline and follow-up measures to zero,
** I thought I might get the same sample size using a power twomeans calculation as using ANCOVA

** using sampsi
sampsi 0.76 0.74, sd(0.18) power(0.8) pre(1) r01(0)
** 1272 per group using ANCOVA

** using power twomeans
power twomeans 0.76 0.74, sd(0.18) power(0.8)
** 1273 per group -- this agrees with sampsi, pre() above

** using power repeated
matrix M = (0.74, 0.76 \ 0.74, 0.74)
power repeated M, corr(0) varerror(0.0324) factor(between)
** 2544 per group


I am modelling household water use data (i.e., total annual household water use). The distribution of this variable is log-normal, but rather than log-transforming this outcome variable, I have elected to use a Poisson model with robust standard errors (see: https://blog.stata.com/2011/08/22/us...tell-a-friend/).

My question here is concerning model fit/comparison across models. I am wondering if AIC and BIC (as estimated by Stata's

Thanks very much for your time,

Matt

Is there a command in Stata (I have 15.1) that can accommodate fixed effects (city and age_category) as well as account for potential time-series issues? I am not even sure how to test for serial autocorrelation here. Any ideas?
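One possible starting point, as a sketch with hypothetical variable names (y, x1, x2, city_id, age_cat, year): declare the panel, absorb both sets of fixed effects, and run a panel serial-correlation test.

```stata
* Sketch (names hypothetical): panel with city fixed effects,
* age-category fixed effects entered as indicators
xtset city_id year
xtreg y x1 x2 i.age_cat, fe vce(cluster city_id)

* Wooldridge test for first-order serial correlation in panel data
* (user-written xtserial, by Drukker; install via findit xtserial)
xtserial y x1 x2
```

Clustering by panel already makes the standard errors robust to arbitrary within-city serial correlation, so the test mainly informs whether a dynamic or AR(1) specification (e.g. xtregar) is worth considering.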

Thanks,

Anat
