I am working on a willingness-to-pay (WTP) study in which we used a payment card, so I have interval dollar amounts as my response variable. I am trying to model WTP with stintreg and a Weibull distribution, but I am having trouble interpreting the coefficients. I was wondering if there is any way to make the coefficients comparable to those from an interval tobit regression, which is another specification I am trying.
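For reference, a sketch of the two specifications side by side. The variable names (lo and hi for the interval bounds, x1 and x2 for covariates) are placeholders, and zero lower bounds would need handling before any log transformation:

Code:

* Weibull interval-censored model; with the -time- option the coefficients
* are in the accelerated failure-time metric, i.e. effects on log(WTP)
stintreg x1 x2, interval(lo hi) distribution(weibull) time

* interval ("tobit-type") regression; coefficients are on the dollar scale,
* or on log dollars if the bounds are logged first
intreg lo hi x1 x2

In the AFT metric the stintreg coefficients are at least on a comparable (log-outcome) scale to an intreg run on logged bounds, though the assumed error distributions still differ.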

Thanks!]]>

I'm analyzing patent data for my thesis. I have a dataset of unique patents from 1999-2004, so no duplicates. I'd like to run two different regressions with two fixed effects. The first is a year fixed effect, from 1999 through 2004. The second is a regional fixed effect based on the CBSA location of the patent's first inventor. But I have doubts about how to execute this.

1st regression: Poisson regression (because the dependent variable is a count)

Number of inventors on the patent = indepvar + year fixed effect + regional fixed effect

2nd regression: linear regression

Depvar (i.e. a probability) = indepvar + year FE + regional FE

If my research is right, there are two different ways of setting the fixed effects:

1) adding i. prefixes:

poisson depvar indepvar i.year i.cbsa

regress depvar indepvar i.year i.cbsa

2) via panel data:

xtset cbsa year

xtpoisson depvar indepvar year cbsa, fe

xtreg depvar indepvar year cbsa, fe

My questions:

- Is there a preference between the two possibilities? Should I expect a difference in the outcomes between the two, for example in R-squared or significance?

- Am I allowed to set this up as panel data? Each patent ID is included in the dataset only once, not recurring across the years.
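For concreteness, a sketch of how the two approaches are commonly written (assuming cbsa is numeric; in the xt approach the cbsa effect is absorbed by the fe option, so only the year dummies enter as regressors):

Code:

* approach 1: explicit dummy sets for both fixed effects
poisson depvar indepvar i.year i.cbsa
regress depvar indepvar i.year i.cbsa

* approach 2: declare the panel dimension and let -fe- absorb the cbsa effect;
* year then enters as a set of dummies rather than as a plain regressor
xtset cbsa
xtpoisson depvar indepvar i.year, fe
xtreg depvar indepvar i.year, fe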

Thanks

Ludo

]]>

I have a fairly large dataset with about 472 observations and 166 variables. It's government healthcare data on Accountable Care Organizations (ACOs): each organization is a row and each variable is a column. Some of these organizations have partner groups, and I'd like to create a dummy variable to indicate this (0 if not, 1 if yes); it would be the 167th variable. There is a variable (the first) for ACO names, but it is a string variable, as the names are spelled out. Which organizations have partners and which don't isn't part of this dataset, hence why I'd like to add it. I have a list of the partnered groups, so what would be the fastest/most efficient way to do this for the entries it applies to?
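A minimal sketch, assuming the name variable is called aco_name and using made-up partner names; inlist() accepts only a handful of string arguments (up to nine comparison values) per call, so a long list needs several replace lines or a merge against a dataset of partner names:

Code:

gen byte partner = 0
replace partner = 1 if inlist(aco_name, "ACO Alpha", "ACO Beta", "ACO Gamma")
* for a longer list, repeat with further batches of names:
* replace partner = 1 if inlist(aco_name, "ACO Delta", "ACO Epsilon")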

Thanks!]]>

I'm using Stata to do IV with LASSO-selected instruments. I use the following code to run the regression:

ivlasso cm (i.edu i.region urban cost i.hc1 i.hc2 i.hc3 food_cons nfood_cons) ///
    (ir_part_any = dr5-drt25 dr_1015-dr2_0614 any_shock1014 all_shock1014 most_shock1014 ///
    any_shock0614 all_shock0614 most_shock0614 dr2_0612 any_shock0612 all_shock0612 ///
    most_shock0612) if female==1 & cost>0 & age>=18, cluster(id_geo_ex) first idstats

where cm is the outcome variable and ir_part_any is the endogenous variable.

Can anyone tell me how to export the results of this regression to LaTeX? Thanks a lot!
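One possible route, assuming the community-contributed estout package is installed and that ivlasso leaves standard e() results behind (a sketch, not tested against ivlasso specifically):

Code:

ssc install estout   // provides -esttab-, if not already installed
esttab using results.tex, replace se star(* 0.10 ** 0.05 *** 0.01)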

Best,

Jiajing]]>

reshape wide X, i(id) j(event_time)

Thanks!]]>

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str16 firstauthor byte tp float fp byte(fn tn)
"Derlin 2013"       5  3  5 18
"Sachpekidis 2015"  2  1  3  2
"Basha 2018"        6  2  2 12
"Cascini 2013"      7  4  2 16
"Zamagni 2007"      6  2  8  6
"Derlin 2012"      65 14 54 64
"Sager 2011"        1  0  1  8
end

Code:

metandi tp fp fn tn

invsym(): matrix has missing values

r(504);


To try to understand what is going on under the hood to get this error, I ran the following command:

Code:

metandi tp fp fn tn, detail

Refining starting values:

Iteration 0:   log likelihood = -28.696589  (not concave)
Iteration 1:   log likelihood = -26.381488  (not concave)
Iteration 2:   log likelihood = -24.186128
Iteration 3:   log likelihood = -23.026692

Performing gradient-based optimization:

Iteration 0:   log likelihood = -23.026692
Iteration 1:   log likelihood = -22.455397
Iteration 2:   log likelihood = -22.443629
Iteration 3:   log likelihood = -22.443627

Mixed-effects logistic regression               Number of obs     =         14
Binomial variable: _metandi_n
Group variable: _metandi_i                      Number of groups  =          7

                                                Obs per group:
                                                              min =          2
                                                              avg =        2.0
                                                              max =          2

Integration points =   5                        Wald chi2(2)      =      55.41
Log likelihood = -22.443627                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------
_metandi_t~e |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _metandi_d1 |   .2043004   .1555728     1.31   0.189    -.1006167    .5092175
 _metandi_d0 |   1.578185   .2154021     7.33   0.000     1.156005    2.000366
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
_metandi_i: Unstructured     |
                sd(_metan~1) |   4.47e-08   .1778994            0           .
                sd(_metan~0) |   7.12e-08   .2327559            0           .
     corr(_metan~1,_metan~0) |  -.1365545    1869501           -1           1
------------------------------------------------------------------------------
LR test vs. logistic model: chi2(3) = 0.00            Prob > chi2 = 1.0000
Note: LR test is conservative and provided only for reference.

invsym(): matrix has missing values


Please let me know if there is a way around the missing matrix values. I am able to get the pooled sensitivity and specificity (my outcome of interest) with

Best,

Sharath Rama]]>

Manual: "estat classification requires that the current estimation results be from logistic, logit, probit, or ivprobit"

Thanks, regards]]>

How can I estimate the following model in Stata? It's a simultaneous-equation system with two-way fixed effects.

two fixed effects: id and year

Two equations:

y1 = x1 + x2 + x3 + y2

y2 = y1 + x2 + x3
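A possible sketch with reg3, entering both fixed effects as dummy sets. This assumes the id and year dummies are not too numerous and that the Stata version in use accepts factor variables here; otherwise the dummies can be generated beforehand:

Code:

* 3SLS for the two-equation system, with id and year dummies in each equation
reg3 (y1 x1 x2 x3 y2 i.id i.year) (y2 y1 x2 x3 i.id i.year)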

Thanks!

]]>

Code:

projmanager x.stpj

I am using catplot in Stata and I am having trouble getting the figure I want to present. My variables are: employed (binary), parental education (low, medium, high) and time (0,1). The time variable represents two different cross-sectional waves (2005 and 2011). My aim is to compare the level of employment by different social backgrounds between the two waves. For instance, the number of individuals with a low parental background who were not employed in 2005 is 16 (28%), compared with 41 (72%) who were employed. Likewise, the number of individuals with a low parental background who were not employed in 2011 is 27 (24%), compared with 87 (76%) who were employed.

What I would like catplot to produce is a graph that shows, on one side, the people not employed in each wave by their parental background. For example, for one category I would like to see: Not employed 2005: 28% (16/55) and Not employed 2011: 24% (27/114).

The output I am getting instead pools all the frequencies together (16+41+27+87+74+102+201+260+179+201+747+677 = 2612): the frequencies of being employed by social background in the two waves (2x3x2 = 12 categories). Hence the graph shows: Not employed 2005: 0.6% (16/2612) and Not employed 2011: 1.6% (41/2612).

I tried several ways to find a solution using the by() option, with no success.

Here is the command that produces the latter result:

catplot time parents_educ employed, percent blabel(bar, position(time) format(%9.1f)) ///
    title("Germany") var1opts(gap(0)) recast(bar) bar(1, blcolor(red) bfcolor(red)) ///
    asyvars bar(1, color(black))
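If the aim is percents within each parental-background-by-wave group rather than of the grand total, catplot's percent() option accepts a varlist defining the groups over which percents are computed; a sketch with the same variable names (untested):

Code:

catplot employed time parents_educ, percent(parents_educ time) ///
    blabel(bar, format(%9.1f)) title("Germany")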

Thanks in advance]]>

I have panel data, which looks like the following:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long directorid str16 securitydescription float(date sellD)
2009 "Com"   462 0
2009 "Com"   527 1
2009 "Com"   532 1
2009 "Com"   559 0
2010 "Com"   462 0
2010 "Com"   577 1
2010 "Com"   578 1
2163 "Com"   672 0
3688 "Com"   545 1
3688 "Com"   550 1
3689 "Com"   399 0
3689 "Com"   400 0
3689 "Com"   405 0
3689 "Com"   410 0
3689 "Com"   438 0
3689 "Com"   440 0
3689 "Com"   447 0
3689 "Com"   450 0
3689 "Com"   459 0
3689 "Com"   480 0
3689 "Com"   481 0
3689 "Com"   490 0
3689 "Com"   493 0
3689 "Com"   502 0
3689 "Com"   504 0
3689 "Com"   505 0
3689 "Com"   510 0
3689 "Com"   512 0
3689 "Com"   516 0
3689 "Com B" 526 0
end
format %tm date

Code:

bys directorid securitydescription date : gen D=1 if sellD==1&L12.sellD==1&L24.sellD==1&l48.sellD==1

But the code does not work: Stata reports "not sorted". I know Stata may not be able to handle two cross-sectional identifiers at once, but what is the most efficient way to achieve the same effect?]]>

I am working on extracting information from text data and was wondering if there is a way to count the number of non-missing variables in each row. The following is what my dataset looks like. I would like a variable that shows how many variables are non-empty in each row (e.g. 2 in row 1, 2 in row 2, and so forth).

One option I considered is to encode the string variables and use egen's -rownonmiss-. However, that seems a bit roundabout. Is there an alternative?

Thanks very much!

Krishna

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 var1 str6 var2 str5 var3
"asdfdgf" ""       "mnbvc"
""        "qwerty" "mnbvc"
""        "qwerty" ""
"asdfdgf" "qwerty" "mnbvc"
"asdfdgf" "qwerty" "mnbvc"
end
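For reference, egen's rownonmiss has a strok option that counts string variables directly (a non-empty string counts as non-missing), so no encoding step is needed:

Code:

egen nfilled = rownonmiss(var1-var3), strok
* nfilled is 2 in row 1, 2 in row 2, 1 in row 3, and 3 in rows 4 and 5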

Code:

quietly xtreg yield co2 co2_2 temp temp_2 co2_temp, fe
estimates save w1, replace
quietly xtreg yield co2 co2_2 temp temp_2 co2_temp rgdpo_pc, fe
estimates save w2, replace
quietly xtreg yield co2 co2_2 temp temp_2 co2_temp rgdpo_pc hc, fe
estimates save w3, replace

Code:

outreg2 [w1 w2 w3 m1 m2 m3] using table1, tex replace tfmt(type) ctitle("(1)";"(2)";"(3)";"(4)";"(5)";"(6)")

Code:

estimates restore w1

My main question is: how do I properly store regression estimates so that I can recall them into the environment later and use them like regular post-estimation results?]]>
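For reference, a sketch of the distinction the question hinges on: -estimates store- keeps a named set of results in memory, which estimates restore and table-export commands can recall directly, whereas -estimates save- writes a .ster file to disk that must be brought back with estimates use. Using the first model above:

Code:

quietly xtreg yield co2 co2_2 temp temp_2 co2_temp, fe
estimates store w1          // held in memory under the name w1
* ... run other models ...
estimates restore w1        // w1 becomes the active estimation results again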