Code:

version 10
sysuse auto
gen dummy = 0
gen string = "AMC"
local kw Buick Ford "Audi Fox" Impala
local N : word count `"`kw'"'
local i = 1
while `i' <= `N' {
    local n : word `i' of `kw'
    gen X1 = regexm(make, `"`n'"')
    gen X2 = regexm(string, `"`n'"')
    replace dummy = 1 if X1 == 1 | X2 == 1
    drop X1 X2
    local i = `i' + 1
}
drop if dummy == 0

I get estimates for the coefficients but no standard errors or p-values. The model runs without problems if I leave out either the svy prefix or the random effect. It will also run with just the categorical variable. Any suggestions on how to make this work?

My other problem is how to compare models. Likelihood-ratio tests are not possible with survey estimation ("lrtest is not appropriate with survey estimation results"), and estat ic to obtain the AIC is also not possible with this type of model ("invalid subcommand ic").
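On the model-comparison point, the usual workaround after survey estimation is a design-adjusted Wald test of the coefficients that distinguish the nested models, via test or testparm. A hedged sketch with placeholder names (y, x1, i.group, psu are not from the original post):

Code:

// Hedged sketch with placeholder names: after fitting the fuller svy
// model, test the extra terms with adjusted Wald tests, since lrtest
// and estat ic are unavailable after survey estimation.
svy: melogit y x1 i.group || psu:
test x1           // Wald test of a single added continuous term
testparm i.group  // joint Wald test of all the factor-variable levels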

Thanks for any comments.


Code:

graph box AFP, over(pyap_dic) title(AFP and pYAP)

[box plot attached]

There is a big gap between 1500 and 4200 so I would like to create a scale break before showing the two "outliers".

I did read http://www.stata.com/support/faqs/gr.../scale-breaks/. My understanding from that website is that although scale breaks are not recommended, they can still be created.

So I tried:

Code:

gen AFP_break = cond(AFP == 4000, 0, AFP)
label def AFP_break 0 "4000"
label val AFP_break AFP_break
label var AFP_break AFP
graph box AFP_break, over(pyap_dic) ylabel(0 3000 7000, valuelabel) title(AFP and pYAP) yline(4000)

[box plot attached]

This is not what I want. I only want the scale to be shortened between 1500 and 4200.

Is it possible to do so?
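For what it's worth, the FAQ's recode-and-relabel trick can be pushed a bit further: shift every value above the gap down by a constant, then label the axis with the true values so the display stays honest. A hedged sketch, assuming the gap runs from 1500 to 4200 as described above (the shift of 2700 and the tick positions would need tuning to the actual data):

Code:

// Hedged sketch: compress the empty 1500-4200 range by shifting the
// upper values down 2700 units, then relabel the axis with true values.
gen AFP_break = cond(AFP > 4200, AFP - 2700, AFP)
graph box AFP_break, over(pyap_dic) title(AFP and pYAP) ///
    ylabel(0 500 1000 1500 1800 "4500" 2300 "5000") ///
    yline(1650, lpattern(dash))   // marks where the scale break sits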

Thank you,

JM Giard


Dear all,

I would like to kindly ask for suggestions on the following imputation/random assignment problem:

Let’s assume that a categorical variable X (values are 1,2,3,4) has the following distribution (based on ‘reference data’):

X          %     Cum. %
1         20%      20%
2         20%      40%
3         30%      70%
4         30%     100%
Total    100%

Code:

sysuse nlsw88, clear

// Further, let's assume that the frequency weights are distributed in the
// following (very unequal) way:
gen weight = (2.25/(_n^(2.5)))*20000

// Now, I generate the new variable X and randomly assign its values to the
// observations, replicating the initial distribution:
set seed 123
gen random = runiform()
gen x = 0
replace x = 1 if random < .2
replace x = 2 if inrange(random, .2, .4)
replace x = 3 if inrange(random, .4, .7)
replace x = 4 if random > .7

// Replicating the initial sample distribution has worked relatively well in
// the unweighted case:
tabulate x

X        Freq.        %     Cum. %
1          439    19.55%     19.55%
2          440    19.59%     39.14%
3          695    30.94%     70.08%
4          672    29.92%    100%
Total    2,246   100%

However, the weighted distribution is far from the target:

tabulate x [aw=weight]

X        Freq.             %     Cum. %
1           55.4565667     2.47%     2.47%
2        1,690.2808       75.26%    77.73%
3          312.920003     13.93%    91.66%
4          187.342635      8.34%   100%
Total    2,246           100%

Any suggestion is greatly appreciated!
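One hedged suggestion (a sketch, not a tested solution): instead of cutting on the raw uniform draw, sort on it and cut on the running share of total weight, so the weighted distribution is what matches the targets. This reuses the weight and random variables generated above; x2 is a placeholder name.

Code:

// Hedged sketch: assign categories by cumulative *weighted* share so the
// weighted, not the unweighted, distribution matches 20/20/30/30.
sort random
quietly summarize weight
gen double cumshare = sum(weight)/r(sum)
gen x2 = 1
replace x2 = 2 if cumshare > .2
replace x2 = 3 if cumshare > .4
replace x2 = 4 if cumshare > .7
tab x2 [aw=weight]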

-----------------------------------------------------------

Plain Stata code:

Code:

sysuse nlsw88, clear
gen weight = (2.25/(_n^(2.5)))*20000
sum weight
set seed 123
gen random = runiform()
gen x = 0
replace x = 1 if random < .2
replace x = 2 if inrange(random, .2, .4)
replace x = 3 if inrange(random, .4, .7)
replace x = 4 if random > .7
tab x
tab x [aw=weight]

I am working with the DHS child dataset and want to calculate the twinning rate (number of twins per 1,000 births in a given country and survey year). I have a birth-multiplicity variable called count (1 = single birth, 2 = twin birth, 3 = triplet birth), a country code (country_recode), an id (hh_id), and the child's date of birth (ch_dob). I would really appreciate it if you could let me know what code I should use to calculate the number of twins per 1,000 births by country and survey year.
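A sketch under stated assumptions: svy_year is a hypothetical survey-year variable (not named in the post), and I treat count >= 2 as flagging a multiple birth; the exact definition of "twins per 1,000 births" may need adjusting to the intended denominator.

Code:

// Hedged sketch: multiple births per 1,000 births by country and survey
// year. svy_year is a placeholder; count >= 2 flags twin-or-higher births.
gen byte twin = (count >= 2) if !missing(count)
collapse (mean) twinrate = twin, by(country_recode svy_year)
replace twinrate = 1000 * twinrate   // convert proportion to per-1,000 rate
list country_recode svy_year twinrate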

Thanks

I have panel data on agricultural production in Kenya and would like to analyze it with a state-contingent stochastic frontier. How can I run this model in Stata?

Code:

//INPUTS
local Peer1 PEER1_PV_PS PEER1_PV_PE PEER1_PV_EVEBITDA PEER1_PV_EVEBIT
local ncol = 2
local nrow = 2

//LOOP
local col1 : word `ncol' of `c(ALPHA)'
local ++ncol
foreach var in `Peer1' {
    local name1 `var'
    summarize `var', detail
    local col : word `ncol' of `c(ALPHA)'
    putexcel `col'`nrow'=("`name1'") `col1'`nrow'=rscalarnames `col'`nrow'=rscalars using "V:\SummaryStats.xlsx", modify
    local ++ncol
}

Each of PEER1_PV_PS PEER1_PV_PE PEER1_PV_EVEBITDA PEER1_PV_EVEBIT is a series (of which there are ~40), and the sections in green are those I'm having issues with.

The current output looks like this: [screenshot attached]

The output I want looks like this: [screenshot attached]
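One possible restructuring, offered as a hedged sketch rather than the poster's method: write each variable's name in row 1 and a fixed set of summarize scalars beneath it, with the scalar labels written once in column A. This avoids assigning two different values to the same cell, which the loop above does. It keeps the older putexcel "using" syntax used in the post.

Code:

// Hedged sketch: one column per variable, scalar labels once in column A.
local Peer1 PEER1_PV_PS PEER1_PV_PE PEER1_PV_EVEBITDA PEER1_PV_EVEBIT
local stats N mean sd min max
local row = 2
foreach s of local stats {
    putexcel A`row'=("`s'") using "V:\SummaryStats.xlsx", modify
    local ++row
}
local ncol = 2
foreach var of local Peer1 {
    summarize `var', detail
    local col : word `ncol' of `c(ALPHA)'
    putexcel `col'1=("`var'") using "V:\SummaryStats.xlsx", modify
    local row = 2
    foreach s of local stats {
        putexcel `col'`row'=(r(`s')) using "V:\SummaryStats.xlsx", modify
        local ++row
    }
    local ++ncol
}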


I have been trying to figure out the best way to automatically export descriptive statistics to LaTeX.

By reading through different posts I have found that there are several possible solutions. So far I have tried my luck with outreg2 and have gotten this far:

Code:

outreg2 using "$workdir/descriptives.doc", replace sum(log) tex(frag) label keep(var1 var2 var3 var4 var5 var6)

This basically does the job for me. However, I would further like to change the standard column names (N mean sd min max) to (Observations, Mean, Standard Dev., Minimum, Maximum).

When using outreg2 for regression results, I can do this individually for each column with the ctitle option.

If I try it for the descriptive statistics, where I only have 1 command for 5 columns, I can only change the ctitle of all 5 columns at once.

Code:

outreg2 using "$workdir/descriptives.doc", replace sum(log) tex(frag) label keep(var1 var2 var3 var4) ctitle("Observation", "Mean", "Standard Dev.", "Min", "Max")
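A hedged alternative, swapping in a different user-written package (estout, installable via ssc install estout): estpost stores the summary statistics, and esttab's collabels() option names each column individually. A minimal sketch reusing the post's variable names and global:

Code:

// Hedged sketch: per-column header names via estpost/esttab's collabels().
estpost summarize var1 var2 var3 var4 var5 var6
esttab using "$workdir/descriptives.tex", replace fragment label ///
    cells("count mean sd min max") ///
    collabels("Observations" "Mean" "Standard Dev." "Minimum" "Maximum") ///
    nomtitle nonumber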

Thank you very much in advance!

The problem is that N_t, the number of bidders in auction t, appears in one of the parameters I want to estimate via maximum likelihood:

\[ \rho=\gamma_1N_t+\gamma_0 \]

This means that N_t shows up in my ml program as both a dependent variable and an independent variable.

Code:

program gammaweibull3
    args lnf lambda rho theta
    quietly replace `lnf' = 3*ln(`theta') + lngamma(1/`theta' + 3) - lngamma(1/`theta') ///
        + ln(`rho'*`lambda'*($ML_y1/`lambda')^(`rho'-1)) ///
        + ln(`rho'*`lambda'*($ML_y2/`lambda')^(`rho'-1)) ///
        + ln(`rho'*`lambda'*($ML_y3/`lambda')^(`rho'-1)) ///
        + (1/`theta' + 3)*ln(1 + `theta'*(($ML_y1/`lambda')^(`rho') ///
          + ($ML_y2/`lambda')^(`rho') + ($ML_y3/`lambda')^(`rho')))
end

ml model lf gammaweibull3 (lambda: bid1 bid2 bid3 = nbidders ...) (rho:) (theta:)
ml maximize

Code:

initial:      log likelihood =  1796499.8
rescale:      log likelihood =  1796499.8
rescale eq:   log likelihood =  3.88e+298
could not calculate numerical derivatives -- discontinuous region with missing values encountered
r(430);
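Not a diagnosis of this specific model, but the standard first steps for an r(430) from ml are to validate the evaluator and look for better starting values. A hedged sketch (to be run after the ml model statement above; the theta starting value of 0.5 is purely illustrative):

Code:

// Hedged sketch: routine diagnostics for "could not calculate numerical
// derivatives" after ml model has been set up.
ml check                    // verify the evaluator returns valid values
ml search, repeat(50)       // hunt for feasible starting values
ml init theta:_cons = 0.5   // or supply an explicit starting value
ml maximize, difficult      // use a more robust stepping algorithm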

1 1 43
1 1 23
1 1 55
1 1 33
1 1 23
1 1 56
0 1 34
0 2 54
0 2 40

Testing the layout of dataset

Thank you.