
My name is Marc and I am currently working on a panel data model in Stata. I think I have some rather simple questions.

I am using panel data with approximately 1,500-2,500 observations (depending on which independent variable I use) and approximately 220 entities (different banks). The dependent variable is metric, as are the independent variable and the eight control variables. For example, the variables capture financial ratios or total assets of the entities. To prepare the sample I removed outliers by winsorising the respective variables at the 1st and 99th percentiles, and transformed four of the control variables (with log, square and 1/cubic).

Next, I set up a random effects model and run the Breusch-Pagan test for random effects (xttest0). The result is a p-value of 1, implying that there are no random effects, i.e. the variance of u is zero. I then run the Hausman test to investigate whether a fixed effects model would be preferred over the random effects model. Its p-value of zero indicates that I should use FE over RE. When running the FE model, the F-test reported in the output has a p-value of zero, implying that FE can be used (?).
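For reference, a minimal sketch of the test sequence described above (all variable names are placeholders for the actual ones):

Code:

xtset bankid year            // panel id and time variable are placeholders
xtreg depvar indepvar ctrl1-ctrl8, re
estimates store re
xttest0                      // Breusch-Pagan LM test for random effects
xtreg depvar indepvar ctrl1-ctrl8, fe
estimates store fe
hausman fe re                // consistent (fe) first, efficient (re) second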

Next, I test the FE model for autocorrelation (none) and heteroskedasticity (present). Because of the heteroskedasticity I change the FE model to ..., fe vce(cluster panelvar). I also test for a normally distributed error term (non-normal according to swilk and sfrancia, although kdensity, pnorm and qnorm actually do not look that bad), multicollinearity (none), linearity (yes) and a zero population mean of the errors (yes), and to check for model misspecification I rebuild the linktest using the FE model predictions and their squares (no omitted variables).
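A sketch of those diagnostics, assuming the user-written commands xtserial and xttest3 (both installable from SSC) and placeholder variable names:

Code:

* ssc install xtserial xttest3
xtserial depvar indepvar ctrl1-ctrl8     // Wooldridge test for serial correlation
xtreg depvar indepvar ctrl1-ctrl8, fe
xttest3                                  // modified Wald test for groupwise heteroskedasticity
xtreg depvar indepvar ctrl1-ctrl8, fe vce(cluster bankid)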

I hope you can follow my approach - happy to provide more information if anything is too vague. If you find any of the steps (sample preparation, conducted tests, conclusions) questionable or wrong, please let me know. Isn't it unusual to use a pooled OLS model for panel data? I am somewhat reluctant to apply OLS, as I would like to use FE: I think it fits the research question better, and the data should actually contain fixed effects.

Many thanks & best regards

Marc

Code:

mixed depvar c.agesp*##i.gender cov1 cov2 cov3.......... || ID: age, mle cov(un) residuals(exp, t(time))

and the fitted vs observed plot looked like:

[fitted vs observed plot omitted]

I feel I have tried almost everything to reduce heteroscedasticity, but nothing seems to help much. However, when including a frequency weight in the model:

Code:

mixed depvar c.agesp*##i.gender cov1 cov2 cov3.......... [fw=time] || ID: age, mle cov(un)


and the fitted vs observed plot looked like:

[fitted vs observed plot omitted]

which seems to indicate a much better fit than the fitted vs observed plot above.

However, I have never used frequency weights before, and I am not sure whether it is appropriate to use them here (I used them because the earlier time points contain many more observations than the later ones).
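As an aside: frequency weights in Stata mean that each observation literally stands for that many identical records, which is probably not what differing numbers of observations per time point represent. If the aim is to let the residual variance differ across time points, mixed can model that directly; a hedged sketch, with the covariates abbreviated as in the model above:

Code:

* sketch only: separate residual variance at each level of time
mixed depvar c.agesp*##i.gender cov1 cov2 cov3 || ID: age, mle cov(un) ///
    residuals(independent, by(time))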

Any comments would be greatly appreciated!

Kjell Weyde


Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id float tm long x1
1101 564        .
1101 565        .
1101 566        .
1101 567        .
1101 568        .
1101 569 11922113
1101 570        .
1101 571        .
1101 572        .
1101 573        .
1101 574        .
1101 575  9817626
1101 576        .
1101 577        .
1101 578 12423117
1101 579        .
1101 580        .
1101 581 11760556
1101 582        .
1101 583        .
1101 584 10615331
1101 585        .
1101 586        .
1101 587 13033535
1102 564        .
1102 565        .
1102 566        .
1102 567        .
1102 568        .
1102 569  6037845
1102 570        .
1102 571        .
1102 572        .
1102 573        .
1102 574        .
1102 575  6207117
1102 576        .
1102 577        .
1102 578 12815413
1102 579        .
1102 580        .
1102 581 20180703
1102 582        .
1102 583        .
1102 584 17554606
1102 585        .
1102 586        .
1102 587 16931368
end
format %tm tm

I'd like to construct monthly data so that within the same half-year (2007 and before) or the same quarter (after 2007), the monthly observations equal the end-of-period (half-yearly or quarterly) observation. Any suggestions?
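One possible sketch, assuming (as in the example data) that the non-missing value always sits in the last month of each half-year or quarter: build a period identifier that is half-yearly up to 2007m12 and quarterly afterwards, then copy the period-end value back to the earlier months.

Code:

* sketch: the offset keeps half-year and quarter codes from colliding
gen long period = cond(tm <= tm(2007m12), hofd(dofm(tm)), 10000 + qofd(dofm(tm)))
bysort id period (tm): replace x1 = x1[_N] if missing(x1)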

This is my first-stage regression. For the second stage I want to use another variable, shtd, as my dependent variable, with the predicted value of bdr from the first regression as an independent variable. So the second model will look like:

bysort Country: regress shtd fam MTB

Can anyone guide me?
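A minimal sketch of the two manual steps (the first-stage regressors x1 and x2 are placeholders, since the first stage is not shown; note that manually generated predictions give incorrect second-stage standard errors, which is why ivregress 2sls is usually preferred when this is an instrumental-variables setup):

Code:

regress bdr x1 x2                          // first stage (your actual specification)
predict double bdr_hat, xb                 // fitted values of bdr
bysort Country: regress shtd bdr_hat fam MTB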

Variable        |  Missing     Total    Percent Missing
----------------+-----------------------------------------------
colonial        |      838   518,480        0.16
rivalry_code    |  113,166   518,480       21.83
alliance        |   38,283   518,480        7.38
no. of borders  |      718   518,480        0.14
conflict        |    3,220   518,480        0.62
signatory       |    1,940   518,480        0.37
election        |   89,573   518,480       17.28
polity          |   93,543   518,480       18.04
GDP             |   42,506   518,480        8.20
dyad_conflict   |      696   518,480        0.13
----------------+-----------------------------------------------

The unit of analysis is directed-dyad year (Country A- Country B Year, Country B-Country A Year).

This includes values for dyads for all countries and years between 2000-2013.

However, some of the independent variables cut off in 2009 (exchange) and 2010 (rivalry). Contiguity, aka "borders the country in the dyad", ends in 2006, and thus later years are missing (however, in this time period most borders have not changed, with the exception of a few nations, so there may be a way around it).

The diplomacy and religion variables only have values for every half-decade, and I'm not sure whether interpolating will work, except maybe for the religion variable (diplomacy is a dummy for whether a diplomat from a given country visited a country; religion consists of percentages, but the dataset also has population numbers). Few variables other than the DV have values through 2013.

As if that were not bad enough, the asylum rate is the DV.

The DV concerns granting asylum to migrants from a sending state; asylum is granted or denied in the host state. This explains the large share of missing values: you do not have individuals from every single country applying for asylum at first instance in a given host state in a given year. Still, the number of missing values is enormous. Because it is a rate, I cannot fill in the missing values with zeros (something other researchers have done with other migration variables when the unit of analysis is the directed dyad year).

I have seen individuals use asylum data, but only within regions or a few cases. Even more unsettling, this is for my thesis. Do I have to scrap this dependent variable entirely? I could *maybe* fill in the gaps by adding asylum applications that were appealed, but I would much rather keep it "clean" and only include first-instance applications. Right now, however, it looks like it has to be done away with...

Using Stata 13, I am trying to overlay/combine a scatter plot with a box plot using the command line:

Code:

graph box

I receive error r(198), stating:

'graph is not a twoway plot type' or 'option | not allowed', depending on which graph is first in the command line.

Is there any way to combine the two graphs?
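For context: graph box is not a twoway plot type, so it cannot be combined with scatter in a single twoway call. One common workaround is the user-written stripplot from SSC, which can draw box plots and overlay the raw data points; a sketch with placeholder variable names:

Code:

* ssc install stripplot
stripplot yvar, over(group) vertical box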

Thank you

Ruth


How many replications are needed for estimates of standard errors (bootstrap and jackknife)?

So far I have found one indication - http://www.stata-journal.com/sjpdf.h...iclenum=st0073

Thanks

I am trying to do sequence analysis using SQ-Ados. I am using Stata 14.

When I run sqom, I get this error message:

<istmt>: 3499 sqomref() not found

I've read past threads regarding this issue but they were from 5+ years ago. Any advice on what to do?

thanks,

Angela

I am currently working with survey data gathered from a cluster sampling frame of primary schools. At each school, the 4th graders had to sit for an examination and answer one survey.

Due to the complex sample design of this survey, standard methods of estimating standard errors cannot be used, because they would considerably overestimate the sampling variance, so I need to use replication methods. As far as I understand, the replication weights were created using the jackknife repeated replication method with 2 PSUs per stratum (JK2), where:

jkzone: is the variable that captures the assignment of schools or students to variance zones

jkrep: is the variable that captures whether the case is to be dropped or have its weight doubled for each set of replicate weights

To build the replication weights, they paired 2 PSUs per jackknife zone (jkzone), not per stratum, and the sampling zones were constructed within explicit strata (idstrate).

The PSU in the sample is the school, and both school (schwgt) and student (totwgt) weights are included. The database also includes the stratification variables idstrate (explicit) and idstrati (implicit). However, according to the company that collected the data, these should only be used for subgroup comparisons; otherwise there is no need to use them.

I have used the command

Code:

survwgt create jk2, strata(jkzone) psu(idschool) weight(totwgt) stem(jkn_)

which returns

stratum with more than 2 PSUs detected
fpc must be >= 0
stratum with more than 2 PSUs detected
fpc must be >= 0

I also tried

Code:

survwgt create jkn, strata(jkzone) psu(idschool) weight(totwgt) stem(jkn_)

which returns

stratum with only one PSU detected
fpc must be >= 0
stratum with only one PSU detected
fpc must be >= 0

According to its help file, survwgt expects survey designs that exactly match the specifications for the type of weights requested (two PSUs per stratum for BRR, etc.). Any collapsing of strata or PSUs, splitting of certainty PSUs, or other adjustments to approximate the appropriate design must be done outside of the program.

There is no methodological note available for the database, but according to the company that collected the data, the replication weights for this specific sampling design can only be created in SPSS. I do not know how to use SPSS and do not have an SPSS license, which is why I would like to use Stata; however, I am not sure it can actually create these replication weights. So, (i) can anyone tell me whether it is possible to use Stata to calculate the replication weights under this sample design, and (ii) if so, explain why I am getting the error messages mentioned?
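For what it's worth, JK2 replicate weights of this kind are often constructed by hand from jkzone and jkrep, doubling or zeroing the weight within the "zapped" zone. A hypothetical sketch, assuming (as in TIMSS/PIRLS-style databases) zones numbered 1-75 and jkrep coded 0/1; whether that matches this survey is an assumption:

Code:

forvalues i = 1/75 {
    gen double rw`i' = totwgt
    replace rw`i' = cond(jkrep == 1, 2*totwgt, 0) if jkzone == `i'
}
* then declare the design with the replicate weights:
svyset idschool [pw=totwgt], jkrweight(rw*) vce(jackknife) mse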

Thank you very much in advance for your help. I look forward to your reply.

Tatiana Zarate


I have a report drafted in Word, and I would now like to incorporate graphs from Stata into this document. I have been using rtfutil to export my graphs, but is there a way to specify where in the Word document I would like to place the images? I would do it by hand, but I have to create about 50 reports using the template, and the report itself is pretty long.

Is there a way to insert some HTML code either into the Word document or into my Stata code to direct the position?

Thank you for your help!

Sincerely,

Katharine

I am a new user and have a question about the regression table for a two-part model.

I use the twopm command from this article: https://www.econ.uzh.ch/dam/jcr:0000...023/sj15-1.pdf

I also use estout from SSC to print the tables.

twopm generates one long table with two regressions: one with logit/probit, the other with OLS or GLM for the truncated data.

I simply want to print a single table with the marginal effects and standard errors of the overall model. The marginal effects combine values from the two regressions and should result in one table.

For some reason, I am only able to get two separate tables of the marginal effects.
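One approach that may help, sketched here: margins with the post option replaces e(b)/e(V) with the marginal effects, so each stored estimate is a single-equation model and esttab prints one table:

Code:

quietly twopm y x, firstpart(logit) secondpart(regress)
quietly margins, dydx(*) post
eststo m1
quietly twopm y x z w, firstpart(logit) secondpart(regress)
quietly margins, dydx(*) post
eststo m2
esttab m1 m2, b(3) se(3) nomtitles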

For an example, here is a short replication.

Code:

clear
set more off
eststo clear
set obs 100000
gen int y = rpoisson(2)
gen double x = rnormal(1,1)
gen double z = runiform()
gen double w = rbeta(1,3)
eststo: quietly twopm y x, firstpart(logit) secondpart(regress)
estadd margins, dydx(*)
eststo: quietly twopm y x z w, firstpart(logit) secondpart(regress)
estadd margins, dydx(*)
esttab using asdf.tex, replace drop(_cons) ///
    cells("margins_b(fmt(3) star)" "margins_se(fmt(3) par)") ///
    eqlabels(none) collabels(none) nomtitles