Please can someone assist with the following? I'm very new to Stata.

Var1

Bongo

Kassena

Bolgatanga

Tolon

Salvelgu

Yendi

Karaga

This variable (Var1) is a string variable. In one step, how can I replace Bongo = 1, Kassena = 2, Bolgatanga = 3, Tolon = 4, etc.?
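For what it's worth, one common approach is encode, sketched below; the new variable names (var1_code, var1_code2) and the label name (district) are my own choices. Note that plain encode numbers the categories alphabetically, so imposing the listed order needs a value label defined first:

```stata
* One step, but codes follow alphabetical order of the strings:
encode Var1, gen(var1_code)

* To impose the listed order instead, define the value label first:
label define district 1 "Bongo" 2 "Kassena" 3 "Bolgatanga" 4 "Tolon" ///
    5 "Salvelgu" 6 "Yendi" 7 "Karaga"
encode Var1, gen(var1_code2) label(district)
```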

Thanks for your assistance

Daniel


The estimated coefficients for capital and labor of a Cobb-Douglas production function using levpet (LP) and xtreg, fe (FE) differ strongly from the coefficients obtained by OLS, GEE, RE, and BE. Please see the results below.

I wonder whether the discrepancy is because I have only two time periods, because only a small number of firms have observations in both periods, or because I am not using some command properly, among other reasons. I would highly appreciate your help.

Results:


----------------------------------------------------------------------------------------
                 (OLS)        (GEE)         (RE)         (BE)         (FE)         (LP)
                    va           va           va           va           va           va
----------------------------------------------------------------------------------------
k             0.379***     0.377***     0.363***     0.400***      0.122**       0.0538
              (0.0276)     (0.0239)     (0.0273)     (0.0253)     (0.0596)      (0.107)
lw            0.709***     0.703***     0.709***     0.678***      0.532**     0.542***
              (0.0376)     (0.0347)     (0.0385)     (0.0364)      (0.228)     (0.0459)
_cons         5.915***     5.944***     6.063***     5.774***     9.280***
               (0.243)      (0.203)      (0.239)      (0.213)      (0.996)
----------------------------------------------------------------------------------------
N                  971          971          971          971          971          971
R-sq             0.671                                  0.666        0.163
adj. R-sq        0.670                                  0.665        0.161
rmse             1.100                                  0.610        1.102        0.175
----------------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.10, ** p<0.05, *** p<0.01

Stata commands:

xtset idpanel year

*OLS
regress va k lw, vce(cluster idpanel)

*GEE
xtgee va k lw

*RE
xtreg va k lw, re vce(cluster idpanel)

*BE
xtreg va k lw, be

*FE
xtreg va k lw, fe vce(cluster idpanel)

*LP
levpet va, free(lw) proxy(e) capital(k) valueadded reps(250)

k=ln(K): Capital

lw=ln(LW): Labor

va=ln(VA): Value added

VA=Y-M-E

M: Raw materials and intermediate goods

E: Electricity

Data characteristics:

My final goal is to determine Total Factor Productivity (TFP). After a data-cleaning process, I have an unbalanced panel of 888 firms. Only 83 firms have observations in both time periods, which gives me a total of 971 observations. All observations belong to the same 4-digit International Standard Industrial Classification (ISIC) code.


xtdescribe

 idpanel:  101078, 101081, ..., 503957                    n =        888
    year:  1, 2, ..., 2                                   T =          2
           Delta(year) = 1 unit
           Span(year)  = 2 periods
           (idpanel*year uniquely identifies each observation)

Distribution of T_i:   min     5%    25%    50%    75%    95%    max
                         1      1      1      1      1      2      2

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
      479     53.94   53.94 |  1.
      326     36.71   90.65 |  .1
       83      9.35  100.00 |  11
 ---------------------------+---------
      888    100.00         |  XX

I want to generate a new string variable from an old string variable that consists of everything except the last letter.

Consider the following example where I want to generate a new variable where the word is in singular instead of plural.

string1 | string2

balls | ball

tables | table

rings | ring

How do I go from string1 to string2?

I have tried substr(), but I haven't been able to make it work, since it doesn't seem to count backwards. Or does it?
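For reference, a minimal sketch assuming the variables are named as in the example above: substr() does not need to count backwards if you compute the length first.

```stata
* Drop the final character of string1 (sketch)
gen string2 = substr(string1, 1, strlen(string1) - 1)
```

(A negative start position in substr() does count from the end; e.g. substr(string1, -1, 1) returns the last character.)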

Shikasta



svyset [iw=pwwgt0], sdrweight(pwwgt1-pwwgt160)

I would like to calculate the median value of certain variables, and confidence intervals around the median. I cannot figure out how to do this using the svy commands.

I have searched the Stata listserv, and I did see a similar thread, where the SOMERSD and SCSOMERSD ado packages were suggested for a similar but not identical problem. (http://www.stata.com/statalist/archi.../msg00933.html) These packages seem to permit the use of pweights and clustering when calculating confidence intervals around medians, but I don't think they permit the use of replicate weights.

Does anyone have any suggestions?

Thank you very much,

Rhiannon Patterson

US Government Accountability Office

701 Fifth Avenue, Suite 2700,

Seattle, WA 98104


set seed 12345
gen contvar = rnormal(0,1)


. set seed 12345
. gen contvar = rnormal(0,1)
. sum contvar

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     contvar |      1600      .01877    1.024139  -3.645242   3.620961


forval i = 1(11)33 {
    local j = `i' + 1
    local k = `i' + 2
    local l = `i' + 3
    local m = `i' + 4
    local n = `i' + 5
    local o = `i' + 6
    local p = `i' + 7
    local q = `i' + 8
    local r = `i' + 9
    local s = `i' + 10
    esttab est`i' est`j' est`k' est`l' est`m' est`n' est`o' est`p' est`q' est`r' est`s' using pooled`i'-`s'.csv
}


esttab est1-est11

Thanks

The options I have considered are:

1. using the dstdize command. However, the output doesn't look like it takes person-years into account, and that defeats the purpose of survival analysis.

:

. dstdize x pop ageband sex, by(smoking calendaryear) print

-----------Standard Population-----------
   Stratum          Pop.      Dist.
----------------------------------------
 50 Female          8580      0.204
 50 Male            7402      0.176
 60 Female          8309      0.197
 60 Male            7867      0.187
 70 Female          3579      0.085
 70 Male            3979      0.095
 80 Female          1082      0.026
 80 Male             969      0.023
 90 Female           182      0.004
 90 Male             142      0.003
----------------------------------------
 Total:            42091

-------------------------------------------------------------------
-> smoking calendaryear = 0 102

                               -----Unadjusted-----    Std.
                               Pop.           Stratum  Pop.
 Stratum     Pop.   Cases      Dist.  Rate[s]  Dst[P]   s*P
-------------------------------------------------------------------
 50 Female     37      11      0.125   0.2973  0.204  0.0606
 50 Male       40       9      0.135   0.2250  0.176  0.0396
 60 Female     34       9      0.115   0.2647  0.197  0.0523
 60 Male       38       9      0.128   0.2368  0.187  0.0443
 70 Female     38      11      0.128   0.2895  0.085  0.0246
 70 Male       28      11      0.095   0.3929  0.095  0.0371
 80 Female     41      10      0.139   0.2439  0.026  0.0063
 80 Male       27       3      0.091   0.1111  0.023  0.0026
 90 Female     11       2      0.037   0.1818  0.004  0.0008
 90 Male        2       0      0.007   0.0000  0.003  0.0000
-------------------------------------------------------------------
 Totals:      296      75      Adjusted Cases:   79.3
                               Crude Rate:     0.2534
                               Adjusted Rate:  0.2681
               95% Conf. Interval: [0.2099, 0.3262]

-------------------------------------------------------------------
-> smoking calendaryear = 0 104

                               -----Unadjusted-----    Std.
                               Pop.           Stratum  Pop.
 Stratum     Pop.   Cases      Dist.  Rate[s]  Dst[P]   s*P
-------------------------------------------------------------------
 50 Female     35      12      0.127   0.3429  0.204  0.0699
 50 Male       36      11      0.130   0.3056  0.176  0.0537
 60 Female     33      11      0.120   0.3333  0.197  0.0658
 60 Male       44      15      0.159   0.3409  0.187  0.0637
 70 Female     33      14      0.120   0.4242  0.085  0.0361
 70 Male       33      18      0.120   0.5455  0.095  0.0516
 80 Female     29      17      0.105   0.5862  0.026  0.0151
 80 Male       20       9      0.072   0.4500  0.023  0.0104
 90 Female      7       2      0.025   0.2857  0.004  0.0012
 90 Male        6       1      0.022   0.1667  0.003  0.0006
-------------------------------------------------------------------
 Totals:      276     110      Adjusted Cases:  101.6
                               Crude Rate:     0.3986
                               Adjusted Rate:  0.3680
               95% Conf. Interval: [0.3053, 0.4307]

Summary of Study Populations:
 smoking  calend~r        N      Crude   Adj_Rate      Confidence Interval
--------------------------------------------------------------------------
       0       102      296   0.253378   0.268057   [ 0.209945, 0.326170]
       0       104      276   0.398551   0.368006   [ 0.305284, 0.430729]

The other option I can think of is to use Poisson regression and then use 'predict ir' to calculate adjusted incidence rates. But I am not sure what the syntax would be, or whether that is the correct way to do it.
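If the Poisson route is of interest, a rough sketch follows. It assumes stratum-level data with x as the event count and pop as the denominator, mirroring the dstdize call above, and is only a guess at the intended model:

```stata
* Adjusted rates from a Poisson model with an exposure term (sketch)
poisson x i.ageband i.sex i.smoking i.calendaryear, exposure(pop) irr
predict adj_ir, ir    // predicted incidence rate per unit of exposure
```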

Any help or suggestions will be appreciated.

Thanks,

Nafeesa

Please can someone help me with the Stata code that will concatenate the names provided below.

I have over 500 names and want to make use of loops and the egen command.

My data are as shown below, and my expected result is beneath the data.

Thanks for your assistance

Data:

first_name_1 first_name_2 last_name_1 last_name_2

Daniel Joseph Kanyam Mensah

Michael John Kombat Smith

Expected Result:

fullname1 fullname2

Daniel Kanyam Joseph Mensah

Michael Kombat John Smith
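A hedged sketch of one way to do this, assuming the variable names are exactly as shown in the data above:

```stata
* Build fullname1 and fullname2 from the matching name pairs
forvalues i = 1/2 {
    egen fullname`i' = concat(first_name_`i' last_name_`i'), punct(" ")
}
```

Plain string addition, gen fullname`i' = first_name_`i' + " " + last_name_`i', would work equally well.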


I am currently analyzing the factors that could determine the capital structure of firms. I have a short unbalanced panel of firm data. The aim of the research is to see which factors (e.g. profitability, tangibility, market-to-book, size, ...) have a significant influence on the leverage of a company. I have run a regression with 15 independent variables and would like to reduce the number of independent variables towards a core model, which would have higher power and less random noise caused by redundant variables.

I would highly appreciate helpful comments on which method to apply and how to apply it! Many thanks in advance.

This is the regression I ran:

. xtreg tda prof dtas chgas mtb capex tang rndex sgaex depr itc gdp age indlevtdm indgrw, fe vce(cluster id)

note: age omitted because of collinearity

Fixed-effects (within) regression Number of obs = 7755

Group variable: id Number of groups = 965

R-sq: within = 0.9190 Obs per group: min = 1

between = 0.9873 avg = 8.0

overall = 0.9645 max = 13

F(13,964) = 545.83

corr(u_i, Xb) = 0.3903 Prob > F = 0.0000

(Std. Err. adjusted for 965 clusters in id)

------------------------------------------------------------------------------

| Robust

tda | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

prof | -.0397562 .0225531 -1.76 0.078 -.0840151 .0045027

dtas | -.0008746 .00088 -0.99 0.321 -.0026016 .0008524

chgas | .0004872 .0002263 2.15 0.032 .0000431 .0009312

mtb | .8993015 .057161 15.73 0.000 .7871272 1.011476

capex | .0040609 .0062637 0.65 0.517 -.008231 .0163529

tang | .0031389 .0053662 0.58 0.559 -.0073919 .0136698

rndex | -.0002065 .0014029 -0.15 0.883 -.0029595 .0025466

sgaex | .0001069 .0011072 0.10 0.923 -.002066 .0022797

depr | .0281598 .0481765 0.58 0.559 -.0663831 .1227028

itc | .8408791 .0883071 9.52 0.000 .6675828 1.014175

gdp | -.0001767 .0001617 -1.09 0.275 -.0004941 .0001407

age | (omitted)

indlevtdm | .0106162 .0055294 1.92 0.055 -.0002349 .0214673

indgrw | -.0003558 .0002821 -1.26 0.207 -.0009093 .0001977

_cons | .0246801 .0172344 1.43 0.152 -.0091412 .0585014

-------------+----------------------------------------------------------------

sigma_u | .02603672

sigma_e | .02679437

rho | .485662 (fraction of variance due to u_i)

------------------------------------------------------------------------------

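One standard starting point, offered only as a sketch rather than a recommendation: test the block of individually insignificant regressors jointly with a Wald test after the fixed-effects fit, using the variable names from the regression above:

```stata
* Joint test of the apparently weak regressors (sketch)
xtreg tda prof dtas chgas mtb capex tang rndex sgaex depr itc gdp indlevtdm indgrw, fe vce(cluster id)
test dtas capex tang rndex sgaex depr gdp indgrw
```

A small joint p-value would caution against dropping the whole block at once.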

Best Regards,

Kaspar


yrm: date variable, e.g. 1998m1,1998m2...

exchcd: exchange code, = 1 if the stock is listed on NYSE (New York Stock Exchange)

My codes are as follows:

forvalues i = 20(20)80{

qui bysort yrm: egen ME_Brkp`i' = pctile(ME) if exchcd == 1, p(`i')

}

// obviously, many non-NYSE-listed stocks have missing values for all ME breakpoints

// need to assign these NYSE-based breakpoints to the non-NYSE ones too

forvalues i = 20(20)80{

qui by yrm (permno), sort: replace ME_Brkp`i' = ME_Brkp`i'[_n-1] if ME_Brkp`i' >= .

}

if ME_Brkp20 !=. & ME_Brkp40 !=. & ME_Brkp60 !=. & ME_Brkp80 !=. {

qui {

g ME_rank = 1 if ME <= ME_Brkp20 // Smallest Stocks

replace ME_rank = 2 if ME > ME_Brkp20 & ME <= ME_Brkp40

replace ME_rank = 3 if ME > ME_Brkp40 & ME <= ME_Brkp60

replace ME_rank = 4 if ME > ME_Brkp60 & ME <= ME_Brkp80

replace ME_rank = 5 if ME > ME_Brkp80 & ME!=. //Big stocks

}

}

forval r = 1(1)5{

forvalues i = 20(20)80 {

qui bysort yrm ME_rank: egen BM_p`i'_`r' = pctile(cond(ME_rank==`r',BM,.)), p(`i')

}

}

g BM_rank = .

forval r=1(1)5{

qui{

replace BM_rank = 1 if BM<=BM_p20_`r' & ME_rank== `r'

replace BM_rank = 2 if BM>BM_p20_`r' & BM<=BM_p40_`r' & ME_rank== `r'

replace BM_rank = 3 if BM>BM_p40_`r' & BM<=BM_p60_`r' & ME_rank== `r'

replace BM_rank = 4 if BM>BM_p60_`r' & BM<=BM_p80_`r' & ME_rank== `r'

replace BM_rank = 5 if BM>BM_p80_`r' & ME_rank== `r' & BM!=.

}

}

Step 3: each of the resulting 25 fractile portfolios is further subdivided into quintiles based on the stocks' past 12-month returns.

forval mer = 1(1)5{

forval bmr = 1(1)5{

forvalues i = 20(20)80 {

qui bysort yrm ME_rank BM_rank: egen Mom_p`i'_`mer'_`bmr' = pctile(cond(ME_rank==`mer',BM_rank==`bmr',Mom,.)), p(`i')

}

}

}

g Mom_rank =.

forval mer=1(1)5{

forval bmr=1(1)5{

qui{

replace Mom_rank = 1 if Mom<=Mom_p20_`mer'_`bmr' & ME_rank== `mer' & BM_rank== `bmr'

replace Mom_rank = 2 if Mom>Mom_p20_`mer'_`bmr' & Mom<=Mom_p40_`mer'_`bmr' & ME_rank== `mer' & BM_rank== `bmr'

replace Mom_rank = 3 if Mom>Mom_p40_`mer'_`bmr' & Mom<=Mom_p60_`mer'_`bmr' & ME_rank== `mer' & BM_rank== `bmr'

replace Mom_rank = 4 if Mom>Mom_p60_`mer'_`bmr' & Mom<=Mom_p80_`mer'_`bmr' & ME_rank== `mer' & BM_rank== `bmr'

replace Mom_rank = 5 if Mom>Mom_p80_`mer'_`bmr' & ME_rank== `mer' & BM_rank== `bmr' & Mom!=.

}

}

}

Can you identify what is wrong with the above code?

When I tab the resulting variables:

tab ME_rank

tab BM_rank

tab Mom_rank

I got very confusing and suspicious results for the rank based on Mom (momentum, the past 12-month return):

tab Mom_rank

Mom_rank | Freq. Percent Cum.

------------+-----------------------------------

1 | 1,912,688 93.12 93.12

5 | 141,391 6.88 100.00

------------+-----------------------------------

Total | 2,054,079 100.00

Some observations should have Mom_rank values of 2, 3, or 4. Why does Mom_rank take only the two values 1 and 5? And why do over 90% of observations hold the value 1?
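One guess at the culprit: in Step 3 the percentile is computed over cond(ME_rank==`mer', BM_rank==`bmr', Mom, .). With four arguments, cond() returns its second argument when the condition is true, so within each cell the "values" being ranked are the 0/1 results of BM_rank==`bmr', not Mom. Comparing Mom against 0/1 thresholds would produce exactly the all-1-or-5 pattern shown. A possible fix, combining the two conditions into a single test, might look like:

```stata
* Step 3, with both conditions inside one cond() test (sketch)
forval mer = 1/5 {
    forval bmr = 1/5 {
        forvalues i = 20(20)80 {
            qui bysort yrm ME_rank BM_rank: egen Mom_p`i'_`mer'_`bmr' = ///
                pctile(cond(ME_rank == `mer' & BM_rank == `bmr', Mom, .)), p(`i')
        }
    }
}
```

Separately, the bare if before the ME_rank block is a command-level if, which evaluates its condition using only the first observation; an if qualifier on each replace (or no if at all, since missing breakpoints yield missing ranks) may be what was intended.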


This is my first post and I am quite new in the world of Stata.

I have looked in other posts and have not found anything.

Here is what I want to do.

I have a series of world GDP, 1948-2013.

I have another series that is a weighted sum of the GDP of countries in the world. I only have weights for 1990-2013, and hence the series is only that long.

I would like to extend this second series back to 1948. Any ideas?

I have thought that I could use the world series as a trend predictor for the other series.

However, I do not know how I could do it.

Can someone help me?
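That idea, using the world series as a trend predictor, could be sketched mechanically as follows; gdp_world, gdp_weighted, and year are assumed variable names, and this says nothing about whether the 1990-2013 relationship holds back to 1948:

```stata
* Fit the relationship over the overlapping years, then predict backwards (sketch)
regress gdp_weighted gdp_world if year >= 1990
predict gdp_weighted_hat, xb
gen gdp_weighted_ext = cond(missing(gdp_weighted), gdp_weighted_hat, gdp_weighted)
```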

Thank you.

I have a question, which I guess is basic, but I can't find the answer I'm looking for on the internet or in forums. Probably it will not be a yes-or-no answer, and that's why. It would be great if you could help:

In my backward regression model I have several categorical variables. How to put a categorical variable into the model is not the problem. But suppose that, during backward elimination, one level of a categorical variable has the highest p-value and would be the next to fall out of the model, while another level of the same variable is borderline significant (p = 0.07). Do you leave the variable in and instead remove the variable with the second-highest p-value, or do you remove the whole categorical variable?

Sorry if it's too basic a question.

Thanks for your help.

Kind regards,

Isabel