I am trying to calculate standardized differences in complex survey (weighted) data. I was able to calculate them with the stddiff command for the unweighted data, but stddiff does not work with the svy prefix. Please help me with this issue.
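In case it helps, here is a sketch of a manual workaround (not stddiff itself): compute survey-weighted means and standard deviations with svy: mean and estat sd, then form the standardized difference by hand. The variable names bmi and treat are placeholders, and svyset is assumed to have been declared already.

```stata
* Sketch: standardized difference for one continuous covariate across a
* binary group variable, using survey-weighted means and SDs.
* bmi and treat are hypothetical names; svyset must already be declared.
svy: mean bmi, over(treat)
estat sd                          // weighted means and SDs by group
matrix M = r(mean)
matrix S = r(sd)
scalar d = (M[1,2] - M[1,1]) / sqrt((S[1,1]^2 + S[1,2]^2)/2)
display "standardized difference = " d
```

This reproduces the usual pooled-SD formula d = (m1 - m0) / sqrt((s0^2 + s1^2)/2) using the design-based estimates.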

Thank you.

I have q25_*, which asks respondents to indicate which advice sources they use.

There are 15 different response options, and respondents may choose up to 15 answers.

The attached image shows how these data have been entered into Stata.

n = 100 for q25_O1 spread across codes 1 - 15

n = 76 for q25_O2 spread across codes 1 - 15 (76 out of the 100 respondents chose more than one answer)

etc...

I want to create a variable that sums each of the response codes across all of the q25_* variables, generating a cumulative frequency, e.g. n = 100 + 76 + n3 + n4, etc., spread across response options 1 - 15.

Is someone able to help with syntax to generate this variable?
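One possible approach (a sketch, assuming the mention variables are literally named q25_O1 through q25_O15, and that respid is a hypothetical respondent identifier): stack the mention variables into long form, after which a single tabulation gives the cumulative frequency of each code.

```stata
* Sketch: stack q25_O1-q25_O15 into long form so one -tab- gives the
* cumulative frequency of each response code 1-15 across all mentions.
preserve
keep respid q25_O*                       // respid is a placeholder id variable
reshape long q25_O, i(respid) j(mention)
drop if missing(q25_O)                   // drop unused mention slots
tab q25_O                                // frequencies across codes 1-15
restore
```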


I would like to create a new variable whose value equals the value of another variable in the previous year. How can I achieve this? For example, I have several companies in the dataset, and each company has quarterly earnings data for 2000-2019. I would like to set Earnings 2019Q1 of company A = Earnings 2018Q1 of company A, Earnings 2019Q2 of company A = Earnings 2018Q2 of company A, and so on.
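A sketch of one standard approach, assuming hypothetical variable names (a numeric company_id, plus year and quarter variables): build a quarterly date, declare the panel with xtset, and take a four-quarter (seasonal) lag.

```stata
* Sketch: last year's same-quarter value via a lag of four quarters.
gen int qdate = yq(year, quarter)    // quarterly date from year and quarter
format qdate %tq
xtset company_id qdate               // declare the panel structure
gen earnings_lastyear = L4.earnings  // value of earnings four quarters ago
```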

I am new here; sorry for my poor description. Thank you in advance.

Code:

lpow_t = β1 + β2*lpro_t + β3*t + β4*t^2 + u_t

where:
  lpow = natural logarithm of mining power use
  lpro = natural logarithm of mining production
  t    = time (in quarters)

I have some text I want to place at a custom position in the graph, so I have this code:

Code:

graph twoway line y_var x_var, text(4 5 {bf:"Some interesting text"})

Code:

invalid point, "Some interesting text"

I have also tried "{bf:"Some interesting text"}", but that does not work either. I was unable to find others with similar problems or think of other solutions. Does anyone here know why this does not work?
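For what it is worth, the usual cause of this error is that the SMCL directive needs to sit inside the quoted string rather than wrap around it, so something like the following should work (y_var and x_var as in the original post):

```stata
graph twoway line y_var x_var, text(4 5 "{bf:Some interesting text}")
```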

To better understand and learn Stata, I would like to ask you the following question. I have a municipality-level panel dataset containing some information about the kindergarten services offered by each municipality, like the following:

Code:


* Example generated by -dataex-. To install: ssc install dataex
clear
input int code_province double code_municipality float year double(children nr_seats) float(pop0_2 pop_tot)
1 1008 2004 71 75 437 16752
1 1008 2005 73 75 416 16769
1 1008 2006 73 75 392 16726
1 1008 2007 73 75 391 16760
1 1008 2008 74 90 411 17029
1 1008 2009 87 90 415 16998
end

However, because many municipalities are too small to offer any kindergarten service, and their demand for childcare is basically served by the closest large municipality, I am interested in aggregating these data at a higher geographical level, which in my case is the province level (code_province).

For this purpose, I have tried two different pieces of code in Stata, each one giving me a different result:

First code:

Code:

. preserve

. gen coverage=nr_seats/pop0_2
(480 missing values generated)

. recode coverage (.=0) if nr_seats==0|pop0_2==0
(coverage: 480 changes made)

. collapse coverage, by(code_province year)

. sum coverage

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    coverage |      1,320    .0502159    .0455661          0   .2749822

. restore

Second code:

Code:

. preserve

. collapse nr_seats pop0_2, by(code_province year)

. gen coverage=nr_seats/pop0_2

. sum coverage

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    coverage |      1,320    .0844444    .0592051          0   .2932529

. restore

As I understand it, however, the two approaches should not give such different results. The first one gives the average coverage rate by province, weighting all the municipalities belonging to the province equally, regardless of the population in each. The second one calculates the average number of seats and the average number of children aged 0-2 by province, again without any weighting, and then takes the ratio between the two.

Could someone please give me some insight into why the two approaches give such different results?
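For intuition, here is a tiny constructed example (made-up numbers, two municipalities in one province) showing that the mean of per-municipality ratios generally differs from the ratio of the provincial means, which is exactly the difference between the two pieces of code:

```stata
* Two municipalities: coverage ratios are 1/2 = .5 and 3/10 = .3
clear
input nr_seats pop0_2
1  2
3 10
end
gen coverage = nr_seats/pop0_2
sum coverage                 // mean of ratios: (.5 + .3)/2 = .4
collapse nr_seats pop0_2     // means: nr_seats = 2, pop0_2 = 6
display nr_seats/pop0_2      // ratio of means: 2/6 = .3333
```

The two quantities coincide only when every municipality has the same denominator (here, the same pop0_2).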

Thank you so much,

Chiara

I am using Stata 16 on a Mac. I was wondering how I would write a program that draws a specified number of observations from a given distribution, such as the uniform, Poisson, or beta. Also, how would I write a nested loop that draws sample means from the three distributions mentioned above with sample sizes of 3, 50, 500, and 10,000, replicates this 500 times, and plots a histogram of the resulting distribution of the sample means?
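A sketch of one way to do this with an rclass program and -simulate-. All distribution parameters here (the Poisson mean of 5, the beta(2,5) shape parameters) are placeholders chosen for illustration.

```stata
* Sketch: program that draws n observations from one distribution and
* returns the sample mean; -simulate- then replicates it 500 times.
capture program drop drawmean
program define drawmean, rclass
    args n dist                       // sample size and distribution name
    drop _all
    set obs `n'
    if "`dist'" == "uniform" gen x = runiform()
    else if "`dist'" == "poisson" gen x = rpoisson(5)  // mean 5: placeholder
    else gen x = rbeta(2, 5)                           // beta(2,5): placeholder
    summarize x
    return scalar mean = r(mean)
end

* Nested loop over distributions and sample sizes; one histogram per cell.
foreach d in uniform poisson beta {
    foreach n in 3 50 500 10000 {
        simulate mean = r(mean), reps(500) nodots: drawmean `n' `d'
        histogram mean, name(`d'_`n', replace) title("`d', n = `n'")
    }
}
```

Note that -simulate- replaces the data in memory with the simulation results, so run this on a saved or disposable dataset.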

Thank you in advance for your help,

Jason Browen

I am using Stata 16 on a Mac. If I wanted to randomly draw 500 observations from a distribution such as the Poisson, gamma, beta, or binomial and create a histogram of each, how would I do so?
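A sketch using Stata's random-number functions; all parameter values below are placeholders.

```stata
* Sketch: 500 draws from each distribution; parameters are placeholders.
clear
set seed 12345
set obs 500
gen xpoisson  = rpoisson(5)        // Poisson with mean 5
gen xgamma    = rgamma(2, 1)       // gamma with shape 2, scale 1
gen xbeta     = rbeta(2, 5)        // beta(2, 5)
gen xbinomial = rbinomial(10, .3)  // binomial with n = 10, p = .3
foreach v in xpoisson xgamma xbeta xbinomial {
    histogram `v', name(`v', replace) title("`v'")
}
```

For the discrete distributions (Poisson, binomial), adding histogram's discrete option gives one bar per value.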

Thank you in advance for your help

Jason Browen

For the following repeated commands:

tab1 variable1 if (variableA == 1) & (variableB== 1), missing

tab1 variable1 if (variableA == 1) & (variableB== 1) & (variableC== 1), missing

tab1 variable1 if (variableA == 1) & (variableB == 1) & (variableD == "no reason"), missing

tab1 variable2 if (variableA == 1) & (variableB== 1), missing

tab1 variable2 if (variableA == 1) & (variableB== 1) & (variableC== 1), missing

tab1 variable2 if (variableA == 1) & (variableB == 1) & (variableD == "no reason"), missing

tab1 variable3 if (variableA == 1) & (variableB== 1), missing

tab1 variable3 if (variableA == 1) & (variableB== 1) & (variableC== 1), missing

tab1 variable3 if (variableA == 1) & (variableB == 1) & (variableD == "no reason"), missing

…….

…….

If I repeat the same commands above for variable1, variable2, variable3, ..., variable6:

Other than using the macro below, is there a more efficient way, e.g. using a loop?

Thank you for help.

local repeat variable1 variable2 variable3 variable4 variable5 variable6

tab1 `repeat' if (variableA == 1) & (variableB == 1), missing

tab1 `repeat' if (variableA == 1) & (variableB == 1) & (variableC == 1), missing

tab1 `repeat' if (variableA == 1) & (variableB == 1) & (variableD == "no reason"), missing
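Yes, a loop is the usual route. A sketch, assuming variable1 through variable6 exist under exactly those names:

```stata
* Loop over the six variables; each gets the same three conditional tab1 calls.
foreach v of varlist variable1 variable2 variable3 variable4 variable5 variable6 {
    tab1 `v' if variableA == 1 & variableB == 1, missing
    tab1 `v' if variableA == 1 & variableB == 1 & variableC == 1, missing
    tab1 `v' if variableA == 1 & variableB == 1 & variableD == "no reason", missing
}
```

If the six variables are stored consecutively in the dataset, `varlist variable1-variable6` would shorten this further.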


  companyid | year | employeeid | Female, older than 22 and have master degree | Female, older than 22 and have a master degree
  293475073 | 2005 |         10 | 1 | 1
  293475073 | 2005 |         11 | 0 | 0
  293475073 | 2005 |          5 | 1 | 1
  982726482 | 2005 |         12 | 0 | 0
  982726482 | 2005 |         11 | 0 | 0
  982726482 | 2005 |          5 | 1 | 1
23452345234 | 2006 |         10 | 1 | 1
23452345234 | 2006 |          6 | 0 | 0
23452345234 | 2006 |          7 | 1 | 0
   18938422 | 2006 |          8 | 1 | 0
   18938422 | 2006 |          6 | 0 | 0
   18938422 | 2006 |         15 | 0 | 0
   18938422 | 2006 |         16 | 1 | 0
  980232133 | 2007 |         17 | 1 | 1
  980232133 | 2007 |         18 | 0 | 0
    3456546 | 2007 |         19 | 0 | 0
    3456546 | 2007 |         17 | 1 | 1

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(y1 lnexport exporter iv)
1         0 0   2.70805
0         0 0  2.995732
0         0 0 1.7917595
0  .6931472 1 4.6728287
0         0 0   4.65396
0         0 0 2.0794415
0         0 0 3.4011974
0 .26236427 1  4.787492
1         0 0  4.339467
1         0 0   3.64632
0         0 0  1.609438
end

Code:

ivprobit y1 (lnexport=iv) lnfirm_age competition lnlargest_own training_l10 ln_managexp i.isic, vce(cluster ccode)

Code:

Probit model with endogenous regressors         Number of obs   =      4,696
                                                Wald chi2(32)   =   2.92e+07
Log pseudolikelihood = 91.552004                Prob > chi2     =     0.0000

                                (Std. Err. adjusted for 33 clusters in ccode)
-----------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
    lnexport |   2.567695   1.070549     2.40   0.016    .4694578    4.665933
-----------------------------------------------------------------------------
Wald test of exogeneity (corr = 0): chi2(1) = 9.20    Prob > chi2 = 0.0024

Code:

biprobit (y1 = exporter lnfirm_age competition lnlargest_own training_l10 ln_managexp i.isic) ///
         (exporter = iv lnfirm_age competition lnlargest_own training_l10 ln_managexp i.isic), ///
         vce(cluster ccode)

Code:

Seemingly unrelated bivariate probit            Number of obs   =      4,700
                                                Wald chi2(30)   =          .
Log pseudolikelihood = -4117.8559               Prob > chi2     =          .

                                (Std. Err. adjusted for 33 clusters in ccode)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
    exporter |   .2636794   .3614948     0.73   0.466   -.4448375    .9721962
-------------+----------------------------------------------------------------
exporter     |
          iv |   .3972677    .051231     7.75   0.000    .2968567    .4976787
-------------+----------------------------------------------------------------
         rho |  -.0661718   .1771233                    -.3926681    .2751372
------------------------------------------------------------------------------
Wald test of rho=0: chi2(1) = .138756    Prob > chi2 = 0.7095

2) Please confirm whether the biprobit command is correct and whether I am using "exporter" correctly in the first equation.

3) With biprobit, can I use a continuous IV when both the endogenous regressor and the outcome are binary?

4) If the outcome, the endogenous regressor, and the IV are all binary, is it OK to use biprobit in that case too?

Thanks in advance, and sorry for the long post and many questions. I hope the experts will help me clear my doubts.

I'm looking for something equivalent to -suest- that allows for two-way clustering.

In particular, I want to test the equality of a regression coefficient across the same model estimated on two different samples. That in itself is not a problem; the problem is that I want to use two-way clustered standard errors, and I have not found a solution for this.

Suppose I have one outcome

Code:

reg y x if sample1 == 1
estimates store e1
reg y x if sample2 == 1
estimates store e2
suest e1 e2, r cluster(c1 c2)
test [e1_mean]x = [e2_mean]x

Does anyone have any suggestions of how I might proceed? This feels like a straightforward problem but I cannot figure out a solution. Thanks in advance.
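One workaround sometimes used (only a sketch; it assumes the two samples do not overlap and that the community-contributed reghdfe command is installed, e.g. via ssc install reghdfe): stack both samples, interact x with a sample indicator, and estimate a single model so that two-way clustering is available, then test the interaction.

```stata
* Sketch: pooled regression with a sample indicator and two-way clustered SEs.
gen byte s2 = (sample2 == 1)
reghdfe y c.x##i.s2 if sample1 == 1 | sample2 == 1, noabsorb vce(cluster c1 c2)
test 1.s2#c.x        // H0: the coefficient on x is equal across the two samples
```

Unlike suest, this imposes a common error variance across the two samples, so it is not an exact replacement.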

Andrew (using Stata 15 on a MacBook Pro, macOS 10.14.6 Mojave)

likes   comments   date        date1
1       0          27jul2009   14apr2009
0       0          19sep2009   15apr2009
1       0          21sep2009   16apr2009
1       3          21sep2009   17apr2009
2       8          21nov2009   18apr2009
1       0          24nov2009   19apr2009
2       0          25nov2009   20sep2009
1       0          21dec2009   21sep2009
3       5          22dec2009   22apr2009
1       0          30dec2009   23apr2009
1       0          30dec2009   24apr2009
1       0          14jan2010   25apr2009
1       0          14jan2010   26apr2009

I would like to sum the variables likes and comments, if the date in date1 appears in date.

In Excel that would be =SUMIF(range, criteria, [sum_range]).

For example: 21sep2009 is in date1 and also appears in date. All likes and comments for that date are then summed, so in total I have 2 likes and 3 comments for it. I would like a new variable showing "2" and another variable showing "3" next to the date 21sep2009 in date1.
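A sketch of one way to do this, assuming date and date1 are already Stata daily date variables: total likes and comments per date with collapse, save the totals to a tempfile, and merge them back onto date1.

```stata
* Sketch: total likes and comments per date, then match totals to date1.
preserve
collapse (sum) likes_total = likes comments_total = comments, by(date)
rename date date1
tempfile totals
save `totals'
restore
merge m:1 date1 using `totals', keep(master match) nogenerate
* likes_total and comments_total now sit next to each matching date1;
* rows whose date1 never appears in date are left missing.
```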

I hope someone can help me with that!


Here is my output.

Code:

. logit homeown i.raceth##i.forborn $dems $ses yrsusa i.linguiso $other if both==1, nocons

Iteration 0:   log likelihood = -1276644.7
Iteration 1:   log likelihood = -849822.09
Iteration 2:   log likelihood = -846538.72
Iteration 3:   log likelihood = -846524.62
Iteration 4:   log likelihood = -846524.62

Logistic regression                             Number of obs   =  1,841,809
                                                Wald chi2(125)  =  507020.07
Log likelihood = -846524.62                     Prob > chi2     =     0.0000

---------------------------------------------------------------------------------------
              homeown |      Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
----------------------+----------------------------------------------------------------
               raceth |
               4. NHC |   .6385632    .027372    23.33   0.000     .5849151    .6922114
               5. NHJ |   .5653426   .0400141    14.13   0.000     .4869164    .6437688
               6. NHF |   -.098571   .0338973    -2.91   0.004    -.1650085   -.0321335
               7. NHI |  -.1735716   .0397069    -4.37   0.000    -.2513957   -.0957475
               8. NHK |  -.2241101   .0483639    -4.63   0.000    -.3189015   -.1293186
               9. NHV |    .348966   .0560098     6.23   0.000     .2391888    .4587433
                      |
            1.forborn |  -1.316316   .0124516  -105.71   0.000    -1.340721   -1.291912
                      |
       raceth#forborn |
             4. NHC#1 |   .3350237   .0314141    10.66   0.000     .2734531    .3965943
             5. NHJ#1 |  -1.139156   .0558045   -20.41   0.000    -1.248531   -1.029781
             6. NHF#1 |   .0531232   .0382891     1.39   0.165     -.021922    .1281684
             7. NHI#1 |   .0593582   .0423115     1.40   0.161    -.0235708    .1422871
             8. NHK#1 |  -.0756405   .0529992    -1.43   0.154     -.179517     .028236
             9. NHV#1 |   .4663223   .0603573     7.73   0.000     .3480241    .5846204
                      |
                  age |   .0706146   .0014749    47.88   0.000     .0677239    .0735054
                 age2 |  -.0002098   .0000164   -12.82   0.000    -.0002419   -.0001778
             1.female |    .006187   .0039075     1.58   0.113    -.0014716    .0138457
                      |
               marst3 |
   previously married |  -1.198443   .0050531  -237.17   0.000    -1.208347   -1.188539
        never married |  -1.298763   .0049671  -261.47   0.000    -1.308498   -1.289027
                      |
                educ5 |
          HS graduate |    .340975   .0098948    34.46   0.000     .3215815    .3603685
         Some college |   .5482986   .0096026    57.10   0.000     .5294779    .5671193
    Bachelor's degree |   .9143729   .0098288    93.03   0.000     .8951088    .9336369
                Grad+ |   .9483004   .0102377    92.63   0.000     .9282348     .968366
                      |
              logfinc |   .1916358   .0013543   141.50   0.000     .1889814    .1942903
               yrsusa |   .0399718    .000417    95.86   0.000     .0391545    .0407891
           1.linguiso |  -.4252366   .0133153   -31.94   0.000     -.451334   -.3991391
              1.mover |  -1.289979    .005414  -238.27   0.000    -1.300591   -1.279368
                      |
              met2013 |
10580. albany-schenectady-troy, ny |  -3.652022   .0430616   -84.81   0.000   -3.736421   -3.567623
10740. albuquerque, nm |  -3.687591   .0502616   -73.37   0.000   -3.786102    -3.58908
...
49660. youngstown-warren-boardman, oh-pa |  -3.260858   .0470553   -69.30   0.000   -3.353084   -3.168631
---------------------------------------------------------------------------------------

. margins raceth, at(forborn=(0 1))

Predictive margins                              Number of obs   =  1,841,809
Model VCE    : OIM

Expression   : Pr(homeown), predict()
1._at        : forborn         =           0
2._at        : forborn         =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#raceth |
    1#1. NHW |          .  (not estimable)
    1#4. NHC |          .  (not estimable)
    1#5. NHJ |          .  (not estimable)
    1#6. NHF |          .  (not estimable)
    1#7. NHI |          .  (not estimable)
    1#8. NHK |          .  (not estimable)
    1#9. NHV |          .  (not estimable)
    2#1. NHW |          .  (not estimable)
    2#4. NHC |          .  (not estimable)
    2#5. NHJ |          .  (not estimable)
    2#6. NHF |          .  (not estimable)
    2#7. NHI |          .  (not estimable)
    2#8. NHK |          .  (not estimable)
    2#9. NHV |          .  (not estimable)
------------------------------------------------------------------------------

Code:

. tab raceth forborn if both==1, m

           |        forborn
    raceth |         0          1 |     Total
-----------+----------------------+----------
    1. NHW | 1,564,121    116,852 | 1,680,973
    4. NHC |     9,099     40,712 |    49,811
    5. NHJ |     4,945      3,989 |     8,934
    6. NHF |     4,850     21,479 |    26,329
    7. NHI |     3,595     38,288 |    41,883
    8. NHK |     2,442     13,908 |    16,350
    9. NHV |     1,732     15,797 |    17,529
-----------+----------------------+----------
     Total | 1,590,784    251,025 | 1,841,809

Thanks!


I am using odkmeta, but when I try to run my odkmeta do-file I get the following error message:

column header type not found

invalid type() suboption

invalid survey() option

r(111);

Could you please help me?

I have attached the CSV files I am using.


Thanks