HELP! Put Excel - GLM, Sort, By

Melissa Bujtor

Join Date: Jul 2017

Posts: 29
#1

HELP! Put Excel - GLM, Sort, By

14 Jul 2017, 15:23

Hi All,

Have been using STATA for only 6 months now, first time posting.

Am completing a piece of analysis work that involves using GLM.

My original code is:

putexcel set "directory", sheet("name")modify
putexcel A1=("Variable") B1 = ("b") C1=("ll") D1=("ul") E1=("P")
loc row = 2

foreach x of varlist [indep var] {
glm [dep var] `x',family(gaussian) link(identity) vce(robust)
putexcel A`row' = ("`x'") B`row' = (_b[`x']) C`row' = (_b[`x']-1.96*_se[`x']) D`row' = (_b[`x']+1.96*_se[`x']) E`row' = (2*ttail(e(df), abs(_b[`x']/_se[`x']))) F`row' = matrix(e(N))
loc row = `row' + 1
}

I now want to run the analysis, sorting first my [indep var] by ethnicity. I have created a new variable "ChineseOnly" and coded it as Chinese 0 and Others 1. My new code is:

Sort Chinese Only
putexcel set "directory", sheet("name")modify
putexcel A1=("Variable") B1 = ("b") C1=("ll") D1=("ul") E1=("P")
loc row = 2

foreach x of varlist [indep var] {
by ChineseOnly: glm [dep var] `x',family(gaussian) link(identity) vce(robust)
putexcel A`row' = ("`x'") B`row' = (_b[`x']) C`row' = (_b[`x']-1.96*_se[`x']) D`row' = (_b[`x']+1.96*_se[`x']) E`row' = (2*ttail(e(df), abs(_b[`x']/_se[`x']))) F`row' = matrix(e(N))
loc row = `row' + 1
}

The analysis is running fine, however the put excel is not. STATA is only exporting the last set of GLM results. So for example if I have one indep variable, there will be two sets of GLM results 1.indep variable + Chinese (coded as 0) and 2. indep variable + Others (coded as 1), however STATA is only exporting for the last set of GLM results produced in this case being 2.indep variable + Others.

I can not work out how to alter the put excel so that it exports all sets of results for the macro.

Any advice would be greatly appreciated as I need to run the analysis on a large list of dep variables.

thanks!

Mel
Tags: None

William Lisowski

Join Date: Dec 2014
Posts: 10150

14 Jul 2017, 15:41

Welcome to Statalist, Melissa.

Something like the following may be what you want. Or perhaps you want the ChineseOnly loop to be the outer loop.

Code:

putexcel set "directory", sheet("name")modify
putexcel A1=("Variable") B1 = ("b") C1=("ll") D1=("ul") E1=("P")
loc row = 2
foreach x of varlist [indep var] {
  forvalues co = 0/1 {
    glm [dep var] `x' if ChineseOnly==`co', family(gaussian) link(identity) vce(robust)
    putexcel A`row' = ("`x'") B`row' = (_b[`x']) C`row' = (_b[`x']-1.96*_se[`x']) D`row' = (_b[`x']+1.96*_se[`x']) E`row' = (2*ttail(e(df), abs(_b[`x']/_se[`x']))) F`row' = matrix(e(N))
    loc row = `row' + 1
}
}

Comment

Melissa Bujtor

Join Date: Jul 2017

Posts: 29
#3

14 Jul 2017, 16:10

William, thank you for coming back to me so promptly and for the warm welcome! I think this is what I am trying to achieve, so I could repeat the code with 0/1 and then 1/1 and the results should separately export to excel for the two groups, am I understanding correctly?

With regards to:
forvalues co = 0/1 {
Is this a new variable I should define / create? As when I prepare the code and run it I am receiving error message "no variables defined".

Here is a direct copy and paste of the code I have used and received this error for:

foreach x of varlist [indep var] {
forvalues co = 0/1 {
glm bw_for_ga_zscore_gusto `x' if ChineseOnly=='co', family(gaussian) link(identity) vce(robust)
putexcel A`row' = ("`x'") B`row' = (_b[`x']) C`row' = (_b[`x']-1.96*_se[`x']) D`row' = (_b[`x']+1.96*_se[`x']) E`row' = (2*ttail(e(df), abs(_b[`x']/_se[`x']))) F`row' = matrix(e(N))
loc row = `row' + 1
}
}

(*nite - I have obvs included the actual indep variable when I run it.)

Many thanks
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

14 Jul 2017, 17:09

The error message you are receiving suggests that you did not read your data into memory in before running the code.

As you know, your foreach command goes through the loop with the local macro x set to the successive independent variables in your list of independent variables. The foreach command goest through the loop once with the local macro co set to 0 and once with it set to 1. So your glm command will get run the first time using the data with ChineseOnly==0 and the second time using the data ChineseOnly==1. You can look at help forvalues for a more complete explanation of the command.

These are fairly basic topics and I didn't explain them because I thought, given the code you shared, that you had this basic knowledge of local macros, loops, etc. Perhaps not, though, so let me advise. I'm sympathetic to you as a new user of Stata - it's a lot to absorb. And even worse if perhaps you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks. When I began using Stata in a serious way, I started as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and manual.

Stata supplies exceptionally good documentation that amply repays the time spent studying it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

I would also suggest that you review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
Comment
Melissa Bujtor

Join Date: Jul 2017

Posts: 29
#5

16 Jul 2017, 15:32

William, again, thanks for coming back to me.

I have the reference material you are referring to, and yes, time and deadlines are an issue! Always...

I understand what the error message is saying also what the nested loop achieves, what I am having trouble with is setting up my data into groups, such that:

forvalues co = 0/1 or forvalues co = 1/1

...is usable, or as you are terming is 'read into memory'. The co you refer in your code, are you suggesting I should create a local variable, termed co, using my categorical data from my variable ChineseOnly, such that all Chinese will be 0/1 and all others will be 1/1. If so how best to do this?

I have tried creating a new variables and also a temp variable but I can still not get the macro to run.

At a loss.

Thanks
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

16 Jul 2017, 16:33

OK, I have looked more closely at your code in post #3 and found at least one error in it, although the error does not yield the error message you described.

First, you do not have to "set your data into groups" to use my code: it was designed to run with the data as you described it:

I have created a new variable "ChineseOnly" and coded it as Chinese 0 and Others 1.

Let me next make your code from post #5 readable by posting it into a CODE block as requested in the Statalist FAQ linked to from the top of the page, especially sections 9-12 on how to best pose your question.

Code:

foreach x of varlist [indep var] { forvalues co = 0/1 { glm bw_for_ga_zscore_gusto `x' if ChineseOnly=='co', family(gaussian) link(identity) vce(robust) putexcel A`row' = ("`x'") B`row' = (_b[`x']) C`row' = (_b[`x']-1.96*_se[`x']) D`row' = (_b[`x']+1.96*_se[`x']) E`row' = (2*ttail(e(df), abs(_b[`x']/_se[`x']))) F`row' = matrix(e(N)) loc row = `row' + 1 } }

About this code you wrote

Here is a direct copy and paste of the code

but that cannot be the case because as you previously wrote

I have obvs included the actual indep variable when I run it

but you do not show that here. So I am a little unsure of just what it is you ran. Nevertheless, in what you posted, there is a problem highlighted in red. The co in my code is a local macro created by the forvalues command in the code I added, exactly as the local macro x in your original code is created by the foreach command in your original code. As a local macro, co should be referred to in the same fashion as your local macros x and row. Where you have

Code:

'co'

you should have

Code:

`co'

as I specified in my code in post #2.
Comment
Melissa Bujtor

Join Date: Jul 2017

Posts: 29
#7

16 Jul 2017, 17:49

Yes! Thank you...the Code: `co' was my whole issue. Really appreciate the assistance and thank you for pointing out my errors explicitly.

Also, noted for future reference regarding the CODE block to make my posts more usable.

Have a great day / evening ahead William.

Kind regards
Mel
Comment
Melissa Bujtor

Join Date: Jul 2017

Posts: 29
#8

27 Jul 2017, 14:56

An additional question in relation to this topic. I now need to export to excel however my 'forvalues' includes three categories, where by the aforementioned only included two.

I am wanting to run a table that looks at various outcomes (weight, height etc), for the different categories of my diplotype (taster, non taster, heterzygous), and show the output by ethnicity. Doing so using ANOVA analysis. A sample of my data is as follows:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Infant_diplo wk3_zwei) long MothersEthnicity_Nu
0 -6.01 2
0 -.88 1
0 -1.28 2
0 -.38 2
0 . 2
end
label values Infant_diplo Infant_diplo
label def Infant_diplo 0 "Non-tasters", modify
label values MothersEthnicity_Nu MothersEthnicity_Nu
label def MothersEthnicity_Nu 1 "Malay", modify
label def MothersEthnicity_Nu 2 "Indian", modify

I have prepared the syntax as follows

Code:

sort Infant_diplo putexcel set "FileName", sheet ("ANOVA")modify putexcel A1=("Variable") B1 = ("Mean") C1=("SD") loc row = 2 foreach x of varlist wk3_zwei { by Infant_diplo: summarize `x' if MothersEthnicity_Nu ==0 putexcel A`row' = ("`x'") B`row' = (r(mean)) C`row' = (r(sd)) loc row = `row' + 1 } sort Infant_diplo putexcel D1 = ("Mean") E1=("SD") loc row = 2 foreach x of varlist wk3_zwei { by Infant_diplo: summarize `x' if MothersEthnicity_Nu ==1 putexcel D`row' = (r(mean)) E`row' = (r(sd)) loc row = `row' + 1 } sort Infant_diplo putexcel F1 = ("Mean") G1=("SD") loc row = 2 foreach x of varlist wk3_zwei { by Infant_diplo: summarize `x' if MothersEthnicity_Nu ==2 putexcel F`row' = (r(mean)) G`row' = (r(sd)) loc row = `row' + 1 } sort Infant_diplo putexcel H1=("F") I1=("DF1") J1=("DF2") loc row = 2 foreach x of varlist wk3_zwei { regress `x' i.MothersEthnicity_Nu putexcel H`row' = (e(F)) I`row' = (e(df_m)) J`row' = (e(df_r)) loc row = `row' + 1 }

The variable, Mother's ethnicity, has 3 categories (Chinese, Malay, Indian). Once again when I export to excel only the final set of results is being output. However the previous solution, using:

forvalues co = 0/1{ etc.

no longer works as I have 3 different groups within the variable Mothers Ethnicity.

I believe I should create a nested loop again, however am having trouble working out how to do so. Any advice on how to get results to export, by each ethnicity, for each diplotype group would be appreciated, thank you.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#9

27 Jul 2017, 16:19

I do not see the problem. These are very simple changes to the code you have used previously.

Since MothersEthnicity_Nu takes the values 0, 1, or 2, you change

Code:

forvalues co = 0/1 {

to

Code:

forvalues co = 0/2 {

If you fail to see why that is the case, help forvalues explains the syntax of the command.

Having done that, you add to the regress command an if clause similar to the one on your earlier glm command

Code:

if MothersEthnicity_Nu==`co'

And more generally, the advice I gave in post #4 still holds; the trip will be easier if you read the map before you set off on the trail.
1 like
Comment
Melissa Bujtor

Join Date: Jul 2017

Posts: 29
#10

28 Jul 2017, 13:34

Thanks again for coming back to me William. I have re-read what you have suggested etc, I do read up before jumping online, but I unfortunately havent achieved the solution, apologies if you find that a little frustrating!.

In my previous code, I had only been sorting by 1 variable. Looking at the outcome variable by ethnicity:

Code:

foreach x of varlist Maternal_diplo{ forvalues co = 0/1 { regress DP3_log `x' if MalayOnly==`co', family(gaussian) link(identity) vce(robust) putexcel A`row' = ("`x'") B`row' = (_b[`x']) C`row' = (_b[`x']-1.96*_se[`x']) D`row' = (_b[`x']+1.96*_se[`x']) E`row' = (2*ttail(e(df), abs(_b[`x']/_se[`x']))) F`row' = matrix(e(N)) loc row = `row' + 1 } }

I understand what you are saying by 0/1 vs 0/2, for the three ethnicities, however the output I am seeking is the dependent variable, by taster status, by ethnicity. ie: The code, when not in a macro looks like

Code:

sort by Infant_diplo by Infant_diplo: oneway wk3weight MothersEthnicity, tabulate

The macro I have written is

Code:

sort Infant_diplo putexcel D1 = ("Mean") E1=("SD") loc row = 2 foreach x of varlist wk3_zwei { by Infant_diplo: summarize `x' if MothersEthnicity_Nu ==1 putexcel D`row' = (r(mean)) E`row' = (r(sd)) loc row = `row' + 1 }

Provides me the correct output, but I am struggling to export it to excel. If I were to use the same code we spoke on last time

Code:

putexcel A1=("Variable") B1 = ("Mean") C1=("SD") loc row = 2 foreach x of varlist wk3_zwei { forvalues co = 0/2 { summarize `x' if MothersEthnicity_Nu ==0 putexcel A`row' = ("`x'") B`row' = (r(mean)) C`row' = (r(sd)) loc row = `row' + 1 } }

How do I get the forvalues line to recognise that I want to sort it by firstly Mother's Ethnicity, displaying mean and standard deviation by Mothers Ethnicity within the group of tasters, then within the group of heterozygous, then non-tasters (which is the three categories of the variable Infant_diplo). Thanks Mel
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#11

28 Jul 2017, 15:54

I see now that by writing

no longer works as I have 3 different groups within the variable Mothers Ethnicity

you mean to communicate that for any given value of MothersEthnicity, you have three subgroups from the three categories of Infant_diplo, rather than just that there are 3 values rather than 2 for MothersEthnicity.

Perhaps something like this will accomplish what you need.

Code:

putexcel A1=("Variable") B1 = ("Mean") C1=("SD") local row = 2 foreach x of varlist wk3_zwei { forvalues id = 0/2 { // assuming Infant_diplo takes the values 0, 1, and 2 forvalues me = 0/2 { // assuming MothersEthnicity takes the values 0, 1, and 2 summarize `x' if MothersEthnicity==`me' & Infant_diplo==`id' putexcel A`row' = ("`x'") B`row' = (r(mean)) C`row' = (r(sd)) loc row = `row' + 1 } } }
Comment
Melissa Bujtor

Join Date: Jul 2017

Posts: 29
#12

28 Jul 2017, 18:12

This, and you, are perfect! Thank you William, and apologies for the confusion in the way that I asked the question. Appreciate the help.

Kind regards
Mel
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#13

28 Jul 2017, 18:22

I'm glad to help. On reviewing what I posted in #11, I see that in my eagerness to respond I failed to acknowledge that my misunderstanding was in large measure my responsibility for having not read what you wrote as carefully as i could have.

The lesson from post #11 is how to nest loops, and that the innermost is the one that changes the most rapidly, so you get three values of MothersEthnicity for the first value of Infant_diplo and first dependent variable, then three value of MothersEthnicity for the second value of Infant_diplo and first dependent variable, and so on. With that in mind I hope I've helped enable you to better understand nested loops, which look like they're going to be important for you.
Comment
Melissa Bujtor

Join Date: Jul 2017

Posts: 29
#14

28 Jul 2017, 19:17

Not at all, part my explanation not being that fabulous too!

Yes, exactly what I was trying to achieve, a nested loop. You have helped a great deal, I couldnt achieve the second nested loop. I see now that can I nest multiple times - is there a limit to how many times I can nest?

My last question on this, (promise!) is the final part of my original syntax originally cited in post #8:

Code:

sort Infant_diplo putexcel H1=("F") I1=("DF1") J1=("DF2") loc row = 2 foreach x of varlist wk3_zwei { regress `x' i.MothersEthnicity_Nu putexcel H`row' = (e(F)) I`row' = (e(df_m)) J`row' = (e(df_r)) loc row = `row' + 1 }

I included this syntax because I want to also extract values for F, DF1 and DF2 to excel.

I have just tried to amend this syntax to include the double nested loop we used for the prior syntax:

Code:

putexcel H1=("F") I1=("DF1") J1=("DF2") loc row = 2 foreach x of varlist wk3_zwei { forvalues id = 0/2 { forvalues me = 0/2 { regress `x' i.MothersEthnicity_Nu & ==`me' & i.Infant_diplo==`id' putexcel H`row' = (e(F)) I`row' = (e(df_m)) J`row' = (e(df_r)) loc row = `row' + 1 } } }

However when I run it I receive the error message: ' & Invalid Name" so STATA is telling me there is something wring with how I have input the Infant Diplotype in this instance.

Can I not do the double nested loop, to extract the additional values required for each set of outcomes required? Is it not possible because of the regress command? Should I try to incorporate this into the one syntax rather than have it split over two (first part running the command summarise and second part running this code for regress).

As was saying previously, what I want to achieve is the ability to extract the values that are available when I run:

Code:

sort by Infant_diplo by Infant_diplo: oneway wk3weight MothersEthnicity, tabulate

My final out put for each dependent variable (wk 3 weight, BMI etc) needs to be:

three values of MothersEthnicity (For Chinese, Malay, Indian extracting to excel mean and SD as well as F, DF1 and DF2) for the first value of Infant_diplo and first dependent variable.

Hope I have described the issue in a manner that is easier to understand! I am most certainly a work-in-progress.

Thanks for your time William, appreciated.

Mel
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#15

29 Jul 2017, 05:49

I receive the error message: ' & Invalid Name" so STATA is telling me there is something wring with how I have input the Infant Diplotype in this instance.

No, Stata is telling you that where it is looking for a variable name it has found the character "&" which is not a variable name.

Code:

regress `x' i.MothersEthnicity_Nu & ==`me' & i.Infant_diplo==`id'

should be more like what I wrote in post #11, probably

Code:

regress `x' i.MothersEthnicity_Nu if MothersEthnicity==`me' & Infant_diplo==`id'
Comment

Announcement

HELP! Put Excel - GLM, Sort, By

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment