xtline and standard deviation

Mimi La

Join Date: Sep 2019

Posts: 9
#1

26 Sep 2019, 03:51

Hello everyone,

I need to draw a line plot for panel data.
I used the following commands:

collapse (mean) housingindex, by (agegroups wave)
sort agegroups wave
xtset agegroups wave
xtline housingindex, overlay

That worked. But now I also need to show the standard deviation in this plot.
The by options of xtline are limited when overlay is used.

Do you know whether there is any way to add the standard deviation to the plot?
Maybe the addplot option could work, but I don't understand how it works.

Thank you in advance!

Mimi
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

26 Sep 2019, 05:21

If you go back to collapse and add

Code:

(sd) sd=housingindex

you have the extra variable you want. xtline won’t do the calculation for you.
Mimi La

Join Date: Sep 2019

Posts: 9
#3

26 Sep 2019, 07:25

Dear Nick,

Thank you for your quick answer!

I tried that before with the following commands:

Code:

collapse (mean) meanhousingindex = housingindex ///
    (sd) sdhousingindex=housingindex, by(agegroups wave)
sort agegroups wave
xtset agegroups wave
xtline meanhousingindex sdhousingindex, overlay

But it didn't work. Stata said that xtline does not allow multiple variables together with the overlay option. But I need the overlay option to show the different years of my panel.
Is there any other way to get a line plot that shows the development of housingindex in the age groups over time, together with the standard deviation?

Nick Cox

Join Date: Mar 2014
Posts: 35698
#4

26 Sep 2019, 07:49

I didn't think about the whole of your code before (hereabouts I was looking at my phone over a light lunch), but now that I do I see that it makes no sense.

If you are collapsing by both identifier and time variable, your results are just the individual values (which are the new means) and SDs which are necessarily missing, as Stata uses (sample size - 1) in calculating SDs and so for SDs of individual values the calculation implies dividing by zero.

I don't have your dataset, but here is the difficulty shown. I truncated the output, which is more of the same.

Code:

webuse grunfeld, clear 
xtset company year
collapse (mean) invest (sd) sd=invest , by(company year) 

list 

     +------------------------------+
     | company   year   invest   sd |
     |------------------------------|
  1. |       1   1935    317.6    . |
  2. |       1   1936    391.8    . |
  3. |       1   1937    410.6    . |
  4. |       1   1938    257.7    . |
  5. |       1   1939    330.8    . |
     |------------------------------|
  6. |       1   1940    461.2    . |
  7. |       1   1941      512    . |
  8. |       1   1942      448    . |
  9. |       1   1943    499.6    . |
 10. |       1   1944    547.5    . |
     |------------------------------|
 11. |       1   1945    561.2    . |
 12. |       1   1946    688.1    . |
 13. |       1   1947    568.9    . |
 14. |       1   1948    529.2    . |
 15. |       1   1949    555.1    . |
     |------------------------------|
 16. |       1   1950    642.9    . |
 17. |       1   1951    755.9    . |
 18. |       1   1952    891.2    . |
 19. |       1   1953   1304.4    . |
 20. |       1   1954   1486.7    . |
     |------------------------------|
 21. |       2   1935    209.9    . |
 22. |       2   1936    355.3    . |
 23. |       2   1937    469.9    . |
 24. |       2   1938    262.3    . |
 25. |       2   1939    230.4    . |
     |------------------------------|

So, let's back up here. You can average over panels or over times, but collapsing by both just returns the original data. Let's suppose you want to average over panels. Once you have done that, the dataset has in effect been collapsed to a single panel, and xtset settings can be superseded by tsset settings. In fact you can just use line directly.

With the same dataset, but starting again:

Code:

webuse grunfeld, clear 
collapse (mean) invest (sd) sd=invest , by(year)  
line invest sd year

I won't show the graph, but you can run the code yourself. It works. (In this case invest should surely be looked at on a logarithmic scale, but that is a different story.)

If you want something else, let us know, but a request to collapse by identifier and time variable is at most a mapping from the dataset to itself.


Mimi La

Join Date: Sep 2019

Posts: 9
#5

26 Sep 2019, 08:35

Dear Nick,

Thank you, that was very helpful!

But now I get a line plot that shows only two lines: the SD of housingindex over time and the mean of housingindex over time.
I would like a plot that shows the development of the mean of housingindex for the different age groups over time (that worked with collapse and xtline housingindex, overlay). Now I would like to add the SD of housingindex to that plot in the form of whiskers or something similar.
So I would get a separate line for each age group, with whiskers showing the SD of housingindex at each point in time.

I hope this explanation is understandable.

The picture shows the plot I already have. Only the SD is missing.

Thank you very much in advance.


Last edited by Mimi La; 26 Sep 2019, 08:53.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

26 Sep 2019, 09:59

Sorry, but that looks like the same question to me and the answer is the same. If you were able to

Code:

xtset agegroups wave

then there can be at most one observation for each distinct pair of the two variables and the SD of anything for that pair is just not defined. Otherwise put, show us the results of

Code:

collapse (mean) meanhousingindex = housingindex (sd) sdhousingindex=housingindex, by(agegroups wave)
list

Then, just as with the results in #4, the SD will not show up on the graph because all its values are missing and there is nothing to show.

Conversely, if I am misunderstanding what you are doing, then you should surely show the syntax you used!
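
The point can be checked without reading through a listing. This is a minimal sketch using the Grunfeld data from #4 as a stand-in for the poster's dataset, which is not available here; assert stops with an error unless the condition holds in every observation.

Code:

webuse grunfeld, clear
* one observation per company-year cell, so no SD can be computed
collapse (mean) invest (sd) sd=invest, by(company year)
* passes silently only if the SD is missing in every cell
assert missing(sd)
* number of cells with a defined SD
count if !missing(sd)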

Last edited by Nick Cox; 26 Sep 2019, 10:22.
Mimi La

Join Date: Sep 2019

Posts: 9
#7

27 Sep 2019, 09:17

Dear Nick,

Thanks a lot for your help!

In my case I got values for the standard deviation with

Code:

collapse (mean) meanhousingsindex = housingindex ///
    (sd) sdhousingindex=housingindex, by(agegroup wave)

I cannot explain why that worked in my case. I collapsed the variables from my dataset, which I had converted from wide to long format beforehand. There I set "wave" as the time variable and "fallnum" as the person variable. Maybe that is the reason?

As a workaround for plotting the data, I have now exported the data to Excel and drawn the line graph there. That worked, even though I would have preferred to do it in Stata.

Have a nice day, and please let me know if you find a way to draw the line plot in Stata.

Mimi
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

27 Sep 2019, 10:11

Thanks for your reply. Without data examples all I can gather is that you are moving between different datasets and not explaining enough about the structure of each dataset for specific advice to be given to you usefully now. For example, now you are talking about a new "person variable" which you didn't tell us about before. That is always allowed but (obvious but crucial) we can't possibly know what you don't tell us. The same goes for the wide and long versions you are talking about.

In #6 I asked you to show syntax but all you did in #7 was quote back the syntax I mentioned, so sorry, but I am no further forward.

Once your data have successfully been xtset, a command like the one you give, collapsing on both identifiers, can only return the original values as means, and missing values as SDs. Such a collapse will work in the sense that Stata will not complain, still less issue an error message, but such a collapse is still useless. I have explained the principle and given an example, and I can't think of a third way.
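
For contrast, here is a minimal sketch of the situation in which SDs are defined, namely when each cell of the collapse holds several observations. It rests on assumptions that nothing in this thread confirms: the Grunfeld data stand in for person-level data, the variable group is a hypothetical grouping invented here to play the role of agegroups, and mean plus or minus one SD is only one possible choice of whisker. Run it from a do-file, as /// continuation lines do not work in the Command window.

Code:

webuse grunfeld, clear
* hypothetical grouping variable standing in for agegroups
generate byte group = company > 5
* five companies per group-year cell, so the SD is defined
collapse (mean) mean_invest=invest (sd) sd_invest=invest, by(group year)
* whisker ends: mean plus or minus one SD (an assumption)
generate upper = mean_invest + sd_invest
generate lower = mean_invest - sd_invest
twoway (rcap upper lower year if group == 0)  ///
       (line mean_invest year if group == 0)  ///
       (rcap upper lower year if group == 1)  ///
       (line mean_invest year if group == 1), ///
       legend(order(2 "group 0" 4 "group 1")) ///
       ytitle("mean invest +/- SD")

The collapse here runs on the original data, before any xtset on the grouping variable and time, which is what leaves more than one observation in each cell.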

I can't offer any further solutions as I am now in the dark about what dataset you have in mind.

If you wish to pursue this further please read and act on https://www.statalist.org/forums/help#stata to give an explicit data example.
