  • xtline and standard deviation

    Hello everyone,

    I need to draw a line plot for panel data.
    I used the following commands:

    collapse (mean) housingindex, by (agegroups wave)
    sort agegroups wave
    xtset agegroups wave
    xtline housingindex, overlay

    That worked. But now I also need to show the standard deviation in this plot.
    The by options of the xtline command are limited when overlay is used.

    Do you know whether there is any way to add the standard deviation to the plot?
    Maybe the addplot option could work, but I don't understand how it works.

    Thank you in advance!

    Mimi

  • #2
    If you go back to collapse and add

    Code:
    (sd) sd=housingindex
    you have the extra variable you want.
    xtline won’t do the calculation for you.

    • #3
      Dear Nick,

      thank you for your quick answer!

      I did that before with the following command:

      Code:
      collapse (mean) meanhousingindex = housingindex ///
      (sd) sdhousingindex=housingindex, by(agegroups wave)
      sort agegroups wave
      xtset agegroups wave
      xtline meanhousingindex sdhousingindex, overlay
      But it didn't work. Stata said that you cannot use multiple variables with the xtline command and the overlay option. But I need the overlay option to show the different years of my panel.
      Is there anything else I can do to get a line plot that shows the development of the housingindex in the agegroups over time, and the standard deviation as well?

      • #4
        I didn't think about the whole of your code before (hereabouts I was looking at my phone over a light lunch), but now that I do I see that it makes no sense.

        If you are collapsing by both identifier and time variable, your results are just the individual values (which are the new means) and SDs which are necessarily missing, as Stata uses (sample size - 1) in calculating SDs and so for SDs of individual values the calculation implies dividing by zero.

        I don't have your dataset, but here is the difficulty shown. I truncated the output, which is more of the same.


        Code:
        webuse grunfeld, clear 
        xtset company year
        collapse (mean) invest (sd) sd=invest , by(company year) 
        
        list 
        
             +------------------------------+
             | company   year   invest   sd |
             |------------------------------|
          1. |       1   1935    317.6    . |
          2. |       1   1936    391.8    . |
          3. |       1   1937    410.6    . |
          4. |       1   1938    257.7    . |
          5. |       1   1939    330.8    . |
             |------------------------------|
          6. |       1   1940    461.2    . |
          7. |       1   1941      512    . |
          8. |       1   1942      448    . |
          9. |       1   1943    499.6    . |
         10. |       1   1944    547.5    . |
             |------------------------------|
         11. |       1   1945    561.2    . |
         12. |       1   1946    688.1    . |
         13. |       1   1947    568.9    . |
         14. |       1   1948    529.2    . |
         15. |       1   1949    555.1    . |
             |------------------------------|
         16. |       1   1950    642.9    . |
         17. |       1   1951    755.9    . |
         18. |       1   1952    891.2    . |
         19. |       1   1953   1304.4    . |
         20. |       1   1954   1486.7    . |
             |------------------------------|
         21. |       2   1935    209.9    . |
         22. |       2   1936    355.3    . |
         23. |       2   1937    469.9    . |
         24. |       2   1938    262.3    . |
         25. |       2   1939    230.4    . |
             |------------------------------|
        So, let's back up here. You can average over panels or over times, but averaging over both just returns the original data. Let's suppose you want to average over panels. Once you have done that, the dataset has in effect been collapsed to a single panel and xtset settings can be superseded by tsset settings. In fact you can just use line directly.

        With the same dataset, but starting again:

        Code:
        webuse grunfeld, clear 
        collapse (mean) invest (sd) sd=invest , by(year)  
        line invest sd year
        I won't show the graph, but you can run the code yourself. It works. (In this case invest should surely be looked at on a logarithmic scale, but that is a different story.)
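
        The remark about tsset can be sketched as follows, again with the grunfeld data rather than the housing data. This is only an illustration under that assumption: once there is a single observation per year, the data can be declared a plain time series and tsline draws both variables against the time variable.

        ```stata
        webuse grunfeld, clear

        * one observation per year: mean and SD of invest across the companies
        collapse (mean) invest (sd) sd=invest, by(year)

        * a single series per variable, so tsset replaces xtset
        tsset year

        * tsline uses the time variable from tsset as the x axis
        tsline invest sd
        ```

        The result should match what line invest sd year draws; tsset is optional here and only matters if you want time-series commands and operators afterwards.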

        If you want something else, let us know, but a request to collapse by identifier and time variable is at most a mapping from the dataset to itself.

        • #5
          Dear Nick,

          thank you, that was very helpful!

          But now I get a line plot that only shows two lines: the SD of the housingindex over time and the mean of the housingindex over time.
          I would like to get a plot that shows the development of the mean(housingindex) for the different agegroups over time (that worked with collapse and xtline housingindex, overlay). And now I would like to add the SD of the housingindex to that development in the form of whiskers or something like that.
          That way I would get different lines for the agegroups, with the whiskers for the SD of the housingindex drawn on them over time.

          I hope this explanation was understandable.

          The picture shows the plot I already have. Just the SD is missing.

          Thank you a lot in advance.

          Attached Files
          Last edited by Mimi La; 26 Sep 2019, 08:53.

          • #6
            Sorry, but that looks like the same question to me and the answer is the same. If you were able to

            Code:
            xtset agegroups wave
            then there can be at most one observation for each distinct pair of the two variables and the SD of anything for that pair is just not defined. Otherwise put, show us the results of


            Code:
            collapse (mean) meanhousingindex = housingindex (sd) sdhousingindex=housingindex, by(agegroups wave)  
            
            list
            Then, just as with the results in #4, the SD will not show up on the graph because all its values are missing and there is nothing to show.
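
            A quick way to check how many observations sit behind each pair is sketched below, using the grunfeld data from #4 as a stand-in for your dataset (so company and year play the roles of agegroups and wave, and nobs is just a name chosen here). isid exits with an error unless the listed variables jointly identify the observations, and the count shows how many values each SD would be based on.

            ```stata
            webuse grunfeld, clear

            * errors out unless company and year jointly identify the observations
            isid company year

            * number of observations for each (company, year) pair
            bysort company year: generate nobs = _N
            tabulate nobs
            ```

            If every pair has exactly one observation, as here, each SD from collapse is necessarily missing. If isid fails on your data before the collapse, there are several observations per pair and the SDs can be defined.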

            Other way round, if I am misunderstanding what you are doing, then you should surely show the syntax you used!
            Last edited by Nick Cox; 26 Sep 2019, 10:22.

            • #7
              Dear Nick,

              thanks a lot for your help!

              In my case I got values for the standard deviation with

              Code:
              collapse (mean) meanhousingindex = housingindex ///
              (sd) sdhousingindex=housingindex, by(agegroups wave)
              I cannot explain why that worked in my case. I collapsed the variables from my dataset, which I had converted from wide to long format before. There I set "wave" as the time variable and "fallnum" as the person variable. Maybe that could be the reason?

              As a solution to the plotting problem, I have now exported the data to Excel and drawn the line graph there. That worked, even if I would have preferred to do it in Stata.

              Have a nice day and please let me know if you find a way to draw the line plot in Stata.

              Mimi

              • #8
                Thanks for your reply. Without data examples all I can gather is that you are moving between different datasets and not explaining enough about the structure of each dataset for specific advice to be given to you usefully now. For example, now you are talking about a new "person variable" which you didn't tell us about before. That is always allowed but (obvious but crucial) we can't possibly know what you don't tell us, and the same goes for understanding which wide and long versions you are talking about.

                In #6 I asked you to show syntax, but all you did in #7 was quote back at me the syntax I mentioned, so sorry, but I am no further forward.

                Once your data have successfully been xtset, a command like the one you give, collapsing on both identifiers, can only return the original values as means, and SDs that are missing. Such a collapse will work in the sense that Stata will not complain, still less issue an error message, but such a collapse is still useless. I have explained the principle and given an example, and I can't think of a third way.

                I can't offer any further solutions as I am now in the dark about what dataset you have in mind.

                If you wish to pursue this further please read and act on https://www.statalist.org/forums/help#stata to give an explicit data example.
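
                For reference, here is a minimal sketch of the kind of plot asked for in #5, on the assumption that the data before collapsing hold several observations for each group and time, as the person variable mentioned in #7 suggests. It uses the grunfeld data, not the housing data: the variable group is a made-up split of the companies standing in for agegroups, and the other new variable names are also chosen only for illustration.

                ```stata
                webuse grunfeld, clear

                * hypothetical grouping standing in for agegroups: companies 1-5 vs 6-10
                generate group = cond(company <= 5, 1, 2)

                * several companies per (group, year) cell, so the SD is defined
                collapse (mean) mean_inv=invest (sd) sd_inv=invest (count) nobs=invest, by(group year)

                * whisker limits: mean plus and minus one SD
                generate upper = mean_inv + sd_inv
                generate lower = mean_inv - sd_inv

                * one connected line of means per group, with capped spikes as whiskers
                twoway (rcap upper lower year if group == 1)     ///
                       (connected mean_inv year if group == 1)   ///
                       (rcap upper lower year if group == 2)     ///
                       (connected mean_inv year if group == 2),  ///
                       legend(order(2 "group 1" 4 "group 2"))    ///
                       ytitle("mean invest +/- 1 SD")
                ```

                This does not use xtline at all. After the collapse the data are just one row per group and year, and twoway with if conditions gives one line per group. Whether it carries over depends on the structure of the actual dataset, which a data example would show.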
