  • Twoway graph serset

    Hello,


    I would like to extract the data underlying a graph, in particular to export the data behind a two-way graph to Excel format.
    I have seen that other graphs, such as density graphs, can be extracted using the serset command. However, if I use it with a two-way graph, the data look bizarre.

    For instance, if I use the serset command for a kdensity graph, I get
    2.4 5.6
    2.5 5.7
    2.1 5.8
    etc..


    But if I use the serset command for a twoway graph,
    0 2 10
    0 2 10
    0 2 10
    0 2 10
    0 2 10
    0 2 10
    0 2 10
    0 2 9
    0 2 9
    0 2 10
    0 2 10
    0 3 10
    0 3 10
    0 3 10
    0 3 10
    0 3 10
    0 3 10
    0 3 10
    0 3 10
    0 3 10
    0 3 10
    0 3 7
    0 3  
    0 3  
    0 3  
    0 3 10
    0 3  
    0 3  
    0 3  
    0 3 9

    it becomes like this.




    Therefore, I would like to know if there is another method to extract the data from a two-way graph (using the serset command or not).


    I am not sure if my question is clear.


    Thank you.
    Anne-Claire












  • #2
    I do not follow what you want. The graph created by kdensity is just a twoway line graph. So what you get are two variables corresponding to the axes.

    Code:
    sysuse auto, clear
    kdensity length
    gr save gr1, replace
    serset 0
    serset use, clear
    rename (__000000 __000001) (y x)
    tw line y x, saving(gr2, replace)
    gr combine gr1.gph gr2.gph
    [Image: Graph.png]



    • #3
      Keep in mind the -kdensity- default for n(#) is min(N, 50) while the default for -twoway kdensity- is n(300). This will create differences in the data extracted from the graphs.



      • #4
        Thank you for the replies !




        Now I understand that what I was talking about was just two variables in one graph.
        However, I did not understand the code you wrote. (Sorry, I'm really a beginner in Stata.)
        I wanted to get the data (in Excel format) from a two-way graph with two variables, which I used to compare the densities of the two variables.
        Therefore, I was asking how to get this using the serset command. If I cannot get it from the serset command, is there any other method?

        Thanks !










        • #5
          This looks like an XY problem ( https://en.wikipedia.org/wiki/XY_problem ) to me. That is, you tried to help by narrowing the problem down to a smaller issue (how to use sersets), but in doing so you have inadvertently hidden the real problem you want answered and focused the attention on something that does not help you at all. Sersets are not intended for use by regular users of Stata (let alone beginners). StataCorp is very good at trying to give users the same tools they have to add commands to Stata, which is why you can get access to sersets. But to repeat myself again: you should stay away from them.

          So, now we know that sersets are not the solution. To make some more positive progress and really help you, we need to know what the real problem is you want to solve. Can you tell us that?
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            Hi, sorry that my questions were unclear..

            To explain from the beginning: I am using firm-level panel data and generating density graphs (y-axis: the percentage of observations; x-axis: log_output), so I used the kdensity command.
            As the next step, I used the serset command, which makes it possible to reverse-engineer the data that the Stata graph was constructed from.

            For instance, when I generated a density graph for the year 2018, I could extract the data from the graph.
            However, when I generated a density graph for the years 2018 and 2020, which have different densities, the data looked as below:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double(2018 2020) float log_output
              .0004566025107534187 2.1044299602508545  9.567924
             .00044484920080820035 2.1425071655707217  9.861096
             .00040958927097254564  2.180584370890589 10.161576
             .00035082272124645425 2.2186615762104562 10.182395
              .0002685495516299263 2.2567387815303235 10.219068
             .00016276976212296196 2.2948159868501907 10.059332
             .00003348335272556095  2.332893192170058 10.381873
                                 0  2.370970397489925  9.396801
                                 0 2.4090476028097925   9.43321
                                 0 2.4471248081296597  9.595257
                                 0  2.485202013449527  9.636411
                                 0  2.523279218769394  9.609747
                                 0 2.5613564240892615  9.602918
                                 0 2.5994336294091287  9.776806
                                 0  2.637510834728996 10.019296
                                 0  2.675588040048863 10.426855
                                 0 2.7136652453687304 10.075003
                                 0 2.7517424506885977 10.173587
                                 0  2.789819656008465  9.919168
                                 0  2.827896861328332 10.059264
                                 0 2.8659740666481994 10.171206
                                 0 2.9040512719680667  7.402017
                                 0  2.942128477287934         .
                                 0  2.980205682607801         .
                                 0 3.0182828879276684         .
                                 0 3.0563600932475357 10.270204
                                 0  3.094437298567403         .
                                 0   3.13251450388727         .
                                 0 3.1705917092071374         .
                                 0 3.2086689145270046  9.151247
            .000049578783615194704  3.246746119846872  9.419602
             .00017615661289919687  3.284823325166739  9.590576
              .0002792278222927625 3.3229005304866064  9.420062
             .00035879241179589163 3.3609777358064736  9.337263
             .00041485038140858427  3.399054941126341  6.929214
               .000556880321853227  3.437132146446208  8.252649
              .0006819192273749494 3.4752093517660754  8.024282
              .0007599448931157989 3.5132865570859426  9.077544
              .0007909573190757754   3.55136376240581  7.904799
              .0007749565052548787  3.589440967725677  5.373601
              .0007119424516531089 3.6275181730455444         .
               .000601915158270466 3.6655953783654116  9.564207
              .0004448746251069501  3.703672583685279  10.45219
             .00037924663930938773  3.741749789005146   10.7501
              .0003071878558757984 3.7798269943250133 10.553906
             .00030421636711572666 3.8179041996448806         .
              .0003042085534821183  3.855981404964748         .
             .00030721571383522553  3.894058610284615         .
              .0004078054533621134 3.9321358156044823  9.748738
               .000655507746913433 3.9702130209243496  9.413807
              .0010453531194945396  4.008290226244217         .
               .001401802106356166  4.046367431564084  10.11448
              .0017282814758564878  4.084444636883951  10.07336
              .0019372277459046268  4.122521842203819 10.028668
               .002028640916500583  4.160599047523686         .
              .0020025209876443567  4.198676252843553         .
              .0018588679593359476   4.23675345816342   8.27697
              .0016277543148099653 4.2748306634832876  9.838158
              .0014120503167015755  4.312907868803155  9.937252
              .0011023198390314394  4.350985074123022  9.082714
              .0006985628817995571  4.389062279442889  8.139772
              .0003143515587257548 4.4271394847627565  8.817812
              .0002244073202183201  4.465216690082624 10.018007
             .00021092809242136948  4.503293895402491  9.363429
              .0003587297558742333  4.541371100722358   9.23246
              .0005829640509691335 4.5794483060422255  9.310748
              .0010298365867043447  4.617525511362093  9.399506
               .001483528277592935   4.65560271668196   9.74006
              .0018196868690293425  4.693679922001827  9.330611
              .0021592567949777768 4.7317571273216945  9.684529
              .0024268662311606654  4.769834332641562  9.882376
               .002696928284263707  4.807911537961429         .
               .002914267680667063  4.845988743281296         .
              .0030096811761832127 4.8840659486011635  8.920382
              .0029168169708544594  4.922143153921031  8.880051
               .002885254254746569  4.960220359240898         .
               .002642131959624751  4.998297564560765         .
              .0024496469050086568 5.0363747698806325         .
               .002322355668817003    5.0744519752005  9.394835
               .002488679157094697  5.112529180520367  9.113639
              .0024684979213106494  5.150606385840234  9.384831
              .0023243564380763942 5.1886835911601015  8.993385
              .0020962820754696567  5.226760796479969         .
               .001850590794797631  5.264838001799836         .
              .0019232602778316015  5.302915207119703         .
              .0021757549937546977 5.3409924124395705         .
               .002307068917513949  5.379069617759438         .
               .002333924990103119  5.417146823079305         .
              .0022602489115173367  5.455224028399172         .
               .002079431102444787 5.4933012337190394         .
               .001878937341497531  5.531378439038907         .
              .0017840442104903138  5.569455644358774         .
              .0017736140800502095  5.607532849678641  9.218263
               .001756231627962711  5.645610054998508  9.306288
              .0018228028353002436  5.683687260318376  9.173756
               .001990468882517715  5.721764465638243  9.757135
              .0022439206001226473   5.75984167095811  9.145731
               .002723191670344887  5.797918876277977   9.38125
               .003755911201405303  5.835996081597845  9.251581
               .004718915747029722  5.874073286917712  9.384183
            end
            Now I knew that serset command is not working for this kind of problem, could you tell me if there is any other method to get the data?
            For info, I will export the data to Excel format and re-create the density graph in the Excel file.

            Thank you



            • #7
              You can just create these variables directly:
              Code:
              kdensity log_output, generate(x d)
              See help kdensity
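
              A minimal sketch of how this could feed the Excel export asked about above. It uses the auto dataset so that it runs as is; length stands in for log_output, and the names x, d and density.xlsx are placeholders, not from the original posts.

              ```stata
              * stand-in data; replace with your own dataset and variable
              sysuse auto, clear
              * store the evaluation points in x and the density estimates in d,
              * without drawing the graph
              kdensity length, generate(x d) nograph
              * export only the rows that hold stored estimates
              export excel x d using "density.xlsx" if !missing(x), firstrow(variables) replace
              ```

              The if !missing(x) qualifier is there because, by default, kdensity fills only min(N, 50) rows of the new variables.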


              • #8
                Now I knew that serset command is not working for this kind of problem,
                Erm, I don't think you do know that the command is not working; instead I would suggest that you're not using -serset- correctly. As Maarten says, it's very tricky.

                Why you should want to export to Excel and redraw a graph there is a mystery too, because you will not have a reproducible audit trail regarding how the Excel graph is created.

                In addition, it's good of you to show some data, but unfortunately not very helpful at all, because (i) there's something wrong with your -dataex- output -- it refers to illegal variable names; (ii) you've given no information about how you created these data, including the graph command.

                All that said, look at the following which does the sort of thing you're seeking, I think:

                Code:
                serset clear
                sysuse nlsw88, clear
                kdensity wage, saving(junk.gph, replace)
                
                graph use junk.gph
                serset dir
                serset use , clear
                save junk, replace
                describe, fullnames
                rename __000000 fx
                rename __000001 x
                save junk, replace // or -export excel-, etc.
                export excel using junk.xlsx, replace
                * this reproduces the kernel density graph (without labels, etc)
                gr tw line fx x
                Also look at -grexport- on SSC. (NB I've found that this command only works for me if I comment out the "Error Messages" section.) It doesn't matter, in the sense that the program is essentially just a wrapper for the sort of code I've shown you above
                Last edited by Stephen Jenkins; 25 Feb 2021, 02:54. Reason: PS while I was producing my answer, I see others have produced similar responses.



                • #9
                  Originally posted by Maarten Buis
                  You can just create these variables directly:
                  Code:
                  kdensity log_output, generate(x d)
                  See help kdensity
                  Maarten's advice will work for kdensity, but not twoway kdensity.

                  Code:
                  sysuse auto
                  tw kdensity length, gen(x d)
                  Res.:

                  Code:
                  . tw kdensity length, gen(x d)
                  option gen() not allowed
                  r(198);
                  Scott Merryman's point is key here:

                  Keep in mind the -kdensity- default for n(#) is min(N, 50) while the default for -twoway kdensity- is n(300). This will create differences in the data extracted from the graphs.
                  That is, twoway kdensity evaluates 300 points by default. What you need are the first two variables extracted from serset 0. You can export these to Excel as illustrated by Stephen Jenkins in #8.


                  ADDED IN EDIT: kdensity has an n() option. So, to reproduce the data of the default twoway kdensity graph using kdensity:

                  Code:
                  sysuse auto
                  kdensity length, n(300) generate(x d)
                  Then you do not have to resort to sersets.
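
                  For a comparison of two groups on one graph, a sketch along the same lines is to evaluate both densities on a shared grid with at() and export them side by side. This assumes the auto dataset as a stand-in: foreign == 0 and foreign == 1 play the role of year == 2018 and year == 2020, and the names d_all, d0, d1 and density_by_group.xlsx are placeholders.

                  ```stata
                  sysuse auto, clear
                  * shared evaluation grid for both groups (default number of points)
                  kdensity length, nograph generate(x d_all)
                  * density of each group, evaluated at the same x values
                  kdensity length if foreign == 0, nograph at(x) generate(d0)
                  kdensity length if foreign == 1, nograph at(x) generate(d1)
                  * one sheet with the grid and both density columns
                  export excel x d0 d1 using "density_by_group.xlsx" if !missing(x), firstrow(variables) replace
                  ```

                  Because both densities share the x column, the Excel file can be plotted directly as two lines against x.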
                  Last edited by Andrew Musau; 25 Feb 2021, 05:36.



                  • #10
                    Originally posted by Andrew Musau
                    Maarten's advice will work for kdensity, but not twoway kdensity.
                    That is the whole point of my advice: I interpreted the question such that Anne-Claire for some mysterious reason does not want to make graphs in Stata. All she wants is the data. So in that case it makes sense to not make the graph, but instead create the data, which is what this advice does.

                    I agree we are probably still in XY problem territory: Why does Anne-Claire think it is necessary to create the graph in Excel? This indicates to me we are still not at the real problem Anne-Claire wants answered.


                    • #11
                      My colleagues do not use Stata, so I should produce graphs and data in Excel format as well, although I am working in Stata.
                      Again, sorry if my questions were unclear. I just wanted to get the data that are constructed for a two-way graph with two variables. But the problem is that when I extracted the data, they looked like the table that I posted above.
                      Maarten Buis, I understand what you meant by the XY problem, and I do not have to make the graph in Stata; that is why I was asking if there is any other method.
                      Actually, I cannot really post the data I am using because of confidentiality issues.
                      But I can say that my dataset looks like this:

                      firm number   log_output   year
                      1             1.2          2016
                      2             2.3          2019
                      3             2.5          2012
                      4             9.10         2014
                      1             4.5          2019
                      6             2            2009
                      3             9.9          2009
                      4             4.8          2001
                      9             2.3          2003
                      10            3.4          2009
                      ...           ...          ...


                      So what I want for the graph: log_output on the x-axis and % of firms on the y-axis, comparing the two density graphs for the years 2018 and 2020.
                      And I want the data used to construct this graph.







                      • #12
                        Originally posted by Anne-Claire Jo
                        My colleagues do not use Stata, so I should produce graphs and data in Excel format as well, although I am working in Stata.
                        That was the crucial bit of information I was looking for. Now it makes sense.



                        • #13
                          So does someone have a solution for this?



                          • #14
                            Show us the commands that you used to generate your graph(s).



                            • #15
                              twoway (kdensity log_output if year == 2018)(kdensity log_output if year == 2020), saving("log_output_2018_2020.gph", replace)

                              clear
                              graph use "log_output_2018_2020.gph"
                              serset dir
                              serset use

                              export excel using "/Users/log_output_2018_2020.xlsx", sheet("2018_2020") cell(B2) sheetmodify firstrow(varlabels)
