  • Create Twoway Line Graph Forcing Gaps for Missing Periods

    Hi all,

    My data:

    Code:
    clear
    input float(Datum_n total_unem_bymonth sum_newpositions_bymonth)
    723 148245 2261
    724 150673 4089
    725 144790  855
    726 143049 5430
    727 145249 5507
    732 164182 4655
    733 162495 5044
    734 152841 5753
    735 146375 4993
    736 138150 4628
    737 127136 3637
    738 123275 3318
    739 121203 3301
    740 115404 3811
    744 117633 3418
    745 113398 4188
    746 105133 3700
    747  99974 3164
    749  87939 3584
    end
    I would like to plot these data using twoway line with two y-axes: one for new positions and one for unemployment. The x-axis should show time (Datum_n).

    I ran the following code:
    Code:
    twoway line sum_newpositions_bymonth Datum_n , yaxis(2) ytitle("Monthly Total New Dual VET Positions") || line total_unem_bymonth Datum_n, yaxis(1) ytitle("Number of Registered Unemployed Individuals (Monthly Total)")
    When I run the code above, both lines are continuous, but I do not want them to be.

    As the data extract shows, there are gaps in Datum_n, and I would like these gaps to be reflected in the line for sum_newpositions_bymonth. In other words, that line should be interrupted wherever a date is absent from Datum_n (e.g. between 727 and 732) and then resume at the next available date.

    Could anyone please let me know how to adapt the code to show this discontinuity?

    Many thanks in advance!


  • #2
    Code:
    clear
    input float(Datum_n total_unem_bymonth sum_newpositions_bymonth)
    723 148245 2261
    724 150673 4089
    725 144790  855
    726 143049 5430
    727 145249 5507
    732 164182 4655
    733 162495 5044
    734 152841 5753
    735 146375 4993
    736 138150 4628
    737 127136 3637
    738 123275 3318
    739 121203 3301
    740 115404 3811
    744 117633 3418
    745 113398 4188
    746 105133 3700
    747  99974 3164
    749  87939 3584
    end
    
    format Datum_n %tm
    
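    // fill in the missing dates
    sort Datum_n
    // n = number of months until the next observed date (missing for the last observation)
    gen n = Datum_n[_n+1]-Datum_n
    // make n copies of each observation (a missing n counts as 1); new == 1 marks the added copies
    expand n, gen(new)
    // the copies stand for unobserved months, so blank out their values
    replace total_unem_bymonth = . if new == 1
    replace sum_newpositions_bymonth = . if new == 1
    sort Datum_n
    // number the copies 1, 2, ... within each date and move each forward by that many months
    bys Datum_n (new) : replace new = sum(new)
    replace Datum_n = Datum_n + new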
    
    // make the graph
    twoway line sum_newpositions_bymonth Datum_n , ///
        cmissing(n) /// <-- new
        yaxis(2) ytitle("Monthly Total New Dual VET Positions", axis(2)) ylabel(,format(%9.0gc) axis(2)) || ///
           line total_unem_bymonth Datum_n, ///
           cmissing(n) /// <-- new
        yaxis(1) ytitle("Number of Registered Unemployed Individuals" "(Monthly Total)") ///
        ylabel(,format(%9.0gc))
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      We need good terminology for this problem. In the data example there are no missing values. Stata has no way of knowing what is absent, that is, what might have been included in the dataset but wasn't, unless and until you fill in the gaps with missing values.

      Here is another way to do it. Clearly, 723, 749 and 27 are empirical, i.e. specific to this dataset. Note the "fencepost" problem: 27 = 749 - 723 + 1.


      Code:
      clear 
      set obs 27 
      range Datum_n 723 749 
      save fullset, replace
      
      clear
      input float(Datum_n total_unem_bymonth sum_newpositions_bymonth)
      723 148245 2261
      724 150673 4089
      725 144790  855
      726 143049 5430
      727 145249 5507
      732 164182 4655
      733 162495 5044
      734 152841 5753
      735 146375 4993
      736 138150 4628
      737 127136 3637
      738 123275 3318
      739 121203 3301
      740 115404 3811
      744 117633 3418
      745 113398 4188
      746 105133 3700
      747  99974 3164
      749  87939 3584
      end
      
      merge 1:1 Datum_n using fullset 
      drop _merge 
      
      sort Datum_n 
      
      list 
      
           +-------------------------------+
           | Datum_n   total_~h   sum_ne~h |
           |-------------------------------|
        1. |     723     148245       2261 |
        2. |     724     150673       4089 |
        3. |     725     144790        855 |
        4. |     726     143049       5430 |
        5. |     727     145249       5507 |
           |-------------------------------|
        6. |     728          .          . |
        7. |     729          .          . |
        8. |     730          .          . |
        9. |     731          .          . |
       10. |     732     164182       4655 |
           |-------------------------------|
       11. |     733     162495       5044 |
       12. |     734     152841       5753 |
       13. |     735     146375       4993 |
       14. |     736     138150       4628 |
       15. |     737     127136       3637 |
           |-------------------------------|
       16. |     738     123275       3318 |
       17. |     739     121203       3301 |
       18. |     740     115404       3811 |
       19. |     741          .          . |
       20. |     742          .          . |
           |-------------------------------|
       21. |     743          .          . |
       22. |     744     117633       3418 |
       23. |     745     113398       4188 |
       24. |     746     105133       3700 |
       25. |     747      99974       3164 |
           |-------------------------------|
       26. |     748          .          . |
       27. |     749      87939       3584 |
           +-------------------------------+
      Now you can get Stata to pay attention to missings, as in @Maarten Buis's helpful answer.
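      A small variation, offered as a sketch rather than as part of the answer above: the first date, last date and month count (723, 749 and 27 here) can be computed from the data with summarize, meanonly instead of being typed in, and a tempfile avoids leaving fullset.dta on disk. The local names first, last, nmonths and observed are illustrative choices. Run it as one block from a do-file, since the tempfile name lives in a local macro.

```stata
clear
input float(Datum_n total_unem_bymonth sum_newpositions_bymonth)
723 148245 2261
724 150673 4089
725 144790  855
726 143049 5430
727 145249 5507
732 164182 4655
733 162495 5044
734 152841 5753
735 146375 4993
736 138150 4628
737 127136 3637
738 123275 3318
739 121203 3301
740 115404 3811
744 117633 3418
745 113398 4188
746 105133 3700
747  99974 3164
749  87939 3584
end
tempfile observed
save `observed'

* first and last observed months, taken from the data
summarize Datum_n, meanonly
local first = r(min)
local last  = r(max)
* fencepost: the count includes both endpoints
local nmonths = `last' - `first' + 1

* one observation per month from first to last, then attach the observed values
clear
set obs `nmonths'
generate float Datum_n = `first' + _n - 1
merge 1:1 Datum_n using `observed', nogenerate
sort Datum_n
format Datum_n %tm
list
```

      The months with no data (728 to 731, 741 to 743 and 748) come out with missing values, ready for cmissing(n).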



      • #4
        Thank you very much both! The code is now yielding the desired result.

        Indeed, terminology is crucial, apologies for the confusion.



        • #5
          Also:
          Code:
          tsset Datum_n
          tsfill
          line total_unem_bymonth Datum_n, cmissing(n) || line sum_newpositions_bymonth Datum_n, cmissing(n) yaxis(2)



          • #6
            @Scott Merryman's method is usually better than that in #2 or #3. If either of the earlier posts shows anything, it is that long-term users can sometimes think of different ways to do it.

            A small detail remains. tsfill as advertised fills in gaps, but it doesn't try to guess at what is absent beyond the first and last observed dates. If that is ever needed, you need something more like #3.
