
  • Requesting help on data structure, routine and graph

    Hello all,
    I'm working with monthly rainfall data over a 40-year period. The goal is to see whether the month that has the mean/median rainfall has been shifting over time (for example, from April in 1988 to June in 2018), to test whether this shift is significant, and to present the shift graphically. I'm attaching a dummy of the data structure below (with the median as an example).

    Rain: shows rainfall in mm
    med: shows median rainfall for the year
    med_rain: shows cumulative rainfall over months in each year

    I want to generate a variable "marker" that automatically takes the value 1 for the month with the closest match between med/mean and med_year.

    My second question: what is the best way to structure the data to produce a graph with years on the x-axis, months on the y-axis, and dots marking the month with the mean/median?

    Many thanks in advance for your time and help!
    Best, Wameq

    year month rainfall med med_rain
    1988 1 42 127 127
    1988 2 65 127 107
    .
    2017 1 65 232 65
    2017 2 75 232 140
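A minimal sketch of how summary variables like those described above could be built from the monthly data alone, assuming the long layout (one observation per year and month). The toy values are made up, and the names cumrain, medrain and meanrain are hypothetical; here the per-year median and mean are taken over the monthly rainfall values, which may not be exactly how med was defined.

```stata
* Hypothetical toy data in the long layout: one observation per year-month
clear
input year month rain
1988 1 42
1988 2 65
1988 3 30
2017 1 65
2017 2 75
2017 3 10
end

* running total of rainfall within each year, in month order
bysort year (month): gen cumrain = sum(rain)

* per-year median and mean of the monthly rainfall values
by year: egen medrain = median(rain)
by year: egen meanrain = mean(rain)

list, sepby(year)
```

This layout is also what the graph needs: with one observation per year-month, a 0/1 indicator for the chosen month is enough to plot month against year.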


  • #2
    As I understand it, you get the month with rainfall closest to the median rainfall per month by calculating the differences and finding the smallest absolute difference in each year. In principle there could be ties.

    Code:
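    * absolute difference between each month's rainfall and med_rain
    gen wanted = abs(rain - med_rain)
    
    * within each year, flag the month with the smallest difference (ties broken arbitrarily)
    bysort year (wanted) : replace wanted = _n == 1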
    I have been working with climatic data on and off for some time and never previously come across this way of thinking about seasonality!

    The month that is so identified is

    Code:
    * the flagged month (wanted == 1) now sorts last within each year
    bysort year (wanted) : gen which = month[_N]
    Note that you do not need that variable, as

    Code:
    scatter month year if wanted
    is enough for that graph.
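Because _n == 1 keeps just one month even when two months are equally close, a tie-preserving variant may be useful. This is a sketch on made-up data; target is a hypothetical stand-in for whatever reference value is being matched (for example, med_rain), and diff, mindiff, closest and nclosest are hypothetical names.

```stata
* Hypothetical toy data: two years, four months each, with a made-up target value
clear
input year month rain target
1988 1 10 30
1988 2 40 30
1988 3 20 30
1988 4 35 30
1989 1 50 25
1989 2 25 25
1989 3 25 25
1989 4 60 25
end

* absolute distance between each month's rainfall and the target
gen diff = abs(rain - target)

* smallest distance within each year
bysort year: egen mindiff = min(diff)

* flag every month that attains the minimum, so tied months are all kept
gen byte closest = diff == mindiff

* number of flagged months per year (more than 1 signals a tie)
bysort year: egen nclosest = total(closest)
```

The scatter call above then works unchanged with closest in place of wanted; a tied year simply shows two points.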


    Last edited by Nick Cox; 12 May 2020, 03:19.



    • #3
      Thanks for the simple and elegant solution Nick!

      I ran the code, and there is one issue with this approach: the lowest difference could fall at the tail end of the year, past the point where we want to stop. I'm pasting data from 1988 to illustrate: we should have identified month 4 but ended up with month 9.

      Let me know if you have any suggestions.

      Thanks again for your time!
      Best, Wameq


      [Attached image: Capture.JPG]



      Last edited by Wameq Raza; 12 May 2020, 09:14.



      • #4
        If I understand correctly, the code does what you asked in #1. The problem is that you have some other criterion or criteria in mind that the code doesn't touch. I can't read those criteria from the wording "the point where we want to stop", nor can I work out what is special about month 4 in 1988.

        I can make sense of this only in quite different terms, for example, that you want

        1. the month with the maximum precipitation. From the extraordinary data example (measuring precipitation to finer than nanometre precision, although perhaps you are averaging stations!) I sense that your climate is strongly seasonal, say a monsoon climate with a major contrast between wet and dry seasons. If so, identifying the wettest month is an important detail.

        2. the month which includes the point when some fraction (say half) of the annual precipitation has occurred to date.

        3. something else.
        Last edited by Nick Cox; 12 May 2020, 09:38.



        • #5
          Good point; I don't think I explained the problem as well as I thought I did. Your points 1 and 2 make a great deal of sense and will likely represent what I have in mind better than what I was trying to do, so thank you! And yes, I calculated a national average from 43 weather stations.
          Thank you so much again for your time!!
          Best, Wameq



          • #6
            The month with the maximum rainfall is identified by

            Code:
            bysort year (rain) : gen max_month = month[_N]
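Because max_month is repeated on every observation of its year, plotting it directly would draw the same point once per month. A sketch on made-up data that tags one observation per year with egen's tag() function before plotting; onepery is a hypothetical name.

```stata
* Hypothetical toy data: three months per year
clear
input year month rain
1988 1 42
1988 2 65
1988 3 30
1989 1 80
1989 2 15
1989 3 55
end

* wettest month in each year (ties broken arbitrarily)
bysort year (rain): gen max_month = month[_N]

* tag exactly one observation per year so each year plots once
egen onepery = tag(year)

* years on the x-axis, month of maximum rainfall on the y-axis
scatter max_month year if onepery, ylabel(1(1)12)
```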
            The month within which half of the annual rainfall has accumulated is flagged by

            Code:
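            * running total of rainfall within each year, in month order
            bysort year (month) : gen pr_rain_so_far = sum(rain)
            * convert to a proportion of the annual total
            by year: replace pr_rain_so_far = pr_rain_so_far / pr_rain_so_far[_N]
            * flag the month in which the proportion first reaches 0.5
            by year : gen med_month = (_n == 1 | pr_rain_so_far[_n-1] < 0.5) & pr_rain_so_far >= 0.5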
            No testing of code by me.
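A sketch that runs the same logic on made-up data, with the fraction held in a local macro so that, say, a third or three-quarters could be tried instead of half (point 2 in #4 says "some fraction"). The names frac, cumshare and frac_month are hypothetical.

```stata
* Hypothetical toy data: four months per year
clear
input year month rain
1988 1 10
1988 2 20
1988 3 50
1988 4 20
1989 1 60
1989 2 10
1989 3 20
1989 4 10
end

* fraction of annual rainfall to locate (0.5 = half)
local frac = 0.5

* cumulative share of the annual total, in month order
bysort year (month): gen cumshare = sum(rain)
by year: replace cumshare = cumshare / cumshare[_N]

* flag the month in which the cumulative share first reaches the chosen fraction
by year: gen byte frac_month = (_n == 1 | cumshare[_n-1] < `frac') & cumshare >= `frac'

* one point per year: years on the x-axis, flagged month on the y-axis
scatter month year if frac_month, ylabel(1(1)12)
```

In this toy example half the 1988 total is reached in month 3, and in 1989 already in month 1, so the _n == 1 clause is what catches the second case.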



            • #7
              Excellent, this worked with a couple of tweaks. Thank you again!
