  • Cumulative frequencies in longitudinal data

    My dataset is in long format. A snapshot of the data is shown below. (I'm having problems downloading dataex on research servers.)
    Code:
     
    id t hs benefits
    1 1 3
    1 2 2
    1 3 1 Y
    1 4 1 Y
    1 5 1
    1 6 4
    1 7 5
    2 1 2
    2 2 1
    2 3 6
    2 4 5
    2 5 4
    2 6 2
    2 7 2
    3 1 2
    3 2 4
    3 3 5
    3 4 5
    3 5 5 Y
    3 6 5
    3 7 5
    What I have been trying to do is calculate the frequency of individuals in each home state (hs) over time, according to whether they received benefits or not. Ideally, I'm hoping to get two cumulative frequency tables: one for those who never received a benefit and another for those who received a benefit at some time point.

    As such:
    Code:
     
    t hs1 hs2 hs3 hs4 hs5 hs6
    1 36 39 18 39 29 20
    2 3 23 36 12 18 25
    3 25 41 13 32 16 20
    4 42 31 50 4 47 28
    5 45 20 2 30 26 48
    6 5 47 11 18 12 41
    7 32 38 30 12 20 5
    Benefits are only available from t=3 onwards, so I expected an increase over time in the second table and, correspondingly, a fall from t=3 onwards in the table of those who did not receive benefits at any time point.

    What I've tried is generating a new variable, newvar, which takes 2 possible values: 1 if the individual received a benefit at any time point and 2 if they did not. Then I ran the following:

    Code:
    bysort newvar: tab t hs, col
    While this produced a frequency table, I did not obtain cumulative frequencies. The issue I'm having is isolating distinct observations: since individuals can receive benefits at more than one time point, simply adding up would give me an incorrect total. This is what I'm stuck on, and I would appreciate any insight. Thanks.
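A minimal sketch of the ever/never grouping described in this post, under stated assumptions: the toy rows below only mimic the layout of the snapshot, benefits is taken to be a string holding "Y" or empty, and the names got and ever_any are hypothetical (ever_any plays the role of newvar, coded 1/0 rather than 1/2).

```stata
* Sketch only: toy data in the layout of the snapshot; benefits is "Y" or empty
clear
input int(id t hs) str1 benefits
1 1 3 ""
1 2 2 ""
1 3 1 "Y"
1 4 1 "Y"
2 1 2 ""
2 2 1 ""
2 3 6 ""
2 4 5 ""
end

* got (hypothetical name): 1 in waves where a benefit was received, 0 otherwise
gen byte got = benefits == "Y"

* ever_any (hypothetical name) is constant within id:
* 1 if the person has got == 1 in any wave, 0 if never
bysort id: egen byte ever_any = max(got)

* one t-by-hs table of counts per group; each person contributes one row per wave
bysort ever_any: tab t hs
```

Because ever_any is constant within id, a person who receives benefits in several waves is still counted once per wave, so no double counting arises. The flag does not change over time, however, so on its own it cannot show a group growing or shrinking from t=3 onwards.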

  • #2
    If you send an email address via private message, or via email to me as an author of dataex (SSC), I would be happy to send you the files.

    I see four variable names. Is the last variable string with values "Y" or missing?


    • #3
      Originally posted by Nick Cox
      If you send an email address via private message, or via email to me as an author of dataex (SSC), I would be happy to send you the files.

      I see four variable names. Is the last variable string with values "Y" or missing?
      Thank you. Yes, the last variable is a string variable, but in my analyses I used 'encode, gen' to create a numeric variable.


      • #4
        Sorry, I am still not completely clear what is going on. I can't see what was encoded: one of the variables you showed or something else.

        You should be able to show us the results of

        Code:
        describe id t hs benefits
        tab benefits


        • #5
          I think I've figured out your data and come up with -dataex- output that you or Nick or anybody could use to replicate your data.
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int(id t hs) long benefits
          1 1 3 0
          1 2 2 0
          1 3 1 1
          1 4 1 1
          1 5 1 0
          1 6 4 0
          1 7 5 0
          2 1 2 0
          2 2 1 0
          2 3 6 0
          2 4 5 0
          2 5 4 0
          2 6 2 0
          2 7 2 0
          3 1 2 0
          3 2 4 0
          3 3 5 0
          3 4 5 0
          3 5 5 1
          3 6 5 0
          3 7 5 0
          end
          label values benefits benefits
          label def benefits 0 "N", modify
          label def benefits 1 "Y", modify
          I've taken the liberty of changing your Y/missing variable benefits to 1/0, which is typically a more useful way to code dichotomies in Stata.
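A small sketch of why the 1/0 coding is convenient, assuming the original variable is a string holding "Y" or empty (the name benefits01 and the four rows are hypothetical). As far as I know, encode applied to such a string would yield 1 and missing rather than 1 and 0, which is why a logical expression is used here instead.

```stata
* Sketch only: a few rows with benefits stored as a string, "Y" or empty
clear
input int(id t) str1 benefits
1 1 ""
1 2 "Y"
2 1 ""
2 2 ""
end

* a logical expression evaluates to 1 (true) or 0 (false), never missing
gen byte benefits01 = benefits == "Y"

* with 0/1 coding, the sum is a count and the mean is a proportion
summarize benefits01
display "recipient-waves: " r(sum) "   share of rows: " r(mean)
```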


          But I really don't understand the question at all. Why would you expect a cumulative frequency to ever decrease? Perhaps you can try to explain more clearly and, even better, hand calculate the results you hope to get from the example of your data.

          Added: I was imagining based on the description you gave that hs would be constant within id. But that is not even remotely true in the example data. So the people here are moving around from state to state frequently. Is that correct, or is the example not like the real data?
          Last edited by Clyde Schechter; 07 Feb 2017, 15:46.


          • #6
            Originally posted by Clyde Schechter
            But I really don't understand the question at all. Why would you expect a cumulative frequency to ever decrease? Perhaps you can try to explain more clearly and, even better, hand calculate the results you hope to get from the example of your data.

            Added: I was imagining based on the description you gave that hs would be constant within id. But that is not even remotely true in the example data. So the people here are moving around from state to state frequently. Is that correct, or is the example not like the real data?
            Thanks for pointing that out. I think "cumulative" was the wrong word to use.

            In essence, at each time point from t=3 onwards, an individual who receives a benefit leaves the original population and moves into the benefits-receiving population. This causes the benefits-receiving population to grow over time (as more individuals join at each time point) and the population that has not yet received benefits to shrink (as a few individuals receive a benefit at each time point and so leave it). This steady increase and decrease in the two groups is what I would like to show in my table.

            And yes, individuals will be moving between states frequently.


            • #7
              Well, if people are moving between states, then you will not necessarily see increasing or decreasing patterns within states. While there will be a steady increase in the number of people who have ever received benefits, any given state's total could decrease if those people move elsewhere.

              I still don't really understand what you are looking for. But take a look at the code below. It may be on track, or, if not, it may help you clarify what you need.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int(id t hs) long benefits
              1 1 3 0
              1 2 2 0
              1 3 1 1
              1 4 1 1
              1 5 1 0
              1 6 4 0
              1 7 5 0
              2 1 2 0
              2 2 1 0
              2 3 6 0
              2 4 5 0
              2 5 4 0
              2 6 2 0
              2 7 2 0
              3 1 2 0
              3 2 4 0
              3 3 5 0
              3 4 5 0
              3 5 5 1
              3 6 5 0
              3 7 5 0
              end
              label values benefits benefits
              label def benefits 0 "N", modify
              label def benefits 1 "Y", modify
              
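              * confirm that id and t jointly identify observations, and sort by them
              isid id t, sort
              * running flag: 1 from the first period with benefits onward, 0 before
              by id (t): gen byte ever_benefits = sum(benefits) > 0

              * each cell: number of people in state hs at time t with ever_benefits == 1
              * (contents() is the -table- syntax of Stata 16 and earlier; in later
              * releases, run it under version control: version 16: table ...)
              table t hs, c(sum ever_benefits) col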
              Note that the variable ever_benefits created here does not distinguish people who at some point get benefits from those who never do. Rather, at any given moment in time, it distinguishes people who have previously received or are currently receiving benefits from those who have not received benefits up to that point. So, for example, id 1 has ever_benefits = 0 at t = 1 and t = 2, but ever_benefits = 1 from t = 3 onward.

              The table produced by this code shows, in a column for each state and a row for each time period, the number of people in that state who have ever received benefits up to and including that point in time. The total column at the side shows that, when all states are considered together, the number of people who have ever received benefits increases over time, but you can easily see that this is not necessarily the case in the other columns.

              Note also that the blanks in the table represent combinations of hs and t for which there just aren't any people in that state at that time (regardless of benefits status). So those are zeroes, in a sense, but they differ from the printed zeroes, which represent the presence of people (just none who received benefits).
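As a rough sketch of the two moving groups, the complement of ever_benefits can be totalled alongside it. This uses the same example data with benefits coded 0/1 and collapses over states, so only the overall counts per time period are shown; the name not_yet is hypothetical.

```stata
* Sketch only: the example data from this thread, benefits coded 0/1
clear
input int(id t hs) byte benefits
1 1 3 0
1 2 2 0
1 3 1 1
1 4 1 1
1 5 1 0
1 6 4 0
1 7 5 0
2 1 2 0
2 2 1 0
2 3 6 0
2 4 5 0
2 5 4 0
2 6 2 0
2 7 2 0
3 1 2 0
3 2 4 0
3 3 5 0
3 4 5 0
3 5 5 1
3 6 5 0
3 7 5 0
end

* running flag: 1 from the first period with benefits onward, 0 before
bysort id (t): gen byte ever_benefits = sum(benefits) > 0

* not_yet (hypothetical name): no benefits received up to and including t
gen byte not_yet = 1 - ever_benefits

* overall counts per time period, ignoring state
collapse (sum) ever_benefits not_yet, by(t)
list, noobs
```

With states ignored, the ever_benefits total can only rise and the not_yet total can only fall, and the two always add up to the number of people observed at that time. Adding hs to by() would give the per-state version, where neither pattern is guaranteed because people move between states.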

              I hope this helps move the discussion forward.
