
  • Analysis on Individual improvement over time

    I have a huge dataset of individual performance over time. I want to do multiple things with this dataset but am not sure how to do them.

    First, I would like to tell Stata to ignore an individual who doesn't have at least 4 scores. (Call person 1 p1, so I have p1, p2, p3, ..., and call Test 1 t1, so t1, t2, t3, .... Of course, my actual variable names are not as simple as t1-t15.) Not all people took all tests, so I only want to look at the people who took at least four tests. Does that make sense?

    Once that is done, I want to flag all cases where the person improved over time. Yes, I could look at graphs, but that would take forever with hundreds of thousands of people. Is there code for this? I haven't done any stats in a year or two, so I'm rusty, and most of my advanced stats are self-taught.

    The eventual goal would be to examine all the people who improved and see if they have things in common. But I'm struggling to work out how to work with just those individuals' data.

    Thanks in advance.

  • #2
    You didn't show example data, which often leads to not getting any responses. I made up a toy data set that I think resembles your very scanty data description, to illustrate code.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(person_id result1 result2 result3 result4 result5 result6 result7 result8)
    1  .9472316  .9484983 .20113812   .7800556  .2258395        .         . .40200615
    2 .05222337         .  .9874877 .015391795  .7590426 .7012697 .58383894         .
    3  .9743183         .         .    .574749         .        .  .6502236  .2021227
    4         .         .         .   .6551434  .8826448        .   .421985         .
    5 .18564783 .12549753 .08119465  .53569984 .32789275 .1590071         .  .9302015
    end
    
    //  GO TO LONG LAYOUT, WHERE EVERYTHING IS POSSIBLE
    reshape long result, i(person_id) j(time)
    drop if missing(result)
    
    //  KEEP ONLY THOSE WHO HAVE DONE 4 OR MORE ASSESSMENTS
    by person_id, sort: drop if _N < 4
    Pretty much anything you will want to do (including everything you have already stated in #1) will be easier, or only possible, if you switch to a long data layout. There are only a limited number of things that are straightforward to do with wide data in Stata.
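Counting nonmissing results per row is one of the few things that is also simple in the wide layout, so the 4-test filter on its own could be done without reshaping. A minimal sketch, assuming the toy variables result1-result8 from the example above; n_tests is a hypothetical name, and any later analysis of improvement would still be easier after reshaping to long.

```stata
* Toy data from the -dataex- example above (wide layout)
clear
input float(person_id result1 result2 result3 result4 result5 result6 result7 result8)
1  .9472316  .9484983 .20113812   .7800556  .2258395        .         . .40200615
2 .05222337         .  .9874877 .015391795  .7590426 .7012697 .58383894         .
3  .9743183         .         .    .574749         .        .  .6502236  .2021227
4         .         .         .   .6551434  .8826448        .   .421985         .
5 .18564783 .12549753 .08119465  .53569984 .32789275 .1590071         .  .9302015
end

* Count the nonmissing results in each row (n_tests is a hypothetical name)
egen byte n_tests = rownonmiss(result1-result8)

* Keep only people with 4 or more results
keep if n_tests >= 4
```

In this toy example only person 4 (3 results) is dropped.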

    As for identifying who improved over time, you need to explain what you would consider improvement. Is it enough that their last result be better than their first (and does better mean greater or smaller)? Or does it mean each result is better than the immediately preceding one? Or that some sort of individual regression slope has the right sign? Or any of the myriad other possible definitions of improvement? Once you settle on what you mean and state it clearly, it should be possible to translate it into code.
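To show how such definitions translate into code, here is a sketch of the three candidates named above, computed on the toy data in long layout. It assumes that a higher result is better and treats the reshape index time (the test number) as the time scale; the flag names (improved_last_first, improved_every_step, improved_slope) are hypothetical, and none of this is meant as the definition to use.

```stata
* Toy data from the -dataex- example above, reshaped and filtered as in #2
clear
input float(person_id result1 result2 result3 result4 result5 result6 result7 result8)
1  .9472316  .9484983 .20113812   .7800556  .2258395        .         . .40200615
2 .05222337         .  .9874877 .015391795  .7590426 .7012697 .58383894         .
3  .9743183         .         .    .574749         .        .  .6502236  .2021227
4         .         .         .   .6551434  .8826448        .   .421985         .
5 .18564783 .12549753 .08119465  .53569984 .32789275 .1590071         .  .9302015
end
reshape long result, i(person_id) j(time)
drop if missing(result)
by person_id, sort: drop if _N < 4

* ASSUMPTION: a higher result is better; flip > to < if lower is better

* Definition 1: last result better than first result
by person_id (time), sort: gen byte improved_last_first = result[_N] > result[1]

* Definition 2: every result better than the one immediately before it
by person_id (time), sort: gen byte step_up = result > result[_n-1] if _n > 1
bysort person_id: egen byte improved_every_step = min(step_up)

* Definition 3: positive OLS slope of result on test number, per person
bysort person_id: egen double mean_t = mean(time)
bysort person_id: egen double mean_r = mean(result)
bysort person_id: egen double sxy = total((time - mean_t) * (result - mean_r))
bysort person_id: egen double sxx = total((time - mean_t)^2)
gen double slope = sxy / sxx
gen byte improved_slope = slope > 0
```

With this toy data, persons 2 and 5 count as improved under definitions 1 and 3, and nobody qualifies under definition 2, which shows how much the choice of definition matters. The explicit (time) ordering in the by prefix is used wherever the result depends on the order of tests within a person.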

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      Thank you so much for taking the time to respond. I had debated whether the data needed to be switched to long. And thanks for letting me know how to use -dataex-. I'm completely self-taught in Stata, so all advice is welcome. I did end up moving the data to SPSS to try to perform the functions I wanted, but it's still not exactly right. This will help immensely.

      For improvement, I think I need to do some more thinking about how I will define it. Perhaps some way of comparing where they should have scored on their current trajectory with where they actually landed.
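One possible reading of the trajectory idea: fit a straight line to each person's earlier results, extrapolate it to their last test, and flag people whose actual last result beats that prediction. A minimal sketch on the toy data, assuming higher is better, a linear trend in test number, and that "current trajectory" means all tests except the last; the variable names (is_last, predicted, beat_trajectory) are hypothetical.

```stata
* Toy data from the -dataex- example in #2, reshaped and filtered the same way
clear
input float(person_id result1 result2 result3 result4 result5 result6 result7 result8)
1  .9472316  .9484983 .20113812   .7800556  .2258395        .         . .40200615
2 .05222337         .  .9874877 .015391795  .7590426 .7012697 .58383894         .
3  .9743183         .         .    .574749         .        .  .6502236  .2021227
4         .         .         .   .6551434  .8826448        .   .421985         .
5 .18564783 .12549753 .08119465  .53569984 .32789275 .1590071         .  .9302015
end
reshape long result, i(person_id) j(time)
drop if missing(result)
by person_id, sort: drop if _N < 4

* Mark each person's most recent test
by person_id (time), sort: gen byte is_last = _n == _N

* OLS line through the EARLIER tests only (cond() blanks out the last test)
bysort person_id: egen double mt = mean(cond(is_last, ., time))
bysort person_id: egen double mr = mean(cond(is_last, ., result))
bysort person_id: egen double sxy = total(cond(is_last, ., (time - mt) * (result - mr)))
bysort person_id: egen double sxx = total(cond(is_last, ., (time - mt)^2))
gen double b = sxy / sxx

* Extrapolate that line to every test, including the last one
gen double predicted = mr + b * (time - mt)

* ASSUMPTION: higher is better; flag a last result above its predicted value
by person_id (time), sort: gen byte beat_trajectory = result[_N] > predicted[_N]
```

Because everyone kept has at least 4 tests, each line is fitted to at least 3 points, so sxx is never zero. In the toy data, persons 1 and 5 beat their trajectory, even though person 1's overall slope is negative.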

      Thank you again, and I will be sure to update once I figure out the code and analyses I want to use.

