  • Measure changes in team composition

    Hey there,

    I am working with a dataset on the video game industry with the variables name (the members of the development team), title (the game, i.e. the project), release_date, developer (the firm developing the game), mobyscore (how the game was rated), and title_project_size (the size of the development team per project). I have already generated the variables person_id and project_id.

    I want to find out how team composition changes compared to the previous period, so I was thinking of comparing each project's person_id values in the current period with those in the previous one. To do so, I tried the following, intending to total the "changes" of a project relative to the previous one and then divide the result by title_project_size:
    . sort release_date

    . gen change=0

    . bys title: replace change = change + 1 if person_id != person_id [_n-1]

    Stata responded with:
    weights not allowed
    r(101);
    So, obviously, this doesn't work. I would really appreciate it if someone could help me with this. I also considered the xtset command, which didn't work because release_date contains repeated time values within panel (a developer may release several games on the same date).

    Thank you all,
    Katrin

    (PS: I already requested to change my user name.)

  • #2
    The problem appears to be that you have a space before the "[_n-1]". There seem to be problems of understanding as well, but this one led to the syntax error Stata reported.
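    With the space, Stata reads the bracketed term as a weight specification rather than a subscript, hence "weights not allowed". A minimal sketch of the corrected syntax follows; the toy data are made up for illustration and only stand in for the real dataset, and sorting on person_id within title is an assumption added here so the result does not depend on row order.

    ```stata
    * Sketch only: hypothetical toy data, not the poster's dataset
    clear
    input str10 title person_id
    "GTA1" 1
    "GTA1" 2
    "GTA1" 2
    "FIFA" 1
    "FIFA" 5
    end

    gen change = 0

    * The subscript must follow the variable name with no space in between
    bysort title (person_id): replace change = change + 1 if person_id != person_id[_n-1]

    list title person_id change, sepby(title)
    ```

    Note that this only compares each row with the row directly above it within the same title, which may or may not be the comparison that is wanted.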

    Added: After more thought, it is not clear what you are trying to accomplish. You want to compare "the current period to the previous one" and yet you do not tell us anything about "period".

    The advice you were given in your previous topic still holds.
    Last edited by William Lisowski; 11 Jul 2017, 05:48.



    • #3
      Hey William, thank you so much for your answer. I changed that and the command worked then.


      I further tried to summarize the changes to see how much the team has changed:
      bys title: gen comp_changes = sum(change)

      generate average_comp_changes = comp_changes/title_project_size

      Stata now assigns the changes correctly (as far as I can see), but unfortunately it sums them at the individual level, i.e. per person in the team, not at the project level. Hence, a single title has several different values of comp_changes. For example, for a team of 4 people, it assigns change=1 to member1, change=1 to member2, change=0 to member3, and change=1 to member4, and then comp_changes=1 to member1, comp_changes=2 to member2, comp_changes=2 to member3, and comp_changes=3 to member4. But I want these at the level of the project, i.e. title. Shouldn't they all be the same within a single title?

      Thank you. Best,
      Katrin



      • #4
        Sorry, my amendment to post #2 crossed with your post #3.

        I am unable to understand either the structure of your data or your objectives. Certainly the "period" is an issue that I fail to understand. And "sum(change)" probably does not do what you expect - it computes the running total of change, within each title.
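        To illustrate the point about sum(), here is a small sketch using the four-person example from post #3. The variable names member and running are made up for this illustration, and egen's total() is offered as one possible way to get a single value per title, not necessarily the measure you ultimately need.

        ```stata
        * Sketch only: the change values 1, 1, 0, 1 from post #3, entered by hand
        clear
        input str10 title member change
        "GTA1" 1 1
        "GTA1" 2 1
        "GTA1" 3 0
        "GTA1" 4 1
        end

        * sum() is a running total, so it differs from row to row within a title
        bysort title (member): gen running = sum(change)

        * egen's total() puts the same group total on every row of the title
        bysort title: egen comp_changes = total(change)

        list title member change running comp_changes
        ```

        Here running reproduces the 1, 2, 2, 3 pattern described in post #3, while comp_changes is 3 on all four rows.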

        As I wrote in the earlier topic:
        It would be particularly helpful if you were to post a small hand-made example, with just a few of your variables and some well-chosen observations that will demonstrate what you seek to do, showing the data before the process and how you would like it to look after the process. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.



        • #5
          Dear William,

          I tried to use dataex, but unfortunately Stata returned this:

          Code:
          ssc install dataex
          host not found
          http://fmwww.bc.edu/repec/bocode/d/ either
          1) is not a valid URL, or
          2) could not be contacted, or
          3) is not a Stata download site (has no stata.toc file).
          r(631);

          Sorry for that.

          An example for my data is:


          name    title          developer  mobyscore  person_id  project_id  reldate     title_project_size

          Adrian  GTA1           A          4.3        1          1           1 Jan 2010  4
          Adrian  FIFA           A          6.2        1          3           1 Mar 2010  3
          Adrian  Galaxy         B          5.1        1          4           1 Apr 2010  3
          Brian   GTA1           A          4.3        2          1           1 Jan 2010  4
          Brian   Resident Evil  B          6.0        2          2           1 Feb 2010  5
          Colin   Resident Evil  B          6.0        3          2           1 Feb 2010  5
          Dan     Resident Evil  B          6.0        4          2           1 Feb 2010  5
          Eric    GTA1           A          4.3        5          1           1 Jan 2010  4
          Eric    FIFA           A          6.2        5          3           1 Mar 2010  3
          Frank   FIFA           A          6.2        6          3           1 Mar 2010  3
          Greg    Resident Evil  B          6.0        7          2           1 Feb 2010  5
          Greg    Galaxy         B          5.1        7          4           1 Apr 2010  3
          Hank    GTA1           A          4.3        8          1           1 Jan 2010  4
          Hank    Galaxy         B          5.1        8          4           1 Apr 2010  3
          Hank    Resident Evil  B          6.0        8          2           1 Feb 2010  5


          In my definition, a period is a release date within a developer.

          So, for developer A, in period 1 (1 Jan 2010) the team consisted of Adrian, Brian, Eric, and Hank (person_ids 1, 2, 5, and 8), who created "GTA1". In period 2 for developer A (1 Mar 2010), i.e. when working on the next project, "FIFA", the team consisted of Adrian, Eric, and Frank (person_ids 1, 5, and 6). So only two people from the previous team work on the next project, while the other two no longer do.

          What I want Stata to tell me is what percentage of the team that worked on the developer's previous project is still working on the next project. In the case above, it would be 50%, as two of the four people from the previous project (GTA1) work on the next project (FIFA).

          My idea was to compare the person_ids involved in the developer's previous project with those in the current one, yet after a lot of reading and searching I cannot find the right solution to this problem.

          Among other things, I tried:

          Code:
          sort reldate
          by reldate: gen reldate_developer_title_counter =_n
          gen same_person=0
          bys title developer: replace same_person= same_person + 1 if person_id == person_id[_n-1]
          by title: gen same_composition = sum(same_person)/title_project_size[_n-1]
          I am not quite sure whether this measures what it is supposed to measure, so I would really appreciate your help on this.

          Thanks, best,
          Katrin



          • #6
            [Attached image: Stata.png]
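
            One possible way to compute the share described in post #5 is sketched below. It is only a sketch under stated assumptions: the example data from #5 are re-entered by hand (mobyscore and title_project_size are left out), the names rel, seq, in_prev, kept, prev_size, n_kept, and share_kept are invented for this illustration, each person is assumed to appear at most once per project, and each developer is assumed to release one project per date. The idea is to number each developer's projects in release order, shift a copy of the membership list forward by one project, and merge it back on developer, project number, and person.

            ```stata
            * Sketch only: example rows from post #5, typed in by hand
            clear
            input str8 name str14 title str1 developer person_id project_id str9 rel
            "Adrian" "GTA1"          "A" 1 1 "01jan2010"
            "Adrian" "FIFA"          "A" 1 3 "01mar2010"
            "Adrian" "Galaxy"        "B" 1 4 "01apr2010"
            "Brian"  "GTA1"          "A" 2 1 "01jan2010"
            "Brian"  "Resident Evil" "B" 2 2 "01feb2010"
            "Colin"  "Resident Evil" "B" 3 2 "01feb2010"
            "Dan"    "Resident Evil" "B" 4 2 "01feb2010"
            "Eric"   "GTA1"          "A" 5 1 "01jan2010"
            "Eric"   "FIFA"          "A" 5 3 "01mar2010"
            "Frank"  "FIFA"          "A" 6 3 "01mar2010"
            "Greg"   "Resident Evil" "B" 7 2 "01feb2010"
            "Greg"   "Galaxy"        "B" 7 4 "01apr2010"
            "Hank"   "GTA1"          "A" 8 1 "01jan2010"
            "Hank"   "Galaxy"        "B" 8 4 "01apr2010"
            "Hank"   "Resident Evil" "B" 8 2 "01feb2010"
            end
            gen reldate = date(rel, "DMY")
            format reldate %td

            * Number each developer's projects in release order (1 = first project)
            bysort developer (reldate project_id): gen seq = sum(project_id != project_id[_n-1])

            * Copy of the membership list, shifted so project k lines up with project k+1
            preserve
            keep developer seq person_id
            replace seq = seq + 1
            tempfile previous
            save `previous'
            restore

            * _merge==3: on both teams; _merge==2: only on the previous team
            merge 1:1 developer seq person_id using `previous'
            gen byte in_prev = _merge >= 2
            gen byte kept = _merge == 3
            bysort developer seq: egen prev_size = total(in_prev)
            bysort developer seq: egen n_kept = total(kept)

            * Share of the previous team still present; missing for a first project
            gen share_kept = n_kept / prev_size if prev_size > 0
            drop if _merge == 2
            drop _merge in_prev kept

            sort developer seq person_id
            list developer title name prev_size n_kept share_kept, sepby(developer seq)
            ```

            On these example rows, FIFA gets share_kept = 0.5, in line with the 50% worked out in #5, and each developer's first project gets a missing value. Whether the denominator should be the previous team's size or the current one is a design choice to settle before using this on the full data.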
