Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Avoiding Duplicated variables because of calculating Mean

    Dear All,

    I am a new user to STATA and I was requested to make a regression model for using the average of some variables over 4 years, so when I calculated the average for each company I got the average 4 times corresponding to each year of the company's data which led to the multiplication of my observations by 4 and hence it's affecting the results of my regression model.

    How I can avoid that and have only 1 average for each company instead of 4?

    Your help will be so appreciated.

    Regards,

    Mohammed Khaled

  • #2
    Without being shown an example of your data, and without knowing what commands you used to generate the averages, it is difficult to tell you how to use what you now have to accomplish what you need to do.

    My recommendation is that you return to your original data and instead of doing whatever you did to generate your averages, you use the collapse command to reduce your dataset to one observation per year, taking the mean of the variables you need for your model. For further details, see the output of
    Code:
    help collapse

    Comment


    • #3
      Hello William,

      Many thanks for your advice.

      So an example of my data is as follows
      Company Size AverageSize profitability
      Company A 2013 10 10.5 20
      Company A 2014 12 10.5 18
      Company A 2015 10 10.5 17
      Company A 2016 10 10.5 16
      Company B 2013 12 15 21
      Company B 2014 14 15 24
      Company B 2015 16 15 22
      Company B 2016 18 15 23
      I used the following code to generate Average Size:
      egen AverageSize=mean(Size), by(Company)

      if I used the command duplicate drop or collapse I will used the data that I need and I want to keep all the data in one file so how can I delete the repeated values in the average size column without deleting the whole observation.

      Thanks in advance for your support.




      Comment


      • #4
        I was requested to make a regression model for using the average of some variables over 4 years
        and now it is apparent that you need to use the annual values of other variables over the same 4 years.

        You do not want to delete repeated values of average size. If your regression is going to have four observations for each of the companies, each of those observations must have AverageSize for the observation to be included in the regression.

        Are you perhaps trying to do something like
        Code:
        summarize AverageSize
        and you want to include only one value per company? In that case you want something like
        Code:
        egen touse = tag(Company)
        summarize AverageSize if touse==1

        Comment


        • #5
          Many Thanks William for your advise it worked!!

          I used the following regression code:

          regress Profitability AverageSize if ( touse==1)

          My next required task is to compare the periods to each other and I generated a new column for the periods but what I understand that I can only use on if function in the command, so how I can solve for this?

          I tried to use the below command but it didn't work

          regress Profitability AverageSize if ( touse==1) (Period==1)
          Company Size AverageSize profitability Period touse
          Company A 2013 10 10.5 20 0 1
          Company A 2014 12 10.5 18 0 0
          Company A 2015 10 10.5 17 1 0
          Company A 2016 10 10.5 16 1 0
          Company B 2013 12 15 21 0 1
          Company B 2014 14 15 24 0 0
          Company B 2015 16 15 22 1 0
          Company B 2016 18 15 23 1 0
          Thanks in advance for your continuous support.

          Comment


          • #6
            Following the above I also included 1 more column to calculate the average of each period
            Company Size AverageSize profitability Period touse AvSize Period
            Company A 2013 10 10.5 20 0 1 11
            Company A 2014 12 10.5 18 0 0 11
            Company A 2015 10 10.5 17 1 0 10
            Company A 2016 10 10.5 16 1 0 10
            Company B 2013 12 15 21 0 1 13
            Company B 2014 14 15 24 0 0 13
            Company B 2015 16 15 22 1 0 17
            Company B 2016 18 15 23 1 0 17

            Comment


            • #7
              regress Profitability AverageSize if ( touse==1)
              is meaningless. It doesn't make any difference which observation you use to summarize AverageSize, but because Profitability is different in different years, it makes a difference which observation you use.

              If you want to run a regression on your data, do not use touse to select observations. Select them
              Code:
              ... if year==2015 | year==2016
              ... if Period==1
              or something similar.

              Comment


              • #8
                Thanks William that was an amazing solution!!!

                Comment

                Working...
                X