
  • finding mean when one measurable variable is correlated to binary variable

    Hello!
    I am trying to find the mean signing bonus for each position in my data. I am looking at past MLB draft picks and have a variable "bonus" that holds the signing bonus each athlete received. From there, I created 9 binary variables, one for each position, but I am having trouble figuring out how to combine them with the bonus variable. I ran a summary statistic that gave me the mean for all the athletes combined, but I am trying to break it down by position. I ran a regression and exported it with outreg2 (attached); however, I feel that it was not the correct code for what I am trying to accomplish.
    Hope this makes sense!
    [Attached image: outreg.png]


  • #2
    Well, you can get the expected bonus in each position from what you did using:
    Code:
    foreach v of varlist pitcher catcher shortstop thirdbase secondbase firstbase ///
        centerfield leftfield rightfield {
            display "Mean bonus for `v': " %12.1f (_b[_cons] + _b[`v'])
    }
    BUT, I think there is something wrong with your data. Either your position variables are not correctly set up, or there are some players who had no position at all or had more than one. That's because, last I checked, there are 9 positions on a baseball field. (OK, maybe that's changed--I don't follow sports--so let me know if this is no longer true.) So with 9 variables representing them, and every player having exactly one position, those 9 variables would sum to 1 for every player and so be collinear with the constant term, and Stata would see that and omit one of them so that the regression can be carried out. (A regression with perfect collinearity among its variables is mathematically impossible to estimate; it is "unidentifiable.") So you should recheck your data to see what the source of this anomaly is.

    That said, there is a much better way to go about this. In modern Stata, there is almost never any reason to create a series of indicator variables like you did. You can just create a single variable, let's call it position, with a different non-negative integer representing each of the 9 positions. So this variable takes on 9 distinct values. Then you do:
    Code:
    regress Bonus i.position
    margins position
    and the output of the -margins- command gives you the expected value of Bonus corresponding to each position.
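
    If the data currently hold only the nine 0/1 indicators, one way to build such a variable is sketched below. This is only a sketch: the indicator names are the ones assumed in the loop earlier in this post, the names position, poslbl and npos are made up for illustration, and it assumes the indicators are coded exactly 0/1. The last two lines flag players who have no position or more than one, which is the anomaly to look for.

    ```stata
    * Sketch: collapse nine 0/1 indicators into one labelled categorical variable.
    * Variable names are assumptions; substitute the names in your own data.
    local posvars pitcher catcher shortstop thirdbase secondbase firstbase ///
        centerfield leftfield rightfield

    generate byte position = .
    local k = 0
    foreach v of varlist `posvars' {
        local ++k
        * Code this position as the integer k and label it with the variable name
        replace position = `k' if `v' == 1
        label define poslbl `k' "`v'", modify
    }
    label values position poslbl

    * Count how many indicators are switched on for each player
    egen byte npos = rowtotal(`posvars')
    * Players with zero positions or more than one; ideally this reports 0
    count if npos != 1
    ```

    If the count is not 0, -list if npos != 1- shows the offending observations. Note that a player with several indicators set ends up coded with the last matching position in the loop, so fix those rows before trusting position.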

    Actually, if all you want is the mean Bonus for each position, and you need none of the fancier contrasts and tests you can do with a regression, you could just do:
    Code:
    tabstat Bonus, by(position)
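
    If the single position variable has not been built yet, roughly the same per-position means can be obtained straight from the indicators. A minimal sketch, again assuming the indicator names used above and that the outcome variable is called Bonus (Stata names are case-sensitive, so use bonus if that is how it is stored):

    ```stata
    * Sketch: mean bonus per position using the existing 0/1 indicators.
    * Variable names are assumptions; adjust them to match your data.
    foreach v of varlist pitcher catcher shortstop thirdbase secondbase firstbase ///
        centerfield leftfield rightfield {
        quietly summarize Bonus if `v' == 1
        display "`v': mean = " %12.1f r(mean) "  (n = " r(N) ")"
    }
    ```

    The n reported for each position also makes it easy to see whether the group sizes add up to the total number of players.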



    • #3
      tabstat is the command I was looking for! I also realize that I have to clean my data further, since you are in fact right about an anomaly, but I think I have figured out where it lies. Thank you so much for your help, Clyde.
