  • Modelling a variable equal to the difference between two others

    Dear Statalisters,

    I am interested in modelling a variable that equals the difference between two scores. Each of the two scores is the sum of a set of dummies, which is then standardised, giving values between, for instance, -1.23 and 2.65.

    My research aims to understand whether specific factors are statistical determinants of the difference between these two scores (which I refer to as "discrepancy"). Therefore, I created a dependent variable DV = SCORE1 - SCORE2. This variable takes values between -3.62 and 2.87. Of course, when both scores are equal, the dependent variable equals zero. That is a sort of equilibrium point at which the two activities (represented by the scores) are balanced, with zero "discrepancy".
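For readers following along, the construction described above can be sketched roughly as follows. This is only an illustration in Python: the dummy data, the variable names and the choice of the divisor-n standard deviation are assumptions, not details from the post.

```python
from statistics import mean, pstdev

def standardise(values):
    """Return z-scores: (x - mean) / SD, using the divisor-n (population) SD."""
    m = mean(values)
    sd = pstdev(values)
    return [(x - m) / sd for x in values]

# Hypothetical data: each row is one firm, each entry a 0/1 dummy.
dummies_1 = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 0, 1]]
dummies_2 = [[1, 1, 0], [0, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]

# Raw scores are simple counts of dummies equal to 1.
raw_1 = [sum(row) for row in dummies_1]
raw_2 = [sum(row) for row in dummies_2]

score_1 = standardise(raw_1)
score_2 = standardise(raw_2)

# The dependent variable: the difference between the two standardised scores.
discrepancy = [a - b for a, b in zip(score_1, score_2)]

# Because each raw score is a bounded count, each standardised score is
# bounded, and so is their difference.
lower = min(score_1) - max(score_2)
upper = max(score_1) - min(score_2)
```

In this sketch the discrepancy can never fall outside `[lower, upper]`, which is the sense in which the variable is bounded by construction.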

    However, I have been wondering how to model this variable correctly. I have used OLS (and panel data models), but I think that the variable is bounded by construction, because the scores themselves are. I am not sure how to address this.

    I have researched around the web but I could not find any specific case dealing with a similar issue. Do you have any suggestions?

    Thank you,
    Jonathan

  • #2
    https://www.fharrell.com/post/errmed/#change deals with the closely related topic of change in a variable from one timepoint to another. Some of the considerations are different, but most of the same constraints would apply. Have a look at it. I hope you find it helpful.



    • #3
      Thank you Clyde for your suggestion, much appreciated!



      • #4
        There are certainly bounds, as illustrated for example by the paper by Paul A. Samuelson (*), "How deviant can you be?": https://www.jstor.org/stable/pdf/2285901.pdf

        My guess is that they won't bite hard for your kind of data.
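As a rough illustration of the kind of bound meant here: Samuelson's inequality says that, with n observations and the standard deviation computed with divisor n, no observation can lie more than sqrt(n - 1) standard deviations from the mean. The Python sketch below checks this on two made-up samples; the data and function names are assumptions for illustration only.

```python
import math
from statistics import mean, pstdev

def samuelson_bound(n):
    """Largest possible |z-score| in a sample of size n (divisor-n SD)."""
    return math.sqrt(n - 1)

def max_abs_z(values):
    """Largest absolute z-score actually observed in the sample."""
    m = mean(values)
    sd = pstdev(values)
    return max(abs(x - m) / sd for x in values)

# Hypothetical samples of size 10.
extreme = [0] * 9 + [1]                    # one value as deviant as possible
typical = [3, 1, 4, 0, 2, 2, 3, 1, 2, 2]   # an unremarkable sample of counts

bound = samuelson_bound(len(extreme))
z_extreme = max_abs_z(extreme)   # attains the bound
z_typical = max_abs_z(typical)   # well inside the bound
```

Since each standardised score is limited to sqrt(n - 1) in absolute value, the difference of two such scores is limited to twice that. With samples of realistic size the limit is far from the values reported in #1, which fits the guess that the bounds will not bite hard.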

        The theme "If you're interested in differences, then look at the differences directly" sounds about as simple as you can get, but in my experience it still often needs a big push. That goes all the way from people trying to infer differences from a graph of two similar quantities, when it would be as easy but more effective to show the differences themselves, to people obsessing endlessly about formal criteria and P-values from a model fit but not looking at a residual plot to see what the model actually did.

        Sometimes differences need some subtlety: difference on what scale? Examples are log (u / v) or (u - v) / (u + v).
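The two alternative scales mentioned above can be sketched as follows. This assumes u and v are strictly positive (raw counts, say), because neither expression is well behaved for the zero or negative values that standardised scores take; the function names are made up for illustration.

```python
import math

def log_ratio(u, v):
    """Difference on the log scale: log(u / v). Requires u > 0 and v > 0."""
    return math.log(u / v)

def normalised_difference(u, v):
    """Difference relative to the total: (u - v) / (u + v). Requires u + v != 0."""
    return (u - v) / (u + v)

# Hypothetical positive quantities.
u, v = 3, 1

lr = log_ratio(u, v)               # positive because u > v
nd = normalised_difference(u, v)   # lies strictly between -1 and 1 for positive u, v

# Both measures are antisymmetric: swapping u and v flips the sign.
lr_swapped = log_ratio(v, u)
nd_swapped = normalised_difference(v, u)
```

Both measures are zero when u equals v, like the plain difference, but they weigh a given gap differently depending on the size of u and v.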


        (*) an economist



        • #5
          Thank you Nick for sharing your thoughts. After discussing it with my coauthors, we are wondering if it might be more straightforward to regress score 1 directly on score 2 and investigate the behaviour of our dependent variables, other things being equal. What worries us is (as I now know) a ceiling/floor effect, where firms that have the maximum/minimum scores cannot improve or decrease further, although it does not seem to have a straightforward solution (maybe Tobit with upper and lower limits? However, those limits exist by construction).
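One way to see how the two specifications relate: modelling SCORE1 - SCORE2 amounts to fixing the coefficient on score 2 at 1, whereas regressing score 1 on score 2 lets the data estimate it. The Python sketch below illustrates this with simple least squares on made-up scores; it says nothing about the ceiling/floor concern, and all names and numbers are hypothetical.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical standardised scores for six firms (illustration only).
score_1 = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
score_2 = [-0.9, -0.8, 0.2, -0.1, 0.6, 1.0]

discrepancy = [a - b for a, b in zip(score_1, score_2)]

b_levels = ols_slope(score_2, score_1)    # score 1 regressed on score 2
b_diff = ols_slope(score_2, discrepancy)  # discrepancy regressed on score 2

# Algebraically, b_diff equals b_levels - 1: the difference model is the
# levels model with the coefficient on score 2 constrained to 1.
gap = b_diff - (b_levels - 1)
```

If the estimated coefficient on score 2 is far from 1, the two approaches will tell different stories about the same data.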
