  • #16
    Originally posted by Rich Goldstein
    using factor variable notation still allows you to choose which category to exclude - see the section on "setting the base level" in
    Code:
    h fvvarlist
    Thanks Rich, I got it and it is much faster. And I saw I could do a lot more with factor variables.



    • #17
      Originally posted by George Ford
      What are you trying to test?
      Thanks George. I am trying to test whether the factor variables have statistically significant effects on lastpay (i.e., income) in each of the zones. I hope I am on the right track.



      • #18
        Originally posted by George Ford
        drop the // before noconstant (the // comments it out).
        Thanks George, I made the correction and it worked.

        I also noticed that R-squared improved a great deal with the 'noconstant' option using either of the commands below, even though the command with m* also included the original categorical variable (mainactivitysector) in the regression:

        by zone: reg loglastpay m*, noconstant
        or

        by zone: regress loglastpay ib14.mainactivitysector, noconstant

        I really appreciate the contributions so far.



        • #19
          I would also like to know whether the estimated coefficients can still be read as percentage changes in the dependent variable, since the dependent variable enters the regression in logs (a semi-log specification).

          Thank you.



          • #20
            Re: #18 on the issue of R-squared: R-squared is calculated differently with the noconstant option than it is when the model has a constant, and the two cannot (should not) be compared. You can, however, obtain a comparable R-squared by using the "hascons" option (if that fits your situation).
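
            A minimal sketch of the difference, using Stata's shipped auto dataset rather than the data in this thread (price and rep78 are stand-ins for loglastpay and mainactivitysector). It assumes a full set of category dummies (ibn.), which is the case where the regressors contain the equivalent of a constant and hascons applies:

            Code:
            sysuse auto, clear

            * full set of dummies, no constant: R-squared uses the uncentered total sum of squares
            regress price ibn.rep78, noconstant

            * same dummies, but tell Stata they contain the equivalent of a constant
            regress price ibn.rep78, hascons

            * usual specification with a constant and an omitted base category
            regress price i.rep78

            The R-squared from the second and third commands should agree, while the first is typically much larger because it measures variation around zero rather than around the mean.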



            • #21
              To interpret a dummy variable in a ln(y) model, calculate exp(beta) - 1. That is the proportional change in y when the dummy goes from 0 to 1 (multiply by 100 for the percent change).
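
              A short sketch of that calculation, again on the shipped auto dataset (lnprice and foreign are illustrative stand-ins, not variables from this thread). nlcom is used here only as one convenient way to get a delta-method standard error for the transformed coefficient:

              Code:
              sysuse auto, clear
              generate lnprice = ln(price)

              regress lnprice i.foreign

              * point estimate of the percent change in price for foreign = 1 vs foreign = 0
              display 100*(exp(_b[1.foreign]) - 1)

              * same quantity with a delta-method standard error and confidence interval
              nlcom (pct: 100*(exp(_b[1.foreign]) - 1))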







              • #22
                I am trying to test if the factor variables have statistically significant effects on lastpay (i.e., income) in each of the zones.
                This is a poorly specified hypothesis. You are trying to test whether pay is different by activity (either within a zone or across all zones). A regression of pay on the dummies won't answer this question. It will tell you whether pay for each of activities 2-14 is different from pay for activity 1 (assuming 1 is the base activity).

                You'll get 13 coefficients and a constant term (or 13 comparisons).

                In fact, with 14 activities and comparing 2 of them at a time, there are 91 potential comparisons in each zone. The regression does 1:2, 1:3, ... 1:14, but what about 2:3, 2:4, ... 2:14, etc...?

                Say, for instance, 2-14 all had the same pay, but 1 was half of that value. With base 1, you'll get 13 significant coefficients. With base 2, you'll get 1 significant coefficient (comparing 2 to 1).

                Across six zones, you have 91*6 = 546 possible comparisons.
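
                A rough simulation of the base-level point above, on made-up data (the seed, sample size, and pay levels of 10 and 20 are assumptions for illustration only):

                Code:
                clear all
                set seed 2024
                set obs 7000
                g activity = int(runiform(1,15))

                * activities 2-14 share the same mean pay; activity 1 has half of it
                g pay = cond(activity == 1, 10, 20) + rnormal()

                * base 1: each of the 13 coefficients compares an activity with the low-pay one
                reg pay b1.activity

                * base 2: only the coefficient on 1.activity reflects a real difference
                reg pay b2.activity

                * number of pairwise comparisons among 14 activities, and across six zones
                di comb(14, 2)
                di 6*comb(14, 2)

                With base 2, any other coefficient that shows up as significant is a chance rejection, since the simulated means for activities 2-14 are identical.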



                • #23
                  Thank you Rich for the clarification in #20 on R-squared and the noconstant option.



                  • #24
                    Thanks George for the guidance in #21 and #22. I will look at the hypothesis again.

                    Thank you all for your contributions to my post; I have really learnt a lot from them.



                    • #25
                      If all you want to do is to test whether activity has some explanatory power on wages, then

                      Code:
                      reg pay b1.activity b1.zone
                      testparm i.activity



                      • #26
                        If you want to test all possible combinations across (not within) zones:

                        Code:
                        clear all
                        
                        set obs 10000
                        g zone = int(runiform(1,7))
                        tab zone
                        g activity = int(runiform(1,15))
                        tab activity
                        
                        g pay = 10 + 0.75*activity - 0.25*zone + rnormal()

                        reg pay b1.activity b1.zone
                        margins, over(activity) atmeans post //expression(exp(predict(xb))*exp((`e(rmse)'^2)/2)) 
                        forv a = 1/14 {
                            local b = `a' + 1
                            forv c = `b'/14 {
                                qui test `a'.activity = `c'.activity
                                local diff = e(b)[1,`a'] - e(b)[1,`c']
                                di "(`a', `c') : " %5.3f e(b)[1,`a'] " - " %5.3f e(b)[1,`c']  " = "        ///
                                    %5.3f `diff' "; F = " %5.1f r(F) _col(25) " (" %5.3f r(p) ")"
                            }
                        }
                        You'll see that the increment of the difference between activities is about 0.75, as specified in the DGP.

                        If you want within-zone comparisons, you'll have several hundred of them (91 in each of six zones, 546 in all), which would be hard to write up. It depends on what you're after, but that approach does not seem very helpful in telling a story.

                        While doable, what this tells me is that you need to really think about your hypothesis. If all you want is to say that activity matters, then #25 will do that (typically, if any of the coefficients is statistically significant, you'll get a stat sig F).

                        If you use ln(pay), then take the // out of the margins command. Alternatively, this could be rewritten to give the % differences based on the logs, or you can compute those % differences from the e(b) matrix.
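
                        As a rough sketch of that last option, assuming the regression is run on ln(pay) directly: the % difference between two activities a and c is 100*(exp(b_c - b_a) - 1). The lnpay variable, its coefficients, and the seed below are made up for illustration:

                        Code:
                        clear all
                        set seed 12345
                        set obs 10000
                        g zone = int(runiform(1,7))
                        g activity = int(runiform(1,15))

                        * hypothetical DGP in logs: each step in activity adds 0.05 to ln(pay)
                        g lnpay = 2 + 0.05*activity - 0.02*zone + rnormal(0, 0.2)

                        reg lnpay b1.activity b1.zone

                        * % difference in pay between adjacent non-base activities
                        forv a = 2/13 {
                            local c = `a' + 1
                            di "(`c' vs `a'): " %5.1f 100*(exp(_b[`c'.activity] - _b[`a'.activity]) - 1) "%"
                        }

                        With this DGP the adjacent differences should come out at roughly 5%, up to sampling noise.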
