
  • Generating variables for averages

    Hi,

    I have a panel dataset with 13 waves and my dependent variable is binary.

    I am running a correlated random effects (CRE) model, which requires me to generate panel-level averages of my time-varying explanatory variables (please see slide 37: http://conference.iza.org/conference...nonlin_iza.pdf).

    So I would do:
    Code:
    egen x1bar = mean(x1), by(id)
    One of my main explanatory variables is "retire", which I think is a categorical variable. It measures how important the individual considers retirement as a motive for saving money in the present, rated from 1-14 with 0 being very unimportant and 14 being very important.


    To generate the average of this variable, I tried:
    Code:
    egen retirebar = mean(i.retire), by(id)
    But this returned the error message: i: operator invalid

    Could you please suggest how I can generate the average of this variable?

    I think the following would be incorrect because it doesn't take into account that "retire" is categorical:
    Code:
    egen retirebar = mean(retire), by(id)
    Thank you

  • #2
    I can't see that the mean of a multiple category categorical variable coded arbitrarily has any meaning or use.



    • #3
      Hi Nick,

      Please also see slide 51 (http://conference.iza.org/conference...nonlin_iza.pdf), where it appears that Jeff Wooldridge has generated average variables for the time-varying RHS variables (such as kids), but not for the time-invariant RHS variables (such as black).

      I thought I should do the same given that the "retire" variable varies over time

      Thanks



      • #4
        Also, Dimitriy V. Masterov posted here ( http://stats.stackexchange.com/quest...ice-cre-probit ), where he suggested that for the Chamberlain-Mundlak CRE model we should
        "fit a panel random effects probit where the RHS variables are augmented with x̄_i, the average of x_it for each panel [...] The inclusion of the mean terms should capture the correlation between the unobserved heterogeneity and the covariates that renders the random effect model inconsistent".
        Therefore I thought I should include the mean terms. Would this be incorrect for multiple-category categorical variables? Thanks
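
        A minimal sketch of the augmentation described in the quotation, not taken from the slides or the linked answer. It assumes retire is treated as a numeric measure and that the panel is declared with a hypothetical time variable named wave; saving1, retire and id are the names used elsewhere in this thread.
        Code:
        * declare the panel structure (wave is an assumed name for the time variable)
        xtset id wave
        * person-specific mean of the time-varying regressor
        egen retirebar = mean(retire), by(id)
        * random effects probit augmented with the panel mean
        xtprobit saving1 retire retirebar, re nolog
        Each further time-varying regressor would get its own mean term in the same way. In an unbalanced panel it may be worth computing the means only over the observations that enter the estimation sample.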



        • #5
          Back-tracking to #1:

          I see that you "think" that retire is categorical. I suggest that you resolve this doubt.

          If you think that, then I would be amazed that you're expected to calculate its mean for this procedure.

          Conversely if you are treating it as a measure then in Stata terms the last egen statement in #1 is the way to calculate its mean separately for panels.

          But I am not any kind of expert on these random effect models.

          I imagine that experts would want to see your intended model syntax, which I can't see as yet in this thread. I have not read any of your links.
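
          One way to look at how retire is actually coded before deciding how to treat it, offered only as a sketch: it assumes the variable and panel identifier are named retire and id as in #1.
          Code:
          * storage type, range, value labels and missing values
          codebook retire
          * frequency of each recorded value, including missing
          tabulate retire, missing
          * if it is treated as a measure, its mean within each panel
          egen retirebar = mean(retire), by(id)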



          • #6
            Hi Nick,

            Firstly, thank you for pointing this out. I was indeed unsure, and I think that I was wrong about retire being categorical. The ordering of this variable is meaningful, so I think it is an ordinal variable. As there is an intrinsic ordering of the categories, I think it is possible to calculate the mean for this procedure - would you agree? Then, as you confirmed, I should use:
            Code:
            egen retirebar = mean(retire), by(id)
            Secondly, for categorical variables (such as occupation) in Stata, my understanding is that it is better to attach the i. prefix (i.occupation), so that all the categories are displayed separately in the regression output. Similarly, the c. prefix is attached to continuous variables. For ordinal variables (such as retire), is there a need to attach any prefix when running the regression?

            So for example in a basic version of the Probit RE model (not using Chamberlain-Mundlak CRE model yet):
            Code:
            xtprobit saving1 retire i.occupation, re nolog


            Thanks
            Last edited by Sasha Gulabivala; 27 Feb 2017, 07:24.
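
            A hedged sketch of the two treatments asked about, using the variable names from the command above. In Stata's factor-variable notation a regressor with no prefix enters a main effect as continuous, so retire and c.retire give the same fit; i.retire requires nonnegative integer codes.
            Code:
            * retire as a single numeric regressor (equivalent to c.retire)
            xtprobit saving1 retire i.occupation, re nolog
            * retire as a set of indicators, one per level other than the base level
            xtprobit saving1 i.retire i.occupation, re nolog
            Which of the two is preferable for an ordinal scale is a modelling judgement that the syntax does not settle.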



            • #7
              I naturally agree that it's possible to calculate the mean of an ordinal variable, but necessarily I can't advise on whether it's a good idea for your purpose.

              It seems to me that you need more support from a supervisor, advisor or mentor in talking this through.



              • #8
                Thank you for your help - I will discuss this further with my tutor.



                • #9
                  Nick Cox how can I tell whether a variable should be treated as categorical or continuous in Stata?

                  I understand that variables like car colour (e.g. red =1, blue =2) would be categorical because there is no meaning or order to the number it is coded with.
                  I also understand that variables like age would be continuous as there is an intrinsic order to this.

                  With a variable such as self-perceived health status, there is an order to this, so would this be an ordinal variable? Would I then treat it as categorical (i.health) or as continuous (c.health) in Stata?


                  Thank you for your time



                  • #10
                    Ordinal scales are precisely those on which different researchers (and practitioners too) jump in different directions. For example, many universities average grades say 1 to 5 routinely while people in some departments tell their students that it is wrong to average ordinal scales.

                    I was brought up on texts which preached that Pearson correlation was wrong for ordinal data but Spearman correlation was fine, seemingly oblivious to what Spearman actually does.
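
                    What Spearman does can be sketched as follows: it is the Pearson correlation computed on the ranks of the two variables. The variable names saving and health are borrowed from later in this thread, and rank_saving and rank_health are hypothetical names introduced for the illustration.
                    Code:
                    * rank each variable over the observations where both are nonmissing
                    * (egen's rank() gives tied values their average rank by default)
                    egen rank_saving = rank(saving) if !missing(saving, health)
                    egen rank_health = rank(health) if !missing(saving, health)
                    * Pearson correlation of the ranks
                    correlate rank_saving rank_health
                    * Spearman correlation of the original variables, for comparison
                    spearman saving health
                    The two commands should report the same coefficient, which is why averaging-type arithmetic on ranks is built into the statistic that is often recommended for ordinal data.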



                    • #11
                      Hi Dr Cox,

                      Thanks for your reply

                      Indeed, as you suggested, it seems that Pearson's correlation cannot be used when there is an ordinal variable.
                      So I ran a Spearman's correlation on saving (my key dependent variable) and health (explanatory/control variable).
                      The result is as follows:

                      There was a positive correlation between saving and health, which was statistically significant but weak in magnitude, rs = 0.1605, p < 0.0001.
                      This leads me to believe that there is a monotonic relationship between the variables, and as health is on a Likert scale, I think it is appropriate to treat health, an ordinal variable, as continuous (rather than categorical with i. prefix) in Stata.

                      I would be extremely grateful if you could let me know if you think that I have misinterpreted this
                      Last edited by Sasha Gulabivala; 27 Feb 2017, 16:39.



                      • #12
                        Amended with a more readable format:
                        Code:
                         spearman saving health
                        
                         Number of obs =    3065
                        Spearman's rho =       0.1605
                        
                        Test of Ho: saving and health are independent
                            Prob > |t| =       0.0000



                        • #13
                          Hi Nick Cox ,

                          For the Chamberlain-Mundlak CRE model, Wooldridge generates average variables, as these should capture the correlation between the unobserved heterogeneity and the covariates that makes the random effects (RE) model inconsistent. So the CRE model attempts to act as something in between FE and RE.

                          Originally posted by Nick Cox:
                          "I can't see that the mean of a multiple category categorical variable coded arbitrarily has any meaning or use."
                          I just wondered, do you think that the mean of an indicator/dummy variable would have any meaning or use?

                          Thank you

                          Rose Simmons



                          • #14
                            The mean of an indicator variable has as much meaning and use as is possible. It is the fraction or probability of the state coded 1. If you have 7 females and 3 males and female is coded 1 and male 0 then the mean of 0.7 naturally corresponds to, nay is the same as, the proportion 7/10 who are female.
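
                            The arithmetic above can be checked with a small made-up dataset. This is only an illustration: the data are generated on the spot and female is a hypothetical variable name.
                            Code:
                            * note: clear drops any data currently in memory
                            clear
                            set obs 10
                            * first 7 observations coded 1 (female), the other 3 coded 0 (male)
                            generate byte female = _n <= 7
                            * the reported mean is the proportion coded 1, here 7/10
                            summarize female
                            Computed within panels with egen and by(id), the same mean would be the share of a person's waves in which the indicator equals 1.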

