  • Equivalent of -areg- with Survey Weights

    Hi!

    I am working with survey data, and I want to run a regression with several dummy variables.

    Initially, I wanted to run the following regression:
    svy: areg cost wage age female white black asian disease1 disease2 disease3 disease4 disease5 d_year*
    where the following variables are dummy variables: female, white, black, asian, diseaseX, d_year*.
    Unfortunately, I cannot combine the -svy- prefix with the -areg- command.

    Do you know any way around this?

    Would this be correct, in terms of weights?
    areg cost wage age female white black asian disease1 disease2 disease3 disease4 disease5 d_year*, [aweight = perwgt]?

    It would be great if you could give some guidance on this. Thanks a lot in advance!


  • #2
    areg allows pweights and vce(cluster ...).

    If you have strata, poststratification, or a multistage design this will not be
    enough to fully account for the survey design.

    If your absorb() variable doesn't have too many levels, you can just
    use regress with factor variable notation instead of areg. Just
    make sure to include the absorb() variable as a predictor using factor
    variable notation.

    For information on factor variable notation see [U] 11.4.3 Factor variables.
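
    As a rough sketch of both routes, using the nhanes2 example dataset rather than the data from the question (bpsystol, region, sex, and finalwgt are that dataset's variables, and I am assuming it ships with its survey design already declared via svyset):

    ```stata
    * Illustration only: nhanes2 stands in for the poster's data
    webuse nhanes2, clear

    * Display the stored survey design (assumed to be preset in this dataset)
    svyset

    * Route 1: areg with sampling weights; region is absorbed, not reported
    areg bpsystol age weight i.sex [pweight = finalwgt], absorb(region)

    * Route 2: regress under svy, with region entered as a factor variable
    svy: regress bpsystol age weight i.sex i.region
    ```

    The coefficients on age, weight, and sex should agree across the two routes. The standard errors will differ, because only the svy: version uses the strata and sampling units from the design.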



    • #3
      Thank you very much, Jeff!

      This may be a silly question, but how can I know what variable to -absorb-?

      Here is more detail about what I am doing:
      I am running the regression in DatabaseA in order to predict cost in DatabaseB. All of the variables in the regression are available in both DatabaseA and DatabaseB, except for "cost", which is only available in DatabaseA.
      Initially, I wanted to absorb "HouseholdID", but I just realized that I don't have that variable in DatabaseB. So I am now really unsure about what to absorb!

      Also, regarding factor notation:
      Is it the same to do:
      1) xi i.race
      reg cost _Irace1 _Irace2 _Irace3

      as to do
      2) reg cost i.race
      ?
      I actually used option 1) to run my regression (and renamed the _IraceX variables), but I am now wondering if it is correct.

      Thank you in advance!



      • #4
        Hi Ana,
        As a follow-up to your question: I have personally used absorb() only when the variable to absorb has many categories, as is the case when you estimate panel data models.
        For example, say you want to absorb age. That is equivalent to including a dummy for each value of age (probably from 1 to 99). While you can do the same using -reg-, if you had far more categories (say 1000), it would be impractical to estimate a coefficient for each dummy, and better to absorb them before running the analysis, which in essence is what -areg- does.
        Hope this helps.
        Fernando



        • #5
          Thanks, Fernando. This is very useful!

          I am realizing that I am actually confused about when to use -absorb- in general, though.
          Would you absorb age when you have reasons to believe that your dependent variable varies in ways that are specific to a given age?

          In my case, where I predict costs in DatabaseB by using a regression in DatabaseA, I am wondering if I should absorb anything. I am predicting costs using age and wage, as well as a set of dummy variables for disease and demographic characteristics. The potential variables to absorb (like "region" or "householdID") are only available in DatabaseA. I could absorb age, but I don't know if it's necessary.
          Any guidance would be appreciated!

          Thanks!



          • #6
            Well, it's not a matter of "absorbing" per se, but rather of controlling.
            In a simple wage equation, for example, you would have a model like:
            ln(wage)=a0+a1*age+a2*age^2+a3*educ+a4*educ^2+a5*others+error
            Here I'm assuming age has a quadratic relationship with wages, which makes sense: wages increase with experience, but at a certain point in life you might expect wages to fall.
            Now, I could be very interested in what happens at each year of life, rather than assuming a functional form, in which case I could estimate the model as follows:
            reg ln_w i.age educ educ2
            Here, I'll see a more detailed effect (or rather a more detailed control) of age on wages. Perhaps, however, I'm not interested in the "shape" of the relationship between age and wages, but only in education. In this case, I would type
            areg ln_w educ educ2, abs(age)
            Here, the results for educ and educ2 will be exactly the same as in the previous version, but it won't show the coefficients for all the age dummies.
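
            A quick way to check this equivalence is sketched below, using Stata's bundled auto data rather than the wage data discussed here (price, weight, mpg, and rep78 are that dataset's variables, with rep78 playing the role of age):

            ```stata
            * Illustration only: the auto dataset stands in for the wage data
            sysuse auto, clear

            * Dummies for each level of rep78 are estimated and reported
            regress price weight mpg i.rep78

            * Same model with the rep78 dummies absorbed instead of reported
            areg price weight mpg, absorb(rep78)
            ```

            The coefficients on weight and mpg should match across the two commands; only the rep78 dummies disappear from the areg output.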

            In other words, survey design aside, reg and areg will give you exactly the same results for the variables that are not absorbed. In general, the only reason you might want to "absorb" a variable is that it has so many categories that OLS estimation with reg becomes infeasible or impractical.
            For example, with data on thousands of families where you want to control for the family effect, it would be impractical to include a dummy for each family; you would instead either use areg and absorb, or use a panel-data type of model.
            Hope this helps
            Fernando



            • #7
              Wow, this was a really good explanation. You just made everything much clearer -- Thank you so much, Fernando!
