  • Impute a variable to perform multiple regression

    Hi,

    I would like to run the following multiple regression, with "loglprice" as the dependent variable and the remaining variables as independent variables (including some dummy variables):

    Code:
    reg loglprice i.yearquarter loglot logunitsf floors rooms bedrms baths 1.garage 1.porch 1.fplwk 1.airsys i.cellar 1.poolacc
    I selected these variables from the data for the period 1999-2011, each containing 14,766 observations, but one variable, "poolacc", was only recorded in 2011 (3,826 observations).

    I would like to impute the variable "poolacc", in order to perform the regression above correctly.

    Can someone explain how I could do this?

    I already performed the following 'setup' steps interactively in Stata:

    Code:
    mi set mlong
    Code:
    mi register imputed poolacc
    What is the next step?
    Let me know if I should provide some data.

    Kr,

  • #2
    Mat:
    see Examples under -mi impute- entry, Stata .pdf manual.
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Hi Carlo,

      By the way, poolacc is a binary variable coded 1 if a swimming pool is available and 2 if not.

      I looked at the manual, and I believe that I should use:

      Code:
      mi impute logit
      However, I get the following error:

      outcome does not vary; remember:
      0 = negative outcome,
      all other nonmissing values = positive outcome
      r(2000);


      Do you know what I should enter in the window?

      Refer to the attachment.


      Thanks.
      Last edited by Mat Sko; 19 May 2019, 11:04.



      • #4
        I wish to make three comments:

        I strongly recommend using 0 for 'no' and 1 for 'yes'.

        There is a whole Stata manual on this matter, i.e., multiple imputation. Reading it is hit or miss.

        With roughly 75% of the values missing, imputing such a variable may be pushing the envelope too far.
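
        A minimal sketch of the first two comments, not tested on the actual data. The r(2000) error in #3 arises because logit treats every nonzero value as a positive outcome, so a 1/2 coding looks like an outcome that never varies. The variable name pool, the choice of predictors, add(20) and the seed are illustrative assumptions, and the sketch starts from data that have not yet been mi set.

        Code:
        * pool is a hypothetical 0/1 copy of poolacc (1 = pool, 0 = no pool); missing stays missing
        generate byte pool = (poolacc == 1) if !missing(poolacc)
        mi set mlong
        mi register imputed pool
        * assumed imputation model; assumes these predictors have no missing values
        mi impute logit pool loglprice loglot logunitsf floors rooms bedrms baths, add(20) rseed(12345)
        * fit the analysis model on the imputed datasets and combine the results
        mi estimate: regress loglprice i.yearquarter loglot logunitsf floors rooms bedrms baths i.pool

        Whether the imputations are credible with this much missingness is a separate question that the code cannot settle.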
        Best regards,

        Marcos



        • #5
          Hi Marcos,

          I looked at the notes I received from my mentor: "I suspect that you can impute this variable "poolacc" for the other years since they are the same property, right? (under the assumption that the swimming pool was already there)".

          The variables I have selected for the regression contain values for the period 1999-2011, except for the variable "poolacc" (= does the property have a swimming pool?).
          As I mentioned, "poolacc" is a binary variable for which I only have values for the year 2011. I believe that I have to copy the 2011 values to the other years in the selected period (1999-2011).

          How can I efficiently fill in the missing values (dots) for 1999-2010 with the values from 2011?


          Thanks in advance!

          Kr,



          • #6

            According to what you wrote in the previous post, there is no need for multiple imputation.

            Assuming you have an id variable, you can try something like:

            . bysort id (year): replace poolacc = poolacc[_n-1] if poolacc == . & year > 2011

            This is just a guess, since there is no data to work on and I’m out of ‘my’ Stata now.
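
            A hedged alternative sketch: copying from the previous row ([_n-1]) can only carry a value forward in time, whereas here the only observed year (2011) is the last one. Sorting each id by poolacc puts the observed value first, because missing values sort last in Stata, so it can be copied to every other row of that id. This assumes that id identifies the same property across years and that a variable named id exists; neither has been confirmed in the thread.

            Code:
            * stop if any id has conflicting nonmissing pool values
            bysort id (poolacc): assert poolacc == poolacc[1] if !missing(poolacc)
            * copy the observed value (sorted first within id) to the missing rows
            bysort id (poolacc): replace poolacc = poolacc[1] if missing(poolacc)

            An id with no observed value at all stays missing after this step.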

            Hopefully that helps.
            Best regards,

            Marcos



            • #7
              Hi Marcos,

              Here is some data with 3 variables: id (which I generated myself); hhmove = year moved into house; poolacc = presence of a pool (1 = Yes, 2 = No).


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float id int hhmove byte poolacc
                3 2001 .
                7 1999 .
               33 1999 .
               46 1999 .
               48 1999 .
               68 2000 .
               74 1999 .
               78 2000 .
               81 2000 .
               82 1999 .
              103 2000 .
              106 2001 .
              108 2000 .
              149 1999 .
              153 2001 .
              159 2001 .
              169 1999 .
              185 1999 .
              197 1999 .
              226 1999 .
              230 1999 .
              255 2000 .
              283 2001 .
              290 2000 .
              295 2000 .
              310 1999 .
              311 2000 .
              313 1999 .
              317 1999 .
              340 1999 .
              343 1999 .
              344 2000 .
              345 2001 .
              351 1999 .
              353 2000 .
              357 2000 .
              359 2000 .
              361 2001 .
              401 2000 .
              402 1999 .
              430 2001 .
              451 1999 .
              457 2001 .
              462 2001 .
              469 2000 .
              501 2001 .
              504 2001 .
              514 2001 .
              519 2000 .
              520 1999 .
              533 1999 .
              547 1999 .
              578 2000 .
              589 1999 .
              613 1999 .
              615 2001 .
              628 1999 .
              630 2001 .
              636 1999 .
              644 1999 .
              647 2000 .
              648 1999 .
              656 1999 .
              657 2001 .
              667 1999 .
              673 2000 .
              674 1999 .
              684 2000 .
              688 1999 .
              690 1999 .
              717 2001 .
              722 2000 .
              724 2000 .
              729 2000 .
              730 2001 .
              741 1999 .
              744 2001 .
              758 2000 .
              762 1999 .
              768 1999 .
              786 2000 .
              787 1999 .
              789 2001 .
              790 1999 .
              791 2001 .
              792 2001 .
              829 2000 .
              873 1999 .
              882 2000 .
              890 2000 .
              901 2000 .
              902 2000 .
              928 2000 .
              936 2000 .
              966 2001 .
              969 2000 .
              972 2001 .
              987 2000 .
              990 1999 .
              997 2000 .
              end
              I tried the code you suggested, but it doesn't seem to fill in all the dots of the variable 'poolacc':

              Code:
              bysort id (hhmove): replace poolacc = poolacc[_n -1] if poolacc ==. & hhmove > 2011

              Thanks in advance for your help.


              Kr,



              • #8
                Since there is no year > 2011 in the shared data, filling missing values if year > 2011 is a preposterous task.
                Best regards,

                Marcos



                • #9
                  My mistake. But even with hhmove < 2011, nothing changes in the variable poolacc (it is still missing for every value of hhmove except 2011).
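
                  A possible diagnostic, sketched under the assumption that the full dataset uses the same variable names as the example in #7. In the 100 rows posted there, every id appears exactly once, hhmove only takes the values 1999-2001, and poolacc is missing throughout. If the full data look the same, there is no second row per id to copy a value from, which would explain why the replace changes nothing.

                  Code:
                  * does any id occur more than once? (copies > 1 would be needed for a within-id fill)
                  duplicates report id
                  * how many rows have an observed pool value, and for which hhmove years?
                  count if !missing(poolacc)
                  tabulate hhmove if !missing(poolacc)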

