  • Matching Imputed Datasets

    Hello everyone,

    For my master's thesis I need to analyze data that have been multiply imputed.

    I have five datasets (five separate .dta files) that are very similar but differ in some cells, because those values were imputed. In addition, I have one further dataset that contains mostly "0"s and some "1"s, indicating which values were imputed in the other five datasets.

    My question is: how do I "merge" these datasets, and what do I need to take care of before analyzing the data as usual (running a logistic regression, etc.)?

    Thanks in advance for any help.

  • #2
    Were the imputed data sets created by Stata? I suspect not, but if so that would make things easier.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam

    • #3
      I honestly don't know that. How can I check that?

      • #4
        So far I have not been able to check whether those datasets were created by Stata, nor have I been able to solve my initial problem.

        • #5
          If they were created by the official Stata command, there will be variables whose names start with "_mi"; are there any? Also, all imputed sets would generally be in one file (which, it appears, they are not); the user-written -ice- command also includes a counter and also generally produces just one file. You should probably read:
          Code:
          help mi import
          to see whether that helps you determine what is going on.

          You don't say where these data came from, but I would find it very surprising if the source gave you no help or information on the imputation process.

          • #6
            They could have been created by the -mi set flongsep- command, which creates separate files for each imputation, in which case the data are already usable. But then again they may have been created some other way. Like Rich G., I am a little surprised that you don't have any info on how these files were created. I suppose you could just assume flongsep was used and see if it works.
            Richard Williams, Notre Dame Dept of Sociology

            • #7
              What are the actual file names? If flongsep was used, you would have names like

              myflongsep.dta
              _1_myflongsep.dta
              _2_myflongsep.dta
              etc.

              But when you say "Additionally, I have one further dataset which only contains of many "0" and some "1" indicating which values were imputated in the other five datasets", that doesn't sound like something Stata would create.

              From what you say, you may not have the m=0 file -- the original unimputed data -- which may make the task more difficult but hopefully not impossible.
              Richard Williams, Notre Dame Dept of Sociology

              • #8
                This may help:

                https://stats.idre.ucla.edu/stata/fa...-not-included/

                But first make sure it isn't already in flongsep format.
                Richard Williams, Notre Dame Dept of Sociology

                • #9
                  One other thing: if you open a file and type mi set, it will tell you whether the file is mi set and, if so, how. For example,

                  Code:
                  . use "C:\Dropbox\testprogs\_1_myflongsep.dta" 
                  (Fictional heart attack data; bmi missing)
                  
                  . mi set
                  data m=1 of flongsep myflongsep
                  
                  . use "C:\Dropbox\testprogs\myflongsep.dta"
                  (Fictional heart attack data; bmi missing)
                  
                  . mi set
                  data mi set flongsep myflongsep, M = 20
                  last mi update 11jan2018 14:55:25, approximately 16 hours ago

                  If you are lucky, everything is already mi set and you are ready to go. If not, you may have to do something like what is described in the UCLA handout.
                  Richard Williams, Notre Dame Dept of Sociology

                  • #10
                    First of all, thank you for your help!

                    I found a way to merge all my files into a Stata mi dataset, and the link provided by R. Williams was key: with my five imputed datasets and the indicator dataset I was able to restore the original dataset (with missing values). From there I built a flong dataset myself, and now I'm done. Thanks again.
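A minimal Stata sketch of the approach described above: rebuild the m=0 data (the original, with missing values) from any one completed file plus the indicator file, stack it with the five completed files, and hand the result to `mi import flong`. This is an illustration under assumptions, not necessarily what was done in this thread. The file names (imp1.dta to imp5.dta, flags.dta), the `flag_` prefix for indicator variables, the identifier `id`, and the variables `x` (the only imputed one here) and `y` are all hypothetical; the first section fabricates small toy versions of these files in the working directory so the sketch runs on its own.

```stata
* --- Toy stand-ins for the real files (hypothetical names and layout) ---
* imp1.dta ... imp5.dta : completed datasets, identical except imputed cells
* flags.dta             : 0/1 indicators; flag_x = 1 where x was imputed
clear all
set seed 2018
set obs 200
generate long id = _n
generate byte y = runiform() < 0.5
generate double x = rnormal()
generate byte flag_x = runiform() < 0.2
preserve
keep id flag_x
save flags.dta, replace
restore
forvalues j = 1/5 {
    preserve
    replace x = rnormal() if flag_x == 1    // a different draw in each file
    drop flag_x
    save imp`j'.dta, replace
    restore
}

* --- Step 1: rebuild m=0 by blanking the imputed cells of one file ---
use imp1.dta, clear
merge 1:1 id using flags.dta, nogenerate assert(match)
replace x = . if flag_x == 1
drop flag_x

* --- Step 2: stack m=0 with the five completed files ---
* generate(m) marks the source: 0 = data in memory, 1..5 = imp1..imp5
append using imp1.dta imp2.dta imp3.dta imp4.dta imp5.dta, generate(m)

* --- Step 3: declare the stacked data as mi data and analyze ---
mi import flong, m(m) id(id) imputed(x)
mi register regular y
mi describe
mi estimate: logit y x
```

With real data, drop the toy section, repeat the `replace ... = . if flag_... == 1` line for every imputed variable, and list all of them in `imputed()`. `mi estimate` then fits the logit in each of the five completed datasets and combines the results using Rubin's rules, which is what the SAVE documentation recommends.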

                    Concerning the question of where my data come from: they are from the Munich Center for the Economics of Aging, and the dataset is called SAVE (http://www.mea.mpisoc.mpg.de/index.php?id=315&L=2).

                    Regarding multiple imputation, their website says:

                    "Why are there five datasets for each year? Which dataset should I use?
                    Missing data are imputed in SAVE using a multiple imputation technique. This is a Monte Carlo technique in which the missing values are replaced by m>1 simulated versions. Like in other surveys, such as the Survey of Consumer Finances, in SAVE m is set equal to five. In other words, the whole imputation algorithm is repeated five times, producing the five datasets that are provided to the final user.

                    To get meaningful results, each of the completed dataset should be analyzed by standard methods, and the results should be combined to produce estimates and confidence intervals that incorporate missing-data uncertainty. Standard errors obtained using only a single dataset are generally too low; furthermore single imputation is more prone to generate biased results. The statistical analysis of a single dataset is, however, good to get confidence with the data and to gather a first idea about magnitude and direction of the estimated effects. To this scope, it is absolutely indifferent which of the five dataset is used.

                    Rubin, D.B. (1996) “Multiple Imputation After 18+ Years” Journal of the American Statistical Association, 91(434), pp. 473-489 explains how to combine the results obtained from the separate analysis of the five datasets."
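For reference, the standard combining rules (Rubin's rules), which Stata's `mi estimate` applies automatically, can be written as follows for a scalar parameter Q. Here m is the number of completed datasets (m = 5 in SAVE), \hat{Q}_j is the estimate from dataset j, and U_j is its squared standard error:

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j, &
\bar{U} &= \frac{1}{m}\sum_{j=1}^{m}U_j, \\
B &= \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_j-\bar{Q}\bigr)^2, &
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B .
\end{aligned}
```

\bar{Q} is the combined estimate and \sqrt{T} its standard error. An analysis of a single dataset ignores the between-imputation variance B, which is why, as the quoted text notes, its standard errors are generally too low.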