
  • Merge two questions

    Hi, Stata users!
    I have two questions. First, I have labour panel data split into one dataset per year, and I want to combine them into a single dataset covering all years. My code is
    merge m:1 idworker using /mnt/Labourdata/dct/workers00.dct
    and Stata tells me
    "file /mnt/Labourdata/dct/workers00.dct not Stata format".
    What should I do to make Stata read a dct file?
    My second question: since this is labour data, some individuals have more than one job. When I merge, how does Stata handle these individuals? I have these variables: year, idcompany, idworker, salary, etc. What I want is to sum all the salaries of these individuals.

  • #2
    If you

    Code:
     
    search dictionary
    the results start with

    Code:
     
    ---------------------------------------------------------------------------------
    search for dictionary                                       (manual:  [R] search)
    ---------------------------------------------------------------------------------
    
    Search of official help files, FAQs, Examples, SJs, and STBs
    
    [D]     infile (fixed format) Read text data in fixed format with a dictionary
            (help infile2)
    You must read data into memory or put them in a .dta file before you can merge.
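A minimal sketch of that step, run as one do-file. It assumes a toy dictionary file written to Stata's temporary directory; the file name toy_workers00.dct, its two variables, and the values are invented for illustration, and the real workers00.dct will have its own layout.

```stata
* Write a tiny dictionary file with the data embedded after the closing brace.
* (Illustrative only: file name, variables, and values are invented.)
local dct "`c(tmpdir)'/toy_workers00.dct"
tempname fh
file open `fh' using "`dct'", write text replace
file write `fh' "infile dictionary {" _n
file write `fh' "    long idworker" _n
file write `fh' "    double salary" _n
file write `fh' "}" _n
file write `fh' "1 1000" _n
file write `fh' "2 1500" _n
file close `fh'

* Read the raw data into memory through the dictionary ...
infile using "`dct'", clear

* ... and save them in Stata format so that merge or append can use them.
tempfile workers00
save `workers00'
describe using `workers00'
```

The point is only the order of operations: infile using reads the dictionary, and the result has to be saved in Stata format before it can follow using in merge or append.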



    • #3
      Thank you, Nick Cox, for your answer. But I didn't understand how to read the data that I want to merge into memory. What I did first was
      infile using /mnt/Labourdata/dct/workers01.dct



      • #4
        It appears you have two datasets, workers00 and workers01, which are not stored as Stata datasets. In order to merge the two datasets, you will first have to read one of them into memory and then save it as a Stata dataset, so it can appear as the using dataset on the merge command. Something like the following might set you in the right direction.

        Code:
        infile using /mnt/Labourdata/dct/workers00.dct
        tempfile touse
        save `touse'
        infile using /mnt/Labourdata/dct/workers01.dct
        merge m:1 idworker using `touse'
        With that said, Statalist can better help you if we know what commands you have tried and what Stata told you to indicate that there was a problem. Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. See especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using CODE delimiters, as described in section 12 of the FAQ. It would have been useful to see your work leading up to the merge command so we would have known that you were trying to merge two datasets, neither of which is in Stata format.
        Last edited by William Lisowski; 04 Jul 2016, 09:12.



        • #5
          Thank you, William Lisowski, for your answer. I tried to follow your code, but when I did "save `touse'" Stata replied "invalid file specification r(198)". I also tried just
          save touse
          but Stata replied
          "file touse.dta could not be opened r(603)"
          I later installed the command mergedct and it worked, but I made a mistake. I have several datasets, from 2000 to 2010, one dataset per year, and I want to join all the years into one dataset to have panel data (and later do xtset idworker). I was trying to do
          mergedct idworker using /mnt/Labourdata/dct/workers01.dct
          mergedct idworker using /mnt/Labourdata/dct/workers02.dct
          However, when I do the 2nd step Stata tells me
          "variable _merge already defined r(110)"
          How should I merge all the datasets into one file?



          • #6
            The first problem is likely to be a problem with permissions on your machine or system.

            Different datasets for different years should almost always be appended not merged.
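To illustrate the difference, here is a small sketch with two invented yearly files (the worker IDs, salaries, and tempfile name are made up). Run as a single do-file, append stacks the years as additional observations, which is the long layout that panel commands expect.

```stata
* Toy yearly files (invented values) with the same variables in each.
clear
input long idworker double salary
1 1000
2 1500
end
generate int year = 2000
tempfile y2000
save `y2000'

clear
input long idworker double salary
1 1100
2 1600
end
generate int year = 2001

* append adds the 2000 observations below the 2001 ones:
* 2 workers x 2 years = 4 rows, one per worker-year.
append using `y2000'
sort idworker year
list, sepby(idworker)
```

A merge on idworker would instead try to match rows across the two files rather than stack them, which is not what yearly files of the same variables call for.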



            • #7
              Thank you, Nick Cox, it helped.
              I have been using the following code
              infile using /mnt/Labourdata/workers01.dct
              tempfile workers01
              save `workers01'
              clear
              infile using /mnt/Labourdata/workers00.dct
              append using `workers01'
              tempfile append1
              save `append1'
              clear
              infile using /mnt/Labourdata/dct/workers02.dct
              tempfile workers02
              save `workers02'
              clear
              use `append1'
              append using `workers02'
              tempfile append2
              save `append2'

              And so on, meaning that I was appending one dataset at a time. However, partway through, Stata stopped when I ran
              save `workers05'
              I/O error writing .dta file. Usually such I/O errors are caused by the disk or file system being full. r(693)
              Probably because I have very large data, more than 10 million people.
              Does anyone know how I can solve this?



              • #8
                Added in edit: this response to post #5 crossed with post #7.

                You have three problems.

                when I did "save `touse'" Stata replied to me "invalid file specification r(198)'.
                I expect that you have these commands
                Code:
                tempfile touse
                save `touse'
                in the Do-File Editor window, but selected and ran them one at a time, rather than running the entire do-file. If you look carefully at your results from that, you'll see that each line was copied into a temporary do-file and run, so even though all the commands are in the same window in the do-file editor, they are run as separate do-files. The important thing to keep in mind is that local macros (and similarly the temporary files defined with the tempfile command) vanish when the do-file within which they were created ends. So when the temporary do-file containing the first line ended, the local macro touse was no longer defined. You would need to run the last four lines of my example as a group.
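A small sketch of why that happens, meant to be run as one do-file. The name not_defined_here is invented and stands for any local macro that does not exist in the current run, which is the situation when the save line is executed separately from the tempfile line.

```stata
* Data in memory that have never been saved or loaded from a file.
clear
set obs 1
generate x = 1

* A local macro that is not defined in the current run expands to nothing.
* (not_defined_here is a made-up name used only to show the effect.)
display "undefined local expands to: [`not_defined_here']"

* So this line reaches Stata as a bare "save" with no filename and fails;
* capture noisily shows the error without stopping the do-file.
capture noisily save `not_defined_here'

* Run together in one go, tempfile and save see the same local macro.
tempfile touse
display "tempfile path: `touse'"
save `touse'
```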

                I also tried just
                save touse
                but Stata replied
                "file touse.dta could not be opened r(603)"
                For that, the problem is as Nick Cox described. My guess is that Stata is trying to write your output to a directory in which you do not have permission to write. You should issue the pwd command and, if the directory it reports is not one Stata can write to, either change Stata's working directory (on your File menu) or use a full path with the filename.
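As a rough sketch of those two options: the cd path below is hypothetical and left commented out, and touse_demo is a made-up file name. Stata's temporary directory is used for the full-path example only because it is normally writable.

```stata
* A bare filename in save is written to the current working directory.
pwd
display "`c(pwd)'"

* Option 1: change to a directory you can write to (hypothetical path).
* cd "/some/dir/you/can/write"

* Option 2: give save a full path instead of a bare name.
clear
set obs 1
generate x = 1
save "`c(tmpdir)'/touse_demo", replace
```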

                I have several datasets, from 2000 to 2010, each dataset is for one year and I want to join in one dataset all the years to have panel data (and later do xtset idworker).
                Again agreeing with Nick Cox, you do not want to use merge to create panel data from yearly files, you want to use append.

                If I were confronting your situation, I would go through your data files one at a time
                • use infile using to read the file into memory
                • then make whatever changes you need (for instance
                  Code:
                  generate year=2000
                  for the first file) with the goal that all the files will have the identifier and time variables, and the other variables will have the same names in each file
                • then save the data in memory as a Stata dataset, say data2000, then data2001, ...
                And following that, use append repeatedly to create a single file containing your panel data.
                Code:
                use data2000, clear
                append using data2001
                append using data2002
                ...
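The same steps can be written as loops. This is only a sketch: toy data generated with set obs stand in for each infile using call, the toy_data file names and all values are invented, and only three years are used. The final collapse is one possible way to handle the original second question (workers with more than one job in a year) by summing salary within worker and year, so that xtset sees one row per worker-year.

```stata
local dir "`c(tmpdir)'"

* Step 1: one Stata dataset per year, each with the same variable names
* plus a year variable. In real use the toy data would be replaced by
* the infile using call for that year's dictionary file.
forvalues y = 2000/2002 {
    clear
    set obs 3
    generate long idworker = cond(_n <= 2, 1, 2)   // worker 1 holds two jobs
    generate long idcompany = 10 + _n
    generate double salary = 1000 * _n
    generate int year = `y'
    save "`dir'/toy_data`y'", replace
}

* Step 2: append the yearly files into one long dataset.
use "`dir'/toy_data2000", clear
forvalues y = 2001/2002 {
    append using "`dir'/toy_data`y'"
}

* Step 3: total salary per worker and year, then declare the panel.
* Note that collapse keeps only salary, idworker, and year.
collapse (sum) salary, by(idworker year)
xtset idworker year
list, sepby(idworker)
```

Whether summing is the right treatment depends on the analysis; if the job-level detail (idcompany) is needed later, keep a copy of the appended file before collapsing.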



                • #9
                  With regard to running out of room in post #7, use a single temporary file for your appended data, and do
                  Code:
                  save `append', replace
                  after each append command. Similarly, use the same temporary file to save the most recent infile results, again doing
                  Code:
                  save `workers', replace
                  You could also consider dropping variables you will not need for your analysis, and using compress following the infile to ensure that each variable is stored as compactly as possible.
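A small sketch of what drop and compress do to storage, using invented data in which the variables are deliberately stored in wider types than their values need (the variable note stands for anything not needed in the analysis).

```stata
* Toy data stored more generously than necessary.
clear
set obs 1000
generate double idworker = _n        // 8 bytes per value
generate double year = 2001          // 8 bytes per value
generate str20 note = "x"            // not needed for the analysis
describe

* Drop what will not be used, then let compress pick the smallest
* storage type that holds each variable without loss of information.
drop note
compress
describe
```

Comparing the two describe listings shows the change in storage types and in the size of the dataset in memory.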

                  Finally, since you will probably want the appended data for future use, instead of using a temporary file, I'd write (and rewrite) it to the file you intend to save it to permanently. So the code below will probably solve all your problems, without need for compress, dropping variables, or using temporary files.

                  Code:
                  clear
                  save "mydirectory/paneldata", replace emptyok
                  infile using /mnt/Labourdata/workers01.dct
                  append using "mydirectory/paneldata"
                  save "mydirectory/paneldata", replace
                  clear
                  infile using /mnt/Labourdata/workers02.dct
                  append using "mydirectory/paneldata"
                  save "mydirectory/paneldata", replace
                  clear
                  infile using /mnt/Labourdata/workers03.dct
                  append using "mydirectory/paneldata"
                  save "mydirectory/paneldata", replace
                  Last edited by William Lisowski; 05 Jul 2016, 08:43.
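Since the same three lines repeat for every year, the pattern could also be written as a loop. The sketch below makes some assumptions: toy data replace the infile using call so that it runs anywhere, toy_paneldata is a made-up output name in Stata's temporary directory, and only three years are shown. The display %02.0f line builds the two-digit suffix (01, 02, ...) used in the dictionary file names.

```stata
local panel "`c(tmpdir)'/toy_paneldata"    // hypothetical output file

* Start from an empty dataset on disk so every year is handled the same way.
clear
save "`panel'", replace emptyok

forvalues y = 1/3 {
    local yy : display %02.0f `y'          // 1 -> 01, 2 -> 02, ...
    clear
    * In real use this would be:
    *     infile using /mnt/Labourdata/workers`yy'.dct
    * Toy data stand in for it here.
    set obs 2
    generate long idworker = _n
    generate int year = 2000 + `y'
    * Add everything accumulated so far, then overwrite the panel file.
    append using "`panel'"
    save "`panel'", replace
}
tabulate year
```

Only one file grows on disk with this approach, which is why it avoids the pile-up of temporary files that filled the disk in post #7.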



                  • #10
                    Thank you, William Lisowski for your help.
                    I tried your last code, as it is definitely the most useful for me: with the other approach, even after running compress and dropping variables, Stata was still telling me that the disk or file system was full.
                    But when I ran
                    save "mydirectory/paneldata", replace emptyok
                    Stata replied
                    (note: dataset contains 0 observations)
                    (note: file mydirectory/paneldata.dta not found)
                    file mydirectory/paneldata.dta could not be opened
                    r(603)

                    Could you please clarify what I misunderstood?



                    • #11
                      Please review the documentation for the save command given by help save and modify my sample code to suit your needs. I assume the specific error message came because, whatever your Stata working directory is, it does not contain a subdirectory called "mydirectory". You do not need a subdirectory; I was just illustrating what could be done. You do not need to name the output file "paneldata" either; you can name it whatever you want.

                      Let me add the advice in the following paragraph. I know you've been using Stata for a while now, from the dates of your previous Statalist questions. But if you're unfamiliar with the syntax of the save command, which is a very fundamental command, it suggests that you may have some gaps in your knowledge of Stata that are making it harder for you than it need be. If I'm wrong, please accept my apologies for the misjudgement.

                      When I began using Stata in a serious way, I started by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax.

                      Stata supplies exceptionally good documentation that amply repays the time spent studying it.
                      Last edited by William Lisowski; 05 Jul 2016, 13:57.



                      • #12
                        Thank you for your answer. I did use my own subdirectory, and later realised that I had made a silly, careless mistake when typing its name. Sorry for the misunderstanding and the loss of time.
                        By the way, it worked perfectly, thank you.
                        Last edited by Marli Fernandes; 05 Jul 2016, 14:59.

