Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Renaming a Variable while merging from sequential files

    Hi,
    I have data from 136 separate months that have the same variable names in each file. I'm trying to merge the monthly data with a a different data set. I would like to rename the variable of interest, mean, to ndviYYMM each time a file is merged (YYMM will be digits identifying the year and month). The code I have, which successfully merges but overwrite ndviYYMM for each month is:

    cd "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\weather data\srndvi"
    local i = 1
    fs *.csv
    foreach x in `r(files)' {
    insheet using `x', clear
    save `x', replace
    use "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\stata data\lab algeria.dta", clear
    merge m:1 orig_fid using `x'
    drop count_ area min_ max_ range std sum_ variety minority majority median
    rename mean ndvi`i'
    local ++i
    }

    Because I could not figure out how to attach digits YYMM i just tried to attach numbers from 1-136. A YYMM suffix is preferred. I also tried putting the 'local = 1' line inside the loop and separately tried using the mmerge function posted here:

    local i = 1
    fs *.csv
    foreach x in `r(files)' {
    insheet using `x', clear
    save `x', replace
    use "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\stata data\lab algeria.dta", clear
    mmerge orig_fid using `x', type(n:1) ukeep(mean) urename(mean ndvi`i')
    local ++i
    }

    I also tried using the line ' rename mean ndvi(###), addnumber' but it also overwrites in each iteration of the loop.
    Can you help me attach the desired suffix?

    Thanks for reading.

  • #2
    Well, after you create your ndvi`i' variable, you go back to the top of the loop and overwrite your data with a new file using -insheet-, so, of course, the ndvi`i' variable is lost. What you need to do is build up your merged file one step at a time in a tempfile:

    Code:
    cd "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\weather data\srndvi"
     tempfile building
    copy "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\stata data\lab algeria.dta" `building'
     
     local i = 1
     fs *.csv
     foreach x in `r(files)' {
         insheet using `x', clear
         merge 1:m orig_fid using `building' // NOTE CHANGE FROM ORIGINAL -merge-
         // TO BE SAFE YOU MIGHT WANT TO ADD AN ASSERT OPTION TO THE MERGE,
         // OR OTHERWISE VERIFY THE MERGE WORKED AS EXPECTED
         drop count_ area min_ max_ range std sum_ variety minority majority median _merge
         rename mean ndvi`i'
         save "`building'", replace // THIS IS THE KEY STEP
         local ++i
     }
    Now, when that's done you can -use `building'- and it will contain all of the different ndvi* variables.

    As for using YYMM values for the ndvi* variables instead of number 1 through 136, I can't help you because I can't see where the values of YYMM you want to use are supposed to come from. Are they part of the filenames? Or found in the data files somewhere? Or just sequential starting from some particular month/year that you didn't tell us about?

    Comment


    • #3
      The first thing that occurs to me is that you should rename your variable after importing and before saving. Something like:

      Code:
      cd "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\weather data\srndvi"
      local i = 1
      fs *.csv
      foreach x in `r(files)' {
      insheet using `x', clear
      rename mean ndvi`i'
      save `x', replace
      use "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\stata data\lab algeria.dta", clear
      merge m:1 orig_fid using `x'
      drop count_ area min_ max_ range std sum_ variety minority majority median
      local ++i
      }
      This should result in a different variable name for each month. You don't say where the YYMM suffix is going to come from but this should work more or less the same way.

      That said, this code is not very efficient, as merging each time you import a monthly file is redundant. You should probably append all of the monthly data sets together and merge once at the end:

      Code:
      cd "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\weather data\srndvi"
      local i = 1
      fs *.csv
      foreach x in `r(files)' {
      insheet using `x', clear
      save `x'   // Optional
      rename mean ndvi`i'
      capture append using allmonthly
      save allmonthly, replace
      local ++i
      }
      use "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\stata data\lab algeria.dta", clear
      merge m:1 orig_fid using allmonthly
      drop count_ area min_ max_ range std sum_ variety minority majority median
      Finally, you should consider changing your user name to your real first and last name as per the Stata FAQ.

      Comment


      • #4
        Clyde and Joe helpfully rewrote your syntax using the modern merge command and avoiding the outdated user-written mmerge command (not function).

        However, your use of the user-written command fs should still be explained. Please see FAQ Advice on posing questions.

        The biggest problem remains that no one can see where the dates to be used in naming come from.

        Comment


        • #5
          Thanks, the YYMM identifiers are in the file name but the file names are combined with the country abbreviation and various wildcards. I tried renaming the variables using the file name but the wildcards (which, according to stata, are the .csv extensions) are preventing me from doing this. The four digit data starts at 0002 and ends at 1106; is there a way to assign this data format based upon the sequential numbering? I think that a macro with this information could be assigned using the rename function

          Clyde,
          would you suggest using the mmerge command with "type(1:n)" option, rather than the merge function?
          Your code worked well.

          Joe,
          Your preferred code created an observation for each month of each year. Meaning, there are 20 orig_fid identifiers and 136 months which ended up with 2720 observations.
          I will change my username, the username guideline list was the first thing I noticed after creating my account.

          Thank you both for your help.

          Comment


          • #6
            I don't really understand your first paragraph, but "0002" to "1106" are in principle just 4 character substrings that can be copied as such.

            merge
            and rename are commands, not functions.

            Yes, this is only terminology. On the other hand, realising what is a command and what is a function can make your Stata life easier, as you know where to look for details in the documentation.

            Yes, other programs would use the term functions for similar beasts, but this is Stata and Stata terminology applies.

            Comment


            • #7
              NIck, It appears we posted at the same time I will use the correct terminology. The first paragraph should say, "date starts at . . " not "data". The file for Algeria, September 2009 is named "ndvi200909_alg.csv".

              Comment


              • #8
                I don't recommend the mmerge function (unless you are using an old version of Stata--which you should have mentioned in your post if that were the case). The advantages of -mmerge- over old versions of Stata's official -merge- have been incorporated into the modern -merge- command. So I would stick with Stata's native -merge- command.

                If you are sure that your list of filenames in `r(files)' will actually appear in the correct monthly order, the code below will work. But I'd be a little skeptical about that. Even if it works today on your current setup, I'm not sure it would be guaranteed to work again next time. So, if you're really confident about that, go ahead and use this code. But, honestly, I think it would be safer to try to extract the dates from the filenames. You indicate that it looks complicated, but if you gave us a sampling of the names of the files someone might be able to help you figure that out. Anyway, conditional on that assumption:

                Code:
                // CODE WILL BE BROKEN IF FILES ARE NOT READ IN IN CORRECT MONTHLY ORDER
                // USE AT YOUR OWN RISK!!
                cd "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\weather data\srndvi"
                 tempfile building
                copy "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\stata data\lab algeria.dta" `building'
                 
                local i = monthly("Feb2000", "MY") // F.Y.I., THE RESULT IS 481
                
                 fs *.csv
                 foreach x in `r(files)' {
                     insheet using `x', clear
                     merge 1:m orig_fid using `building' // NOTE CHANGE FROM ORIGINAL -merge-
                     // TO BE SAFE YOU MIGHT WANT TO ADD AN ASSERT OPTION TO THE MERGE,
                     // OR OTHERWISE VERIFY THE MERGE WORKED AS EXPECTED
                     drop count_ area min_ max_ range std sum_ variety minority majority median _merge
                
                     local yymm: display %tmYYNN =`i'  // GET CURRENT MONTH IN YYMM FORMAT
                     rename mean ndvi`yymm'
                
                     save "`building'", replace // THIS IS THE KEY STEP
                     local ++i
                 }
                Note that the uppercase N's in the local yymm:... line are not typographical errors -- see help datetime_display_formats for more detail about formatting dates.

                Comment


                • #9
                  Although your post of 15:32 is timestamped earlier than mine of 15:36, I did not see it prior to making my post. If ndvi200909_alg.csv is typical of your filenames, it seems that your YYMM come from the 7th through 11th characters of the filename. If that pattern prevails, then the safe way to proceed is:
                  Code:
                  cd "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\weather data\srndvi"
                  tempfile building
                  copy "C:\Users\csutera\Desktop\N. Burger MENA Climate Change\stata data\lab algeria.dta" `building'
                  
                  fs *.csv
                  foreach x in `r(files)' {  // NOTE NO `i' NEEDED IN THIS LOOP
                      insheet using `x', clear
                      merge 1:m orig_fid using `building'
                      // TO BE SAFE YOU MIGHT WANT TO ADD AN ASSERT OPTION TO THE MERGE,
                      // OR OTHERWISE VERIFY THE MERGE WORKED AS EXPECTED
                      drop count_ area min_ max_ range std sum_ variety minority majority median _merge
                  
                      local yymm = substr("`x'", 7, 4)
                      rename mean ndvi`yymm'
                  
                      save "`building'", replace
                  }

                  Comment


                  • #10
                    Thank you,
                    I am using Stata 13.
                    I will learn to post correctly for my next post. This has been very helpful.

                    Lastly, the value labels disappear from my master file. I used the nolabel option for merge but it did not save the value labels. For now, I have save a do-file with labels and will add them after the merge loop. Is there a way to save the labels from the original file?

                    Comment


                    • #11
                      While that master file is in memory, run the -label save- command, saving all the labels to a tempfile. (See the online help if you are not familiar with -label save-). Then after all the merges are complete, -run- or -do- the tempfile to which you saved the labels, and they will all come back. You may also need to give some commands to apply those value labels to their respective variables.

                      Since you mentioned the subject of posting correctly, in order to maintain a sense of professionalism, the ground rules of the forum request that people use their real first and last names when posting. You can get your user name changed by clicking on "Contact Us" and requesting the modification.

                      Comment


                      • #12
                        I have already contacted them. Thanks for the help.

                        Comment

                        Working...
                        X