Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Import year quarter from excel to Stata

    Dear all:
    I have data A stored in excel and need import A to stata , then merge A with dataset B that is already stored in stata.
    data A in excel has two time variables : year (e.g. 2012) and quarter (Q1, Q2, …)
    ( I share a few observations for A , B as illustration. A is attached, B is below)

    data B in stata has a datadate variable that is long and format %td
    I generated two variables year and quarter from datadate (e.g., 2012, 4), both A and B has a firm identifier I can merge on.

    Code:
    clear
    input str6 firmid long datadate float(cyear quarter)
    "008530" 18352 2010 1
    "008530" 18443 2010 2
    end
    format %td datadate
    My questions:
    1. how should I format the year and quarter in excel before I use import function in stata, so that I can merge A and B by firmID, year, quarter.
    2. Can I merge by firmID, year, quarter., or do I need to generate a timeid based on year and quarter , then merge by firmID, timeid
    Thank you

    Rochelle
    Attached Files

  • #2
    how should I format the year and quarter in excel before I use import function in stata, so that I can merge A and B by firmID, year, quarter.
    You shouldn't do anything to the data in Excel. Excel leaves no audit trail of what you do, so you must leave the Excel data as it is and do all your data management in Stata (or other software that leaves an audit trail.)

    As I think you know, attachments, particularly of spreadsheets, are deprecated on this Forum. Many of us will not download a file from people we do not know, for reasons of computer safety. So I'm relying here on your description of data set A. What I would do is something like this:
    Code:
    import excel using "data A.xlsx", clear firstrow
    destring quarter, ignore("Q") replace
    // PERHAPS OTHER DATA CLEANING AS NEEDED
    isid firmid year quarter, sort
    save data_A, replace
    
    use "data B.dta", clear
    merge m:1 firmid year quarter using data_A
    Now, as for whether it makes sense to make a single variable with a quarterly date in data set A, that depends on what you will be doing with the data and whether it will be convenient to have. You certainly don't need to do that in order to merge the two data sets together. If you will find such a variable handy to use later on, however, then by all means go ahead and do that. -gen timeid = yq(year, quarter)- followed by -format timeid %tq- will do that for you.

    Comment


    • #3
      Rochelle: Please see #6 in your previous thread https://www.statalist.org/forums/for...-in-panel-data

      Comment


      • #4
        Clyde Schechter Thank you always for being so helpful. one issue I ran into is data set B variable quarter.

        In my earlier post https://www.statalist.org/forums/for...-in-panel-data
        you also helped me.

        the data below is an extended version of Dataset B
        Code:
         
         clear input str6 firmid long datadate str6 datacqtr double invtq float(cyear quarter) "002403" 18352 "2010Q1"     1284 2010 200 "002403" 18443 "2010Q2"     1265 2010 201 "002403" 18535 "2010Q3"     1369 2010 202 "002403" 18627 "2010Q4"     1204 2010 203 "002403" 18717 "2011Q1"     1322 2011 204 "002403" 18808 "2011Q2"     1492 2011 205 "002403" 18900 "2011Q3"     1357 2011 206 "002403" 18992 "2011Q4"     1384 2011 207
        as shown above, the quarter is stored on a daily basis (I assume) , so this will not match up with my dataset A where quarter takes on only 1~4.

        What kind of adjustment of quarter I need to make to help me merge A and B .

        Thank you!

        Comment


        • #5
          Well, in the example data you show, both cyear and quarter are actually the year and year_and_quarter of the variable datadate. And quarter is also the Stata internal format quarterly date representation of the variable datacqtr (which is a string variable). So that gives us lots of options to match with B. On the assumption that the relationships among all these variables that I have observed in the example data hold throughout the data set, I think the simplest way is to do this:
          Code:
          rename quarter qdate
          gen quarter = quarter(datadate)
          This will preserve the information in the original quarter variable under the new name qdate and create a new variable, quarter, that will match to its namesake in the other data set.

          Before doing that, however, I suggest you verify that the relationships among all those variables in the example data really do hold in the full data set:
          Code:
          assert quarter== qofd(datadate)
          assert cyear == year(datadate)
          assert quarter == quarterly(datacqtr, "YQ")
          If those relationships do hold, then the above three commands will produce no output. If any of them fail, then you will get an error message, and in that case we will need to clarify which of these variables contains the correct information for matching up with the other data set.

          Comment


          • #6
            Nick Cox Yes I have contacted forum administrator to keep Rochelle Zhang as the only login name.

            Comment


            • #7
              Thanks ! @ Clyde Schechter your post #5 is correct about dataset B: quarter is in Stata internal format quarterly date. I shall clarify My postn #4 above,

              after using import function to bring excel data to stata, it looks like thisI stated that the excel file, quarter takes the value of 1-4,

              dataset A
              Code:
              clear
              input str6 firmid int Year str2 Quarter double Avgdegree
              "008530" 2012 "Q1" 1.239
              "008530" 2012 "Q2" 1.306
              "008530" 2012 "Q3" 1.372
              "008530" 2012 "Q4" 1.066
              "008530" 2013 "Q1" 1.405
              "008530" 2013 "Q2" 1.258
              "008530" 2013 "Q3" 1.284
              you can see Quarter is not in Stata internal format quarterly date. so I can't merge A and B dataset. this is why I asked initally what adjustment to make to dataset A in excel in #1.

              Comment


              • #8
                Code:
                destring Quarter, ignore("Q") gen(quarter)
                rename Year cyear
                should do it.

                Comment


                • #9

                  Code:
                  gen quarter = real(substr(Quarter, 2, 1))
                  is another way to do it.

                  Comment


                  • #10
                    Thanks Nick Cox and Clyde Schechter . Both of your suggestions (#8, 9) work, i.e., make a quarter variable taking values of 1,2,3,4. this is for my data A.

                    However, as noted in my #4, quarter in dataset B, and I already use

                    Code:
                     
                     gen quarter = quarter(datadate)
                    so the quarter variable in dataset B and quarter variable in dataset A are still stored in different ways in stata after importing A. My apology if I confused you earlier.

                    Comment


                    • #11
                      At this point, I cannot follow what you are saying. If in one data set you have -gen quarter = quarter(datadate)- and in the other you have used the code in either #8 or #9, in both data sets the variable quarter will take on the values 1, 2, 3, and 4 and you are good to go for -merge-ing on that variable. So if you are still unable to -merge- the data sets, then something else is wrong.

                      So I suggest you post -dataex- examples from both of your data sets and we can look at it with fresh eyes. Also show the exact -merge- command you are using and any error messages it gives you.

                      Comment


                      • #12
                        Clyde Schechter please see below


                        dataset A (showing just a few obs)
                        Code:
                        * Example generated by -dataex-. For more info, type help dataex
                        clear
                        input long firmid int cyear byte quarter double Avgdegree
                        2403 2012 2 1.475
                        2403 2012 3 1.396
                        2403 2012 1 1.229
                        2403 2012 4 1.089
                        2403 2013 2 1.209
                        2403 2013 1 1.537
                        2403 2013 3 1.285
                        2403 2013 4  .978
                        2403 2014 4   1.2
                        2403 2014 3 1.459
                        2403 2014 1 1.307
                        2403 2014 2  1.46
                        2403 2015 2 1.663
                        end
                        dataset B
                        [CODE]
                        Code:
                        * Example generated by -dataex-. For more info, type help dataex
                        clear
                        input long(firmid datadate) float(cyear quarter)
                        2403 18352 2010 200
                        2403 18443 2010 201
                        2403 18535 2010 202
                        2403 18627 2010 203
                        2403 18717 2011 204
                        2403 18808 2011 205
                        2403 18900 2011 206
                        2403 18992 2011 207
                        2403 19083 2012 208
                        2403 19174 2012 209
                        2403 19266 2012 210
                        2403 19358 2012 211
                        2403 19448 2013 212
                        2403 19539 2013 213
                        2403 19631 2013 214
                        2403 19723 2013 215
                        2403 19813 2014 216
                        2403 19904 2014 217
                        2403 19996 2014 218
                        2403 20088 2014 219
                        2403 20178 2015 220
                        2403 20269 2015 221
                        2403 20361 2015 222
                        end

                        Comment


                        • #13
                          There are many ways one could harmonize the date representations in these two data sets to support -merge-ing them. The main obstacle is that the same name (quarter) used to represent different things in the two. My personal preference is to rely on the version of quarter shown in dataset B because it contains both the year and quarter, so it has the most information. Like this:
                          Code:
                          use dataset_A, clear
                          rename quarter Quarter
                          gen quarter = yq(cyear, Quarter)
                          format quarter %tq
                          
                          merge 1:1 firmid quarter using dataset_B
                          Moreover, to avoid further confusion later, I recommend -rename-ing the variable quarter to something else, like qdate, in the final combined data set.

                          Comment


                          • #14
                            As always, a meta-convention is to choose your own conventions and be as consistent as possible.

                            I would always use names like mdate and qdate for variables holding monthly and quarterly dates and names like month and quarter only for months 1-12 and quarters 1-4 respectively.

                            But year works fine for me for years: ydate would be consistent with what I've just said, but doesn't appeal.

                            If date to you could only mean a daily date, then go with that, but not otherwise. I think I would at least sometimes use ddate if there were other kinds of date variables in the dataset.

                            Comment


                            • #15
                              Stata dates/time is challenging topic to me. Thank you both (Clyde and Nick) for the tremendous help !!!

                              Comment

                              Working...
                              X