Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keeping date format during reshape

    Hello everyone,

    I have a dataset that includes a date variable in the following format: M/D/YYYY.
    With this dataset I am performing the following reshape commands:

    Code:
    renvars, pref(Category)
    ren CategoryDate Date
    reshape long Category, i(Date) j(id) string
    reshape wide Category, i(id) j(Date)
    renvars Category*, pred(8)
    In the second reshape command, unfortunately Stata changes the format of the date variable, so that I end up with variables named CategoryXXXXX, where XXXXX is a random combination of 5 numbers. What I would like to have is variables named CategoryMDYYYY.

    Any ideas on where I am making a mistake or how I could specify the format that I want in my commands?

    Thank you very much!

  • #2
    Please give us a data example using dataex (SSC) (FAQ Advice #12 applies).

    Comment


    • #3
      If you try
      Code:
      gen dateu = Date
      list Date dateu
      before reshaping, you will see that the five numbers are not "random" but instead are the unformatted values of the Date variable.

      When constructing variable names, reshape wide does not take into consideration the formatting of the original variable. Indeed, your expectation is that it would use the digits and omit the slashes, but then it seems to me that January 21 2016 and December 1 2016 would both be represented 1212016, unless you meant MMDDYYYY with leading zeros.

      If you want the new variables to include the date as part of their name, you will need to create a string variable containing Date formatted as you desire, and then use that string variable rather than Date in the j() option, and include the string option, on your reshape wide command. With sample data we could perhaps demonstrate this.

      If the distinction I draw between the formatted and unformatted values of Date is unfamiliar, then the following advice is relevant. Before working with dates and times, any Stata user should thoroughly review the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF. After that, the help datetime documentation will usually be enough to point the way. All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

      Comment


      • #4
        First of all, sorry for the delayed reply and thank you very much for your feedback already!

        Please find a data example below:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int Date double(CategoryCrime CategoryDefence CategoryEconomic)
        16565 .2263 .0167 .2701
        17050 .2252 .0194 .2396
        17266 .2321 .0182    .2
        17431 .2269 .0178 .1687
        17616 .3731 .0308 .3953
        17806 .1668 .0142 .3669
        18049 .1581 .0127 .4179
        18387 .3252 .0283  .797
        19125 .2266     0 .6982
        19300 .2241     0 .7339
        19488 .1219     0 .3281
        19664 .1124     0 .3268
        19797 .1291     0 .2925
        19874   .12     0   .29
        20035  .093     0 .2433
        20147 .0946     0 .2187
        20224 .0909     0 .2108
        20399 .0997     0 .1919
        20595 .1042     0 .1927
        20761 .1073     0 .1905
        end
        format %tdnn/dd/CCYY Date

        Comment


        • #5
          I have now been able to solve the problem by creating a string variable of the date, as you proposed. Thank you very much!

          Comment


          • #6
            I am still puzzled here. Why you think the final data layout, with a variable for each different date, is desirable?

            Note; renvars is from the Stata Journal and now mostly superseded by the revised rename.

            Comment


            • #7
              The data are answers to the Eurobarometer question: "Which do you think are the two most important issues facing (OUR COUNTRY) at the moment?" as percentages. I need to rank these answers to see if there are patterns in which categories people deem to be most important over time. I was unable to rank the categories before reshaping the data. Probably some kind of rookie mistake I was unable to solve.

              Comment


              • #8
                Maybe I can just add this question here, since I am struggling a bit right now. I have the data you can see in the example below, which are percentages answered to a categorical question. For example, in the May 2005 survey, 22.63% of people mentioned crime as one of the two most important issues facing their country. In total there are 21 categories such as crime and defence.

                What I ultimately need is a table that ranks, for each survey wave, the categories in descending order, from being mentioned most to being mentioned least. I have failed to achieve this using all types of rank, sort and list commands.

                Any help is highly appreciated!

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str9 Date double(Crime Defence Economic)
                "09may2005" .2263 .0167 .2701
                "06sep2006" .2252 .0194 .2396
                "10apr2007" .2321 .0182    .2
                "22sep2007" .2269 .0178 .1687
                "25mar2008" .3731 .0308 .3953
                "01oct2008" .1668 .0142 .3669
                "01jun2009" .1581 .0127 .4179
                "05may2010" .3252 .0283  .797
                "12may2012" .2266     0 .6982
                "03nov2012" .2241     0 .7339
                "10may2013" .1219     0 .3281
                "02nov2013" .1124     0 .3268
                "15mar2014" .1291     0 .2925
                "31may2014"   .12     0   .29
                "08nov2014"  .093     0 .2433
                "28feb2015" .0946     0 .2187
                "16may2015" .0909     0 .2108
                "07nov2015" .0997     0 .1919
                "21may2016" .1042     0 .1927
                "03nov2016" .1073     0 .1905
                end

                Comment


                • #9
                  Here is code that accomplishes what you need. Run this sample, using just 5 waves to limit the amount of output, and review the output carefully to understand the approach taken. In general, the approach you took, which may have solved your problem, would in general increase the difficulty of analysis in Stata.
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str9 Date double(Crime Defence Economic)
                  "09may2005" .2263 .0167 .2701
                  "06sep2006" .2252 .0194 .2396
                  "10apr2007" .2321 .0182    .2
                  "22sep2007" .2269 .0178 .1687
                  "25mar2008" .3731 .0308 .3953
                  end
                  generate wave = daily(Date,"DMY")
                  format wave %td
                  list, clean
                  drop Date
                  rename (Crime Defence Economic) (rate=)
                  list, clean 
                  reshape long rate, i(wave) j(issue) string
                  list, clean
                  bysort wave (rate): gen rank = 4-_n
                  list, clean
                  reshape wide issue rate, i(wave) j(rank)
                  list, clean
                  And here is the output from the final command.
                  Code:
                              wave     issue1   rate1     issue2   rate2    issue3   rate3  
                    1.   09may2005   Economic   .2701      Crime   .2263   Defence   .0167  
                    2.   06sep2006   Economic   .2396      Crime   .2252   Defence   .0194  
                    3.   10apr2007      Crime   .2321   Economic      .2   Defence   .0182  
                    4.   22sep2007      Crime   .2269   Economic   .1687   Defence   .0178  
                    5.   25mar2008   Economic   .3953      Crime   .3731   Defence   .0308

                  Comment


                  • #10
                    Using your original data from post #1 produces a much more direct example.
                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input int Date double(CategoryCrime CategoryDefence CategoryEconomic)
                    16565 .2263 .0167 .2701
                    17050 .2252 .0194 .2396
                    17266 .2321 .0182    .2
                    17431 .2269 .0178 .1687
                    end
                    format %tdnn/dd/CCYY Date
                    reshape long Category, i(Date) j(issue) string
                    rename Category rate
                    list, clean
                    bysort Date (rate): gen rank = 4-_n
                    list, clean
                    reshape wide issue rate, i(Date) j(rank)
                    list, clean
                    Code:
                                Date     issue1   rate1     issue2   rate2    issue3   rate3  
                      1.    5/9/2005   Economic   .2701      Crime   .2263   Defence   .0167  
                      2.    9/6/2006   Economic   .2396      Crime   .2252   Defence   .0194  
                      3.   4/10/2007      Crime   .2321   Economic      .2   Defence   .0182  
                      4.   9/22/2007      Crime   .2269   Economic   .1687   Defence   .0178

                    Comment


                    • #11
                      Thank you so much!!!

                      Comment

                      Working...
                      X