Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • extracting data from numbers

    Dear Statalists,
    I have an excel file with dates like 29.02.1980, 30.03.1980 and etc... I have appended Y in front of every date in the excel file and then imported it to the STATA. I had to appended Y in front of dates since i have to reshape the data (from wide to long) in the STATA.
    I have used codes like: gen year = year(date) - 60 // Getting YEAR from the date
    gen month = month(date) // Getting MONTH from the date
    With year everything worked fine but with month i am getting wrong values, what can be possible reason for that and how can i fix it?

  • #2
    I don't understand your description of the data. In particular, I don't see why the presence or absence of "Y" in front of the numbers you show would affect the ability to -reshape- the data. It is also unclear to me how these are being imported into Stata. Specifically, with Y in front of them, I would imagine they are imported as strings, but if that is the case you cannot apply the year() and month() functions to them, as these require Stata internal format numeric variables.

    So, rather than trying to imagine what you have, and probably getting it wrong, I ask that you post back and use the -dataex- command to show an actual example of your imported Stata data. (Never mind what it looks like in Excel.) If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.


    Comment


    • #3
      Dear Clyde,
      Thank you for the reply. Below you will find the part of the data set that i have imported in the STATA(it is just too big to show it through the -dataex-, I have 40 year monthly price data for 40 countries). Moreover I want to include the codes that I am using, maybe somewhere there i do a mistake. Please help me out because I cannot find an issue and its been a long time since i am looking for it.
      drop in 1 // Drop the first row of the data as it contains the duplicate values of the column names

      gen vars = "monthly_prc"

      reshape long Y, i(DSCD vars) j(date) string // Currently I have transposed Data Stream file from which i have to create Panel Data


      reshape wide Y, i(DSCD date) j(vars) string

      destring date, replace

      gen year = year(date) - 60 // Getting YEAR from the date
      gen month = month(date) // Getting MONTH from the date
      save filestata, replace



      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str9 Country str6 DSCD str12(Y2291980 Y3311980 Y4301980) str11 Y5301980 str12 Y6301980
      "Country"   "DSCD"   " 2/29/1980"   " 3/31/1980"   " 4/30/1980"   " 5/30/1980"  " 6/30/1980"  
      "Argentina" "322833" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "131879" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "298819" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "322656" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "320148" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "677693" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "320053" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "134645" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321948" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "874140" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "149482" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321768" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "32366C" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "696138" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "50857N" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "51896R" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "322652" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "9102PQ" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "9140NR" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "255169" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "151203" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321774" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "322664" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982655" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "320158" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982694" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "53497W" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982902" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "286752" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321778" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "134495" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "50472L" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "87852H" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "87852F" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "51391N" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "134627" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982933" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982935" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321795" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "946443" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "677414" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "286047" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "322832" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "51217X" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "27099C" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "7555W7" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "7738N8" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "320096" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "992119" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "322660" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "51896T" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "992143" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "75649H" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "322661" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "884387" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "92638J" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "992156" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "9211GR" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321946" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "2639DZ" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "134841" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "269413" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "151196" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321789" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "9107ZN" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "992183" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "320362" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "87733X" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "26085H" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321947" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "26953T" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982904" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "992126" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "891408" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982696" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "992128" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "72931L" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "50923T" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "265532" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982683" "NA"           "52000033.16"  "45495747.89"  "45372798.59" "39364887.22" 
      "Argentina" "9107ZM" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "27743U" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321767" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982616" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982620" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "884385" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982625" "129281240.21" "142857239.19" "162886001.03" "116198599.9" "107849012.07"
      "Argentina" "982628" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982627" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982629" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "759885" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982990" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "286699" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321769" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321770" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "50290F" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982643" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "321773" "NA"           "NA"           "NA"           "NA"          "NA"          
      "Argentina" "982651" "NA"           "NA"           "NA"           "NA"          "NA"          
      end

      Comment


      • #4
        Your data example is more than enough to find your error.

        Consider this fragment -- in which I stop before the destring.

        Code:
        clear
        input str9 Country str6 DSCD str12(Y2291980 Y3311980 Y4301980) str11 Y5301980 str12 Y6301980
        "Country"   "DSCD"   " 2/29/1980"   " 3/31/1980"   " 4/30/1980"   " 5/30/1980"  " 6/30/1980"  
        "Argentina" "322833" "NA"           "NA"           "NA"           "NA"          "NA"          
        end
        
        drop in 1 // Drop the first row of the data as it contains the duplicate values of the column names
        
        gen vars = "monthly_prc"
        
        reshape long Y, i(DSCD vars) j(date) string // Currently I have transposed Data Stream file from which i have to create Panel Data
        
        reshape wide Y, i(DSCD date) j(vars) string
        
        list
        
             +-----------------------------------------+
             |   DSCD      date   Ymonth~c     Country |
             |-----------------------------------------|
          1. | 322833   2291980         NA   Argentina |
          2. | 322833   3311980         NA   Argentina |
          3. | 322833   4301980         NA   Argentina |
          4. | 322833   5301980         NA   Argentina |
          5. | 322833   6301980         NA   Argentina |
             +-----------------------------------------+
        date at this point is a string variable containing information on daily dates, but they are at or before the last day of each month, so monthly dates really as you realise.

        Using destring at this point is quite wrong, as it just produces large integers which 2291980 which are not what you want. If you push the results through year() and month() you get nonsense in the form of missing, as 2291980 is not for example a Stata daily date 2 million and so days after 1 January 1960.

        To understand Stata's date handling, you have to read the documentation on Stata's date handling.

        This will do what you want:

        Code:
        . gen year = real(substr(date, -4, 4))
        
        . gen month = real(subinstr(date, substr(date, -6, 6), "", .))
        
        . gen mdate = ym(year, month)
        
        . format mdate %tm
        
        . list date year month mdate
        
             +---------------------------------+
             |    date   year   month    mdate |
             |---------------------------------|
          1. | 2291980   1980       2   1980m2 |
          2. | 3311980   1980       3   1980m3 |
          3. | 4301980   1980       4   1980m4 |
          4. | 5301980   1980       5   1980m5 |
          5. | 6301980   1980       6   1980m6 |
             +---------------------------------+
        It should be clear that the year is given by the last four characters of your date variable. The month is evidently the first character for months 1 to 9 but presumably the first two characters for months 10 to 12.

        For the record I note that I failed to get daily()or date() to understand strings like 2291980 without subterfuge that I didn't want to explain here. I may be missing something. Getting the month and year out of your string dates by string manipulation seems a bit simpler.
        Last edited by Nick Cox; 22 Jul 2019, 07:43.

        Comment


        • #5
          To understand Stata's date handling, you have to read the documentation on Stata's date handling.
          Adding further details to Nick's advice, Stata's "date and time" variables are complicated and there is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

          All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

          Comment


          • #6
            Note that the assertion in #1 that year() as used there worked fine is implausible. For example, even with the smallest date on offer


            Code:
            . di year(2291980) - 60
            8175
            a year of 8175 is returned, which doesn't look fine to me.


            Comment

            Working...
            X