  • Might be simple, but it is not! Data in plain .txt format: can we get it into Stata?

    Hi,
    Professor French's website offers free data containing a date variable and several risk factors (5 variables in all).
    The problem is that the data are in text format rather than a spreadsheet, and I want to use them in Stata in order to merge them with other Stata files.
    Here is what I tried:
    1. I copied and pasted them into Excel so that I could later -insheet- them into Stata and convert them to .dta.
    2. I also tried copying and pasting them directly into Stata.

    Neither attempt worked, because the way the data are laid out puts all the variables in ONE COLUMN in Excel or Stata. Excel and Stata both treat the 5 variables as if they were one variable!

    Can anyone suggest how to deal with this so the data are displayed as they should be: in 5 columns, one per variable?
    I have attached the dataset in plain .txt format, exactly as it appears on the website. Again, the data are free from Professor French's website http://mba.tuck.dartmouth.edu/pages/...a_library.html
    (You can find them at that link under Fama/French factors.)

    Thanks for your help in advance.

  • #2
    Ahmed,

    There are (at least) two possible ways to do this. Both assume that you first delete the introductory text at the top of the file and the Annual Factors at the end.
    1. Use -insheet- with a space as the delimiter. However, every run of multiple consecutive spaces must first be replaced by a single space (this can be done with search-and-replace in a text editor).
    2. Use -infix-. This also requires deleting the row with the variable names, but does not require the space cleanup described above.
    Code:
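    // Option 1: space-delimited (after collapsing repeated spaces to one)
    insheet using "F-F_Research_Data_Factors.txt", delimiter(" ") clear
    // Option 2: fixed-width; each lo-hi pair gives the character positions of a variable
    infix Date 1-6 MktRF 8-14 SMB 15-22 HML 24-30 RF 32-38 using "F-F_Research_Data_Factors.txt", clear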
    If you want the Annual Factors as a dataset, you will have to copy them into a separate text file and import that file separately.
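    A possible alternative to both options, sketched here as an untested assumption rather than something from this thread: -infile- in free format treats any run of blanks as a single separator, so the space cleanup of Option 1 should not be needed. It still assumes the introductory text, the variable-name row and the Annual Factors have been removed. The file name ff_sample.txt and the values written to it are made up for illustration.

    ```stata
    * Write a tiny hypothetical file laid out like the monthly factors
    * (made-up values, irregular runs of spaces between columns)
    tempname fh
    file open `fh' using "ff_sample.txt", write text replace
    file write `fh' "192607    1.11   -2.22   -3.33    0.44" _n
    file write `fh' "192608   2.55  -1.66    4.77   0.88" _n
    file close `fh'

    * Free-format read: whitespace of any width separates the five values
    infile Date MktRF SMB HML RF using "ff_sample.txt", clear
    list
    ```

    On the real file you would skip the -file write- lines and point -infile- at the trimmed F-F_Research_Data_Factors.txt instead.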

    Regards,
    Joe


    • #3
      Joe,
      A big Like for your second code; only the second one works properly. That is awesome!
      I had another dataset, also free from French's website (attached here), with only two variables: date and Momentum. I edited your code as follows:

      infix Date 1-6 MOM 8-14 using "monthly_momentum.txt", clear

      It worked properly!

      I just don't know what the numbers 1-6, 8-14, 15-22 mean. I interpreted 1-6 as the number of digits for the first variable, but if so, why is it 8-14 for the second variable, and so on?

      Thanks again!

      Best Regards
      Ahmed


      • #4
        I also tried to create yr and mth variables from the Date variable you defined, but it didn't work. Do you know how to fix the date format? Here is the code I used:

        format Date %d
        gen yr=year(Date)
        gen mth=month(Date)

        It did not work: it produced wrong years and months, and even the Date display became incorrect.
        Any advice?

        Thanks


        • #5
          Ahmed,

          The numbers 1-6, 8-14, ... are column positions: Date occupies character positions 1 through 6 of each line, MktRF occupies positions 8 through 14, and so on.
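          To see the column positions in action, here is a minimal sketch that writes a two-line fixed-width file in the momentum layout and reads it back with the same -infix- specification Ahmed used. The file name mom_sample.txt and the values are hypothetical; the ruler comment shows which characters each range picks up.

          ```stata
          * Hypothetical fixed-width file in the momentum layout (made-up values).
          * Ruler:  12345678901234
          *         192701   -1.25   <- Date in columns 1-6, blank in 7, MOM in 8-14
          tempname fh
          file open `fh' using "mom_sample.txt", write text replace
          file write `fh' "192701   -1.25" _n
          file write `fh' "192702    0.84" _n
          file close `fh'

          * Each lo-hi pair is an inclusive range of character positions
          infix Date 1-6 MOM 8-14 using "mom_sample.txt", clear
          list
          ```

          The ranges are positions on the line, not digit counts, which is why the second variable starts at 8: column 7 is the blank separating the two fields.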

          If you are going to convert your date variable to a Stata date variable or create variables for month and year, I would recommend importing the date variable as a string:

          Code:
          infix str Date 1-6 MktRF 8-14 SMB 15-22 HML 24-30 RF 32-38  using "F-F_Research_Data_Factors.txt", clear
          Since your dates are just year and month, you have to convert them to Stata monthly date format first:

          Code:
          gen mDate=monthly(substr(Date,1,4)+" "+substr(Date,5,2),"YM")   // space in between year and month required to use monthly()
          format mDate %tm
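          Ahmed's year(Date) attempt fails because year() and month() expect a daily date (days since 01jan1960), so a number like 192607 is read as a day more than five centuries after 1960. Below is a minimal sketch of the conversion above on two hypothetical date strings, extended (as an assumption, not part of Joe's post) to pull numeric year and month out of the monthly date with dofm().

          ```stata
          clear
          * Hypothetical string dates in the yyyymm form read by -infix str Date 1-6-
          input str6 Date
          "192607"
          "192612"
          end

          * Convert "yyyymm" to a Stata monthly date (months since 1960m1)
          gen mDate = monthly(substr(Date,1,4) + " " + substr(Date,5,2), "YM")
          format mDate %tm

          * Numeric year and month: dofm() turns the monthly date into a daily
          * date, which is what year() and month() expect
          gen yr  = year(dofm(mDate))
          gen mth = month(dofm(mDate))
          list
          ```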
          However, if your ultimate goal is just to get the year and month as separate variables, you can skip the conversion to monthly dates and just do:

          Code:
          gen yr=real(substr(Date,1,4))    // real() makes yr numeric; drop it to keep a string
          gen mth=real(substr(Date,5,2))
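          Another route, not suggested in the thread and offered only as a sketch: because -infix- reads by character position, the yyyymm field can be split into numeric year and month at import time. The file name mom_sample2.txt and its values are hypothetical, and the layout assumed is the momentum file's (Date in 1-6, MOM in 8-14).

          ```stata
          * Hypothetical fixed-width file in the momentum layout (made-up values)
          tempname fh
          file open `fh' using "mom_sample2.txt", write text replace
          file write `fh' "192701   -1.25" _n
          file write `fh' "192702    0.84" _n
          file close `fh'

          * Read the yyyymm field as two numeric fields: year in 1-4, month in 5-6
          infix yr 1-4 mth 5-6 MOM 8-14 using "mom_sample2.txt", clear

          * Optionally rebuild a monthly date from the two parts
          gen mDate = ym(yr, mth)
          format mDate %tm
          list
          ```

          With %tm applied, mDate displays in the 1927m1 style, which is convenient if the files you merge with use monthly dates.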
          Regards,
          Joe
          Last edited by Joe Canner; 12 May 2014, 14:03. Reason: Mistake in use of monthly() and substr()


          • #6
            What version of Stata are you using? I ask because "%d" is not a valid format in the current or recent versions; see -h datetime-.


            • #7
              I use version 11.


              • #8
                Ahmed, did you figure out the date issue? If not, I can give you my FF data for the factors. I first converted the text file into an Excel file and then imported it into Stata. For the date I played around and can't remember exactly what I did, but I have the date in year-month form (e.g. 1925m12). I think that is what you are looking for, right?


                • #9
                  I don't have access to the version 11 manuals or help files, which is why the FAQ asks you to state your version up front. Did you try -h datetime-?


                  • #10
                    Thomas,
                    Use Joe's code; it is great! I think I have now managed to get them all into .dta files, with dates!
                    Thanks