  • Setting up Data

    Hello everyone,
    I am currently having trouble setting up my data.
    At the moment the format is like this:
    Company, Variable (containing ESG, Marketcap, for instance), Currency, and then one column per daily date: 21 jun2001, 22 jun2001, and so on.

    However, I would like to have it like this:
    Date Company ESG Marketcap Currency
    21 jun2001 Adecco 60 20,000 $
    22 jun 2002

    Note that nothing is named as a variable yet either, as Stata is not able to handle this number of variables: I am using 20 years of daily data.

    I also have the dataset in a different format, like this:

    Variable   Market Value          Governance Pillar Score   Social Pillar Score
    Company    ZIJIN MINING GROUPA   ZIJIN MINING GROUPA       ZIJIN MINING GROUPA
    Currency   CH                    CH                        CH
    21-06
    22-06
    etc.

    Thanks in advance!


  • #2
    Note that nothing is named as variables either as stata is not able to handle this amount of variables as I am using 20 years of daily data.
    ^ Is this your primary problem? Which version of Stata do you use? 365 * 20 is only about 7,300 variables, and if you use SE or above it should be fine. If it still does not import, then you may have to resolve that first, either by trimming it down in other software or, I guess, by importing into Stata with a range selection.
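
    As a rough sketch of both options (the file name, the cell range, and the value 8000 are placeholders I made up, not taken from your data; -set maxvar- is only available in Stata/SE and MP):
    Code:
    * raise the variable limit before loading a wide file (SE/MP only)
    clear
    set maxvar 8000
    * or import only part of the worksheet; "mydata.xlsx" and A1:Z50 are hypothetical
    import excel using "mydata.xlsx", cellrange(A1:Z50) clear
    describe, short
    The -describe, short- line just reports how many variables and observations arrived, which is a quick check that the import picked up the range you expected.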

    Without the variable names, it may be too early to talk about reshaping. However, generally, your task can possibly be done with -reshape long-.

    When you have a chance, please also read the FAQ (http://www.statalist.org/forums/help) on how to use -dataex- to post a data example, so that we can test the code we suggest.



    • #3
      Well, this is the first problem. I think it has something to do with the fact that Stata cannot name a variable 22-01-2003, for instance.
      And you were indeed right: I can change the maximum-variables setting to 8000, so that is not the problem.
      I do know some wide-to-long commands, so I might be able to fix that.



      • #4
        With sample data of the sort I described in your previous topic posted in the Mata forum at

        https://www.statalist.org/forums/for...etting-up-data

        we could perhaps demonstrate how to deal with the problem with the variable names, or perhaps if you search Statalist you can find some of the previous discussions on this problem -- your problem is not by any means something previously unheard of. But even the best descriptions of data are no substitute for an actual example of the data. There are many ways your data might be organized that are consistent with your description, and each would require a somewhat different approach. In order to get a helpful response, you need to show some example data.

        Import your data and allow the variable names to default. Then use the dataex command to display a (very small) example of each dataset. For example, on the first dataset, limit it to the Company, Variable, Currency, and just a few dates (that is, for example, var1-var10 in the Stata dataset). Then present the dataex output in a reply on this topic.



        • #5
          [Attached screenshot: Thesis.PNG]

          Like this



          • #6
            In post 2 it was suggested that you read the FAQ for advice on using dataex, and in that FAQ you will also find advice about why screen shots are not as useful as you might think. Hint: Stata has the import excel command to turn an Excel worksheet into a Stata dataset, but it has yet to develop an import screenshot command to do the same for a screenshot.

            I invented some data similar to yours and post it below in a usable form, following the instructions in the output of help dataex. Having spent my time at this, I now have other work to attend to. If nobody else uses this to suggest code for you, perhaps I'll have a chance to return to this later.
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str30 C str34 D str18(G H)
            "Company"        "Variable"                 " 6/21/2001" " 6/22/2001"
            "AMC Pacer "     "Environment Pillar Score" ""           ""          
            "AMC Spirit "    "Environment Pillar Score" ""           ""          
            "Buick Century " "Environment Pillar Score" ""           ""          
            "Buick Electra " "Environment Pillar Score" ""           ""          
            "Buick LeSabre " "Environment Pillar Score" ""           ""          
            "AMC Pacer "     "Some Other Thing"         ""           ""          
            "AMC Spirit "    "Some Other Thing"         ""           ""          
            "Buick Century " "Some Other Thing"         ""           ""          
            "Buick Electra " "Some Other Thing"         ""           ""          
            "Buick LeSabre " "Some Other Thing"         ""           ""          
            end
            One hint for anyone tackling this: turning the contents of D into Stata variable names will be a bit tricky, because we see that D can be up to 34 characters long, and Stata variable names are limited to 32 characters, and to use in reshaping, to 31 characters so it can be preceded by a single-character "stub".
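
            A quick way to see which labels would be affected, as a sketch that assumes the labels sit in the string variable D as in the example above (v and len are just illustrative names):
            Code:
            * candidate variable names built from the labels in D
            generate v = strtoname(D)
            * flag any candidate too long to follow a one-character stub
            generate len = strlen(v)
            list D v len if len > 31
            Anything listed would need to be shortened by hand before reshaping.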

            Let me add: On your earlier topic linked to from post #4 you wrote

            As I am doing master thesis now. I have followed several courses with stata. However, most of the time we had already received the database, or just make some small corrections like changing the date format from mdy to year or somethings. But I am familiar with the regressions and the event study methodology which I am going to perform.
            What you are setting about doing now is programming in Stata, which is more complicated than running a statistical analysis in Stata. The comparison is like cooking instead of reheating a frozen pizza - or perhaps instead of ordering from a menu. The reading I recommended in the earlier topic will prepare you for cooking with Stata.



            • #7
              Small update: the first problem is solved by using this:

              import excel "..." firstrow clear
              foreach var of varlist _all {
                  local label : variable label `var'
                  local new_name = lower(strtoname("`label'"))
                  rename `var' `new_name'
              }

              Now the variables have names such as _6_21_2001.
              Next I tried the following:
              reshape long _ , i(cid vid) j(date)
              However, I still receive an error: "Variable date contains all missing values".
              Shouldn't date take its values from _6_21_2001 etc.?



              • #8
                Perhaps this sample code will help.
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input str30 C str34 D str18(G H)
                "Company"        "Variable"                 " 6/21/2001" " 6/22/2001"
                "AMC Pacer "     "Environment Pillar Score" ""           ""          
                "AMC Spirit "    "Environment Pillar Score" ""           ""          
                "Buick Century " "Environment Pillar Score" ""           ""          
                "Buick Electra " "Environment Pillar Score" ""           ""          
                "Buick LeSabre " "Environment Pillar Score" ""           ""          
                "AMC Pacer "     "Some Other Thing"         "-7.0027"    "1.0086"    
                "AMC Spirit "    "Some Other Thing"         "-0.9193"    "-4.6556"   
                "Buick Century " "Some Other Thing"         "-4.4881"    "-3.0125"   
                "Buick Electra " "Some Other Thing"         "-8.6703"    "6.3614"    
                "Buick LeSabre " "Some Other Thing"         "-2.0454"    "7.6733"    
                "AMC Pacer "     "Another Thing"            "A"          "B"         
                "AMC Spirit "    "Another Thing"            "C"          "D"         
                "Buick Century " "Another Thing"            "E"          "F"         
                "Buick Electra " "Another Thing"            "G"          "H"         
                "Buick LeSabre " "Another Thing"            "I"          "J"         
                end
                
                // prepare the example data
                foreach var of varlist _all {
                    replace `var' = trim(`var')
                    local new_name = lower(strtoname(`var'[1]))
                    rename `var' `new_name'
                }
                drop in 1
                
                // first reshape long so there is one observation for each company, date, and variable
                reshape long _, i(company variable) j(sdate) string
                generate date = daily(sdate,"MDY")
                format %td date
                drop sdate
                
                // now reshape wide so there is one observation for each company and date, with all the variables
                generate v = subinstr(strtoname(variable),"_","",.)
                // confirm that different values of variable didn't get the same value of v
                bysort v (variable): assert variable[1]==variable[_N]
                // be sure none is longer than 31 characters
                assert length(v)<32
                drop variable
                reshape wide _, i(company date) j(v) string
                destring _*, replace
                rename (_*) (*)
                Code:
                . list, noobs sepby(company) abbreviate(32)
                
                  +------------------------------------------------------------------------------------+
                  |       company        date   AnotherThing   EnvironmentPillarScore   SomeOtherThing |
                  |------------------------------------------------------------------------------------|
                  |     AMC Pacer   21jun2001              A                        .          -7.0027 |
                  |     AMC Pacer   22jun2001              B                        .           1.0086 |
                  |------------------------------------------------------------------------------------|
                  |    AMC Spirit   21jun2001              C                        .           -.9193 |
                  |    AMC Spirit   22jun2001              D                        .          -4.6556 |
                  |------------------------------------------------------------------------------------|
                  | Buick Century   21jun2001              E                        .          -4.4881 |
                  | Buick Century   22jun2001              F                        .          -3.0125 |
                  |------------------------------------------------------------------------------------|
                  | Buick Electra   21jun2001              G                        .          -8.6703 |
                  | Buick Electra   22jun2001              H                        .           6.3614 |
                  |------------------------------------------------------------------------------------|
                  | Buick LeSabre   21jun2001              I                        .          -2.0454 |
                  | Buick LeSabre   22jun2001              J                        .           7.6733 |
                  +------------------------------------------------------------------------------------+



                • #9
                  Thanks a lot for the help. Got it!
