Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Stata stuck running my regressions

    Hello all,

    I'm pretty new to stata and am currently writing my Bachelor Thesis, but I am already running into some issues early on.

    I have collected all of my variables from Eikon and am now looking to merge them together so that I can start running my regressions. All of my variables except for one are in the same order (panel data) but one of my variables came back in a wide form. I'm currently trying to reshape this into a long form but after troubleshooting for a little while, I ran into my final error that I cannot seem to get fixed. The code and output is as follows:

    . reshape long FY, i(ISIN) j(quarter, string)
    (note: j = 2005Q1 2005Q2 2005Q3 2005Q4 2006Q1 2006Q2 2006Q3 2006Q4 2007Q1 2007Q2 2007Q3 2007Q4 2008Q1 2008Q2 2008Q3 2008Q4 2009Q1 2009Q2 2009Q3 2009Q4 2010Q1 2010Q2 2010Q3 2010
    > Q4 2011Q1 2011Q2 2011Q3 2011Q4 2012Q1 2012Q2 2012Q3 2012Q4 2013Q1 2013Q2 2013Q3 2013Q4 2014Q1 2014Q2 2014Q3 2014Q4 2015Q1 2015Q2 2015Q3 2015Q4 2016Q1 2016Q2 2016Q3 2016Q4 201
    > 7Q1 2017Q2 2017Q3 2017Q4 2018Q1 2018Q2 2018Q3 2018Q4 2019Q1 2019Q2 2019Q3 2019Q4 2020Q1 2020Q2 2020Q3 2020Q4)
    variable id does not uniquely identify the observations
    Your data are currently wide. You are performing a reshape long. You specified i(ISIN) and j(quarter). In the current wide form, variable ISIN should uniquely identify
    the observations. Remember this picture:

    long wide
    +---------------+ +------------------+
    | i j a b | | i a1 a2 b1 b2 |
    |---------------| <--- reshape ---> |------------------|
    | 1 1 1 2 | | 1 1 3 2 4 |
    | 1 2 3 4 | | 2 5 7 6 8 |
    | 2 1 5 6 | +------------------+
    | 2 2 7 8 |
    +---------------+
    Type reshape error for a list of the problem observations.
    Its probably a minor mistake but I just can't seem to figure it out. I looked at my ISIN values and they look normal.

    Thanks for your time,
    Jesse

  • #2
    There is a command that checks whether your variable is an id, -[D] isid -- Check for unique identifiers-.

    Otherwise without see your data we cannot help much. Stata tells you where is the error, you have to figure out why.

    Comment


    • #3
      Adding to Joro's advice, Stata's error message from the reshape long command told you
      Code:
          Type reshape error for a list of the problem observations.
      After running the reshape long command and being given that advice, did you then run the reshape error command? If so, what did it tell you?

      Comment


      • #4
        Originally posted by William Lisowski View Post
        Adding to Joro's advice, Stata's error message from the reshape long command told you
        Code:
        Type reshape error for a list of the problem observations.
        After running the reshape long command and being given that advice, did you then run the reshape error command? If so, what did it tell you?
        I tried running the command but I did not understand the output. It was a very long list of nothing. I added a snippit of the output on the right. I'm not sure what I am doing wrong. I want to reshape this particular variable so that I can later merge all of my variables using 'merge m:1 using ''mvreshape.dta'' ' and then use that to create my paneldata set using the xtset .. .. command (still figuring this one out).

        I'm currently just stuck at the reshaping part still. As for the request for more data I read that users don't like downloading files (understandably so). What would be the best way for me to provide additional data about my request? Screenshots of my data editor and dofile editor?

        Thanks in advance,
        Jessed0841fd3ebe8406d68a0d3aea3239ab2.png

        Comment


        • #5
          It was a very long list of nothing.
          No, apparently it was a very long list of missing values, which constitute the values of ISIN in your dataset that are not unique.

          Why do you have observations where ISIN is missing?

          I suspect you will find that that you have many rows at the bottom of your data where all the data is missing, representing problems importing the data from "Eikon" (whatever that is).

          If your data was originally in an Excel worksheet and you used import excel to read it into Stata, then the worksheet was poorly constructed and led Stata to believe there were more rows in your data than there actually were, and Stata added empty observations to the bottom of your imported data, and you have not done the necessary review to ensure that the data was imported correctly. If that's the case, you can tell import excel the range of rows and columns from which it should get its data.

          In any event, use Stata's Data Editor and review those observations with missing ISIN values, and either correct or delete them. Then try the isid command that Joro recommended in post #2 to see if there are any remaining issues.

          What would be the best way for me to provide additional data about my request?
          Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ. The FAQ also explains that screenshots are not as useful as you might thing they are.

          Even the best descriptions of data are no substitute for an actual example of the data. Be sure to use the dataex command to do this. If you are running version 15.1 or later, or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

          The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

          Comment


          • #6
            Thanks for the explanation William, that did end up being the issue. After removing the many ''.'' I was able to reshape my variable into a long form.

            I have since read the FAQ listings you provided, thanks for those.

            I am using stata 16.
            Some information about my thesis:


            Research question: To what extent did the great recession change the capital structure of the Nasdaq 100 listed companies.
            Sample size contains all of the companies listed on the nasdaq between 2005 and 2020. The total sample size differs per variable as not all information was as easily accessible. I do have all of my current variables in panel data form and ready to merge though. The final result I am looking for is to regres (method to be determined by Hausman test) for my dependent variables market debt ratio and book debt ratio controlling for several variables and industry / country effects.


            I tried running the isid command but it required a varlist. My code right now is as follows
            Code:
             cd "c:\users\Jesse\documents\thesis"
            use "mvreshape", clear
            set more off, permanently
            
            encode ISIN, gen(ISIN1)
            
            reshape long FY, i(ISIN1) j(Quarter, string)
            rename FY market_value
            
            import
            merge m:1 market_value using "mvreshape.dta"
            I'm trying to figure out how to import multiple excel sheets at once, but the guide I have found online is rather vague.

            What I am trying to achieve is to merge all of my variables (currently listed in separate sheets in the same excel file). I have the sample with ISIN codes on sheet 1 followed by the variables; total_assets, total_long_term_debt, return_on_assets, property_plant_and_equipment, research_and_development, price_to_book_value. I want to merge all of these variables into 1 sheet so I can use them to calculate my new variables using gen combined with some math. Would this be the proper command to execute this: merge m:1 hid using ‘’mvreshape.dta’' ?

            After merging and calculating my new variables, I want to declare xtset id year and then xtdescribe followed by xttest0.

            Once this is completed I think my next step would be looking into conducting a Hausman test to determine which regressions I should run and then try to run them. After this is done I want to make a dummy for the timeframe of the great recession and then test whether the great recession has affect capital structure significantly during the recession using the dummy and the normal time frame. It would be robust because I test it with both Market Debt Ratio and Book Debt Ratio.

            I apologize for the amount of questions but I am very stuck right now. Any help is appreciated.
            Last edited by Jesse Nooijen; 21 Apr 2021, 05:33.

            Comment


            • #7
              I tried running the isid command but it required a varlist.
              The correct response is then to give the command
              Code:
              help isid
              and read the output. There you will see an example of its use
              Code:
               Check whether mpg uniquely identifies observations
                      . isid mpg
              Since Stata told you in its error message
              Code:
              In the current wide form, variable ISIN should uniquely identify the observations.
              this suggests that after correcting the problems identified by reshape error you would then run
              Code:
              isid ISIN
              to confirm that there are no remaining problems.

              You are too ambitious by half in trying to simultaneously import your Excel worksheets and merge them to your data, given your basic level of expertise with Stata. I suggest you first set about importing each of your worksheets and saving it as a Stata dataset, making sure import only the rows and columns within which your data lies. Once that is all in order, then you can try repeatedly merging the datasets you have created.

              I recommend you work with ISIN as a string variable and defer creating the encoded version until after all your datasets are merged. You cannot be sure that the encoded values will be the same when you encode a string in separate datasets.

              Comment


              • #8
                Originally posted by William Lisowski View Post
                You are too ambitious by half in trying to simultaneously import your Excel worksheets and merge them to your data, given your basic level of expertise with Stata. I suggest you first set about importing each of your worksheets and saving it as a Stata dataset, making sure import only the rows and columns within which your data lies. Once that is all in order, then you can try repeatedly merging the datasets you have created.

                I recommend you work with ISIN as a string variable and defer creating the encoded version until after all your datasets are merged. You cannot be sure that the encoded values will be the same when you encode a string in separate datasets.
                Alright. My focus now is to first import all of my work sheets, then try to merge them and go from there. After searching this forum I found the following code and modified it to suit my work, but I can't get it to work.

                Code:
                cd "c:\users\Jesse\documents\thesis" 
                set more off, permanently
                
                
                 import excel using variables.xlsx, describe
                forvalues sheet=1/`=r(N_worksheet)' {  
                local sheetname=r(worksheet_`sheet')  
                import excel using variables.xlsx, sheet("`sheetname'")  
                save "sample", replace  
                clear
                }
                
                
                isid ISIN
                encode ISIN, gen(ISIN1)
                
                reshape long FY, i(ISIN1) j(Quarter, string)
                rename FY market_value
                Gives me the output
                Code:
                 . do "C:\Users\jesse\AppData\Local\Temp\STD2998_000000.tmp"
                
                . cd "c:\users\Jesse\documents\thesis" 
                c:\users\Jesse\documents\thesis
                
                . set more off, permanently
                (set more preference recorded)
                
                . 
                . 
                .  import excel using variables.xlsx, describe
                
                                         Sheet | Range
                  -----------------------------+-----------------------------
                                        sample | A1:B246
                                  market_value | A1:BN12339
                                  total_assets | A1:E12339
                          total_long_term_debt | A1:D12339
                              return_on_assets | A1:D11792
                  property_plant_and_equipment | A1:D12339
                      research_and_development | A1:D7241
                           price_to_book_value | A1:C15429
                
                . forvalues sheet=1/`=r(N_worksheet)' {  
                  2. local sheetname=r(worksheet_`sheet')  
                  3. import excel using variables.xlsx, sheet("`sheetname'")  
                  4. save "sample'", replace  
                  5. clear
                  6. }
                (2 vars, 246 obs)
                (note: file sample'.dta not found)
                file sample'.dta saved
                worksheet . not found
                r(601);
                
                end of do-file
                I don't understand how I am supposed to load the several sheets into my database. I originally tried importing each sheet seperately using the
                Code:
                import excel ''variables.xlsx'', /*
                */ sheet(sample) cellrange(a1:b246)firstrow 
                
                import excel ''variables.xlsx'', /*
                */ sheet(total_assets) cellrange(a1:d12339)firstrow
                Which gave me the error:
                Code:
                 no dataset in memory has changed since last saved

                Comment


                • #9
                  Okay, I have since the post before managed to load all of my excel sheets and save them as their own .dta file.

                  Code:
                  cd "c:\users\Jesse\documents\thesis" 
                  set more off, permanently
                  clear
                  
                  import excel "variables.xlsx", /*
                  */sheet(sample) cellrange(a1:b246)firstrow
                  save sample, replace
                  clear
                  
                  import excel "variables.xlsx", /*
                  */ sheet(total_assets) cellrange(a1:d12339)firstrow
                  save total_assets, replace
                  clear 
                  
                  import excel "variables.xlsx", /*
                  */ sheet(market_value) cellrange(a1:bn100)firstrow
                  reshape long FY, i(ISIN) j(Quarter, string)
                  rename FY market_value
                  drop NAME
                  save market_value, replace
                  clear
                  
                  import excel "variables.xlsx", /*
                  */ sheet(total_long_term_debt) cellrange(a1:d12339)firstrow
                  save total_long_term_debt, replace
                  clear 
                  
                  import excel "variables.xlsx", /*
                  */ sheet(return_on_assets) cellrange(a1:d11792)firstrow
                  save return_on_assets, replace
                  clear 
                  
                  import excel "variables.xlsx", /*
                  */ sheet(property_plant_and_equipment) cellrange(a1:d10776)firstrow
                  save property_plant_and_equipment, replace
                  clear 
                  
                  import excel "variables.xlsx", /*
                  */ sheet(research_and_development) cellrange(a1:d7241)firstrow
                  save research_and_development, replace
                  clear 
                  
                  import excel "variables.xlsx", /*
                  */ sheet(price_to_book_value) cellrange(a1:15429)firstrow
                  save price_to_book_value, replace
                  clear
                  Can I now merge all of these separate sheets together using
                  Code:
                  merge m:1 sample using " all of my .dta files"
                  ?

                  An example of my excel sheets is as follows:
                  The main excel sheet consists of the sample with the company names and ISIN codes. I assume this will be the master data set I will be merging the other variables into.

                  Sample sheet:
                  cfdbebc9e6b94b0d6c7cce4c66555573.png

                  Example of variable I want to merge into the sample sheet (all my variables will be merged into 1 database preferably):
                  c9d8d1e1ae913a5ae626fd75cdc338ba.pngbde36218ccd49ff69b9d058503b8825b.png


                  Another question here is: Should I remove the general list of ISIN codes that is also present on each separate sheet? Because this is already present in the sample sheet.

                  Comment

                  Working...
                  X