Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data format reshape

    Dear Statalists,

    I have one simple question. At the moment, I am dealing with the data which is relatively cross-sectional. Please check the data as shown below:
    Director Name Company ID Director_StartYear Director_EndYear
    John 7 01/07/1996 01/07/2002
    Mary 3 01/07/1999 01/07/2003
    where Company ID is the unique identifier of the company, Director_StartYear is the beginning year of being the director of the company, and Director_EndYear is the ending year of being the director of the company.


    Ideally, I would like to reshape it to the data format (to panel data) as:
    Director Company ID Director_Year
    John 7 1996
    John 7 1997
    John 7 1998
    John 7 1999
    John 7 2000
    John 7 2001
    John 7 2002
    Mary 3 1999
    Mary 3 2000
    Mary 3 2001
    Mary 3 2002
    Mary 3 2003
    Could you please kindly show me the possible stata code for doing this data transmission? Will "reshape" be a possible code?

    Many thanks.

    Best wishes,
    Cong

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4 Director_Name byte Company_ID str10(Director_StartYear Director_EndYear)
    "John" 7 "01/07/1996" "01/07/2002"
    "Mary" 3 "01/07/1999" "01/07/2003"
    end
    
    rename *Year date#, addnumber
    
    reshape long date, i(Director_Name Company_ID)
    
    gen Director_Year = yofd(date(date,"DMY"))

    Comment


    • #3
      Dates stored as strings cannot be used for calculations. So the first step is to convert them to Stata internal format numerical date variables. Also, it is a poor programming practice to give variables misleading names. The variables you call director_startyear and director_endyear are not, in fact, years. They are full calendar dates. Since you ultimately need to create actual year variables from them, the potential for confusing which variable is which is high.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str5 directorname byte companyid str11 director_startyear str10 director_endyear
      "John " 7 "01/07/1996 " "01/07/2002"
      "Mary " 3 "01/07/1999 " "01/07/2003"
      end
      
      //  CONVERT DATES TO STATA INTERNAL FORMAT DATE VARIABLES
      //  AND EXTRACT THE YEAR
      rename *year *date
      foreach v of varlist *date {
          gen _`v' = daily(`v', "DMY"), after(`v')
          assert missing(_`v') == missing(`v')
          format _`v' %tc
          drop `v'
          rename _`v' `v'
          local yvar: subinstr local v "date" "year"
          gen int `yvar' = year(`v')
      }
      
      assert director_enddate >= director_startdate
      expand director_endyear - director_startyear + 1
      by directorname companyid, sort: gen director_year = director_startyear + _n - 1
      Note: I cannot tell from the examples given when the dates shown are MDY or DMY. The code assumes the latter. Revise accordingly if needed.

      In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Added: Crossed with #2 which offers a different approach. In a large data set, #2 will be appreciably slower. It also does not answer the question posed as I understand it, because it does not create additional observations with the years between the start and end years.



      Last edited by Clyde Schechter; 21 Jul 2022, 14:13.

      Comment


      • #4
        Originally posted by Øyvind Snilsberg View Post
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str4 Director_Name byte Company_ID str10(Director_StartYear Director_EndYear)
        "John" 7 "01/07/1996" "01/07/2002"
        "Mary" 3 "01/07/1999" "01/07/2003"
        end
        
        rename *Year date#, addnumber
        
        reshape long date, i(Director_Name Company_ID)
        
        gen Director_Year = yofd(date(date,"DMY"))
        Thanks Snilsberg. Much appreciated.

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          Dates stored as strings cannot be used for calculations. So the first step is to convert them to Stata internal format numerical date variables. Also, it is a poor programming practice to give variables misleading names. The variables you call director_startyear and director_endyear are not, in fact, years. They are full calendar dates. Since you ultimately need to create actual year variables from them, the potential for confusing which variable is which is high.
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str5 directorname byte companyid str11 director_startyear str10 director_endyear
          "John " 7 "01/07/1996 " "01/07/2002"
          "Mary " 3 "01/07/1999 " "01/07/2003"
          end
          
          // CONVERT DATES TO STATA INTERNAL FORMAT DATE VARIABLES
          // AND EXTRACT THE YEAR
          rename *year *date
          foreach v of varlist *date {
          gen _`v' = daily(`v', "DMY"), after(`v')
          assert missing(_`v') == missing(`v')
          format _`v' %tc
          drop `v'
          rename _`v' `v'
          local yvar: subinstr local v "date" "year"
          gen int `yvar' = year(`v')
          }
          
          assert director_enddate >= director_startdate
          expand director_endyear - director_startyear + 1
          by directorname companyid, sort: gen director_year = director_startyear + _n - 1
          Note: I cannot tell from the examples given when the dates shown are MDY or DMY. The code assumes the latter. Revise accordingly if needed.

          In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

          Added: Crossed with #2 which offers a different approach. In a large data set, #2 will be appreciably slower. It also does not answer the question posed as I understand it, because it does not create additional observations with the years between the start and end years.


          Many thanks Clyde. It is exactly what I want. Appreciated.

          Comment

          Working...
          X