  • #16
    Sorry, I meant -isid firm date-, not -isid firm year-.

    As for the missing values, this sounds like a problem. The code you show belongs inside a loop, but you do not show the loop that contains it--and that might be the source of the problem. Please show the entire loop.

    • #17
      Code:
      . local myfilelist : dir . files "*.csv"
      
      .
      . foreach file of local myfilelist {
        2. drop _all
        3. insheet using "`file'"
        4. local building = subinstr("`file'",".csv","",.)
        5. save "`building'", replace
        6. }
      (7 vars, 757 obs)
      file a anhui gujing.dta saved
      (7 vars, 731 obs)
      file a bengang.dta saved
      (7 vars, 757 obs)
      file a changchai.dta saved
      (7 vars, 725 obs)
      file a china bicycle.dta saved
      (7 vars, 744 obs)
      file a china fangda.dta saved
      (7 vars, 646 obs)
      file a china national accord.dta saved
      (7 vars, 703 obs)
      file a chongqing changan.dta saved
      (7 vars, 731 obs)
      file a csg.dta saved
      (7 vars, 737 obs)
      file a dalian.dta saved
      (7 vars, 658 obs)
      file a dongxu jan15.dta saved
      (7 vars, 750 obs)
      file a fawer.dta saved
      (7 vars, 750 obs)
      file a fiyta.dta saved
      (7 vars, 744 obs)
      file a foshan electrical.dta saved
      (7 vars, 727 obs)
      file a guangdong electrical.dta saved
      (7 vars, 679 obs)
      file a guangdong provincial.dta saved
      (7 vars, 631 obs)
      file a hainan dadonghai.dta saved
      (7 vars, 630 obs)
      file a hainan pearl.dta saved
      (7 vars, 739 obs)
      file a hefei meiling.dta saved
      (7 vars, 464 obs)
      file a hubei sanonda.dta saved
      (7 vars, 757 obs)
      file a jiangling.dta saved
      (7 vars, 672 obs)
      file a konka.dta saved
      (7 vars, 755 obs)
      file a lu thai.dta saved
      (7 vars, 748 obs)
      file a shandong chenming.dta saved
      (7 vars, 668 obs)
      file a shenbao toaug17.dta saved
      (7 vars, 607 obs)
      file a shenzhen chiwan wharf tonov17.dta saved
      (7 vars, 640 obs)
      file a shenzhen nanshan.dta saved
      (7 vars, 757 obs)
      file a shenzhen prop.dta saved
      (7 vars, 671 obs)
      file a shenzhen seg.dta saved
      (7 vars, 741 obs)
      file a shenzhen tellus.dta saved
      (7 vars, 667 obs)
      file a shenzhen textile.dta saved
      (7 vars, 710 obs)
      file a shenzhen wongtee tooct17.dta saved
      (7 vars, 587 obs)
      file a shenzhen zhongheng frmar15.dta saved
      (7 vars, 741 obs)
      file a sino great wall.dta saved
      (7 vars, 757 obs)
      file a wuxi little swan.dta saved
      (7 vars, 757 obs)
      file a yantai changyu.dta saved
      (7 vars, 751 obs)
      file aboetech.dta saved
      
      . gen firm = `"`s'"'
      (751 missing values generated)
      
      .
      .     append using `building'
      
      .
      .     save `"`building'"', replace
      file aboetech.dta saved
      
      . use `building', clear
      
      .
      . isid firm date, sort
      variables firm date should never be missing
      r(459);
      
      .
      . quietly compress
      
      .
      . save 34_firms_data, replace
      file 34_firms_data.dta saved
      
      .
      Dear Clyde,

      The highlighted words are meant to draw your attention. I have also updated -isid firm year- to -isid firm date-. I forgot to mention that I have also tried to append my data, as you suggested, using Data -> Combine data sets -> Append data sets. For simplicity's sake I only tried to append two data sets, but this showed up:


      Code:
      . append using "\\ads.bris.ac.uk\filestore\MyFiles\Stud
      > entUG15\zl15509\Documents\a anhui gujing.dta" "\\ads.
      > bris.ac.uk\filestore\MyFiles\StudentUG15\zl15509\Docu
      > ments\a bengang.dta"
      variable price is float in master but str13 in using
        data
          You could specify append's force option to ignore
          this numeric/string mismatch.  The using variable
          would then be treated as if it contained numeric
          missing value.
      r(106);
      Last edited by sladmin; 09 Apr 2018, 08:54. Reason: anonymize poster

      • #18
        Dear Clyde,

        Was what I posted above the 'loop' that you requested?

        • #19
          Dear Clyde,

          Below is the loop you requested:

          Code:
          . local myfilelist : dir . files "*.csv"
          
          .
          . foreach file of local myfilelist {
            2. drop _all
            3. insheet using "`file'"
            4. local building = subinstr("`file'",".csv","",.)
            5. save "`building'", replace
            6. }
          (7 vars, 757 obs)
          file a anhui gujing.dta saved
          (7 vars, 731 obs)
          file a bengang.dta saved
          (7 vars, 757 obs)
          file a changchai.dta saved
          (7 vars, 725 obs)
          file a china bicycle.dta saved
          (7 vars, 744 obs)
          file a china fangda.dta saved
          (7 vars, 646 obs)
          file a china national accord.dta saved
          (7 vars, 703 obs)
          file a chongqing changan.dta saved
          (7 vars, 731 obs)
          file a csg.dta saved
          (7 vars, 737 obs)
          file a dalian.dta saved
          (7 vars, 658 obs)
          file a dongxu jan15.dta saved
          (7 vars, 750 obs)
          file a fawer.dta saved
          (7 vars, 750 obs)
          file a fiyta.dta saved
          (7 vars, 744 obs)
          file a foshan electrical.dta saved
          (7 vars, 727 obs)
          file a guangdong electrical.dta saved
          (7 vars, 679 obs)
          file a guangdong provincial.dta saved
          (7 vars, 631 obs)
          file a hainan dadonghai.dta saved
          (7 vars, 630 obs)
          file a hainan pearl.dta saved
          (7 vars, 739 obs)
          file a hefei meiling.dta saved
          (7 vars, 464 obs)
          file a hubei sanonda.dta saved
          (7 vars, 757 obs)
          file a jiangling.dta saved
          (7 vars, 672 obs)
          file a konka.dta saved
          (7 vars, 755 obs)
          file a lu thai.dta saved
          (7 vars, 748 obs)
          file a shandong chenming.dta saved
          (7 vars, 668 obs)
          file a shenbao toaug17.dta saved
          (7 vars, 607 obs)
          file a shenzhen chiwan wharf tonov17.dta saved
          (7 vars, 640 obs)
          file a shenzhen nanshan.dta saved
          (7 vars, 757 obs)
          file a shenzhen prop.dta saved
          (7 vars, 671 obs)
          file a shenzhen seg.dta saved
          (7 vars, 741 obs)
          file a shenzhen tellus.dta saved
          (7 vars, 667 obs)
          file a shenzhen textile.dta saved
          (7 vars, 710 obs)
          file a shenzhen wongtee tooct17.dta saved
          (7 vars, 587 obs)
          file a shenzhen zhongheng frmar15.dta saved
          (7 vars, 741 obs)
          file a sino great wall.dta saved
          (7 vars, 757 obs)
          file a wuxi little swan.dta saved
          (7 vars, 757 obs)
          file a yantai changyu.dta saved
          (7 vars, 751 obs)
          file aboetech.dta saved
          
          . gen firm = `"`s'"'
          (751 missing values generated)
          
          .
          .     append using `building'
          
          .
          .     save `"`building'"', replace
          file aboetech.dta saved
          
          . use `building', clear
          
          .
          . isid firm date, sort
          variables firm date should never be missing
          r(459);
          
          .
          . quietly compress
          
          .
          . save 34_firms_data, replace
          file 34_firms_data.dta saved
          
          .
           The highlighted words are meant to draw your attention. I have also updated -isid firm year- to -isid firm date-.

          I have tried to execute the following as well:
          Code:
          . clear
          
          . local myfilelist : dir . files "*.csv"
          
          .
          . foreach file of local myfilelist {
            2. drop _all
            3. insheet using "`file'"
            4. local stubs = subinstr("`file'",".csv","",.)
            5. save "`building'", replace
            6. }
          (7 vars, 729 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 755 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 723 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 742 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 644 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 703 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 731 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 737 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 656 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 748 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 748 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 742 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 725 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 677 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 629 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 628 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 737 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 462 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 755 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 670 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 753 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 746 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 666 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 605 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 638 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 755 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 669 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 739 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 665 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 708 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 585 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 739 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 753 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 755 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 755 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 755 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          (7 vars, 751 obs)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > a.tmp saved
          
          . tempfile building
          
          .
          . save `building', emptyok
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > b.tmp saved
          
          . foreach s of local stubs {
            2.     import delimited `s'.csv, varnames(1)
            3.     gen firm = `"`s'"'
            4.     append using `building'
            5.     save `"`building'"', replace
            6. }
          no; data in memory would be lost
          r(4);
          
          .
          When I try your code in #13, this came up:

          Code:
          . clear
          
          .
          . local files: dir "." files "*.csv"
          
          .
          . local stubs: subinstr local files ".csv" "", all
          
          .
          .
          .
          . tempfile building
          
          .
          . save `building', emptyok
          (note: dataset contains 0 observations)
          file C:\Users\zl15509\AppData\Local\Temp\ST_142c_00000
          > c.tmp saved
          
          .
          As I have mentioned before, the code you provided to another user in another thread:
          Code:
           local myfilelist : dir . files "*.csv"
          foreach file of local myfilelist {
          drop _all
          insheet using "`file'"
          local outfile = subinstr("`file'",".csv","",.)
          save "`outfile'", replace
          }
          has helped me to import all the .CSV files into Stata, and .dta copies are also created in the working directory, but I can't seem to combine that code with the one you gave me in #13 to reach a meaningful outcome. I have been working on it for a few hours now, and I feel that the more I try, the further I stray from my goal. As you suggested, I have also browsed the forum and looked for similar topics that could help me, but unfortunately I think I am the (rather unwilling) lone wolf here.

          Thank you
          Last edited by sladmin; 09 Apr 2018, 08:55. Reason: anonymize poster

          • #20
            So the problem is that you have blended two different sets of code but not reconciled the difference between them.

            You have posted two large blocks of code showing two different attempts. The problems are slightly different.

            For the first block, the situation is this. In the code I originally suggested in #13, the -gen firm = `"`s'"'- command was inside a loop that was controlled by -foreach s of local stubs-. And the code itself also created local macro stubs in the third line. Now the code that you are actually using is different and no longer creates, nor uses, local macro stubs. Instead, you are inside a loop governed by -foreach file of local myfilelist-. So when the command -gen firm = `"`s'"'- is reached, `s' has never been defined, which Stata responds to by interpreting `s' as an empty string. So firm is always set to a missing value.
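
            If you wanted to repair just that first block while keeping the -foreach file- loop, one option (a sketch only, untested) is to derive the firm name from the filename inside that loop instead of from `s':

            Code:
            local myfilelist : dir . files "*.csv"
            foreach file of local myfilelist {
                drop _all
                insheet using "`file'"
                local building = subinstr("`file'",".csv","",.)
                gen firm = "`building'"    // firm name taken from the filename, not from an undefined `s'
                save "`building'", replace
            }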

            For the second block, you do define local macro stubs, but you do it differently. I'm not sure whether what you have there is correct or not, but we never get to find out because of a later error. Your -import delimited- command is producing the error message you get there because it needs to have a -clear- option. This was my error in the code in #13, and you just copied that.
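
            For reference, the repair on that line is just the added -clear- option (sketch only, untested, same names as in your block):

            Code:
            import delimited `s'.csv, varnames(1) clear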

            You then have a third, shorter block of code that produces the note "(note: dataset contains 0 observations)". That's not a problem; don't worry about it. That data set is supposed to be empty at that point in the program.

            So let's try to fix this up and get you moving forward. You already have the 34 (or is it 36 now?) data sets imported into Stata. So no need to work on that part. To go from there to a combined file is what we need.

            Code:
            clear
            capture erase 34_firms_data.dta
            local files: dir "." files "*.dta"
            local stubs: subinstr local files ".dta" "", all
            
            tempfile building
            save `building', emptyok
            
            foreach s of local stubs {
                use `s', clear
                gen firm = `"`s'"'
                append using `building'
                save `"`building'"', replace
            }
            
            use `building', clear
            isid firm date, sort
            quietly compress
            save 34_firms_data, replace
            I think this should do the trick. Note that this code assumes that the only .dta files in the current working directory are the 34 data files we want to combine (plus possibly an earlier version of the combined file--which gets replaced by the new one).
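
            If you want to verify that assumption before running it, you could simply display the list of .dta files the code will pick up and make sure nothing unexpected is in it, e.g.:

            Code:
            local files: dir "." files "*.dta"
            display `"`files'"'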

            • #21
              Dear Clyde,

              I think I was actually facing quite a fundamental issue:


              Code:
              . local myfilelist : dir "." files "*.csv"
              
              .
              . foreach file of local myfilelist {
                2. drop _all
                3. insheet using "`file'"
                4. local outfile = subinstr("`file'",".csv","",.)
                5. save "`outfile'", replace
                6. }
              (7 vars, 723 obs)
              file a china bicycle.dta saved
              (7 vars, 742 obs)
              file a chinafangda.dta saved
              (7 vars, 644 obs)
              file a china national accord.dta saved
              (7 vars, 703 obs)
              file a chongqing changan.dta saved
              (7 vars, 731 obs)
              file a csg.dta saved
              (7 vars, 737 obs)
              file a dalian ref.dta saved
              (7 vars, 656 obs)
              file a dongxu jan15.dta saved
              (7 vars, 748 obs)
              file a fawer.dta saved
              (7 vars, 748 obs)
              file a fiyta.dta saved
              (7 vars, 742 obs)
              file a foshan electrical.dta saved
              (7 vars, 725 obs)
              file a guangdong electrical.dta saved
              (7 vars, 677 obs)
              file a guangdong provincial.dta saved
              (7 vars, 629 obs)
              file a hainan dadonghai.dta saved
              (7 vars, 628 obs)
              file a hainan pearl.dta saved
              (7 vars, 737 obs)
              file a hefei meiling.dta saved
              (7 vars, 462 obs)
              file a hubei sanonda.dta saved
              (7 vars, 755 obs)
              file a jiangling.dta saved
              (7 vars, 670 obs)
              file a konka.dta saved
              (7 vars, 753 obs)
              file a lu thai.dta saved
              (7 vars, 746 obs)
              file a shandong chenming.dta saved
              (7 vars, 666 obs)
              file a shenbao toaug17.dta saved
              (7 vars, 605 obs)
              file a shenzhen chiwan wharf tonov17.dta saved
              (7 vars, 638 obs)
              file a shenzhen nanshan.dta saved
              (7 vars, 755 obs)
              file a shenzhen prop.dta saved
              (7 vars, 669 obs)
              file a shenzhen seg.dta saved
              (7 vars, 739 obs)
              file a shenzhen tellus.dta saved
              (7 vars, 665 obs)
              file a shenzhen textile.dta saved
              (7 vars, 708 obs)
              file a shenzhen wongtee tooct17.dta saved
              (7 vars, 585 obs)
              file a shenzhen zhongheng frmar15.dta saved
              (7 vars, 739 obs)
              file a sino great wall.dta saved
              (7 vars, 753 obs)
              file a weifu.dta saved
              (7 vars, 755 obs)
              file a wuxi little swan.dta saved
              (7 vars, 755 obs)
              file a yantai changyu.dta saved
              (7 vars, 751 obs)
              file aboetech.dta saved
              (7 vars, 755 obs)
              file achangchai.dta saved
              
              . capture erase 34_firms_data.dta
              
              .
              . local files: dir "." files "*.dta"
              
              .
              . local stubs: subinstr local files ".dta" "", all
              
              . tempfile building
              
              .
              . save `building', emptyok
              file C:\Users\zl15509\AppData\Local\Temp\ST_0000000a.tmp saved
              
              .
              .
              .
              . foreach s of local stubs {
                2.     use `s', clear
                3.     gen firm = `"`s'"'
                4.     append using `building'
                5.     save `"`building'"', replace
                6. }
              invalid 'china'
              r(198);
              
              . use `building', clear
              
              .
              . isid firm date, sort
              variable firm not found
              r(111);
              
              .
              . quietly compress
              
              .
              . save 34_firms_data, replace
              (note: file 34_firms_data.dta not found)
              file 34_firms_data.dta saved
              As you can see, I have named my .CSV files using a system that works like this: (market)space(company name), so my .dta files were generated with spaces in their names. I then deleted the spaces, and the error invalid 'china' went away, but another company's name came up as the new invalid file. Can Stata not handle spaces in file names? There might also have been a miscommunication between you and me, because there actually isn't a variable called firm, as you can see below:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str12 date float(price open high low) str7 vol float change
              "Dec 29, 2017" 5.58 5.57 5.63 5.54 "1.19M"     .72
              "Dec 28, 2017" 5.54 5.56 5.62 5.52 "1.47M"    -.89
              "Dec 27, 2017" 5.59 5.65 5.66 5.58 "1.39M"   -1.06
              "Dec 26, 2017" 5.65 5.71 5.71 5.51 "2.73M"   -1.05
              "Dec 25, 2017" 5.71 5.74 5.76 5.62 "3.77M"     -.7
              "Dec 22, 2017" 5.75 5.63 5.86 5.59 "4.87M"    1.95
              "Dec 21, 2017" 5.64 5.54 5.75 5.49 "3.23M"    1.81
              "Dec 20, 2017" 5.54 5.62 5.62 5.53 "1.29M"   -1.25
              "Dec 19, 2017" 5.61 5.55 5.61 5.52 "1.44M"     .72
              "Dec 18, 2017" 5.57 5.58 5.61 5.54 "1.50M"       0
              "Dec 15, 2017" 5.57 5.55 5.57  5.5 "1.27M"     .36
              "Dec 14, 2017" 5.55 5.55 5.56 5.49 "1.40M"     .18
              "Dec 13, 2017" 5.54 5.43 5.54 5.43 "1.30M"    1.09
              "Dec 12, 2017" 5.48 5.54 5.61 5.46 "1.78M"   -1.08
              "Dec 11, 2017" 5.54 5.52 5.55 5.49 "1.29M"     .54
              "Dec 08, 2017" 5.51  5.5 5.54 5.47 "1.27M"     .73
              "Dec 07, 2017" 5.47 5.46 5.53 5.43 "2.03M"     .18
              "Dec 06, 2017" 5.46 5.39 5.48 5.34 "3.68M"       0
              "Dec 05, 2017" 5.46 5.69 5.77 5.29 "6.63M"   -4.71
              "Dec 04, 2017" 5.73 5.89 5.93 5.72 "4.30M"   -2.88
              "Dec 01, 2017"  5.9 5.85 5.91 5.84 "1.82M"     .85
              "Nov 30, 2017" 5.85  5.9  5.9 5.84 "1.88M"    -.85
              "Nov 29, 2017"  5.9 5.93 5.94 5.83 "1.56M"    -.67
              "Nov 28, 2017" 5.94 5.83 5.96 5.82 "1.63M"    1.89
              "Nov 27, 2017" 5.83 5.88 5.89 5.82 "987.84K"  -.85
              "Nov 24, 2017" 5.88 5.88 5.95 5.83 "1.64M"       0
              "Nov 23, 2017" 5.88 5.97 6.09 5.87 "2.63M"   -1.67
              "Nov 22, 2017" 5.98 5.93    6 5.86 "2.92M"     .84
              "Nov 21, 2017" 5.93 5.97 5.99  5.9 "1.92M"    -.67
              "Nov 20, 2017" 5.97 6.05 6.07 5.88 "3.43M"   -1.32
              "Nov 17, 2017" 6.05 6.29 6.32 6.04 "4.19M"   -3.82
              "Nov 16, 2017" 6.29 6.31 6.35 6.29 "1.60M"    -.47
              "Nov 15, 2017" 6.32  6.4 6.41  6.3 "2.68M"   -1.25
              "Nov 14, 2017"  6.4 6.41 6.45 6.38 "1.69M"    -.47
              "Nov 13, 2017" 6.43 6.48  6.5  6.4 "2.32M"    -.77
              "Nov 10, 2017" 6.48 6.54 6.54 6.46 "2.29M"    -.92
              "Nov 09, 2017" 6.54 6.52 6.57 6.47 "1.93M"     .31
              "Nov 08, 2017" 6.52 6.47 6.55 6.45 "2.60M"     .15
              "Nov 07, 2017" 6.51 6.49 6.52 6.33 "4.76M"       0
              "Nov 06, 2017" 6.51  6.5 6.54 6.41 "2.35M"     .31
              "Nov 03, 2017" 6.49 6.67 6.69 6.46 "3.18M"    -2.7
              "Nov 02, 2017" 6.67 6.73 6.79 6.64 "1.74M"   -1.33
              "Nov 01, 2017" 6.76 6.84 6.86 6.75 "1.91M"    -.59
              "Oct 31, 2017"  6.8 6.86    7 6.75 "2.54M"    1.64
              "Oct 30, 2017" 6.69 6.94 6.94 6.66 "3.61M"   -3.74
              "Oct 27, 2017" 6.95 7.09 7.09  6.9 "3.96M"   -2.52
              "Oct 26, 2017" 7.13 7.16 7.34 7.08 "5.62M"    -.56
              "Oct 25, 2017" 7.17 7.12 7.23 7.08 "3.01M"     .99
              "Oct 24, 2017"  7.1 7.14 7.16 7.08 "1.67M"    -.98
              "Oct 23, 2017" 7.17 7.05 7.25 7.01 "4.66M"    1.27
              "Oct 20, 2017" 7.08 6.98 7.08 6.93 "2.37M"    1.14
              "Oct 19, 2017"    7 6.94 7.09 6.77 "3.99M"     .72
              "Oct 18, 2017" 6.95 6.98 7.02 6.93 "1.86M"       0
              "Oct 17, 2017" 6.95 6.92 7.01 6.92 "2.09M"       0
              "Oct 16, 2017" 6.95 7.15 7.15 6.95 "2.88M"   -2.52
              "Oct 13, 2017" 7.13 7.09 7.14 7.08 "1.68M"     .56
              "Oct 12, 2017" 7.09 7.13 7.16 7.05 "2.08M"    -.98
              "Oct 11, 2017" 7.16  7.2 7.23 7.15 "2.25M"    -.56
              "Oct 10, 2017"  7.2 7.16 7.22  7.1 "2.66M"     .56
              "Oct 09, 2017" 7.16 7.17 7.19 7.12 "2.59M"     .42
              "Sep 29, 2017" 7.13 7.03 7.14 7.02 "2.63M"    1.42
              "Sep 28, 2017" 7.03 7.02 7.06 7.01 "2.10M"     .14
              "Sep 27, 2017" 7.02 7.03 7.03 6.96 "2.02M"     .29
              "Sep 26, 2017"    7 6.98 7.04 6.97 "1.92M"     .29
              "Sep 25, 2017" 6.98 7.01 7.04 6.98 "2.39M"    -.43
              "Sep 22, 2017" 7.01 7.05 7.09 6.98 "4.55M"    -.85
              "Sep 21, 2017" 7.07 7.14 7.17 7.07 "6.00M"    -.98
              "Sep 20, 2017" 7.14 7.15 7.19 7.09 "4.57M"    -.14
              "Sep 19, 2017" 7.15 7.19 7.27  7.1 "5.37M"     .14
              "Sep 18, 2017" 7.14 7.18  7.2 7.13 "4.05M"    -.56
              "Sep 15, 2017" 7.18 7.21 7.28  7.1 "3.84M"    -.55
              "Sep 14, 2017" 7.22  7.3  7.3 7.18 "3.37M"    -1.1
              "Sep 13, 2017"  7.3 7.29  7.3 7.16 "5.12M"     .41
              "Sep 12, 2017" 7.27 7.46 7.46 7.26 "6.99M"   -2.55
              "Sep 11, 2017" 7.46 7.47 7.49 7.42 "4.02M"     .13
              "Sep 08, 2017" 7.45 7.45 7.52 7.43 "2.24M"       0
              "Sep 07, 2017" 7.45  7.5 7.51 7.44 "3.23M"    -.53
              "Sep 06, 2017" 7.49 7.47  7.5 7.43 "3.98M"     .13
              "Sep 05, 2017" 7.48 7.49 7.51 7.43 "4.62M"    -.13
              "Sep 04, 2017" 7.49 7.54 7.54 7.44 "5.54M"    -.66
              "Sep 01, 2017" 7.54 7.57 7.58 7.46 "3.72M"     .13
              "Aug 31, 2017" 7.53 7.61 7.71 7.51 "4.33M"    -.79
              "Aug 30, 2017" 7.59 7.66 7.74 7.59 "4.51M"    -.78
              "Aug 29, 2017" 7.65 7.85 7.94 7.65 "6.98M"   -2.42
              "Aug 28, 2017" 7.84  7.5  8.1 7.46 "14.43M"   4.39
              "Aug 25, 2017" 7.51 7.46 7.62 7.39 "4.23M"     .27
              "Aug 24, 2017" 7.49 7.64 7.64 7.43 "2.31M"   -1.83
              "Aug 23, 2017" 7.63 7.61 7.68 7.51 "2.75M"       0
              "Aug 22, 2017" 7.63 7.65 7.65 7.54 "3.84M"    -.26
              "Aug 21, 2017" 7.65 7.49 7.65 7.43 "4.36M"    2.82
              "Aug 18, 2017" 7.44 7.46 7.47  7.4 "2.61M"     -.8
              "Aug 17, 2017"  7.5 7.46 7.53 7.42 "2.74M"     .81
              "Aug 16, 2017" 7.44 7.49 7.49 7.38 "2.56M"    -.67
              "Aug 15, 2017" 7.49 7.58 7.61 7.45 "2.42M"   -1.06
              "Aug 14, 2017" 7.57 7.46  7.6 7.42 "2.48M"    2.02
              "Aug 11, 2017" 7.42 7.62 7.64  7.4 "3.59M"   -3.51
              "Aug 10, 2017" 7.69 7.71 7.96 7.51 "3.88M"    -.65
              "Aug 09, 2017" 7.74 7.65 7.85 7.56 "4.23M"     .65
              "Aug 08, 2017" 7.69 7.51 7.74 7.42 "5.05M"    2.53
              "Aug 07, 2017"  7.5 7.58 7.64 7.35 "5.59M"   -1.32
              end
              I guess that's where the error arose. However, my senior colleague has just helped me gain access to Datastream, and it goes live for me on Monday. Given the data constraint I have (I mentioned before that I could not find data going back to, say, the year 2000 on investing.com), Datastream is going to make my life much easier. This also means I will be getting a new set of data, possibly in a different format; I will keep you posted. I certainly do not want to give up finding out what went wrong in the conundrum I am facing right now, but a new set of data is a fresh page for me, and most importantly I do not want to waste your time solving a problem that is going away, even if not in the sense of being 'solved'.

              Thank you very much.
              Last edited by sladmin; 09 Apr 2018, 08:55. Reason: anonymize poster

              • #22
                Stata has no difficulty accepting spaces in filenames, but you have to enclose the filename in double quotes. So it has to be -use `"`s'"', clear-, not -use `s', clear-, when the filename contains blanks.
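
                Applied to the loop from #20, that change (and nothing else) would look like this; a sketch, untested, where the only difference is the compound double quotes around `s':

                Code:
                foreach s of local stubs {
                    use `"`s'"', clear
                    gen firm = `"`s'"'
                    append using `building'
                    save `"`building'"', replace
                }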

                Are you working under a tight deadline for this project? If not, it might be best for you to take a break and learn the fundamentals of Stata. The problems you're pointing out here are, for the most part, errors in the code I have shown you. But I have only limited ability to test out the code, and little access to information like the filenames involved. These are things that, with a basic knowledge of Stata under your belt, you would be able to recognize and fix on your own. It will take you some time to do this, but in the end you will progress much faster if you can fix these glitches on your own and not have to make a post and then wait for a response. To get the basics, go to the PDF documentation that comes with your Stata installation (Select PDF Documentation on the Help menu) and read the Getting Started [GS] and Users' Guide [U] volumes. These will acquaint you with the basic approach to data management and analysis that Stata uses, and will also teach you the basic commands that every user needs to know to work effectively in Stata. You won't remember every detail, but you will learn enough that in situations like the ones you are encountering here, you will be able to identify which commands are likely to be involved in solving your problem, and then you can refer to the -help- files or the PDF documentation for more details. You will also, in many circumstances, be able to troubleshoot problems in the code you are running.

                If you are working under too tight a deadline to do this, then I recommend that you make this your top priority once this project is completed.

                There is no misunderstanding between us about the existence of a variable called firm in the data sets you have. In fact, I specifically mentioned earlier in this thread that, because there is no such variable in the data you are getting, I was assuming you could identify the firms by the filename. And the command -gen firm = `"`s'"'- is there precisely to create such a variable in the combined file. Clearly you need to be able to distinguish which observations go with which firms to do the actual herding analysis, so you have to make such a variable if it isn't already there.

                • #23
                  Dear Clyde,

                  I am embarrassed to have you point that out for me, but yes, my supervisor is rather occupied at the moment, so not much help can or will be provided to me. However, the deadline is still there to be met. I still hope that once I get the data, I will be able to figure things out, following what you've taught me in the previous posts. To be frank, I have a little less than two weeks to run the regressions I've mentioned before, and probably one or two robustness tests, before I can call it a day. I hope your experience in teaching does not tell you that this is an unrealistic goal.

                  Thank you very much.

                  • #24
                    I suspect that many of the long threads on this Forum arise from circumstances similar to yours: a person who is a Stata beginner (we were all beginners once) being handed a project that requires more than a beginner's knowledge of Stata and a tight deadline, but little in the way of support and supervision. The positive side of this is that you will learn a lot in a short period of time. It's a bit like learning to swim by being thrown into the deep end of the pool, and it's probably not the most effective way to learn, but...

                    I can't assess how feasible it is for you to conclude this problem in 2 weeks. I don't know what other things are at the moment competing for your time. Presumably you cannot spend 2 weeks working on it 24/7. Perhaps you have only 15 minutes a day to devote to this; perhaps you can spend several hours a day on it. Also, I can't predict what problems, if any, will be presented by the data that you get. Some data sets are pretty easy to work with and require little cleaning. Some are a mess and require far more effort to clean up than what is involved in the ultimate analysis. What I can say is that an experienced Stata programmer, given clean data sets, could complete this problem in less than a day, probably even less than an hour, of uninterrupted effort.

                    What I can also say is that at this point you have a framework of code for importing the data sets and combining them. You also have code in #11 to calculate CSAD and perform the regression. So it is a matter of how easy or difficult it turns out to be to adapt the code you have to the data you get, and to adapt the data to the code.

                    You may also have some questions at the end about interpreting the results, and that could take some time depending on your level of statistical sophistication and experience with linear regressions.

                    • #25
                      Dear Clyde,

                      Thanks for your advice; it is, of course, as useful as always. I have some good news: through hours of research and trial and error, I have gained access to Datastream and learnt to acquire data in a time-efficient manner. As you may recall, in short, I will have 4 panels. Unfortunately, the data looks a little different from before, and I could not run the -dataex- command because apparently the 'input statement exceeds linesize limit' and Stata asked me to 'Try specifying fewer variables'. Now, to go back to the example we discussed: you have taught me how to generate Rm,t, and that is not a problem; I can apply it to the new set of data I have. However, I still face difficulty in finding Ri,t. Since I could not execute -dataex-, I think a screenshot will (more or less) have to suffice. Below is what I see when I run -describe-:

                      Code:
                      . describe
                      
                      Contains data
                        obs:         2,610                          
                       vars:            49                          
                       size:     1,007,460                          
                      ------------------------------------------------------------------------------
                                    storage   display    value
                      variable name   type    format     label      variable label
                      ------------------------------------------------------------------------------
                      Name            int     %td..                 Name
                      HANGZHOUSTMTU~B double  %10.0g                HANGZHOU STM.TURBINE 'B'
                      SHANDONGAIRLB   double  %10.0g                SHANDONG AIRL.'B'
                      SHNCHIWANPETR~E double  %10.0g                SHN.CHIWAN PETROLEUM SUPP.BASE
                                                                      'B'
                      CHONGQINGJIAN~M double  %10.0g                CHONGQING JIANSHE VEHICLE SYSTEM
                                                                      'B'
                      DONGFENGSCITE~B double  %10.0g                DONGFENGSCI-TECH GROUP 'B'
                      FOSHANHUAXINP~B double  %10.0g                FOSHAN HUAXIN PACK. 'B'
                      GUANGDONGJADI~B double  %10.0g                GUANGDONG JADIETE HDG. 'B'
                      SHANDONGZHONG~H double  %10.0g                SHANDONG ZHONGLU OCEANIC FISH.
                                                                      'B'
                      TSANNKUENENTERB double  %10.0g                TSANN KUEN ENTER.'B'
                      WAFANGDIANBRGB  double  %10.0g                WAFANGDIAN BRG. 'B'
                      ANHUIGUJINGDI~B double  %10.0g                ANHUI GUJING DIST. 'B'
                      BENGANGSTLPLA~B double  %10.0g                BENGANG STL.PLATES 'B'
                      BOETECHGPB      double  %10.0g                BOE TECH.GP.'B'
                      CHANGCHAIB      double  %10.0g                CHANGCHAI 'B'
                      CHINAFANGDAGR~B double  %10.0g                CHINA FANGDA GROUP 'B'
                      CHINANATACCOR~B double  %10.0g                CHINA NAT.ACCORD MDC.'B'
                      CHONGQINGCHAN~B double  %10.0g                CHONGQING CHANGAN AUTMB. 'B'
                      CSGHOLDINGB     double  %10.0g                CSG HOLDING 'B'
                      DALIANREFRIGB   double  %10.0g                DALIAN REFRIG. 'B'
                      DONGXUOTTECHN~B double  %10.0g                DONGXU OT.TECHNOLOGY 'B'
                      FAWERAUTOMOTI~B double  %10.0g                FAWER AUTOMOTIVE PARTS 'B'
                      FIYTAHOLDINGSB  double  %10.0g                FIYTA HOLDINGS 'B'
                      FOSHANELECTLTGB double  %10.0g                FOSHAN ELECT.& LTG.'B'
                      GUANGDONGELEC~B double  %10.0g                GUANGDONG ELEC.PWR.DEV. 'B'
                      GUANGDONGPRVL~B double  %10.0g                GUANGDONG PRVL.EXPR.DEV. 'B'
                      HAINANDONGHAI~G double  %10.0g                HAINAN DONGHAI TOURISM NTRE HDG.
                                                                      'B'
                      HAINANJINGLIA~B double  %10.0g                HAINAN JINGLIANG HOLDINGS 'B'
                      HEFEIMEILINGB   double  %10.0g                HEFEI MEILING 'B'
                      HUBEISANONDAB   double  %10.0g                HUBEI SANONDA 'B'
                      JIANGLINGMOTO~B double  %10.0g                JIANGLING MOTORS 'B'
                      KONKAGROUPB     double  %10.0g                KONKA GROUP 'B'
                      LUTHAITEXJOIN~B double  %10.0g                LU THAI TEX.JOINT STK. 'B'
                      SHANDONGCHENM~B double  %10.0g                SHANDONG CHENMING PAPER HDG.'B'
                      SHENZHENNANSH~B double  %10.0g                SHENZHEN NANSHAN PWR.'B'
                      SHENZHENPROPS~B double  %10.0g                SHENZHEN PROPS.& RES. DEV.'B'
                      SHENZHENSEGB    double  %10.0g                SHENZHEN SEG 'B'
                      SHENZHENTEXHDGB double  %10.0g                SHENZHEN TEX.(HDG.) 'B'
                      SHNCHINBICYCLEB double  %10.0g                SHN.CHIN.BICYCLE 'B'
                      SHNCHIWANWHAR~P double  %10.0g                SHN.CHIWAN WHARF HDG.'B' SUSP -
                                                                      SUSP.20/11/17
                      SHNSHENBAOIND~P double  %10.0g                SHN.SHENBAO INDL.'B' SUSP -
                                                                      SUSP.22/08/17
                      SHNTELLUSHLDGB  double  %10.0g                SHN.TELLUS HLDG.'B'
                      SHNVCTONWARDT~A double  %10.0g                SHN.VCT.ONWARD TEXTILE INDUSTRIA
                                                                      'B'
                      SHNWONGTEEINT~B double  %10.0g                SHN.WONGTEE INTL.ENTER. 'B'
                      SHNZHONGHENGH~B double  %10.0g                SHN.ZHONGHENG HUAFA 'B'
                      SZSEZRLSTPROP~B double  %10.0g                SZ SEZ RLST.& PROPS. (GP.) 'B'
                      WEIFUHIGHTECH~B double  %10.0g                WEIFU HIGH TECH.GP.'B'
                      WUXILITTLESWANB double  %10.0g                WUXI LITTLE SWAN 'B'
                      YANTAICHANGYU~B double  %10.0g                YANTAI CHANGYU PION.WINE 'B'
                      ------------------------------------------------------------------------------
                      Sorted by:
                           Note: Dataset has changed since last saved.
                      
                      .
                      The variables are all named after the firms, and the data listed below the variable names are actually the prices.
                      [Attached image: cropped data.png (screenshot of the data)]



                      Above is a screenshot of the data (I am terribly, terribly sorry about how small it is; my hands are tied). Columns B, C, D ... are, as mentioned above, the firms' names, but the values in them are really the firms' stock prices. I am using 10 years of data, so I guess Stata can't generate an example dataset simply because the sample size is more than a million. Please let me know if I have confused you (again). I am doing exactly the same analysis, but this time I am looking at herding in the Shenzhen B-shares market, with 48 individual B-share firms.

                      Thank you.
                      Last edited by sladmin; 09 Apr 2018, 08:55. Reason: anonymize poster

                      • #26
                        The screenshot, predictably enough, is unreadable on my computer. In your -describe- listing, I don't see a variable with a name that looks like it gives the date, and it isn't possible to proceed without that. But I notice that the variable called "Name" is, paradoxically, an integer variable that is formatted as a date. So I'm going to guess that "Name" is actually the date variable. In this case, the next step is to -reshape- the data to long layout. After that it will be suitably arranged to use with the code given in earlier posts in this thread. Based on your -describe- output, I will assume that all of the variables in this data set are firm names (or stock names) except for the "Name" (really date) variable.

                        Code:
                        rename Name date
                        
                        // BUILD A LIST OF STOCK NAMES
                        ds date, not // ALL VARIABLES BUT DATE ARE STOCK PRICES
                        local stocks `r(varlist)'
                        
                        //  PREFIX ALL THE STOCK NAMES WITH UNDERSCORE
                        rename (`r(varlist)') _=
                        
                        //  RESHAPE LONG
                        reshape long _, i(date) j(firm) string
                        
                        //  RENAME _ MEANINGFULLY
                        rename _ price
                        
                        isid firm date, sort
                        save 34_firms_data, replace
                        This new 34_firms_data.dta file should be suitable for use with the code from the earlier posts in the thread. Notice that you don't have to loop over a bunch of different files to put it together: the data came in one file, it just needed to be re-organized.
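
                        If you want a quick sanity check after running this (optional), you could list a few rows of the reshaped file, e.g.:

                        Code:
                        use 34_firms_data, clear
                        list date firm price in 1/5, clean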

                        • #27
                          Dear Clyde,

                          Thank you so much! I have made major progress with my regressions. I am almost done with my work, and I have successfully taken care of my robustness test using dummy variables by learning from YouTube and the Stata manual. However, do you think I should detrend my data by including -i.year- to avoid spurious correlation? As far as I understand, time-series data tends to be autocorrelated. How do I add lagged dependent variables to take care of that? I do not think higher orders will be relevant in my case, so I would only need my tests to be adjusted for AR(1). As mentioned above, I have my regressions done; if I have to perform the aforementioned modifications, that is, the detrending and the AR(1) adjustment, do I have to start all over again?

                          Thank you.

                          • #28
                            The question of whether to detrend your analysis by including time in the model is a content-based question that I cannot answer. If time trends are substantively important effects on your outcome variables, then the answer is yes. But not being in your discipline and knowing nothing about the subject matter of your analysis, I can't advise you whether that is the case or not. Similarly, I cannot advise you on the need for or suitability of modeling autoregressive correlation structure in your data. In this case, I also can't advise you about the Stata commands for doing so--this just isn't something that comes up in my work and I've never learned how to do it.
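
                            On the purely mechanical side of the detrending question (not the autoregressive part), if you do decide to include time in the model, factor-variable notation is the usual way to add year indicators. The sketch below is illustrative only; csad, abs_rm, and rm_sq are placeholders for whatever your actual outcome and regressors from the earlier posts are:

                            Code:
                            * illustrative sketch only -- replace the placeholder variable names with your own
                            gen year = year(date)
                            regress csad abs_rm rm_sq i.year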

                            What I can tell you is that it should not be necessary to re-build your data sets from scratch to do this. You should be able to start with the built-up and merged Stata data set you created along the way, and just revise the final analysis part.

                            • #29
                              Dear Clyde,

                              Thank you for your swift reply. I have done the detrending after discussing it with my colleagues; perhaps I will post a question about modelling an AR(1) structure in the forum. I really, really can't thank you enough for your help throughout this period of stress; I believe I am nearly there with my work. Your expertise and your passion for helping are inspiring, and what I have learnt from you is definitely not bounded by Stata; it goes much further than that.

                              Thank you so much, Clyde.

                              • #30
                                Dear Clyde,

                                I am sorry to re-bump this thread. I am facing a related, but much more sophisticated, problem now.

                                I am trying to fit a GARCH (1,1)-in-mean model in my regression, as specified in the screenshot below:

                                [Attached image: robustness.PNG (screenshot of the model specification)]

                                As you can see, there are three new elements here: RF, the (readily available) demand deposit rate, which stands in for the 'risk-free' rate of return; EPS, the earnings yield, derived by dividing earnings per share (which I can acquire separately) by the share price; and lastly, the conditional variance. I am only using daily data here. How should I go about calculating the earnings yield for each firm in Stata over my period of interest? Also, how do I specify the conditional variance as shown above? I have tried looking back at your previous posts, but I find myself running in circles.
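
                                From the Stata manual, I gather that the -arch- command with the -archm- option may be the direction for a GARCH(1,1)-in-mean model, but I am not sure; the sketch below uses placeholder variable names for a single firm's daily series:

                                Code:
                                * placeholder names only: eps, price, ret and rf stand in for my actual variables
                                gen eyield = eps/price          // earnings yield = earnings per share / price
                                tsset date
                                arch ret rf eyield, arch(1) garch(1) archm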

                                Thank you.
