
  • Why does the Viewer show SMCL code?

    I have a short and a long log file in SMCL format, log1.smcl and log2.smcl. Why does the Viewer display the contents of the second file with the following SMCL code visible (marked in red by me):
    Code:
             448 Guernsey (UK){smcl}
    {res}{asis}         450 Denmark
    in rows 231-232 whereas the same text in the first file (here in rows 4-5) appears in the viewer without this "extraneous" smcl code?

    Here are the contents of the first file (log1.smcl):
    Code:
    {smcl}
    {com}{sf}{ul off}{res}{txt}isrd_sample:
    {res}{asis}         447 Isle of Man (UK){smcl}
    {res}{asis}         448 Guernsey (UK){smcl}
    {res}{asis}         450 Denmark{smcl}
    {res}{asis}           3 not usable{smcl}
    {err}variable p_weight has no value label attached
    {res}{err}variable a_weight has no value label attached
    {smcl}
    {err}{sf}{ul off}
    And here are the contents of the second file (log2.smcl):
    Code:
    {smcl}
    {com}{sf}{ul off}{res}{txt}isrd_sample:
    {res}{asis}           0 extended{smcl}
    {res}{asis}           1 original (grades 7-9){smcl}
    {err}variable caseid has no value label attached
    {res}{txt}natsamp:
    {res}{asis}           0 city based sample{smcl}
    {res}{asis}           1 national sample{smcl}
    {txt}srvmode:
    {res}{asis}           1 paper & pencil{smcl}
    {res}{asis}           2 online (unipark){smcl}
    {res}{asis}           3 online (fluid){smcl}
    {res}{asis}           4 online (other){smcl}
    {txt}srvlang:
    {res}{asis}           1 Abkhaz{smcl}
    {res}{asis}           2 Afar{smcl}
    {res}{asis}           3 Afrikaans{smcl}
    {res}{asis}           4 Akan{smcl}
    {res}{asis}           5 Albanian{smcl}
    {res}{asis}           6 Amharic{smcl}
    {res}{asis}           7 Arabic{smcl}
    {res}{asis}           8 Aragonese{smcl}
    {res}{asis}           9 Armenian{smcl}
    {res}{asis}          10 Assamese{smcl}
    {res}{asis}          11 Avaric{smcl}
    {res}{asis}          12 Avestan{smcl}
    {res}{asis}          13 Aymara{smcl}
    {res}{asis}          14 Azerbaijani{smcl}
    {res}{asis}          15 Bambara{smcl}
    {res}{asis}          16 Bashkir{smcl}
    {res}{asis}          17 Basque{smcl}
    {res}{asis}          18 Belarusian{smcl}
    {res}{asis}          19 Bengali, Bangla{smcl}
    {res}{asis}          20 Bihari{smcl}
    {res}{asis}          21 Bislama{smcl}
    {res}{asis}          22 Bosnian{smcl}
    {res}{asis}          23 Breton{smcl}
    {res}{asis}          24 Bulgarian{smcl}
    {res}{asis}          25 Burmese{smcl}
    {res}{asis}          26 Catalan{smcl}
    {res}{asis}          27 Chamorro{smcl}
    {res}{asis}          28 Chechen{smcl}
    {res}{asis}          29 Chichewa, Chewa, Nyanja{smcl}
    {res}{asis}          30 Chinese{smcl}
    {res}{asis}          31 Chuvash{smcl}
    {res}{asis}          32 Cornish{smcl}
    {res}{asis}          33 Corsican{smcl}
    {res}{asis}          34 Cree{smcl}
    {res}{asis}          35 Croatian{smcl}
    {res}{asis}          36 Czech{smcl}
    {res}{asis}          37 Danish{smcl}
    {res}{asis}          38 Divehi, Dhivehi, Maldivian{smcl}
    {res}{asis}          39 Dutch{smcl}
    {res}{asis}          40 Dzongkha{smcl}
    {res}{asis}          41 Eastern Punjabi, Eastern Panjabi{smcl}
    {res}{asis}          42 English{smcl}
    {res}{asis}          43 Esperanto{smcl}
    {res}{asis}          44 Estonian{smcl}
    {res}{asis}          45 Ewe{smcl}
    {res}{asis}          46 Faroese{smcl}
    {res}{asis}          47 Fijian{smcl}
    {res}{asis}          48 Finnish{smcl}
    {res}{asis}          49 French{smcl}
    {res}{asis}          50 Fula, Fulah, Pulaar, Pular{smcl}
    {res}{asis}          51 Galician{smcl}
    {res}{asis}          52 Ganda{smcl}
    {res}{asis}          53 Georgian{smcl}
    {res}{asis}          54 German{smcl}
    {res}{asis}          55 Greek{smcl}
    {res}{asis}          56 Guaraní{smcl}
    {res}{asis}          57 Gujarati{smcl}
    {res}{asis}          58 Haitian, Haitian Creole{smcl}
    {res}{asis}          59 Hausa{smcl}
    {res}{asis}          60 Hebrew{smcl}
    {res}{asis}          61 Herero{smcl}
    {res}{asis}          62 Hindi{smcl}
    {res}{asis}          63 Hiri Motu{smcl}
    {res}{asis}          64 Hungarian{smcl}
    {res}{asis}          65 Icelandic{smcl}
    {res}{asis}          66 Ido{smcl}
    {res}{asis}          67 Igbo{smcl}
    {res}{asis}          68 Indonesian{smcl}
    {res}{asis}          69 Interlingua{smcl}
    {res}{asis}          70 Interlingue{smcl}
    {res}{asis}          71 Inuktitut{smcl}
    {res}{asis}          72 Inupiaq{smcl}
    {res}{asis}          73 Irish{smcl}
    {res}{asis}          74 Italian{smcl}
    {res}{asis}          75 Japanese{smcl}
    {res}{asis}          76 Javanese{smcl}
    {res}{asis}          77 Kalaallisut, Greenlandic{smcl}
    {res}{asis}          78 Kannada{smcl}
    {res}{asis}          79 Kanuri{smcl}
    {res}{asis}          80 Kashmiri{smcl}
    {res}{asis}          81 Kazakh{smcl}
    {res}{asis}          82 Khmer{smcl}
    {res}{asis}          83 Kikuyu, Gikuyu{smcl}
    {res}{asis}          84 Kinyarwanda{smcl}
    {res}{asis}          85 Kirundi{smcl}
    {res}{asis}          86 Komi{smcl}
    {res}{asis}          87 Kongo{smcl}
    {res}{asis}          88 Korean{smcl}
    {res}{asis}          89 Kurdish{smcl}
    {res}{asis}          90 Kwanyama, Kuanyama{smcl}
    {res}{asis}          91 Kyrgyz{smcl}
    {res}{asis}          92 Lao{smcl}
    {res}{asis}          93 Latin{smcl}
    {res}{asis}          94 Latvian{smcl}
    {res}{asis}          95 Limburgish, Limburgan, Limburger{smcl}
    {res}{asis}          96 Lingala{smcl}
    {res}{asis}          97 Lithuanian{smcl}
    {res}{asis}          98 Luba-Katanga{smcl}
    {res}{asis}          99 Luxembourgish, Letzeburgesch{smcl}
    {res}{asis}         100 Macedonian{smcl}
    {res}{asis}         101 Malagasy{smcl}
    {res}{asis}         102 Malay{smcl}
    {res}{asis}         103 Malayalam{smcl}
    {res}{asis}         104 Maltese{smcl}
    {res}{asis}         105 Manx{smcl}
    {res}{asis}         106 Māori{smcl}
    {res}{asis}         107 Marathi{smcl}
    {res}{asis}         108 Marshallese{smcl}
    {res}{asis}         109 Mongolian{smcl}
    {res}{asis}         110 Nauruan{smcl}
    {res}{asis}         111 Navajo, Navaho{smcl}
    {res}{asis}         112 Ndonga{smcl}
    {res}{asis}         113 Nepali{smcl}
    {res}{asis}         114 Northern Ndebele{smcl}
    {res}{asis}         115 Northern Sami{smcl}
    {res}{asis}         116 Norwegian{smcl}
    {res}{asis}         117 Norwegian Bokmål{smcl}
    {res}{asis}         118 Norwegian Nynorsk{smcl}
    {res}{asis}         119 Nuosu{smcl}
    {res}{asis}         120 Occitan{smcl}
    {res}{asis}         121 Ojibwe, Ojibwa{smcl}
    {res}{asis}         122 Old Church Slavonic, Church Slavonic, Old Bulgarian{smcl}
    {res}{asis}         123 Odia (Oriya){smcl}
    {res}{asis}         124 Oromo{smcl}
    {res}{asis}         125 Ossetian, Ossetic{smcl}
    {res}{asis}         126 Pāli{smcl}
    {res}{asis}         127 Pashto, Pushto{smcl}
    {res}{asis}         128 Persian (Farsi){smcl}
    {res}{asis}         129 Polish{smcl}
    {res}{asis}         130 Portuguese{smcl}
    {res}{asis}         131 Quechua{smcl}
    {res}{asis}         132 Romanian{smcl}
    {res}{asis}         133 Romansh{smcl}
    {res}{asis}         134 Russian{smcl}
    {res}{asis}         135 Samoan{smcl}
    {res}{asis}         136 Sango{smcl}
    {res}{asis}         137 Sanskrit (Saṁskṛta){smcl}
    {res}{asis}         138 Sardinian{smcl}
    {res}{asis}         139 Scottish Gaelic, Gaelic{smcl}
    {res}{asis}         140 Serbian{smcl}
    {res}{asis}         141 Shona{smcl}
    {res}{asis}         142 Sindhi{smcl}
    {res}{asis}         143 Sinhalese, Sinhala{smcl}
    {res}{asis}         144 Slovak{smcl}
    {res}{asis}         145 Slovene{smcl}
    {res}{asis}         146 Somali{smcl}
    {res}{asis}         147 Southern Ndebele{smcl}
    {res}{asis}         148 Southern Sotho{smcl}
    {res}{asis}         149 Spanish{smcl}
    {res}{asis}         150 Sundanese{smcl}
    {res}{asis}         151 Swahili{smcl}
    {res}{asis}         152 Swati{smcl}
    {res}{asis}         153 Swedish{smcl}
    {res}{asis}         154 Tagalog{smcl}
    {res}{asis}         155 Tahitian{smcl}
    {res}{asis}         156 Tajik{smcl}
    {res}{asis}         157 Tamil{smcl}
    {res}{asis}         158 Tatar{smcl}
    {res}{asis}         159 Telugu{smcl}
    {res}{asis}         160 Thai{smcl}
    {res}{asis}         161 Tibetan Standard, Tibetan, Central{smcl}
    {res}{asis}         162 Tigrinya{smcl}
    {res}{asis}         163 Tonga (Tonga Islands){smcl}
    {res}{asis}         164 Tsonga{smcl}
    {res}{asis}         165 Tswana{smcl}
    {res}{asis}         166 Turkish{smcl}
    {res}{asis}         167 Turkmen{smcl}
    {res}{asis}         168 Twi{smcl}
    {res}{asis}         169 Ukrainian{smcl}
    {res}{asis}         170 Urdu{smcl}
    {res}{asis}         171 Uyghur{smcl}
    {res}{asis}         172 Uzbek{smcl}
    {res}{asis}         173 Venda{smcl}
    {res}{asis}         174 Vietnamese{smcl}
    {res}{asis}         175 Volapük{smcl}
    {res}{asis}         176 Walloon{smcl}
    {res}{asis}         177 Welsh{smcl}
    {res}{asis}         178 Western Frisian{smcl}
    {res}{asis}         179 Wolof{smcl}
    {res}{asis}         180 Xhosa{smcl}
    {res}{asis}         181 Yiddish{smcl}
    {res}{asis}         182 Yoruba{smcl}
    {res}{asis}         183 Zhuang, Chuang{smcl}
    {res}{asis}         184 Zulu{smcl}
    {res}{asis}         185 Other{smcl}
    {res}{asis}         997 ambiguous answer{smcl}
    {res}{asis}         999 no answer{smcl}
    {err}variable srvdate has no value label attached
    {res}{txt}country:
    {res}{asis}          10 United States{smcl}
    {res}{asis}          11 Canada{smcl}
    {res}{asis}          12 Midway Island (USA){smcl}
    {res}{asis}          13 Wake Island (USA){smcl}
    {res}{asis}          70 Russia{smcl}
    {res}{asis}          76 Kazakhstan{smcl}
    {res}{asis}         200 Egypt{smcl}
    {res}{asis}         270 South Africa{smcl}
    {res}{asis}         300 Greece{smcl}
    {res}{asis}         310 Netherlands{smcl}
    {res}{asis}         320 Belgium{smcl}
    {res}{asis}         321 Flanders (BE){smcl}
    {res}{asis}         322 Wallonia (BE){smcl}
    {res}{asis}         330 France{smcl}
    {res}{asis}         340 Spain{smcl}
    {res}{asis}         360 Hungary{smcl}
    {res}{asis}         390 Italy{smcl}
    {res}{asis}         391 Vatican{smcl}
    {res}{asis}         400 Romania{smcl}
    {res}{asis}         410 Switzerland{smcl}
    {res}{asis}         430 Austria{smcl}
    {res}{asis}         440 United Kingdom{smcl}
    {res}{asis}         441 England and Wales (UK){smcl}
    {res}{asis}         442 Northern Ireland (UK){smcl}
    {res}{asis}         443 Scotland (UK){smcl}
    {res}{asis}         445 Jersey (UK){smcl}
    {res}{asis}         447 Isle of Man (UK){smcl}
    {res}{asis}         448 Guernsey (UK){smcl}
    {res}{asis}         450 Denmark{smcl}
    {res}{asis}           3 not usable{smcl}
    {err}variable p_weight has no value label attached
    {res}{err}variable a_weight has no value label attached
    {smcl}
    {err}{sf}{ul off}

  • #2
    I cannot replicate the problem. I copied both code snippets verbatim into the Stata 18 (and Stata 16.1) Do-file Editor and saved the files as log1.smcl and log2.smcl. Then I typed
    Code:
    view log1.smcl
    view log2.smcl
    I do not see a {smcl} tag in either of the viewed files.



    • #3
      This is strange: I can reproduce the problem using Stata 18 and Stata 17. Maybe this depends on the operating system? I am using Linux (Ubuntu 20.04). Unfortunately, I can't try it with Windows or macOS.



      • #4
        Originally posted by Dirk Enzmann
        Maybe this depends on the operating system? I am using Linux (Ubuntu 20.04).
        It's possible. I am on Windows 10.



        • #5
          I was able to reproduce the problem on Windows, Mac, and Unix. I suspect that Stata's SMCL parser reads the contents of the file into a buffer. The buffer is smaller than the file, so the file has to be read in buffer-sized chunks, and one of the SMCL tags is not fully contained within a single chunk. The parser is supposed to be robust to this, so we'll look into it. That's just a guess, and it may be more involved than that.

          The reason Daniel couldn't reproduce the problem is that he's on Windows, where the default line ending in the Do-file Editor is DOS style: a carriage return followed by a line feed, instead of the Unix line ending of a line feed alone, which Unix and Mac use. The extra carriage return from each DOS line ending shifted the positions of the SMCL tags in the buffer.
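          To try to reproduce the problem from files saved in the Do-file Editor on Windows, one option is to convert their line endings to Unix style before viewing them. A minimal sketch using Stata's filefilter command, where \W and \U are filefilter's codes for Windows (CR LF) and Unix (LF) line endings; the output name log2_unix.smcl is only an example. Whether the converted file then shows the stray tags still depends on where they fall relative to the renderer's buffer, whose size is not documented.

          ```stata
          * Convert Windows (CR LF) line endings to Unix (LF) line endings.
          * log2_unix.smcl is an illustrative output file name.
          filefilter log2.smcl log2_unix.smcl, from(\W) to(\U) replace
          display r(occurrences) " line endings converted"
          view log2_unix.smcl
          ```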
          Last edited by Chinh Nguyen (StataCorp); 15 Aug 2024, 06:52.
          -Chinh Nguyen



          • #6
            We investigated and determined that, although this is not documented, the SMCL renderer prefers that the {smcl} tag be on its own line. In general, the renderer can handle the tag not being on its own line, except when it encounters an incomplete tag because the tag did not fit within its parsing buffer. We might be able to make the renderer robust to that situation, but I can't make any promise that it'll happen anytime soon, so it's much better if whatever is generating that SMCL output is changed to always output the {smcl} tag on its own line.
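            For log files that already exist, the recommendation can be applied after the fact. A minimal Mata sketch, assuming the only problem is a {smcl} tag that trails other text on the same line: the hypothetical function smcl_split_trailing() copies a log line by line with fget() and fput() and moves any such tag onto its own line. It does not otherwise check the SMCL, and that the Viewer hides lines containing only {smcl} is based on the behavior observed in this thread, not on documentation.

            ```stata
            mata:
            // Hypothetical helper: copy SMCL file src to dst, moving a {smcl}
            // tag that trails other text onto a line of its own.
            void smcl_split_trailing(string scalar src, string scalar dst)
            {
                real scalar   fin, fout, n
                string matrix line

                if (fileexists(dst)) unlink(dst)   // write to a fresh file
                fin  = fopen(src, "r")
                fout = fopen(dst, "w")
                // fget() returns J(0,0,"") at end of file
                while ((line = fget(fin)) != J(0, 0, "")) {
                    n = strlen(line)
                    if (n > 6 & substr(line, n - 5, 6) == "{smcl}") {
                        fput(fout, substr(line, 1, n - 6))  // text without the tag
                        fput(fout, "{smcl}")                // tag on its own line
                    }
                    else fput(fout, line)                   // copy unchanged
                }
                fclose(fin)
                fclose(fout)
            }
            end
            ```

            One would then type mata: smcl_split_trailing("log2.smcl", "log2_fixed.smcl") followed by view log2_fixed.smcl; writing to a new file leaves the original log untouched.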
            -Chinh Nguyen



            • #7
              Originally posted by Chinh Nguyen (StataCorp)
              it's much better if whatever is generating that SMCL output is changed to always output the {smcl} tag on its own line.
              Thanks for looking into this.


              If whatever is generating that SMCL output happens to be elabel (SSC, SJ, or GitHub), I am unsure what the best fix might be. I am using Mata to replicate the output of label list as:
              Code:
              printf("{res}{asis}%12.0g %s{smcl}\n", vvec()[i], tvec()[i])
              where vvec() and tvec() access the values and labels, but these details are irrelevant here. The only reason for putting the {asis} and {smcl} tags there in the first place is to print SMCL as-is when it is part of the value label itself. Consider
              Code:
              . label define foo 42 "foo {it}bar" 73 "bar"
              Here is what label list will produce:
              Code:
              . label list foo
              foo:
                        42 foo {it}bar
                        73 bar
              And here is what replicating the thing in Mata looks like without switching back and forth between {asis} and {smcl} tags:
              Code:
              . mata {
              >    
              >     vl = st_vlload("foo",v=.,t="")
              >     printf("{txt}foo:\n")
              >     for (i=1;i<=rows(v);i++)
              >         printf("{res}%12.0g %s\n",v[i],t[i])
              >        
              > }
              foo:
                        42 foo bar
                        73 bar
              Notice that everything after (and including) the first "bar" appears in italics. To avoid that, I used the code shown above. Now, I can get the {smcl} tag onto its own line simply as:
              Code:
              printf("{res}{asis}%12.0g %s\n{smcl}\n", vvec()[i], tvec()[i])
              and the additional newline before {smcl} won't show because Stata seems to "ignore" lines that contain only {smcl}. Thus, the output looks clean (and mimics that of label list):
              Code:
              . mata {
              >    
              >     vl = st_vlload("foo",v=.,t="")
              >    
              >     printf("{txt}foo:\n")
              >     for (i=1;i<=rows(v);i++)
              >         printf("{res}{asis}%12.0g %s\n{smcl}\n",v[i],t[i])
              >    
              > }
              foo:
                        42 foo {it}bar
                        73 bar

              Yet, in a log file, we would see the extra lines:
              Code:
              {smcl}
              (output omitted)
              {res}{txt}foo:
              {res}{asis}          42 foo {it}bar
              {smcl} <- note the extra line here
              {res}{asis}          73 bar
              {smcl}
              (output omitted)
              which is probably not what we want.

              Alternatively, I might be able to replace all(?) curly braces in the labels with the SMCL equivalents {c -(} and {c )-}. This seems to mimic what label list does, which becomes obvious in a log file:
              Code:
              {smcl}
              (output omitted).
              . label list foo
              {txt}foo:
              {res}          42 foo {c -(}it{c )-}bar
                        73 bar
              {txt}
              {com}.
              (output omitted)
              However, manipulating the labels themselves instead of how I print them might have unintended side effects.
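              A sketch of the escaping idea that changes only what is printed, not the stored labels: the hypothetical Mata function smcl_escape() replaces each { with {c -(} and each } with {c )-}, using char(1) as a temporary placeholder (assumed never to occur in a label) so that the braces of the inserted {c )-} are not escaped a second time. Because the escaped text contains no live SMCL tags, there is no need to switch between {asis} and {smcl}, so no trailing {smcl} tag is written at all. This is untested against the full range of labels elabel might encounter.

              ```stata
              label define foo 42 "foo {it}bar" 73 "bar", replace

              mata:
              // Hypothetical helper: escape SMCL braces so that label text prints
              // verbatim; char(1) is a placeholder assumed absent from labels.
              string scalar smcl_escape(string scalar s)
              {
                  string scalar r

                  r = subinstr(s, "{", char(1))     // protect opening braces
                  r = subinstr(r, "}", "{c )-}")    // escape closing braces
                  return(subinstr(r, char(1), "{c -(}"))
              }

              v = .
              t = ""
              st_vlload("foo", v, t)                // values and text of label foo
              printf("{txt}foo:\n")
              for (i = 1; i <= rows(v); i++) {
                  // only the printed copy is escaped; label foo is unchanged
                  printf("{res}%12.0g %s\n", v[i], smcl_escape(t[i]))
              }
              end
              ```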

              I will have to play around with this stuff, and, just like StataCorp, I can't promise that I will find the time to do that anytime soon. Sorry.
              Last edited by daniel klein; 19 Aug 2024, 16:16.

