Forum software's rendering of "greater than," "ampersand" etc.

Mike Lacy

Join Date: Apr 2014

Posts: 2417
#1


02 Jul 2019, 11:22

I'd like to know how to avoid having the forum software translate the conventional symbols for such things as "greater than," "less than," and "and" into something like "&gt;" or "&amp;". My impression is that this doesn't always happen when such symbols are included in a Statalist post, but it often does.

There must be some place where this behavior of the forum software is explained, but I haven't found it, so a pointer would be appreciated.
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#2

02 Jul 2019, 12:41

It is the same character. In the ASCII world there was only one code for every character and one way to enter it. In Unicode HTML there are many different ways of entering the same character. See e.g. here:
http://www.fileformat.info/info/unic...r/3e/index.htm
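A small illustration of Sergiy's point, written in Python only because its standard `html` module makes the decoding easy to show (this is not what the forum software runs): a named reference, a decimal reference, and a hexadecimal reference all decode to the same character, U+003E.

```python
import html

# Three different HTML character references for one character, U+003E (">")
forms = ["&gt;", "&#62;", "&#x3E;"]

for ref in forms:
    # html.unescape replaces each reference with the literal character
    print(ref, "->", html.unescape(ref))

# All three decode to the identical code point
print(hex(ord(html.unescape("&gt;"))))  # 0x3e
```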

The forum software should convert a "less than" sign in your content into an entity, to signal that the angle bracket does not start an HTML tag but is part of the content and should be rendered as is.
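A minimal sketch of the kind of conversion Sergiy describes, using Python's `html.escape` as a stand-in; the forum software's actual internals are not known here. The sample text is the test line Mike posts later in this thread.

```python
import html

# Text a user might type inside a CODE block
content = "2 < 10 9 > 3 this & that"

# Escape it for inclusion in an HTML page. html.escape replaces "&" first,
# so the ampersands it introduces for "<" and ">" are not escaped again.
escaped = html.escape(content)
print(escaped)  # 2 &lt; 10 9 &gt; 3 this &amp; that

# A browser (here, html.unescape) turns the references back into the original text
print(html.unescape(escaped) == content)  # True
```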

Where is it causing problems?

Best regards, Sergiy
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

02 Jul 2019, 12:53

I'm not totally clear on this, but I do not recall having seen this happen when I am looking at www.statalist.org in a web browser, neither in the body of the post nor in the title of the topic. But I could be forgetting something, as I often do.

Where I have often seen it happen is in the display of the title of a Statalist post in my RSS or in the subject line of emails for a subscribed topic.

Dredging up memories from an earlier life: there are 5 characters that are special to HTML, so to get their ordinary, unspecial meaning they must be "escaped". In HTML that means using a character reference, which consists of an ampersand, followed by a "character name" or numeric encoding, followed by a semicolon. So the (suitably surrounded) character names

Code:

amp lt gt quot apos

represent

Code:

& < > " '

when the HTML is rendered for presentation.
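For concreteness, here is a short Python sketch (Python's standard `html` module is an assumption of convenience, not anything Statalist uses) that wraps each of the five names in an ampersand and a semicolon and decodes it. It also shows that `html.escape` writes the apostrophe as the numeric reference `&#x27;` rather than `&apos;`, one more case of several spellings for one character.

```python
import html

# The five character names listed above
names = ["amp", "lt", "gt", "quot", "apos"]

for name in names:
    # A full character reference: ampersand, name, semicolon
    ref = "&" + name + ";"
    print(f"{ref:8} renders as {html.unescape(ref)}")

# Escaping the five characters; note the apostrophe comes out as &#x27;
print(html.escape("&<>\"'"))  # &amp;&lt;&gt;&quot;&#x27;
```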

More on this at
https://en.wikipedia.org/wiki/HTML#C...ity_references
I hypothesize that the processes that extract text from the forum to create the RSS feed and the subscription emails do not substitute the original characters for the character references. That would explain examples like the following screenshot of this topic, with its title full of quotation marks, as it appears in my RSS reader. And I suspect it will appear similarly in the subscription emails I receive if there are further posts to this topic.
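William's hypothesis can be mimicked in a few lines of Python. The sketch assumes, hypothetically, that the topic title is stored in HTML-escaped form and that a feed or email generator copies it verbatim instead of decoding it; both function names are made up for illustration.

```python
import html

title = "Forum software's rendering of \"greater than,\" \"ampersand\" etc."

# Hypothetical: the title as stored in the page, with special characters escaped
stored = html.escape(title)

def feed_subject_verbatim(s: str) -> str:
    """Hypothetical extractor that copies the stored text as-is (the suspected bug)."""
    return s

def feed_subject_decoded(s: str) -> str:
    """Hypothetical extractor that replaces character references with the characters."""
    return html.unescape(s)

print(feed_subject_verbatim(stored))
# Forum software&#x27;s rendering of &quot;greater than,&quot; &quot;ampersand&quot; etc.
print(feed_subject_decoded(stored))
# Forum software's rendering of "greater than," "ampersand" etc.
```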
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

02 Jul 2019, 13:01

Having successfully gotten the screen shot into the previous post on the first try, I don't want to upset things by editing the post. (I do not have good luck inserting graphic attachments - often what looks good in preview is not so good when posted. Or ceases looking good if the post is edited. Or vanishes completely to be replaced by the sad little blue box with the question mark.)

My comment and Sergiy's crossed, and his triggered another memory. I think I have seen this substitution of character references in forum posts and the like when material has been copied from an earlier post and pasted into a subsequent post. (Sort of as if the copy got the raw HTML rather than the substituted characters.) But that's just a sense I have; I can't point to any examples; I don't remember the exact circumstances; and I've never tried to reproduce it.
Mike Lacy

Join Date: Apr 2014

Posts: 2417
#5

02 Jul 2019, 13:07

Thanks Bill and Sergiy. An example of this kind of problem occurs in the code I posted in a previous thread.
I was prompted to ask now because a reader was (understandably) confused by some errant symbols in my code. At least in my browser (not RSS), I see the same pesky stray "&gt;" and the like that confused the reader. Now, just out of curiosity, I'll put some symbols here to see what happens (less than, greater than, and ampersand), inside [CODE] tags. Everything looks fine in Preview, but as I recall from past experience, I've had posts look right in Preview but not later, bizarrely enough. Noting Bill's comments about copying, it occurs to me that I usually type up code in a text editor and paste it into Statalist posts, so I'll try that too.

Typed directly:

Code:

2 < 10 9 > 3 this & that

Pasted from a text editor:

Code:

2 < 10 9 > 3 this & that
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

02 Jul 2019, 13:25

Just a quick note as I leave for a while: I've looked at your previous example in my web browser and do see the same character references that you do. Again, I think it has something to do with how you copied the code from the earlier post of the same code and pasted it into a new code block. But it will take some experimentation to see if my vague ideas on how to reproduce the problem can pan out.

I do not think either typing directly or pasting from a text editor will create the problem demonstrated in your code block. I think it comes from copying from something that already contains HTML character references.
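One mechanism that would produce exactly the stray "&gt;" described above is double escaping: if text that already contains character references is escaped again, the ampersand of each reference is itself escaped, and a browser, which decodes only one level, then shows the reference instead of the character. A Python sketch, assuming (this is not confirmed anywhere in the thread) that the copied text keeps its references and is escaped a second time:

```python
import html

original = "9 > 3 this & that"

# First escaping: what the page source of the earlier post would contain
once = html.escape(original)   # 9 &gt; 3 this &amp; that

# Hypothetical second escaping, e.g. if that source text were pasted and escaped again
twice = html.escape(once)      # 9 &amp;gt; 3 this &amp;amp; that

# A browser decodes one level, so the reader sees the references
print(html.unescape(twice))    # 9 &gt; 3 this &amp; that

# Decoding twice recovers the original text
print(html.unescape(html.unescape(twice)) == original)  # True
```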

My experience is that preview is not always a faithful preview.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

02 Jul 2019, 17:09

A small amount of play in the Sandbox was not able to reproduce the problem you demonstrated in posts #9 & #10 of the referenced topic.

I now have seen two problems that I believe are both induced somehow by copying from CODE blocks. This is one of them.

The other is the problem we've seen where you copy code from a code block, paste it into Stata, and wind up with cryptic syntax errors for which the only cure is to delete the pasted line and retype it entirely. I spent a morning one day trying my best to reproduce that problem. At one point I had a posted code block in which, in a sequence of spaces, every other space had been replaced with the U+00A0 no-break space character. (Unfortunately I edited that post, and after I saved the changes, which were to an entirely different section of text, the problem was gone from the post.) So the forum software seems to be working some magic behind the scenes with CODE blocks, and that magic isn't always right.
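A U+00A0 no-break space looks identical to an ordinary space but is a different character, which is why such errors are so hard to spot by eye. A minimal Python sketch, assuming the pasted code is available as a string (the function names and the sample line are hypothetical), that locates no-break spaces and replaces them with ordinary spaces:

```python
NBSP = "\u00a0"  # no-break space: displayed like " " but a distinct character

def find_nbsp(line: str) -> list[int]:
    """Return the positions of no-break spaces in a line of pasted code."""
    return [i for i, ch in enumerate(line) if ch == NBSP]

def normalize_spaces(line: str) -> str:
    """Replace every no-break space with an ordinary ASCII space (U+0020)."""
    return line.replace(NBSP, " ")

# A hypothetical pasted line in which every other space became a no-break space
pasted = "gen" + " " + "y" + NBSP + "=" + " " + "x" + NBSP + "*" + " " + "2"
print(find_nbsp(pasted))         # [5, 9]
print(normalize_spaces(pasted))  # gen y = x * 2
```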

This is clearly Not a Good Thing. The stuff I described in post #3 is just another annoying side effect of modern complexity, although it deserves to be fixed. But problems with CODE blocks really should be fixed. Perhaps this should be raised with either the moderators (at Contact Us) or with Stata Technical Services.
Mike Lacy

Join Date: Apr 2014

Posts: 2417
#8

03 Jul 2019, 07:29

Bill, thanks for your work here. It is possible, I think, that I was copying from CODE blocks while preparing these postings. Maybe I can come up with something close enough to a reproducible example to offer to Tech Services.