
Perl: Safely handling and showing European characters on an HTML page

I have updated my Quiz Engine, as used on the Penguin Tutor LPI Exams site, and will be applying the same change to the First Aid Quiz web site when it is upgraded to the latest version.

The problem I had was in accepting user names in a secure manner. The quiz asks the user for a name, which is shown during the quiz and is logged. The user may just leave it blank and so end up as an anonymous user. I'm always paranoid about input from users, so I use strict checks to ensure that the input cannot be used to attack the computer.

In my earlier version the program would only accept alphanumeric characters and spaces. As the First Aid Quiz web site is predominantly UK based, this didn't cause too many problems. I did get the occasional person with an apostrophe (') in their name, or the occasional ampersand (&). The Penguin Tutor LPI Quiz has a much wider international appeal, and as a result there are more people wanting to use accents etc. in their name.

I have therefore added support for more characters. This may still not be perfect, but it should stop a few more people getting error messages.

The method that I have used is pretty much a manual list of regular expression substitutions that convert certain characters into their HTML entity equivalents. The security checking is then performed after this to ensure that no other invalid characters are used.

This is a manual list of over 50 characters, so I first check whether any of these translations are needed at all, using:

if ($value =~ /^[\w ]*$/) {return $value;}
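One detail of this check is worth illustrating. In Perl, `$` also matches just before a trailing newline, and `\w` can match accented letters when the string is treated as Unicode, so the fast path may let through more than plain ASCII. The sketch below is a stricter variant, not the Quiz Engine's own code: the helper name `is_plain_name` is made up for illustration, and the `/a` modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the name contains only ASCII letters,
# digits, underscores and spaces, so no translation is needed.
sub is_plain_name {
    my ($value) = @_;
    # \A and \z anchor to the real start and end of the string
    # (unlike $, \z does not allow a trailing newline).
    # /a restricts \w to ASCII [A-Za-z0-9_] (needs Perl 5.14+).
    return $value =~ /\A[\w ]*\z/a ? 1 : 0;
}

print is_plain_name("John Smith"), "\n";   # plain name
print is_plain_name("O'Brien"), "\n";      # apostrophe needs translating
print is_plain_name("Smith\n"), "\n";      # trailing newline is not plain
```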

I then handle a few special characters that I allow:

$value =~ s/\&/\&amp;/g;
$value =~ s/\'/\&apos;/g;
$value =~ s/\"/\&quot;/g;
$value =~ s/\,/\&cedil;/g;

Note that the entity for the single quote (&apos;) will only validate with XHTML/XML, and not with HTML 4. Technically this could break the compliance of the current version of the Quiz Engine, but as I hope to convert it to XHTML in future it makes more sense to leave it in.

It is important that the first entry is the one for the ampersand (&). If it came any later, it would also replace the ampersands introduced by the other translations.
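The effect of getting the order wrong is easy to show with a small experiment. This is only an illustrative sketch; the two subroutine names are made up, and only the ampersand and double quote are handled.

```perl
use strict;
use warnings;

# Correct order: escape ampersands first, then the double quote.
sub escape_amp_first {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;
    $value =~ s/"/&quot;/g;
    return $value;
}

# Wrong order: the ampersand pass re-escapes the "&" inside "&quot;".
sub escape_amp_last {
    my ($value) = @_;
    $value =~ s/"/&quot;/g;
    $value =~ s/&/&amp;/g;
    return $value;
}

print escape_amp_first('Tom & "Jerry"'), "\n";  # Tom &amp; &quot;Jerry&quot;
print escape_amp_last('Tom & "Jerry"'), "\n";   # Tom &amp; &amp;quot;Jerry&amp;quot;
```

With the wrong order a browser would display the literal text &quot; instead of a quotation mark.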

The rest of the entries are similar, but for different characters. I found the correct entity names on the HTML Codes page.

$value =~ s/Ÿ/\&Yuml;/g;
$value =~ s/À/\&Agrave;/g;
$value =~ s/Á/\&Aacute;/g;
$value =~ s/Â/\&Acirc;/g;
$value =~ s/Ã/\&Atilde;/g;
$value =~ s/Ä/\&Auml;/g;
$value =~ s/Å/\&Aring;/g;
$value =~ s/Æ/\&AElig;/g;
$value =~ s/Ç/\&Ccedil;/g;
$value =~ s/È/\&Egrave;/g;
$value =~ s/É/\&Eacute;/g;
$value =~ s/Ê/\&Ecirc;/g;
$value =~ s/Ë/\&Euml;/g;
$value =~ s/Ì/\&Igrave;/g;
$value =~ s/Í/\&Iacute;/g;
$value =~ s/Î/\&Icirc;/g;
$value =~ s/Ï/\&Iuml;/g;
$value =~ s/Ò/\&Ograve;/g;
$value =~ s/Ó/\&Oacute;/g;
$value =~ s/Ô/\&Ocirc;/g;
$value =~ s/Õ/\&Otilde;/g;
$value =~ s/Ö/\&Ouml;/g;
$value =~ s/Ù/\&Ugrave;/g;
$value =~ s/Ú/\&Uacute;/g;
$value =~ s/Û/\&Ucirc;/g;
$value =~ s/Ü/\&Uuml;/g;
$value =~ s/Ý/\&Yacute;/g;
$value =~ s/ß/\&szlig;/g;
$value =~ s/à/\&agrave;/g;
$value =~ s/á/\&aacute;/g;
$value =~ s/â/\&acirc;/g;
$value =~ s/ã/\&atilde;/g;
$value =~ s/ä/\&auml;/g;
$value =~ s/å/\&aring;/g;
$value =~ s/æ/\&aelig;/g;
$value =~ s/ç/\&ccedil;/g;
$value =~ s/è/\&egrave;/g;
$value =~ s/é/\&eacute;/g;
$value =~ s/ê/\&ecirc;/g;
$value =~ s/ë/\&euml;/g;
$value =~ s/ì/\&igrave;/g;
$value =~ s/í/\&iacute;/g;
$value =~ s/î/\&icirc;/g;
$value =~ s/ï/\&iuml;/g;
$value =~ s/ñ/\&ntilde;/g;
$value =~ s/ò/\&ograve;/g;
$value =~ s/ó/\&oacute;/g;
$value =~ s/ô/\&ocirc;/g;
$value =~ s/ö/\&ouml;/g;
$value =~ s/ù/\&ugrave;/g;
$value =~ s/ú/\&uacute;/g;
$value =~ s/û/\&ucirc;/g;
$value =~ s/ü/\&uuml;/g;
$value =~ s/ý/\&yacute;/g;
$value =~ s/ÿ/\&yuml;/g;

There are possibly better ways of doing this. For example, the entries could have been held in an array, which would have looked cleaner but might be even less efficient. Alternatively, I could order the list so that the more common characters are checked first, and have it periodically re-check whether any special characters remain so that the rest of the list can be skipped.
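As a rough sketch of the cleaner table-driven approach, the entries could be held in a hash and applied with a single substitution. This is not what the Quiz Engine does; the names `%entity_for` and `encode_with_table` are made up, the table is deliberately partial, and it assumes the input has already been decoded into characters (raw UTF-8 bytes would not match).

```perl
use strict;
use warnings;

# Partial lookup table for illustration (character => HTML entity).
# Non-ASCII keys are written as escapes so the source encoding does not matter.
my %entity_for = (
    '&'      => '&amp;',
    '"'      => '&quot;',
    "\x{e9}" => '&eacute;',
    "\x{f1}" => '&ntilde;',
    "\x{fc}" => '&uuml;',
);

# Build one character class such as [\x{22}\x{26}\x{e9}...] from the keys.
my $class   = join '', map { sprintf('\\x{%x}', ord $_) } sort keys %entity_for;
my $special = qr/([$class])/;

sub encode_with_table {
    my ($value) = @_;
    # Single pass: replaced text is not rescanned, so the ampersand
    # needs no special ordering here.
    $value =~ s/$special/$entity_for{$1}/g;
    return $value;
}

print encode_with_table("Ren\x{e9}e & M\x{fc}ller"), "\n";  # Ren&eacute;e &amp; M&uuml;ller
```

Because the string is scanned once rather than once per character, this may also turn out cheaper than the long list, although I have not measured it.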

Note that this is designed to be included in a subroutine (which you can see from the return). Pass the client-entered string into $value in the subroutine, and return it at the end.
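Put together, the subroutine might be shaped roughly like this. It is a sketch under assumptions: the name `safe_name` is made up, the translation list is abbreviated to four entries, and the final security check is only a guess at a suitable rule (word characters, spaces and named entities), since the real check is not shown here.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns the encoded name, or undef if it is rejected.
sub safe_name {
    my ($value) = @_;
    $value = '' unless defined $value;

    # Fast path: nothing to translate (/a needs Perl 5.14+).
    return $value if $value =~ /\A[\w ]*\z/a;

    # Translations, ampersand first (abbreviated list).
    $value =~ s/&/&amp;/g;
    $value =~ s/'/&apos;/g;
    $value =~ s/"/&quot;/g;
    $value =~ s/\x{e9}/&eacute;/g;

    # Assumed security check: after translation, allow only word
    # characters, spaces and named entities; reject anything else.
    return undef unless $value =~ /\A(?:[\w ]|&[A-Za-z]+;)*\z/a;
    return $value;
}

my $name = safe_name("O'Brien");
print defined $name ? $name : 'rejected', "\n";   # O&apos;Brien
```

A caller would then treat an undefined result as an invalid name and show the error message.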

If anyone has any other suggestions, please post a comment with the details. In the meantime, this can be inserted into your Perl code.
