How I learned to start worrying and love utf-8
If you're only dealing with English-language tunes, and only on this site, none of this should hurt the head too much and you can probably survive without knowing about it. If you want to use accented characters, you should probably read it. If those characters come from anywhere outside the Latin1 regions (Northwest Europe & the Americas ?) you definitely should, and even more so if you plan to be downloading tunes involving such characters for use with other ABC programs.
The bottom line here is that ABC programs have mostly been used for tunes of the Latin1 regions, and don't necessarily support attempts to use them outside that context. I only know of one ABC program that provides any coherent handling of character sets other than latin1 (iso-8859-1). I regard this limitation as a misfeature which they'll probably grow out of eventually, and have chosen not to propagate it here, preferring to run things on utf8. Thus, I expect the "web-page" aspects of this site - the listings of tune-titles, the forms, etc - to be able to deal with any characters you can find a way to input. If they can't, it's a bug and please complain.
This has the less-than-ideal result of allowing you to enter things that the ABC programs won't be happy with. Wherever the tune is provided as an image, or as MIDI, the site doesn't generate these itself; it makes use of the existing programs, which can't necessarily cope with all of the things you might have input. This imposes limitations on what can be represented in the images generated here. As of now, one of the character sets Latin 1-6 (otherwise known as iso-8859 1-6) has to be specified to the underlying ABC program, for each tune, in order to have the text displayed properly in an image. The site takes care of doing this, of course, but the underlying restriction is unavoidable - characters that aren't found in one of these just "won't work". There can also be unusable combinations of otherwise-possible characters, where they don't both occur in the same one of the character sets. The '©' (copyright) character, for instance, only seems to exist in latin1, so attempts to use it with characters that can only be found in one of the others will fail. (That's why this site doesn't use it when placing text from the %%Copyright line into images.) Characters that can't be handled will simply be displayed as a different, wrong, character (or possibly 2 of them); so far as I'm aware, the rest of the tune data will be unaffected and should be displayed normally.
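To see why some combinations are unusable, here is a rough Python sketch (not this site's actual code) that checks which of the six sets can hold a given piece of text. It assumes, as the text above does, that the six sets correspond to iso-8859-1 to iso-8859-6, which Python exposes as the codecs `iso8859_1` to `iso8859_6`; the function names are made up for illustration.

```python
# Sketch only: assumes sets 1-6 are Python's iso8859_1 .. iso8859_6 codecs.
LATIN_SETS = range(1, 7)


def sets_containing(ch):
    """Return the set numbers (1-6) that can represent one character."""
    found = set()
    for n in LATIN_SETS:
        try:
            ch.encode(f"iso8859_{n}")
        except UnicodeEncodeError:
            continue  # this set has no slot for the character
        found.add(n)
    return found


def usable_sets(text):
    """Return the set numbers that can represent every character in text."""
    usable = set(LATIN_SETS)
    for ch in text:
        usable &= sets_containing(ch)
    return usable


if __name__ == "__main__":
    for sample in ("Dvořák", "©", "Dvořák ©"):
        print(sample, sorted(usable_sets(sample)))
```

An empty result for a title means no single set can carry it, which is the "unusable combination" case: each character is fine somewhere, but not all in the same place.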
Input
- Input via an ABC file upload
- Follows the same rules and makes the same assumptions as other ABC programs, so if a file works there it should be fine here too. The usual ABC default of Latin1 (iso-8859-1) is assumed unless you specify otherwise. If you want to use any other character set you say so by writing a special magic line at the top of the file. There are 2 possible forms these lines can take :-
- %%encoding <N>
- Where <N> is a number, between 1 and 6.
abcm2ps uses this to specify a character set Latin 1-6. I don't know of any other programs that actually implement any explicit handling of character sets at all, so if you need this feature, it's probable that this is how you're doing it, and thus that your files will already include a suitable such line.
This line should appear before the start of the tune(s) it applies to (i.e. before the X: line, not in an actual tune header). For uploading to this site, any such line can be overridden by a subsequent one, making it possible to mix character sets in a single file. (I think this doesn't work for abcm2ps, but it does here.)
- %%abc-charset <charset>
- There have been many suggestions of "improving" ABC, over the years. One such attempt suggested that a line in the above form should specify the character set in use, where <charset> would be something like 'iso-8859-3' or 'utf8'. This was intended, I think, to cover a much fuller set of possibilities than the Latin1-6 of the abcm2ps form described above, but in practice it has the slight disadvantage that no programs implement it. It will work here, in the sense that this site will recognise such lines and accept any valid encodings it specifies (WHICH ARE WHAT ? LIST 'EM). But as before, if this results in characters that can't be fitted into Latin 1-6, they won't be rendered properly in the images until we have ABC rendering programs that support them.
- input via the editor form
- A caveat: the machine I'm developing this on runs on utf8, and everything's transparent; but I realise I'm not clear what will happen when people use this from a machine running a 'legacy' 8bit character set. I daresay it'll become clear in due course. Set your things up to use utf8 if you can, you know it makes sense.
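The upload rules above (Latin1 by default, either form of magic line switching the character set, a later line overriding an earlier one) can be sketched roughly as follows. This is a guess at the shape of the logic, not the site's real code: `decode_abc` is a made-up name, the optional colon after `%%abc-charset` is accepted only because both spellings appear on this page, and it doesn't check that the line sits before an X: line. It also assumes ASCII-compatible encodings, since it splits the raw bytes into lines before decoding.

```python
import re

# Sketch only: both directive forms, matched on the raw bytes of a line.
ENCODING_RE = re.compile(rb"^%%encoding\s+([1-6])$")
CHARSET_RE = re.compile(rb"^%%abc-charset:?\s+([A-Za-z0-9_.-]+)$")


def decode_abc(raw):
    """Decode an uploaded ABC file to text, honouring charset lines."""
    current = "iso8859_1"  # the usual ABC default, Latin1
    out = []
    for line in raw.splitlines():
        stripped = line.strip()
        m = ENCODING_RE.match(stripped)
        if m:
            current = "iso8859_" + m.group(1).decode("ascii")
        else:
            m = CHARSET_RE.match(stripped)
            if m:
                name = m.group(1).decode("ascii")
                try:
                    # Probe the codec the same way it will be used below.
                    b"a".decode(name, errors="replace")
                    current = name
                except (LookupError, UnicodeError):
                    pass  # unknown charset: keep the previous one
        out.append(line.decode(current, errors="replace"))
    return "\n".join(out)
```

Because the current character set is just a variable that each magic line updates, mixing character sets in one file falls out for free: every tune is decoded with whatever was most recently declared above it.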
Output
- ABC Downloads from this site
- If you download a list of ABC tunes from here, character-set lines as described above will be generated as appropriate (both will be given wherever they can be, in an attempt to be as helpful as possible). The site will try to find a single character set that can express all the characters used in the file; but there may not be one that includes all the characters in all selected tunes, in which case it will fall back on issuing appropriate lines immediately before the tune(s) in question. I think abcm2ps does not handle this, but will only accept one such line per file. Thus, the site may give you a file which abcm2ps isn't capable of dealing with, even if each individual tune is valid in abcm2ps. If it finds a tune containing characters, or combinations of characters, that just can't be expressed in one of the Latin1-6 sets, it will not give an 'abcm2ps' style line, since there isn't one, but will send it as utf8, with a '%%abc-charset: utf8' line, since there's no other possibility that won't result in corruption. As I indicated already, I don't know of any ABC programs that will deal with this.
A caveat here - my knowledge of how other programs handle these issues is out of date. Wherever I make a statement about the behaviour of other ABC programs, you should understand it as meaning that that was how it used to be, and that I know of no indications that anything has changed since. If anyone knows of any programs to which I am doing an injustice in this respect, please correct me. I'll take another look round when I get time, and see what they're up to these days. I hope to be wrong, eventually.
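The download behaviour described above (one character set for the whole file if possible, otherwise lines before each tune, with utf8 as the last resort) might look something like the sketch below. The function names are invented, the six sets are again assumed to be Python's `iso8859_1` to `iso8859_6`, and I've read "send it as utf8" as applying to the individual tune rather than the whole file. The colon in the `%%abc-charset:` line follows the form quoted for downloads; the upload section writes it without one.

```python
# Sketch only, under the assumptions stated in the text above.

def latin_set_for(text):
    """Return the first N in 1-6 whose set holds all of text, else None."""
    for n in range(1, 7):
        try:
            text.encode(f"iso8859_{n}")
        except UnicodeEncodeError:
            continue
        return n
    return None


def charset_lines(n):
    """Both line forms when a Latin set fits, else only the utf8 one."""
    if n is None:
        return "%%abc-charset: utf8\n"
    return f"%%encoding {n}\n%%abc-charset: iso-8859-{n}\n"


def codec_for(n):
    return "utf-8" if n is None else f"iso8859_{n}"


def build_download(tunes):
    """Build the bytes of a download from a list of tune texts."""
    whole = latin_set_for("".join(tunes))
    if whole is not None:
        # One set covers everything: a single pair of lines at the top.
        text = charset_lines(whole) + "\n\n".join(tunes) + "\n"
        return text.encode(codec_for(whole))
    # Otherwise fall back to lines immediately before each tune.
    chunks = []
    for tune in tunes:
        n = latin_set_for(tune)
        chunks.append((charset_lines(n) + tune + "\n\n").encode(codec_for(n)))
    return b"".join(chunks)
```

Note that the fallback path produces a file whose bytes are in more than one encoding, which is exactly the kind of file a one-line-per-file program can't be expected to read.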
But the specs say ...
That you do accented characters via "TeX-style escape sequences". Yes, indeed. Every letter a special case. This site doesn't handle these in any special way; whichever of these sequences work with your program of choice will continue to work here. Consult the documentation for that program if you want to know what they are. It would be nice to handle these gracefully. Translating them for the listings shouldn't be too hard; searching/filtering would probably be messier. I'm not clear how widely used they are these days; much of what I see ignores them in favour of total naïvety. All the ones I know of are latin1, anyway.
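For what it's worth, translating them for the listings could be as small as the following sketch. It is not something this site does. It assumes a handful of the common accent escapes (backslash, an accent mark, then a letter, such as `\'e` for é) and which sequences your program actually accepts may differ; `expand_escapes` and the table are illustrative names only.

```python
import re
import unicodedata

# Sketch only: a small, assumed subset of accent escapes, mapped to the
# Unicode combining mark that produces the same accent.
COMBINING = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis / umlaut
}
ESCAPE_RE = re.compile(r"""\\([`'^~"])([A-Za-z])""")


def expand_escapes(text):
    """Replace escapes like \\'e with the accented character."""
    def repl(m):
        mark = COMBINING[m.group(1)]
        composed = unicodedata.normalize("NFC", m.group(2) + mark)
        # Leave the escape alone if no single accented character exists.
        return composed if len(composed) == 1 else m.group(0)
    return ESCAPE_RE.sub(repl, text)
```

Composing via Unicode normalisation avoids the "every letter a special case" table, at the cost of only covering escapes that are a plain accent on a plain letter.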