[cups.general] Character set encoding names

Michael Sweet mike at easysw.com
Wed Aug 16 18:11:03 PDT 2006


Tim Waugh wrote:
> On Wed, 2006-08-16 at 14:51 -0400, Michael Sweet wrote:
>> Tim Waugh wrote:
>>> ...
>>> 1. Are filters expected to be able to use $CHARSET as an encoding name
>>> suitable for passing to iconv_open(), and if not, how are they meant to
>>> interpret it?
>> CHARSET will be the ISO-registered name for the character set,
>> which is rarely the same as the locale's charset name.  To make
>> matters worse, some character sets are known by multiple names.
> 
> Can't we use the IANA names, so that the filters can use iconv_open()?

First, IANA provides both the ISO-defined names and non-ISO alternate
names that have been registered with them.  We ONLY use the ISO names
and make no attempt to track the original/alternate name that was used.

Second, locales generally don't follow IANA or ISO naming - think
iso-8859-1 vs ISO8859-1, etc.  Since each OS vendor has adopted
slightly different naming conventions, it is pretty much impossible
to support every possible locale charset name - we can only address
the common ones that have a corresponding ISO name we support.
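
The spelling differences described above can at least be detected by
comparing names in a normalized form.  A minimal sketch in Python,
assuming that folding case and dropping punctuation is an acceptable
comparison; the helper names are hypothetical and not part of CUPS:

```python
def normalize_charset(name):
    """Fold case and drop punctuation so spelling variants compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def same_charset(a, b):
    """Return True when two names differ only in case or punctuation."""
    return normalize_charset(a) == normalize_charset(b)
```

This only catches spelling variants.  It does not help with aliases,
where one character set is known by several unrelated names.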

Finally, iconv_open() is not portable, nor are the character set
names it accepts.  Even UTF-8 is not guaranteed on non-Linux
systems... :(

.....

As for using iconv_open() in a filter, don't use it unless you need
to convert data in a document file and don't care about portability.
Even then, you should not depend on CHARSET to provide you with the
character set - CHARSET exists only for plain text files, and even
there it is a guess based on the user's locale.
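
One way to treat CHARSET as a hint rather than a fact is sketched
below in Python.  The order of attempts (strict UTF-8 first, then the
hinted name, then a single-byte fallback) is an assumption made for
illustration, not something CUPS does, and decode_text is a
hypothetical helper.  Note that Python's codec names are yet another
naming scheme, so an unknown hint is simply skipped:

```python
def decode_text(data, hinted_charset=None):
    """Decode plain-text bytes, using the hinted charset only as a guess."""
    # Strict UTF-8 decoding rarely succeeds by accident, so try it first.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # The hint may be wrong, or may not be a name this runtime knows.
    if hinted_charset:
        try:
            return data.decode(hinted_charset)
        except (LookupError, UnicodeDecodeError):
            pass
    # Last resort: latin-1 maps every byte, so this cannot fail.
    return data.decode("latin-1")
```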

ALL messages sent to stderr MUST be in UTF-8 - that's all that the
scheduler uses, and the command-line and web interfaces depend on
text strings using UTF-8...
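
For a filter written in a scripting language, the safest way to meet
this requirement is to encode the message explicitly instead of
relying on the locale.  A minimal Python sketch; the "LEVEL: text"
prefix form follows the usual CUPS filter convention rather than
anything stated in this message, and both function names are
hypothetical:

```python
import sys


def cups_message(level, text):
    """Return one status line for the scheduler as UTF-8 bytes."""
    return f"{level}: {text}\n".encode("utf-8")


def log(level, text):
    # Write bytes directly so the locale's encoding is never applied.
    sys.stderr.buffer.write(cups_message(level, text))
    sys.stderr.buffer.flush()


if __name__ == "__main__":
    log("INFO", "Converting text file")
```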

>>> 2. Should CUPS ship a windows-932 charset file, or is it a bug that
>>> texttops looks for one?
>> texttops currently does not support double-byte or variable-byte
>> text encodings other than UTF-8, thus there is no file for windows-932
>> aka Shift JIS aka WINDOWS-31J.
> 
> I'm happy to use our own filter (based on paps) -- the trouble is that
> I'll need to make a look-up table to work out what charset name to give
> to iconv_open() based on $CHARSET. :-(
> 
> The same will be true of any 3rd party text/plain filter, of course.
> Wouldn't it make more sense for CUPS to give the real suitable-for-iconv
> charset name to the filters?  Even in a separate environment variable,
> if you like?

If it were a simple one-to-one mapping, sure.  Unfortunately, that is
*not* the case...  Given that the locale charset names are highly
platform-specific, your best bet is to provide a Red Hat-specific
lookup table from ISO name to iconv name.
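
Such a table might look like the Python sketch below.  The entries
are assumptions for a glibc-based system and would need checking
against the iconv implementation actually shipped; CUPS_TO_ICONV and
iconv_name are hypothetical names, and falling back to UTF-8 for an
unknown or missing value is a design choice, not CUPS behavior:

```python
import os

# Hypothetical table: CHARSET value from CUPS -> name to try with iconv.
# The right-hand names are assumptions for one platform, not a standard.
CUPS_TO_ICONV = {
    "utf-8": "UTF-8",
    "iso-8859-1": "ISO-8859-1",
    "iso-8859-15": "ISO-8859-15",
    "windows-932": "CP932",
}


def iconv_name(cups_charset, default="UTF-8"):
    """Map a CUPS charset name to an iconv name, or return the default."""
    if not cups_charset:
        return default
    return CUPS_TO_ICONV.get(cups_charset.strip().lower(), default)


if __name__ == "__main__":
    print(iconv_name(os.environ.get("CHARSET")))
```

A filter that would rather refuse a job than guess can pass
default=None and check the result before converting anything.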

(If you do supply your own text filter, make sure you support ALL of
the standard CUPS options!)

-- 
______________________________________________________________________
Michael Sweet, Easy Software Products           mike at easysw dot com
Internet Printing and Publishing Software        http://www.easysw.com



