Archon::Utilities::CharEnc Struct Reference

Handles transcoding character sequences between various character encodings.

#include <archon/util/charenc.H>


Public Member Functions

 CharEnc (string sourceEncoding, string targetEncoding)
 Construct an object that can be used for incremental transcoding.
 ~CharEnc ()
void transcode (const char *&sourceBuffer, size_t &sourceBytesLeft, char *&targetBuffer, size_t &targetBytesLeft, bool fail=false) throw (TranscodeException)

Static Public Member Functions

static string encode (wstring s, string encoding=UTF_8, bool fail=false) throw (TranscodeException)
static wstring decode (string s, string encoding=UTF_8, bool fail=false) throw (TranscodeException)
 Illegal input characters will be converted to the Unicode replacement character.
static string transcode (string s, string sourceEncoding, string targetEncoding, bool fail=false) throw (TranscodeException)
 Illegal characters in the input will be converted to the Unicode replacement character or another suitable replacement character available in the target encoding.

Static Public Attributes

static const string US_ASCII = "US-ASCII"
 Classical American 7-bit encoding.
static const string ISO_8859_1 = "ISO-8859-1"
 ISO Latin 1 encoding.
static const string ISO_8859_15 = "ISO-8859-15"
 ISO Latin 9 encoding, a revision of ISO Latin 1 that adds the Euro sign.
static const string UTF_8 = "UTF-8"
 ISO 8-bit variable length Unicode (UCS) encoding.
static const string UTF_16LE = "UTF-16LE"
 ISO 16-bit little-endian variable length Unicode (UCS) encoding.
static const string UTF_16BE = "UTF-16BE"
 ISO 16-bit big-endian variable length Unicode (UCS) encoding.
static const string UTF_32LE = "UTF-32LE"
 ISO 32-bit little-endian fixed length Unicode (UCS) encoding.
static const string UTF_32BE = "UTF-32BE"
 ISO 32-bit big-endian fixed length Unicode (UCS) encoding.
static const string WINDOWS_1252 = "WINDOWS-1252"
 MS Windows extension of US-ASCII. It differs from ISO Latin 1 in the range 0x80-0x9F, where it assigns printable characters.

Classes

struct  TranscodeException

Detailed Description

Handles transcoding character sequences between various character encodings.

See also:
http://www.iana.org/assignments/character-sets

Definition at line 37 of file charenc.H.


Constructor & Destructor Documentation

Archon::Utilities::CharEnc::CharEnc(string sourceEncoding, string targetEncoding)

Construct an object that can be used for incremental transcoding.

This is needed when transcoding a stream, where the complete input is never available at once.

Parameters:
sourceEncoding The encoding of the source for subsequent transcoding operations. See the IANA registry for the complete list of character encodings.
targetEncoding The encoding of the target for subsequent transcoding operations. See the IANA registry for the complete list of character encodings.
See also:
http://www.iana.org/assignments/character-sets

Definition at line 128 of file charenc.C.

References Archon::Utilities::Text::toString().


Member Function Documentation

wstring Archon::Utilities::CharEnc::decode(string s, string encoding = UTF_8, bool fail = false) throw (TranscodeException) [static]

Illegal input characters will be converted to the Unicode replacement character.

Parameters:
encoding The encoding of the input string. See the IANA registry for the complete list of character encodings.
s A string holding encoded characters.
fail Pass true if you want an exception when an input character cannot be converted. This happens either when the input is malformed according to the specified source encoding or when the input character cannot be represented in the target encoding. The default is to output a replacement character in these cases.
Returns:
A string of un-encoded Unicode (UCS) characters.
Exceptions:
TranscodeException Never thrown unless fail = true.
See also:
http://www.iana.org/assignments/character-sets

Definition at line 84 of file charenc.C.

References transcode().
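
The effect of the fail parameter can be sketched without the library. The helper below is a hypothetical stand-in: decodeAsciiSketch and SketchTranscodeError are illustrative names, not part of Archon, and the sketch handles only US-ASCII input. It is not the implementation behind decode(), only a minimal illustration of the documented contract.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for TranscodeException; not the Archon type.
struct SketchTranscodeError : std::runtime_error {
    explicit SketchTranscodeError(const std::string& what)
        : std::runtime_error(what) {}
};

// Decode US-ASCII bytes into wide characters.
// Bytes >= 0x80 are illegal in US-ASCII: they become U+FFFD unless
// fail is true, in which case an exception is thrown instead.
std::wstring decodeAsciiSketch(const std::string& s, bool fail = false) {
    std::wstring out;
    out.reserve(s.size());
    for (unsigned char byte : s) {
        if (byte < 0x80) {
            out.push_back(static_cast<wchar_t>(byte));
        } else if (fail) {
            throw SketchTranscodeError("illegal byte in US-ASCII input");
        } else {
            out.push_back(static_cast<wchar_t>(0xFFFD));
        }
    }
    return out;
}
```

With fail left at its default, one illegal byte yields one replacement character and decoding continues. With fail set to true, the first illegal byte aborts the call.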

string Archon::Utilities::CharEnc::encode(wstring s, string encoding = UTF_8, bool fail = false) throw (TranscodeException) [static]

Parameters:
encoding The encoding of the output string. See the IANA registry for the complete list of character encodings.
s A string of un-encoded Unicode (UCS) characters.
fail Pass true if you want an exception when an input character cannot be converted. This happens either when the input is malformed according to the specified source encoding or when the input character cannot be represented in the target encoding. The default is to output a replacement character in these cases.
Returns:
A string holding encoded characters.
Exceptions:
TranscodeException Never thrown unless fail = true.
See also:
http://www.iana.org/assignments/character-sets
Todo:
Should utilize Autoconf's endianness detection macro.

Definition at line 62 of file charenc.C.
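
The case where an input character cannot be represented in the target encoding can be sketched as follows. The function encodeLatin1Sketch is a hypothetical stand-in that targets ISO-8859-1 only, and the choice of '?' as the replacement character is an assumption; this page does not say which replacement encode() emits.

```cpp
#include <stdexcept>
#include <string>

// Encode wide characters as ISO-8859-1 bytes (illustrative stand-in only).
// Code points above U+00FF have no ISO-8859-1 representation: they become
// '?' (an assumed replacement) unless fail is true, in which case an
// exception is thrown.
std::string encodeLatin1Sketch(const std::wstring& s, bool fail = false) {
    std::string out;
    out.reserve(s.size());
    for (wchar_t ch : s) {
        unsigned long cp = static_cast<unsigned long>(ch);
        if (cp <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else if (fail) {
            throw std::runtime_error("character not representable in ISO-8859-1");
        } else {
            out.push_back('?');
        }
    }
    return out;
}
```

Here U+00E9 passes through as the single byte 0xE9, while U+20AC (the Euro sign) is replaced or rejected, because ISO-8859-1 has no code for it.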

void Archon::Utilities::CharEnc::transcode(const char *& sourceBuffer, size_t & sourceBytesLeft, char *& targetBuffer, size_t & targetBytesLeft, bool fail = false) throw (TranscodeException)

Parameters:
sourceBuffer Input buffer containing characters in the source encoding that was passed to the constructor of this class.
sourceBytesLeft The number of valid bytes of input remaining in the input buffer. Note that this is not necessarily the same as the number of characters.
targetBuffer Output buffer which will be filled with characters in the target encoding that was passed to the constructor of this class.
targetBytesLeft The remaining free space, in bytes, available in the output buffer. Note that this is not necessarily the same as the number of characters which will fit into the output buffer.
fail Pass true if you want an exception when an input character cannot be converted. This happens either when the input is malformed according to the specified source encoding or when the input character cannot be represented in the target encoding. The default is to output a replacement character in these cases.

Definition at line 167 of file charenc.C.

References n, and transcode().
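
The calling pattern for incremental transcoding can be sketched with a stand-in converter. Latin1ToUtf8Sketch and transcodeAll are hypothetical names, not Archon code. The sketch assumes that the member function advances both buffer pointers and decrements both byte counts in place, which the reference parameters suggest but this page does not state.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in with the same parameter shape as the member
// transcode(); it converts ISO-8859-1 to UTF-8 and is not Archon code.
struct Latin1ToUtf8Sketch {
    // Converts as many whole characters as fit. All four arguments are
    // updated in place, which is an assumption about the real method.
    void transcode(const char*& src, std::size_t& srcLeft,
                   char*& dst, std::size_t& dstLeft) const {
        while (srcLeft > 0) {
            unsigned char byte = static_cast<unsigned char>(*src);
            std::size_t need = byte < 0x80 ? 1 : 2;
            if (dstLeft < need) return;  // target full; caller must drain it
            if (byte < 0x80) {
                *dst++ = static_cast<char>(byte);
            } else {
                *dst++ = static_cast<char>(0xC0 | (byte >> 6));
                *dst++ = static_cast<char>(0x80 | (byte & 0x3F));
            }
            dstLeft -= need;
            ++src;
            --srcLeft;
        }
    }
};

// Drive the converter with a deliberately small target buffer, draining
// the buffer into a std::string after each call.
std::string transcodeAll(const std::string& input) {
    Latin1ToUtf8Sketch converter;
    const char* src = input.data();
    std::size_t srcLeft = input.size();
    std::string output;
    char chunk[4];
    while (srcLeft > 0) {
        char* dst = chunk;
        std::size_t dstLeft = sizeof chunk;
        converter.transcode(src, srcLeft, dst, dstLeft);
        output.append(chunk, sizeof chunk - dstLeft);
    }
    return output;
}
```

The loop also shows why the byte counts differ from character counts: each ISO-8859-1 byte at or above 0x80 occupies two bytes in the UTF-8 output, so a four-byte target buffer may hold as few as two characters.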

string Archon::Utilities::CharEnc::transcode(string s, string sourceEncoding, string targetEncoding, bool fail = false) throw (TranscodeException) [static]

Illegal characters in the input will be converted to the Unicode replacement character or another suitable replacement character available in the target encoding.

Parameters:
sourceEncoding The encoding of the source string. See the IANA registry for the complete list of character encodings.
targetEncoding The encoding of the target string. See the IANA registry for the complete list of character encodings.
fail Pass true if you want an exception when an input character cannot be converted. This happens either when the input is malformed according to the specified source encoding or when the input character cannot be represented in the target encoding. The default is to output a replacement character in these cases.
Exceptions:
TranscodeException Never thrown unless fail = true.
See also:
http://www.iana.org/assignments/character-sets

Definition at line 106 of file charenc.C.

References transcode().

Referenced by decode(), Archon::Utilities::FormatLibjpeg::load(), Archon::Utilities::FormatLibpng::save(), Archon::Utilities::FormatLibjpeg::save(), and transcode().


The documentation for this struct was generated from the following files:
Generated on Sun Jul 30 22:57:18 2006 for Archon by  doxygen 1.4.4