codec_ext8.c File Reference

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <parserutils/charset/mibenum.h>
#include "charset/codecs/codec_impl.h"
#include "utils/endian.h"
#include "utils/utils.h"
#include "charset/codecs/ext8_tables.h"


Data Structures

struct  charset_ext8_codec
 Windows charset codec.

Defines

#define READ_BUFSIZE   (8)
#define WRITE_BUFSIZE   (8)

Functions

static bool charset_ext8_codec_handles_charset (const char *charset)
 Determine whether this codec handles a specific charset.
static parserutils_error charset_ext8_codec_create (const char *charset, parserutils_charset_codec **codec)
 Create an extended 8bit codec.
static parserutils_error charset_ext8_codec_destroy (parserutils_charset_codec *codec)
 Destroy an extended 8bit codec.
static parserutils_error charset_ext8_codec_encode (parserutils_charset_codec *codec, const uint8_t **source, size_t *sourcelen, uint8_t **dest, size_t *destlen)
 Encode a chunk of UCS-4 (big endian) data into extended 8bit.
static parserutils_error charset_ext8_codec_decode (parserutils_charset_codec *codec, const uint8_t **source, size_t *sourcelen, uint8_t **dest, size_t *destlen)
 Decode a chunk of extended 8bit data into UCS-4 (big endian).
static parserutils_error charset_ext8_codec_reset (parserutils_charset_codec *codec)
 Clear an extended 8bit codec's encoding state.
static parserutils_error charset_ext8_codec_read_char (charset_ext8_codec *c, const uint8_t **source, size_t *sourcelen, uint8_t **dest, size_t *destlen)
 Read a character from extended 8bit input and output it as UCS-4 (big endian).
static parserutils_error charset_ext8_codec_output_decoded_char (charset_ext8_codec *c, uint32_t ucs4, uint8_t **dest, size_t *destlen)
 Output a UCS-4 character (big endian).
static parserutils_error charset_ext8_from_ucs4 (charset_ext8_codec *c, uint32_t ucs4, uint8_t **s, size_t *len)
 Convert a UCS-4 (host endian) character to extended 8bit.
static parserutils_error charset_ext8_to_ucs4 (charset_ext8_codec *c, const uint8_t *s, size_t len, uint32_t *ucs4)
 Convert an extended 8bit character to UCS-4 (host endian).

Variables

struct {
   uint16_t   mib
   const char *   name
   size_t   len
   uint32_t *   table
} known_charsets []
const parserutils_charset_handler charset_ext8_codec_handler


Define Documentation

#define READ_BUFSIZE   (8)

Definition at line 45 of file codec_ext8.c.

#define WRITE_BUFSIZE   (8)

Definition at line 51 of file codec_ext8.c.


Function Documentation

parserutils_error charset_ext8_codec_create ( const char *  charset,
parserutils_charset_codec **  codec 
) [static]

Create an extended 8bit codec.

parserutils_error charset_ext8_codec_decode ( parserutils_charset_codec *  codec,
const uint8_t **  source,
size_t *  sourcelen,
uint8_t **  dest,
size_t *  destlen 
) [static]

Decode a chunk of extended 8bit data into UCS-4 (big endian).

Parameters:
codec The codec to use
source Pointer to pointer to source data
sourcelen Pointer to length (in bytes) of source data
dest Pointer to pointer to output buffer
destlen Pointer to length (in bytes) of output buffer
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NOMEM if output buffer is too small, PARSERUTILS_INVALID if a character cannot be represented and the codec's error handling mode is set to STRICT.
On exit, source will point immediately _after_ the last input character read, if the result is _OK or _NOMEM. Any remaining output for the character will be buffered by the codec for writing on the next call.

In the case of the result being _INVALID, source will point _at_ the last input character read; nothing will be written or buffered for the failed character. It is up to the client to fix the cause of the failure and retry the decoding process.

Note that, if failure occurs whilst attempting to write any output buffered by the last call, then source and sourcelen will remain unchanged (as nothing more has been read).

If STRICT error handling is configured and an illegal sequence is split over two calls, then _INVALID will be returned from the second call, but source will point mid-way through the invalid sequence (i.e. it will be unmodified over the second call). In addition, the internal incomplete-sequence buffer will be emptied, such that subsequent calls will progress, rather than re-evaluating the same invalid sequence.

sourcelen will be reduced appropriately on exit.

dest will point immediately _after_ the last character written.

destlen will be reduced appropriately on exit.

Call this with a source length of 0 to flush the output buffer.
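
The buffering contract described above can be sketched in isolation. This is a minimal illustration, not the library's code: the names `dec_state`, `dec_emit` and `dec_decode` are hypothetical, the error codes are local stand-ins, and an identity byte-to-code-point mapping replaces the real table lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the library's error codes (hypothetical names). */
typedef enum { DEC_OK, DEC_NOMEM } dec_error;

/* One code point that was read but could not be written yet. */
struct dec_state {
	uint32_t pending;
	bool has_pending;
};

/* Write ucs4 as four big-endian bytes; false if there is no room. */
static bool dec_emit(uint32_t ucs4, uint8_t **dest, size_t *destlen)
{
	if (*destlen < 4)
		return false;
	(*dest)[0] = (uint8_t) (ucs4 >> 24);
	(*dest)[1] = (uint8_t) (ucs4 >> 16);
	(*dest)[2] = (uint8_t) (ucs4 >> 8);
	(*dest)[3] = (uint8_t) ucs4;
	*dest += 4;
	*destlen -= 4;
	return true;
}

static dec_error dec_decode(struct dec_state *st,
		const uint8_t **source, size_t *sourcelen,
		uint8_t **dest, size_t *destlen)
{
	assert(st != NULL && source != NULL && dest != NULL);

	/* Flush output buffered by the previous call first. On failure,
	 * source and sourcelen are left untouched. */
	if (st->has_pending) {
		if (!dec_emit(st->pending, dest, destlen))
			return DEC_NOMEM;
		st->has_pending = false;
	}

	while (*sourcelen > 0) {
		/* Assumption: identity mapping stands in for the table. */
		uint32_t ucs4 = **source;

		/* The input character counts as read even if its output
		 * has to be buffered. */
		(*source)++;
		(*sourcelen)--;

		if (!dec_emit(ucs4, dest, destlen)) {
			st->pending = ucs4;
			st->has_pending = true;
			return DEC_NOMEM;
		}
	}

	return DEC_OK;
}
```

In this sketch, calling `dec_decode` with `*sourcelen == 0` and a fresh output buffer writes only the buffered character, which corresponds to the flush behaviour noted above.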

Definition at line 324 of file codec_ext8.c.

References charset_ext8_codec_read_char(), endian_host_to_big(), PARSERUTILS_NOMEM, PARSERUTILS_OK, charset_ext8_codec::read_buf, and charset_ext8_codec::read_len.

Referenced by charset_ext8_codec_create().

parserutils_error charset_ext8_codec_destroy ( parserutils_charset_codec *  codec  )  [static]

Destroy an extended 8bit codec.

Parameters:
codec The codec to destroy
Returns:
PARSERUTILS_OK on success, appropriate error otherwise

Definition at line 171 of file codec_ext8.c.

References PARSERUTILS_OK, and UNUSED.

Referenced by charset_ext8_codec_create().

parserutils_error charset_ext8_codec_encode ( parserutils_charset_codec *  codec,
const uint8_t **  source,
size_t *  sourcelen,
uint8_t **  dest,
size_t *  destlen 
) [static]

Encode a chunk of UCS-4 (big endian) data into extended 8bit.

Parameters:
codec The codec to use
source Pointer to pointer to source data
sourcelen Pointer to length (in bytes) of source data
dest Pointer to pointer to output buffer
destlen Pointer to length (in bytes) of output buffer
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NOMEM if output buffer is too small, PARSERUTILS_INVALID if a character cannot be represented and the codec's error handling mode is set to STRICT.
On exit, source will point immediately _after_ the last input character read. Any remaining output for the character will be buffered by the codec for writing on the next call.

Note that, if failure occurs whilst attempting to write any output buffered by the last call, then source and sourcelen will remain unchanged (as nothing more has been read).

sourcelen will be reduced appropriately on exit.

dest will point immediately _after_ the last character written.

destlen will be reduced appropriately on exit.

Definition at line 205 of file codec_ext8.c.

References charset_ext8_from_ucs4(), endian_big_to_host(), len, PARSERUTILS_NOMEM, PARSERUTILS_OK, charset_ext8_codec::write_buf, WRITE_BUFSIZE, and charset_ext8_codec::write_len.

Referenced by charset_ext8_codec_create().

bool charset_ext8_codec_handles_charset ( const char *  charset  )  [static]

Determine whether this codec handles a specific charset.

Parameters:
charset Charset to test
Returns:
true if handleable, false otherwise
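
The reference list for this function suggests the test works by resolving the charset name to a MIB enum and comparing it against the known charsets. A minimal sketch of that idea follows, assuming a small hypothetical name map in place of the library's alias lookup; `reg_names`, `reg_supported` and the `reg_*` functions are illustrative names, not part of the library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name-to-MIB map standing in for the alias lookup. */
static const struct { const char *name; uint16_t mib; } reg_names[] = {
	{ "windows-1251", 2251 },
	{ "windows-1252", 2252 },
	{ "utf-8", 106 },
};

/* MIB enums this sketch's codec claims to support (assumption). */
static const uint16_t reg_supported[] = { 2251, 2252 };

#define REG_N(a) (sizeof(a) / sizeof((a)[0]))

/* ASCII case-insensitive string equality. */
static bool reg_name_eq(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0') {
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		a++;
		b++;
	}
	return *a == *b;
}

/* Resolve a charset name to its MIB enum; 0 means unknown. */
static uint16_t reg_mib_from_name(const char *name)
{
	for (size_t i = 0; i < REG_N(reg_names); i++) {
		if (reg_name_eq(name, reg_names[i].name))
			return reg_names[i].mib;
	}
	return 0;
}

/* True if the named charset resolves to a supported MIB enum. */
static bool reg_handles_charset(const char *charset)
{
	assert(charset != NULL);

	uint16_t mib = reg_mib_from_name(charset);
	if (mib == 0)
		return false;

	for (size_t i = 0; i < REG_N(reg_supported); i++) {
		if (reg_supported[i] == mib)
			return true;
	}
	return false;
}
```

Comparing MIB enums rather than strings means any alias that resolves to the same enum is accepted without listing every spelling in the codec.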

Definition at line 92 of file codec_ext8.c.

References known_charsets, len, mib, N_ELEMENTS, name, and parserutils_charset_mibenum_from_name().

parserutils_error charset_ext8_codec_output_decoded_char ( charset_ext8_codec *  c,
uint32_t  ucs4,
uint8_t **  dest,
size_t *  destlen 
) [inline, static]

Output a UCS-4 character (big endian).

Parameters:
c Codec to use
ucs4 UCS-4 character (host endian)
dest Pointer to pointer to output buffer
destlen Pointer to output buffer length
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NOMEM if output buffer is too small.
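
As an illustration of the host-endian to big-endian output step, here is a minimal sketch. The names `out_error` and `out_ucs4_be` are hypothetical. Unlike the real function, which references the codec's `read_buf`, this sketch does not buffer the character when space runs out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the library's error codes (hypothetical names). */
typedef enum { OUT_OK, OUT_NOMEM } out_error;

/* Write one host-endian UCS-4 value as four big-endian bytes and
 * advance the output cursor. Nothing is written when fewer than
 * four bytes remain. */
static out_error out_ucs4_be(uint32_t ucs4, uint8_t **dest, size_t *destlen)
{
	assert(dest != NULL && *dest != NULL && destlen != NULL);

	if (*destlen < 4)
		return OUT_NOMEM;

	/* Shifting makes the byte order independent of the host. */
	(*dest)[0] = (uint8_t) (ucs4 >> 24);
	(*dest)[1] = (uint8_t) (ucs4 >> 16);
	(*dest)[2] = (uint8_t) (ucs4 >> 8);
	(*dest)[3] = (uint8_t) ucs4;

	*dest += 4;
	*destlen -= 4;

	return OUT_OK;
}
```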

Definition at line 475 of file codec_ext8.c.

References endian_host_to_big(), PARSERUTILS_NOMEM, PARSERUTILS_OK, charset_ext8_codec::read_buf, and charset_ext8_codec::read_len.

Referenced by charset_ext8_codec_read_char().

parserutils_error charset_ext8_codec_read_char ( charset_ext8_codec *  c,
const uint8_t **  source,
size_t *  sourcelen,
uint8_t **  dest,
size_t *  destlen 
) [inline, static]

Read a character from extended 8bit input and output it as UCS-4 (big endian).

Parameters:
c The codec
source Pointer to pointer to source buffer (updated on exit)
sourcelen Pointer to length of source buffer (updated on exit)
dest Pointer to pointer to output buffer (updated on exit)
destlen Pointer to length of output buffer (updated on exit)
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NOMEM if output buffer is too small, PARSERUTILS_INVALID if a character cannot be represented and the codec's error handling mode is set to STRICT.
On exit, source will point immediately _after_ the last input character read, if the result is _OK or _NOMEM. Any remaining output for the character will be buffered by the codec for writing on the next call.

In the case of the result being _INVALID, source will point _at_ the last input character read; nothing will be written or buffered for the failed character. It is up to the client to fix the cause of the failure and retry the decoding process.

sourcelen will be reduced appropriately on exit.

dest will point immediately _after_ the last character written.

destlen will be reduced appropriately on exit.

Definition at line 418 of file codec_ext8.c.

References charset_ext8_codec::base, charset_ext8_codec_output_decoded_char(), charset_ext8_to_ucs4(), parserutils_charset_codec::errormode, PARSERUTILS_CHARSET_CODEC_ERROR_STRICT, PARSERUTILS_INVALID, PARSERUTILS_NEEDDATA, PARSERUTILS_NOMEM, and PARSERUTILS_OK.

Referenced by charset_ext8_codec_decode().

parserutils_error charset_ext8_codec_reset ( parserutils_charset_codec *  codec  )  [static]

Clear an extended 8bit codec's encoding state.

Parameters:
codec The codec to reset
Returns:
PARSERUTILS_OK on success, appropriate error otherwise

Definition at line 376 of file codec_ext8.c.

References PARSERUTILS_OK, charset_ext8_codec::read_buf, charset_ext8_codec::read_len, charset_ext8_codec::write_buf, and charset_ext8_codec::write_len.

Referenced by charset_ext8_codec_create().

parserutils_error charset_ext8_from_ucs4 ( charset_ext8_codec *  c,
uint32_t  ucs4,
uint8_t **  s,
size_t *  len 
) [inline, static]

Convert a UCS-4 (host endian) character to extended 8bit.

Parameters:
c The codec instance
ucs4 The UCS-4 character to convert
s Pointer to pointer to destination buffer
len Pointer to destination buffer length
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NOMEM if there's insufficient space in the output buffer, PARSERUTILS_INVALID if the character cannot be represented.
_INVALID will only be returned if the codec's conversion mode is STRICT. Otherwise, '?' will be output.

On successful conversion, *s and *len will be updated.
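
The STRICT versus replacement behaviour can be sketched as follows. This is an illustration under stated assumptions, not the library's code: ASCII is assumed to pass through unchanged, the table is assumed to hold 128 entries for bytes 0x80 to 0xFF, and the names `enc_error` and `enc_from_ucs4` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the library's error codes (hypothetical names). */
typedef enum { ENC_OK, ENC_NOMEM, ENC_INVALID } enc_error;

/* Convert one UCS-4 code point to a single byte.
 * table: 128 code points for bytes 0x80..0xFF (assumed layout).
 * strict: when true, unmappable input yields ENC_INVALID;
 *         otherwise '?' is written instead. */
static enc_error enc_from_ucs4(const uint32_t *table, bool strict,
		uint32_t ucs4, uint8_t **s, size_t *len)
{
	uint8_t out;

	assert(table != NULL && s != NULL && *s != NULL && len != NULL);

	if (*len < 1)
		return ENC_NOMEM;

	if (ucs4 < 0x80) {
		/* Assumption: the lower half is plain ASCII. */
		out = (uint8_t) ucs4;
	} else {
		size_t i;

		/* Linear reverse lookup over the upper half. */
		for (i = 0; i < 128; i++) {
			if (table[i] == ucs4)
				break;
		}

		if (i == 128) {
			if (strict)
				return ENC_INVALID;
			out = '?';
		} else {
			out = (uint8_t) (0x80 + i);
		}
	}

	/* Only a successful conversion updates *s and *len. */
	**s = out;
	(*s)++;
	(*len)--;

	return ENC_OK;
}
```

A linear scan is adequate for a 128-entry table; a larger charset would normally use a sorted reverse table instead.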

Definition at line 509 of file codec_ext8.c.

References charset_ext8_codec::base, parserutils_charset_codec::errormode, PARSERUTILS_CHARSET_CODEC_ERROR_STRICT, PARSERUTILS_INVALID, PARSERUTILS_NOMEM, PARSERUTILS_OK, and charset_ext8_codec::table.

Referenced by charset_ext8_codec_encode().

parserutils_error charset_ext8_to_ucs4 ( charset_ext8_codec *  c,
const uint8_t *  s,
size_t  len,
uint32_t *  ucs4 
) [inline, static]

Convert an extended 8bit character to UCS-4 (host endian).

Parameters:
c The codec instance
s Pointer to source buffer
len Source buffer length
ucs4 Pointer to destination buffer
Returns:
PARSERUTILS_OK on success, PARSERUTILS_NEEDDATA if there's insufficient input data, PARSERUTILS_INVALID if the character cannot be represented.
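
The three return cases can be illustrated with a small table-driven lookup. This is a sketch under assumptions: the table is taken to hold 128 entries for bytes 0x80 to 0xFF, unassigned bytes are assumed to be marked with 0xFFFF, and `tbl_error`, `TBL_UNASSIGNED` and `tbl_to_ucs4` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the library's error codes (hypothetical names). */
typedef enum { TBL_OK, TBL_NEEDDATA, TBL_INVALID } tbl_error;

/* Assumed sentinel marking a byte with no mapping. */
#define TBL_UNASSIGNED 0xFFFFu

/* Convert the first byte of s to a host-endian UCS-4 code point.
 * table: 128 code points for bytes 0x80..0xFF (assumed layout). */
static tbl_error tbl_to_ucs4(const uint32_t *table,
		const uint8_t *s, size_t len, uint32_t *ucs4)
{
	assert(table != NULL && s != NULL && ucs4 != NULL);

	if (len < 1)
		return TBL_NEEDDATA;

	if (s[0] < 0x80) {
		/* Assumption: the lower half is plain ASCII. */
		*ucs4 = s[0];
		return TBL_OK;
	}

	uint32_t mapped = table[s[0] - 0x80];
	if (mapped == TBL_UNASSIGNED)
		return TBL_INVALID;

	*ucs4 = mapped;
	return TBL_OK;
}
```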

Definition at line 557 of file codec_ext8.c.

References PARSERUTILS_INVALID, PARSERUTILS_NEEDDATA, PARSERUTILS_OK, and charset_ext8_codec::table.

Referenced by charset_ext8_codec_read_char().


Variable Documentation

const parserutils_charset_handler charset_ext8_codec_handler

Definition at line 579 of file codec_ext8.c.

struct { ... } known_charsets[] [static]

size_t len

Definition at line 23 of file codec_ext8.c.

uint16_t mib

Definition at line 21 of file codec_ext8.c.

const char* name

Definition at line 22 of file codec_ext8.c.

uint32_t* table

Definition at line 24 of file codec_ext8.c.


Generated on Wed Jul 29 11:59:21 2015 for Libparserutils by doxygen 1.5.6