p5-MARC-Charset, Convert MARC-8 encoded strings to UTF-8

Package name: p5-MARC-Charset-1.35nb5

MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings.
MARC-8 is a single-byte character encoding that predates Unicode, and allows
you to put non-Roman scripts in MARC bibliographic records.

Required to run:
[textproc/p5-XML-SAX] [databases/p5-gdbm] [lang/perl5] [devel/p5-Class-Accessor]

Required to build:

SHA1: 0db7c2294cb636abf79e3c69404d81ab0411fd86
RMD160: 3fc740ead065e809f35d22ece970f31f198a88b7
Filesize: 189.533 KB

   2013-08-15 10:02:23 by Wen Heping | Files touched by this commit (2) | Package updated
Log message:
Update to 1.35

Upstream changes:
1.35 Tue Aug 13 19:50:55 PDT 2013
    - improve conversion of certain composed characters to MARC8

      Some characters should not be fully decomposed
      before converting them to MARC8.  This patch adds
      a table of such characters, based on Annex A of
      and on some sample records provided by Jason
      Stephenson of MVLC.

    - recognize G0 and G1 characters properly

      When converting from MARC8 to UTF8, MARC::Charset now
      properly recognizes if a (single-byte) MARC8 character falls
      in G0 or G1.

      This is part of the fix for RT#63271 (converting characters
      in the Extended Cyrillic character set), but should also
      fix similar issues with converting characters in the extended
      Arabic set.

      This commit also means that all MARC8 character sets that support
      both G0 and G1 will be properly converted, regardless of whether
      they're currently set as the G0 or G1 character set.  For example,
      it is now possible to convert Extended Latin as G0 or Basic Latin
      as G1.

      This fixes RT#63271

    - have MARC::Charset::Code->marc_value() handle G0/G1 conversion

      Since there's at present no need to do things like have
      ANSEL be the G0 character set when converting from UTF8 to
      MARC8, this commit centralizes the logic for deciding
      whether to return the G0 or G1 MARC8 representation of a
      character.

      Also add MARC::Charset::Code->g0_marc_value(), which returns
      the G0 representation of the character for use by the
      character DB.

    - New test cases for converting Vietnamese and Extended Cyrillic
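
The G0/G1 distinction in the 1.35 notes can be illustrated with a short sketch. This is not MARC::Charset's Perl code; it is a Python illustration that assumes the ISO 2022 convention MARC-8 follows, where a single-byte graphic character in 0x21-0x7E belongs to the G0 set and one in 0xA1-0xFE belongs to the G1 set, and where the same position in a set differs by 0x80 between the two ranges. The function names marc8_graphic_set and to_g0 are hypothetical.

```python
def marc8_graphic_set(byte: int) -> str:
    """Return 'G0', 'G1', or 'other' for a single MARC-8 byte value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    if 0x21 <= byte <= 0x7E:
        return "G0"
    if 0xA1 <= byte <= 0xFE:
        return "G1"
    # Control codes, space, DEL, 0xA0 and 0xFF are not graphic characters.
    return "other"


def to_g0(byte: int) -> int:
    """Map a G1 byte to the same position in G0 (assumed: clear the high bit)."""
    if marc8_graphic_set(byte) == "G1":
        return byte & 0x7F
    return byte
```

Under that assumption, a byte such as 0xE1 is looked up in whichever character set is currently designated as G1, and to_g0(0xE1) gives 0x61, the position the same character would occupy if its set were designated as G0.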

1.34 Mon Feb 11 09:10:35 PST 2013
    - RT#83257: use AnyDBM_File rather than hardcode GDBM_File

      To improve portability, use AnyDBM_File to select a DBM
      rather than rely on GDBM_File.  GDBM_File apparently used
      to be a core module, but not all distributions included it,
      particularly OS X.  In any event, GDBM_File is no longer

      This patch also includes a tweak to allow MARC::Charset to
      work with NDBM_File and ODBM_File, neither of which
      support 'exists'.

      I've tested MARC::Charset successfully on the following

      - GDBM_File
      - DB_File
      - NDBM_File
      - ODBM_File
      - SDBM_File

      This is also my preferred order; SDBM_File is selected last
      because it produces the biggest data file on disk.

    - RT#38912: fix mapping of double diacritics (ligature and double
      Thanks to Thomas P. Ventimiglia for the bug report and test case.
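
The 1.34 change selects a DBM implementation from a preference list instead of hardcoding one. As a rough analogy only (MARC::Charset does this in Perl through AnyDBM_File), the same "first available backend wins" idea can be sketched in Python. The helper name first_available and the PREFERRED list of Python dbm modules are assumptions made for illustration; they do not correspond to the Perl modules listed above.

```python
import importlib

# Hypothetical preference order, most preferred first. dbm.dumb is pure
# Python and always importable, so it serves as the last resort.
PREFERRED = ["dbm.gnu", "dbm.ndbm", "dbm.dumb"]


def first_available(candidates):
    """Return the name of the first module in candidates that imports cleanly."""
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            continue  # backend not installed here; try the next one
        return name
    raise ImportError("no usable backend among: " + ", ".join(candidates))
```

Calling first_available(PREFERRED) returns whichever backend the local installation provides, which mirrors the portability goal of the change: the code keeps working on systems where the most preferred backend is missing.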
