./textproc/p5-Text-Unidecode, Perl5 module that transliterates Unicode to US-ASCII


Branch: CURRENT, Version: 1.30nb3, Package name: p5-Text-Unidecode-1.30nb3, Maintainer: pkgsrc-users

It often happens that you have non-Roman text data in Unicode, but you can't
display it -- usually because you're trying to show it to a user via an
application that doesn't support Unicode, or because the fonts you need aren't
accessible. You could represent the Unicode characters as "???????" or
"\15BA\15A0\1610...", but that's nearly useless to the user who actually wants
to read what the text says.

What Text::Unidecode provides is a function, unidecode(...) that takes Unicode
data and tries to represent it in US-ASCII characters (i.e., the universally
displayable characters between 0x00 and 0x7F). The representation is almost
always an attempt at transliteration -- i.e., conveying, in Roman letters, the
pronunciation expressed by the text in some other writing system.
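
As a rough illustration of the idea (not the module's Perl API and not its data), the following Python sketch transliterates with a tiny hand-written table. The names `TOY_TABLE` and `toy_unidecode`, the table contents, and the `"?"` placeholder for unmapped characters are all assumptions made for this example; the real module ships far larger per-block tables.

```python
# Toy illustration only: a tiny hand-written table, NOT Text::Unidecode's data.
TOY_TABLE = {
    "\u0434": "d",   # CYRILLIC SMALL LETTER DE
    "\u0430": "a",   # CYRILLIC SMALL LETTER A
    "\u00e9": "e",   # LATIN SMALL LETTER E WITH ACUTE
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S
}


def toy_unidecode(text: str, unknown: str = "?") -> str:
    """Return an ASCII-only approximation of `text` using TOY_TABLE.

    Characters below 0x80 pass through unchanged; anything else is looked
    up in the table, falling back to `unknown` (an assumption of this
    sketch, not documented module behaviour).
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append(TOY_TABLE.get(ch, unknown))
    return "".join(out)


if __name__ == "__main__":
    print(toy_unidecode("caf\u00e9 \u0434\u0430"))  # prints "cafe da"
```

The point of the sketch is only that the result always stays within 0x00-0x7F, whatever the input.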

SHA1: 13c28520896a0073e0ea9333a2b6b770dcf17d6e
RMD160: c4f5ba6ac84eef0ce4999935b7a32da0576c8720
Filesize: 134.743 KB

CVS history:

   2019-08-11 15:25:21 by Thomas Klausner | Files touched by this commit (3557) | Package updated
Log message:
Bump PKGREVISIONs for perl 5.30.0
   2019-06-30 22:17:50 by Nia Alarie | Files touched by this commit (1816) | Package updated
Log message:
Update packages using a search.cpan.org HOMEPAGE to metacpan.org.

The former now redirects to the latter.

This covers the most simple cases where http://search.cpan.org/dist/name
can be changed to https://metacpan.org/release/name.

Reviewed by hand to hopefully make sure no unwanted changes sneak in.
   2018-08-22 11:48:07 by Thomas Klausner | Files touched by this commit (3558)
Log message:
Recursive bump for perl5-5.28.0
   2017-06-05 16:25:36 by Ryo ONODERA | Files touched by this commit (2298)
Log message:
Recursive revbump from lang/perl5 5.26.0
   2016-11-28 14:37:53 by Thomas Klausner | Files touched by this commit (2) | Package updated
Log message:
Updated p5-Text-Unidecode to 1.30.

2016-11-26   Sean M. Burke  sburke@cpan.org
	* Release 1.30
	* Many many (forty?) tables were missing the final character! Fixed.
	* Minor stuff:
 	 . Added just a few Arabesque things to U+FD__
   	 . Renamed t/00400_just_load_module.t
	        to t/00400_just_load_main_module.t
	 . This is the first time non-7bit data appears in any Unidecode/x__.pm
	   files, although it is just in comments.  (In x02.pm, x03.pm, xfd.pm)
	   But this is just THE SHAPE OF THINGS TO COME.
	* Oh look, I blinked and a year went by.  I've been spending about the
	  past *two* years trying to think of how Unidecode v2-and-later's data
	  tables should work.
	* TODO: Kill the surrogatey "xD8", "xD9", "xDA", "xDB" blocks,
  	  and actually handle surrogates (when properly encoded).
	* TODO: Inaugurate the (private) Text::Unidecode::Blackbox namespace.
   2016-06-08 21:25:20 by Thomas Klausner | Files touched by this commit (2236) | Package updated
Log message:
Bump PKGREVISION for perl-5.24.
   2016-02-18 04:38:36 by Wen Heping | Files touched by this commit (2) | Package updated
Log message:
Update to 1.27

Upstream changes:
2015-10-21   Sean M. Burke  sburke@cpan.org
	* RELEASE 1.27.  (Stable.)
	The release, 1.25_01, didn't blow up, so this is just
	a re-release of it as a normal ("stable") version.
	* Minor changes to the documentation.  Nothing substantial.
	* Release 1.26 had a confusing mistake in the ChangeLog.
	Ignore v1.26.

2015-10-21   Sean M. Burke  sburke@cpan.org
	* RELEASE 1.26.  Mistake.  See above for change notes
	between v1.25_01 and v1.27.

2015-10-16   Sean M. Burke  sburke@cpan.org
	* RELEASE 1.25_01.

	* Here's a new thing that makes me nervous and hesitant, and that I've
	been talking myself into for weeks:

	  *  I've switched to accepting values in the range 0x80-0x9F  *
	  *  as if they are the Windows-1252 ("ANSI") characters.      *

	Previously they had all mapped to emptystring.

	Technically, Unicode specifies those codepoints as control characters
	that I've never heard of, "C1 Controls"...
	  U+0087 ESA - End of Selected Area
	  U+0088 HTS - Character (Horizontal) Tabulation Set
	  U+0089 HTJ - Character (Horizontal) Tabulation with Justification
	( See "C1" in https://en.wikipedia.org/wiki/C0_and_C1_control_codes )

	And Unidecode mapped all of those to emptystring.  Now they are treated
	as if you fed the Windows-1252 characters, as that is an extremely
	common thing to have happen.

	So if you feed character value 0x80 to it, it is taken to mean "€"
	(which Unidecode then decodes as "EUR", at the moment at least).
	(This doesn't interfere with the fact that U+20AC is the proper
	Unicode place for the "€" to be found.)

	And the smartquotes at 0x91 to 0x94, ‘ ’ “ ” turn into ' ' " " so yaaaay!

	Note that in theory, according to C1 Controls, 0x85 is "NEL: Next
	Line", "Equivalent to CR+LF. Used to mark end-of-line on some IBM
	mainframes."
	I could map this to \n or \r\n or whatever, but I've never seen 0x85 in
	use in the wild, and I never heard anyone complain about my not having
	mapped it to "\n" in all the Unidecode versions since the first, in 2001.
	So instead, Unidecode takes 0x85 as its Windows-1252 value, the
	ellipsis "…" which of course it Unidecodes as "..."

	I'm not thrilled with the idea of going off spec but I think this
	should be okay, and it has massive DWIM value.
	Let's hope I'm not dividing Unicode times infinity by zero and then the
	whole universe will disa

	That's why I'm making this a developer release.  Unless anything
	besplodes by November 1st, I'll re-issue this as a stable release.
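
The 0x80-0x9F reinterpretation described in the 1.25_01 notes above can be sketched in Python using only the standard `cp1252` codec. This is an illustration of the idea, not the module's implementation: the function names are hypothetical, the small ASCII table contains only the outputs quoted in the ChangeLog ("EUR", "...", and the straight quotes), and mapping the five byte values that Python's `cp1252` codec leaves undefined to the empty string is an assumption of this sketch.

```python
# Outputs quoted in the ChangeLog above; everything else is left untouched.
CHANGELOG_ASCII = {
    "\u20ac": "EUR",  # euro sign, Windows-1252 0x80
    "\u2026": "...",  # horizontal ellipsis, Windows-1252 0x85
    "\u2018": "'",    # left single quote, 0x91
    "\u2019": "'",    # right single quote, 0x92
    "\u201c": '"',    # left double quote, 0x93
    "\u201d": '"',    # right double quote, 0x94
}


def c1_as_cp1252(text: str) -> str:
    """Reinterpret code points U+0080..U+009F as Windows-1252 bytes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                out.append(bytes([cp]).decode("cp1252"))
            except UnicodeDecodeError:
                # Undefined in Python's cp1252 codec (0x81, 0x8D, 0x8F,
                # 0x90, 0x9D); dropping them is this sketch's assumption.
                out.append("")
        else:
            out.append(ch)
    return "".join(out)


def ascii_fallback(text: str) -> str:
    """Apply only the replacements quoted in the ChangeLog."""
    return "".join(CHANGELOG_ASCII.get(ch, ch) for ch in text)


if __name__ == "__main__":
    raw = "\x93hi\x94 \x80 5\x85"
    print(ascii_fallback(c1_as_cp1252(raw)))  # prints "hi" EUR 5...
```

Note that proper U+20AC input passes through `c1_as_cp1252` unchanged, which matches the ChangeLog's remark that the reinterpretation does not interfere with the euro sign's real code point.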
   2015-11-04 03:00:17 by Alistair G. Crooks | Files touched by this commit (797)
Log message:
Add SHA512 digests for distfiles for textproc category

Problems found locating distfiles:
	Package cabocha: missing distfile cabocha-0.68.tar.bz2
	Package convertlit: missing distfile clit18src.zip
	Package php-enchant: missing distfile php-enchant/enchant-1.1.0.tgz

Otherwise, existing SHA1 digests verified and found to be the same on
the machine holding the existing distfiles (morden).  All existing
SHA1 digests retained for now as an audit trail.