./textproc/GutenMark, Automatic, high-quality Gutenberg text formatter to LaTeX or HTML


Branch: CURRENT, Version: 20090510, Package name: GutenMark-20090510, Maintainer: pkgsrc-users

GutenMark is a tool for automatically creating high-quality HTML
or LaTeX markup from Project Gutenberg etexts. In combination with
other freely-available conversion tools, GutenMark can convert
Project Gutenberg etexts into publication-quality Postscript or
PDF, for print-on-demand applications. The goal is for this
conversion to be completely automatic, without manual markup or editing.

Required to run:

Master sites:

SHA1: d014d982a86f025b1390d48aa130a6ddf107e4a9
RMD160: d6b25878e53110ce18745c611eb9df6ef2185561
Filesize: 421.938 KB

CVS history:

   2015-11-04 03:00:17 by Alistair G. Crooks | Files touched by this commit (797)
Log message:
Add SHA512 digests for distfiles for textproc category

Problems found locating distfiles:
	Package cabocha: missing distfile cabocha-0.68.tar.bz2
	Package convertlit: missing distfile clit18src.zip
	Package php-enchant: missing distfile php-enchant/enchant-1.1.0.tgz

Otherwise, existing SHA1 digests verified and found to be the same on
the machine holding the existing distfiles (morden).  All existing
SHA1 digests retained for now as an audit trail.
   2012-10-25 08:57:09 by Aleksej Saushev | Files touched by this commit (587)
Log message:
Drop superfluous PKG_DESTDIR_SUPPORT, "user-destdir" is default these days.
   2011-11-29 07:21:02 by Steven Drake | Files touched by this commit (2)
Log message:
Honor LDFLAGS from pkgsrc.
   2011-02-06 11:58:29 by Ryo ONODERA | Files touched by this commit (3) | Package updated
Log message:
Update to 20090510 snapshot.

  + There is no behavioral difference from the last snapshot.  If your
    present installation is working, there's no need to update it.
    However, Richard Downing (thanks Richard!) had pointed out that the
    Linux binaries I was providing were a 64-bit application and therefore
    wouldn't work on 32-bit machines.  The portability problems with the
    Linux binaries were actually a lot worse than that.  A temporary Linux
    installer was posted in March (without any source changes), so even if
    you have installed the Linux binaries they may have worked fine for
    you.  At any rate, I've now changed the build-procedure for when I
    snapshot the Linux binaries in order to make them much more portable.
    I believe they should work on any 32-bit or 64-bit 'x86 Linux
    distribution which is circa Fedora Core 4 or later.
  + Some additional hints for getting GUItenMark to work on Windows Vista
    have been added to the download page, courtesy of a reader known only
    as 'wendl' (thanks, wendl!).

02/16/09.  The Mac OS X version is now believed to be fully working on Mac
OS X 10.5 (PPC and Intel), 10.4 (PPC and Intel), and 10.3 (PPC, of
course!), though I only have a subset of these systems on which to test.

06/01/08.  Another pot-pourri:
  + The Mac OS X installer download should now work fully on Mac OS X 10.5
    (though tested only on Intel), meaning that GUItenMark works on that
    platform.  On Mac OS X prior to 10.5, only the command-line programs
    (GutenMark and GutenSplit) are expected to work.
  + Jason Pollock's instructions for compiling GutenMark and GutenSplit for
    iPhone are now included on the download page.
  + GUItenMark now provides a GUI front-end for GutenSplit (in addition to
    its continued support for GutenMark).
  + Fixed some portability issues in the Linux binary download.
    (Specifically, I found that even if the prerequisite wxWidgets 2.8 was
    installed on the target machine, GUItenMark would still not work if the
    wxWidgets had been compiled with Unicode support, because the
    shared-library names would be different from what was expected.  This
    happened with Fedora 9, for example.  And there are other wxWidgets
    compilation flags that could cause the same problem.)  The fix was to
    compile the Linux version of the GUItenMark program so that it is
    statically linked to wxWidgets, thus removing the requirement that
    wxWidgets be installed on the target computer at all.  On the other
    hand, I now see that the Linux version of the program has lots of other
    requirements that you may or may not need to install.  Oh, well!

05/28/08.  GUItenMark now compiles and works for Mac OS X, but so far I'm
only able to build it for Mac OS X 10.5, so I haven't yet gone through the
hassle of adding it to the binary download for Mac OS X.  However, it is
quite easy to build following the instructions on the download page.  Also,
Jason Pollock <jason@pollock.ca> has sent some additional changes and
instructions for compiling for the iPhone, but I've not yet had a chance to
try them or incorporate them here.  Soon, I hope.

05/26/08.  A miscellaneous pot-pourri of changes:
  + I'm still working on Mac OS X support, because it has turned out to be
    harder than I thought.  (For some reason I'm not yet clear on,
    GUItenMark simply doesn't work on Mac OS X.)  Nevertheless, there is
    now a Mac OS X installation package on the download page, along with
    instructions for using it, and for compiling.  The installation package
    makes installing GutenMark and GutenSplit much easier on the Mac, even
    though the program the package purports to provide (GUItenMark) doesn't
    work yet.
  + GUItenMark now defaults to using the desktop as the
    location to find input files and to create output files.  This seems
    much more logical and useful than the obscure directories it defaulted
    to previously.
  + Jason Pollock <jason@pollock.ca> (thanks, Jason!) has sent in a
    greatly reworked version of GutenSplit, with several options that give
    you some more flexibility in splitting on different levels of headers,
    and/or in omitting the table of contents.
  + Jason has also suggested some mods that allow cross-compiling GutenMark
    (from a Mac), so that it runs on an iPhone.  I have not personally
    tried this, and I'll trust Jason to let me know if the changes are to
    his satisfaction.

05/09/08.  Well, Mac OS X support is back (due to the addition of Mac OS X
support to the IMCROSS development system which I am using to create
Windows binaries), but untested, so I won't provide binaries just yet.
More on this later.

04/23/08.  Made the installer programs a little smaller, by removing the
Linux-only files from the Windows installer, and vice-versa.

04/22/08.  Reorganized how the installers are created, to avoid overwriting
some directories that are useful to me.  Also, the installer isn't built
automatically any longer (you have to do 'make snapshot').  But I don't
suppose either of those things is of interest to anyone but me.

04/21/08.  Big doings are transpiring!

The GUItenMark graphical front-end program is now fully working on both
Windows and Linux 'x86, and seems to work quite well.  Admittedly, I've
tested only Windows XP, SuSE 10.0, and Fedora Core 5.

Linux and Win32 installer programs are now available.

The website has been completely revamped to replace all of the outmoded
download, installation, and compilation instructions, and to provide the
necessary new instructions pertaining to the GUI front-end and installer
programs.

Direct support for Linux PPC, Mac OS X, FreeBSD, and NetBSD has now been
discontinued.  The software (or at least, GutenMark) can presumably still
be built for these platforms, but I simply don't have the time or
resources to do it myself.  I doubt this will be much loss to anybody,
since it has been years since I've updated the binaries for those
platforms anyhow.

04/20/08.  The GutenMark program itself has had the groundwork for several
experimental improvements laid, but the changes haven't resulted yet in any
quality changes.   More importantly, there is now a GUI front-end for
GutenMark, cleverly called GUItenMark, for both Linux and Windows.  There
are a couple of improvements I'd like to make in this program, and a couple
of bug-fixes for the Windows version of the program, but it's basically
working and seems very useful.  At any rate, even I admit that it's
enormously easier to use than the GutenMark command-line program.  I don't
want to generally send it out into the world until I make the mentioned
changes, test it on more computers (so far I've only tried it on SuSE Linux
10.0 and Windows XP Home), have an installer program, and get the web-page
verbiage all fixed up.  However you, loyal reader, can try it out now:
 1. Download GUItenMark-demo.zip.  This is the complete package, and
    there's nothing else to download unless you want the source code as
    well!  But it is 13.4 Mbytes.
 2. For Windows, unzip this file in "C:\Program Files\", thus creating the
    directory "C:\Program Files\GutenMark".  For Linux, unzip it in your
    home directory, thus creating "~/GutenMark", and rename it to "~
 3. Left-click on your desktop to create an icon for the application
    program:  either "C:\Program Files\GutenMark\binary\GUItenMark.exe"
    (Windows) or else "~/.GutenMark/binary/GUItenMark" (Linux).
 4. Click the icon with your mouse.
 5. Operation of the program should be pretty darned self-explanatory, and
    it had better be since I haven't written any instructions for it yet.
    If not, let me know.

02/02/08.  A lot of explanation has been added on the usage page about
LaTeX, and particularly about using LaTeX on Windows.  Note also that
essentially all of the pre-processed etexts have been recently corrected or
improved in some way.  (Such changes are normally described on their own
"What's New" page rather than being described on this software-change

03/20/04.  Fixed PR #114.  This seemingly hasn't caused anybody a problem,
but ....  In conjunction with this, the "prefatory" area is no longer in a
smaller font size, except for the message that GutenMark itself adds.

02/21/04.  Fixed PR #113, which caused HTML headers to be omitted in some
files created by GutenSplit.

01/21/04.  A new utility program called GutenSplit has been added; this
program splits the HTML files created by GutenMark into smaller HTML files,
adding a table of contents, and links between all of the small HTML files.
The Makefiles have also been modified so that the various "GutenUtilities"
(including GutenSplit) are automatically built when GutenMark is built;
previously, this was a separate, manual build.  Also, if a cross-compiler
version of MinGW is installed on Linux, then a Linux build tries to create
not only the Linux versions of the executables, but also Win32 versions as

01/05/03.  LaTeX:  Fixed the incorrect mdash construct I've been using all
this time!  (I used "--" at first, and then later decided that \ 
looked better.  However, since neither of these is the correct LaTeX
construct, namely "---", there were problems with them being arbitrarily
broken across lines.  Therefore, I added an \mbox to correct this problem,
and then a \linebreak to correct problems with the \mbox, and then ....).

12/28/02.  LaTeX:  Supplied a workaround for an importing bug in LyX 1.2
(PR#110, spurious linefeeds inserted when importing a LaTeX
command-sequence of the form "\ \ ").

12/25/02.  LaTeX:  Messages about the text having been converted by
GutenMark, the software version, and what-not have been moved from the
"Prefatory Materials" to a copyright-area on the back-side of the
title-page.  Also, added a trick to fool LyX into correctly importing the
date change from 12/23.  Added the "--ron" command-line switch to group
together various settings that I personally find useful.

12/23/02.  LaTeX:  Fixed PR#109 (hyphenation and linebreak problems with
mdashes).  Also, removed the date which had appeared on the title page.

12/16/02.  Fixed PR#108 (inability to compile in NetBSD).

11/24/02.  The source tarball has been corrected to contain the correct
html file for the regression test.  More slight improvements were made to
special.words.gz.  Added "Hon." to the list of honorifics.

11/22/02.  This version has lots of improvements that--in my view, at
least--make producing LaTeX much easier, along with a few other
miscellaneous changes:  Corrected the email address displayed by the
software.  Added the word "The" to special.words.gz.  Fixed the bulk of the
problems (but probably not all) associated with too-long spaces following
honorifics and quotes in LaTeX.  Fixed smart single-quotes, so as not to be
fooled by words like 'em, 'til, etc.  Fixed smart single-quotes and smart
double-quotes to correctly treat cases like this:
    ... and so I says to him"--he paused briefly--"why don't you stop ...
Hopefully, fixed the missing quotes in LaTeX chapter names.  Mdashes in
LaTeX are now enclosed in LaTeX \mbox{}, to avoid breaking them across
lines; also, the --mdash-size switch has been added to allow longer (or
shorter) mdashes.  The LaTeX default is now \raggedbottom.  Now use LaTeX
"\ " everywhere that "~" had been used previously (allowing LaTeX to space
shorter lines much more easily).
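The smart single-quote problem in this entry can be illustrated in Python. The difficulty is telling an opening quote from a word-initial apostrophe, as in 'em or 'til. The sketch below is not GutenMark's own code. The elision list `ELISIONS` and the functions `classify_single_quote` and `smarten` are hypothetical. The rule is one plausible heuristic: a quote that starts a word is an opening quote unless the word is a known elision.

```python
import re

# Hypothetical list of words that start with an elided letter; the real
# GutenMark word lists are not reproduced here.
ELISIONS = {"em", "til", "tis", "twas", "cause", "neath"}


def classify_single_quote(text: str, i: int) -> str:
    """Classify the ASCII ' at index i as 'open', 'close', or 'apostrophe'."""
    prev = text[i - 1] if i > 0 else " "
    nxt = text[i + 1] if i + 1 < len(text) else " "
    if prev.isalpha() and nxt.isalpha():
        return "apostrophe"  # word-internal: don't, o'clock
    if prev.isspace() or prev in '([{"':
        # Word-initial quote: an apostrophe only if a known elision follows.
        m = re.match(r"[A-Za-z]+", text[i + 1:])
        if m and m.group(0).lower() in ELISIONS:
            return "apostrophe"  # 'em, 'til
        return "open"
    return "close"  # after a letter or punctuation: closing quote


def smarten(text: str) -> str:
    """Replace ASCII single quotes with typographic ones."""
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
        elif classify_single_quote(text, i) == "open":
            out.append("\u2018")  # left single quotation mark
        else:
            out.append("\u2019")  # right single quote, also the apostrophe
    return "".join(out)
```

For example, `smarten("give 'em the 'til-then")` keeps both word-initial quotes as apostrophes (U+2019). A plain heuristic would have turned them into opening quotes.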

08/26/02.  Fixed PR#99, making the 20020809 ALL-CAPS turnoff a little more

08/25/02.  A few LaTeX conveniences were added, mainly to eliminate manual
post-corrections:  For page headings, the cases of the right- and
left-headings are matched; i.e., if one of them is all-caps then the other
one is forced to be all-caps also.  Hard-spaces in chapter headings have
been eliminated.  When the chapter heading is something like "CHAPTER IV.
THE SEARCH FOR PEACE", the page heading now only shows "THE SEARCH FOR
PEACE" and eliminates "CHAPTER IV."  The "\sloppy" markup is now used in
place of "\emergencystretch".
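The two page-heading rules in this entry are sketched below in Python. The first drops a leading "CHAPTER IV." label. The second forces both headings to all-caps when either one is. This is only a sketch under assumptions. The regex `CHAPTER_LABEL` and the helpers `page_heading` and `match_heading_case` are hypothetical and do not reproduce GutenMark's implementation.

```python
import re

# Hypothetical pattern for a leading "CHAPTER <roman or arabic numeral>." label
# that is followed by a real title.
CHAPTER_LABEL = re.compile(r"^\s*CHAPTER\s+[IVXLCDM\d]+\.?\s+")


def page_heading(chapter_title: str) -> str:
    """Drop a leading 'CHAPTER IV.' label when a title follows it."""
    stripped = CHAPTER_LABEL.sub("", chapter_title, count=1)
    # If nothing but the label was present, keep the original heading.
    return stripped if stripped else chapter_title


def match_heading_case(left: str, right: str) -> tuple[str, str]:
    """If either running heading is all-caps, force the other to all-caps too."""
    if left.isupper() or right.isupper():
        return left.upper(), right.upper()
    return left, right
```

With this sketch, `page_heading("CHAPTER IV. THE SEARCH FOR PEACE")` gives "THE SEARCH FOR PEACE", and a bare "CHAPTER IV" is left unchanged.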

08/11/02.  Some LaTeX formatting changes were made to work around bugs in
LyX 1.2.  Also, some LaTeX command sequences (like that for the ae
ligature) were broken and have now been fixed.  Others may still be broken,
for all I know.

08/10/02.  Fixed a segfault in the author-deduction code added two days

08/09/02.  Fixed PR #93 (incorrect treatment of constructs like _[text]_
and [_text_]).  Fixed PR #94 (mixing/matching of ALL-CAPS italicizing mode
with other italicizing modes).  [This is handled in two ways:  GutenMark
attempts to deduce which emphasis mode is used, but also the "--caps-ok"
switch has been added to simply turn off ALL-CAPS conversion.]

08/08/02.  LaTeX-related changes:  Output file now includes a table of
contents.  Command-line switches "--no-toc", "--author", and \ 

08/05/02.  Added "--latex-sections" command-line option.

08/04/02.  All LaTeX-related:  Various bugs I found yesterday (see buglist)
were fixed.  Also, the hard spaces in sentence breaks have been completely
eliminated now.  Chapter headings are now ragged-right in all

08/03/02.  Added a new page to the website, for etexts I've converted to
LaTeX and PDF.

08/03/02.  Added "Rev.", "Gen.", and "Messrs" to the list of honorifics.
All of the following are LaTeX-only changes:  Trailing spaces and periods
(but not ellipses) are now removed from chapter names as used for page
headings.  For example, if the chapter name was "CHAPTER III." (as opposed
to "CHAPTER III"), it will now continue to appear as "CHAPTER III." except
in the page heading, where it is now "CHAPTER III".  An \ 
factor is now added, to eliminate some run-ons into the right margin for
small page sizes.  The soft-hyphen mechanism after em-dashes has been
changed, because the old one didn't work.  Honorifics and other
abbreviations are no longer treated as ends of sentences.
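Two of the LaTeX changes in this entry are sketched below in Python. The first treats honorifics as non-sentence-ending. The second strips trailing periods (but not ellipses) from chapter names used as page headings. In LaTeX, a period that does not end a sentence is commonly followed by a control space ("\ ") so that it is not typeset with wider inter-sentence spacing. This sketch assumes that approach. The set `HONORIFICS` and the functions `latex_join` and `heading_for_page` are hypothetical, not GutenMark's code.

```python
# Hypothetical honorific list: "Rev.", "Gen.", and "Messrs" come from this
# entry and "Hon." from the 11/24/02 entry; the rest are illustrative.
HONORIFICS = {"Mr.", "Mrs.", "Dr.", "Rev.", "Gen.", "Messrs", "Messrs.", "Hon."}


def latex_join(words: list[str]) -> str:
    """Join words, using a LaTeX control space after honorifics so their
    periods are not typeset as sentence ends."""
    out = []
    for i, word in enumerate(words):
        out.append(word)
        if i < len(words) - 1:
            out.append("\\ " if word in HONORIFICS else " ")
    return "".join(out)


def heading_for_page(chapter_name: str) -> str:
    """Strip trailing spaces and periods for a page heading, keeping ellipses."""
    name = chapter_name.rstrip()
    if name.endswith("..."):
        return name
    return name.rstrip(". ")
```

For instance, `latex_join(["Rev.", "Smith", "arrived."])` yields `Rev.\ Smith arrived.`, and `heading_for_page("CHAPTER III.")` yields "CHAPTER III".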

07/25/02.  Addressed PR #85, in which abnormally long input lines can cause
corruption in the output file.

07/22/02.  Addressed PR #84, hopefully extending yesterday's fixes from
Windows 98 to the entire Win32 family.  Sadly -- or perhaps happily -- I
only have Windows 98 myself, and therefore can't properly test the fixes,
and did not know that the previous fixes didn't work in Win2K.

07/21/02.  Fixed PR #83, in which the PATH environment variable didn't work
properly.  Fixed PR #80, in which the default configuration file needed --
unreasonably, I think -- to have the exact paths of the wordlists rather
than a more graceful means of finding the wordlists.
   2008-06-12 04:14:58 by Joerg Sonnenberger | Files touched by this commit (1134)
Log message:
Add DESTDIR support.
   2006-06-15 15:31:30 by Thomas Klausner | Files touched by this commit (24)
Log message:
Drop maintainership, I don't use them any longer.
   2006-02-06 00:11:50 by Joerg Sonnenberger | Files touched by this commit (4082)
Log message:
Recursive revision bump / recommended bump for gettext ABI change.
   2005-06-17 05:50:45 by Johnny C. Lam | Files touched by this commit (387)
Log message:
Create directories before installing files into them.