./textproc/link-grammar, Syntactic parsing library

Branch: CURRENT, Version: 5.3.7, Package name: link-grammar-5.3.7, Maintainer: pkgsrc-users

The Link Grammar Parser is a syntactic parser of English, Russian,
Arabic and Persian (and other languages as well), based on Link
Grammar, an original theory of syntax and morphology. Given a
sentence, the system assigns to it a syntactic structure, which
consists of a set of labelled links connecting pairs of words. The
parser also produces a "constituent" (HPSG style phrase tree)
representation of a sentence (showing noun phrases, verb phrases,
etc.). The RelEx extension provides Stanford-style Dependency
Grammar output.

Master sites:

SHA1: 38d8bb26b853ab9406bc5a312b37b436ec00d066
RMD160: 0b2b4d867944a9a8362162938709f546b187897e
Filesize: 3472.144 KB

CVS history:

   2016-09-12 16:06:08 by Makoto Fujiwara | Files touched by this commit (1) | Package updated
Log message:
Fix PLIST inconsistency after recent update to 5.3.7, sorry.
   2016-07-15 13:36:43 by Makoto Fujiwara | Files touched by this commit (3) | Package updated
Log message:
Updated textproc/link-grammar 5.2.5 to 5.3.7
Version 5.3.7: (7 May 2016)
 * Fix another MacOS build break, regarding library exports.

Version 5.3.6: (30 April 2016)
 * Add missing `parses-quotes-en.txt` file that python tests need.
 * Fix build break related to lg_fgetc when libeditline is missing.

Version 5.3.5: (28 April 2016)
 * Modified (hacked) Kazakh dictionary.
 * MacOS bug fix: fgetc behaves oddly in OSX, see bug #293.

Version 5.3.4: (16 March 2016)
 * Fix broken handling of apostrophe (issue #281).
 * Revamp the README file; describe transitivity.
 * Revised Turkish dictionary from Tatiana Batura, et al.
 * Prototype Kazakh dictionary from Tatiana Batura, et al.
 * Parse priority tweaks for the OpenCog chatbot.
 * Fix Windows printing problem affecting some utf8 codepoints (issue #285).

Version 5.3.3: (23 December 2015)
 * Improve support for quoted phrases.
 * Fixes for assorted zero-infinitive speech acts.
 * Add 37 paraphrasing verbs.
 * Add Greek mythological names.
 * A few dozen more common computing terms added to dictionary.
 * Misc coordination and question fixes.
 * Misc abbreviations.
 * Vietnamese dictionaries!
 * Major overhaul of subject-verb inversion.
 * Performance improvements on long sentences. (pull #247)
 * Change default setting of 'islands_ok' back to false (bug #140).
 * Fix for build break on Mac OSX el_capitan w/clang (bug #255).
 * Disable perl bindings by default; use Lingua::LinkParser

Version 5.3.2: (4 December 2015)
 * Performance improvements, esp. for long sentences.
 * Use std=c11 (the 2011 C standard) by default.
 * Partial Irish English support.
 * A few dozen common computing terms added to dictionary.
 * Fix for build break on Mac OSX.

Version 5.3.1: (22 November 2015)
 * Fix build break with SAT solver.

Version 5.3.0: (22 November 2015)
 * Major redesign of the python bindings.
 * Major redesign of sentence tokenization (the "wordgraph" design)
 * Verb 'steal' is optionally transitive.
 * Fixes for misc MSVC warnings.
 * Hebrew dictionary expansion.
 * Enhanced diagram printing, giving more space for link names.
 * Minor work on phonetic agreement for 'a' vs. 'an'.
 * Add ability to histogram the costs of different parses.
 * Improve support for splitting sentences.
 * Change default setting of 'islands_ok' to true.
 * Improve performance on long sentences.
 * Fix rare crash due to memory corruption on long sentences.
 * Random morphology generation can be enabled at runtime.
 * Remove obsolete, unmaintained MacOSX build file.
 * Extensive updates to man page.
 * Fix crash on long sentences (issue #137).
 * Fix a memory leak in language bindings (issue #138).
 * Remove bogus post-processor API function.
 * Fix broken domain letter printing.
 * New regex-file feature - negative regex'es.
 * Correct the handling of morphology stems with non-LL links.
 * SAT solver now linked statically.
 * Assorted SAT solver cleanup and improvements.
 * Performance improvement in fast matcher: 15% faster on fixes.batch.
   2015-11-04 03:00:17 by Alistair G. Crooks | Files touched by this commit (797)
Log message:
Add SHA512 digests for distfiles for textproc category

Problems found locating distfiles:
	Package cabocha: missing distfile cabocha-0.68.tar.bz2
	Package convertlit: missing distfile clit18src.zip
	Package php-enchant: missing distfile php-enchant/enchant-1.1.0.tgz

Otherwise, existing SHA1 digests verified and found to be the same on
the machine holding the existing distfiles (morden).  All existing
SHA1 digests retained for now as an audit trail.
   2015-08-26 14:23:01 by Thomas Klausner | Files touched by this commit (7) | Package updated
Log message:
Update to 5.2.5:

[ANNOUNCE] Link Grammar version 5.2.0 is now available.

This is a major release of the parser, with many important changes in
it.  The internals of the parser have been re-organized, resulting in
a speedup of 2x to 4x for typical English texts.  Multiple multi-
threading bugs were fixed, and there is now a simple multi-threading
unit test.  A memory leak was fixed, and a memory over-consumption
bug was fixed.  These changes were enabled by the final removal of the
"fat link" code from the parser.

Parser internals work continues apace: it is expected that a version
5.3.0 will follow shortly, featuring a completely re-designed tokenizer.
This redesign should enable simpler and better morphology support.

The ChangeLog notes other fixes as well:

Version 5.2.0 (27 December 2014)
 * y'all, ain't, gonna, y'gotta: Beverly Hillbillies basilect.
 * Permanent removal of the fat-link code.
 * Remove deprecated constituent tree code.
 * Windows: add terminal screen resizing support.
 * Windows: a build fix.
 * reign, rule, run, leave, come: can take predicative adjective.
 * Rework costs for many verb-derived adjectives.
 * Handle (predicative) adjectival modifiers for assorted perfect verbs.
 * Fixes for various color names.
 * Fixes for various affirmative answers.
 * Add 100 missing verbs.
 * Add preliminary lxc-docker (docker.io) support.
 * Remove MSVC6 support.
 * Fix memleak introduced in version 5.1.0
 * Speedup of 1.7x to 4x (depending on text) from linkage processing redesign.
 * Fix multi-threading safety bug.
 * Fix link-and-domain printing alignment (to handle utf8 char widths).
 * Windows: fixes for MSVC12 support.
 * Fix memory consumption bug (EMPTY_WORD) introduced in version 4.7.10.
 * Get rid of xrealloc, which clashes with libbfd symbol xrealloc.
 * Add multi-threaded parsing unit test.


Link Grammar version 5.1.2 is now available. Download from:

http://www.abisource.com/downloads/link … 1.2.tar.gz

The most serious fix in this release is a build-break fix for Apple OSX Mavericks.

Other fixes, from the ChangeLog:

 * Fix greeting: "How do you do?"
 * Fix indirect object in 'what' questions: 'To what do you owe your success?'
 * Fix assorted questions with verb "to be".
 * Compile fixes for Apple OSX version "Mavericks"


[ANNOUNCE] link-grammar version 5.1.0

This version includes a number of important changes. One of these is
that the connectors can now be given a direction (head and tail
indicators), so that link-grammar dependencies can now be true,
hierarchical dependency arrows.  This is of marginal importance for
English, where dependency directions are implicit, but is vital for
free-word-order languages, where bi-directional links are not enough.

Another important change is that costs can now be arbitrary floating
point numbers. This is particularly useful for providing fine-grained
parse ranking.  The LG cost system assigns a "cost" to every connector,
and the sum-total of costs for a sentence determines the parse ranking.
Since costs are additive, they behave as entropies (log P -- the
logarithm of a probability: probabilities are multiplicative, logarithms
are additive).
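
The additive behaviour described above can be sketched in a few lines
of Python.  This is an illustration only, not the parser's own code:
the function names are hypothetical, and taking the cost to be the
negative log-probability (so that a lower cost means a more likely
parse) is an assumed sign convention.

```python
import math


def cost_from_probability(p):
    """Assumed convention: cost is the negative log of a probability."""
    return -math.log(p)


def parse_cost(connector_costs):
    """The total cost of a parse is the sum of its connector costs."""
    return sum(connector_costs)


def rank_parses(parses):
    """Sort (name, connector_costs) pairs by ascending total cost."""
    return sorted(parses, key=lambda item: parse_cost(item[1]))


# Multiplying probabilities corresponds to adding costs.
total = parse_cost([cost_from_probability(0.5), cost_from_probability(0.25)])
assert math.isclose(total, cost_from_probability(0.5 * 0.25))
```

Because costs are floating point, two parses can differ by an
arbitrarily small amount and still be ranked consistently.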

Under the covers, there's been some major work on the tokenization
(splitting sentences into words) and morphology (splitting words into
morphemes) code.  This work is ongoing, and should eventually result in
much better support for non-English languages.

Other notable changes include an updated Russian dictionary, and an
assortment of changes to the English dictionary.  An intriguing step
towards phonology: LG can now distinguish between the use of the
determiners "a" and "an" preceding nouns that start with consonants
or vowels.  Whether fancier phonology support is possible is a curious
question.

The full changelog is below:

 * Updated Russian dictionaries from Sergei Protasov.
 * Added morphology-based unknown-word handling for Russian, from Sergei.
 * Fix up fat-linkage code, which was recently broken...
 * API cleanup: many command-line options never belonged in the API.
 * New emoticon support was clobbering certain dictionary words.
 * Fix: "Go to spot X", "It happens at time T."
 * Add a dozen missing verbs.
 * Minor work on greetings.
 * Add mechanism for denoting fractional costs in the file-backed dict.
 * Fix: broken handling of gerunds (due to bad verb-wall connectors)
 * Major redesign of morpheme splitting mechanism (from AmirP)
 * Minor extensions to support numeric formulas, e.g. 1 + 1 = 2.
 * Remove fat linkage support from the SAT solver.
 * Enable build of SAT solver by default.
 * Fix multiple bugs with unit stripping.
 * Add bounds-checking to the C API.
 * Fix the old disjunct-printing implementation.
 * Add support for easy-to-use link direction indicator.
 * Add random morphology generator tool.
 * Partial support for phonetic use of "a" vs. "an" for English.
 * Rework how coordination between conjunctions works: "either... or ...", etc.
 * Major redesign of tokenization mechanism (from AmirP)


Version 5.0.0 of the Link Grammar Parser is now available.

(Yes, it's April 1st.  No, this is not a joke.  Maybe I'll think of
something snarky next year.)

We are proud to announce a major new release of the Link Grammar Parser!
It contains many important changes and new additions.  One of the most
significant changes is that the license has been changed from the BSD
license to the LGPL.  This was done to enable considerably more
flexibility in accepting contributions to the project: it seems that
few are particularly interested in contributing to a BSD-licensed project.
This change has enabled folding in some new work:

 o Arabic and Persian dictionaries!  These were previously maintained
   as separate add-ons.  Including them as part of the distribution
   should make it easier for interested users.

 o A new 'bindings' directory, containing code for Java, Python, Common
   Lisp, OCaML and AutoIt programming languages.  The Python bindings
   are an updated version of the older pylinkgrammar-0.2.13 bindings.
   A SWIG interface file should make it easy to create other language
   bindings as well.

 o Improved morphology support. This will be invisible to most users,
   but it lays the groundwork for adding Hebrew support to the parser.

 o Expanded Lithuanian support. This remains a simplistic prototype, but
   it now performs a more sophisticated morphological analysis.

 o Experimental Turkish and Hebrew dictionaries.

 o A demo of the JSON parser server: it shows how to run the server,
   which accepts raw sentences on a socket and returns the parsed
   forms.

 o Some slightly incompatible changes to the API: it was time for some

 o Misc minor updates to the English Language dictionaries.

 o Preliminary work for SQL-backed dynamic dictionaries. This should
   enable certain types of automated language learning.

The full changelog is shown below.


Version 5.0.0 (1 April 2014)
 * License upgrade to LGPLv2.1
 * Arabic dictionaries, from Jon Dehdari
 * Persian dictionaries, from Jon Dehdari
 * Support for Hebrew tokenization, from Amir P.
 * Fix wild-card matching for user-supplied word lookup.
 * Prototype Turkish dictionary from Can Bruce.
 * Re-arrange programming language bindings directory.
 * Adopt the orphaned/unsupported pylinkgrammar Python bindings.
 * Deprecate the obsolete CNode interface.
 * Provide low-level perl bindings.
 * Adopt the orphaned/unsupported OCaML bindings.
 * Support affirmative replies: "Who did it?" "John's evil twin."
 * Expanded Lithuanian dictionary.
 * Minor disjunct printing fixes.
 * Fix: "Mary is too XXX to talk to."
 * Prototype Hebrew dictionary from Amir P.
 * Change !suffixes flag to !morphology.
 * Introduce a bi-directional connector, for free-word-order languages.
 * Introduce a symmetric-AND operator, for free-word-order languages.
 * Add demo shell script for running the JSON parse server.
 * Bugfix: Java server failing when input sentence has commas in it!
 * New !test and !debug commands for selective debugging support.
 * Print post-processing rejection message, when !bad is enabled.
 * Remove some deprecated functions for C API.
 * Remove all deprecated functions from Java API.
 * Initial support for an SQL-backed dynamic dictionary.


Version 4.8.5 of the Link Grammar Parser is now available.

This is the third release in about a week, each prompted by a
build-break in the previous version.  Sorry!  There's been assorted
(minor) new work, and this has been enough to cause trouble for
various people.

Some notable changes in the last 6 weeks:
 * Improved Russian (UTF-8) support for MSWindows users.
 * Build files for MSVC12
 * Several Java binding fixes
 * English dictionary: add a verb-wall connector for present participles.

A full list of changes is given below. If none of these seem to affect
you, there is no particular need to upgrade.


Version 4.8.5 (5 January 2014)
 * Update memory usage accounting; fix accounting bugs.
 * Fix Java garbage collection bug.
 * Fix numerous compiler warnings in the SAT-solver code.
 * Fix build-break involving multiple declaration of 'Boolean'.

Version 4.8.4 (30 December 2013)
 * Fix build break for Mac OSX.

Version 4.8.3 (30 December 2013)
 * Create new msvc12 build files, restore old msvc9 files.
 * Revert location of the Windows mbrtowc declaration.
 * Add verb-wall connector for present participles.
 * Fix build-time include file directory paths.
 * Provide the 'any' language to enumerate all possible linkages.
 * Fix recognition of U+00A0, c2 a0, NO-BREAK SPACE as whitespace.
 * Improve parse-time performance of exceptionally long sentences.
 * Fix crash on certain sentences containing equals sign.
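
The NO-BREAK SPACE item above names both the code point (U+00A0) and
its UTF-8 encoding (c2 a0).  A minimal Python sketch of that
relationship follows; it shows how Python's own string handling treats
the character and makes no claim about the parser's C implementation.
The helper name split_words is hypothetical.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE

# In UTF-8 the code point U+00A0 is encoded as two bytes, 0xc2 0xa0.
utf8_bytes = NBSP.encode("utf-8")


def split_words(text):
    """Split on any Unicode whitespace, which includes NO-BREAK SPACE."""
    return text.split()


assert utf8_bytes == b"\xc2\xa0"
assert NBSP.isspace()
```

A tokenizer that only tests for the ASCII space byte would treat the
two words on either side of a NO-BREAK SPACE as a single token.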

Version 4.8.2 (25 November 2013)
 * More MSWindows UTF-8/multi-byte fixes (for Russian).
 * Add missing JSONUtils file.

Version 4.8.1 (21 November 2013)
 * Ongoing work on viterbi.
 * Updated MSVC9 project files from Jand Hashemi (Lucky--)
 * Fix important bug in Java services: return top parses, not random ones.
 * Java: for the link-diagram string, do not limit to 80 char term width.
 * Windows: UTF-8 fixes so that Russian works in most MSWindows locales.


Version 4.8.0 of the Link Grammar Parser is now available.

This is the start of a new version series, containing an important
change to the English language dictionary. Three new link types are
introduced: WV, CV and IV. These are used to connect the left-wall to
the primary verb of the sentence (WV), to connect the ruling clause
to the primary verb of a dependent clause (CV), and a similar link
for certain infinitive verbs (IV).  The goal of these links is to
make it easier to locate verbs, and thus to provide a more direct
mapping from the link-grammar formalism to a dependency parse (as
dependency parses always put the verb at the root of a sentence).
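
To illustrate why a WV link makes the root verb easy to locate, here is
a minimal Python sketch.  It assumes a linkage is available as a list
of (left word, right word, label) triples; that representation, the
function name, and the sample links are all hypothetical and
hand-written, not actual parser output.  Only the WV label and the
notion of a left wall come from the text above.

```python
LEFT_WALL = "LEFT-WALL"


def find_root_verbs(links):
    """Return the words joined to the left wall by a WV-type link.

    `links` is a list of (left_word, right_word, label) triples.
    Matching on the "WV" prefix is an assumption made so that
    subscripted variants of the label would also be found.
    """
    return [right for left, right, label in links
            if left == LEFT_WALL and label.startswith("WV")]


# Hand-written illustrative linkage for "the dog ran"; the D and S
# labels are placeholders chosen for this example.
example_links = [
    (LEFT_WALL, "ran", "WV"),
    ("the", "dog", "D"),
    ("dog", "ran", "S"),
]

root_verbs = find_root_verbs(example_links)
```

A dependency converter could then start from the word returned here
and treat it as the root of the sentence.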

These are not the first links that explicitly indicate root verbs:
the AF, CP, Eq, COq and B links already play this role. The new WV,
CV and IV links round out this capability and do so in a very general
form.  See
http://www.abisource.com/projects/link- … on-WV.html
for details.

With this release, we expect that all (non-auxiliary) verbs in a
sentence will be linked either to the wall, or to a controlling parent.
We also expect there to be some additional fixes and tightening-up
to occur in future releases, especially in regards to comparative

This release also includes a variety of fixes to the Java API/server.
In addition, some ancient, deprecated C code was removed.


Version 4.8.0 (24 October 2013)
 * Fix "he answered yes"
 * Support bulleted, numbered lists.
 * New link types from Lian Ruiting, for identifying the head-verb.
 * Java: fix bug when totaling WordNet word-sense score.
 * Java: add info to README about using the JSON parse server.
 * Java: remove many deprecated functions.
 * C API: remove some deprecated functions.
 * Java: fix silent failure when library is not found.
 * Java: Add support for fetching the ASCII-art diagram string.
 * Java: Fix insane language selection initialization.
 * Fix: "The pig runs SLOWER than the cat."
 * Fix: conjoined superlatives: "... the longest and the farthest."
 * Fix: "inside" can be used with conjunction: "near or \ 
 * Fix: conjoined question modifiers: "exactly when and precisely where..."
 * Fix: issue 59: crash/corruption when dictionary opened twice.
 * Fix: assorted exclamations!


Version 4.7.12 of the Link Grammar Parser is now available.

The biggest change in this version is a sharply updated Russian
dictionary, which fixes a large number of bugs generated during
the initial release.  Thanks to Sergey Protasov who did
almost all this work!

The other notable change is that the fat-link code is no longer
built by default.  It will be permanently removed in some future
version, "real soon now".

A miscellany of other minor changes are listed below.

The link-grammar homepage:

http://www.abiword.org/downloads/link-g … .12.tar.gz

The Link Grammar Parser is a syntactic parser of English (and other
languages as well), based on link grammar, an original theory of English
syntax.  Given a sentence, the system assigns to it a syntactic
structure, which consists of a set of labelled links connecting pairs of
words. The parser also produces a "constituent" (Penn tree-bank style
phrase tree) representation of a sentence (showing noun phrases, verb
phrases, etc.). The RelEx extension provides dependency-parse output.

Version 4.7.12 (25 May 2013)
 * Large fixes to the Russian dictionaries.
 * Windows: Explicitly fail if cygwin version is too old.
 * Tweak the lt dict to work again with the modern parser.
 * Make the fat linkages code be compile-time configurable.
 * Disable fat linkages by default; mark as deprecated.
 * Fix SAT-solver build; recent changes had broken it.
 * Export read-dict.h as a public API.
 * Ongoing development of the Viterbi prototype.
 * Windows: some UTF8/widechar refactoring.
 * Java bindings: add method to set the language.
 * CMake: add version checking to the CMakefile
 * Fix: failed handling of capitalized first word for Russian.
 * Fix: stemming failures in many cases (for Russian dictionaries)
 * Add flag to suppress stem-suffix printing.
 * Windows: Fixes to MSVC6 build files.
 * Fix: hash-table bug affecting Russian dictionaries
   2012-10-25 08:57:09 by Aleksej Saushev | Files touched by this commit (587)
Log message:
Drop superfluous PKG_DESTDIR_SUPPORT, "user-destdir" is default these days.
   2011-10-08 09:29:46 by Ryo ONODERA | Files touched by this commit (1)
Log message:
Add missing lines. Fix 'make package'.
   2010-11-26 15:43:31 by Adam Ciarcinski | Files touched by this commit (5)
Log message:
Changes 4.7.0:
* Fix: hunspell configuration on Fedora (bugtracker issue 47)
* Fix: 'turn' with adjective: "She turned him green" from wingedtachikoma
* Fix: comma-conjoined modifiers: "It tastes bitter, not sweet."
* Fix: conjoined question words: "When and where is the party?"
* Fix: recognize short, capitalized words (Los, La, etc.).
* Treat colon as synonym for is: "The answer: yes."
* Fix: begin with prepositions: "It all began in Chicago."
* Fix: "What does it come to?" and related.
* Fix: null infinitive: "I'd like to, I want to."
* Fix: "Because I said so."
* Fix: "sure" as preverbal adverb: "It sure is."
* Fix: Gerunds with determiners: "a running of the bulls"
* SJ link for conjoined nouns/noun phrases.
* Sort linkages according to whether fat linkage was used.
* Add flag to enable use of fat linkage during parsing.
  (Fat links now disabled by default).
* Add male/female gender tags to misc nouns.
* Fix: misc optionally transitive verbs: mix, paint, boot
* Fix: word order: "look about fearfully", "look fearfully about", around
* Fix: recognize simple fractions
* Fix: "is" with uncountable nouns: "there is blood on your \ 
* Fix: Roman numeral suffixes e.g. "Henry VIII"
* Fix: regression in dates followed by punctuation. "In the 1950s, ..."
* Fix: verbs drank, drunk are optionally transitive.
* Fix: regression: "all the X", X can be plural or mass.
* Fix: verbs paint, color may be ditransitive: "paint the car bright green"
   2009-06-30 02:07:26 by Joerg Sonnenberger | Files touched by this commit (159)
Log message:
Mark packages as MAKE_JOBS_SAFE=no that failed in a bulk build with
MAKE_JOBS=2 and worked without.