pkgsrc.se | The NetBSD package collection

./textproc/sentencepiece, Unsupervised text tokenizer for Neural Network-based text generation


Branch: CURRENT, Version: 0.1.97, Package name: sentencepiece-0.1.97, Maintainer: pkgsrc-users

SentencePiece is an unsupervised text tokenizer and detokenizer,
intended mainly for neural network-based text generation systems in
which the vocabulary size is fixed before the neural model is trained.
SentencePiece implements subword units (e.g., byte-pair encoding
(BPE) and the unigram language model) and extends them to train
directly from raw sentences. This makes it possible to build a
purely end-to-end system that does not depend on language-specific
pre- or postprocessing.
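
To give a feel for what "subword units" means, here is a toy sketch of
byte-pair-encoding merges in plain Python. It is not SentencePiece's
implementation and does not use its API: the function names (learn_bpe,
merge_pair, encode), the whitespace-free word list, and the tie-breaking
rule are all assumptions made for illustration only.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often the word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties go to the lexicographically smallest
        # pair (an arbitrary choice made here for determinism).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split `word` into subword units by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
    print(rules)
    print(encode("lowest", rules))
```

The number of merges plays the role of the predetermined vocabulary
budget: more merges yield longer, more word-like units, fewer merges
leave text closer to individual characters.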

Master sites:

https://github.com/google/ (Download)

Filesize: 11665.465 KB

Version history:

(2023-03-13) Package added to pkgsrc.se, version sentencepiece-0.1.97 (created)

CVS history:

2023-07-18 20:47:54 by Nia Alarie | Files touched by this commit (10)

Log message:
textproc: Adapt packages (where possible) to USE_(CC|CXX)_FEATURES

2023-07-13 15:55:10 by Nia Alarie | Files touched by this commit (2)

Log message:
*: Revert two recent commits that dropped a cwrappers-enforced C++ standard
by packages that already use -std=c++XX until the discussion about C++
standard versions is resolved.

Requested by pkgsrc-pmc@.

2023-07-13 15:49:17 by Nia Alarie | Files touched by this commit (11)

Log message:
*: Remove all instances of GCC_REQD where my name is the most recent
in 'cvs annotate' (part 2)

2023-07-12 16:52:16 by Havard Eidnes | Files touched by this commit (1)

Log message:
sentencepiece: Use mk/atomic64.mk as that's required.

2023-07-11 08:09:29 by Nia Alarie | Files touched by this commit (2)

Log message:
sentencepiece: Require a C++17 compiler the proper way.

2023-03-13 15:17:12 by Thomas Klausner | Files touched by this commit (6)

Log message:
textproc/sentencepiece: import sentencepiece-0.1.97

SentencePiece is an unsupervised text tokenizer and detokenizer
mainly for Neural Network-based text generation systems where the
vocabulary size is predetermined prior to the neural model training.
SentencePiece implements subword units (e.g., byte-pair-encoding
(BPE)) and unigram language model with the extension of direct
training from raw sentences. SentencePiece allows us to make a
purely end-to-end system that does not depend on language-specific
pre/postprocessing.