textproc/py-sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation
Branch: CURRENT
Version: 0.2.0
Package name: py312-sentencepiece-0.2.0
Maintainer: pkgsrc-users
SentencePiece is an unsupervised text tokenizer and detokenizer
mainly for Neural Network-based text generation systems where the
vocabulary size is fixed before the neural model is trained.
SentencePiece implements subword units (e.g., byte-pair encoding
(BPE) and the unigram language model), extended to train directly
from raw sentences. This makes it possible to build a purely
end-to-end system that does not depend on language-specific
pre- or postprocessing.
This package contains the Python module.
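To make the description concrete, here is a minimal pure-Python sketch of byte-pair-encoding training on raw sentences with a fixed target vocabulary size. It is a toy illustration of the idea only, not SentencePiece's implementation or API: the function names (to_symbols, merge_pair, train_toy_bpe, encode, decode) and the sample sentences are hypothetical. The one detail borrowed from SentencePiece is that whitespace is kept as an ordinary symbol ("▁", U+2581), which is what lets raw text be tokenized and restored without language-specific pre/postprocessing.

```python
from collections import Counter

SPACE = "\u2581"  # "▁": whitespace made visible, as SentencePiece does


def to_symbols(sentence):
    """Turn a raw sentence into a list of characters, spaces included."""
    return list(SPACE + sentence.replace(" ", SPACE))


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_toy_bpe(sentences, vocab_size):
    """Learn merges from raw sentences until the vocabulary reaches vocab_size."""
    corpus = [to_symbols(s) for s in sentences]
    vocab = {sym for seq in corpus for sym in seq}
    merges = []
    while len(vocab) < vocab_size:
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair wins; ties are broken by the pair itself
        # so that training is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        corpus = [merge_pair(seq, best) for seq in corpus]
        merges.append(best)
        vocab.add(best[0] + best[1])
    return merges


def encode(sentence, merges):
    """Apply the learned merges, in training order, to a new sentence."""
    symbols = to_symbols(sentence)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def decode(pieces):
    """Concatenate pieces and restore spaces; drops the leading marker."""
    return "".join(pieces).replace(SPACE, " ")[1:]


if __name__ == "__main__":
    merges = train_toy_bpe(["low lower lowest", "new newer newest"], vocab_size=20)
    pieces = encode("lower newest", merges)
    print(pieces)
    print(decode(pieces))
```

Because spaces are ordinary symbols, decode(encode(text, merges)) returns the input text unchanged, including for words the toy model never saw during training.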
Filesize: 11700.011 KB
Version history:
- (2025-06-06) Updated to version: py312-sentencepiece-0.2.0
- (2023-03-13) Package added to pkgsrc.se, version py310-sentencepiece-0.1.97 (created)
CVS history:
2025-06-06 09:51:53 by Thomas Klausner | Files touched by this commit (5)
Log message:
{py-,}sentencepiece: update to 0.2.0
2023-04-25 16:55:28 by Thomas Klausner | Files touched by this commit (1)
Log message:
py-sentencepiece: not for python 2
2023-03-13 15:18:27 by Thomas Klausner | Files touched by this commit (4)
Log message:
textproc/py-sentencepiece: import py-sentencepiece-0.1.97
SentencePiece is an unsupervised text tokenizer and detokenizer
mainly for Neural Network-based text generation systems where the
vocabulary size is predetermined prior to the neural model training.
SentencePiece implements subword units (e.g., byte-pair-encoding
(BPE)) and unigram language model with the extension of direct
training from raw sentences. SentencePiece allows us to make a
purely end-to-end system that does not depend on language-specific
pre/postprocessing.
This package contains the Python module.