./textproc/py-sentencepiece, Unsupervised text tokenizer for Neural Network-based text generation



Branch: CURRENT, Version: 0.1.97, Package name: py310-sentencepiece-0.1.97, Maintainer: pkgsrc-users

SentencePiece is an unsupervised text tokenizer and detokenizer,
mainly for neural network-based text generation systems in which the
vocabulary size is fixed before the neural model is trained.
SentencePiece implements subword units (e.g., byte-pair encoding
(BPE) and the unigram language model) and extends them with direct
training from raw sentences. This makes it possible to build a
purely end-to-end system that does not depend on language-specific
pre- or postprocessing.

This package contains the Python module.
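
To make the idea of subword units learned directly from raw sentences more concrete, the following is a minimal pure-Python sketch of byte-pair-encoding merges. It is an illustration only and is not SentencePiece's implementation or API: the function names (learn_bpe, apply_merge, encode, decode) are hypothetical, the tie-breaking rule is an arbitrary choice, and whitespace is simply mapped to the "▁" symbol so that no language-specific pre-tokenization is needed.

```python
from collections import Counter

SPACE = "\u2581"  # "▁": whitespace is treated as an ordinary symbol


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(sentences, num_merges):
    """Learn up to `num_merges` merge rules from raw sentences (toy sketch)."""
    corpus = [list(s.replace(" ", SPACE)) for s in sentences]
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # (an arbitrary but deterministic choice for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = [apply_merge(symbols, best) for symbols in corpus]
    return merges


def encode(text, merges):
    """Split text into characters, then apply the learned merges in order."""
    symbols = list(text.replace(" ", SPACE))
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def decode(pieces):
    """Detokenize: concatenate the pieces and restore spaces."""
    return "".join(pieces).replace(SPACE, " ")
```

Because the space character is kept as a symbol inside the pieces, decoding is a plain concatenation followed by a substitution, which is what makes the round trip independent of language-specific rules. The number of merges bounds the size of the learned vocabulary, loosely mirroring the predetermined vocabulary size mentioned above.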


Master sites:

Filesize: 11665.465 KB



CVS history:


   2023-04-25 16:55:28 by Thomas Klausner | Files touched by this commit (1)
Log message:
py-sentencepiece: not for python 2

   2023-03-13 15:18:27 by Thomas Klausner | Files touched by this commit (4)
Log message:
textproc/py-sentencepiece: import py-sentencepiece-0.1.97

SentencePiece is an unsupervised text tokenizer and detokenizer
mainly for Neural Network-based text generation systems where the
vocabulary size is predetermined prior to the neural model training.
SentencePiece implements subword units (e.g., byte-pair-encoding
(BPE)) and unigram language model with the extension of direct
training from raw sentences. SentencePiece allows us to make a
purely end-to-end system that does not depend on language-specific
pre/postprocessing.

This package contains the Python module.