⚠️ Disclaimer: Datasets hyperlinked from this page are not owned or distributed by Google. Such datasets are made available by third parties. Please review the terms and conditions made available by the third parties before using the data.
This codebase provides a Natural Language Processing modeling toolkit written in TF2. It allows researchers and developers to reproduce state-of-the-art model results and to train custom models when experimenting with new research ideas.
We provide a modeling library that lets users train custom models for new research ideas. Detailed instructions can be found in the README in each folder.
Layers are the fundamental building blocks for NLP models. They can be used to
assemble new tf.keras
layers or models.
Networks are combinations of tf.keras
layers (and possibly other networks).
They are tf.keras
models that are not trained on their own. They encapsulate
common network structures, such as a transformer encoder, in an easily handled
object with a standardized configuration.
Models are combinations of tf.keras
layers and models that can be trained.
Several pre-built canned models are provided to train encoder networks. These
models are intended as both convenience functions and canonical examples.
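The layer, network, and model hierarchy described above can be sketched with plain tf.keras. This is a minimal illustration, not code from this library: `FeedForwardBlock`, `build_encoder`, and `build_classifier` are hypothetical names, and the toy encoder stands in for a real one such as a transformer encoder.

```python
import tensorflow as tf


class FeedForwardBlock(tf.keras.layers.Layer):
    """Hypothetical layer: feed-forward block with a residual connection."""

    def __init__(self, hidden_size, inner_size, **kwargs):
        super().__init__(**kwargs)
        self.inner = tf.keras.layers.Dense(inner_size, activation="relu")
        self.outer = tf.keras.layers.Dense(hidden_size)
        self.norm = tf.keras.layers.LayerNormalization()

    def call(self, inputs):
        # Residual connection followed by layer normalization.
        return self.norm(inputs + self.outer(self.inner(inputs)))


def build_encoder(vocab_size, hidden_size, seq_length):
    """Hypothetical network: layers wired together, not trained alone."""
    token_ids = tf.keras.Input(shape=(seq_length,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, hidden_size)(token_ids)
    x = FeedForwardBlock(hidden_size, 4 * hidden_size)(x)
    pooled = tf.keras.layers.GlobalAveragePooling1D()(x)
    return tf.keras.Model(token_ids, pooled, name="toy_encoder")


def build_classifier(encoder, num_classes):
    """Hypothetical model: an encoder network plus a trainable task head."""
    logits = tf.keras.layers.Dense(num_classes)(encoder.output)
    return tf.keras.Model(encoder.input, logits, name="toy_classifier")


encoder = build_encoder(vocab_size=100, hidden_size=16, seq_length=8)
classifier = build_classifier(encoder, num_classes=3)
classifier.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```

Keeping the encoder as a separate object means the same network can be reused under different task heads, which is the role the pre-built models play for the encoder networks in this library.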
Losses contain common loss computations used in NLP tasks.
Losses |
---|
weighted_sparse_categorical_crossentropy_loss |
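To make the idea behind a weighted sparse categorical cross-entropy concrete, here is a small pure-Python sketch. It assumes one common definition (per-example cross-entropy from logits, scaled by a per-example weight and normalized by the sum of weights); the function name and signature are hypothetical, and the library's own implementation may differ in details.

```python
import math


def weighted_sparse_categorical_crossentropy(labels, logits, weights=None):
    """Illustrative only: weighted mean of per-example cross-entropy.

    labels:  list of integer class ids, one per example.
    logits:  list of per-example lists of unnormalized scores.
    weights: optional list of per-example weights (defaults to all ones).
    """
    if weights is None:
        weights = [1.0] * len(labels)

    total = 0.0
    for label, row, weight in zip(labels, logits, weights):
        # Numerically stable log-sum-exp for the softmax normalizer.
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        # Negative log-probability of the true class, scaled by its weight.
        total += weight * (log_z - row[label])

    weight_sum = sum(weights)
    # Assumption: return 0.0 when every example is masked out.
    return total / weight_sum if weight_sum > 0 else 0.0
```

A weight of zero removes an example from the loss entirely, which is how padding positions or unmasked tokens are typically excluded.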
We provide SoTA model implementations, pre-trained models, training and
evaluation examples, and command lines. Detailed instructions can be found in the
READMEs for specific papers. Below are some papers implemented in the repository;
more NLP projects can be found in the
projects
folder:
We provide a single common driver, train.py, to train the above SoTA models on popular tasks. Please see docs/train.md for more details.
We provide a large collection of baselines and checkpoints for NLP pre-trained models. Please see docs/pretrained_models.md for more details.
Please read through the model training tutorials and references in the docs/ folder.