
TF-NLP Model Garden

⚠️ Disclaimer: Datasets hyperlinked from this page are not owned or distributed by Google. Such datasets are made available by third parties. Please review the terms and conditions made available by the third parties before using the data.

This codebase provides a Natural Language Processing modeling toolkit written in TF2. It allows researchers and developers to reproduce state-of-the-art model results and to train custom models to experiment with new research ideas.

Features

Major components

Libraries

We provide a modeling library that allows users to train custom models for new research ideas. Detailed instructions can be found in the README in each folder.

Layers

Layers are the fundamental building blocks for NLP models. They can be used to assemble new tf.keras layers or models.

Layers
BertPackInputs | BertTokenizer | BigBirdAttention | BigBirdMasks | BlockDiagFeedforward | CachedAttention
ClassificationHead | ExpertsChooseMaskedRouter | FactorizedEmbedding | FastWordpieceBertTokenizer
FeedForwardExperts | FourierTransformLayer | GatedFeedforward | GaussianProcessClassificationHead
HartleyTransformLayer | KernelAttention | KernelMask | LinearTransformLayer | MaskedLM | MaskedSoftmax
MatMulWithMargin | MixingMechanism | MobileBertEmbedding | MobileBertMaskedLM
MobileBertTransformer | MoeLayer | MoeLayerWithBackbone | MultiChannelAttention | MultiClsHeads
MultiHeadRelativeAttention | OnDeviceEmbedding | PackBertEmbeddings | PerDimScaleAttention
PerQueryDenseHead | PositionEmbedding | RandomFeatureGaussianProcess | ReZeroTransformer
RelativePositionBias | RelativePositionEmbedding | ReuseMultiHeadAttention | ReuseTransformer
SelectTopK | SelfAttentionMask | SentencepieceTokenizer | SpectralNormalization
SpectralNormalizationConv2D | StridedTransformerEncoderBlock | StridedTransformerScaffold
TNTransformerExpandCondense | TalkingHeadsAttention | TokenImportanceWithMovingAvg | Transformer
TransformerDecoderBlock | TransformerEncoderBlock | TransformerScaffold | TransformerXL
TransformerXLBlock | get_mask | TwoStreamRelativeAttention | VotingAttention | extract_gp_layer_kwargs
extract_spec_norm_kwargs
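As a rough illustration of how two of these building blocks compose, the NumPy sketch below mimics what SelfAttentionMask (broadcasting a padding mask to every query position) and MaskedSoftmax (suppressing masked positions before normalizing) compute inside an attention layer. It assumes a -1e9 additive mask and a simplified single-head dot-product score with no learned projections; the function names are hypothetical and this is not the library's TensorFlow implementation.

```python
import numpy as np

def self_attention_mask(inputs, to_mask):
    """Hypothetical helper: broadcast a [batch, to_seq] padding mask to
    [batch, from_seq, to_seq] so every query position shares the same mask."""
    batch_size, from_seq_length = inputs.shape[0], inputs.shape[1]
    to_mask = np.asarray(to_mask, dtype=inputs.dtype)
    return np.ones((batch_size, from_seq_length, 1), dtype=inputs.dtype) * to_mask[:, None, :]

def masked_softmax(scores, mask):
    """Hypothetical helper: softmax over the last axis that ignores mask == 0."""
    # Assumed large negative bias: masked positions end up with ~0 probability.
    scores = scores + (1.0 - mask) * -1e9
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

def attention_weights(x, padding_mask):
    """Compose the two blocks: dot-product scores -> 3D mask -> masked softmax.
    Simplified: no learned query/key projections and a single head."""
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    return masked_softmax(scores, self_attention_mask(x, padding_mask))

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8)).astype(np.float32)  # [batch, seq, hidden]
padding_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]], dtype=np.float32)
weights = attention_weights(x, padding_mask)
print(weights.shape)  # (2, 4, 4)
```

Each row of `weights` sums to 1 and puts no mass on padded key positions, which is the contract the real layers provide to the attention blocks built on top of them.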

Networks

Networks are combinations of tf.keras layers (and possibly other networks). They are tf.keras models that are not meant to be trained on their own. Each network encapsulates a common structure, such as a transformer encoder, in an easily handled object with a standardized configuration.

Networks
AlbertEncoder | BertEncoder | BertEncoderV2 | Classification | EncoderScaffold | FNet | MobileBERTEncoder
FunnelTransformerEncoder | PackedSequenceEmbedding | SpanLabeling | SparseMixer | XLNetBase
XLNetSpanLabeling
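To make the idea of a network concrete, here is a hypothetical NumPy sketch (`TinyEncoder` is not part of the library) of an encoder that bundles embedding, self-attention, and pooling into one object configured by a plain dictionary. Its `get_config`/`from_config` pair loosely mirrors the Keras convention, and its `sequence_output`/`pooled_output` keys loosely mirror the dictionary outputs of the library's BERT encoders; the real networks are TensorFlow models with learned projections, layer normalization, and multiple transformer blocks.

```python
import numpy as np

class TinyEncoder:
    """Hypothetical encoder network: embeddings + one self-attention layer + pooler."""

    def __init__(self, vocab_size, hidden_size, max_sequence_length, seed=0):
        # The whole structure is described by this config dictionary.
        self._config = dict(vocab_size=vocab_size, hidden_size=hidden_size,
                            max_sequence_length=max_sequence_length, seed=seed)
        rng = np.random.default_rng(seed)
        self.word_embeddings = rng.normal(0.0, 0.02, (vocab_size, hidden_size))
        self.position_embeddings = rng.normal(0.0, 0.02, (max_sequence_length, hidden_size))
        self.pooler = rng.normal(0.0, 0.02, (hidden_size, hidden_size))

    def get_config(self):
        return dict(self._config)

    @classmethod
    def from_config(cls, config):
        return cls(**config)

    def __call__(self, word_ids, input_mask):
        word_ids = np.asarray(word_ids)
        mask = np.asarray(input_mask, dtype=np.float64)
        seq_len = word_ids.shape[1]
        # Embedding layers: token lookup plus learned absolute positions.
        x = self.word_embeddings[word_ids] + self.position_embeddings[:seq_len]
        # Single-head self-attention that ignores padded key positions.
        scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
        scores = scores + (1.0 - mask[:, None, :]) * -1e9
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        sequence_output = x + attn @ x  # residual connection
        # Pooler: dense + tanh over the first ([CLS]) token.
        pooled_output = np.tanh(sequence_output[:, 0] @ self.pooler)
        return {"sequence_output": sequence_output, "pooled_output": pooled_output}

encoder = TinyEncoder(vocab_size=100, hidden_size=16, max_sequence_length=32)
word_ids = np.array([[1, 5, 7, 0, 0], [2, 3, 4, 9, 0]])
input_mask = (word_ids != 0).astype(np.float64)
outputs = encoder(word_ids, input_mask)
clone = TinyEncoder.from_config(encoder.get_config())  # same config, same network
print(outputs["sequence_output"].shape, outputs["pooled_output"].shape)  # (2, 5, 16) (2, 16)
```

Because the network is fully described by its config, it can be rebuilt from that config alone, which is what makes such objects easy to pass around and serialize.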

Models

Models are combinations of tf.keras layers and models that can be trained. Several pre-built (canned) models are provided for training encoder networks. They are intended both as convenience functions and as canonical examples.

Models
BertClassifier | BertPretrainer | BertPretrainerV2 | BertSpanLabeler | BertTokenClassifier | DualEncoder
ElectraPretrainer | Seq2SeqTransformer | T5Transformer | T5TransformerParams | TransformerDecoder
TransformerEncoder | XLNetClassifier | XLNetPretrainer | XLNetSpanLabeler | attention_initializer
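The pattern behind a canned model such as BertClassifier, an encoder network plus a task-specific head that together form something trainable, can be sketched in NumPy. Everything here is hypothetical and simplified: `TinyClassifier` and `mean_pool_encoder` are illustrative names, the "encoder" is a stand-in that averages token embeddings and stays frozen, the head is a single dense layer, and training is plain gradient descent on softmax cross-entropy rather than the library's optimizer setup.

```python
import numpy as np

class TinyClassifier:
    """Hypothetical canned model: an encoder network plus a dense classification head."""

    def __init__(self, encoder_fn, hidden_size, num_classes, seed=0):
        self.encoder_fn = encoder_fn  # any callable: word_ids -> [batch, hidden]
        rng = np.random.default_rng(seed)
        self.kernel = rng.normal(0.0, 0.02, (hidden_size, num_classes))
        self.bias = np.zeros(num_classes)

    def head(self, pooled):
        return pooled @ self.kernel + self.bias  # logits: [batch, num_classes]

    def __call__(self, word_ids):
        return self.head(self.encoder_fn(word_ids))

    def train_step(self, word_ids, labels, learning_rate=0.1):
        """One gradient-descent step on the head; the encoder is kept frozen here."""
        pooled = self.encoder_fn(word_ids)
        logits = self.head(pooled)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        rows = np.arange(len(labels))
        loss = -np.log(probs[rows, labels]).mean()
        # d(mean cross-entropy)/d(logits) = (probs - one_hot(labels)) / batch_size
        grad = probs.copy()
        grad[rows, labels] -= 1.0
        grad /= len(labels)
        self.kernel -= learning_rate * pooled.T @ grad
        self.bias -= learning_rate * grad.sum(axis=0)
        return loss

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((50, 8))

def mean_pool_encoder(word_ids):
    """Stand-in encoder network: average the token embeddings -> [batch, hidden]."""
    return embeddings[word_ids].mean(axis=1)

model = TinyClassifier(mean_pool_encoder, hidden_size=8, num_classes=2)
word_ids = rng.integers(0, 50, size=(4, 6))
labels = np.array([0, 1, 0, 1])
losses = [model.train_step(word_ids, labels) for _ in range(50)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Keeping the encoder as a separate callable is the point of the design: the same network can be reused under different heads (classification, span labeling, token classification) without changing its code.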

Losses

Losses contains common loss computations used in NLP tasks.

Losses
weighted_sparse_categorical_crossentropy_loss
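The computation behind weighted_sparse_categorical_crossentropy_loss can be sketched in NumPy, assuming it averages per-example cross-entropy over integer labels and, when weights are given, takes a weighted mean (for example, so that padded masked-LM positions contribute nothing). `weighted_sparse_ce_sketch` is a hypothetical helper, not the library function, and its probability clipping is an assumption; consult the TensorFlow source for the exact signature and edge-case handling.

```python
import numpy as np

def weighted_sparse_ce_sketch(labels, predictions, weights=None, from_logits=False):
    """Hypothetical sketch: (weighted) mean cross-entropy for integer class labels.

    labels: int array [batch]; predictions: [batch, num_classes] probabilities,
    or logits if from_logits=True; weights: optional [batch] array.
    """
    labels = np.asarray(labels)
    predictions = np.asarray(predictions, dtype=np.float64)
    if from_logits:
        # Log-softmax, shifted by the row max for numerical stability.
        shifted = predictions - predictions.max(axis=-1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    else:
        # Assumed clipping so that log(0) cannot occur.
        log_probs = np.log(np.clip(predictions, 1e-7, 1.0 - 1e-7))
    # Per-example loss: negative log-probability of the true class.
    example_losses = -np.take_along_axis(log_probs, labels[:, None], axis=-1)[:, 0]
    if weights is None:
        return float(example_losses.mean())
    weights = np.asarray(weights, dtype=np.float64)
    total_weight = weights.sum()
    # Weighted mean; 0.0 instead of NaN when every weight is zero.
    return float((example_losses * weights).sum() / total_weight) if total_weight > 0 else 0.0

# Two masked-LM slots; the second is padding, so it gets weight 0.
probs = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])
labels = np.array([0, 2])
print(weighted_sparse_ce_sketch(labels, probs, weights=[1.0, 0.0]))  # -log(0.7), about 0.357
```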

State-of-the-Art models and examples

We provide SoTA model implementations, pre-trained models, training and evaluation examples, and command lines. Detailed instructions can be found in the READMEs for specific papers. Below are some of the papers implemented in this repository; more NLP projects can be found in the projects folder:

  1. BERT: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al., 2018
  2. ALBERT: "A Lite BERT for Self-supervised Learning of Language Representations" by Lan et al., 2019
  3. XLNet: "XLNet: Generalized Autoregressive Pretraining for Language Understanding" by Yang et al., 2019
  4. Transformer for translation: "Attention Is All You Need" by Vaswani et al., 2017

Common Training Driver

We provide a single common driver, train.py, to train the SoTA models above on popular tasks. Please see docs/train.md for more details.

Pre-trained models with checkpoints and TF-Hub

We provide a large collection of baselines and checkpoints for NLP pre-trained models. Please see docs/pretrained_models.md for more details.

More Documentation

Please read through the model training tutorials and references in the docs/ folder.