Preparing Data for BERT Training

This article is divided into four parts; they are:

• Preparing Documents
• Creating Sentence Pairs from Document
• Masking Tokens
• Saving the Training Data for Reuse

Unlike decoder-only models, BERT's pretraining is more complex, because it combines two objectives: masked language modeling (MLM) and next sentence prediction (NSP).
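To make the sentence-pair and masking steps more concrete, here is a minimal Python sketch. It assumes sentences are already tokenized into lists of strings and follows the masking scheme described in the original BERT paper: about 15% of non-special positions are selected, and of those, 80% become [MASK], 10% become a random vocabulary token, and 10% stay unchanged. The function names (make_sentence_pair, mask_tokens, build_example), the toy VOCAB, and the exact-count selection are illustrative assumptions, not the article's own code.

```python
import random

# Hypothetical special tokens and toy vocabulary, for illustration only.
CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran", "home"]


def make_sentence_pair(doc, all_docs, idx, rng):
    """Build one NSP example from sentence idx of doc (a list of sentences).

    With probability 0.5 (when a next sentence exists), sentence B is the
    true next sentence (label 1); otherwise it is a random sentence from a
    different document (label 0). Assumes all_docs holds at least two docs.
    """
    sent_a = doc[idx]
    if idx + 1 < len(doc) and rng.random() < 0.5:
        return sent_a, doc[idx + 1], 1
    other_doc = rng.choice([d for d in all_docs if d is not doc])
    return sent_a, rng.choice(other_doc), 0


def mask_tokens(tokens, rng, mask_prob=0.15):
    """Select ~mask_prob of the non-special positions for MLM.

    Selected positions: 80% -> [MASK], 10% -> random token, 10% unchanged.
    Returns (masked_tokens, labels); labels[i] is the original token at
    selected positions and None everywhere else.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)
    candidates = [i for i, t in enumerate(tokens) if t not in (CLS, SEP)]
    rng.shuffle(candidates)
    n_to_mask = max(1, round(len(candidates) * mask_prob))
    for i in candidates[:n_to_mask]:
        labels[i] = tokens[i]
        r = rng.random()
        if r < 0.8:
            masked[i] = MASK
        elif r < 0.9:
            masked[i] = rng.choice(VOCAB)
        # else: keep the original token unchanged
    return masked, labels


def build_example(sent_a, sent_b, is_next, rng):
    """Assemble [CLS] A [SEP] B [SEP] with segment ids and MLM labels."""
    tokens = [CLS] + sent_a + [SEP] + sent_b + [SEP]
    segment_ids = [0] * (len(sent_a) + 2) + [1] * (len(sent_b) + 1)
    masked, labels = mask_tokens(tokens, rng)
    return {"tokens": masked, "segment_ids": segment_ids,
            "mlm_labels": labels, "is_next": is_next}


# Usage: two tiny pre-tokenized documents.
rng = random.Random(0)
docs = [
    [["the", "cat", "sat"], ["on", "the", "mat"]],
    [["the", "dog", "ran"], ["home"]],
]
a, b, label = make_sentence_pair(docs[0], docs, 0, rng)
example = build_example(a, b, label, rng)
print(example)
```

Selecting an exact count of positions (rather than flipping a coin per token) keeps the number of prediction targets per sequence predictable, which simplifies batching; a per-token Bernoulli draw is a common alternative.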