25.8.20
This website uses cookies to ensure you get the best experience on our website. Learn more

NLP with LLMs: Working with Tokenizers in Hugging Face

Skillsoft issued completion badges are earned based on viewing the percentage required or receiving a passing score when assessment is required. Hugging Face, a leading company in the field of artificial intelligence (AI), offers a comprehensive platform that enables developers and researchers to build, train, and deploy state-of-the-art machine learning (ML) models with a strong emphasis on open collaboration and community-driven development. In this course, you will discover the extensive libraries and tools Hugging Face offers, including the Transformers library, which provides access to a vast array of pre-trained models and datasets. Next, you will set up your working environment in Google Colab. You will also explore the critical components of the text preprocessing pipeline: normalizers and pre-tokenizers. Finally, you will master various tokenization techniques, including byte pair encoding (BPE), Wordpiece, and Unigram tokenization, which are essential for working with transformer models. Through hands-on exercises, you will build and train BPE and WordPiece tokenizers, configuring normalizers and pre-tokenizers to fine-tune these tokenization methods.

Issued on

July 24, 2024

Expires on

Does not expire