Introduction to Natural Language Processing with Scikit-learn

Difficulty: Intermediate
Duration: 3 minutes and 23 seconds
Students: 256
Rating: 4.6/5

This lesson covers the basic techniques you need to build a machine learning pipeline for natural language processing (NLP) using scikit-learn, a machine learning library for Python.

Learning Objectives

  • Learn about the two main scikit-learn classes for natural language processing: CountVectorizer and TfidfVectorizer
  • Learn how to create Bag-of-Words (BoW) representations and TF-IDF representations
  • Learn how to create a machine learning pipeline to classify BBC news articles into different categories
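As a rough preview of these objectives, the sketch below builds both representations and a small classification pipeline. It is only an illustration under stated assumptions: the four-sentence corpus and its labels are made up and stand in for the BBC news articles, and LogisticRegression is an assumed choice of classifier, not necessarily the one used in the lesson.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus (hypothetical data, not the BBC dataset).
docs = [
    "the team won the football match",
    "the striker scored a late goal in the match",
    "the company reported strong quarterly profits",
    "shares fell after the company cut its profit forecast",
]
labels = ["sport", "sport", "business", "business"]

# Bag-of-Words: one column per vocabulary term, each cell a raw count.
bow = CountVectorizer()
bow_matrix = bow.fit_transform(docs)

# TF-IDF: counts reweighted so that terms found in many documents weigh less.
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(docs)

# Pipeline: vectorizer plus classifier (assumed here), fitted in one call.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)

# Classify an unseen sentence; the pipeline vectorizes it automatically.
prediction = model.predict(["the goal decided the match"])[0]
print(prediction)
```

Wrapping the vectorizer and classifier in a single pipeline means raw text goes in and a label comes out, so the same preprocessing is applied at training and prediction time.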

Intended Audience

This lesson is intended for anyone who wishes to understand how NLP works and, more specifically, how to implement it using scikit-learn.

Prerequisites

To get the most out of this lesson, you should already have an understanding of the Python programming language.

Covered Topics