Abhijeet Kamble
3 min read · Oct 22, 2019


NLP and NLTK

What are NLP and NLTK?

Let’s jump straight into what Natural Language Processing (NLP) actually means. There are various definitions out there, but one that I like is this: natural language processing is a field concerned with the ability of a computer to understand, analyze, manipulate, and potentially generate human language. By human language, we simply mean any language used for everyday communication: English, Spanish, French, and so on.

It’s worth noting that Python doesn’t inherently know what any given word means. All it sees is a string of characters. For instance, it has no idea what “natural” actually means. It sees that the word is seven characters long, but the individual characters don’t mean anything to Python, and the collection of those characters together certainly doesn’t mean anything either. We, on the other hand, know what an N is and what an A is, we know that together those seven characters make up the word “natural,” and we know what that word means. NLP is the field of getting the computer to understand what “natural” actually signifies, and from there we can get into the manipulation, or potentially even the generation, of human language.

You probably experience natural language processing on a daily basis, and you may not even know it. Here are a few examples you may see from day to day. The first is a spam filter, where your email server determines whether an incoming email is spam based on the content of the body, the subject, and maybe the email domain. The second is auto-complete, where Google predicts what you’re interested in searching for based on what you’ve already entered and what others commonly search for with those same phrases. If I search for natural language processing, it knows that many other people are interested in learning NLP with Python, learning it through a course (that’s how I learned), or looking for jobs related to natural language processing.
So it can auto-complete your search for you. The last is auto-correct, where, say, a smart keyboard tries to help you correct a misspelling. I like this example because it shows how auto-correct has evolved over time and continues to evolve and learn. With iOS 6, if you tried to type “I’ll be ill tomorrow,” it wouldn’t necessarily fix “I’ll be I’ll tomorrow.” It wasn’t until iOS 7 that it auto-completed “tomorrow” and corrected the second “I’ll” into “ill,” so the message was correctly sent as “I’ll be ill tomorrow.” That shows how NLP evolved in its early stages, and how even very powerful systems like iOS were still learning what natural language means.

NLP is a very broad umbrella that encompasses many topics, such as sentiment analysis, topic modeling, text classification, sentence segmentation, and part-of-speech tagging. The core component of natural language processing is extracting all the information from a block of text that is relevant to a computer understanding the language. This is task-specific as well: different information is relevant for a sentiment analysis task than for a topic modeling task.

So that’s a very quick introduction to what natural language processing is. Now let’s start thinking about the tools we actually need to build these processes. The Natural Language Toolkit is the most widely used package for handling natural language processing tasks in Python. Usually called NLTK for short, it is a suite of open-source tools originally created in 2001 at the University of Pennsylvania to make building NLP processes in Python easier. The package has been expanded through the extensive contributions of open-source users in the years since its original development.
NLTK is great because it gives you a jumpstart on building any NLP process: it provides the basic tools, which you can chain together to accomplish your goal, rather than having to build all of those tools from scratch. A lot of tools are packaged into NLTK, and in the next blog post we’ll dive into the package and explore some of them.
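
To make the point about Python only seeing characters a bit more concrete, here is a minimal sketch that uses only Python’s standard library, not NLTK itself. The sample sentence, the `tokenize` and `count_tokens` helpers, and the simple regular-expression rule are all my own illustrative assumptions. They only hint at the kind of small building blocks you would otherwise have to write and chain together by hand.

```python
import re
from collections import Counter

# Hypothetical sample sentence, chosen only for illustration.
text = "Natural language processing helps computers understand natural language."

# To Python, a word is just a sequence of characters with a length.
word = "natural"
print(len(word))   # 7
print(list(word))  # ['n', 'a', 't', 'u', 'r', 'a', 'l']


def tokenize(raw):
    """Lowercase the text and pull out runs of letters (a deliberately naive rule)."""
    return re.findall(r"[a-z']+", raw.lower())


def count_tokens(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


# Chain the two small steps: raw string -> tokens -> counts.
tokens = tokenize(text)
counts = count_tokens(tokens)
print(tokens)
print(counts["natural"], counts["language"])
```

Even this tiny pipeline has to make decisions about case, punctuation, and apostrophes, and it still knows nothing about meaning. That is roughly the gap a toolkit like NLTK is meant to help close, with ready-made tools in place of hand-written rules like these.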
