SQL problems

Abhijeet Kamble
4 min readDec 5, 2019

Two common tasks in data science work are filtering data so we can work with just the data we’re interested in, and the other is grouping data so we can calculate values for an entire group such as a product category or a department. Let’s start with filtering data using numeric values. We’ll use the staff table for this example. Let’s assume we want to list everything with a salary greater than $100,000. So I’ll use the select command, and for this example, we’ll include the last name, department, and of course, salary. And we will select this from the staff table. Now, we only want to include a certain subset of the employees, so we use a WHERE clause. We’re going to say WHERE salary is greater than 100,000, and if we execute that statement, we’ll get a list of employees who work in various departments, but all of whom earn salaries over $100,000. Filtering based on character strings works in similar ways. Now, let’s generate a list of employees who work in the tools department, and we’ll keep the columns the same. So I’ll just change the WHERE clause to say WHERE department = tools, and I’ll execute that statement. And now all of the employees listed are in the tools department, and if we were to browse a little bit, we’d notice that yes, in fact, they are all in the tools department. Now let’s combine the two filters. It’s pretty common to have more than one condition we want to filter on. In fact, sometimes WHERE clauses can be the longest part of a select statement, because there are so many conditions that need to be satisfied. To filter for rows that satisfy both conditions, we use the AND keyword. So I’ll include that in my WHERE clause, and I’ll say WHERE department = tools AND salary > 100,000. And we’ll execute this statement, and we’ll notice that sure enough, everyone listed here is in the tools department and has a salary greater than $100,000. Now, if we want at least one of the conditions to be met but maybe not both, then we can use the OR keyword. And I’ll change the AND to OR, execute that statement, an you’ll notice, now we have some people who work in departments other than tools, but all of the people who aren’t in the tools department have salaries over $100,000. And if we browse a little bit, we’ll notice that some of the people in the tools department are listed, but they have salaries less than $100,000. For example, in this case, Watson is in the tools department and earns a salary of $81,870. Now, in SQL, the LIKE operator is used to match patterns. There are many options for working with the LIKE operator. For example, if we wanted to look at all of the departments that begin with the letter B, we could change this WHERE clause to say WHERE department LIKE ‘B%.’ And this means select all staff from the departments where the department name begins with a B and has zero or more characters after it. And let’s, to make this a little more interesting, let’s add a GROUP BY statement here, and let’s GROUP BY department, and we’ll change this. Instead of doing less than the department, we’ll remove the last_name, we’ll GROUP BY department, and we will sum the salary. And this will allow us to see how much we’re spending on salaries in each department that begins with a B. We’ll execute that statement, and you’ll notice, there are three departments listed, and then the sum of the salaries is listed as well. Now, we can make the LIKE operator actually be more specific. For example, if we just wanted to see departments that began with the letter B followed by the letter O, we could specify that, and if we execute the statement, as we expected, we see just one department, and that’s the one that begins with the letters B-O. The special symbol percent can be used in different parts of the string. It does not have to be at the end of the string. For example, if you want information on the departments that begin with B and end with Y, you could use the following query: like that, we’ll have a percent in the middle, and we’ll end with Y. With this query, any department that begins with B and ends with Y will match. So if we execute this statement, as we expect, we have two departments. Both begin with B and both end with Y. One note, be careful when using percent at the beginning of a string. When this happens, the database cannot use indexes that may be on the column. This will cause the database to scan every row in the table looking for matches. That’s okay, it just may take a lot longer than you expect.

--

--