COMP 200 Elements of Computer Science &
COMP 130 Elements of Algorithms and Computation
Spring 2012

Python Regular Expressions “Finger Exercises”

In class, we searched a text for a “fixed” pattern, i.e., a single string. For example, we searched "Can I help you?" for the string "help". Often, however, we want to search for a more general pattern. We will start to explore regular expressions in these finger exercises and follow up on them in class.
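For contrast, a fixed-string search needs no regular expressions at all. This sketch assumes the in-class version used Python's `in` operator or the string method `find`. The variable name `text` is only for illustration.

```python
text = "Can I help you?"

# The "in" operator answers yes or no: does "help" occur in text?
print("help" in text)      # True

# str.find reports where the first occurrence starts, or -1 if absent.
print(text.find("help"))   # 6
print(text.find("hello"))  # -1
```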

Consider looking for the Semitic root “S–L–M”, as found in words such as “Islam”, “Muslim”, “salam”, “shalom”, “Solomon”, and “Suleiman”.

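To search for this, we want to specify that the pattern is "s" or "S", followed by any number of lowercase letters (possibly none), followed by "l", followed by any number of lowercase letters (possibly none), followed by "m". This is captured by the regular expression "[sS][a-z]*l[a-z]*m". Let's break this down into its parts to understand it:

"[sS]" matches a single character that is either "s" or "S".
"[a-z]" matches a single lowercase letter, "a" through "z".
"*" after it means "repeated any number of times, including zero", so "[a-z]*" matches zero or more lowercase letters.
"l" matches exactly the letter "l".
"[a-z]*" again matches zero or more lowercase letters.
"m" matches exactly the letter "m".

Thus, this represents the desired pattern.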
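Here is a quick check of the pattern against each of the words above. It uses re.search, which is introduced in the next section, and the Match Object method group(), which returns the part of the text that matched. The list name `words` is just for illustration.

```python
import re

pattern = "[sS][a-z]*l[a-z]*m"
words = ["Islam", "Muslim", "salam", "shalom", "Solomon", "Suleiman"]

for word in words:
    match = re.search(pattern, word)
    # group() gives the part of the word that the pattern matched.
    print(word, "->", match.group())

# A word without the S-L-M sequence gives no match at all.
print(re.search(pattern, "smile"))  # None
```

The matches are "slam", "slim", "salam", "shalom", "Solom", and "Suleim". Notice that only the part from the "s" through the "m" is matched: "Solom", not "Solomon".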

Regular Expressions in Python

How do we use regular expressions in Python code? All of the examples use Python's re module, so first let's import it.
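The import is a single line. As a sanity check, this sketch also prints the search function, which confirms that the module loaded.

```python
import re  # Python's standard-library regular expression module

# After the import, the module's functions are reached as re.<name>.
print(re.search)  # shows something like <function search at 0x...>
```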

In these finger exercises, we will focus on the function re.search. It is mostly straightforward: its first argument is a regular expression to search for, and its second argument is the text to search in.
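A minimal sketch of both kinds of search: the fixed string from class, and the S-L-M pattern. The sentence "King Solomon was wise" is just an example text.

```python
import re

# First argument: the pattern. Second argument: the text to search in.
fixed = re.search("help", "Can I help you?")
general = re.search("[sS][a-z]*l[a-z]*m", "King Solomon was wise")

print(fixed)    # a Match Object, not True
print(general)  # also a Match Object
```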

The result of this is a bit odd, however. It isn't True, but something called a Match Object. A Match Object can be used in various ways, including finding where the pattern matched in the text, but we'll consider a very simple way of using it: an if-else, in which a Match Object counts as true.
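The simplest use is to put the call in an if-else. In passing, the sketch also shows how a Match Object reports where the match occurred, via its group, start, and end methods.

```python
import re

text = "Can I help you?"

# A Match Object counts as true, so it can be tested directly.
if re.search("help", text):
    print("Found it!")
else:
    print("Not found.")

# Where did it match?
match = re.search("help", text)
print(match.group())  # help  (the matched text)
print(match.start())  # 6     (index where the match begins)
print(match.end())    # 10    (index just past the end of the match)
```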

Let's see one more example like this.
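Another example in the same style, this time with the S-L-M pattern. The text "Shalom, and welcome!" is an illustrative choice.

```python
import re

slm = "[sS][a-z]*l[a-z]*m"  # the S-L-M pattern from the start of these exercises
text = "Shalom, and welcome!"

if re.search(slm, text):
    print("Found an S-L-M word.")  # this branch runs
else:
    print("No S-L-M word here.")
```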

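When there is no match, re.search doesn't return False. Instead, it returns None. That's a somewhat odd value: when an expression evaluates to None, the interactive Python shell doesn't display anything at all. Since None counts as false in an if, an easy way to use it is, again, an if-else.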
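A sketch of the no-match case. Calling print shows None explicitly, and the if-else takes the else branch because None counts as false. The text "Can I hold you?" is just an example.

```python
import re

result = re.search("help", "Can I hold you?")
print(result)          # None
print(result is None)  # True

# None counts as false, so the if-else pattern still works.
if result:
    print("Found it!")
else:
    print("Not found.")  # this branch runs
```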

In the following examples, we'll just show the re.search call. As in the above examples, it might be more useful to put the code within an if-else. Try each of these.
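A few calls to try, all hypothetical examples built on the S-L-M pattern. Each prints either a Match Object or None; the comments say which, and why.

```python
import re

slm = "[sS][a-z]*l[a-z]*m"

print(re.search(slm, "Suleiman the Magnificent"))  # matches "Suleim"
print(re.search(slm, "a slim chance"))             # matches "slim"
print(re.search(slm, "SALAM"))  # None: [a-z] only covers lowercase letters
print(re.search(slm, "sl m"))   # None: [a-z] does not match a space
print(re.search(slm, "slm"))    # matches "slm": each [a-z]* can match nothing
```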

Another thing you can do with regular expressions is match any one of several alternatives, written with a vertical bar "|" between them. Here, let's see whether the text contains any of "green", "red", or "blue".
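A minimal sketch of alternation: the vertical bar "|" separates the options, and re.search reports the first one it finds in the text. The sample sentences are illustrative.

```python
import re

colors = "green|red|blue"  # match "green", "red", or "blue"

print(re.search(colors, "The sky is blue today").group())  # blue
print(re.search(colors, "A yellow submarine"))              # None

# Alternatives can match inside longer words, too.
print(re.search(colors, "Fred was here").group())           # red
```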

One of our future uses will be to find delimiters between words. Words can be separated by spaces and punctuation. Let's ignore punctuation for the moment and consider just spaces. The surprising thing is that we'll need to consider not just spaces, but space-like characters. The two most easily explained space-like characters are “tab” (corresponding to the Tab key) and the end-of-line “new line” (roughly corresponding to the Enter key). In Python strings, these are written "\t" and "\n", respectively. There are three others, written "\r" (carriage return), "\f" (form feed), and "\v" (vertical tab). Together with the ordinary space, these are known as whitespace.
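To see that each of these escape sequences stands for a single character, this sketch prints each one's length and its repr (Python's escaped form of a string). No regular expressions are involved yet; the list name `space_like` is just for illustration.

```python
# Each escape sequence written with a backslash is one character.
space_like = [
    ("tab", "\t"),
    ("new line", "\n"),
    ("carriage return", "\r"),
    ("form feed", "\f"),
    ("vertical tab", "\v"),
]

for name, ch in space_like:
    print(name, len(ch), repr(ch))

# Printing text with a tab and a new line shows their effect.
print("one\ttwo\nthree")
```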

So, we can see if a text string contains whitespace.
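One way to do this is a character class listing the ordinary space plus the five space-like characters. Python's re module also has the shorthand \s for whitespace; for ordinary (Unicode) strings it matches a few additional whitespace characters as well. The shorthand is written as a raw string, r"\s", so that Python passes the backslash through to re unchanged.

```python
import re

# A space plus the five space-like characters, inside [ ].
whitespace = "[ \t\n\r\f\v]"

print(re.search(whitespace, "Can I help you?") is not None)     # True (spaces)
print(re.search(whitespace, "one\ttwo") is not None)            # True (a tab)
print(re.search(whitespace, "no-whitespace-here") is not None)  # False

# The same kind of test using the built-in shorthand \s.
print(re.search(r"\s", "line one\nline two") is not None)       # True
```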

In class, we'll do a couple more examples of searching, and we'll see a couple more functions in the re module. To find all the matches of a pattern in a text, we'll use re.findall. To split a text into a list of words separated by delimiters, we'll use re.split.
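As a preview, here is a minimal sketch of both functions, using the S-L-M pattern and whitespace from earlier. re.findall returns a list of the matched strings, and re.split returns a list of the pieces between matches. The sample texts are illustrative.

```python
import re

text = "Solomon said salam, Suleiman said shalom"

# Every non-overlapping match of the S-L-M pattern, left to right.
print(re.findall("[sS][a-z]*l[a-z]*m", text))
# ['Solom', 'salam', 'Suleim', 'shalom']

# Split wherever a single whitespace character occurs.
print(re.split(r"\s", "Can I\thelp\nyou?"))
# ['Can', 'I', 'help', 'you?']
```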

Additional optional readings about Python regular expressions