COMP 200: Elements of Computer Science
Spring 2013

Practice with Regular Expressions

Now let's get more practice with REs, including the new features that you've learned about. These examples will use both re.findall and the British National Corpus.
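As a quick refresher on re.findall, here is a minimal sketch. The sample sentence and the variable names are made up for illustration and are not taken from the BNC or from the course data files.

```python
import re

# Hypothetical sample sentence, invented for this illustration.
sentence = "The cat sat on the mat, and the bat sat on the hat."

# \b marks a word boundary, so this matches whole three-letter
# words made of one lowercase letter followed by "at".
at_words = re.findall(r"\b[a-z]at\b", sentence)
print(at_words)

# When the RE contains a parenthesized group, re.findall returns
# the text matched by the group rather than the whole match.
first_letters = re.findall(r"\b([a-z])at\b", sentence)
print(first_letters)
```

The first call returns every whole-word match in left-to-right order, including repeats such as the second "sat". The second call returns only the captured first letters, which is worth remembering when an RE with groups seems to return less than you expected.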

British National Corpus

The BNC is a large collection of (British) English sentences. This database provides much more realistic examples than the silly sentences that I come up with. It is also much larger than any of the text data files that we have provided as samples. It is often used for research in computational linguistics, especially for work in language education and natural language processing. This is not the only such corpus (a large body of text), but it is convenient for our purposes because it provides a simple RE search on its home page.

There are a few things to know about this search, which we will illustrate by example. Type each of the following examples into the search box on the BNC home page.

Exercises

For each of the following, create an example string to match with re.findall, and also search the BNC. Because of the differences noted in the previous examples, you may need slightly different REs for each use.

Additional Systems Searchable by REs