Suppose you’re using a program that takes a regular expression as an argument. You didn’t get the match you expected, then you realize you’d like your search to be case-insensitive. If you were using grep you’d go back and add a -i flag. If you were writing a Perl script, …
April 8, 2023, 12:55 p.m.
ICD codes are diagnostic codes created by the WHO. (Three TLAs in just the opening paragraph!) The latest version, ICD-11, went into effect in January of this year. A few countries are using ICD-11 now; it’s expected to be at least a couple years before the US moves from ICD-10 …
Regular expressions can do a lot of tasks in practice that they cannot do in theory. That’s because a particular application of regular expressions comes with context and with error tolerance. For example, much has been said about how regular expressions cannot parse HTML. This is strictly true, but it …
Word problems Suppose you have a sequence of symbols and a set of rewriting rules for replacing some patterns of symbols with others. Now you’re given two such sequences. Can you tell whether there’s a way to turn one of them into the other? This is known as the word …
Jan. 24, 2022, 12:37 a.m.
In his book Mastering Regular Expressions, Jeffrey Friedl uses corner quotes to delimit regular expressions. Here’s an example I found by opening his book a random: ⌜(\.\d\d[1-9]?)\d*⌟ The upper-left corner at the beginning and the lower-right corner at the end are not part of the regular expression. This particularly comes …
The grep utility searches text files for regular expressions, but it can search for ordinary strings since these strings are a special case of regular expressions. However, if your regular expressions are in fact simply text strings, fgrep may be much faster than grep. Or so I’ve heard. I did …
April 26, 2021, 5:31 p.m.
In The Perl Cookbook, Tom Christiansen gives his rewrite of the Unix utility grep that he calls tcgrep. Why not grep with PCRE? You can get basically the same functionality as tcgrep by using grep with its PCRE option -P. Since tcgrep searches directories recursively, a more direct comparison would …
Here’s a frivolous problem whose solution illustrates three features of Perl: Arbitrary precision floating point Lazy quantifiers in regular expressions Returning the positions of matched groups. Our problem is to look for the digits 3, 1, 4, and 1 in the decimal part of π. First, we get the first …
There are several ways to quote strings in Python. Triple quotes let strings span multiple lines. Line breaks in your source file become line break characters in your string. A triple-quoted string in Python acts something like “here doc” in other languages. However, Python’s indentation rules complicate matters because the …
According to the Python Cookbook, “Mixing Unicode and regular expressions is often a good way to make your head explode.” It is thus with fear and trembling that I dip my toe into using Unicode with Greek and Hebrew. I heard recently that there are anomalies in the Hebrew Bible …
Sept. 20, 2020, 9:13 p.m.
I started this post by wanting to look at the frequency of LaTeX commands, but then thought that some people mind find the code to find the frequencies more interesting than the frequencies themselves. So I’m splitting this into two posts. This post will look at the shell one-liner to …
April 19, 2020, 7:27 p.m.
In this tutorial, learn how to use regular expressions and the pandas library to manage large data sets during data analysis. The post Tutorial: Python Regex (Regular Expressions) for Data Scientists appeared first on Dataquest.
Suppose you’re looking for instances of -42 in a file foo.txt. The command grep -42 foo.txt won’t work. Instead you’ll get a warning message like the following. Usage: grep [OPTION]... PATTERN [FILE]... Try 'grep --help' for more information. Putting single or double quotes around -42 won’t help. The problem is …
If you learned regular expressions by using a programming language like Perl or Python, you may be surprised when tools like grep seems broken. That’s because what you think of as simply regular expressions, these tools consider extended regular expressions. Tell them to search on extended regular expressions and some …
Special characters make text processing more complicated because you have to pay close attention to context. If you’re looking at Python code containing a regular expression, you have to think about what you see, what Python sees, and what the regular expression engine sees. A character may be special to …
Regular expressions are challenging, but not for the reasons commonly given. Non-reasons Here are some reasons given for the difficulty of regular expressions that I don’t agree with. Cryptic syntax I think complaints about cryptic syntax miss the mark. Some people say that Greek is hard to learn because it …