Regular expression to match string not containing a word.

The claim that regex doesn’t support inverse matching is not entirely true. You can mimic this behavior with negative look-arounds:


^((?!hede).)*$

The regex above matches any string, or any line without a line break, that does not contain the (sub)string ‘hede’. As mentioned, this is not something regex is “good” at (or should do), but it is possible.
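As a quick check, the pattern can be tried out in code. The following is a minimal sketch, assuming Python's `re` module (the text above does not name a regex flavor); the helper name `lacks_hede` and the sample strings are made up for illustration.

```python
import re

# The pattern from above: no character may be the start of "hede".
NO_HEDE = re.compile(r'^((?!hede).)*$')

def lacks_hede(line: str) -> bool:
    """Return True if the pattern matches: a single line without 'hede'."""
    return NO_HEDE.match(line) is not None

# Hypothetical sample inputs, chosen only for illustration.
samples = ["hoho", "hihi", "hede", "ABhedeCD", ""]
for s in samples:
    print(repr(s), lacks_hede(s))
```

The empty string is accepted as well, since the group may repeat zero times.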

Explanation

A string is just a list of n characters. Before and after each character, there’s an empty string. So a list of n characters will have n+1 empty strings. Consider the string "ABhedeCD":

    +--+---+--+---+--+---+--+---+--+---+--+---+--+---+--+---+--+
S = |e1| A |e2| B |e3| h |e4| e |e5| d |e6| e |e7| C |e8| D |e9|
    +--+---+--+---+--+---+--+---+--+---+--+---+--+---+--+---+--+
index    0      1      2      3      4      5      6      7

where the e’s are the empty strings. The regex `(?!hede).` looks ahead to see if there’s no substring "hede" to be seen, and if that is the case (so something else is seen), then the `.` (dot) will match any character except a line break. Look-arounds are also called zero-width assertions because they don’t consume any characters. They only assert/validate something.
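The zero-width behavior can be observed directly from match spans. This is a small sketch, again assuming Python's `re` module; the variable names are illustrative only.

```python
import re

s = "ABhedeCD"

# A bare negative lookahead asserts but consumes nothing: the span is empty.
zero_width = re.compile(r'(?!hede)').match(s, 0)
print(zero_width.span())  # (0, 0)

# With the dot added, exactly one character is consumed after the check passes.
one_char = re.compile(r'(?!hede).').match(s, 0)
print(one_char.span(), one_char.group())  # (0, 1) A

# Starting at index 2 the text ahead is "hede", so the lookahead fails
# and nothing is matched at that position.
blocked = re.compile(r'(?!hede).').match(s, 2)
print(blocked)  # None
```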

So, in my example, every empty string is first validated to see if there’s no "hede" up ahead, before a character is consumed by the `.` (dot). The regex `(?!hede).` will do that only once, so it is wrapped in a group and repeated zero or more times: `((?!hede).)*`. Finally, the start- and end-of-input are anchored to make sure the entire input is consumed:

^((?!hede).)*$
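To see why the anchors matter, the repeated group can be compared with and without them. This is a hedged sketch assuming Python's `re` module; the input "ABCD" is an invented example of a string without "hede".

```python
import re

s = "ABhedeCD"

# Repeated but unanchored: the group stops right before "hede" and
# still reports a partial match.
partial = re.match(r'((?!hede).)*', s)
print(partial.group(0))  # AB

# Anchored at both ends: the whole input must be consumed, so no match.
anchored = re.match(r'^((?!hede).)*$', s)
print(anchored)  # None

# An input without "hede" is consumed completely.
full = re.match(r'^((?!hede).)*$', "ABCD")
print(full.group(0))  # ABCD
```

Without the anchors, a search would happily report "AB" (or even an empty match) for an input that does contain the word, which is why both `^` and `$` are needed.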

As you can see, the input "ABhedeCD" will fail because on `e3`, the regex `(?!hede)` fails (there is "hede" up ahead!).
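The walk over the empty strings can also be mimicked by hand. The sketch below is not how a regex engine is implemented; it only mirrors the check described above, and the function name `first_failing_empty_string` is hypothetical.

```python
def first_failing_empty_string(text: str, word: str = "hede"):
    """Return the 1-based label k of the first empty string e_k that has
    `word` directly ahead of it, or None if no position is blocked."""
    # A string of n characters has n + 1 empty-string positions: e1 .. e(n+1).
    for i in range(len(text) + 1):
        if text.startswith(word, i):
            return i + 1  # string index i corresponds to the label e(i+1)
    return None

print(first_failing_empty_string("ABhedeCD"))  # 3
print(first_failing_empty_string("ABCD"))      # None
```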

Jan 18 at 15:49 community-wiki Thanks to Bart Kiers
