Regular Expressions in Obsidian (copied from mdn web docs)

I-d-as · June 6, 2022, 10:47am

I wanted to share this note for anyone who might find it useful. It separates the regex based on what is and is not implemented within Obsidian Search.

# Regex in Obsidian Search
Source:  [mdn web docs](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions)

- - - 

- [[#Regex in Obsidian Search]]
	- [[#Implemented]]
		- [[#Assertions Implemented]]
			- [[#Boundary-type Assertions]]
			- [[#Other Assertions]]
		- [[#Character Classes Implemented]]
		- [[#Groups and Ranges Implemented]]
		- [[#Quantifiers Implemented]]
	- [[#Not Implemented]]
		- [[#Character Classes Not Implemented]]
		- [[#Groups and Ranges Not Implemented]]

- - - 

- [[#Full Definitions]]
	- [[#Assertions]]
		- [[#Boundary-type Assertions Full Definitions]]
		- [[#Other Assertions Full Definitions]]
	- [[#Character Classes Full Definitions]]
	- [[#Groups and Ranges Full Definitions]]
	- [[#Quantifiers Full Definitions]]

- - - 

## Implemented

### Assertions (Implemented)

#### Boundary-type Assertions

Characters | Meaning
----------- | ---
`/^/` | Matches the [[#beginning of input\|beginning of input]]
`/$/` | Matches the [[#end of input\|end of input]]
`/\b/` | Matches a [[#b word boundary\|word boundary]]
`/\B/` | Matches a [[#B non-word boundary\|non-word boundary]]

- - - 

#### Other Assertions

> [!note] **Note:** The `?` character may also be used as a quantifier.

Characters | Meaning
----------- | ---
`/x(?=y)/` | [[#x y lookahead assertion\|Lookahead assertion]]: Matches "x" only if "x" is followed by "y"
`/x(?!y)/` | [[#x y negative lookahead assertion\|Negative lookahead assertion]]: Matches "x" only if "x" is not followed by "y"
`/(?<=y)x/` | [[#y x lookbehind assertion\|Lookbehind assertion]]: Matches "x" only if "x" is preceded by "y"
`/(?<!y)x/` | [[#y x negative lookbehind assertion\|Negative lookbehind assertion]]: Matches "x" only if "x" is not preceded by "y"

- - - 

### Character Classes (Implemented)

Characters | Meaning
----------- | ---
`/./` | Matches any [[#single character]] except line terminators
`/\d/` | Matches any [[#d digit\|digit]]
`/\D/` | Matches any character that is [[#D not a digit\|not a digit]] (Arabic numeral)
`/\w/` | Matches any [[#w alphanumeric character\|alphanumeric character]] from the basic Latin alphabet, including the underscore
`/\W/` | Matches any character that is [[#W not a word character\|not a word character]] from the basic Latin alphabet
`/\s/` | Matches a [[#s single white space character\|single white space character]], including space, tab, form feed, line feed, and other Unicode spaces
`/\S/` | Matches a [[#S single character other than white space\|single character other than white space]]
`/\t/` | Matches a [[#t horizontal tab\|horizontal tab]]
`/\n/` | Matches a [[#n linefeed\|linefeed]]
`/\/` | Indicates that the following character should be treated specially, or "[[#escaped]]".

- - - 

### Groups and Ranges (Implemented)

Characters | Meaning
---------- | ---
`/x`\|`y/` | [[#x y matches either\|Matches either]] "x" or "y"
`/[xyz]/` or `/[a-c]/` | A [[#xyz or a-c character class\|character class]]. Matches any one of the enclosed characters
`/[^xyz]/` or `/[^a-c]/` | A [[#xyz or a-c negated character class\|negated character class]] or complemented character class. That is, it matches anything that is not enclosed in the brackets
`/\n/` | Where "n" is a positive integer. A [[#n back reference\|back reference]] to the last substring matching the n parenthetical in the regular expression (counting left parentheses)

- - - 

### Quantifiers (Implemented)

Characters | Meaning
----------- | ---
`/x*/` | Matches the preceding item "x" [[#x 0 or more times\|0 or more times]]
`/x+/` | Matches the preceding item "x" [[#x 1 or more times\|1 or more times]]
`/x?/` | Matches the preceding item "x" [[#x 0 or 1 times\|0 or 1 times]]
`/x{n}/` | Matches exactly [[#x n n occurrences\|n occurrences]] of the preceding item x
`/x{n,}/` | Matches [[#x n at least n occurrences\|at least n occurrences]] of the preceding item x
`/x{n,m}/` | Matches [[#x n m at least n and at most m\|at least n and at most m]] occurrences of the preceding item x

- - - 

## Not Implemented

### Character Classes (Not Implemented)

Characters | Meaning
----------- | ---
`/\r/` | Matches a [[#r carriage return\|carriage return]]
`/\v/` | Matches a [[#v vertical tab\|vertical tab]]
`/\f/` | Matches a [[#f form-feed\|form-feed]]
`/[\b]/` | Matches a [[#b backspace\|backspace]]
`/\0/` | Matches a [[#0 NUL character\|NUL character]]
`/\cX/` | Matches a [[#cX control character\|control character]]
`/\xhh/` | Matches the character with the code `hh` ([[#xhh two hexadecimal digits\|two hexadecimal digits]]).
`/\uhhhh/` | Matches a UTF-16 code-unit with the value `hhhh` ([[#uhhhh four hexadecimal digits\|four hexadecimal digits]]).
`/\u{hhhh}/` or `/\u{hhhhh}/` | (Only when the `u` flag is set.) Matches the character with the Unicode value `U+hhhh` or `U+hhhhh` ([[#u hhhh or u hhhhh hexadecimal digits\|hexadecimal digits]]).
`\p{UnicodeProperty}`, `\P{UnicodeProperty}` | Matches a character based on its [[#p UnicodeProperty P UnicodeProperty Unicode character properties\|Unicode character properties]]

- - - 

### Groups and Ranges (Not Implemented)

Characters | Meaning
----------- | ---
`/(x)/` | [[#x capturing group\|Capturing group]]: Matches `x` and remembers the match
`\k<Name>` | A back reference to the last substring matching the [[#k Name back reference to named capturing group\|Named capturing group]] specified by `<Name>`
`/(?<Name>x)/` | [[#Name x named capturing group\|Named capturing group]]: Matches "x" and stores it on the groups property of the returned matches under the name specified by `<Name>`
`/(?:x)/` | [[#x non-capturing group\|Non-capturing group]]: Matches "x" but does not remember the match

- - - 

# Full Definitions

## Assertions

### Boundary-type Assertions (Full Definitions)

#### `^` (beginning of input)

- Matches the beginning of input. If the multiline flag is set to true, also matches immediately after a line break character. For example, `/^A/` does not match the "A" in "an A", but does match the first "A" in "An A".

> [!note] **Note:** This character has a different meaning when it appears at the start of a [[#Groups and Ranges Implemented|range]].

- - - 

#### `$` (end of input)

- Matches the end of input. If the multiline flag is set to true, also matches immediately before a line break character. For example, `/t$/` does not match the "t" in "eater", but does match it in "eat".
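A minimal sketch of how the multiline flag changes `^` and `$`, written as plain JavaScript rather than as an Obsidian query. The sample strings are made up, and whether Obsidian Search applies the `m` flag the same way is an assumption to verify in your own vault.

```javascript
// Plain JavaScript semantics; sample strings are made up for illustration.
const twoLineInput = "An A\nA second line";

// Without the m flag, ^ matches only at the very start of the input.
const startMatchesNoFlag = twoLineInput.match(/^A/g).length;

// With the m flag, ^ also matches immediately after the line break.
const startMatchesMultiline = twoLineInput.match(/^A/gm).length;

// $ works the same way at the other end: "eat" ends a line, not the input.
const endNoFlag = /t$/.test("eat\neater");
const endMultiline = /t$/m.test("eat\neater");

console.log(startMatchesNoFlag, startMatchesMultiline, endNoFlag, endMultiline);
```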

- - - 

#### `\b` (word boundary)

- Matches a word boundary. This is the position where a word character is not followed or preceded by another word-character, such as between a letter and a space. Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero.

- Examples:
	-   `/\bm/` matches the "m" in "moon".
	-   `/oo\b/` does not match the "oo" in "moon", because "oo" is followed by "n" which is a word character.
	-   `/oon\b/` matches the "oon" in "moon", because "oon" is the end of the string, thus not followed by a word character.
	-   `/\w\b\w/` will never match anything, because a word character can never be followed by both a non-word and a word character.

- To match a backspace character (`[\b]`), see [[#Character Classes Not Implemented|Character Classes]].
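The word-boundary examples above can be checked in plain JavaScript. This is only a sketch of the JavaScript behavior described by the source; the last line uses a made-up `replace` call to show that the boundary itself has zero length.

```javascript
// Each pattern is taken from the examples above.
const boundaryAtStart = /\bm/.exec("moon");        // matches "m" at index 0
const boundaryInside = /oo\b/.exec("moon");        // null: "oo" is followed by "n"
const boundaryAtEnd = /oon\b/.exec("moon");        // matches "oon"
const impossible = /\w\b\w/.test("moon moon");     // can never match

// A boundary is a position, not a character, so nothing is consumed.
const markedBoundaries = "moon".replace(/\b/g, "|");

console.log(boundaryAtStart[0], boundaryInside, boundaryAtEnd[0], impossible, markedBoundaries);
```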

- - - 

#### `\B` (non-word boundary)

- Matches a non-word boundary. This is a position where the previous and next character are of the same type: Either both must be words, or both must be non-words, for example between two letters or between two spaces. The beginning and end of a string are considered non-words. Same as the matched word boundary, the matched non-word boundary is also not included in the match. For example, `/\Bon/` matches "on" in "at noon", and `/ye\B/` matches "ye" in "possibly yesterday".

- - - 

### Other Assertions (Full Definitions)

#### `x(?=y)` (lookahead assertion)

- **Lookahead assertion:** Matches "x" only if "x" is followed by "y". For example, `/Jack(?=Sprat)/` matches "Jack" only if it is followed by "Sprat".  
- `/Jack(?=Sprat|Frost)/` matches "Jack" only if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" is part of the match results.

- - - 

#### `x(?!y)` (negative lookahead assertion)

- **Negative lookahead assertion:** Matches "x" only if "x" is not followed by "y". For example, `/\d+(?!\.)/` matches a number only if it is not followed by a decimal point. `/\d+(?!\.)/.exec('3.141')` matches "141" but not "3".

- - - 

#### `(?<=y)x` (lookbehind assertion)

- **Lookbehind assertion:** Matches "x" only if "x" is preceded by "y". For example, `/(?<=Jack)Sprat/` matches "Sprat" only if it is preceded by "Jack". 
- `/(?<=Jack|Tom)Sprat/` matches "Sprat" only if it is preceded by "Jack" or "Tom". However, neither "Jack" nor "Tom" is part of the match results.

- - - 

#### `(?<!y)x` (negative lookbehind assertion)

- **Negative lookbehind assertion:** Matches "x" only if "x" is not preceded by "y". For example, `/(?<!-)\d+/` matches a number only if it is not preceded by a minus sign. `/(?<!-)\d+/.exec('3')` matches "3", while `/(?<!-)\d+/.exec('-3')` finds no match because the number is preceded by the minus sign.
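The four lookaround assertions can be tried side by side in plain JavaScript. This sketch reuses the patterns from the definitions above; the "JackSprat JackFrost JackBlack" input is a made-up string.

```javascript
// Lookahead: "Jack" only when followed by "Sprat" or "Frost".
const lookahead = "JackSprat JackFrost JackBlack".match(/Jack(?=Sprat|Frost)/g);

// Negative lookahead: digits not followed by a decimal point.
const negativeLookahead = /\d+(?!\.)/.exec("3.141")[0];

// Lookbehind: "Sprat" only when preceded by "Jack" or "Tom".
const lookbehind = /(?<=Jack|Tom)Sprat/.exec("TomSprat")[0];

// Negative lookbehind: digits not preceded by a minus sign.
const unsignedHit = /(?<!-)\d+/.exec("3");
const signedMiss = /(?<!-)\d+/.exec("-3");

console.log(lookahead, negativeLookahead, lookbehind, unsignedHit[0], signedMiss);
```

Note that the text inside the assertion is never part of the returned match, which is why `lookahead` contains only "Jack".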

- - - 

## Character Classes (Full Definitions)

#### `.` (single character)

- Matches any single character except line terminators: `\n`, `\r`, `\u2028` or `\u2029`. For example, `/.y/` matches "my" and "ay", but not "yes", in "yes make my day".
- Inside a character class, the dot loses its special meaning and matches a literal dot.

> [!note] Note that the `m` multiline flag doesn't change the dot behavior. So to match a pattern across multiple lines, the character class `[^]` can be used — it will match any character including newlines.

> [!note] ES2018 added the `s` "dotAll" flag, which allows the dot to also match line terminators.
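A short JavaScript sketch of the dot behavior described in the notes above. The two-line string is made up, and whether Obsidian Search lets you set the `s` flag is not something the source confirms.

```javascript
const twoLines = "first\nsecond";

// The dot does not match a line feed, with or without the m flag.
const dotPlain = /first.second/.test(twoLines);
const dotMultiline = /first.second/m.test(twoLines);

// [^] matches any character, including line terminators.
const anyCharClass = /first[^]second/.test(twoLines);

// The s (dotAll) flag lets the dot match line terminators too.
const dotAll = /first.second/s.test(twoLines);

// Inside a character class the dot is a literal dot.
const literalDotMiss = /[.]/.test("abc");
const literalDotHit = /[.]/.test("a.c");

console.log(dotPlain, dotMultiline, anyCharClass, dotAll, literalDotMiss, literalDotHit);
```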


- - - 

#### `\d` (digit)

- Matches any digit (Arabic numeral). Equivalent to `[0-9]`. For example, `/\d/` or `/[0-9]/` matches "2" in "B2 is the suite number".

- - - 

#### `\D` (not a digit)

- Matches any character that is not a digit (Arabic numeral). Equivalent to `[^0-9]`. For example, `/\D/` or `/[^0-9]/` matches "B" in "B2 is the suite number".

- - - 

#### `\w` (alphanumeric character)

- Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to `[A-Za-z0-9_]`. For example, `/\w/` matches "a" in "apple", "5" in "$5.28", "3" in "3D" and "m" in "Émanuel".

- - - 

#### `\W` (not a word character)

- Matches any character that is not a word character from the basic Latin alphabet. Equivalent to `[^A-Za-z0-9_]`. For example, `/\W/` or `/[^A-Za-z0-9_]/` matches "%" in "50%" and "É" in "Émanuel".
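Because `\w` and `\W` only know the basic Latin alphabet, accented letters land on the `\W` side. A small JavaScript sketch of that, using the "Émanuel" example from above (written with a `\u00C9` escape so the code does not depend on file encoding):

```javascript
const accentedName = "\u00C9manuel"; // "Émanuel"

// \w skips the accented capital and finds the first basic Latin letter.
const firstWordChar = /\w/.exec(accentedName)[0];

// \W treats the accented capital as a non-word character.
const firstNonWordChar = /\W/.exec(accentedName)[0];

// The underscore counts as a word character; the percent sign does not.
const underscoreIsWord = /\w/.test("_");
const percentIsNonWord = /\W/.exec("50%")[0];

console.log(firstWordChar, firstNonWordChar, underscoreIsWord, percentIsNonWord);
```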

- - - 

#### `\s` (single white space character)

- Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. Equivalent to `[ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]`. For example, `/\s\w*/` matches " bar" in "foo bar".

- - - 

#### `\S` (single character other than white space)

- Matches a single character other than white space. Equivalent to `[^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]`. For example, `/\S\w*/` matches "foo" in "foo bar".

- - - 

#### `\t` (horizontal tab)

- Matches a horizontal tab.

- - - 

#### `\r` (carriage return)

- Matches a carriage return.

- - - 

#### `\n` (linefeed)

- Matches a linefeed.

- - - 

#### `\v` (vertical tab)

- Matches a vertical tab.

- - - 

#### `\f` (form-feed)

- Matches a form-feed.

- - - 

#### `[\b]` (backspace)

- Matches a backspace. If you're looking for the word-boundary character (`\b`), see [[#Assertions Implemented|Assertions]].

- - - 

#### `\0` (NUL character)

- Matches a NUL character. Do not follow this with another digit.

- - - 

#### `\cX` (control character)

- Matches a control character using [caret notation](https://en.wikipedia.org/wiki/Caret_notation), where "X" is a letter from A–Z (corresponding to codepoints `U+0001`–`U+001A`). For example, `/\cM\cJ/` matches "\r\n".

- - - 

#### `\xhh` (two hexadecimal digits)

- Matches the character with the code `hh` (two hexadecimal digits).

- - - 

#### `\uhhhh` (four hexadecimal digits)

- Matches a UTF-16 code-unit with the value `hhhh` (four hexadecimal digits).

- - - 

#### `\u{hhhh}` or `\u{hhhhh}` (hexadecimal digits)

- (Only when the `u` flag is set.) Matches the character with the Unicode value `U+hhhh` or `U+hhhhh` (hexadecimal digits).

- - - 

#### `\p{UnicodeProperty}`, `\P{UnicodeProperty}` (Unicode character properties)

- Matches a character based on its [Unicode character properties](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Unicode_Property_Escapes) (to match just, for example, emoji characters, or Japanese katakana characters, or Chinese/Japanese Han/Kanji characters, etc.).

- - - 

#### `\` (escaped)

- Indicates that the following character should be treated specially, or "escaped". It behaves one of two ways.

-   For characters that are usually treated literally, indicates that the next character is special and not to be interpreted literally. For example, `/b/` matches the character "b". By placing a backslash in front of "b", that is by using `/\b/`, the character becomes special to mean match a word boundary.
-   For characters that are usually treated specially, indicates that the next character is not special and should be interpreted literally. For example, “`*`” is a special character that means 0 or more occurrences of the preceding character should be matched; for example, `/a*/` means match 0 or more "a"s. To match `*` literally, precede it with a backslash; for example, `/a\*/` matches "a*".

> [!note] **Note:** To match this character literally, escape it with itself. In other words to search for `\` use `/\\/`.
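A sketch of both escaping directions in plain JavaScript. The helper name `escapeRegExp` is hypothetical (it is not part of JavaScript or Obsidian); it simply backslash-escapes every character that is otherwise special, which is handy when a search term should be matched literally.

```javascript
// Escaping a special character makes it literal.
const literalStar = /a\*/.exec("aa a*")[0];

// A backslash is matched by escaping it with itself.
const literalBackslash = /\\/.test("C:\\notes");

// Hypothetical helper: escape every regex metacharacter in a string.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escapedTerm = escapeRegExp("a*b (c)");
const matchesLiterally = new RegExp(escapedTerm).test("see a*b (c) here");

console.log(literalStar, literalBackslash, escapedTerm, matchesLiterally);
```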

- - - 

## Groups and Ranges (Full Definitions)

#### `x|y` (matches either)

- Matches either "x" or "y". For example, `/green|red/` matches "green" in "green apple" and "red" in "red apple".

- - - 

#### `[xyz]` or `[a-c]` (character class)

- A character class. Matches any one of the enclosed characters. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets it is taken as a literal hyphen to be included in the character class as a normal character.

- For example, `[abcd]` is the same as `[a-d]`. They match the "b" in "brisket", and the "c" in "chop".

- For example, `[abcd-]` and `[-abcd]` match the "b" in "brisket", the "c" in "chop", and the "-" (hyphen) in "non-profit".

- For example, `[\w-]` is the same as `[A-Za-z0-9_-]`. They both match the "b" in "brisket", the "c" in "chop", and the "n" in "non-profit".
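The hyphen rules above are easy to confirm in plain JavaScript. This sketch reuses the "brisket", "chop" and "non-profit" examples; `exec` returns the first character in each string that belongs to the class.

```javascript
// A range and its spelled-out form behave identically.
const rangeMatch = /[a-d]/.exec("brisket")[0];
const listMatch = /[abcd]/.exec("chop")[0];

// A hyphen at the start or end of the class is a literal hyphen.
const trailingHyphen = /[abcd-]/.exec("non-profit")[0];
const leadingHyphen = /[-abcd]/.exec("non-profit")[0];

// \w can be combined with a literal hyphen inside a class.
const wordOrHyphen = /[\w-]/.exec("non-profit")[0];

console.log(rangeMatch, listMatch, trailingHyphen, leadingHyphen, wordOrHyphen);
```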

- - - 

#### `[^xyz]` or `[^a-c]` (negated character class)

- A negated or complemented character class. That is, it matches anything that is not enclosed in the brackets. You can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets it is taken as a literal hyphen to be included in the character class as a normal character. For example, `[^abc]` is the same as `[^a-c]`. They initially match "o" in "bacon" and "h" in "chop".

> [!note] **Note:** The `^` character may also indicate the [[#Assertions Implemented|beginning of input]].

- - - 

#### `(x)` (capturing group)

- **Capturing group:** Matches `x` and remembers the match. For example, `/(foo)/` matches and remembers "foo" in "foo bar".

- A regular expression may have multiple capturing groups. In results, matches to capturing groups are typically stored in an array whose members are in the same order as the left parentheses of the capturing groups. This is usually just the order of the capturing groups themselves. This becomes important when capturing groups are nested. Matches are accessed using the index of the result's elements (`[1], ..., [n]`) or from the predefined `RegExp` object's properties (`$1, ..., $9`).

- Capturing groups have a performance penalty. If you don't need the matched substring to be recalled, prefer non-capturing parentheses (see below).

- [String.match()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match) won't return groups if the `/.../g` flag is set. However, you can still use [String.matchAll()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/matchAll) to get all matches.
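A plain JavaScript sketch of how capturing groups are numbered and how the `g` flag changes what `String.match()` returns. The sample strings are made up, and since capturing groups are listed under Not Implemented for Obsidian Search, this only illustrates the JavaScript side.

```javascript
// Groups are numbered by the position of their left parenthesis.
const nestedGroups = "2022-06".match(/((\d+)-(\d+))/);
// nestedGroups[1] is the outer group, [2] and [3] are the inner ones.

// With the g flag, match() returns whole matches only, no groups.
const globalMatches = "a1 b2".match(/([a-z])(\d)/g);

// matchAll() still exposes the groups of every match.
const allGroups = [..."a1 b2".matchAll(/([a-z])(\d)/g)].map((m) => [m[1], m[2]]);

console.log(nestedGroups.slice(1), globalMatches, allGroups);
```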

- - - 

#### `\n` (back reference)

- Where "n" is a positive integer. A back reference to the last substring matching the n parenthetical in the regular expression (counting left parentheses). For example, `/apple(,)\sorange\1/` matches "apple, orange," in "apple, orange, cherry, peach".
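A back reference repeats whatever the numbered group actually captured. The sketch below runs the example above in plain JavaScript and adds a made-up second pattern that finds a doubled word.

```javascript
// \1 must match the same comma that group 1 captured.
const fruitMatch = /apple(,)\sorange\1/.exec("apple, orange, cherry, peach")[0];

// Made-up example: find a word that is immediately repeated.
const doubledWord = /\b(\w+) \1\b/.exec("this is is a test")[0];

console.log(fruitMatch, doubledWord);
```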

- - - 

#### `\k<Name>` (back reference to named capturing group)

- A back reference to the last substring matching the **Named capture group** specified by `<Name>`.

- For example, `/(?<title>\w+), yes \k<title>/` matches "Sir, yes Sir" in "Do you copy? Sir, yes Sir!".

> [!note] **Note:** `\k` is used literally here to indicate the beginning of a back reference to a Named capture group.

- - - 

#### `(?<Name>x)` (named capturing group)

- **Named capturing group:** Matches "x" and stores it on the groups property of the returned matches under the name specified by `<Name>`. The angle brackets (`<` and `>`) are required for group name.

- For example, to extract the United States area code from a phone number, we could use `/\((?<area>\d\d\d)\)/`. The resulting number would appear under `matches.groups.area`.
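Named groups and `\k<Name>` back references can be sketched in plain JavaScript as follows. The phone number is made up; the patterns come from the examples above.

```javascript
// The captured text is available on the groups property under its name.
const phoneMatch = /\((?<area>\d\d\d)\)/.exec("(555) 123-4567");
const areaCode = phoneMatch.groups.area;

// \k<title> must repeat whatever the "title" group captured.
const titleMatch = /(?<title>\w+), yes \k<title>/.exec("Do you copy? Sir, yes Sir!");

console.log(areaCode, titleMatch[0], titleMatch.groups.title);
```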

- - - 

#### `(?:x)` (non-capturing group)

- **Non-capturing group:** Matches "x" but does not remember the match. The matched substring cannot be recalled from the resulting array's elements (`[1], ..., [n]`) or from the predefined `RegExp` object's properties (`$1, ..., $9`).

- - - 

## Quantifiers (Full Definitions)

#### `x*` (0 or more times)

- Matches the preceding item "x" 0 or more times. For example, `/bo*/` matches "boooo" in "A ghost booooed" and "b" in "A bird warbled", but nothing in "A goat grunted".

- - - 

#### `x+` (1 or more times)

- Matches the preceding item "x" 1 or more times. Equivalent to `{1,}`. For example, `/a+/` matches the "a" in "candy" and all the "a"'s in "caaaaaaandy".

- - - 

#### `x?` (0 or 1 times)

- Matches the preceding item "x" 0 or 1 times. For example, `/e?le?/` matches the "el" in "angel" and the "le" in "angle."

- If used immediately after any of the quantifiers `*`, `+`, `?`, or `{}`, makes the quantifier non-greedy (matching the minimum number of times), as opposed to the default, which is greedy (matching the maximum number of times).

- - - 

#### `x{n}` (n occurrences)

- Where "n" is a positive integer, matches exactly "n" occurrences of the preceding item "x". For example, `/a{2}/` doesn't match the "a" in "candy", but it matches all of the "a"'s in "caandy", and the first two "a"'s in "caaandy".

- - - 

#### `x{n,}` (at least n occurrences)

- Where "n" is a positive integer, matches at least "n" occurrences of the preceding item "x". For example, `/a{2,}/` doesn't match the "a" in "candy", but matches all of the a's in "caandy" and in "caaaaaaandy".

- - - 

#### `x{n,m}` (at least n and at most m)

- Where "n" is 0 or a positive integer, "m" is a positive integer, and `m > n`, matches at least "n" and at most "m" occurrences of the preceding item "x". For example, `/a{1,3}/` matches nothing in "cndy", the "a" in "candy", the two "a"'s in "caandy", and the first three "a"'s in "caaaaaaandy". Notice that when matching "caaaaaaandy", the match is "aaa", even though the original string had more "a"s in it.
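The quantifier examples in this section can be checked with a few `exec` calls in plain JavaScript. In this sketch the long "candy" string is built with `repeat` (seven "a" characters, an assumed count) so the expected lengths are unambiguous.

```javascript
const longCandy = "c" + "a".repeat(7) + "ndy";

// * allows zero repetitions, so a lone "b" still matches.
const starLong = /bo*/.exec("A ghost booooed")[0];
const starShort = /bo*/.exec("A bird warbled")[0];
const starNone = /bo*/.exec("A goat grunted");

// {n} is exact, {n,} is open-ended, {n,m} is capped at m.
const exactlyTwo = /a{2}/.exec("candy");
const atLeastTwo = /a{2,}/.exec(longCandy)[0];
const oneToThree = /a{1,3}/.exec(longCandy)[0];

console.log(starLong, starShort, starNone, exactlyTwo, atLeastTwo.length, oneToThree);
```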

`x*?`  
`x+?`  
`x??`  
`x{n}?`  
`x{n,}?`  
`x{n,m}?`

- By default quantifiers like `*` and `+` are "greedy", meaning that they try to match as much of the string as possible. The `?` character after the quantifier makes the quantifier "non-greedy": meaning that it will stop as soon as it finds a match. For example, given a string like "`some <foo> <bar> new </bar> </foo> thing`":

-   `/<.*>/` will match "`<foo> <bar> new </bar> </foo>`"
-   `/<.*?>/` will match "`<foo>`"
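The greedy versus non-greedy difference shown above, as a small plain JavaScript sketch using the same sample string:

```javascript
const markup = "some <foo> <bar> new </bar> </foo> thing";

// Greedy: .* runs to the last ">" it can reach.
const greedyMatch = /<.*>/.exec(markup)[0];

// Non-greedy: .*? stops at the first ">" that completes a match.
const lazyMatch = /<.*?>/.exec(markup)[0];

console.log(greedyMatch);
console.log(lazyMatch);
```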

- - -

I-d-as · June 6, 2022, 10:49am

Edit: I decided to just add the text in a code block in the original post above. Thanks.

Archie · June 6, 2022, 11:40am

thank you for sharing, that can be helpful. is it possible in obsidian core right now or needs a plugin?

I-d-as · June 6, 2022, 11:54am

You’re welcome.

No need for any plugins. It is basically an exact clone of the documentation for JavaScript regex that the Obsidian developers linked to from the help. I simply packaged it so it was nicely viewable in Obsidian with working internal heading links. One difference is that I shortened the Meaning column to a more minimal description, and linked to the Full Description at the bottom. And like I mentioned, I separated the items by those that are and are not currently implemented in Obsidian Search based on my tests. Hopefully I didn’t make any typos.

One thing I probably should mention, even though it is quite minor, is that I had to adapt how I formatted the [x|y] in the Groups and Ranges table, since the pipe wanted to interrupt the table formatting. It still displays fine, but copy pasting it could introduce a stray backtick or two as well as a backslash before the pipe.

The best thing I can recommend is to just follow the link at the top to the website. Then you can compare that with what you see when you copy the text into a single note in Obsidian. Let me know if you see any errors. I was quite careful, so hopefully there aren’t any.

Thanks.

Archie · June 6, 2022, 11:57am

I have to read it in due time. My guess is that it is based on the JS regex language, if not the exact same thing.

I-d-as · June 6, 2022, 12:06pm

You are exactly right, I think. This was kind of my reasoning for creating it. I was trying to learn regex for Obsidian and was thrown off when just certain syntax wouldn’t work. So, I separated those into the Not Implemented section. If you stick to the Implemented section, all of the syntax works. I imagine I am not the only one trying to learn this specifically for use in Obsidian.

It also serves the purpose of creating a baseline with which to compare Obsidian regex search behavior with other uses. For example, I would be very happy to be informed that an item I placed in the Not Implemented section actually is implemented. Anyways, when and if you take a closer inspection, please share any discrepancies you spot.

Thanks!

Archie · June 6, 2022, 12:16pm

Sure thing, I will look at it soon, thank you again

I-d-as · June 6, 2022, 6:05pm

@Archie @CawlinTeffid @scholarInTraining : No need to respond, but since you may have copied the note earlier, I just wanted to mention that I have been and will continue to incrementally edit the text as I inevitably notice additional minor discrepancies between the source website and how the content displays in Obsidian. Much of it stems from the behavior of characters like backticks, angle brackets, backslashes, and the like.

It’s still my fault, so I wanted to offer an honest apology for posting this before ironing out all the kinks. Fortunately, having posted it here has inspired me to continue to better proofread it. As I noted above, if anyone notices any inconsistencies, I would very much appreciate a heads up!

CawlinTeffid · June 6, 2022, 9:27pm

Cool, thanks, I just saved the link to examine later.

It might be easier just to copy the HTML from the source and edit that instead of dodging around Markdown.

Archie · June 7, 2022, 7:37am

OK, so please notify us when you make updates so we can check them out.

I-d-as · June 13, 2022, 7:46am

@Archie No problem. I’ve updated it a few times with very minor changes, but will surely add another post here if I realize any major issues and make further updates.

Thanks.
