I need help with Text manipulation in Obsidian

PaulJi · December 10, 2024, 12:31am


I’ve copy-pasted a book from a searchable PDF file. It comes in fine, including page numbers. I would like to know how to find the word “Page” and replace it with a line space and a bolded “Page”. Better still would be to find “Page” plus a number of up to three digits and replace it with a line space followed by the bolded “Page” and page number.

I’ve copy pasted Page into the replace text box.

I would like to know what regex to use to make this happen.

Yurcee · December 10, 2024, 1:19am

Can I ask you to add example lines BEFORE and AFTER so we know what to target?

PaulJi · December 10, 2024, 4:34am

Hi Yurcee,
I’ve attached two files. “Pasted from PDF.png” and “Find and Replace with line space and Bold Text.png”.

I edited a PDF file with a PDF editor, adding consecutive page numbers as footnotes (Page 1, 2, 3, etc.). “Pasted from PDF” shows a sample of the text pasted as a new note within Obsidian.

In this sample, I would like to add a line space after the text “I should return straight to the hill station where I was”

In addition I would like to bold the text “Page + the consecutive page numbers” that show in Obsidian. The result would be as indicated in “Find and Replace with line space and Bold text”.


Thanks for your help.

Yurcee · December 10, 2024, 11:22am

Okay, original text:

...some text is here on and on, yada yada, still going on
Page 149
YADA YADAW - IT'S YADA YADA TIME, BABY
and now we're going to the next page.
Yada yada, on and on...

Capture Page with number with: ([pP]age\s[0-9]{1,4}) (assuming we can have page with a small p as well, but it can be removed from the brackets)

Replace with: \n**$1**\n (here the last \n can be removed if needed)

Result:

...some text is here on and on, yada yada, still going on

**Page 149**

YADA YADAW - IT'S YADA YADA TIME, BABY
and now we're going to the next page.
Yada yada, on and on, until the break of dawn...

There have been some tutorials put up on the forum in the past (I found one of them here) on how to handle text pasted into txt/md files and how to remove line breaks and trailing spaces; the Apply Patterns plugin is mentioned there. You can also manipulate text outside of Obsidian with a regex-capable text editor of your choice (Notepad++, Sublime Text, VS Code, etc.).

If you condense your texts with the method mentioned in that post, you’d need to use \n\n**$1**\n\n as the replacement in the final step.

Explanation of regular expression:
[pP] - character class: the square brackets mean we allow only a lowercase or an uppercase p here
\s - whitespace character
[0-9]{1,4} - one to four digits (0 to 9999)
\n - new line character
$1 - backreference to what we’ve put in round brackets: the whole of ([pP]age\s[0-9]{1,4})
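
For anyone who wants to try the pattern outside Obsidian first, here is a minimal Python sketch of the same find and replace. It is only an illustration under stated assumptions: the function name `bold_page_markers` and the sample text are made up, and Python writes the backreference as `\1` where the editors in this thread use `$1`.

```python
import re

# Pattern from the post: "Page" or "page", one whitespace character, 1 to 4 digits.
PAGE_PATTERN = re.compile(r"([pP]age\s[0-9]{1,4})")


def bold_page_markers(text: str) -> str:
    """Wrap each page marker in ** and put a newline before and after it.

    Hypothetical helper name, for illustration only.
    """
    # \1 is Python's spelling of the $1 backreference used in the post.
    return PAGE_PATTERN.sub(r"\n**\1**\n", text)


sample = (
    "...still going on\n"
    "Page 149\n"
    "YADA YADAW - IT'S YADA YADA TIME, BABY\n"
)

result = bold_page_markers(sample)
print(result)
```

Because the sample already has a line break on each side of the marker, the added \n characters turn into the blank lines shown in the result above.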

If there is no Page string in your text, it’s tricky and well-nigh impossible, as we cannot tell page numbers apart from numbers relating to counts or dates (e.g. 234 years ago, 50 cows, etc.). In that case a language-model tool can be prompted to clean up the text. E.g. the Copilot plugin can be used with free Google Gemini.
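
To make the ambiguity concrete, here is a small Python sketch. The sentence is invented for illustration and the pattern is just the digit part of the one above: without the word Page in front, every short number matches, so nothing marks one of them as a page number.

```python
import re

# Invented sample sentence; none of these numbers is a page number.
text = "The herd of 50 cows was first recorded 234 years ago, in 1790."

# Digit part of the pattern only, with word boundaries so whole numbers match.
BARE_NUMBER = re.compile(r"\b[0-9]{1,4}\b")

matches = BARE_NUMBER.findall(text)
print(matches)
```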

PaulJi · December 12, 2024, 2:11am

Yurcee,
Your explanation makes sense and is awesome. I must be doing something wrong because the \n command doesn’t return a new line.

I’ve dragged a pdf file into Obsidian. This pdf had page numbers added in PDF Gear editor as footnotes.
Used the Text Extractor plugin to Extract text into a new note. The text in this new note had no line breaks.
Used Regex Find/Replace plugin. Added ([pP]age\s[0-9]{1,4}) to Find and \n**$1**\n\n to Replace.
Instead of the Page + page number moving to a new line, the result showed the literal text "\n Page + page number \n\n"

i.e. …some text is here \nPage 125\n\n more text follows but no line return…

Would appreciate any suggestions, tweaks or even a different approach that would give me the result that you were able to obtain.

Yurcee · December 12, 2024, 6:44am

Some plugins don’t accept \n characters because the logic is not built into them.
I’d advise using alternative means. Apply Patterns (tutorial in the linked thread) or a non-Obsidian program that can be installed on your platform (I listed three).
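
The difference can be sketched in Python. This is an illustration, not a description of how any particular plugin is implemented: the function replacement below merely stands in for a tool that inserts the replacement text character for character, and the sample sentence is made up.

```python
import re

PAGE_PATTERN = re.compile(r"([pP]age\s[0-9]{1,4})")
text = "some text is here Page 125 more text follows"

# Replacement inserted as plain characters: a backslash followed by "n".
# A function replacement skips Python's template processing, so it imitates
# a tool that does not interpret \n.
literal = PAGE_PATTERN.sub(lambda m: "\\n**" + m.group(1) + "**\\n", text)

# Replacement template in which \n is interpreted as a real line break.
interpreted = PAGE_PATTERN.sub(r"\n**\1**\n", text)

print(literal)      # stays on one line, with visible \n sequences
print(interpreted)  # spans three lines
```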

Alternatively, you can install VSCode Editor within Obsidian.
Copy the contents of your note to the clipboard.
Open a code file with the icon on the ribbon. A window pops up.

You can enter any name or leave My code file as the name, and you can leave ts as the file extension → Create.
Paste the contents into the file and press CTRL+H for the Search and Replace bars to appear. Then press the .* icon that switches on regex.
Copy and paste the search and replace lines given.

Press the icon for Replace All on the bottom.

Cut all the text and paste it back into your Markdown note. Then close the code file and delete it.

Obviously your text will be longer, with more matches and replacements.

You can use this plugin. Apply Patterns is a bit trickier to set up.
You can also have JavaScript code written for common everyday tasks with Templater, as free ChatGPT can now write it. That may come later.

It doesn’t even have to be pure JavaScript, actually, as Templater has functions built in. See how they can be written:

EDIT:
I changed the title of the thread to better reflect what was talked about here.

PaulJi · December 14, 2024, 5:00pm

Yurcee, thank you so much for your help. The VSCode method works great. I was surprised at how fast the conversion is.

Yurcee · December 14, 2024, 5:40pm

Welcome.

Text Finder is another plugin that supports regex with \n in search and replace bars. No need to copy to different files with that one.

Of course, when you need to make this conversion many times, you would need a method to save these, so Apply Patterns would be best, or a Templater js script you can easily invoke.

PaulJi · December 21, 2024, 8:19pm

Hi Yurcee, I just noticed I missed the fact that in General Settings of the community plugin “Regex Find/Replace” one can turn on the process \n setting in the replace field. This will insert a ‘line break’. The \n**$1**\n works fine.

The nice thing about this plugin is that it seems to remember the last regex input you’ve added, making it simple to reuse the commands.

Yurcee · December 22, 2024, 4:14am

Yes, I now remember it retains the last used pattern. Didn’t know about the setting. But when you have 5-10 patterns for other cleanup jobs (say, you have annotations coming in from Zotero), it is better to save them all (with Apply Patterns, maybe in conjunction with Commander, or as Templater snippets) under telling names, and maybe even list them in a how-to note so you can train yourself on the jobs you do often.
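
As a rough sketch of what "saving patterns under telling names" could look like outside any plugin, here is a Python version. Everything in it is an assumption for illustration: the step names, the trailing-space rule, and the `clean` function are hypothetical and not taken from Apply Patterns or Templater.

```python
import re

# Hypothetical saved cleanup steps: (telling name, pattern, replacement).
# They run in the order listed.
CLEANUP_STEPS = [
    ("strip trailing spaces", re.compile(r"[ \t]+$", re.MULTILINE), ""),
    ("bold page markers", re.compile(r"([pP]age\s[0-9]{1,4})"), r"\n**\1**\n"),
]


def clean(text: str) -> str:
    """Apply every saved step in order and return the cleaned text."""
    for _name, pattern, replacement in CLEANUP_STEPS:
        text = pattern.sub(replacement, text)
    return text


cleaned = clean("still going on   \nPage 149\nnext line\t\n")
print(cleaned)
```

Keeping the names next to the patterns doubles as the how-to note: the list itself documents what each step is for.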

system · March 22, 2025, 4:14am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.