Removing Carriage Returns and Line Breaks
A common problem when gathering text from PDFs and other sources is that, once we paste it into our apps and programs, the lines do not run to the end of the window but break off early. This happens because we unknowingly copied the line breaks along with the text.
- One possible solution is to paste differently, for example as plain text without formatting, but that is not always feasible.
Although there may be community plugins that can help with cleaning up our text, it is better to take things into our own hands, for more than one reason:
- We can cut down on clutter caused by too many community plugins installed (“which was that thingy that did this…?”);
- We can further customize clean-up jobs based on our use cases, languages, etc.
- We can learn a new trick and take our own initiative next time around.
We are going to employ regular expressions for our text clean-ups.
Templater method
Install the Templater community plugin and set the folder where you keep your templates (the basic setup). Then create a new note in your Templates folder, e.g. “Clean up Template”.
Copy and paste this there:
<%*
const editor = app.workspace.activeLeaf.view.editor;

// Put your rules here
function applyRules(text) {
  const rules = [
    {
      from: /\s+$/gm,
      to: "",
    },
    {
      from: /(\r\n)+|\r+|\n+/gm,
      to: " ",
    }
  ];
  for (const rule of rules) {
    text = text.replace(rule.from, rule.to);
  }
  return text;
}

// The text selected before running the template
const selText = editor.getSelection() || '';

// Effecting changes
const modifiedText = applyRules(selText);
editor.replaceSelection(`${modifiedText}`);
%>
In the Templater settings, assign a hotkey to this template; you will be taken to the Obsidian Hotkeys settings to set it. The template is then also available as a command, should you want to put it on the Command Palette, Editing Toolbar, Mobile Toolbar, etc.
A variation also removes end-of-line hyphens, which helps if your language has many long words that are often split across lines:
<%*
const editor = app.workspace.activeLeaf.view.editor;

// Put your rules here
function applyRules(text) {
  const rules = [
    {
      from: /\s+$/gm,
      to: "",
    },
    {
      from: /(\r\n)+|\r+|\n+/gm,
      to: " ",
    },
    {
      from: /([a-zžáàäæãééíóöőüűčñßðđŋħjĸłß])(-\s{1})([a-zžáàäæãééíóöőüűčñßðđŋħjĸłß])/gm,
      to: "$1$3",
    }
  ];
  for (const rule of rules) {
    text = text.replace(rule.from, rule.to);
  }
  return text;
}

// The text selected before running the template
const selText = editor.getSelection() || '';

// Effecting changes
const modifiedText = applyRules(selText);
editor.replaceSelection(`${modifiedText}`);
%>
- This will also join compound words that are supposed to keep their hyphen, so proofread your text for mistakes afterwards. Or just use the first template, without the extra rule.
You can easily add more rules over time if you know some regex, find a pattern on the internet, or get one from a chatbot.
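As a rough illustration of adding a rule, here is a sketch that can be run outside Obsidian (in Node, for instance). It keeps the first two rules from the template and appends one extra rule that collapses runs of spaces. That third rule and the sample string are made up for this example and are not part of the original templates.

```javascript
// Same rule-list pattern as the template, plus one extra (hypothetical) rule.
function applyRules(text) {
  const rules = [
    { from: /\s+$/gm, to: "" },              // trailing whitespace
    { from: /(\r\n)+|\r+|\n+/gm, to: " " },  // line breaks -> one space
    { from: / {2,}/g, to: " " },             // extra rule: collapse runs of spaces
  ];
  for (const rule of rules) {
    text = text.replace(rule.from, rule.to);
  }
  return text;
}

// Made-up sample text with double spaces and a hard line break.
const sample = "Two  spaces   here\nand a new line.";
console.log(applyRules(sample));
// prints: Two spaces here and a new line.
```

Because the rules run in order, the extra rule goes last here so that it also tidies up anything the earlier replacements leave behind.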
What the script does
Apart from what the comments in the script say, here is what it does:
- Gets the editor of the active leaf (the note you are working in).
- In a function, applies the rule replacements in a loop.
- Note how the from (“match”) and to (“replace”) values pair up in each rule. You can add more.
- Using two variables, it reads the selected text, applies the modifications, and replaces the selection in the editor.
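If you want to try this flow without opening Obsidian, something like the following sketch might do. The `makeFakeEditor` helper is a hypothetical stand-in that only imitates the two editor methods the template calls (`getSelection` and `replaceSelection`); it is not part of Obsidian or Templater, and `cleanSelection` is just a name chosen for this example.

```javascript
// Hypothetical test double: imitates only the two editor methods the template uses.
function makeFakeEditor(selection) {
  let current = selection;
  return {
    getSelection: () => current,
    replaceSelection: (replacement) => { current = replacement; },
    read: () => current, // helper for this sketch only
  };
}

// The two rules from the first template, applied in a loop.
function applyRules(text) {
  const rules = [
    { from: /\s+$/gm, to: "" },
    { from: /(\r\n)+|\r+|\n+/gm, to: " " },
  ];
  for (const rule of rules) {
    text = text.replace(rule.from, rule.to);
  }
  return text;
}

// Same steps as the template: read the selection, modify it, write it back.
function cleanSelection(editor) {
  const selText = editor.getSelection() || "";
  const modifiedText = applyRules(selText);
  editor.replaceSelection(modifiedText);
}

const fakeEditor = makeFakeEditor("first line   \nsecond line\r\nthird line");
cleanSelection(fakeEditor);
console.log(fakeEditor.read());
// prints: first line second line third line
```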
Regex rules and what they do
Rule 1: Deletes any trailing whitespace at the end of lines.
Rule 2: Replaces carriage returns/hard line breaks with a single space character, effectively making your text flow continuously.
Rule 3: If a hyphen and a single whitespace character sit between two lowercase letters, it deletes them, rejoining the word.
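To see the three rules at work one after another, here is a small trace that can be run in Node. The regexes are copied from the template; the `name` field, the `traceRules` helper and the sample text are assumptions added for this illustration only.

```javascript
// The three rules from the template, with a label added for the trace.
const rules = [
  { name: "rule 1", from: /\s+$/gm, to: "" },
  { name: "rule 2", from: /(\r\n)+|\r+|\n+/gm, to: " " },
  {
    name: "rule 3",
    from: /([a-zžáàäæãééíóöőüűčñßðđŋħjĸłß])(-\s{1})([a-zžáàäæãééíóöőüűčñßðđŋħjĸłß])/gm,
    to: "$1$3",
  },
];

// Applies the rules in order and records the text after each one.
function traceRules(text) {
  const steps = [];
  for (const rule of rules) {
    text = text.replace(rule.from, rule.to);
    steps.push({ rule: rule.name, result: text });
  }
  return steps;
}

// Made-up sample: a word split at a line end, and a real compound split the same way.
const pasted = "A docu-  \nment with a well-\nknown problem.";
const steps = traceRules(pasted);
for (const step of steps) {
  console.log(step.rule + ": " + JSON.stringify(step.result));
}
// prints:
// rule 1: "A docu-\nment with a well-\nknown problem."
// rule 2: "A docu- ment with a well- known problem."
// rule 3: "A document with a wellknown problem."
```

Note how “well-known” loses its hyphen in the last step: the rule cannot tell a split word from a genuine compound, which is why proofreading is still needed.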
Apply Patterns plugin method
Again, install the plugin. Name the pattern as before: “Clean up text” or something similar. Add the rules (the same ones as above) one by one. Don’t forget to turn on the global and multiline switches for each of them.
Rule one:
Matching text: \s+$
Replacement: `` (nothing; leave box empty)
Rule two:
Matching text: (\r\n)+|\r+|\n+
Replacement: (one space)
Rule three (if you need it):
Matching text: ([a-zžáàäæãééíóöőüűčñßðđŋħjĸłß])(-\s{1})([a-zžáàäæãééíóöőüűčñßðđŋħjĸłß])
Replacement: $1$3
Scroll down to the Commands section and give the command the same name. In the Pattern Name Filter, enter the name you set above in the Patterns section: “Clean up text” or whatever you chose.
Here you have different command options: you can perform regex replacements on the selected text (like above) or on the whole file you have open. It’s better to stick with the selection here as well: tick Apply to Selection.
Now disable and re-enable the Apply Patterns plugin, and the command will be available on the Command Palette. You can bind a hotkey to it.
How to use
Select only the text you want to clean up, then run the Templater template or the Apply Patterns command.
If you accidentally collapsed text you did not mean to, press Ctrl+Z and select only the text that needs the clean-up treatment.
This is why selecting the text manually is the better idea: the rules never touch anything outside your selection.
Script is inspired by AlanG’s post.