Copy-pasting from Google Docs loses formatting and wraps text in "**"

Steps to reproduce

  • Create custom Google document
  • Add formatted text to it
  • Create an empty note in Obsidian
  • Copy text from Google doc and paste to Obsidian

Did you follow the troubleshooting guide? [Y]

Expected result

I expect the pasted text to preserve the original formatting (except underlined text).

Actual result

  • The pasted text is wrapped in ** on both sides.
  • The formatting of the final text is completely lost.
  • When copying a list of items, an additional empty line is added between items, as well as at the beginning and at the end.

Environment

SYSTEM INFO:
Obsidian version: v1.9.12
Installer version: v1.8.9
Operating system: Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:29 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6000 24.6.0
Login status: logged in
Language: en
Catalyst license: supporter
Insider build toggle: on
Live preview: on
Base theme: adapt to system
Community theme: none
Snippets enabled: 0
Restricted mode: off
Plugins installed: 10
Plugins enabled: 6
1: Dataview v0.5.68
2: Highlightr v1.2.2
3: PlantUML v1.8.0
4: Commander v0.5.4
5: Kindle Highlights v1.9.2
6: Canvas Mindmap v1.0.2


Additional information

Below you can find different examples of copy-pasting text from Google Docs.

Text with different formatting: Original and Result (images not shown)

Bold text (the same for italic text): Original (cannot be included because of posting limits) and Result (image not shown)

List: Original and Result (images not shown)

A similar bug was reported, but it was closed for inactivity: “Pasting formatted text gives double line breaks / double new line” (I cannot include the link because of posting limits, sorry).
There it was proposed to use the key combination that pastes unformatted text (Ctrl (Cmd) + Shift + V). In that case there is no wrapping **, but there is also no formatting at all. This is a workaround, not a solution.

I understand that underlined text is pasted unformatted because vanilla Obsidian doesn’t support it (it’s weird that you need a whole plugin just for that, given that Markdown’s designers were inspired by plain HTML, but that is a separate story), but the remaining formatting should work.

You may disagree with me, but I find this quite important: many people work with Google Docs, and fixing formatting issues every time is time-consuming and annoying.
I think it would be really helpful if you could fix this problem.

This is not an Obsidian bug; it depends on what Gdocs puts on the clipboard.
At best, it’s a feature request.

There’s also File->Download->Markdown in Gdocs.

This is a bit of a long shot, but you might try changing the doc’s page setup and copying from there in case the setups behave differently.

This is most likely because copying from Google Docs puts all styling (bold, italics, etc.) into the HTML fragment as inline CSS instead of using semantic HTML.

AFAIK Turndown (what Obsidian uses to convert HTML to Markdown) ignores inline CSS styles by default (and in any case, Obsidian likely sanitizes the HTML during conversion, which would likely discard inline styles).

For example, go to the Turndown demo page: Turndown Demo

Paste this HTML fragment, where bold is defined as semantic HTML:

<html>
<body>
<!--StartFragment--><p><strong>Node:</strong> JavaScript Web Server</p><!--EndFragment-->
</body>
</html>

You’ll see it’s converted to bold markdown as expected.

Whereas this is what you’ll get if you copy that text from Google Docs:

<html>
<body>
<!--StartFragment--><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-98f3f905-7fff-cfe1-23eb-dceb1f7b4995"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Node:</span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"> JavaScript Web Server</span></b><!--EndFragment-->
</body>
</html>

In this case, you’ll see that it’s not converted to bold Markdown by Turndown, because the bold styling is done purely through inline CSS styles.

Thank you for your answer.

Perhaps, but this doesn’t happen with other tools like Notion, Sublime Text, etc. I assume they process the input from Gdocs differently. If that is the case, then why can’t Obsidian do the same?

I would not call downloading Markdown from Gdocs a solution. It is the same type of workaround as pasting without formatting, or using another tool as a buffer and then recopying from there to Obsidian, etc. This approach also breaks the workflows of other tools like Clipper.
I could list more workarounds for this problem. What I would like is to not have to apply workarounds and just copy and paste the text.

Thank you for your example!
Indeed, copying from Google Docs uses a <b> tag with inline CSS styles. This is an example I took from the clipboard:

<meta charset='utf-8'>
<meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-123">
    <span
            style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">This is a </span>
    <span
            style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">normal</span>
    <span
            style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"> text and </span>
    <span
            style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:italic;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">italic</span>
    <span
            style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"> text</span>
</b>

At least this explains where the surrounding ** come from. Obsidian thinks this is bold text and converts <b> to **. Perhaps, because the code that analyzes the pasted HTML has already “decided” that everything is bold, it sees no need to check the styles of the items inside.
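To illustrate that guess, here is a toy Python sketch of a converter that looks only at tag names and never at style attributes. It is not Turndown’s or Obsidian’s actual code; the class and function names are hypothetical and the fragments are shortened versions of the ones in this thread.

```python
from html.parser import HTMLParser


class SemanticOnlyConverter(HTMLParser):
    """Toy HTML-to-Markdown converter: reacts to tag names only, ignores style attributes."""

    BOLD_TAGS = {"strong", "b"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs (including style="...") is deliberately ignored
        if tag in self.BOLD_TAGS:
            self.parts.append("**")

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS:
            self.parts.append("**")

    def handle_data(self, data):
        self.parts.append(data)


def to_markdown(html):
    converter = SemanticOnlyConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts)


semantic = "<p><strong>Node:</strong> JavaScript Web Server</p>"
gdocs = (
    '<b style="font-weight:normal;">'
    '<span style="font-weight:700;">Node:</span>'
    '<span style="font-weight:400;"> JavaScript Web Server</span>'
    "</b>"
)

print(to_markdown(semantic))  # **Node:** JavaScript Web Server
print(to_markdown(gdocs))     # **Node: JavaScript Web Server**
```

With the Google Docs style fragment, the outer <b> wrapper turns into ** around everything, while the span that is actually bold (font-weight:700) is treated as plain text.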

Yeah, Google Docs appears to be generating invalid HTML here.

For some reason Google Docs wraps the entire text (not just the bold text) in <b></b> tags, which doesn’t seem like valid semantic HTML.

For example, in Google Docs I have (image not shown):

And the resulting clipboard content when copying is (I removed the inline CSS styles for clarity):

<html>
   <body>
      <!--StartFragment-->
      <meta charset="utf-8">
      <b>
         <p dir="ltr">Test Doc</p>
         <p dir="ltr">Test Heading</p>
         <ol>
            <li dir="ltr" aria-level="1">
               <p dir="ltr" role="presentation">Test</p>
            </li>
            <li dir="ltr" aria-level="1">
               <p dir="ltr" role="presentation">List</p>
            </li>
            <ol>
               <li dir="ltr" aria-level="2">
                  <p dir="ltr" role="presentation">ABC</p>
               </li>
               <li dir="ltr" aria-level="2">
                  <p dir="ltr" role="presentation">DEF</p>
               </li>
            </ol>
         </ol>
         <p dir="ltr">&quot;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.&quot;</p>
      </b>
      <!--EndFragment-->
   </body>
</html>

Only one line of text is bold, so wrapping the entire HTML fragment content in <b> tags doesn’t make any sense, and Turndown doesn’t know what to do with that. This doesn’t cause problems when pasting into some rich-text editors, because the inline styles take precedence over the semantic HTML tags. It affects Turndown because it ignores inline styles and relies on semantic HTML.

The HTML for the list also seems to be invalid. I get:

<ol>
  <li>Test</li>
  <li>List</li>
  <ol>
    <li>ABC</li>
    <li>DEF</li>
  </ol>
</ol>

Whereas valid HTML would be (a sub-list should be placed inside a list item (<li>) of its parent list):

<ol>
  <li>Test</li>
  <li>List
    <ol>
      <li>ABC</li>
      <li>DEF</li>
    </ol>
  </li>
</ol>

So Turndown also can’t convert the nested list correctly.
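For illustration, here is a minimal Python sketch of how such a list could be repaired before conversion, by moving a sub-list that sits directly inside its parent list into the preceding <li>. It assumes the fragment is well-formed enough to parse as XML (the real clipboard content, with its unclosed <meta> tag, is not), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

LIST_TAGS = ("ol", "ul")


def nest_sublists(list_el):
    """Move any list that is a direct child of list_el into the preceding <li>."""
    previous_li = None
    for child in list(list_el):  # iterate over a snapshot, since we mutate list_el
        if child.tag == "li":
            previous_li = child
            # Also repair lists that are already correctly nested inside this item
            for sub in child:
                if sub.tag in LIST_TAGS:
                    nest_sublists(sub)
        elif child.tag in LIST_TAGS:
            nest_sublists(child)
            if previous_li is not None:
                list_el.remove(child)
                previous_li.append(child)


broken = "<ol><li>Test</li><li>List</li><ol><li>ABC</li><li>DEF</li></ol></ol>"
root = ET.fromstring(broken)
nest_sublists(root)
fixed = ET.tostring(root, encoding="unicode")
print(fixed)
# <ol><li>Test</li><li>List<ol><li>ABC</li><li>DEF</li></ol></li></ol>
```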

EDIT: Google Docs’ crazy non-standard HTML seems to be a well-known issue: Simple Copy Paste from google docs yields bold text · Issue #459 · ProseMirror/prosemirror · GitHub (ProseMirror isn’t related to Obsidian, but if you scroll to the bottom you’ll see many other projects encountering the same problem in the linked issues)

EDIT2: Found an excellent rant about this lol: Pasted stuff from Google Docs is always BOLD! WHY!? - Adam Coster

I haven’t tried it myself, but this plugin might help:

Of note, it can do custom regex cleanup when pasting. For example, one of its examples converts Google Docs inline CSS styles to semantic formatting:

  1. Convert Google Docs span styles to semantic elements
  • Pattern: <span style="font-weight:\s*bold[^"]*">(.*?)</span>
  • Replacement: <strong>$1</strong>
  • Description: Converts Google Docs styled spans to proper HTML elements

If that regex pattern alone doesn’t fix the ** above/below the pasted text, you might be able to add a regex in the post-processing stage (Markdown Regex Replacements) to remove ** when it’s on its own line with no other content.
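A minimal sketch of such a post-processing rule, written in Python only to show the idea. The pattern is my own assumption, not something taken from the plugin, and the plugin’s regex flavor and multiline handling may differ.

```python
import re

# Matches a line that contains nothing but "**" (plus optional spaces or tabs),
# including its trailing newline. Lines like "**bold** text" are left alone
# because "$" requires the line to end right after the marker.
LONE_BOLD_MARKER = re.compile(r"^[ \t]*\*\*[ \t]*$\n?", flags=re.MULTILINE)


def strip_lone_bold_markers(markdown):
    return LONE_BOLD_MARKER.sub("", markdown)


pasted = "**\n**Bold** and plain text\n**\n"
print(repr(strip_lone_bold_markers(pasted)))
# '**Bold** and plain text\n'
```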

Thank you!

I was able to fix the problems with this plugin.

Solution

For the record, here is the list of regular expressions you can add to the plugin to fix the problem:

Remove “**”

Regex: <b\s+style="font-weight:normal;"[^>]*>(.*?)</b>
Replacement: $1

Reformat bold text

Regex: <span\s+style="[^"]*font-weight:700[^"]*">(.*?)</span>
Replacement: <strong>$1</strong>

Reformat italic text

Regex: <span\s+style="[^"]*font-style:italic[^"]*">(.*?)</span>
Replacement: <i>$1</i>

You also need to enable the “Remove empty lines” setting in the plugin to remove the empty lines between the pasted lines.

If you add all three regex patterns then the pasted content will match the original content.
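To see what the three patterns do outside the plugin, here is a small Python sketch that applies them in the same order to a shortened, single-line version of the clipboard fragment. Python writes the replacement as \1 instead of $1; the function name and the sample string are made up for illustration, and how the plugin itself applies the rules is an assumption on my part.

```python
import re

# (pattern, replacement) pairs, in the order listed above
RULES = [
    (r'<b\s+style="font-weight:normal;"[^>]*>(.*?)</b>', r"\1"),
    (r'<span\s+style="[^"]*font-weight:700[^"]*">(.*?)</span>', r"<strong>\1</strong>"),
    (r'<span\s+style="[^"]*font-style:italic[^"]*">(.*?)</span>', r"<i>\1</i>"),
]


def clean_gdocs_html(html):
    for pattern, replacement in RULES:
        html = re.sub(pattern, replacement, html)
    return html


sample = (
    '<b style="font-weight:normal;" id="docs-internal-guid-123">'
    '<span style="font-size:11pt;font-weight:400;font-style:normal;">This is </span>'
    '<span style="font-size:11pt;font-weight:700;font-style:normal;">bold</span>'
    '<span style="font-size:11pt;font-weight:400;font-style:italic;"> and italic</span>'
    "</b>"
)

print(clean_gdocs_html(sample))
# <span style="font-size:11pt;font-weight:400;font-style:normal;">This is </span><strong>bold</strong><i> and italic</i>
```

Two things to keep in mind with this sketch: in Python, (.*?) does not cross line breaks unless re.DOTALL is passed, and a span that is both bold and italic only becomes <strong>, because the second rule rewrites it before the third one runs.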

P.S. I am very grateful to the plugin author; it really helped solve the problem. However, we all know that open source requires maintenance, and I hope the author will continue to maintain and update the plugin. Google Docs is a very popular product used by many people in different areas. I think it would be great if the Obsidian developers turned this plugin into a core plugin (or created a separate Google Docs compatibility plugin).