Understanding Web Clipper schema.org data parsing

What I’m trying to do

Using the new Web Clipper, I am trying to create a template to capture recipes into a cookbook which I’ve been building in Obsidian.

The pages I’m trying to clip seemingly contain schema.org Recipe data, but I don’t understand why the Web Clipper only shows schema variables on some pages.

Things I have tried

I’ve been using these two pages to investigate:
This page shows schema variables in the web clipper: Best Two-Hour Turkey Recipe
This page does not show schema variables in the web clipper: Best Brown Butter-Cardamom Banana Bread Recipe - How to Make Brown Butter-Cardamom Banana Bread

What I’ve noticed is that both pages have schema sections in their source, and both schema sections seem to have properties which match the ones listed at Recipe - Schema.org Type.

I’m not sure what to look at next. My guess is that the page which doesn’t have its schema data parsed contains source code which is somehow invalid or doesn’t match the Web Clipper’s parsing rules. However, I’m not familiar with most web programming languages, so an obvious error wouldn’t stick out to me.

Shoot. The title should read “Parsing.” Sorry for the inadvertent borderline vulgarity.


Well, from the very little I know about the use of schema.org markup, I’d say that it’s “at the website owner’s discretion” :blush: … So, some websites will use it extensively, others not so much or even not at all :woman_shrugging:

The clipper only returns what it can find on the page of a website… Nothing more :blush:

Meaning that if it returns only a few schema.org properties, that is all the website provided (and there aren’t more)…

I took the liberty of correcting that for you :smile:


The pages seem to have been authored differently (at different times, perhaps?). The turkey page has lots of accessible variables; the bread page doesn’t. :person_shrugging:



Thanks for the insights and for taking a look at my reference pages. Thanks @Pch for fixing my title :smile:

There is definitely a difference between the ways that the pages are constructed. What I’m trying to sort out is why one page (and many others on the same site) has the schema variables in the page’s source code, but they’re not parsed.

This is from the source code of the page which does not show schema variables in the web clipper: Best Brown Butter-Cardamom Banana Bread Recipe - How to Make Brown Butter-Cardamom Banana Bread

  <script type="application/ld+json">
  {
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Brown Butter-Cardamom Banana Bread",
  "description": "For a more flavorful banana bread without more effort, we paired one of banana’s most complementary spices, cardamom, with nutty browned butter. Blooming the...",
  "image": "https://www.177milkstreet.com/assets/site/_small/Brown-Butter-Cardamom-Banana-Bread.jpg",
    "totalTime": "PT1H15M",
  "cookTime": "PT1H15M",
  "recipeYield": "1 9-inch Loaf",
  "recipeCategory": "Desserts",

Based on the Web Clipper Troubleshooting info, I might need to delve into how the Readability library works to understand what’s going wrong.

From https://validator.schema.org/, the “Banana Bread” recipe seems to be valid :thinking:

But after refreshing (ignoring cache) the “Banana bread” recipe webpage, I got some errors in the console when trying to clip the webpage:

Error parsing schema.org data: SyntaxError: Bad control character in string literal in JSON at position 1556 (line 40 column 85)
    at JSON.parse (<anonymous>)
    at content.js:1:33360
    at NodeList.forEach (<anonymous>)
    at t (content.js:1:33146)
    at content.js:1:34052
    at d.reject (content.js:1:8183)
Show 4 more frames
Problematic JSON content: {
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Brown Butter-Cardamom Banana Bread",
  "description": "For a more flavorful banana bread without more effort, we paired one of banana’s most complementary spices, cardamom, with nutty browned butter. Blooming the...",
  "image": "https://www.177milkstreet.com/assets/site/_small/Brown-Butter-Cardamom-Banana-Bread.jpg",
    "totalTime": "PT1H15M",
  "cookTime": "PT1H15M",
  "recipeYield": "1 9-inch Loaf",
  "recipeCategory": "Desserts",
  "isAccessibleForFree": "False",
    "hasPart": [
      {
        "@type": "WebPageElement",
        "isAccessibleForFree": "False",
        "cssSelector": ".ingredient"
      }, {
        "@type": "WebPageElement",
        "isAccessibleForFree": "False",
        "cssSelector": ".recipe__directions"
      }
    ],
  "recipeIngredient": [
      "113 grams (8 tablespoons) salted butter, plus more for the pan",
      "260 grams cups (2 cups) all-purpose flour, plus more for the pan",
      "1 teaspoon baking powder",
      "1 teaspoon baking soda",
      "½ teaspoon table salt",
      "1¼ teaspoons ground cardamom",
      "533 grams (2 cups) mashed ripe bananas (from 4 or 5 large bananas)",
      "149 grams (3/4 cup packed) dark brown sugar",
      "2 large eggs",
      "2 teaspoons vanilla extract",
      "1 tablespoon white sugar (optional)" 
        ],
  "recipeInstructions": [
                {
          "@type": "HowToStep",
          "name": "",
          "text": "Heat the oven to 350°F with a rack in the upper-middle position. 

Mist a 9-by-5-inch loaf pan with cooking spray. In a large bowl, whisk together the flour, baking powder, baking soda and salt. "
      } ,
                {
          "@type": "HowToStep",
          "name": "",
          "text": "In a medium saucepan over medium heat, melt the butter. Once melted, continue to cook, swirling the pan often, until the butter is fragrant and deep brown, 2 to 3 minutes.

 Remove the pan from the heat and immediately whisk in the cardamom. "
      } ,
                {
          "@type": "HowToStep",
          "name": "",
          "text": "Carefully add the bananas (the butter will sizzle and bubble up) and whisk until combined. 

Add the brown sugar, eggs and vanilla, then whisk until smooth. 

Add the banana mixture to the flour mixture and, using a silicone spatula, fold until just combined and no dry flour remains."
      } ,
                {
          "@type": "HowToStep",
          "name": "",
          "text": "Transfer the batter to the prepared pan and sprinkle evenly with the white sugar, if using. 

Bake until the loaf is well browned, the top is cracked and a toothpick inserted at the center of the loaf comes out clean, 50 to 55 minutes, rotating the pan halfway through. 

Cool the bread in the pan on a wire rack for 15 minutes, then turn out the loaf and cool completely before serving. 

Cooled bread can be wrapped tightly and stored at room temperature for up to 4 days or refrigerated for up to 1 week."
      }  
      ]}

The only difference I can spot between the 2 recipes is that there are newlines in the text of the steps of the Banana bread recipe… but not in the other one…


Edit: For the “Banana bread” recipe, the JSON-LD Playground definitely didn’t like the newlines within any of the schema:@Recipe:recipeInstructions[*].text
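
To illustrate why those newlines matter, here is a minimal sketch (not the Web Clipper’s actual code) of how `JSON.parse` treats a raw line break inside a string versus the escaped `\n` form. The sample strings and the `tryParse` helper are made up for this example; the exact error wording differs between browsers and runtimes, so it is only printed, not relied upon.

```typescript
// In a single-quoted TypeScript string, "\n" is a real line-break character,
// while "\\n" is the two characters backslash + n (a valid JSON escape).
const rawNewline = '{"text": "Heat the oven.\n\nMist the pan."}';
const escapedNewline = '{"text": "Heat the oven.\\n\\nMist the pan."}';

// Hypothetical helper: report whether a string parses as JSON.
function tryParse(json: string): { ok: boolean; error?: string } {
  try {
    JSON.parse(json);
    return { ok: true };
  } catch (e) {
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}

const rawResult = tryParse(rawNewline);
const escapedResult = tryParse(escapedNewline);

console.log("raw newline parses:", rawResult.ok, rawResult.error ?? "");
console.log("escaped newline parses:", escapedResult.ok);
```

The first string fails because JSON requires control characters (such as a line break) inside a string to be escaped; the second parses and yields text containing real line breaks. That matches the “Bad control character in string literal” error in the console output above.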


Thank you for your investigation and the validation references. After work today, I will spend some more time with them.
Since many of the recipes that I’d like to clip are from this publication, I will send them an email and ask if they could fix their formatting. I don’t know if that’s part of what I get from them as a subscriber, but it’s worth a shot.
A better solution is probably to dig into whatever Readability uses to parse JSON-LD and see if it can be updated to strip disallowed characters (assuming that newlines aren’t allowed by this format).
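
As a rough sketch of what “strip disallowed characters” could look like, the function below escapes raw control characters that appear inside JSON string literals before handing the text to `JSON.parse`. This is only an assumption about a possible approach: `escapeControlCharsInStrings` is a hypothetical name, and nothing here is taken from the Web Clipper or Readability source. It leaves whitespace between tokens alone, since line breaks there are valid JSON.

```typescript
// Hypothetical pre-processing step: escape raw control characters
// (U+0000..U+001F) found inside JSON string literals.
function escapeControlCharsInStrings(json: string): string {
  let out = "";
  let inString = false; // currently between an opening and closing quote
  let escaped = false;  // previous character was a backslash inside a string

  for (let i = 0; i < json.length; i++) {
    const ch = json.charAt(i);
    const code = json.charCodeAt(i);

    if (!inString) {
      if (ch === '"') inString = true;
      out += ch;
    } else if (escaped) {
      // Character following a backslash: copy it through unchanged.
      out += ch;
      escaped = false;
    } else if (ch === "\\") {
      out += ch;
      escaped = true;
    } else if (ch === '"') {
      out += ch;
      inString = false;
    } else if (code < 0x20) {
      // Raw control character inside a string: replace it with an escape.
      if (ch === "\n") out += "\\n";
      else if (ch === "\r") out += "\\r";
      else if (ch === "\t") out += "\\t";
      else out += "\\u" + ("0000" + code.toString(16)).slice(-4);
    } else {
      out += ch;
    }
  }
  return out;
}

// Made-up sample resembling the recipe steps: raw line breaks inside "text".
const broken = '{\n  "text": "Heat the oven.\n\nMist the pan.",\n  "n": 1\n}';
const repaired = JSON.parse(escapeControlCharsInStrings(broken));
console.log(repaired.text);
```

This does not handle every malformed input (for example, a backslash immediately followed by a raw line break), so it is a starting point for experimenting rather than a fix to rely on.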
