How to extract the lines word and image containing a specific tag with dataviewjs

PencilMing · January 24, 2024, 8:34am

Extract the lines word and image containing a specific tag

I got a JS code from online that can locate and extract the lines containing a specific tag in the text, but in practice I found it cannot display images. My attempts to fix it using GPT were unsuccessful, so I have to seek help from the forum.

The expected effect is shown in the image. I hope it can detect tags line by line and split the content for separate display.

The source code is as follows:

From my understanding, I don’t know if it’s possible to detect the tag before “!” and put it in the tag column of the dataview table. Put the image between “!” and “]]” in the dataview image column for display. And put the text after in the remarks column. Can it achieve my expected effect?

Things I have tried

The attempted solution can only achieve the following effect:

The current code is:

// 整个代码是先筛选符合标签的笔记，再筛选笔记中的行内容是否具有标签，然后将文件名和行显示在表中。 
// 参考用//注释开关指定查询范围
//tag1限定笔记文件，tag2限定行内容

// 获取指定文件夹中的所有md文件,支持二级目录
const files = app.vault.getMarkdownFiles().filter(file => file.path.includes("日记"))
// 预设标签1，从而限定笔记文件
const tag = "#转场"
// 将指定文件夹中,带有标签1的文件筛选出来
const taggedFiles = new Set(files.reduce((acc, file) => {
    const tags = app.metadataCache.getFileCache(file).tags
    if (tags) {
      let filtered = tags.filter(t => t.tag === tag)
      if (filtered) {
        return [...acc, file]
      }
    }
    return acc
}, []))

//创建一个包含文件名和包含所需标记的行的数组
let arr = files.map(async(file) => {
  // 读取限定后的文件，并将其所有内容作为字符串获取
  const content = await app.vault.cachedRead(file)
  //将所有内容转换为行的数组，并提取包含标签“#tag2”的行 
  let lines = await content.split("\n").filter(line => line.includes("#转场")) 
    // 删除行中的标记和前置空格,以便于在表中显示。其中“-”是为了删除行中的“-”，“#tag2”是为了删除行中的“#DEF”，“  ”是为了删除行中的空格。
  for(var i=0; i < lines.length; i++) { 
    lines[i] = lines[i].replace(/- /g, '');
    lines[i] = lines[i].replace(/  /g, '');
    lines[i] = lines[i].replace("#转场", '');
  }
  //删除被上步骤替换后形成的空行
  console.log(lines)
  // 返回包含标记的行的文件名和行的数组 
  return ["[["+file.name.split(".")[0]+"]]", lines]
})

//解析promises并构建表 
Promise.all(arr).then(values => {
console.log(values)
// 过滤掉没有“#tag2”的行
const exists = values.filter(value => value[1][0])
//构建表，显示文件名和行 
dv.table(["File", "Lines"], exists)
})

I would appreciate it if anyone has suggestions on how to modify the code to separately detect the tag, image and text in each line, and display them in different columns or formats in the dataview table. Thank you in advance for your help!

holroy · January 25, 2024, 3:20pm

It’s possible to read the file contents using stuff like await app.vault.read(file), and there are various example in this forum on doing this to extract sections under a given heading and so on. These examples also showcase how to split the content into lines, and working on each line on its own. When on the line level, all your requests should be doable using a suitable set of regex’s.

A better solution, in my opinion, would be to change the lines you want to work with into either bullet points or tasks. This would allow dataview to get the text a lot easier, and you can focus your energy more on extracting the needed information from that line itself, rather than manipulating files asynchronously (which can be a nightmare in itself).

Either solution would however at some point present some images to be rendered, which has proven in the past to be something which might or might not work, and it’s not entirely well understood why it happens. There has been numerous cases on embedding stuff from dataview with various features. Luckily, some of the image embedding seems to work out rather nicely (this is also a topic frequently discussed in this forum).

PencilMing · January 29, 2024, 4:13am

Thank you , i will try it.

system · April 28, 2024, 4:13am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.