Shell script: Concatenate by tag

This command-line script searches the current folder and its subfolders for all Markdown files that contain a given substring (a tag, in this example). It then concatenates those files into a single file outside the vault (in the user’s home directory, in this example).
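Because this is a plain substring match, a file tagged `#book/ThinkingFastAndSlowNotes` would also be picked up. If that matters, a stricter check could look like the minimal Ruby sketch below. `contains_exact_tag?` is a hypothetical helper, and its boundary rule (not preceded by a word character, not followed by a letter, digit, underscore or hyphen) is an assumption about what ends a tag, not Obsidian’s exact tag grammar.

```ruby
# Hypothetical helper: true if `text` contains `tag` as a whole tag rather than
# as the prefix of a longer one. Assumed boundary rule: the tag must not follow
# a word character and must not be followed by [A-Za-z0-9_-].
def contains_exact_tag?(text, tag)
  text.match?(/(?<!\w)#{Regexp.escape(tag)}(?![\w-])/)
end

tag = "#book/ThinkingFastAndSlow"
contains_exact_tag?("Notes #book/ThinkingFastAndSlow", tag)       # => true
contains_exact_tag?("Notes #book/ThinkingFastAndSlowNotes", tag)  # => false
contains_exact_tag?("See #book/ThinkingFastAndSlow, ch. 1", tag)  # => true
```

In the Ruby version of the script, a helper like this could stand in for the `include?(TAG)` test.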

Each file’s content is preceded by its filename (without the .md extension), converted to title case and formatted as a Markdown H1 heading.
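The awk one-liner in the script uppercases the first letter of each whitespace-separated word and lowercases the rest, so acronyms lose their capitals. This minimal Ruby sketch mirrors that behaviour for plain ASCII filenames; `title_from_path` is a hypothetical name used only for illustration.

```ruby
# Hypothetical helper mirroring the awk title-case step: drop the directory and
# the .md extension, then capitalize each word (first letter up, rest down)
def title_from_path(path)
  File.basename(path, ".md").split.map(&:capitalize).join(" ")
end

title_from_path("./notes/thinking fast and slow.md")  # => "Thinking Fast And Slow"
title_from_path("./System 1 vs SYSTEM 2.md")          # => "System 1 Vs System 2"
```

Like awk rebuilding a record, `split` followed by `join(" ")` also collapses runs of spaces into one.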

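```bash
#!/bin/bash

# Tag (or any literal substring) to search for
tag="#book/ThinkingFastAndSlow"

# Find Markdown files recursively, keep those containing the tag as a fixed
# string (-F), and write each one to the output file under an H1 title.
find . -type f -name "*.md" -print0 | xargs -0 grep -lF -- "$tag" | while IFS= read -r file; do
    # Skip the script itself, in case it matches the search
    if [ "$file" != "$0" ]; then
        # Strip the directory and .md extension, then capitalize each word
        title=$(basename "$file" ".md" | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1))tolower(substr($i,2))}1')
        echo "# $title"
        cat "$file"
        printf "\n\n"
    fi
done > ~/concatenated_files.md
```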

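EDIT: I’ve converted the script into Ruby. Besides concatenating the tagged files, it sorts them alphabetically, strips YAML frontmatter and the tag itself, reduces aliased wikilinks to their link target, and generates a simple table of contents at the top of the file.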

```ruby
#!/usr/bin/env ruby
require 'pathname'

# Set the tag (or any literal substring) to search for
TAG = "#book/ThinkingFastAndSlow"

# Capitalize the first letter of each word and lowercase the rest
def title_case(title)
  title.split.map(&:capitalize).join(' ')
end

# Derive a title from a file path: drop the directory and the .md extension,
# turn underscores into spaces, and convert to title case
def title_for(file)
  title_case(Pathname.new(file).basename('.md').to_s.tr('_', ' '))
end

# Find all Markdown files in this folder and its subfolders (like the shell
# version) that contain the tag, sorted alphabetically by path
files = Dir.glob('./**/*.md').select { |file| File.read(file).include?(TAG) }.sort

# Create the table of contents
toc = "# Table of contents\n\n"
files.each do |file|
  toc += "- #{title_for(file)}\n"
end

# Concatenate the files into a single document
content = ''
files.each do |file|
  text = File.read(file)
  # Remove YAML frontmatter, but only at the very start of the file
  text.sub!(/\A---\n.*?^---$\n?/m, '')
  # Remove every occurrence of the tag (a plain string, so no regex escaping)
  text.gsub!(TAG, '')
  # Replace aliased wikilinks [[target|alias]] with the bare target; the
  # character classes keep a match from spanning several links
  text.gsub!(/\[\[([^\]|]+)\|[^\]]+\]\]/, '\1')

  # Add title as heading and file content to final document
  content += "# #{title_for(file)}\n\n#{text}\n---\n\n"
end

# Combine the table of contents and document
output = "#{toc}\n---\n\n#{content}"

# Write the output to a file in the home directory
File.write(File.expand_path('~/concatenated_files.md'), output)
```
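The cleanup regexes are the easiest part to get wrong, so it can help to pull them into a function that works on a string and test them without touching any files. This is a minimal sketch: `clean_note` is a hypothetical name, the frontmatter pattern is anchored to the start of the note (`\A`) so later `---` horizontal rules survive, and the wikilink pattern uses `[^\]|]` so a plain link like `[[Anchoring]]` can’t be swallowed into the next aliased one.

```ruby
# Hypothetical pure version of the per-file cleanup steps
def clean_note(text, tag)
  # Drop YAML frontmatter, but only at the very start of the note
  text = text.sub(/\A---\n.*?^---$\n?/m, '')
  # Drop every literal occurrence of the tag
  text = text.gsub(tag, '')
  # Turn aliased wikilinks [[target|alias]] into the bare target
  text.gsub(/\[\[([^\]|]+)\|[^\]]+\]\]/, '\1')
end

tag = "#book/ThinkingFastAndSlow"
note = "---\naliases: [TFAS]\n---\nSee [[System 1|fast thinking]] and [[Anchoring]]. #{tag}\n"
clean_note(note, tag)  # => "See System 1 and [[Anchoring]]. \n"
```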
