Get Obsidian vault usage statistics from Git

I have been using the Git plugin since April 2024 to back up my vault hourly (only when something has changed). I think it would be insightful to get stats on the content from the fine granularity of all those Git commits.

Do you know of any way to see daily stats of added, edited and removed content using Git or GitHub?
I would also like to see which notes (files) I have been modifying the most (number of occurrences and amount of content).
This would help me reflect on which notes are the most important in my workflow and which ones I seem to never use.

If you have recommendations other than doing it through Git, feel free to share :innocent: I have also had Obsidian Sync running since around the same time, and I expect it to have similar data hidden somewhere.

The closest thing you can get in GitHub is the Insights → Code frequency chart. It shows lines added and lines deleted, aggregated by week.

But you can use command-line git to generate a log of changes, process it with a scripting language, and produce the result in the form of a Markdown table or even Mermaid diagrams.
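As a rough illustration of that idea, here is a minimal Python sketch (not the script used for the chart further down). It assumes the log comes from `git log --numstat` with a `DATE` marker line per commit; the function names `git_numstat_log` and `summarize` are made up for this example. It totals lines added and removed per day, and for each file counts how many commits touched it and how many lines changed.

```python
import subprocess
from collections import defaultdict


def git_numstat_log(repo="."):
    """Return raw `git log --numstat` text with one 'DATE yyyy-mm-dd' line per commit."""
    cmd = ["git", "-C", repo, "log", "--numstat",
           "--pretty=format:DATE %ad", "--date=format:%Y-%m-%d"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def summarize(log_text):
    """Aggregate the log into per-day and per-file statistics.

    Returns (per_day, per_file) where
      per_day[date]  == [lines added, lines removed]
      per_file[path] == [commits touching the file, lines added + removed]
    """
    per_day = defaultdict(lambda: [0, 0])
    per_file = defaultdict(lambda: [0, 0])
    current_date = None
    for line in log_text.splitlines():
        if line.startswith("DATE "):
            current_date = line.split()[1]
            continue
        parts = line.split("\t")
        if len(parts) != 3 or current_date is None:
            continue  # blank separator lines between commits
        added, removed, path = parts
        # Binary files are reported as "-": count the touch, but no lines.
        a = int(added) if added.isdigit() else 0
        r = int(removed) if removed.isdigit() else 0
        per_day[current_date][0] += a
        per_day[current_date][1] += r
        per_file[path][0] += 1
        per_file[path][1] += a + r
    return dict(per_day), dict(per_file)


# Example use inside a vault that is a git repository:
#   per_day, per_file = summarize(git_numstat_log("."))
#   top = sorted(per_file.items(), key=lambda kv: kv[1][0], reverse=True)[:20]
```

One caveat of this sketch: git reports a renamed file with a path like `old => new`, so a note that was renamed shows up under more than one key.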

For example, I used AWK to process git’s output and generate Mermaid-compatible code, which I then pasted into Obsidian to get this chart (showing only commits since 2025-09-01):

Unfortunately, Mermaid cannot generate a legend. In this chart the gray bars are the number of files renamed per day. On most days that number is zero, but Mermaid puts the baseline at -1, so a gray bar appears even for zero. Red is the number of deleted files, orange is the number of edited files and green is the number of new files.

The code of the chart is:

```mermaid
---
config:
  xyChart:
    height: 800
  themeVariables:
    xyChart:
      plotColorPalette: "#84994F, #FCB53B, #B45253, #a7a7a7"
---
xychart-beta horizontal
    title "File Activity per Day (Stacked)"
    y-axis "Total Files"
    x-axis ["2025-09-03", "2025-09-05", "2025-09-08", "2025-09-09", "2025-09-10", "2025-09-11", "2025-09-12", "2025-09-13", "2025-09-14", "2025-09-15", "2025-09-16", "2025-09-17", "2025-09-18", "2025-09-19", "2025-09-20", "2025-09-21", "2025-09-22", "2025-09-23", "2025-09-24", "2025-09-25", "2025-09-26", "2025-09-27", "2025-09-28", "2025-09-29", "2025-09-30", "2025-10-01", "2025-10-02", "2025-10-03", "2025-10-04", "2025-10-06", "2025-10-07", "2025-10-08", "2025-10-09", "2025-10-10", "2025-10-11", "2025-10-12", "2025-10-13", "2025-10-14", "2025-10-16", "2025-10-17", "2025-10-19", "2025-10-20", "2025-10-21"]
    bar "Added" [29, 40, 9, 107, 22, 2, 13, 10, 2, 18, 12, 12, 18, 36, 2, 18, 22, 20, 11, 31, 18, 2, 3, 27, 22, 22, 32, 16, 8, 18, 14, 12, 12, 29, 9, 9, 5, 9, 7, 11, 41, 29, 12]
    bar "Modified" [25, 25, 6, 102, 19, 2, 11, 9, 2, 16, 9, 10, 15, 31, 2, 15, 18, 16, 9, 25, 15, 2, 3, 22, 12, 18, 23, 13, 7, 14, 11, 11, 12, 15, 6, 9, 4, 7, 5, 11, 30, 27, 11]
    bar "Deleted" [5, 6, 2, 72, 5, 1, 3, 3, 1, 9, 4, 2, 5, 9, 1, 6, 5, 2, 2, 7, 8, 1, 2, 9, 3, 4, 7, 3, 3, 4, 2, 3, 5, 5, 1, 1, 1, 1, 1, 4, 7, 13, 1]
    bar "Renamed" [1, 3, 0, 63, 1, 0, 2, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 4, 0]
```

The script that generated that data is the following. It accepts two optional parameters: the start and end dates of the range to show.

```bash
#!/bin/bash

# FILE: git_activity.sh
#
# Generates Mermaid 'xychart-beta' syntax for repository file activity.
# Simulates STACKED bars by plotting cumulative totals.
#
# USAGE:
#   ./git_activity.sh [START_DATE] [END_DATE]

# Get dates from the first two arguments (both optional)
START_DATE=$1
END_DATE=$2

# --- 1. GIT LOG COMMAND ---
GIT_CMD=("git" "log" "--name-status" "--pretty=format:DATE %ad" "--date=format:%Y-%m-%d")

if [ -n "$START_DATE" ]; then
    GIT_CMD+=("--since=$START_DATE")
fi

if [ -n "$END_DATE" ]; then
    GIT_CMD+=("--until=$END_DATE")
fi

# Execute the git log command and pipe its output
"${GIT_CMD[@]}" | \
# --- 2. FIRST AWK (Process data into a table) ---
awk '
BEGIN {
    OFS="\t";
}
/^DATE/ {
    current_date=$2;
    all_dates[current_date] = 1;
    # Skip the status rule below: "DATE" starts with "D" and would
    # otherwise be counted as a deleted file.
    next;
}
/^[AMDRC]/ {
    # Status may carry a score (e.g. R100), so keep only the first letter
    status = substr($1, 1, 1);
    if (status == "A" || status == "C") {
        added[current_date]++;
    } else if (status == "M") {
        modified[current_date]++;
    } else if (status == "D") {
        deleted[current_date]++;
    } else if (status == "R") {
        renamed[current_date]++;
    }
}
END {
    # Print one tab-separated row per date (no header row):
    # date, added, modified, deleted, renamed
    for (d in all_dates) {
        print d, added[d]+0, modified[d]+0, deleted[d]+0, renamed[d]+0;
    }
}' | \
# --- 3. SORT (Order by date) ---
sort | \
# --- 4. SECOND AWK (Format for Mermaid with STACKED logic) ---
awk '
BEGIN {
    # Print Mermaid chart header
    print "xychart-beta horizontal";
    print "    title \"File Activity per Day (Stacked)\"";
    print "    y-axis \"Total Files\"";

    # Initialize data strings for the stacks
    dates = "";
    data_total = "";        # Corresponds to "Added" slice
    data_mod_del_ren = "";  # Corresponds to "Modified" slice
    data_del_ren = "";      # Corresponds to "Deleted" slice
    data_ren = "";          # Corresponds to "Renamed" slice
}
# Process each data row (the previous awk prints no header, so nothing is skipped)
{
    # Store values for clarity (and ensure they are numeric)
    a = $2+0; # Added
    m = $3+0; # Modified
    d = $4+0; # Deleted
    r = $5+0; # Renamed

    # Calculate the cumulative stacks
    # These values are what get plotted
    val_total = a + m + d + r;
    val_mdr = m + d + r;
    val_dr = d + r;
    val_r = r;

    # Build the date string
    dates = (dates == "" ? "" : dates ", ") "\"" $1 "\"";

    # Build the data strings
    data_total = (data_total == "" ? "" : data_total ", ") val_total;
    data_mod_del_ren = (data_mod_del_ren == "" ? "" : data_mod_del_ren ", ") val_mdr;
    data_del_ren = (data_del_ren == "" ? "" : data_del_ren ", ") val_dr;
    data_ren = (data_ren == "" ? "" : data_ren ", ") val_r;
}
END {
    # Print all data series for Mermaid
    # The order is CRITICAL for stacking: draw the largest first.
    # The labels ("Added", "Modified"...) represent the
    # visible "slice" of the stack.
    print "    x-axis [" dates "]";
    print "    bar \"Added\" [" data_total "]";
    print "    bar \"Modified\" [" data_mod_del_ren "]";
    print "    bar \"Deleted\" [" data_del_ren "]";
    print "    bar \"Renamed\" [" data_ren "]";
}'
```

NOTE: as far as I know, Mermaid does not stack bars, so I had to “fake” the stacking by adding up the data to generate longer bars that sit “behind” the ones drawn later.
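The arithmetic behind that trick can be sketched in a few lines of Python. The helper name `stack_series` is hypothetical and the numbers in the comment are invented; each returned series is the sum of its own category plus every category drawn after it, so only the difference stays visible as a "slice".

```python
def stack_series(added, modified, deleted, renamed):
    """Convert per-day counts into overlapping series that look stacked.

    The series must be drawn in the returned order (longest bar first),
    so that each shorter bar covers part of the one behind it.
    """
    total, mod_del_ren, del_ren, ren = [], [], [], []
    for a, m, d, r in zip(added, modified, deleted, renamed):
        total.append(a + m + d + r)    # visible slice: added
        mod_del_ren.append(m + d + r)  # visible slice: modified
        del_ren.append(d + r)          # visible slice: deleted
        ren.append(r)                  # visible slice: renamed
    return total, mod_del_ren, del_ren, ren


# Two days with (added, modified, deleted, renamed) = (4, 3, 2, 1) and (1, 0, 1, 0)
# give the plotted series [10, 2], [6, 1], [3, 1] and [1, 0].
```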

Great! Thanks @JLDiaz, I’ll definitely try your script.
I am not familiar with Mermaid, so maybe I’ll use a Python library instead to come up with a visualization.

Yes, of course. Instead of awk you can use Python to process git’s output, and instead of Mermaid you can plot directly with matplotlib or another Python library.
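As a minimal sketch of what the counting step might look like in Python, assuming the same `git log --name-status --pretty=format:"DATE %ad" --date=format:%Y-%m-%d` output that the shell script above feeds to awk (the function name `count_file_activity` is made up for this example):

```python
from collections import defaultdict

# Map the first letter of git's status code to a category.
# Copies ("C") are counted as additions, as in the awk version.
KINDS = {"A": "added", "C": "added", "M": "modified", "D": "deleted", "R": "renamed"}


def count_file_activity(log_text):
    """Count added/modified/deleted/renamed files per day, sorted by date."""
    counts = defaultdict(lambda: {"added": 0, "modified": 0, "deleted": 0, "renamed": 0})
    current_date = None
    for line in log_text.splitlines():
        if line.startswith("DATE "):
            current_date = line.split()[1]
            counts[current_date]  # make sure the day exists even with no files
            continue
        status, sep, _paths = line.partition("\t")
        if not sep or current_date is None:
            continue  # blank line between commits
        kind = KINDS.get(status[:1])  # "R100" -> "R"
        if kind:
            counts[current_date][kind] += 1
    return dict(sorted(counts.items()))
```

The resulting dictionary can then be handed to whichever plotting library you prefer; most of them support stacked bars natively, so the cumulative workaround needed for Mermaid is not required.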

I used Mermaid in this case because it is directly integrated into Obsidian, so you don’t need any extra tool to visualize the result, but the bar chart provided by Mermaid has severe drawbacks and few customization options.

Indeed, with the help of Claude, I created a Python script that can generate a variety of outputs. See it at GitHub - jldiaz/gitstats

One interesting output is the plotly one, because it is html+js based and allows for interactive exploration of the data: