Bases: data source from a single CSV, JSON, or special Markdown file instead of note properties

Background:

I really like the bases feature, as it is very convenient for users who are not familiar with coding. However, since it tries to read notes from the entire vault, I am concerned that it may become slow when the number of notes exceeds, for example, 10,000.

While such a large number of notes is uncommon in typical use, it can be quite common when working with databases, where it is normal to have tens of thousands of records, each containing only a small amount of information. In these cases, it often makes more sense to store the data in CSV or JSON format, rather than using plugins to transform those records into thousands (or even millions) of individual notes.

If I am correct, repeatedly opening and closing a large number of small files introduces significant overhead compared to reading the same data from a single file. For example, when I want to learn a subject, I might need to handle a large dataset, and performance could be a key factor.

Requested Feature:

Would it be possible to optimize the bases feature to handle large datasets more efficiently, perhaps by allowing it to read directly from structured files such as CSV or JSON? As far as I can see, the technical effort required for such support would be minimal.

Thanks.

This feature request is legitimate but the background is wrong. Obsidian does not read every file every time. There are multiple layers of caches (in memory and on disk).
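To illustrate why caching changes the picture, here is a minimal sketch of a metadata cache keyed on each file's modification time and size. It is a generic Python illustration under that assumption, not a description of how Obsidian's own caches work; `MetadataCache` and its `parses` counter are hypothetical names.

```python
import os
from typing import Any, Callable, Dict, Tuple


class MetadataCache:
    """Re-parse a file only when its (mtime, size) signature changes.

    Generic illustration only; not Obsidian's actual caching scheme.
    """

    def __init__(self, parse: Callable[[str], Any]) -> None:
        self._parse = parse
        # path -> ((mtime_ns, size), parsed result)
        self._entries: Dict[str, Tuple[Tuple[int, int], Any]] = {}
        self.parses = 0  # how many times a file was actually read and parsed

    def get(self, path: str) -> Any:
        st = os.stat(path)  # cheap compared with reading and parsing the file
        signature = (st.st_mtime_ns, st.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == signature:
            return cached[1]  # unchanged since last time: no read, no parse
        with open(path, encoding="utf-8") as f:
            parsed = self._parse(f.read())
        self.parses += 1
        self._entries[path] = (signature, parsed)
        return parsed
```

A `stat` call per file is still needed here to detect changes, so this reduces but does not remove per-file overhead; an application can avoid even that by reacting to file-system change notifications instead.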

I still think that if you have a very large database with complex queries you are better off using a different product.

I hadn’t considered the potential benefits of caching, which could indeed help performance significantly. However, I still believe it would be beneficial to allow the bases feature to operate directly with CSV or JSON files—including both reading and modifying them.

The requested feature would be valuable not only for handling large databases, but also because CSV and JSON—especially the former—are convenient and efficient formats for storing, visualizing, and modifying structured data.

The reason I want to use Obsidian is to take advantage of its other features—such as tags, links, and backlinks—to create a richer, more interconnected knowledge base.

By the way, it also occurs to me that using CSV or JSON could be considered as an additional way to store Obsidian notes. This could help avoid an overwhelmingly large number of individual note files and make it easier to organize notes by different topics. It seems that this is how Notion-like apps handle similar cases.

Moreover, if the team believes that such an option might limit the flexibility of the software, it could still be implemented alongside a simple function to convert between these formats and individual Markdown notes in both directions. In this way, the proposed extra storage format could, to some extent, serve as a “zip” or archive of multiple related notes.
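A minimal sketch of the two-way conversion described above, assuming the archive is a JSON array of flat records and each note stores its record as YAML frontmatter. Values are written as JSON literals, which YAML 1.2 also accepts, so only Python's standard library is needed. The function names are hypothetical, and the parser only understands the simple frontmatter this sketch writes, not arbitrary YAML.

```python
import json
from typing import Any, Dict, List


def record_to_note(record: Dict[str, Any], body: str = "") -> str:
    """Render one record as a Markdown note with YAML frontmatter."""
    lines = ["---"]
    for key, value in record.items():
        # JSON literals ("text", 3, ["a", "b"]) are also valid YAML 1.2 values.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n" + body


def note_to_record(note: str) -> Dict[str, Any]:
    """Read back the frontmatter written by record_to_note (not general YAML)."""
    lines = note.split("\n")
    if lines[0] != "---":
        raise ValueError("note has no frontmatter")
    record: Dict[str, Any] = {}
    for line in lines[1:]:
        if line == "---":
            return record  # the body after the frontmatter is ignored here
        key, _, raw = line.partition(": ")
        record[key] = json.loads(raw)
    raise ValueError("unterminated frontmatter")


def unzip_archive(archive_json: str) -> List[str]:
    """Expand a JSON array of records into one note text per record."""
    return [record_to_note(r) for r in json.loads(archive_json)]


def zip_notes(notes: List[str]) -> str:
    """Collapse individual notes back into a single JSON array."""
    return json.dumps([note_to_record(n) for n in notes], ensure_ascii=False)
```

Choosing file names for the expanded notes (for example from a `title` field) is left out of the sketch.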

One of the major points of Obsidian is that we work directly with plain-text Markdown files stored locally.
Some people choose Obsidian over other software because of this.

It has pros and cons, but that’s the path we have chosen.

I understand your point about sticking to Markdown formats. However, it may be worth investigating the main drawbacks of this choice and exploring ways to mitigate them, at least to some extent.

When comparing Obsidian with other software such as Notion, it seems that supporting the organization of multiple entries within more sophisticated structures—such as CSV-like two-dimensional tables or JSON-like data—rather than keeping each entry in an isolated individual file, could be beneficial in many ways. This would be particularly advantageous for large datasets, where using individual Markdown notes for each entry is likely to be inefficient, degrade performance, and make features such as sorting, searching, and filtering more difficult to implement efficiently.

If the team intends to investigate storing data in more sophisticated structures to further improve performance and scalability, I’d like to point out that, although I have proposed using formats like CSV and JSON to achieve something akin to an “archive” or “zip”, it could also be possible to develop a strategy for converting such CSV or JSON content into a form that still uses Markdown. After all, these formats are also text-based, so embedding them within Markdown and parsing them as needed should be technically feasible.
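As a rough illustration of embedding structured data in Markdown, the sketch below pulls every fenced block tagged `csv` out of a note and parses it with Python's `csv` module. The `csv` info string is an assumption chosen for this example, not an existing Obsidian convention, and `rows_from_markdown` is a hypothetical name.

```python
import csv
import io
import re
from typing import Dict, List

FENCE = "`" * 3  # a Markdown code fence (three backticks)

# A fenced block whose info string is "csv"; group 1 is the block body.
CSV_BLOCK = re.compile(
    rf"^{FENCE}csv[ \t]*\n(.*?)^{FENCE}[ \t]*$", re.MULTILINE | re.DOTALL
)


def rows_from_markdown(markdown: str) -> List[Dict[str, str]]:
    """Collect the rows of every embedded CSV block, in document order."""
    rows: List[Dict[str, str]] = []
    for match in CSV_BLOCK.finditer(markdown):
        # The first line of each block is treated as its header row.
        rows.extend(csv.DictReader(io.StringIO(match.group(1))))
    return rows
```

Because the data stays inside an ordinary note, the file remains readable Markdown while a tool can still treat the block as a table of records.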

Note that my suggestion for Obsidian to develop more sophisticated storage structures is not essential.

On the other hand, I would like to reiterate that my primary request is for the Bases feature to support working directly with CSV or JSON files. I believe this enhancement could be implemented with relatively minimal effort and would lead to significant improvements in managing and interacting with CSV or JSON files. In my opinion, this would not only help in handling large datasets efficiently but also improve flexibility and usability for a wide range of users.

Considering that the current Bases feature already offers ways to export to CSV, it seems logical to extend this capability to reading and modifying CSV or JSON files as well.
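Reading and modifying are the natural counterparts of the existing export. A minimal sketch, assuming rows are identified by a key column (in the usage below, a hypothetical `title` column) and that an edit rewrites the whole file; `update_cell` is a hypothetical name.

```python
import csv
import io


def update_cell(text: str, key_field: str, key: str, field: str, value: str) -> str:
    """Set `field` to `value` on every row whose `key_field` equals `key`.

    Returns the whole CSV text rewritten. DictWriter raises ValueError
    if `field` is not an existing column.
    """
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    rows = list(reader)
    for row in rows:
        if row.get(key_field) == key:
            row[field] = value
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For example, `update_cell("title,status\nDune,unread\n", "title", "Dune", "status", "read")` marks one book as read while keeping the column order intact.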

I strongly support having more choices. Combined with formulas, this could cover lightweight spreadsheet needs. If Kanban views are introduced in the future, users are also likely to record very brief, temporary content on the Kanban board.

Allowing a single markdown file to, in some way, be represented in the Base by multiple rows would also work - I don’t think CSV or JSON files are strictly needed to get at the underlying feature request.

Personally, I really like Bases, but from an organizational perspective I’d also like a single file to be able to provide multiple rows for a Base.

You could allow a single note with a large Markdown table in it as an option for the backing data for Bases. This way you gain the efficiency of not having thousands of one-line notes files, and you stick with standard Markdown format.
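A naive sketch of how a single note's Markdown table could be turned into rows, one dict per table row keyed by the header cells. It assumes the first pipe table in the note holds the data and does not handle escaped pipes (`\|`) or pipes inside wikilink aliases such as `[[Note|alias]]`; a real implementation would need a proper Markdown table parser. `parse_markdown_table` is a hypothetical name.

```python
from typing import Dict, List


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Turn the first pipe table in a note into one dict per data row."""
    table: List[str] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("|"):
            table.append(line)
        elif table:
            break  # the first table has ended
    if len(table) < 2:
        return []  # need at least a header row and a delimiter row

    def cells(line: str) -> List[str]:
        inner = line[1:]  # drop the leading pipe
        if inner.endswith("|"):
            inner = inner[:-1]  # drop the trailing pipe, if present
        return [c.strip() for c in inner.split("|")]  # naive: ignores escaped pipes

    header = cells(table[0])
    # table[1] is the delimiter row (| --- | --- |) and is skipped.
    return [dict(zip(header, cells(row))) for row in table[2:]]
```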

This idea of backing Bases with CSV or Markdown tables is critical to me. I have lots of things I keep track of that are just a few words per item. I don’t use Obsidian for them at all, because I don’t want thousands of separate files with just a couple of words in each. I would love to use Obsidian for these things.

I created an account just to comment on this topic.

For my site, I store all sorts of content inside JSON files that get slurped up by Astro - my favorite movies, games, fiction and non-fiction books, my career history, and more.

When I first realized that the bases feature could help me manage my site’s data, some of which is markdown/mdx, I was floored that I wouldn’t have to build any sort of system for managing my site’s data.

But then I realized it wouldn’t work on the data that’s stored in JSON, and my heart sank. Nor does it recognize MDX files, though that’s a separate issue.

Think about it – with JSON/CSV the data you’d be extracting out of markdown files is already ready to go in a data structure. This is like the easiest win ever.

Please please reconsider!

I agree. Direct CSV and JSON support in Bases is a highly pragmatic request.

CSV files (comma-separated values) are plain-text files that use commas “,” instead of pipes “|” to store and display data as tables.

CSV could be an alternative way to display tables and would be the perfect addition to Bases.
CSV doesn’t need vault data, so it might be quicker as well.
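The relationship between the two formats can be shown with a small sketch that renders CSV text as an equivalent Markdown pipe table, using Python's `csv` module so that quoted fields containing commas survive. It is an illustration only; `csv_to_markdown_table` is a hypothetical helper.

```python
import csv
import io
from typing import List


def csv_to_markdown_table(text: str) -> str:
    """Render CSV text (first row = header) as a Markdown pipe table."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""

    def line(cells: List[str]) -> str:
        # A literal "|" would start a new cell in Markdown, so escape it.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    header, body = rows[0], rows[1:]
    out = [line(header), line(["---"] * len(header))]
    out.extend(line(r) for r in body)
    return "\n".join(out)
```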

Virtual notes for Bases - Feature requests - Obsidian Forum

Alternative way around the same problem.

+1 for this feature.

Making JSONs or CSVs a data source option for Bases would be super helpful for plugins with built-in “libraries”, letting users filter and sort them without having to create a note per entry.
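A minimal sketch of filtering and sorting a plugin's JSON library in memory, with no note per entry. The record fields (`name`, `level`) and the `query` helper are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


def query(
    records: List[Record],
    where: Optional[Callable[[Record], bool]] = None,
    sort_key: Optional[str] = None,
    reverse: bool = False,
) -> List[Record]:
    """Filter, then sort, an in-memory list of records."""
    rows = [r for r in records if where is None or where(r)]
    if sort_key is not None:
        # Records missing the key sort after the others (when ascending).
        rows.sort(key=lambda r: (r.get(sort_key) is None, r.get(sort_key)), reverse=reverse)
    return rows


library = json.loads(
    '[{"name": "Fireball", "level": 3}, {"name": "Light", "level": 0},'
    ' {"name": "Shield", "level": 1}]'
)
print([r["name"] for r in query(library, where=lambda r: r["level"] > 0, sort_key="level")])
# ['Shield', 'Fireball']
```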

Use case or problem

I want to use bases but I don’t want to have one file per row in my vault.

Currently, each Base row represents exactly one file.

Proposed solution

Notes used as Base rows often contain only properties and little or no content. So I was thinking of having a single file with multiple blocks, where each block creates a row in a Base.

These blocks could be frontmatter blocks (---), or there could be a separate block type (```base-row).

There could still be content between those blocks, semantically associating the content with the block above it.
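A rough sketch of the proposed single-file layout: each fenced `base-row` block becomes one row, and any prose up to the next block is attached to that row as its content. The block name comes from the proposal; the body format here is an assumption, simplified to one `key: value` pair per line (values kept as strings) rather than full YAML. `parse_base_rows` is a hypothetical name, and text before the first block is ignored.

```python
import re
from typing import Any, Dict, List

FENCE = "`" * 3  # a Markdown code fence (three backticks)

# Opening fence tagged "base-row", the block body (group 1), closing fence.
BASE_ROW = re.compile(
    rf"^{FENCE}base-row[ \t]*\n(.*?)^{FENCE}[ \t]*$\n?", re.MULTILINE | re.DOTALL
)


def parse_base_rows(markdown: str) -> List[Dict[str, Any]]:
    """Return one {"properties", "content"} dict per base-row block."""
    matches = list(BASE_ROW.finditer(markdown))
    rows: List[Dict[str, Any]] = []
    for i, match in enumerate(matches):
        props: Dict[str, str] = {}
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:  # simplified "key: value" lines, not full YAML
                props[key.strip()] = value.strip()
        # Content runs from the end of this block to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        rows.append({"properties": props, "content": markdown[match.end():end].strip()})
    return rows
```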

Use case or problem

I want to be able to leverage the full power of the Bases plugin at a local level (standalone .base file or embedded in a note). This will allow me to use a data source that is only relevant to the Base itself and save us from littering the vault and its index with small notes that have no value on their own.

Proposed solution

The MVP would be to allow a source to be defined in a Base. The source would effectively serve as the frontmatter for what would otherwise have been individual notes. The source can be inlined or stored in a separate file. Self-referential internal links within the source would not be supported, but internal links to other notes in the vault would be.

An explicit contract could be defined with users: a separate index would be maintained for the local source and would not be usable in any scope outside the Base itself. This would be done for clarity and to avoid confusion as to why entries in a Base are not visible in Quick switcher, search results, graph view, etc.

The file property could be null, or provide the same information as the standalone Base or the note the Base is embedded in. My suggestion would be to leave it null, since that would be less confusing and would serve as a reminder that this is not a normal Base.

Inline Source

The source key could be an array of front matter objects.

formulas:
  Thing: link(subject, subject.asFile().properties.aliases[0])
properties:
  note.rating:
    displayName: Rating
views:
  - type: table
    name: All
    order:
      - rating
      - formula.Thing
    columnSize:
      note.rating: 100
source:
  - { rating: "⭐️⭐️⭐️⭐️☆", subject: "[[A Restaurant]]", tags: ["food/bbq"] }
  - { rating: "⭐️⭐️⭐️☆☆", subject: "[[A Movie]]", tags: ["entertainment/horror"] }
  - { rating: "⭐️⭐️☆☆☆", subject: "[[A TV Show]]", tags: ["entertainment/scifi"] }
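A minimal sketch of how inline `source` entries might become Base rows, assuming the `.base` file has already been parsed from YAML into a Python dict (for example with a YAML library). Each row gets `file: None`, as suggested above, keeps its entry as note properties, and records wikilink targets so that links to real notes in the vault can still resolve. `rows_from_inline_source` and `link_targets` are hypothetical names.

```python
import re
from typing import Any, Dict, List

# Captures the target of [[Target]], [[Target|alias]] or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def link_targets(entry: Dict[str, Any]) -> List[str]:
    """Names of the vault notes linked from an entry's string values."""
    return [
        m.group(1)
        for value in entry.values()
        if isinstance(value, str)
        for m in WIKILINK.finditer(value)
    ]


def rows_from_inline_source(base: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn each entry of a parsed Base's `source` list into a row."""
    return [
        {
            "file": None,  # no backing note, per the suggestion above
            "note": dict(entry),  # the entry plays the role of frontmatter
            "links": link_targets(entry),
        }
        for entry in base.get("source", [])
    ]
```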

File Source

The source key could be a path in the vault to a JSON or YAML encoded file. The file contents would be an array of front matter objects.

Reviews.base
formulas:
  Thing: link(subject, subject.asFile().properties.aliases[0])
properties:
  note.rating:
    displayName: Rating
views:
  - type: table
    name: All
    order:
      - rating
      - formula.Thing
    columnSize:
      note.rating: 100
source: Reviews.json
Reviews.json
[
  { "rating": "⭐️⭐️⭐️⭐️☆", "subject": "[[A Restaurant]]" },
  { "rating": "⭐️⭐️⭐️☆☆", "subject": "[[A Movie]]" },
  { "rating": "⭐️⭐️☆☆☆", "subject": "[[A TV Show]]" }
]
Reviews.yaml
  - rating: "⭐️⭐️⭐️⭐️☆"
    subject: "[[A Restaurant]]"
    tags:
      - "food/bbq"
  - { rating: "⭐️⭐️⭐️☆☆", subject: "[[A Movie]]", tags: ["entertainment/horror"] }
  - { rating: "⭐️⭐️☆☆☆", subject: "[[A TV Show]]", tags: ["entertainment/scifi"] }
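For the file-based variant, a loader could accept either an inline list or a vault-relative path. This sketch handles JSON with the standard library; note that `json.loads` accepts only strict JSON, so object keys must be double-quoted. YAML support is left as a stub because it needs a third-party parser such as PyYAML. `load_source` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

Source = Union[str, List[Dict[str, Any]]]


def load_source(source: Source, vault_root: Path) -> List[Dict[str, Any]]:
    """Return the entries of an inline source or a vault-relative source file."""
    if isinstance(source, list):
        return [dict(entry) for entry in source]  # inline source
    path = vault_root / source
    if path.suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))  # strict JSON only
    elif path.suffix in (".yaml", ".yml"):
        raise NotImplementedError("YAML needs a third-party parser such as PyYAML")
    else:
        raise ValueError(f"unsupported source file type: {path.suffix!r}")
    if not isinstance(data, list) or not all(isinstance(e, dict) for e in data):
        raise ValueError("a source file must contain an array of objects")
    return data
```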

Current workaround (optional)

None, this functionality cannot be approximated.

Related feature requests (optional)

None.

I just thought of another source file format that people in an academic or corporate setting could probably take advantage of: CSV. The obvious catches are that a header row would be required and that slightly more complex fields (e.g. aliases, tags) would probably not be supported.
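A small sketch of the CSV case, assuming the first row is always treated as the header and every value stays a plain string, so list-valued fields such as `tags` or `aliases` are not split. The `required` parameter is a hypothetical way to check that the columns a Base expects are present; `rows_from_csv` is likewise a hypothetical name.

```python
import csv
import io
from typing import Dict, Iterable, List


def rows_from_csv(text: str, required: Iterable[str] = ()) -> List[Dict[str, str]]:
    """Parse CSV whose first row is the header; all values stay strings."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames  # None when the input is empty
    if not header:
        raise ValueError("a CSV source needs a header row")
    missing = [column for column in required if column not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # A "tags" cell such as "a;b" is returned as-is, not split into a list.
    return list(reader)
```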

Related: Upgrade tables, to be like BASES / Single file base view?