Knowledge Processing Pipelines

techflo · March 8, 2021, 12:51pm

Sharing my personal Knowledge Processing Pipeline:
Instapaper → Readwise → Obsidian

Recently I struggled with discovering quality content to pipe through.
What’s your way to fill your pipeline?

davecan · March 8, 2021, 3:40pm

I don’t use a particular “collection pipeline” because I try to avoid the Collector’s Fallacy: The Collector’s Fallacy • Zettelkasten Method

Within my (zettelkasten-based) system I either incorporate content directly into one or more evergreen notes, or, if the content is large enough / significant enough, I create a source/reference note for that particular source and then spin out literature notes from it. These are all contained in a folder dedicated to that source. I’ve also configured Obsidian to place image attachments into a subfolder in the current folder.

End result is I have a structure like this for processing sources:

- /vault root
  - /sources
    - /smith2019 Some source title
      - /attachments
      - smith2019 Some source title
      - Some lit note from this source
      - Another lit note from this source
      - ...

When I’m done the note smith2019 Some source title is mostly an outline of links to the various literature notes.

Those then are linked from / incorporated into evergreen notes.

Because of this, there is no “pipeline” of content in the traditional sense of having dedicated “flows” from various external sources. I find those approaches brittle and it’s easy to spend more time on trying to erect (and maintain!) the perfect system rather than actually using Obsidian to take notes. And once that does break down it is difficult to disentangle that local failure from a failure of the global system that has been erected, which leads to antipathy and loathing of the system itself.

So instead I focus on a system within Obsidian that is flexible enough to adapt to any need. If something is worth incorporating it is either incorporated directly or goes through the above process, regardless of where it comes from. (I say within Obsidian but I do use Alfred workflows to assist. What I mean is that there is nothing resembling the flow you describe, i.e. no dedicated pipelines or flows from external apps for content capture, because again, the collector’s fallacy is something I’ve fallen victim to far too many times and I am committed to not falling into that trap again.)

Dr. Allosso’s comments here align extremely well with my thinking on this subject:

fcy · March 9, 2021, 12:17am

Why not start as an “evergreen note” and evolve it? I’ll admit evergreen notes haven’t clicked for me yet so this question might make no sense or be super basic.

I’m just starting out but I’ve been creating notes as I learn something and updating them. For example, I want to improve my paperless system so I’m watching a series of videos on it, as I watch I’m taking notes on a Paperless note which I think will eventually be my-paperless evergreen note.

fcy · March 9, 2021, 12:35am

I think the link to the collector’s fallacy answered my question with this:

Until we merge the contents, the information, ideas, and thoughts of other people into our own knowledge, we haven’t really learned a thing.

The evergreen note is the “our own knowledge”.

techflo · March 9, 2021, 8:14am

Thank you so much for sharing that detailed post about your process!
I loved the video of Dr. Allosso you shared with me. The link about the Collector’s Fallacy also gave me good insight into your thinking.

So I suppose you don’t have the problem of discovering content amid the noise of the internet? As the creator economy is picking up quite fast, I am flooded by so much content that I feel like I’m searching for a needle in a big haystack.

davecan · March 10, 2021, 4:28pm

@fcy That is correct, I like to keep some level of distinction between my thoughts and the thoughts of others. I also do this by distinguishing between literature notes and evergreen notes directly in the note title, which makes it easy when scanning to tell which thoughts belong to others and which are mine. (“mine” often means a note with a set of literature notes capturing others’ thoughts and letting them argue or agree with each other while I find themes, patterns, etc from their discussion)

I don’t run everything through that source-to-literature process – the heuristics I use to make the decision include the size/length of the material and the number of core principles that I can see from a cursory read. I’ve cited a single passage from a book in an evergreen note with no corresponding source/lit note, and I’ve run a 3 paragraph online comment through this process because it was written by a professor and contained at least 3 valuable concepts each of which I converted into literature notes. Since the literature notes each refer back to the source note, which in turn links to the original source material (in the case of the comment I just copied it into the source note as well as linked to it, in case it is deleted one day) I have traceability.

It’s also actually much simpler in the logistics than it may sound. I’m experimenting with Alfred workflows and have one that lets me easily create the source note from anywhere on my Mac, so I can process it immediately or just let it sit for review at some undetermined future date.

@techflo I am absolutely overwhelmed by the content available! But I’ve been burned so badly by the collector’s fallacy before that I try to be much more cautious in what I allow in to consume my time.

Currently I have 70 source notes sitting there in various stages of processing, some mostly processed and many others not touched yet. I’ll get to them when they seem important enough and I have time. Also quite a few of those were captured when I was earlier in my Obsidian + Zettelkasten journey, and I could probably toss out 2-3 dozen now as not actually worth my time. So in that sense I do “collect” a bit but I’m not collecting the content and hoping for connections, rather I “collect” a source that seems interesting and let it sit there, sort of like in an inbox, and occasionally I look over the list and decide to read something interesting. Or I’ll decide it is no longer interesting and just delete the entire folder without processing it at all.

One source that is mostly done and has a lot of material in its 250 pages has 37 literature notes so far (which feels excessive and I may pare it down a bit), another that is less than half done (but is 900 pages of dense material that I need to study for a major exam) has 56 literature notes. (which doesn’t feel excessive, as a prior exam I took in the same field had me generate 5000 flashcards, with a lot of duplication of course)

As I process them some of those may be combined and condensed.

fcy · March 10, 2021, 9:11pm

@davecan your process is very interesting, I’m curious to see what it looks like at the file level, especially the example you gave about processing a comment through the full-fledged processing pipeline. I don’t have dense book materials to study but I get a lot of my knowledge from website articles and talks. Do you put those under your sources folder too? Does a website article get its own subfolder too? What do you name these notes?

davecan · March 12, 2021, 8:48pm

@fcy Like I said it really depends on the value of the source. If I only get one or two ideas from a source I may just create/update evergreen notes with the new info and cite that source directly. But if the source has the potential to make multiple additions to my knowledge base then I may choose to run it through the process.

Anything can get this treatment. I can run a book, journal article, blog article, video, podcast, anything through the process. A source is a source is a source.

The structure is simple, here’s an example:

Vault root
- Evergreen notes
  - ...
- Source & lit notes
  - astley1987 Never Gonna Give You Up (202103121030)
    - astley1987 Never Gonna Give You Up (S.202103121030)
    - Strangers to love are averse to making commitments (L.202103121032)
    - Intensity of love directly correlates length of relationship (L.202103121035)
    - A true love will never give you up, let you down, run around or desert you (L.202103121038)

(that’s not a real example, but I find humor often makes concepts more memorable)

The source note (denoted by (S.YYMMDDHHMM)) is the dumping ground for raw notes as I process the source. As I find themes in those raw notes I chunk them together into groups and give the group a meaningful name (phrase based where possible, capturing the concept) and then highlight that chunk and extract it into a separate literature note using the Note Refactor plugin, which helpfully replaces the chunk with the link to the new note and links from the new note back to the source note for me. The end result is often that the source note largely turns into an outline / index note containing an annotated list of literature notes (denoted by (L.YYMMDDHHMM)) which are then linked to and referenced by various evergreen notes, as appropriate. (some aren’t yet, they just sit there waiting to be used)
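The extraction step described above can be sketched in a few lines of Python. This only illustrates the described behavior (the chunk is replaced by a link, and the new note links back to the source). It is not the Note Refactor plugin's actual implementation, and the function name `extract_chunk` and the `Source:` footer wording are assumptions made for the example.

```python
from typing import Tuple


def extract_chunk(source_title: str, source_text: str,
                  chunk: str, lit_title: str) -> Tuple[str, str]:
    """Return (updated source note text, new literature note text).

    Hypothetical helper: mimics the described workflow, not the plugin itself.
    """
    if chunk not in source_text:
        raise ValueError("chunk not found in source note")
    # Replace only the first occurrence so repeated text elsewhere is untouched.
    updated_source = source_text.replace(chunk, f"[[{lit_title}]]", 1)
    # The footer link back to the source keeps the lit note traceable.
    lit_note = f"{chunk.strip()}\n\nSource: [[{source_title}]]\n"
    return updated_source, lit_note
```

Repeating this for each themed chunk is what gradually turns the source note into an outline of links.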

(admittedly there is other “junk” in that source note – question prompts to consider, rough notes that didn’t make the lit note cut, etc – along with basic biblio info like type, author, year)

Essentially this is a way to extract core principles, claims, arguments, conclusions, etc from a source and then remix them together in evergreen notes, e.g. an evergreen note may say something like this principle is an abstraction of principles from authors A and B, see: [[X]] and [[Y]]; but see [[Z]] which disagrees on the basis of blah.

Regarding the example of the reddit comment, this is actually a case where I was correct in copying the entire comment contents into the source note because I now see the original comment and the entire user account has since been deleted by the author.

Here are the contents of the comment, in their entirety:

Inflation is the big boogeyman that people are scared of but don’t quite understand. Inflation is often seen as something that cuts into your future spending and over time reduces your spending power until you’re carting around a wheelbarrow of cash to buy a loaf of bread. This is the classical Mills/Smith/Ricardo understanding of money. And while this effect is present in some areas (and often to a smaller effect than Fed published inflation rate), it’s generally thought more recently that even in inflationary environments, we can see increased standards of living with less spending. 60 years ago, $100000 bought more bread, a larger house, and a “fancier” car than it does now, but the standard of living bought with $100000 now is much greater than it bought 60 years ago. The house bought has better climate control, the appliances are beyond even the wildest dreams, and cars are safer, more fuel efficient, and can even drive itself

This is mostly due to the idea that in a healthy economy, the value of assets produced in the economy is outpacing inflation. When money is created during the loan creation process (fun fact, only about 10% of money in the US economy was created by the federal government, yet still people are under the impression that the Fed prints money), the assets bought, the companies formed, the investments into new technologies are all providing more value to our standard of living than the net reduction in our spending power due to the creation of money. standard of living value comes in many ways. We drive down costs of current amenities by more efficiently producing them, and we produce new standard of living amenities.

The general thing to know is that a dollar for dollar comparison of future money vs today money isn’t apt. Spending power in some ways goes down. While yes, it costs more to buy a loaf of bread now than it did 60 years ago, but in other areas, we have driven down production and distribution costs and greatly driven up the quality of the living bought. Adjusting for inflation is a balance of accounting for the increased price of some goods, taking advantage of innovations that reduce the cost of some of your current amenities, and incorporating new amenities created in a healthy economy into your life in a well thought out retirement income plan.

This comment thread was converted into two literature notes:

[[Future goods tend to provide higher standards of living despite inflationary dollar erosion  (L.2102122023)]]

[[Only about 10% of the money in circulation is printed by the Fed; the rest is produced by banks through lending (L.2102122031)]]

The first one was an interesting concept I wasn’t aware of from my prior (limited) economics coursework.

The second one I already knew but it was useful to attach the specific number to the concept.

In this particular case I could have just created / updated existing evergreen notes with the relevant ideas and quoted the 3 paragraphs from the comment and provided a link to it, in each evergreen note. But I didn’t have one, so I made it a source note. The flexibility of the process means it doesn’t really matter which way I choose as there’s no real downside to doing it this way, other than cluttering up my sources folder.

(I also today sorted through it and created actual index notes for my sources folder, so I now have 6 separate index notes with grouped links to the sources that are in various stages of processing, grouping them into broad groups like: business and tech strategy, knowledge digestion & writing, cybersecurity & computer science, systems thinking/reasoning, law/economics/sociology/politics. This way I can decide “I want to process content on topic X” and go to that group and pick the source I want to review / continue to review)

These are not linked from any evergreen notes yet, but are sitting there waiting to be found in a search.

Essentially what I’ve done is extracted the core claims/arguments/conclusions as atomic notes from a single source and made them available as freely-referenceable notes.

Hope that helps clarify. Let me know if you have any other questions.

fcy · March 12, 2021, 9:51pm

Wow this write up is great, thank you for going the extra light year!

I’m bogged down by small details like what to call each folder and note (I need to learn more systems thinking), which was what prompted me to ask the question initially. I think when I saw a sources/fancyAuthor1684 phd book title (exaggerating to make a point) I thought I couldn’t apply it to the Daring Fireball article I just read, but I can just follow the pattern to organize anything.

- Sources
  - gruber2021 Google's Outsized Share of Advertising Money
    - gruber2021 Google's Outsized Share of Advertising Money
    - Privacy-invasive user tracking is to Google and Facebook what carbon emissions are to fossil fuel companies

The IDs you append at the end feel like Zettelkasten, do you use them to link or is it mostly for searching and easily seeing the grouping? Do you have them on your Evergreen notes?

I’m going to keep asking, feel free to take a break from me with no hard feelings.

Edit: I forgot to clap for the Ricky Roll example!

davecan · March 12, 2021, 10:19pm

Glad it helps and ask away!

Yes it can be applied to essentially any type of source. The only question is whether or not the juice is worth the squeeze. Take that DF article for example, it could be summed up in a sentence or two in a single evergreen note (in which case no need to run it through this workflow, just write the idea in the evergreen note and slap the URL in the note as the source citation and move on) or if your interests call for extracting a lot of concepts from it then it can absolutely be run through this process without any significant friction.

The IDs you append at the end feel like Zettelkasten, do you use them to link or is it mostly for searching and easily seeing the grouping? Do you have them on your Evergreen notes?

Yes and yes.

A lot of people who build a zettelkasten or ZK-like system opt to use IDs and timestamp-based IDs are the most natural to implement. But they always put them at the front. That is horrible from a UX standpoint IMO because it interferes with readability in the doc, leading to the need to create aliases constantly which just pollutes the edit view with a lot of unnecessary noise.

The pro of using an ID though is that you virtually eliminate collisions. Since I want to be able to work with Obsidian files from outside Obsidian (e.g. Alfred workflows) I want to be able to do so without collision risk, so I need IDs.

Solution is to move them to the end.

The prefix attached to the IDs denotes the type of note:

- S. source note
- L. lit note
- E. evergreen note

S & L notes are in the sources folder, E notes are in the evergreen notes folder.

With this I now have:

- virtually zero chance of collisions in titles
- links can be placed in without gnarly aliasing
- links are easily readable even in edit mode (no YYMMDDHHMM speed bump when scanning)
- can immediately see at a glance whether the link is to my own thoughts (E.) or someone else’s (L.) or to a source note (S.) which again is basically an index note
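For anyone scripting against this naming scheme, here is a minimal Python sketch of building and reading the suffix IDs. The helper names `add_id` and `note_kind` are hypothetical. The pattern accepts either a 10-digit (YYMMDDHHMM) or 12-digit (YYYYMMDDHHMM) timestamp, since both forms appear in this thread; that tolerance is an assumption, not a stated rule of the system.

```python
import re
from datetime import datetime
from typing import Optional

# Prefix letters as described in the thread.
KINDS = {"S": "source", "L": "literature", "E": "evergreen"}

# Trailing "(X.<timestamp>)" where X is a type letter (assumed 10 or 12 digits).
ID_SUFFIX = re.compile(r"\s*\(([SLE])\.(\d{10}|\d{12})\)$")


def add_id(title: str, kind: str, when: datetime) -> str:
    """Append a suffix ID such as "(L.2103121032)" to a note title."""
    if kind not in KINDS:
        raise ValueError(f"unknown note type: {kind!r}")
    return f"{title.strip()} ({kind}.{when:%y%m%d%H%M})"


def note_kind(title: str) -> Optional[str]:
    """Return 'source', 'literature' or 'evergreen', or None if no ID is found."""
    match = ID_SUFFIX.search(title)
    return KINDS[match.group(1)] if match else None
```

Because the ID sits at the end, the readable part of the title is simply everything before the final parenthesis, which is what keeps links scannable in edit mode.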

Because of the way I title the lit notes (phrase-based, following Andy Matuschak’s guidance, same as I do with evergreen notes) the line between lit note and evergreen note blurs, and some of my topic outlines freely link to both evergreen and lit notes as appropriate. This is where having the prefix attached to the ID (which again, is itself a suffix to the title) shines because now I can see at a glance which are my own thoughts and which are someone else’s. It helps me mentally keep things straight when I have a group of links on a topic because I can see things like “this line of ideas I have in this note is almost exclusively from one author” etc.

I also use a few emoji / unicode characters sparingly, at the front of titles, ✧ topic name for index notes on a tightly grouped set of notes, and ✨ moc name for a broader ranging MOC / index note that typically links to 1…N index notes.

fcy · March 12, 2021, 10:48pm

I already started organizing my vault following your ideas, let’s see where I land. I never liked the IDs when I was reading about ZK notes but your solution makes them great. I’m going to steal the suffix IDs and the prefix letters.

One more question that came up, what do you do if you revisit a source in a later date? Do you create a new source folder with a new ID or append to the existing one?

davecan · March 13, 2021, 1:13am

Hey cool glad it helps you.

I haven’t revisited a source per se yet, but I would probably just update the original.

On the IDs, funny enough while going through and organizing my sources earlier today I found a situation where I had the exact same source twice. This is actually completely tolerable because of the IDs – I create source notes typically using Alfred which I have set to create the folder and the source note within the folder all with the same name, both with the timestamp ID. So if I had captured the source, did some work in the note, then captured it again without realizing I had already done some work in it, and I didn’t have the IDs, then Alfred could have overwritten the original and my work would have been lost. As it stands they are both intact and I can just delete the extraneous unworked one with no issue, or merge them if I for some reason worked in both at different times somehow.
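The capture step can be approximated outside Alfred with a short Python sketch. It assumes a plain-files vault, an `.md` extension, and a hypothetical function name `capture_source`; none of these come from the actual Alfred workflow. It also adds exclusive file creation as an extra safeguard on top of the timestamp ID, so an existing note is never overwritten.

```python
from datetime import datetime
from pathlib import Path


def capture_source(sources_dir: Path, title: str, when: datetime) -> Path:
    """Create <title (S.ID)>/<title (S.ID)>.md and return the note path.

    Hypothetical helper; raises FileExistsError rather than overwrite.
    """
    name = f"{title} (S.{when:%y%m%d%H%M})"
    folder = sources_dir / name
    # exist_ok=False: an identical folder name is treated as a collision.
    folder.mkdir(parents=True, exist_ok=False)
    note = folder / f"{name}.md"
    # Mode "x" is exclusive creation: it fails if the file already exists.
    with note.open("x", encoding="utf-8") as fh:
        fh.write(f"Captured: {when:%Y-%m-%d %H:%M}\n")
    return note
```

Capturing the same source twice at different times yields two separate folders, which matches the duplicate-source situation described above: both copies survive and the unworked one can be deleted.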

Alfred and Keyboard Maestro are both stupidly useful with Obsidian. I have Keyboard Maestro shortcuts to create the IDs for me easily: Ctrl-E automatically moves the cursor to the end of the title, adds a space, and adds (E. + timestamp + ) then back into the note. Ctrl-L for lit notes. So my workflow with making lit notes is usually (1) highlight chunk in source note (2) run note refactor plugin (3) press Ctrl-L. Boom now I have a lit note, in the same folder as the source, with (L.YYMMDDHHMM) at the end of the title, with a footer link back to the source note, and linked from the source note which I start cleaning up into an outline over time. It’s so simple.

Note that the downside of this is increased operational complexity, in that the number of notes generated goes up significantly (which is not necessarily a downside!) but each note becomes more of a standalone atomic unit that can be shuffled in different ways into other notes more easily. So it increases complexity (more notes in searches, for example) but also increases flexibility, as long as you accept the overhead of potentially having some duplication in your notes e.g. when multiple authors say much the same thing. (in which case I put the author’s name at the end of the title as well, e.g. def. Common Term Abc (author) (L.YYMMDDHHMM) so it is clearly disambiguated.)

And sometimes a lit note is so good I just “elevate” it to an evergreen note by changing the L. to an E. and moving it into my evergreen notes folder. Usually this is because I foresee appending more to it later anyway.
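As a rough illustration of that "elevate" step done from a script, the Python sketch below swaps the L. prefix for E. and moves the file. The function name `elevate` and the `.md` extension are assumptions. One caveat: a rename done outside Obsidian would likely leave existing links pointing at the old title, so doing this inside the app is probably the safer route.

```python
import re
from pathlib import Path

# Matches the "(L.<digits>)" ID directly before the ".md" extension.
LIT_ID = re.compile(r"\(L\.(\d+)\)(?=\.md$)")


def elevate(lit_note: Path, evergreen_dir: Path) -> Path:
    """Rename "... (L.ID).md" to "... (E.ID).md" and move it (hypothetical helper)."""
    new_name, count = LIT_ID.subn(r"(E.\1)", lit_note.name)
    if count != 1:
        raise ValueError(f"not a literature note: {lit_note.name}")
    evergreen_dir.mkdir(parents=True, exist_ok=True)
    target = evergreen_dir / new_name
    if target.exists():
        # Refuse to overwrite an existing evergreen note.
        raise FileExistsError(str(target))
    lit_note.rename(target)
    return target
```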

The key to me is that this approach provides a lot of flexibility while still being a strong scaffolding framework in which to operate. So I can break the rules per se without breaking the system, it just accommodates it because the workflow handles 80% or more of my use cases almost out of the box and the above tweaks here and there don’t really break anything at all.

fcy · March 13, 2021, 1:35am

I think updating the original makes sense, and with this system new literature notes can be added when revisiting and the index updated. I guess my question came from thinking about the ID as a date, but the timestamp is only used as a simple UUID, not really meaningful as a date. Thinking of it just as an identifier (like an auto-increment number) without meaning lends itself to updating the original source.

I’m just at the beginning but I already created a Keyboard Maestro macro to give me the YYYYMMDDHHMM IDs when I type zzkdid. I like your automation, it is impressive.

I think the standard way of treating the notes makes up for the extra operational complexity, at least for me. I often find myself stuck trying to decide the best place to put something. I was just reading this post today about the much talked about book How To Take Smart Notes and principle 5 is exactly about this:

Principle #5: Standardization enables creativity

davecan · March 13, 2021, 4:32pm

Cool glad it is working out for you!

re: standardization, I saw a good quote yesterday (no known source, randomly found in a comment online) that resonates deeply with me and matches exactly what you are seeing:

Complex systems breed simple behavior; simple systems breed complex behavior.

I spent a LOT (seriously, a LOT) of time grappling with systems and workflows, enough perhaps to practically write a book on this stuff lol (for example, my single outline note on note taking principles is 6 printed pages containing almost 70 links to evergreen and lit notes on various principles I’ve extracted, and I have a similar separate note comparing various note taking systems, and several deep diving into Luhmann’s process… I went nuts lol) and eventually iterated to this solution which (so far) works well, providing enough structure to add value and meaning without being so complex that it unnecessarily constricts my actions. It rewards tweaks and adaptations and “rule breaking” rather than forcing me to fight against myself.

Yes the IDs really are there just to ensure no collisions when I create a file from somewhere outside Obsidian. Obsidian would detect a collision if I created a duplicate inside Obsidian, but when creating from outside e.g. Alfred it is a risk, so the IDs remove that risk. I don’t really attach any significance to the date, but it does have a secondary benefit of providing a search vector if needed.

For example, if I’m looking for a note on the deep-rooted philosophical meanings of the complex lyrics of Silento’s classical masterpiece, and I know I wrote the note in the year 2082, I could simply use fuzzy search for something like philnae(.82 and it would find it fast. But that’s not a primary goal at all.
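The same "search vector" idea can be sketched in Python for use outside Obsidian's fuzzy search. This is only an illustration: the function name `notes_from_year` is hypothetical, and the pattern assumes the 10-digit YYMMDDHHMM form of the ID.

```python
import re
from typing import Iterable, List


def notes_from_year(titles: Iterable[str], two_digit_year: str) -> List[str]:
    """Return titles whose suffix ID starts with the given two-digit year."""
    # Year is followed by MMDDHHMM (8 digits) in the assumed ID format.
    pattern = re.compile(
        rf"\([SLE]\.{re.escape(two_digit_year)}\d{{8}}\)$"
    )
    return [title for title in titles if pattern.search(title)]
```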

For reference, here’s my Keyboard Maestro macro for adding the lit note ID:

fcy · March 13, 2021, 8:22pm

I am into that quote! I’m currently thinking about how to organize all my stuff (both in Obsidian like we’ve been talking about) and other family documents – currently looking into Johnny Decimal for these, but I didn’t even get to categories I like yet.

I often get overwhelmed because I have this tendency of wanting to solve the problem right away. Thinking about the future outcome of creating the system helps me calm down, take a break, and continue some other time.

escher · March 13, 2021, 9:23pm

Exactly this. So I keep one store of information that I found interesting but don’t have the time/motivation to merge into obsidian.

It’s a nice place to browse and rediscover interesting items that I found surfing the internet, but when it comes down to it I’m compressing the information into my own words in obsidian. Highlights and all.

davecan · March 14, 2021, 5:17pm

It’s good to have a general idea of the future direction but not to try to plan too concretely for a future you when you don’t definitively know the needs and forces that will apply to future you.

Agile development has some informative guidance. Essentially you want to focus on the near-term need and establish workflows and apply patterns that enable you to pivot as needed. Instead of over-engineering up front you accept that you can’t plan for everything, you accept a bit of chaos and messiness, and you plan for the ability to adapt rather than planning for specific adaptations.

This concept occurs in many contexts:

- agile product development roadmapping
- Don Knuth’s famous adage that premature optimization is the root of all evil
- military fog of war is accepted as something that must be managed in a conflict
- Lean Startup methodology advocates tiny experiments with validated learning
- etc.

This heavily informed my approach, which led me to the result described in this thread. This is why I started with a single flat structure with no folders and evolved to the workflow and approach I have now.

I also struggled mightily with the desire to apply structure from the start. And in fact I did experiment with this, using the pre-faceted MOCs of Nick’s LYT framework as well as experimenting with things like Johnny Decimal. Every time I got an ick feeling pretty quickly that led me back to the organic approach. There’s nothing particularly wrong with those approaches if they work for someone, though I do think the organic bottom-up approach is better. I had to force myself each time to stop going down that road and return to the simple and trust that the power would emerge over time.

I have an evergreen note titled Complex systems emerge from simpler ones that is completely accurate in describing my system. If I were to hand someone my system they would be overwhelmed, but it grew up organically one step at a time in response to my actual needs, with a constant eye towards ensuring I didn’t box myself in to any particular approach as I went.

The result is I have multiple levels of “MOCs” and index / outline notes for “categories” and themes that apply to my interests and grew organically out of my corpus, rather than establishing a top-down structure that forces me to (subconsciously) try to apply every note into the top-down hierarchy chosen by someone else.

We should develop our own taxonomies, not follow the taxonomies of others. The point of the ZK approach is to build up our own mental models of the world and the things that interest us, and enable us to think more deeply and better express our own unique voices. Forcing ourselves to follow someone else’s taxonomy and schema suppresses our creativity.

Just my opinions anyway.

davecan · March 15, 2021, 12:04am

As another example of the flexibility of this process, I’m currently processing a set of some 15 or so articles from a website as a single source, since they all revolve around teaching a particular set of closely-related techniques.

It didn’t start out this way. I started by processing a single article but quickly realized that I wanted to process many of the articles. I initially planned to create a source per article but the more I read (and flipped through articles on the site to find out what they were like) I realized creating 15+ “sources” and trying to interlink the various literature notes (across folders) would be cumbersome, so I just kept taking notes in the original source note as I clicked through to each article. To keep things straight I’m attaching links to the specific articles in each applicable section of the notes.

Once I’m done reviewing all the articles and producing the outline in this single note I’ll begin chunking them together in meaningful themes and groups that work for me in my contexts and then convert those into phrase-based literature notes capturing the core principles and processes.

Then I’ll rename the source note to something more meaningful and write a statement at the top explaining that it is syntopically derived from multiple source articles.

This works because the fundamental workflow is simple enough to flex to accommodate it, while still providing a strong framework in which to perform the work.

KrisSinclair · March 19, 2021, 7:00pm

What is the video series you are watching? I’m interested in improving my Paperless workflow as well.

fcy · March 19, 2021, 8:10pm

@KrisSinclair I’m watching MacSparky’s Paperless Field Guide. I enjoy his field guides, there is always something I learn from them.

It doesn’t require any prior knowledge so it goes into a lot of details that often seem trivial to me, I’m not complaining about the style just setting the expectation.