Reading and referencing in Emacs Org-Mode.

TODO: I have an interesting thought, and I need to write it down before I forget it. On a computer you may have “bookmarks” and “recentf”, and the ratio between them is the ratio between exploitation and exploration. There should be a healthy ratio; I am not sure which one, but perhaps about 2 (0.5). If recentf is too big, you are exploring too much and not developing. If recentf is too small, you are not exploring enough.

This document starts as Yet Another HOWTO on keeping your references in Emacs and Org-Mode, but I have a feeling that it might grow into something bigger.

Referencing is a big pain for a scientist. It is painful for two reasons.

Firstly, it is a complex task by itself: when preparing an article, a scientist needs not just to consume a lot of relevant material, but also to filter through a lot of material that is less relevant to the current work but might turn out to be useful later.

Secondly, people who want to profit from scientists’ work while contributing very little to the ecosystem are trying to use various political, economic, and informational coercive measures to keep scientists restricted in their access to knowledge.

What is “knowledge”? I initially wanted to ask this question as “what is research?” or even “what is a research article?”, but those three seemingly different questions turned out to have the same answer.

To imagine “knowledge”, consider such a popular thing as a “neuron”. Actually, not the real neuron, but the neuron as it is presented on “Machine Learning” courses. It is a “node” with one output and many inputs. If you think about it, it looks very much like a scientific statement, which is a sub-statement of a larger “thought”, and is partitioned into many sub-statements. “Nodes” also have so-called “weak links”, that is, references to other nodes which cannot be described using a “part of” relationship, but are rather “associated”.

1. TODO Body

1.1. Reading list [52/58]

  7. (
  16. org-ref
  17. org-bib-mode
  53. obsidian
  54. nano publication
  55. open citations
  58. otter-rss

1.1.1. Fireforg

This seems to be the earliest reference I have found for bibliography in org-mode. Its most prominent feature seems to be importing bibliographic data from webpages via org-protocol capture.

It generates org-headings in a prescribed file, with bibtex-code entry pasted into the heading body, and the same metadata being saved as org properties of the same headline.

It uses Zotero as a link between Firefox and Bibtex, since, apparently, at that time Firefox did not support xdg-style links “org-protocol://”. Or, maybe, the big idea is that Zotero has some code for automatically extracting some metadata from webpages via its database of shims with different paper databases.

Kind of interesting, if the “ground truth” of your reading list is an org file.

Not developed since 2009, so now is probably only of historic interest.

1.1.2. Emacs Conference talk of 2022

The workflow there is surprisingly similar to Fireforg. Zotero is still used as the tool to manage books and articles, which are then exported through org-roam-bibtex to an org-roam node with bibtex properties encoded as org properties.

I guess they are then “transcluded” using org-transclude to the final document?

In any case, I consider org-roam’s approach of using a separate SQLite database and mandatory IDs to be fatally flawed. (But I should check, maybe something has happened since my last time looking at org-roam.)

Also, some plugins are mentioned (added to Literature Review), with which I had terrible experience.

1.1.3. My own old approach, used for ICFP 2020

  1. Report file

    I had a bib file in the directory with the org file for the Report. That file would use an org command #+bibliography, which does I have no idea what. I had a manually typed-in command #+latex_header: \addbibresource{bibliography-bib.bib}, which, for some obscure reason, was exported to TeX as \addbibresource{/home/lockywolf/full-path-to/bibliography-bib.bib}.

    I also had two lines at the end of the file:

    #+bibliography: bibliography-bib plain limit:t
    #+latex: \printbibliography

    I have no clue why they were not exported automatically, but it is nice that it is so easy to bodge in org.

    I also had to write the following code at the end of the file to get the table of contents:

    #+TOC: headlines 3
    #+latex: \tableofcontents

    The .bib file, at least, in the directory with the report, was a normal “biblatex” file, with which, I remember, I had a lot of trouble understanding when I have to write braces “{}”, and when parentheses “”.

  2. Global configuration

    The problem with org and latex is that they are intertwined a lot with the daily activities. In principle, building both PDFs and HTML pages should be done in a dedicated Emacs, with no environment effects. However, I don’t believe anybody is actually doing that.

    So, in my case, there were three important pieces of setup: org.el, tex.el, and bibtex.el.

    1. tex.el

      For fast entry, I used cdlatex, which I still use, both as plain cdlatex-mode and as org-cdlatex-mode. That part by itself does not seem controversial.

      How did TeX settings influence org settings?

      Here is my reftex setup:

      (use-package reftex
        :demand t
        :ensure t
        ;; Enable reftex and bib-cite in every LaTeX buffer.
        :hook ((LaTeX-mode . turn-on-reftex)
               (LaTeX-mode . turn-on-bib-cite))
        :config
        (require 'bib-cite)
        (setq reftex-plug-into-AUCTeX t)
        (setq reftex-auto-recenter-toc t)
        (setq reftex-revisit-to-follow t)
        (setq reftex-revisit-to-echo t)
        (setq bib-cite-use-reftex-view-crossref t)
        ;; The default database that reftex queries for citation keys.
        (setq reftex-default-bibliography (list lockywolf/bibtex-bibliography-bib-file))
        (setf reftex-ref-style-default-list '("Default" "Cleveref")))

      So, reftex is Emacs’s built-in feature for cross-referencing, which is so amazing that I don’t understand how it works. I did successfully use it from time to time, but forgot immediately after leaving the document, so high hopes should be suppressed. However, I think that it will still be used in the new setup, since, you know, it is still in Emacs.

      That bib-cite thing is a NIH-style tool which AUCTeX authors created for working with references, but I still do not know whether it is good.

      The only thing I remember is that I used reftex-citation (C-c [), and it helped me insert “some” references. There is also reftex-reference, which should work for references the same way reftex-citation works for citations. How did I format them for org-mode?

      Repeated note to myself: reftex is not part of AUCTeX.

    2. bibtex.el

      In this file I tried to make some sense of Emacs’ bibtex and biblatex support.

      So, this is my configuration for bibtex:

      (use-package bibtex
        :demand t
        :ensure t
        :config
        (setq bibtex-dialect 'biblatex)
        (setq bibtex-autokey-year-length 4)
        (setq bibtex-autokey-name-year-separator ":")
        (setq bibtex-autokey-year-title-separator ":")
        (setq bibtex-autokey-titleword-length 20)
        (setq bibtex-maintain-sorted-entries t)
        (setq bibtex-biblatex-entry-alist (seq-concatenate 'list
                                                            '("ArtifactSoftware" "Software Entity"
                                                              (("author") ("title")
                                                               ("year" nil nil 0) ("date" nil nil 0))
                                                              ( ("version") ("note")
                                                                ("url") ("urldate")

      Emacs’ bibtex mode “just works”, except for some reason I needed to add entries for software into the list of entry types. There is also “bibtex-utils” package, which I never got to learn.

      I also tried to use Ebib as a bibliography manager, and I have learnt a bit from its ideology. So, the important thing is that Ebib is a display for an aggregate of biblatex-formatted .bib files, showing and sorting according to authors or years of publication.

      What is more interesting is that it supports additional fields for notes and PDFs. So it is not just a reading list, it actually understands the need for annotating.

      It supports two modes of adding notes: a file and a directory. Keeping all notes in the same file seemingly only makes sense if your notes are a few sentences long, otherwise the file would grow insanely big.

      But keeping all the notes for all PDFs in a single directory also sounds strange. We have directories and symlinks for managing sets in computing. Why would I need to keep all notes in the same directory?

      On the other hand, I do keep all TODOs in the same file? But even that is not true. I have at least four TODO files: laptop, laptop-autogenerated, mobile, mobile-autogenerated.


      Anyway, this has potential for being fun, if “notes” files are actually org-noter files.

    3. org.el


      (use-package org
        :bind (("C-c l" . org-store-link)
               ("C-c L" . org-insert-link-global)
               ("C-c o" . org-open-at-point-global))
        :hook (org-mode . (lambda ()
                            ;; ?o inserts an org cite: link, ?h a raw LaTeX \cite.
                            (setf reftex-cite-format
                                  '((?o . "cite:%l")
                                    (?h . "\\cite{%l}")))))
        :config
        (require 'ox-bibtex)
        (require 'org-bibtex-extras)
        (require 'org-ebib) ; Allows opening ebib links in C-c C-o
        (org-link-set-parameters "cite" :follow 'org-ebib-open))

      So, ox-bibtex is in org-contrib, and I suppose, is obsolete now? It implements that #+BIBLIOGRAPHY: /home/user/Literature/foo.bib plain option:-d code, which only includes the:


      and converts all cite:foo to \cite{foo}.

      That reftex customisation is quite important actually. It is used to query a bib file for keys to be used in citations. In this setup reftex is not used for references or index, only for citations. (Right?)
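The two Ebib note-storage modes discussed under bibtex.el above (one file versus a directory) can be selected roughly like this. This is only a sketch: it assumes a reasonably recent Ebib in which the options are called ebib-notes-storage, ebib-notes-directory and ebib-notes-default-file, and the paths are made up.

```elisp
;; Sketch only: option names as I remember them from the Ebib manual,
;; paths are hypothetical.
(with-eval-after-load 'ebib
  ;; One note file per entry, all kept in a single directory...
  (setq ebib-notes-storage 'one-file-per-note
        ebib-notes-directory "~/Literature/notes/")
  ;; ...or, alternatively, every note as a heading in one big org file:
  ;; (setq ebib-notes-storage 'multiple-notes-per-file
  ;;       ebib-notes-default-file "~/Literature/notes.org")
  )
```

Neither mode removes the "everything in one place" restriction that I complain about above; they only move it from a file to a directory.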

1.1.4. Other thoughts.

The important thing here is the difference between ox-bibtex and ol-bibtex. They are not the same thing. ox-bibtex defines a cite:foo link, for using citations in org documents.

ol-bibtex does import-export to and from actual bibtex text files. I am not sure how ol-bibtex links are exported when exporting to html, I need to check how it works when reviewing packages independently. But the main use for this package is, seemingly, to keep a list of papers in org, and export into bib files when assembling a paper.

I suspect that one would make an org-file with a “reading list” for a project, populate it with books as they appear on the horizon. Those books might be pointing toward, say org-noter files with review?


The idea here is, seemingly, to have “ground truth” in org files, not in bibtex files. Each heading is a book, which has bibliographic data recorded as org properties. Importing and exporting is done semi-manually, in the sense that you can export the required headings into a bib file, which, presumably, you would only do when compiling a paper, but it does not seem to be possible to set an org file as a bibliographic database with exports organised completely automatically.

Importing stuff into org is also done semi-automatically. The code will help you to yank a piece of biblatex as an org-heading, but not much more.
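The semi-manual round trip described above can be sketched as follows. The helper name my/reading-list-to-bib and both file arguments are hypothetical; org-bibtex and org-bibtex-yank are the ol-bibtex commands that, as far as I can tell, do the actual work.

```elisp
(require 'ol-bibtex)

;; `my/reading-list-to-bib' is a hypothetical helper, not part of ol-bibtex.
(defun my/reading-list-to-bib (org-file bib-file)
  "Export the bibliographic headings of ORG-FILE into BIB-FILE."
  (with-current-buffer (find-file-noselect org-file)
    ;; `org-bibtex' walks the headings of the current buffer and writes
    ;; those that carry bibliographic properties as BibTeX entries.
    (org-bibtex bib-file)))

;; Importing goes the other way: copy a BibTeX entry to the kill ring,
;; then call M-x org-bibtex-yank in the org buffer to get a heading.
```

Calling the helper from a Makefile or an export hook would be one way to make the export less manual, but that is my own speculation, not something ol-bibtex provides.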

Citation exports seem to work using ox-bibtex, that is, using the cite: format. The format using “ordinary links” is not yet mentioned.


This is a howto by Arne Babenhauserheide. He is using a specific LaTeX style, which he is adding to org-latex-classes, as well as a few custom packages.

He is explicitly loading reftex-cite-only, not full reftex, but it is clear that he is going to use reftex for citation inclusion.

He is using org-mode-reftex-search, which is an old function found on the org-mode mailing list. This function, if I am not mistaken, should jump to the notes for the originally cited document. The code is missing; I guess Arne copied it from the mailing list into his init.el.

It is interesting that there we are encountering the concept of “notes file” once again.

Also, it is interesting that he is using minted for org code listings, which, in turn, uses Pygments. I never bothered to make them work, and nowadays there is a whole new machinery for org, called engraved, which should make it possible to colourise both latex, and html, using Emacs means only.

Another useful thing that this HOWTO has is #+BIND: variable value syntax. It lets one override some variables for exports, which is especially useful when there is no #+KEYWORD: syntax for this variable, and when using file-local variables is imperfect, such as when you need different values for editing and for exporting the file. (You also need to set org-export-allow-bind-keywords to t.)
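A minimal sketch of the #+BIND mechanism follows. The choice of org-latex-prefer-user-labels as the overridden variable is mine, purely for illustration; any variable consulted by the exporter should work the same way.

```elisp
;; In the init file: allow #+BIND keywords to take effect during export.
(setq org-export-allow-bind-keywords t)

;; In the org file itself (org syntax, shown here as a comment):
;;
;;   #+BIND: org-latex-prefer-user-labels t
;;
;; The variable is then bound to t while that file is being exported,
;; and keeps its usual value while the file is merely being edited.
```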


In section, there is an interesting trick on how to make references (not citations) work:

(setf org-export-latex-hyperref-format "\\ref{%s}") will make intra-document references work correctly.

Otherwise, they define a strategy that is basically like ox-bibtex, defining custom links for each citation type.


What I have learnt from his article.

  1. He also uses Zotero for article management. I guess I need to study it once more. He uses the same BetterBibTex that is recommending.
  2. He faithfully uses org-ref (makes me doubtful)
  3. He has a single file with Bibtex metadata in PROPERTIES, and with annotations in bodies. I guess, things are imported there using ol-bibtex. How exactly that page and the bib file are kept in sync I do not know.
  4. He has three blogs on his site (blog, journal, log) and a “wiki”.

    1. Blog is for “longreads”
    2. Journal is for shitpost
    3. Log is for habit tracking
    4. and there is also what I call “howtos”, that he puts in a “wiki”.

    This seems more complicated than my setup of just two categories, “notes” and “howtos”. I have tried to switch from paper into keeping my records in a journal for a very long time, but always failed.


This is Dennis Ogbe’s setup. He is using ebib+helm-bibtex+org-ref.

The interesting bits in his setup are the following:

  1. He uses multiple .bib files merged into a single database.
(defun do.refs/update-db-file-list ()
  "Update the list of bib files."
  (let ((db-list (do.refs/get-db-file-list)))
    (setq reftex-default-bibliography db-list)
    (setq bibtex-completion-bibliography db-list)
    (setq ebib-preload-bib-files db-list)))

So, his “ground truth” is still biblatex, but he has a way to group “Points of Knowledge” into categories by different bib files.

  1. He has separate dirs for PDFs, and notes.
(defvar do.refs/db-dirs nil "A list of paths to directories containing all my bibtex databases")
(defvar do.refs/pdf-dir nil  "The file for the entry with key <key> is stored as <key>.pdf")
(defvar do.refs/notes-dir nil "The note for the item with key <key> is stored as <key>.org")
(defvar do.refs/pdf-download-dir nil "The path to directory we download PDF files.")

Some things I immediately dislike here. Firstly, rigid notation for naming PDF files. I like calling my files with full names. In general, keeping as much info as possible in file names is good. For example, I have PDF files on my drive called like 2023-09-01_Various-Authors_GNU-Maxima-manual-for-version-5.47.11_2023.pdf, and I like it this way. Moreover, I kind of like it where it is, I do not want to specifically put it somewhere.

  1. He uses autokeying.

And I do not like the default authoryear autokeying of bibtex.el, because it is too easy to forget what jackson2001 means. I want my keys to be full names with dates: various2023maximaManualForVersion5.47.11. Since I am going to use some automated machinery to cite those papers, long keys should not matter.

As a side-note, there should be a way to make autokeying better:

(setq bibtex-autokey-year-length 4)
(setq bibtex-autokey-titleword-separator "-")
(setq bibtex-autokey-name-year-separator "-")
(setq bibtex-autokey-year-title-separator "-")
(setq bibtex-autokey-titleword-length 16)
(setq bibtex-autokey-titlewords 8)
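To see what key such settings actually produce, one can run the generator on a throwaway entry. This is a sketch under assumptions: the entry is made up, my/try-autokey is a hypothetical name, and I call bibtex-set-dialect by hand because a buffer that visits no file may not get its dialect initialised otherwise.

```elisp
(require 'bibtex)

;; `my/try-autokey' is a hypothetical helper for experimenting.
(defun my/try-autokey ()
  "Return the autokey generated for a made-up entry."
  (let ((bibtex-autokey-year-length 4)
        (bibtex-autokey-titleword-separator "-")
        (bibtex-autokey-name-year-separator "-")
        (bibtex-autokey-year-title-separator "-")
        (bibtex-autokey-titleword-length 16)
        (bibtex-autokey-titlewords 8))
    (with-temp-buffer
      (bibtex-mode)
      (bibtex-set-dialect 'biblatex t)
      ;; A made-up entry with an empty key.
      (insert "@book{,\n"
              "  author = {Various Authors},\n"
              "  title = {GNU Maxima Manual},\n"
              "  year = {2023}\n"
              "}\n")
      (goto-char (point-min)) ; point must be at the start of the entry
      (bibtex-generate-autokey))))

(message "Generated key: %s" (my/try-autokey))
```

Evaluating this after every tweak is quicker than repeatedly cleaning real entries with C-c C-c.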
  1. But I like that I am, once again, seeing a repeated pattern: PDFs, notes, bib-database.

There is one more interesting bit there:

(defun do.refs/ebib-add-annotated (arg)
    "Advice for `ebib-import-file' that automatically creates a
  copy of the imported file that will be used for annotation."
    (interactive "P")
    (let ((filename (ebib-get-field-value "file"
                                          ebib--cur-db 'noerror 'unbraced)))
      (when filename
        (let* ((pdf-path (file-name-as-directory (car ebib-file-search-dirs)))
               (orig-path (concat pdf-path filename))
               (annot-path (concat pdf-path
                                   (file-name-sans-extension filename)
                                   "-annotated" ; assumed suffix; without one the copy would target the original file
                                   (file-name-extension filename t))))
          (unless (file-writable-p annot-path)
            (error "[Ebib] [Dennis] Cannot write file %s" annot-path))
          (copy-file orig-path annot-path)))))

  ;; add the above after the original call is done.
  (unless (and (boundp 'do.refs/add-annotated) (not do.refs/add-annotated))
    (advice-add #'ebib-import-file :after #'do.refs/ebib-add-annotated)))

See! He also has a file which will be used for annotations only.

  1. The rest of his ebib configuration is fairly straightforward. Note that he is using the “note” field, but not the “annotation” field. (But maybe that is an old version of ebib?)
  2. He uses bibtex-completion. While bibtex-completion deserves its own chapter, I need to instantly write down something here:
(setq bibtex-completion-find-additional-pdfs t)

This snippet means that apart from key.pdf, bibtex-completion will also consider PDF files named key-*.pdf for completion.
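For context, here is a hedged sketch of how the recurring triple (bib database, PDFs, notes) maps onto bibtex-completion options. The variable names are real bibtex-completion options as far as I know; all paths are hypothetical.

```elisp
;; Hypothetical paths; adjust to taste.
(setq bibtex-completion-bibliography '("~/Literature/references.bib")
      ;; PDFs are looked up here as <key>.pdf ...
      bibtex-completion-library-path '("~/Literature/pdfs/")
      ;; ... and, with this flag, also as <key>-*.pdf.
      bibtex-completion-find-additional-pdfs t
      ;; A directory here means one notes file per entry.
      bibtex-completion-notes-path "~/Literature/notes/")
```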

  1. He also uses org-ref, and from his code I did not understand how!

And, in fact, he does not even explain much about his org-ref use-cases at all. The one thing that is worth noting, however, is that he hooks org-ref to use ivy-bibtex to insert citations.

  1. He uses a self-written Emacs function to extract citations from a document and generate a one-shot bib file for publication.

This is where reftex comes into play. He really only uses reftex to extract all citations from a latex project, sort-uniq them, and generate a bib file.
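As far as I can tell, RefTeX already ships a command for this one-shot extraction, reftex-create-bibtex-file, so a self-written function might be replaceable by a thin wrapper. The wrapper name below is hypothetical, and this is a sketch of the idea rather than his code.

```elisp
(require 'reftex)

;; `my/one-shot-bib' is a hypothetical wrapper name.
(defun my/one-shot-bib (bibfile)
  "Write all entries cited in the current LaTeX document to BIBFILE."
  (interactive "FNew BibTeX file: ")
  ;; RefTeX scans the document, collects the cited keys, and copies
  ;; the matching entries from the document's bibliography databases.
  (reftex-create-bibtex-file bibfile))
```

It has to be called from a buffer belonging to the LaTeX document, with reftex-mode able to find the document's bibliography.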

  1. Thoughts

    I am generally finding his setup fairly consistent. Maybe we can say that, hey, reftex is not good at doing auxiliary operations with papers (such as opening a PDF), so bibtex-completion is better.


This document has not been written linearly. In particular, this blog post is praising org-ref, which, at the time of writing this sentence, I have already reviewed. Nevertheless, as a Russian proverb says, “repetition is the mother of study”.

Let’s start:

(org-babel-load-file "")

Wow, okay, he is using org-based loading. I remember @wasamasa converting his dotEmacs to org, but I always doubted this approach. But okay.

There are a few more things mentioned in this article:

  1. Citation styles. What he is calling a “citation style”, is what I call “citation formatting” in the text of the article. That is, is the item cited by name, or by author, and is year included, or not. So, basically, irrelevant stuff, which, I guess, must have given him a lot of pain when submitting research articles.
  2. List of tables.
  3. List of figures.

What is not mentioned, although I expected it to be, is formatting of the bibliography. That is, how exactly items are presented at the end of the paper.


An article, which exhibits technologically nothing new, same old bib:key, and note:key org-links, which open either a bib file, or a notes file. But what is valuable in this file, is his description of his workflow; it is written with exceptional clarity, and describes the knowledge acquisition process in great detail. Let me try to repeat it here by myself:

  1. He has a .bib file for each paper, downloaded from a bibliography database, or a journal website. (These files are concatenated together to make one large database file.)
  2. He has a “” file, a single file, where each heading is either a category, or a paper. Bodies of paper nodes contain, well, notes about those papers.
  3. He creates headings in that org file with reftex, using a clever citing format.
  4. Entries in the org file are tagged meaningfully.
  5. His files have the name of key.pdf, and he also changes keys in .bib files to key.
  6. He also heavily relies on reftex.


The question I see here is the following: How do you structure your “projects” when doing science? Papers “database” can consist of objects of different granularity, and “projects” also can be of varying granularity.

If a paper is read and understood as a part of a project, shall time invested be contributed to that project or to reading in general? Copying logbook entries in org-mode is annoying, even if possible.


The interesting property of this setup is that rather than keeping paper metadata in org properties, it is kept in a babel block.

Which makes it easy to tangle the bibliography into a single file, which can be included into latex.
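The tangling step could look roughly like this. It is a sketch under assumptions: the file names and the helper name are hypothetical, and depending on the Org version the bibtex blocks may still need a :tangle yes header argument to be picked up.

```elisp
(require 'ob-tangle)

;; `my/tangle-bibliography' is a hypothetical helper name.
(defun my/tangle-bibliography (notes-file bib-file)
  "Collect the bibtex source blocks of NOTES-FILE into BIB-FILE."
  ;; The third argument restricts tangling to blocks whose language
  ;; matches it, so other source blocks in the notes are left alone.
  (org-babel-tangle-file notes-file bib-file "bibtex"))

;; Example call with made-up file names:
;; (my/tangle-bibliography "papers.org" "references.bib")
```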


In general, his setup is quite similar to the setup from Emacs Conference talk of 2022.

  1. He does use Zotero as a reading-list management tool. First red flag :D, but anyway.
  2. He also used org-roam for a “personal wiki”.
  3. He seems to export the Zotero database into a single .bib file with “Better Bibtex”.
  4. His “innovation” seems to be using “deft” for searching notes in a directory.
  5. He is using org-noter, which, again, I am starting to like less and less.

Okay, his setup is consistent, but I keep getting annoyed with a few things, that, in my opinion, mar all of those setups.

  1. Using an sqlite database for backlinks is insane.
  2. Using a single directory for notes is insane.

At least he manages to join notes and references, so in some sense his setup is almost the most consistent among those I have seen.


This is a bibtex-completion (helm-bibtex / ivy-bibtex) based setup. Its prominent feature is the use of citar as a citation-management tool.

Otherwise it is not too remarkable.


This blog has another interpretation of the “rtcite:” link.


One more suggestion of using reftex, with not much detail. The interesting bit is using latexmk for building the pdf, which is now a must-have for me too, but I already have it.


The guy has discovered org-ref, and is frustrated by it. How familiar.

What he does right, however, is mentioning that org-ref is for formatting citations. Of course, he only cares about LaTeX, so the HTML part is missing.


He has an excellent suggestion of recording all readings in an org file, so that each entry would be a book, and the bodies would be his comments.

The problem is that I cannot process audiobooks too much, they are too quick for me. And for text books I usually need a much, much more detailed notes file.


Okay, this is interesting and has some “meaty” stuff.

The first meaty piece is this script:

It lets you extract annotations from a PDF file into org-mode. This script is, seemingly, not round-trip, but even one way is useful as a source of inspiration.

More and more I am thinking that PDF metadata should be stored where it cannot be lost, ideally in the PDF file itself.


This sketch is very short, but it is nice to see that people are considering more or less the same options for citing and referencing that I do.


A fairly standard setup of org-ref and bibtex-completion, with an attraction point of making a dedicated list of books to read, along with page count in org columns.

Fun? Maybe, but not for me, I guess.

After all, I like structuring my life according to projects/tasks, of which books would be parts. Having a dedicated reading list sounds contrived. Also, what about web pages there?


This is a good intro to oc.el from a developer point of view.

I will definitely need to re-visit it when implementing my own bibliography system.


A classical link that is the father of this document.

This is the way I have been using citations in org for a long time, and, I think, this is what is implemented in ol-bibtex.

“bib” links are exported as proper citations, and the bibtex file needs to be carried around manually.

This is how my setup used to work before considering this review.


This StackOverflow question has a nice MWE for the new org citation machinery.
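For reference, a minimal configuration of the new citation machinery (oc.el) might look like the sketch below. The bibliography path is hypothetical, and the choice of the biblatex processor for LaTeX and the basic one for everything else is mine, not taken from the question.

```elisp
(require 'oc)
(require 'oc-basic)
(require 'oc-biblatex)

;; Hypothetical path to the global database.
(setq org-cite-global-bibliography '("~/Literature/references.bib")
      ;; Use biblatex when exporting to LaTeX, the built-in basic
      ;; processor for every other backend.
      org-cite-export-processors '((latex biblatex)
                                   (t basic)))

;; In the org file, citations then look like [cite:@somekey] and are
;; inserted with M-x org-cite-insert; a "#+print_bibliography:" line
;; marks where the bibliography should appear.
```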


The most important quote from there: “It took me several hours to get the system working”.



The “source of truth” here is, which is exported to bibliography.bib using ox-bibtex.

org-bibtex-extras lets one annotate the reference.

Citations are inserted with reftex, via org-reftex-citation, which even has some sort of “intelligent completion”.

Links should be done, presumably, with ol-bibtex.

I guess, this setup has a lot to learn from. In particular, it would be, maybe, nice to tweak it slightly in the following way:

Make a “” file for each review, done with “org-noter”. The top heading would be compliant with ox-bibtex, and would produce a tiny bibtex file. Those files would be joined together to make a bib bibliography, which could be used with, say, reftex, or bibtex-completion.

Possibly, the “new” citation machinery would be even able to jump to those “notes” by the means of the “activate” processor.

One extra feature from there worth learning is ox-extra, which allows ignoring a headline while keeping its children. This allows making fake headlines for, say, the bibliography and the list of figures.
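A sketch of how this is switched on, assuming ox-extra from org-contrib is installed; the heading in the comment is a made-up example.

```elisp
;; ox-extra lives in org-contrib.
(require 'ox-extra)
(ox-extras-activate '(ignore-headlines))

;; A heading tagged :ignore: is then dropped on export, while its body
;; and its children are kept:
;;
;;   * Bibliography                                            :ignore:
;;   #+latex: \printbibliography
```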

Definitely worth re-visiting.


This is a giant, and also very interesting description of one’s life in org.

I am very impressed by his approach, thoroughness, and meticulousness.

However, I don’t think that his way of working really fits me, for several reasons.

  1. His task state machine is too complicated. I just won’t be able to track all those state changes.
  2. His capture machinery requires being able to capture all tasks that might occur, and I just don’t think I can implement that.
  3. His machinery does not seem to be able to integrate well with distributed PIM. I do need to do a lot of stuff on my phone, and sometimes even on the “cloud”.
  4. I am spending a lot of time on actually doing the planning. Like, I am trying to decide how to progress further in my life, and which steps would be more valuable, and which urgent. His way seems to expect a fairly standard way of progressing through tasks.
  5. I am often not in the state when I can do a task. Like, very often. If I may (very roughly) say that I have “capability X± δ X”, and tasks have “difficulty Y”, a lot of my tasks have difficulty “X+(δ X/2)”.

I am finding it extraordinary that he is actually billing his clients based on org clocking data. This is very impressive.

I don’t really believe that his bbdb stuff works. It is just not advanced enough.

His guide on abbrevs and skeletons is also something to learn from.

To sum up: I strongly recommend reading and learning from his example.

However, I did not learn much about either reading or referencing from his treatise.


The key thing in this setup is using elfeed to fetch a list of new papers from arxiv.

It seems tremendously useful not to have to deal with the arxiv interface and such. However, I am not reading a lot of arxiv, even though I do use some papers from there. I guess it is really useful for people in fields where a single paper can be read in a few hours, not weeks?

After the introduction of the arxiv fetching machinery, his setup essentially converges to a “bib-file with file: field”, and uses bibtex-completion to open those pdfs when needed.
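The elfeed part of such a setup is small. In the sketch below the feed URL and the cs.PL category are my assumptions (arXiv has changed its RSS addresses over time), and arxiv is just a tag name of my choosing.

```elisp
(with-eval-after-load 'elfeed
  ;; Each feed is either a URL string or a list (URL TAG...).
  (add-to-list 'elfeed-feeds
               '("https://rss.arxiv.org/rss/cs.PL" arxiv)))

;; M-x elfeed opens the reader, M-x elfeed-update fetches new items.
```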


An interesting, although a very bloated configuration.

Moreover, he even suggests using org-roam-bibtex, which is the king of all Emacs org-related packages, and tries to integrate so many of them that my head is spinning.

What is interesting is that his “ground truth” seems to be coming from org-roam, which is not what most people do.

What makes this article stand out is the introduction of the citar package, which is one of the newer oc.el citation processors, and can probably be used instead of helm-bibtex.

He also uses Zotero, which I, again, probably need to study.

He mentions embark and marginalia, which are probably worth looking at, even though, maybe, not for referencing or researching directly.

Again, the most important contribution of this essay is probably pointing out the availability of the external oc.el processor citar.
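Hooking citar into oc.el is, as far as I understand its README, a matter of a few variables. The bibliography path is hypothetical; treat this as a sketch rather than a tested configuration.

```elisp
;; Hypothetical path; citar itself must be installed separately.
(setq citar-bibliography '("~/Literature/references.bib")
      ;; Let citar handle inserting, following and highlighting
      ;; [cite:@key] citations in org buffers.
      org-cite-insert-processor 'citar
      org-cite-follow-processor 'citar
      org-cite-activate-processor 'citar)
```

Exporting is unaffected by this: citar only replaces the interactive side, so the export processors stay whatever oc.el is configured to use.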


Okay, this is a slight variation on the subject of making notes.

Still requires a fixed directory, but now at least it does not require sqlite.


A nice intro into oc.el. Recommends using Zotero for actual bibliography management, with export to biblatex with the “Better BibTeX” plugin.


Okay, I clearly would never use Kitchin’s setup verbatim, but there is no reason not to learn from him.

  1. Snippets. So far very few examples from here mentioned snippets, perhaps only one, which used abbrevs. I also do not see why the author uses yasnippet instead of org-tempo or abbrevs.
  2. Many custom keybindings. I never got used to them, so whatever.
  3. Hydras. A pop-up interface for hotkeys. As usual, hydra is competing with, say, transient. But I really think that such a functionality must be built into Emacs, not provided externally.
  4. Spell checking. He just has some tweaks to hunspell, which is better than nothing, but, as usual, only works for English, and only for a single language in a document. Also only works for single words (because English!). So this is important, but will not work in a straightforward way.
  5. He is keeping his database in a bib file. He adds records automatically, based on DOI, using a custom function, which, I suspect will not work in my case. But maybe that is what zotra should do.
  6. He is also using Elfeed to subscribe to feeds. Noteworthy. A bit like:
  7. He has an “org-db”, a, seemingly, full-text search for org-mode files. Interesting: if he is so keen on bloated stuff, why not just org-roam? But, on the positive side, it’s good that this org-db of his does not depend on a single directory for all notes.

1.2. Literature review [0/7]

  1. Bibus (a mysql-based bibliography manager)
  2. Org-Mode Official Manual
  3. Ebib Official Manual
  4. Fireforg
  5. reftex (Reftex Manual)
  6. reftex-cite
  7. ox-bibtex
  8. ol-bibtex, which used to be called org-bibtex
  9. helm-bibtex/ivy-bibtex/bibtex-completion
  10. org-inlinetask
  11. org-ebib
  12. org-ref
  13. org-bibtex-extras
  14. ebib
  15. citar
  16. amsreftex
  17. org-transclusion (ELPA)
  18. org-roam
  19. org-sidebar
  20. zotero
    1. better-bibtex
    2. zotfile
  21. Qiqqa
  22. Mendeley
  23. JabRef
  24. org-roam-bibtex
  25. org-pdf-scrapper
  26. refdb-mode
  27. citeproc.el
  28. citeproc-org.el
  29. org-bib-mode
  30. orca (not the screen reader)
  31. zathura
  32. zotra
  35. denote
  36. bibtex-actions
  37. org-ref-cite

1.2.1. Bibus

I am not convinced of the benefits of using MySQL for bibliography management.

1.2.2. refdb-mode

RefDB is a standard used by libraries to exchange bibliographic data. After brief skimming, and mentioning that the most recent version is from 2008, I suspect that unless you are really running a library, it is not worth using.

1.2.3. Pure Bibtex / Biblatex

Okay, now we are getting somewhere.

Pure biblatex files are weirdly formatted database files with entries for “papers”, which are called “entries”. I am tempted to call them “entities”, because why would I not add there any kind of vaguely related stuff, such as theorems?

Let us see some examples:

  1. Boring fields
            author = {Michael Metcalf and John Reid and Malcolm Cohen},
            title = {Modern Fortran Explained},
            year = 2018,
            month = 10,
            doi = {10.1093/oso/9780198811893.001.0001},
            url = {},
            isbn = 9780198811893,
            journal = {Oxford Scholarship Online},
            publisher = {Oxford University Press}

    Most stuff here is fairly straightforward, except perhaps the braces, which delimit field values containing spaces. Both Bibtex and Biblatex accept braces (as well as double quotes) as delimiters. But never ever use old bibtex, it is just outdated.

    Now let us make a fancier example.

  2. Interactive fields
          author = {},
          title = "Test Article",
          year = 1000,
          DOI = "1.1/5.86",
          file = "Full Text:testauthor.pdf:PDF",
          URL = "",
          crossref = "DBLP:conf/testconf/1000",
          timestamp = "Tue, 06 Nov 1000 16:59:25 +0000",
          biburl = "",
          bibsource = "dblp computer science bibliography",
          xdata = {},
          note = {},
          annotation = {},
          abstract = {},
          keywords = {}

    This example is more interesting, because it has some interactive fields.

    So, the file field is fairly easy: it is just the path to the PDF of the article. What are note, annotation, abstract, xdata, keywords?

    1. keywords

      Okay, keywords should be used for tags, I guess? Where do I get those tags? Surely they can’t be coming from bibsources?

    2. xdata

      Is xdata not a link to an external piece of data?

    3. note, annotation, and abstract

      ebib people believe that annotation is a long-ish text, basically what I consider to be “reverse-engineering”, however, putting one into a biblatex field sounds insane. Maybe it should be a path to an annotation file?

      And what is note? is claiming that note is used for “various remarks”.

    4. external note

      External note is a pseudo-header created by ebib, to keep a full-fledged file with notes. This seems important, because I want to reverse-engineer poorly-written PDFs into something readable, so this “external note” is where potentially org-noter could go.

  3. @String and @Preamble

    @String is a special syntax for abbreviations, used like @string{DEK = {Donald E. Knuth}}.

    @Preamble is a special syntax that is prepended to the bibliography when used with latex, and might not be too useful for our purposes. Example: @preamble { "Maintained by " # maintainer }.

  4. Extensions to Emacs support for biblatex.

    There are quite a few. Too many to fit into the margins. However, aside from those which clean up files, it is too early to think about them, until a working pipeline is established.

  5. Using Biblatex as ground truth

    The question is: “do you want to use biblatex as ground truth for your readings?”.

    Ebib seems to imply that. Use a bib file as a repository of all the papers and books you ever encounter. Convert them to PDF, and annotate with org-noter.

    How would you attach PDF files? It is certainly possible to write paths to them in ebib, but renaming them would make those paths invalid.

    How would you mark read/unread files? Why do I have to keep all external notes in a single directory?

    All of that does not sound too promising.

    Anyway, let us go on.

1.2.4. org-inlinetodos

Inlinetodos are added with C-c C-x t, and are really nice. It is good that I watched that video.
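
As far as I can tell, the C-c C-x t binding only appears once org-inlinetask is loaded, since the library ships with Org but is not loaded by default. A minimal sketch of enabling it; the value shown for org-inlinetask-min-level is the stock default and is repeated here only for illustration:

```elisp
;; org-inlinetask ships with Org but has to be loaded explicitly.
;; Loading it binds C-c C-x t to `org-inlinetask-insert-task' in Org buffers.
(require 'org-inlinetask)

;; Headings with at least this many stars are treated as inline tasks.
;; 15 is the default; it is spelled out here only to make the knob visible.
(setq org-inlinetask-min-level 15)
```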

1.2.5. reftex

Reftex is built into Emacs, and is a mode for managing references, citations, labels, and index in LaTeX. Let us see if it can be abused for helping us do research in org-mode.

Seemingly, it assumes that you are editing a LaTeX project, and it can index files in that project, look for labels, and offer them for auto-completion, as well as lookup citations in the bib file.

In order for that to work, we need to agree on a certain pattern on using labels, and on what goes into the bib file.

As a side-note, the previous paper I did in org-mode didn’t have any labels.

C-c ( creates a label. C-c ) lets you choose a label, and inserts it. C-c [ inserts a citation with a key from a bib file.

I believe, there are some customisation variables in Emacs, which would let us insert org targets, links, and, for example, cite:bla links, although I am not sure it is the best way to do it.

Reftex has features for quick navigation within the project, C-c =, which in org can be partially substituted by (1) imenu, (2) collapsing org trees with TAB, (3) grep and isearch.

Reftex has support for index. Use C-c < to index an entry.

I think that indexing is quite underused in 2023. I have never seen scientific papers include an index, and I seldom use the index in a PDF book if I can search instead. However, I think that an index, or, rather, a glossary, is the tool of data organisation best suited to establishing cross-document relationships. As an example, let us assume someone is studying a certain subject, or, more realistically, is trying to break through a difficult paper and to build the solid foundation needed to understand it. One would necessarily read several books on the same subject, as well as on subjects building on top of previous ones, and establish inferential links.

A book has a natural representation as a graph whose nodes are headings, paragraphs, and theorems. However, wandering over many non-isomorphic graphs is stressful for the brain. Marking certain places as “interesting” would help to establish relationships between the graphs, and a certain “concept” node could serve as a point of link concentration for them. This is why I am talking about a glossary (with bodies), not just an index.

  1. Jump to definition.

    Okay, in a LaTeX document, there might be more than one interpretation of what is “jump to definition” that programmers are so used to.

    Firstly I will mention one that is not that frequently thought of by programmers: jumping to a word definition in a dictionary, or a translation in a foreign language dictionary. So far I do not know whether such a feature exists in Emacs, let alone reftex.

    Reftex can show references of labels in the document itself, using C-c &.

    I wonder why it does not just plug directly into the Emacs xref framework, the one called with C-. and C-S-.?

  2. Cooperating with Org

    reftex has a variable called reftex-default-bibliography, which should be pointing to a bibliography (say, also pointed to by ebib?), from which you might use citations in non-LaTeX buffers. It is probably required to customise reftex’s reftex-cite-format, or something like that. I actually did that for an “older” setup.

    Can I use it in the “newer” setup with labels too?
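
    A minimal sketch of what such a setup could look like, assuming a hypothetical bibliography at ~/bib/references.bib and assuming that the Org-style [cite:@key] format is the one wanted (a cite:%l format would give the older link style instead). The helper name my/org-reftex-cite-setup is made up for this example:

    ```elisp
    (require 'reftex)

    ;; Hypothetical path: point this at the same bib file that ebib uses.
    ;; RefTeX falls back to this list in buffers without a \bibliography line.
    (setq reftex-default-bibliography '("~/bib/references.bib"))

    (defun my/org-reftex-cite-setup ()
      "Make `reftex-citation' insert Org-style citations in this buffer."
      ;; %l is replaced by the key of the entry selected in the RefTeX menu.
      (setq-local reftex-cite-format "[cite:@%l]"))

    (add-hook 'org-mode-hook #'my/org-reftex-cite-setup)
    ```

    With this in place, M-x reftex-citation in an Org buffer should offer entries from the default bibliography. Labels are a separate matter and are not covered by this sketch.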

1.2.6. ol-bibtex / org-bibtex

So, the idea of ol-bibtex is that you keep a reading list, together with bibtex metadata, in an org file.

The nice thing here is that you can

  1. use org’s search machinery to filter headings by properties and tags,
  2. group books/articles by subtrees, say:
    * Differential Equations
    ** Paper 1 :interesting:
    ** Paper 2 :boring:
    * Measure Theory
    ** Paper 3 :interesting:
    ** Paper 4 :boring:

Not sure how the org-bibtex machinery will work with tree-like headings. By default, org-bibtex does not seem to export org tags as a “keywords” field in the resulting bib file. (There seems to be a parameter for that: org-bibtex-tags-are-keywords.)
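
A minimal sketch of switching that parameter on, based on my reading of the variable's documentation. The "noexport" tag below is only an illustrative choice:

```elisp
(require 'ol-bibtex)

;; Convert Org tags into the BibTeX "keywords" field on export,
;; and keywords back into tags on import.
(setq org-bibtex-tags-are-keywords t)

;; Tags listed here are not turned into keywords.
;; "noexport" is just an example value.
(setq org-bibtex-no-export-tags '("noexport"))
```

After that, M-x org-bibtex in the reading-list file should write the entries, with keywords, into a bib file.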

Bodies are also exported neither as a “note” field, nor as an “annotation” field.

In general, I am doubtful that having all annotations for all papers in an org file is practical.

On the other hand, org-noter seems to integrate with this approach not too badly, as it expects a heading to keep all annotations for a file.

  1. TODO Understand org-bibtex-key-property
  2. END

    There is some magic for “global links”, but I am not sure how it works.

    org-bibtex also allows capturing links to bib files, with nice-ish navigation. Schulte et al. 2012

    Again, I am not sure how I would use this, since, it seems, org-bibtex’s approach is to keep everything in org files, and only export things to bib when necessary. But, I guess, this can be useful when you are reading someone’s paper, say, from Arxiv, in the tex source, and it includes a bib file.

    There is a function (org-bibtex-search) that I have not yet fully understood, which searches for “bibliographical entries” in agenda files. I guess, again, the workflow that this package is suggesting is to use an org file for each new project, list the required reading in those files, add those files to the agenda file list, and then, when there is a dedicated “reading time”, search for “bibliographic entries” in the agenda?

1.2.7. TODO org-bibtex-extras.el

Okay, I still have not understood what this package does. It should somehow improve org exports to html, but I am still not sure how exactly.

1.2.8. ox-bibtex

This is a very straightforward package. It defines org links in the format of cite:key, which are later formatted either as html links, or as latex citations. You can also add the #+bibliography: plain t code piece, which will either use latex machinery to format bibliography, or use bibtex2html.

I have to stress, it expects a bibtex-file bibliography, not an org-file. Perhaps, if we want to keep all our “ground truth” as org-files, we can add a hook to convert those files into bib files with org-bibtex.

Also, maybe, even a three-stage process might be useful: keep a bibliography in a file, use org-transclude to only include the required bibliographic entries into an org file, then export this file into a bib file and then org-export will be able to build both an html, and a pdf from it.

1.2.9. Org-Mode’s own citation machinery, oc.el

This section comes from org#Citation handling. This machinery in Org is relatively new, circa 2021.

So, firstly, we are getting one more NIH way of adding citations. That is, instead of relying on standard Emacs’ reftex, org developers introduced yet another key combo to insert citations, C-c C-x @. Why exactly is beyond comprehension, since reftex was written by Carsten Dominik, the guy who wrote the actual org-mode. It surely would have been much easier to update reftex, rather than write one more extra system doing the same thing.

It is looking at a file pointed to by #+bibliography: File.bib, which is at least a little consistent with the older ways.

Citations look like [cite:@key] links, and why exactly they need the @ symbol, I do not know, but in this case it works as expected, as we can keep our older ox-bibtex’s cite:key links without opening function conflicts. C-c C-o works, and opens an entry in File.bib.

oc.el supports some kind of “citation styles”, which are, I guess, useful to some people.

Exporting citations is handled by the keyword #+cite_export: basic author author-year.

Using the csl processor makes the exporter format everything manually, both in html, which is, I guess, okay, or, at least, I have not found it to be much worse than bibtex2html, but in LaTeX it looks super weird.

Writing #+cite_export: biblatex makes more sense, at least it is writing out \addbibresource{File.bib} and replacing citations with \autocite{key}. Exports to html, however, are not supported, and the exporter just prints LaTeX commands in place of proper references. I guess, it might not be too hard to patch it to do basically what ox-bibtex did in the past, exporting via bibtex2html.

Note (!): export to html conflicts with ox-bibtex. If your config still includes ox-bibtex, you might want to remove it, or somehow tweak and debug.

So, in general, I have mixed feelings about all this new citations machinery in org. It is good that there is now a system with pluggable backends, but so far, documentation is lacking, and making it do what you expect, is not straightforward.

In particular, I wanted to write a disappointed comment here, but, looking in the source code for oc.el, I found the variable org-cite-export-processors. This variable does not do what you would expect from its name. You might actually want to set it to something like this:

(setf org-cite-export-processors '((beamer natbib)
                                   (latex biblatex)
                                   (t csl)))

Which means that beamer exports will be processed with natbib, latex (and hence pdf) exports will use biblatex, and html and all the seldom used formats will use csl. Looks almost as good as the “old” approach with ox-bibtex and bibtex2html.
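
For completeness, a sketch of the remaining setup that such a configuration seems to need, under two assumptions: the bibliography path is hypothetical, and the explicit require lines may be redundant on newer Org versions, which can load the processors on demand:

```elisp
(require 'oc)
(require 'oc-natbib)    ; export processor for natbib
(require 'oc-biblatex)  ; export processor for biblatex
(require 'oc-csl)       ; export processor for csl; relies on the citeproc package

;; Hypothetical path. Files named in a per-file "#+bibliography:" keyword
;; are used in addition to this global list.
(setq org-cite-global-bibliography '("~/bib/references.bib"))
```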

A backend for org-bibtex, seemingly, still needs to be written. I can understand why they rejected reftex. After all, reftex is bibtex-only. What I do not understand is why they introduced a new kind of link, inserted with a separate keybinding, C-c C-x @, rather than just reusing org-link machinery. For example, there could be a cite@: link, and, depending on the value of #+bibliography:, C-c C-l would insert a citation from that database, using normal tab completion.

But anyway, it seems that with the new version, it should be possible to rewrite the old citation mechanism using just oc.el. This would make bibtex2html not needed, as well as ox-bibtex.

1.2.10. bibtex-completion

bibtex-completion lives here:

Org-bibtex users can also specify org-mode bibliography files, in which case it will be assumed that a BibTeX file exists with the same name and extension bib instead of org. If the bib file has a different name, use a cons cell ("" . "bibfile.bib") instead:

Really? org-bibtex? Not ol-bibtex? I asked a question

Apparently, bibtex-completion is neither bibtex, nor completion. It is an abuse of the completion framework to be used as a GUI to the bibliographic database.

  1. Basic usage
    1. Set bibtex-completion-bibliography
    2. Run M-x ivy-bibtex RET to enter the ivy-bibtex UI
    3. Type some keywords to narrow papers.
    4. Type M-o to see possible actions.

    The actions are kind of like:

    1. insert citation
    2. open pdf
    3. open bibtex file
    4. open note (create if needed)
    5. browse URL
    6. query online resources for bibliographic data

    What makes this citation tool better than plain org’s C-c C-x @ is that you can filter bibtex results not just by key, but also by other attributes.
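
    The first step of the list above can be sketched as follows. All paths are hypothetical, and the last line assumes that PDF paths are kept in the file field, as in the “interactive fields” example earlier:

    ```elisp
    ;; Hypothetical locations; adjust to your own layout.
    (setq bibtex-completion-bibliography '("~/bib/references.bib")
          ;; Directories searched for PDFs named after the entry key.
          bibtex-completion-library-path '("~/bib/pdfs/")
          ;; A directory here means one notes file per entry.
          bibtex-completion-notes-path "~/bib/notes/"
          ;; Also look for the PDF path in the entry's "file" field.
          bibtex-completion-pdf-field "file")
    ```

    This is exactly the prescribed directory structure in action: one place for bib files, one for PDFs, and one for notes.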

  2. My thoughts about it.

    Okay, I have to admit that at first glance this tool looked extremely like “I did it for myself, get lost”. It still does, but now I do understand a little more about the logic behind it.

    What I do not like about the usage practice behind this package is that it is still indexing papers by key. I don’t want to remember keys. It also does prescribe a specific directory structure for bibliography and notes, which is also annoying.

    However, there are some things I do like.

    So, for him, there is a “global database” of papers. I think you can call it “units of thought”. And he wants them indexed one way or another.

    So, his workflow is like “Aha, I remember there was some paper which did something like…”, and he wants to search for it using a shortcut. (In this case, ivy-bibtex.)

    I wonder why he doesn’t like the reftex citation machinery.

    Okay, this does make a bit of sense.

    I wonder if it is possible to use this machinery to index not just scientific papers, but, say memes (for internet arguments)?

    Again, the problem here is that you need to “remember-mode” a paper manually. What if I want to keep papers structured by projects? And in general, what if I do not want to keep all papers in one directory?

    Moreover, a “notes” thing-y is more likely to be a directory, not a file. After all, you might want to attach additional data, and even compile some projects “affiliated” with a paper.

    But anyway, it clearly seems that the person doing this project knows what he wants, and is implementing it.

    I am still hesitant to say that ivy is really necessary here. I suspect that IDO and reftex could maybe just as well work, but so far, so good.

1.2.11. org-ref

Okay, org-ref is, at first glance, hugely over-engineered, but since it is omnipresent in so many org setups, I need to study it too.

  1. The intro is here:
  2. The manual is there:

From the intro it basically follows that org-ref was designed to facilitate citing and cross-referencing in Org-Mode. The intro itself now even says the citing part is, seemingly, outdated, as org-mode has a new citation mechanism.

What about the cross-references (intra-document links)?

Firstly, let us remember that org-mode has “targets”, that is, pieces of text in double angle brackets: <<target>>. That “target” will be interpreted as a “label” in LaTeX. The links to those targets can be created like [[target]]. Here is the link: 1.2.11.

Let us see how org-ref improves on it.

  1. Looking into it.

    Firstly, let us note that the authors of org-ref, and bibtex-completion have, seemingly, found each other eventually. Now org-ref uses bibtex-completion-bibliography to find the bibtex database.

    So, typing org-ref-insert-link by default fails with org-ref-insert-link: Symbol’s function definition is void: nil. Well, the quality of this package is garbage. (Not that it is unexpected.)

    You can run (org-ref-insert-link-hydra/body), and it looks nice and consistent, but every operation there results in Symbol’s function definition is void: nil.

    Yeah, I remember that something like that stopped me from studying org-ref last time I tried, about 4 years ago.

    Okay, maybe we need to load (require 'org-ref-ivy)? It is mentioned in the manual, as “optional”, but maybe it is not actually optional?

    Well, now org-ref-insert-link does not throw an exception, and it can open the ivy window, but despite bibtex-completion-bibliography pointing to a correct bib file, it is saying 0 org-ref-ivy BibTeX entries:. Moreover, if you press TAB twice, it throws an exception: assoc: Wrong type argument: listp, "".

    Also, note that you can run org-ref-bibtex-hydra/body, which is not org-ref-insert-link-hydra/body, even in non-bibtex buffers, where it fails spectacularly.

    Also, the default suggested keybindings conflict with the org’s default ones, which were present in it since forever. Namely, org-ref suggests overloading C-c ], which is an agenda-related command.

    After running (require 'org-ref) once again, I did manage to make that Hydra do a few non-trivial insertions.


    Citing, I guess, is not useful any more, but cross-references might be useful?

    How are they different from just overriding rendering for org-links?

    NOTE: org-ref changes the way org-store-link behaves. If you use it on an org <<target>>, org-ref forces its own version of storing links.

    Those org-ref links are obviously LaTeX-inspired. I guess they are mostly useful at exporting time?

  2. Exports in org-ref

    I tried the following code:

    * Body
    Hello <<target>>
    Hello2 label:target2
    *** Test

    and it generates the following latex:

    Hello \label{org6c95ed7}
    Hello2 \label{target2}

    and the following HTML:

    <div id="outline-container-org69ffdbd" class="outline-2">
    <h2 id="org69ffdbd"><span class="section-number-2">1.</span> Body</h2>
    <div class="outline-text-2" id="text-1">
    Hello <a id="org7a92615"></a>
    Hello2 <a href="target2">target2</a>
    <div id="outline-container-org59d8e71" class="outline-4">
    <h4 id="org59d8e71"><span class="section-number-4">1.0.1.</span> Test</h4>
    <div class="outline-text-4" id="text-1-0-1">
    <a href="#org7a92615">1</a>
    <a href="target">target</a>
    <a href="target">target</a>
    <a href="target2">target2</a>
    <a href="target2">target2</a>

    We can see that the HTML export is not worth even mentioning, it just makes no sense. The LaTeX export does make a bit of sense, except it makes you create special targets, which are links by themselves.

    On the other hand, we can see that the default org export does a pretty decent job for this simple task. It does not give those cross-references a fancy presentation, but at least the links work.

  3. Reading org-ref’s manual.

    The manual is here:

    Let us start:

    Initially I thought this would make at least the citation part of org-ref obsolete. I no longer think that is the case though.

    Not very promising.

    One of the goals of org-ref is to provide complete coverage of natbib/biblatex citation commands, with syntax that is close to what you would write in LaTeX, and that is close to what you would read in the LaTeX documentation.

    Hm… The goal is noble, for sure, but HTML is not even mentioned, which looks suspicious.

    An early criticism of org-ref was its limited capability to support prenote/postnote syntax, especially for multiple citations.

    Wow, okay. Makes zero sense to me, but I guess someone might need it.

    If you have no need for cross-references (either you don’t use them, or vanilla org syntax is adequate), or if you don’t like using ivy, and don’t want to roll your own citation inserter, then you may not need org-ref.

    Hmm… very promising.

    Note: You may need to set org-latex-prefer-user-labels to t if you refer to things by their “name” for the export to use the name you create.

    This is actually great. I didn’t know about org-latex-prefer-user-labels or org-html-prefer-user-labels.
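
    A minimal sketch of turning both on, so that exported labels and anchors use the names written in the org file instead of generated ones:

    ```elisp
    ;; Use user-supplied names for \label{...} in LaTeX export
    ;; and for id="..." anchors in HTML export.
    (setq org-latex-prefer-user-labels t
          org-html-prefer-user-labels t)
    ```

    The price, as far as I understand, is that the names then have to be unique and valid in the target format, which org no longer guarantees for you.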

    The command org-ref does a lot for you automatically. It will check the buffer for errors, e.g. multiply-defined labels, bad citations or ref links, and provide easy access to a few commands through a side-window buffer.

    Okay, this command throws an exception for me, so I just have to believe.

    Here the manual stops, and he starts describing various things he has written “just for himself”, which do make a bit of sense, but in my opinion have no reason to be in a “referencing package” whatsoever.

  4. Some tools worthy of scavenging.

    I suspect that I would really have to scavenge this package for some “useful code”, but leave most of it unused.

    1. refproc

      org-ref also supports exporting cross-references to other formats using ./org-ref-refproc.el. This library also works by pre-processing a copy of the buffer to convert org-ref cross-reference links to org-syntax before exporting to the target backend. This even support cleveref style links with automatic prefixing and sorting. Compression of the references is not yet supported.

      I do like “cleveref”, so it might, indeed, be the piece of code to use.

      Use it by:

      ;; Add the pre-processor to the hook instead of overwriting the hook's value.
      (add-hook 'org-export-before-parsing-hook #'org-ref-refproc)

      I think it might be the only part of this package I will end up using.

    2. doi-utils

      Loaded by org-ref by default, although I guess there is no need for that. It is a wrapper for the DOI json api. It might be useful, I guess, for “remembering papers”.

      In fact, this, I guess, is my first contact with the enormous world of “bibinfo” retrieving packages for Emacs. I have not even mentioned bibretrieve, although I did try it in the past, but it is just one of the giant number of packages. On Github I saw a user who ported Zotero’s import-export backends to Emacs. Maybe it is the ultimate solution.

  5. Summary

    Org-ref is not, and never will be, usable.

    Despite its name and stated goal, it is not an “org referencing” package; it is a bunch of helper functions that the package author has written for his own convenience.

    Using and loading it makes no sense. What might make sense though is to scavenge that package for some useful functions, probably not even by loading org-ref, but by copying the relevant code, because org-ref is an invasive package.

    It did let me think about proper cross-referencing though, and also let me think that I probably do need some export pre-processing for “cleveref”-like behaviour.

1.2.12. ebib

I will try to avoid copying all of the Ebib’s manual. In this subtree I will try to outline the main scenarios for it.

So, ebib supports adding citations to org, markdown, and latex, from within its own database.

This is a big drawback. Ebib has to be running in background.

It supports importing bib data using a package called “biblio”, and, I guess, if you have some other backend to fetch biblatex sources into the bib file itself, it can deal with that too.

It has all the papers in the same giant list, with no categories, but it does support tags and keywords (Of which I do not remember the difference. I guess, tags are non-thematic, and keywords are thematic?)

It also supports one reading list, so I can imagine a workflow of being “a little bit” strict to oneself, and adding all reading material to the bibtex database, and interesting items to the reading list.

But how would I make org-exports work painlessly? Well, I guess, I can point org to the same bibtex file?

If I were not violently cross-referencing the stuff I am reading, and if I were not violently reverse-engineering files of books, I might be able to force myself into this kind of rigour.

1.2.13. org-bib-mode

A variation on the theme of “org file is ground-truth, and I export to .bib what I need”.

The interesting thing here is using pdf drag-and-drop, as well as a “map” for the org file, “org-imenu”.

Since there is at least another option for mapping org files, org-sidebar, I am confused why this is even a thing.

1.2.14. org-ebib

Small, simple, straightforward. Do (org-link-set-parameters "cite" :follow 'org-ebib-open) to make cite: links open in ebib. Nice if you use ebib.

1.2.15. TODO evince

1.2.17. TODO denote

Denote is a package whose main purpose is, seemingly, to keep backlinks to org files without the use of external databases.

  1. TODO Run some tests to see how it works.
  2. END

1.2.18. amsreftex

Amsreftex is a way to keep a bibliography database right in LaTeX, with no bibtex and similar stuff involved.

Great news, but about 40 years late.

Moreover, we want a reading list in org, or a vendor-neutral database, not a tex-specific thing.

On the other hand, amsrefs maybe still should be considered as a “ground truth” for the bibliography database.

Not sure it is good.

1.2.19. org-transclusion

Okay, org-transclusion

So, the idea is very noble, you should be able to include different pieces of org files into other org files.

Imagine the following:

You have a book on, say, a university course of Probability Theory. And you are reading a book on post-grad Probability Theory.

You might want to reverse-engineer the first book, and wherever you see a reference of, like “see Bla page 1 theorem Y”, you could write a transclusion command and obtain that theorem in your own text.

What might be even more fun is writing custom bibliographies, while transcluding entries from a “big database”.

So, suppose your setup is the following: you have many small org files with citations for various projects… and you org-transclude them into a single org file for a “large database”?

And then you use citar to import data from that “large” database.

Then each “project” will have “own” papers, that is, crushed during its progress, and “alien” papers, crushed during making other projects.

But you still need periodic upkeep for this “transcluded” database.

1.2.20. org-sidebar

This package ended up here not so much because it is relevant to referencing, but because it is relevant to general task management.

If in your setup one org file corresponds to one project, and if it is possible to keep track of this project by just working with this file, and if you have upcoming events, and similar things, associated with this project, it might be worth having a look at it.

It provides a “sidebar” of two buffers, with tasks and scheduled events, relevant to current org buffer.

I suspect that using “speedbar” would make more sense?

1.2.21. org-roam


org-roam certainly seems to have been done by people who do research.

It is basically a personal wiki with links and backlinks. It does rely heavily on the ID property of the headings, to create an sql index, and backreferences. It is also incompatible with a lot of standard org’s machinery.

On the positive side, it does seem to do what I have always wanted to do: generate a map of links.

When I started deciphering mathematical papers, I really wanted to have a system to link different concepts together. The problem is (1) when I am reverse-engineering a paper, I am most likely not doing it in the org-roam directory, (2) I seldom invent my own concepts, and when I do, they are uploaded to my website, and hence are also not getting into the org-roam directory.

On the other hand, it seems that org-roam can really be used as a backend for the citation machinery of org. I already mentioned the following “ground truth” backends:

  1. A single bibtex file.
  2. Many bibtex files in a directory.
  3. Many scattered bibtex files over many directories.
  4. Single org-file (exportable into bibtex with ol-bibtex)
  5. Many org-files, also exportable, in one directory.
  6. Many scattered org files.

The option number 5, seemingly, can be extended to have all those files in an org-roam directory, and, I guess, bodies can have annotations for the papers, created with org-noter.

1.2.22. TODO otter

1.4. Use-Cases and Use-Scenarios

1.4.1. Mental Discipline

“Doctor, when I do it like this, it hurts!” “Then don’t do it. Next!”

1.4.2. A piece of knowledge (document), and its branches

After reviewing 17 out of 50 documents, I started to get some thoughts about what a research system should do.

I am tempted to say that a basic unit of the system is an “article”. An article can have several forms:

  1. PDF
  2. HTML
  3. TeX
  4. Semantic markup, such as org or markdown, which is the most useful option.

The interesting thing is that these kinds of content can be transformed one into another. Semantic markup can be freely compiled into PDF or HTML, and PDF can be converted into Semantic markup using OCR ( does amazing things).

We need a system which can track at least those three kinds of content.

But that is not enough.

In addition to “different presentations”, a piece of knowledge has metadata. Supposedly, we can deal with biblatex’s fields to “describe” a document exhaustively.

Among the metadata, at least “annotation” is a very useful field which needs to be written for each processed paper. You might call it “lightly embedded” into the brain, because often the depth needed to write a decent annotation is still not profound enough to understand all of the paper. You can speculate that an “annotation” is what “Mathematical Reviews” or “ZbMATH” are doing. I guess, if a written annotation is not to be openly published, you can just type in a URL into the annotation field.

  1. Most papers are incomprehensible, and what to do about it

    One important thing to note about modern science is that most papers are written either to pass irrelevant review tests, or at the writers’ own pleasure with no quality control. Therefore we can safely assume that most of the papers are garbage.

    This characterisation is not to denigrate the work that has been invested into them, as the authors are playing by extant rules. But this means that almost no paper is ready to be consumed as a good software library, with a well-defined interface and layered design.

    Reading papers is essentially like reverse-engineering binaries. Those were written for the machine, not for you. And therefore we need to use tools that are frequently seen in binary analysis and bytecode debugging.

    1. Instrumentation

      Admittedly, papers are slightly better than binary, they are, after all, written in a human language, so the decompilation part can be skipped. But we need a thing that in dynamic languages is called “instrumentation”.

      In fact, there is nothing new in instrumentation applied to texts. It is called “interlineation”, and consists of inserting text in-between the lines of the text that is being studied. When most studies were in the humanities, especially theology-related, this seemed perfectly natural to people, but for some reason nowadays people seem to have largely forgotten this approach.

    2. How to do interlineation if source is available?

      That is already a big question. Even if we have the LaTeX source, this is not a trivial task. While having the LaTeX source lets us edit the document at will, we cannot:

      1. Throw away the old document (as other documents may link to it).
      2. Blindly write text in between the sentences of the original document, as it might break indexing and page navigation, and if not made somehow visually different from the old text, might confuse the reader.

      As a quick-and-dirty approach, I have just defined an environment in LaTeX which displays its contents in grey.


      This is not a very good approach though, as it does not work cleanly with LaTeX’s paragraphs.

      A better approach is to enumerate all thoughts in a document, giving each thought a separate numbered clause, and writing the explanation for it in the clause body below the clause text. (Yes, I know, it is a little messy to distinguish a “clause body” from a “clause text”, but I have no better wording.) See some thoughts on this subject here:

      In this write-up I do not want to spend a lot of effort on describing how to transform a bad paper first into an interlineated paper, and later into a good paper. For this I have a separate article, which is not yet finished: How to write papers in LaTeX.

    3. How to do interlineation if source is not available?

      That is an even bigger issue, is it not?

      I see the following pairwise incomparable options:

      1. Reverse-engineer your paper with mathpix or other OCR.
      2. Use org-noter to attach annotations to certain pieces of the document.
        1. and leave it as-is
        2. and burn-in the notes as PDF sticky notes or highlighted text
        3. and burn-in the notes as actual interlineary text into the pages, increasing page sizes to be greater than A4
      3. Reverse engineer the paper into a set of image tiles, possibly on the basis of intensity analysis of the lines of text, and typeset your annotations between the tiles, thus keeping the A4 size, but potentially losing some of the page navigation.

      So far, solutions implementing options 2.2, 2.3, and 3 are unknown to me. Options 1 and 2.1 are incomparable, because they require very different amounts of work. Option 1 is far more flexible, but option 2 allows you to start annotating right away.
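
The “intensity analysis of the lines of text” in option 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: the page is assumed to be already rasterised into rows of greyscale pixels (0 is black, 255 is white), the function name and both thresholds are hypothetical, and real scans would need deskewing and noise handling that are not shown.

```python
from typing import List, Tuple


def split_into_line_tiles(
    page: List[List[int]], ink_threshold: int = 128, min_gap: int = 1
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per band of text.

    A row counts as "inked" if any pixel is darker than ink_threshold.
    A band ends once at least min_gap consecutive blank rows follow it.
    """
    has_ink = [any(pixel < ink_threshold for pixel in row) for row in page]
    tiles: List[Tuple[int, int]] = []
    start = None      # first row of the band currently being collected
    blank_run = 0     # blank rows seen since the last inked row
    for y, inked in enumerate(has_ink):
        if inked:
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The band ended just before the run of blank rows began.
                tiles.append((start, y - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # The page ended while a band was still open.
        tiles.append((start, len(has_ink) - blank_run))
    return tiles
```

Annotations would then be typeset in the gaps between consecutive tiles; a larger `min_gap` merges lines into paragraph-sized tiles.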

    4. Annotating HTML

      This is an interesting use-case. I have not seen papers written in HTML originally, with the exception of SRFI documents of the Scheme Community Process. HTML opens a lot of opportunities for annotation which are better than those of TeX, such as text expandable on click (which is much better than text-on-hover, or text-on-sticky-notes). Still, there will probably be a need for at least three versions of the paper: original, annotated, and improved.

    5. Why instrumentation is not a good answer

      For the same reason Richard Stallman started the Free Software movement.

      Wasting time on reverse-engineering computer games and device drivers, even though it is also stupid, at least has some motivation behind it: after all, computers run binaries.

      There is no reason why articles, especially those which are published as TeX on Arxiv, or those which are published at the author’s expense under the Open Access model, should be set in stone once a “release” is done.

      Articles should follow the software development model, with pull-requests, patch review, automatic testing for consistency, and a set of guidelines on versioning and on what counts as an API/ABI breakage.

      Moreover, retracting a paper should not merely be a stamp of disapproval, but a peer-reviewed patch, which highlights exactly the place where there is a flaw, with the typology of the flaw indicated, so that automatic search for similarly-flawed articles can be conducted.

  2. Summing up this section

    When making a database of “pieces of knowledge”, we need an entry to have at least the following fields or field groups:

    1. Bibliographic metadata
    2. Original PDF/HTML
    3. Original TeX (empty unless Arxiv)
    4. Reverse-engineered TeX
    5. Annotated TeX/HTML
    6. org-noter notes
    7. Annotated PDF (with burned-in notes)
    8. Improved PDF/HTML
    9. Semantic Version (org, or sTeX)
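
As a rough sketch, the nine fields above could be held in a record like the following. Everything here is an assumption made for illustration: the class and attribute names are hypothetical, and each “presentation” is assumed to be stored as a file path, with `None` meaning that it does not exist yet.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class KnowledgeEntry:
    """One "piece of knowledge" with its metadata and presentations."""

    bibliographic_metadata: Dict[str, str] = field(default_factory=dict)
    original_document: Optional[Path] = None       # original PDF/HTML
    original_tex: Optional[Path] = None            # empty unless Arxiv
    reverse_engineered_tex: Optional[Path] = None
    annotated_source: Optional[Path] = None        # annotated TeX/HTML
    org_noter_notes: Optional[Path] = None
    annotated_pdf: Optional[Path] = None           # with burned-in notes
    improved_document: Optional[Path] = None       # improved PDF/HTML
    semantic_version: Optional[Path] = None        # org, or sTeX

    def missing_fields(self) -> List[str]:
        """Names of the fields that are still empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]
```

A freshly imported paper would typically have only the first two fields filled in, and `missing_fields()` then reads as a to-do list for that paper.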

1.4.3. Linking pieces of knowledge

If we have a database of “articles”, a database of pieces of knowledge, we, quite naturally, might want to interlink them.

This would be mimicking the Web, or the human (or, rather, artificial) brain, or some other semantic network.

  1. Dependent articles

    Sometimes articles are released as “version 2.0”, and books quite often get a “Second Edition”. Another example of a dependent article is a solution book for a problem book, or a conference presentation for a paper.

  2. Bibliographic references

    That is what BibTeX was originally for. If you have TeX sources for many articles, with bib files included, you can draw a network of citations. I am not sure how exactly you would do that for articles for which bib files are not available, which is the case for most articles other than Arxiv ones, so the usefulness of this feature is dubious.

    What I do want, however, is to be able to cite articles from the database using a hotkey, similar to RefTeX, and assemble a bib file for later upload to Arxiv.
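
The “assemble a bib file” step could be sketched as below. This is not how RefTeX works; it is a minimal illustration that assumes the database is a plain mapping from citation key to a ready-made BibTeX entry, and that citations use one of a handful of common commands. The function names and the regular expression are hypothetical, and the regular expression is a simplification that does not cover every citation macro.

```python
import re
from typing import Dict, List

# Matches \cite{a,b}, \citep[p.~3]{a}, \textcite{a} and a few relatives.
CITE_PATTERN = re.compile(
    r"\\(?:cite|citep|citet|parencite|textcite)\*?(?:\[[^\]]*\])*\{([^}]*)\}"
)


def cited_keys(tex_source: str) -> List[str]:
    """Citation keys in order of first appearance, without duplicates."""
    keys: List[str] = []
    for match in CITE_PATTERN.finditer(tex_source):
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys


def assemble_bib(tex_source: str, database: Dict[str, str]) -> str:
    """Concatenate the database entries for every key cited in tex_source.

    Keys that are missing from the database are silently skipped.
    """
    entries = [database[key] for key in cited_keys(tex_source) if key in database]
    return "\n\n".join(entries)
```

Running `cited_keys` over every article for which TeX source is available also gives the edges of the citation network mentioned above.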

  3. Reading lists

    Reading lists quite naturally go hand in hand with the concept of a “project”. What is a “project”? It is hard to define a project precisely, but for theorists and for humanities scholars, a “project” is most likely to include a set of books or articles to read, and a set of claims to prove or discursively argue for or against. (For experimental disciplines things are more involved.)

    From the paragraph above, it is already quite visible that Org-mode maps quite naturally onto the concept of a project.

    When you have a project, say, you want to prove a certain theorem in Engineering Communications Theory (imaginary field), you might want to grind through a set of articles studying this field, which are usually on Arxiv, so you can annotate them in-place, and more importantly, place indexing markers in some interesting places.

    Very often you will not be able to understand some theorems from a paper without background reading, so very soon you will, quite naturally, arrive at a graph of concepts. (I am not sure whether it can be called a “knowledge graph”, as I have seen that term used to describe a specific thing.) A theorem from a paper would require some (linked) reading to be understood. That “linked reading” would be in some other paper or book. If that book is not available as source, the link is likely to point to the annotation file, or to an annotated PDF.

    So, a “project” will be a “concept graph”, which will be referring to the concepts of the underlying papers/books somehow. Making this graph is, seemingly, much easier than making a bibliographic citation graph, because, even if you have zero metadata about the paper or book you are reading, you are very likely to read through at least the table of contents, and re-coding the table of contents into a file is negligible in time, compared to the time needed to understand the concepts themselves.

    Aha! I have mentioned something without saying it explicitly. A Table of Contents is one of the most natural ways of breaking a paper into a skeleton, similar to org-mode’s outline. See the next subsection.
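
Re-coding a table of contents into an outline is mechanical enough to sketch. The snippet below assumes that the table of contents has already been typed in as (depth, title) pairs; the function name and the choice to mark every heading with a TODO keyword are my own illustrative assumptions.

```python
from typing import List, Tuple


def toc_to_org(toc: List[Tuple[int, str]], todo_keyword: str = "TODO") -> str:
    """Render (depth, title) pairs as Org headings, one heading per entry.

    Depth 1 becomes "* ...", depth 2 becomes "** ...", and so on.
    """
    lines = []
    for depth, title in toc:
        stars = "*" * depth
        lines.append(f"{stars} {todo_keyword} {title}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(toc_to_org([(1, "Introduction"), (2, "Notation"), (1, "Main theorem")]))
```

The resulting text can be pasted under a project heading, so that each section of the paper becomes a trackable item of the reading list.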

  4. Concept maps

    So, I have mentioned a few ways for grinding through scientific material, which eventually should lead to the creation of a new piece of knowledge.

    A “project” is a set of articles to read, and a set of concepts to define. Ideas for new concepts arise from consumed articles, and the need to read more articles arises from the need to understand concepts, from reading an “incoming” list, and from citations by other articles.

    When we want to visualise what is going on, we will quite naturally see three kinds of links between “Pieces of Knowledge”. (I am abusing notation here. From now on, a “Piece of Knowledge” is not just an article, it may be any piece of text that deserves independent study, for example, a chapter, or a section.)

    These three kinds of links are:

    1. Constituent links: a chapter is linked to its sections. Unidirectional.
    2. Soft links: a theorem requires some background knowledge to be understood. “To understand this statement, I needed to read that place in that book”. Might be bi-directional, for example, if a theorem is described in two places, and understanding it might require reading both explanations. (See Scheme’s letrec.)
    3. Indexing links: Two “Pieces of Knowledge” are describing the same concept, but I did not actually need to read one to understand the other.
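
The three kinds of links listed above can be sketched as a small typed graph. This is an illustration only: the class, method, and enum names are hypothetical, pieces of knowledge are identified by plain strings, and a bi-directional soft link is simply stored as two directed links.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class LinkKind(Enum):
    CONSTITUENT = "constituent"  # a chapter is linked to its sections
    SOFT = "soft"                # needed background reading
    INDEXING = "indexing"        # same concept, not needed for understanding


@dataclass
class ConceptMap:
    links: List[Tuple[str, str, LinkKind]] = field(default_factory=list)

    def add_link(
        self, source: str, target: str, kind: LinkKind, bidirectional: bool = False
    ) -> None:
        self.links.append((source, target, kind))
        if bidirectional:
            self.links.append((target, source, kind))

    def background_reading(self, piece: str) -> List[str]:
        """Everything reachable from `piece` through soft links only.

        Cycles (mutually dependent explanations) are visited once.
        """
        seen: List[str] = []
        stack = [piece]
        while stack:
            current = stack.pop()
            for source, target, kind in self.links:
                if (
                    source == current
                    and kind is LinkKind.SOFT
                    and target != piece
                    and target not in seen
                ):
                    seen.append(target)
                    stack.append(target)
        return seen
```

For a theorem with a soft link to a lemma, which in turn has a bi-directional soft link to a definition, `background_reading` returns the lemma and then the definition, and ignores any constituent or indexing links.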

    How exactly a “Concept Map” would map onto a “Ready-made article” is a debatable subject. In some sense, its value is that of the debugging symbols for a binary program. It should greatly improve understanding, but most probably will not turn out to be the skeleton of the final paper.

1.5. References

  1. Ludwig Wittgenstein, Tractatus