Reading and referencing in Emacs Org-Mode.
This document started as Yet Another HOWTO on keeping your references in Emacs and Org-Mode, but I have a feeling that it might grow into something bigger.
Referencing is a big pain for a scientist. It is painful for two reasons.
Firstly, it is a complex task by itself: when preparing an article, a scientist not only needs to consume a lot of relevant material, but also has to filter through a lot of material that is less relevant for the current work, but might turn out to be useful later.
Secondly, people who want to profit from scientists’ work while contributing very little to the ecosystem are trying to use various political, economic, and informational coercive measures to keep scientists restricted in their access to knowledge.
What is “knowledge”? I initially wanted to ask this question as “what is research?” or even “what is a research article?”, but those three seemingly different questions turned out to have the same answer.
To imagine “knowledge”, consider such a popular thing as a “neuron”. Actually, not the real neuron, but the neuron as it is presented in “Machine Learning” courses. It is a “node” with one output and many inputs. If you think about it, it looks very much like a scientific statement, which is a sub-statement of a larger “thought”, and is partitioned into many sub-statements. “Nodes” also have so-called “weak links”, that is, references to other nodes which cannot be described using a “part of” relationship, but are rather “associated”.
1. TODO Body
1.1. Reading list [0/1]
- https://www.emacswiki.org/emacs/CategoryBibliography
- http://kitchingroup.cheme.cmu.edu/blog/2014/05/13/Using-org-ref-for-citations-and-references/
- https://tincman.wordpress.com/2011/01/04/research-paper-management-with-emacs-org-mode-and-reftex/
- http://socialdatablog.com/emacs-org-mode-as-outliner-bibliography-and-citation-manager-working-with-zotero-too.html
- https://www.pogol.net/emacs-org-mode-as-outliner-bibliography-and-citation-manager-working-with-zotero-too
- http://academia.stackexchange.com/questions/1273/use-cases-of-org-mode-as-a-scientific-productivity-tool-for-academics-without-pr
- http://www.draketo.de/english/emacs/writing-papers-in-org-mode-acpd
- https://emacsconf.org/2020/talks/17/
- https://www.epatters.org/wiki/open-science/org-mode.html
- https://wiki.lihebi.com/org.html
- https://lepisma.xyz/wiki/emacs/org-mode/references.html
- https://ogbe.net/emacs/references
- https://rgoswami.me/posts/org-note-workflow/
- https://twitter.com/NPRougier/status/1561947424213573633 org-bib-mode https://github.com/rougier/org-bib-mode
- https://emacs-china.org/t/org-ref-bib/2947
- https://www.mail-archive.com/emacs-orgmode@gnu.org/msg136567.html
- https://unixbhaskar.wordpress.com/2023/04/11/bibliography-management-in-emacs-with-bibtex/
- https://bastibe.de/2014-09-23-org-cite.html
- https://blog.karssen.org/2013/08/22/using-bibtex-from-org-mode/
- https://nickgeorge.net/science/org-ref-setup/
- https://paul-nameless.com/emacs-org-mode-100-books.html
- https://karl-voit.at/2015/12/26/reference-management-with-orgmode/
- https://gitlab.inria.fr/compose/include/compose-bibliography
- https://github.com/jkitchin/org-ref org-ref
- org-helm-ref
- https://irreal.org/blog/?p=8775
- https://cachestocaches.com/2020/3/org-mode-annotated-bibliography/
- https://rebeja.eu/posts/managing-bibliography-using-emacs-org-mode-and-org-ref/
- http://gewhere.github.io/org-bibtex
- https://lists.gnu.org/archive/html/emacs-orgmode/2021-04/msg00445.html
- https://www-public.imtbs-tsp.eu/~berger_o/weblog/2012/03/23/how-to-manage-and-export-bibliographic-notesrefs-in-org-mode/
- https://stackoverflow.com/questions/73790997/emacs-org-mode-latex-export-doesnt-export-bibliography
- https://kristofferbalintona.me/posts/202206141852/
- https://emacs.stackexchange.com/questions/71817/how-to-export-bibliographies-with-org-mode
- https://viveks.info/org-mode-academic-writing-bibliographies-org-ref/
- https://soham.dev/posts/org-bibliography/
- https://orgmode.org/manual/Citations.html
- https://orgmode.org/manual/Bibliography-printing.html
- https://orgmode.org/worg/archive/fireforg.html
- https://lists.gnu.org/archive/html/emacs-orgmode/2007-10/msg00612.html
- https://www.emacswiki.org/emacs/EbibMode
- https://www.emacswiki.org/emacs/BibTeX
- https://vimeo.com/99167082
- https://zettlr.com/readability
- Zotero
- JabRef
- Mendeley
- https://list.orgmode.org/3613.1329506279@alphaville/T/
- https://orgmode.org/worg/org-tutorials/org-latex-export.html#sec-17-1
1.1.1. Fireforg
This seems to be the earliest reference on bibliography in org-mode that I have found. Its most prominent feature seems to be importing bibliographic data from webpages via org-protocol capture.
It generates org-headings in a prescribed file, with bibtex-code entry pasted into the heading body, and the same metadata being saved as org properties of the same headline.
It uses Zotero as a link between Firefox and Bibtex, since, apparently, at that time Firefox did not support xdg-style links “org-protocol://”. Or, maybe, the big idea is that Zotero has some code for automatically extracting some metadata from webpages via its database of shims with different paper databases.
Kind of interesting, if the “ground truth” of your reading list is an org file.
Not developed since 2009, so now is probably only of historic interest.
1.1.2. Emacs Conference talk of 2020
The workflow there is surprisingly similar to Fireforg. Zotero is still used as the tool to manage books and articles, which are then exported through org-roam-bibtex to an org-roam node with bibtex properties encoded as org properties.
I guess they are then “transcluded” using org-transclusion to the final document?
In any case, I consider org-roam’s approach of using a separate SQLite database and mandatory IDs to be fatally flawed. (But I should check, maybe something has happened since my last time looking at org-roam.)
Also, some plugins are mentioned (added to Literature Review), with which I had a terrible experience.
1.1.3. My own old approach, used for ICFP 2020
- Report file
I had a bib file in the directory with the org file for the Report. That file would use an org command
#+bibliography
and I have no idea what it does. I had a manually typed-in command
#+latex_header: \addbibresource{bibliography-bib.bib}
which, for some obscure reason, was exported to TeX like
\addbibresource{/home/lockywolf/full-path-to/bibliography-bib.bib}
and I have no idea why.
I also had two lines at the end of the file:
#+bibliography: bibliography-bib plain limit:t
#+latex: \printbibliography
I have no clue why they were not exported automatically, but it is nice that it is so easy to bodge in org.
I also had to write the following code at the end of the file to get the table of contents:
#+TOC: headlines 3
#+latex: \tableofcontents
The .bib file, at least the one in the directory with the report, was a normal “biblatex” file, with which, I remember, I had a lot of trouble understanding when I have to write braces “{}”, and when parentheses “”.
- Global configuration
The problem with org and latex is that they are intertwined a lot with the daily activities. In principle, building both PDFs and HTML pages should be done in a dedicated Emacs, with no environment effects. However, I don’t believe anybody is actually doing that.
So, in my case, there were three important pieces of setup: org.el, tex.el, and bibtex.el.
tex.el
For speedy entry, I used cdlatex, which I still use, both in plain cdlatex-mode and in org-cdlatex-mode, which itself does not seem controversial.
How did TeX settings influence org settings?
Here is my reftex setup:
(use-package reftex
  :demand t
  :ensure t
  :hook ((LaTeX-mode . turn-on-reftex)
         (LaTeX-mode . turn-on-bib-cite))
  :config
  (require 'bib-cite)
  (setq reftex-plug-into-AUCTeX t)
  (setq reftex-auto-recenter-toc t)
  (setq reftex-revisit-to-follow t)
  (setq reftex-revisit-to-echo t)
  (setq bib-cite-use-reftex-view-crossref t)
  (setq reftex-default-bibliography (list lockywolf/bibtex-bibliography-bib-file))
  (setf reftex-ref-style-default-list '("Default" "Cleveref")))
So, reftex is Emacs’s built-in feature for cross-referencing, which is so amazing that I don’t understand how it works. I did successfully use it from time to time, but forgot everything immediately after leaving the document, so high hopes should be suppressed. However, I think that it will still be used in the new setup, since, you know, it is still in Emacs.
That bib-cite thing is a NIH-style tool which AUCTeX authors created for working with references, but I still do not know whether it is good.
The only thing I remember is that I used reftex-citation (C-c [), and it helped me insert “some” references. There is also reftex-reference, which should work for references the same way reftex-citation works for citations. How did I format them for org-mode?
Repeated note to myself: reftex is not part of AUCTeX.
bibtex.el
In this file I tried to make some sense of Emacs’ bibtex and biblatex support.
So, this is my configuration for bibtex:
(use-package bibtex
  :demand t
  :ensure t
  :config
  (setq bibtex-dialect 'biblatex)
  (setq bibtex-autokey-year-length 4)
  (setq bibtex-autokey-name-year-separator ":")
  (setq bibtex-autokey-year-title-separator ":")
  (setq bibtex-autokey-titleword-length 20)
  (setq bibtex-maintain-sorted-entries t)
  (setq bibtex-biblatex-entry-alist
        (seq-concatenate
         'list
         bibtex-biblatex-entry-alist
         (list '("ArtifactSoftware" "Software Entity"
                 (("author") ("title") ("year" nil nil 0) ("date" nil nil 0))
                 nil
                 (("version") ("note") ("url") ("urldate") ("lastaccessed")))))))
Emacs’ bibtex mode “just works”, except that for some reason I needed to add entries for software into the list of entry types. There is also the “bibtex-utils” package, which I never got around to learning.
I also tried to use Ebib as a bibliography manager, and I have learnt a bit from its ideology. So, the important thing is that Ebib is a display for an aggregate of biblatex-formatted .bib files, showing and sorting according to authors or years of publication.
What is more interesting is that it supports additional fields for notes and PDFs. So it is not just a reading list, it actually understands the need for annotating.
It supports two modes of adding notes: a file and a directory. Well, keeping all notes in the same file, seemingly, only makes sense if you have notes a few sentences long, otherwise your file would grow insanely big.
But keeping all the notes for all PDFs in a single directory also sounds strange. We have directories and symlinks for managing sets in computing. Why would I need to keep all notes in the same directory?
On the other hand, I do keep all TODOs in the same file? But even that is not true. I have at least four TODO files: laptop, laptop-autogenerated, mobile, mobile-autogenerated.
Ok.
Anyway, this has potential for being fun, if “notes” files are actually org-noter files.
org.el
Keys:
(("C-c l" . org-store-link)
 ("C-c L" . org-insert-link-global)
 ("C-c o" . org-open-at-point-global))
(require 'ox-bibtex)
(require 'org-bibtex-extras)
(require 'org-ebib) ; Allows opening ebib links in C-c C-o
(org-link-set-parameters "cite" :follow 'org-ebib-open)
(org-mode . (lambda ()
              (setf reftex-cite-format
                    '((?o . "cite:%l")
                      (?h . "\\cite{%l}")))))
So, ox-bibtex is in org-contrib, and I suppose, is obsolete now? It implements that
#+BIBLIOGRAPHY: /home/user/Literature/foo.bib plain option:-d
code, which only includes the:
\bibliographystyle{plain}
\bibliography{foo}
and converts all cite:foo to \cite{foo}.
That reftex customisation is quite important actually. It is used to query a bib file for keys to be used in citations. In this setup reftex is not used for references or index, only for citations. (Right?)
1.1.4. Other thoughts.
The important thing here is the difference between ox-bibtex and ol-bibtex.
They are not the same thing.
ox-bibtex defines a cite:foo link, for using citations in org documents.
ol-bibtex does import-export to and from actual bibtex text files.
I am not sure how ol-bibtex links are exported when exporting to html, I need to check how it works when reviewing packages independently.
But the main use for this package is, seemingly, to keep a list of papers in org, and export into bib files when assembling a paper.
I suspect that one would make an org-file with a “reading list” for a project, populate it with books as they appear on the horizon.
Those books might be pointing toward, say, org-noter files with review?
1.1.5. http://gewhere.github.io/org-bibtex
The idea here is, seemingly, to have “ground truth” in org files, not in bibtex files. Each heading is a book, which has bibliographic data recorded as org properties. Importing and exporting is done semi-manually, in the sense that you can export the required headings into a bib file, which, presumably, you would only do when compiling a paper, but it does not seem to be possible to set an org file as a bibliographic database with exports organised completely automatically.
Importing stuff into org is also done semi-automatically. The code will help you to yank a piece of biblatex as an org-heading, but not much more.
Citation exports seem to work using ox-bibtex, that is, using the cite: format.
The format using “ordinary links” is not yet mentioned.
1.1.6. http://www.draketo.de/english/emacs/writing-papers-in-org-mode-acpd
This is a howto by Arne Babenhauserheide.
He is using a specific LaTeX style, which he is adding to org-latex-classes, as well as a few custom packages.
He is explicitly loading reftex-cite only, not full reftex, but it is clear that he is going to use reftex for citation inclusion.
He is using org-mode-reftex-search, which is an old function found on the org-mode mailing list, https://list.orgmode.org/3613.1329506279@alphaville/T/.
This function, if I am not mistaken, should make a jump to the notes for an original cited document.
The code is missing, I guess, Arne copied it from the mailing list into his .init.el.
It is interesting that there we are encountering the concept of “notes file” once again.
Also, it is interesting that he is using minted for org code listings, which, in turn, uses Pygments.
I never bothered to make them work, and nowadays there is a whole new machinery for org, called engraved, which should make it possible to colourise both latex, and html, using Emacs means only.
Another useful thing that this HOWTO has is #+BIND: variable value syntax.
It lets one override some variables for exports, which is especially useful when there is no #+KEYWORD: syntax for this variable, and when using file-local variables is imperfect, such as when you need different values for editing and for exporting the file.
(You also need to set org-export-allow-bind-keywords to t.)
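A minimal sketch of the #+BIND mechanism described above. The variable bound in the example (org-latex-title-command) and its value are my own illustration, not something taken from Arne's howto.

```elisp
;; Allow "#+BIND: variable value" lines to take effect during export.
;; The default is nil, because a BIND line can set arbitrary variables,
;; which is a potential security risk for files you did not write.
(setq org-export-allow-bind-keywords t)

;; Hypothetical use inside an org file (shown as a comment, because it
;; is org syntax rather than Emacs Lisp):
;;
;;   #+BIND: org-latex-title-command ""
;;
;; During export only, `org-latex-title-command' is bound to the empty
;; string for this file; the value used while editing stays untouched.
```

The value after the variable name is read as a Lisp expression, so strings need their quotes.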
1.1.7. https://orgmode.org/worg/org-tutorials/org-latex-export.html
In section https://orgmode.org/worg/org-tutorials/org-latex-export.html#sec-17-1, there is an interesting trick on how to make references (not citations) work:
(setf org-export-latex-hyperref-format "\\ref{%s}")
will make intra-document references work correctly.
Otherwise, they define a strategy that is basically like ox-bibtex, defining custom links for each citation type.
1.1.8. https://lepisma.xyz/wiki/emacs/org-mode/references.html
What I have learnt from his article:
- He also uses Zotero for article management. I guess, I need to study it once more. He uses the same BetterBibTex that https://emacsconf.org/2020/talks/17/ is recommending.
- He faithfully uses org-ref (makes me doubtful)
- He has a single file with Bibtex metadata in PROPERTIES, and with annotations in bodies.
I guess, things are imported there using ol-bibtex. How exactly that page and the bib file are kept in sync I do not know.
He has three blogs on his site (blog, journal, log) and a “wiki”.
- Blog is for “longreads”
- Journal is for shitpost
- Log is for habit tracking
- and there is also what I call “howtos”, that he puts in a “wiki”.
This seems more complicated than my setup of just two categories, “notes” and “howtos”. I have been trying for a very long time to switch from paper to keeping my records in a journal, but always failed.
1.1.9. TODO https://ogbe.net/emacs/references
This is Dennis Ogbe’s setup. He is using ebib+helm-bibtex+org-ref.
The interesting bits in his setup are the following:
- He uses multiple .bib files merged into a single database.
(defun do.refs/update-db-file-list ()
  "Update the list of bib files."
  (interactive)
  (let ((db-list (do.refs/get-db-file-list)))
    (setq reftex-default-bibliography db-list)
    (setq bibtex-completion-bibliography db-list)
    (setq ebib-preload-bib-files db-list)))
So, his “ground truth” is still biblatex, but he has a way to group “Points of Knowledge” into categories by different bib files.
- He has separate dirs for PDFs, and notes.
(defvar do.refs/db-dirs nil
  "A list of paths to directories containing all my bibtex databases")
(defvar do.refs/pdf-dir nil
  "The file for the entry with key <key> is stored as <key>.pdf")
(defvar do.refs/notes-dir nil
  "The note for the item with key <key> is stored as <key>.org")
(defvar do.refs/pdf-download-dir nil
  "The path to directory we download PDF files.")
Some things I immediately dislike here.
Firstly, rigid notation for naming PDF files.
I like calling my files with full names.
In general, keeping as much info as possible in file names is good.
For example, I have PDF files on my drive with names like 2023-09-01_Various-Authors_GNU-Maxima-manual-for-version-5.47.11_2023.pdf, and I like it this way.
Moreover, I kind of like it where it is, I do not want to specifically put it somewhere.
- He uses autokeying.
And I do not like the default authoryear autokeying of bibtex.el, because it is too easy to forget what jackson2001 means.
I want my keys to be full names with dates: various2023maximaManualForVersion5.47.11.
Since I am going to use some automated machinery to cite those papers, long keys should not matter.
As a side-note, there should be a way to make autokeying better:
(setq bibtex-autokey-year-length 4)
(setq bibtex-autokey-titleword-separator "-")
(setq bibtex-autokey-name-year-separator "-")
(setq bibtex-autokey-year-title-separator "-")
(setq bibtex-autokey-titleword-length 16)
(setq bibtex-autokey-titlewords 8)
- But I like that I am, once again, seeing a repeated pattern: PDFs, notes, bib-database.
There is one more interesting bit there:
(defun do.refs/ebib-add-annotated (arg)
  "Advice for `ebib-import-file' that automatically creates a copy of the imported file that will be used for annotation."
  (interactive "P")
  (let ((filename (ebib-get-field-value "file" (ebib--get-key-at-point) ebib--cur-db 'noerror 'unbraced)))
    (when filename
      (let* ((pdf-path (file-name-as-directory (car ebib-file-search-dirs)))
             (orig-path (concat pdf-path filename))
             (annot-path (concat pdf-path
                                 (file-name-sans-extension filename)
                                 "-annotated"
                                 (file-name-extension filename t))))
        (unless (file-writable-p annot-path)
          (error "[Ebib] [Dennis] Cannot write file %s" annot-path))
        (copy-file orig-path annot-path)))))
;; add the above after the original call is done.
(unless (and (boundp 'do.refs/add-annotated)
             (not do.refs/add-annotated))
  (advice-add #'ebib-import-file :after #'do.refs/ebib-add-annotated))
See! He also has a file which will be used for annotations only.
- The rest of his ebib configuration is fairly straightforward. Note that he is using the “note” field, but not the “annotation” field. (But maybe that is an old version of ebib?)
- He uses bibtex-completion. While bibtex-completion deserves its own chapter, I need to instantly write down something here:
(setq bibtex-completion-find-additional-pdfs t)
This snippet means that apart from key.pdf, bibtex-completion will also consider PDF files named key-*.pdf for completion.
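For context, here is a minimal sketch of how that setting sits next to the PDF directory setting. The directory is a hypothetical placeholder, not part of Dennis’s configuration.

```elisp
;; Hypothetical directory in which bibtex-completion looks for PDFs.
(setq bibtex-completion-library-path '("~/Literature/pdfs/"))

;; With this set, files whose names start with the BibTeX key, such as
;; KEY-annotated.pdf, are offered in addition to KEY.pdf.
(setq bibtex-completion-find-additional-pdfs t)
```

As far as I recall from the variable's documentation, an entry is still only marked as “having a PDF” when the plain KEY.pdf exists, so an annotated copy alone may not be enough.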
- He also uses org-ref, and from his code I did not understand how!
1.2. Literature review [0/6]
- Org-Mode Official Manual
- Ebib Official Manual
- Fireforg
- ox-bibtex
- ol-bibtex, which used to be called org-bibtex http://gewhere.github.io/org-bibtex
- helm-bibtex/ivy-bibtex/bibtex-completion
- org-bibtex-extras
- ebib
- org-ebib
- citar
- reftex (Reftex Manual)
- reftex-cite
- amsreftex
- org-inlinetask
- org-transclusion (ELPA)
- org-roam
- org-sidebar
- zotero
- better-bibtex
- zotfile
- Qiqqa
- Mendeley
- JabRef
- org-roam-bibtex
- org-pdf-scrapper
- Bibus (a mysql-based bibliography manager)
- refdb-mode
- citeproc.el
- citeproc-org.el
1.2.1. Bibus
I am not convinced of the benefits of using MySQL for bibliography management.
1.2.2. refdb-mode
RefDB is a standard used by libraries to exchange bibliographic data. After a brief skim, and noticing that the most recent version is from 2008, I suspect that unless you are really running a library, it is not worth using.
1.2.3. Pure Bibtex / Biblatex
Okay, now we are getting somewhere.
Pure biblatex files are weirdly formatted database files with entries for “papers”, which are called “entries”. I am tempted to call them “entities”, because why would not I add there any kind of vaguely related stuff, such as theorems?
Let us see some examples:
- Boring fields
@Book{Metcalf_2018_fortran,
  author = {Michael Metcalf and John Reid and Malcolm Cohen},
  title = {Modern Fortran Explained},
  year = 2018,
  month = 10,
  doi = {10.1093/oso/9780198811893.001.0001},
  url = {http://dx.doi.org/10.1093/oso/9780198811893.001.0001},
  isbn = 9780198811893,
  journal = {Oxford Scholarship Online},
  publisher = {Oxford University Press}
}
Most stuff here is fairly straightforward, except using braces to delimit phrases with spaces. This is a special property of Biblatex (as opposed to Bibtex). But never ever use old bibtex, it is just outdated.
Now let us make a fancier example.
- Interactive fields
@Article{testauthor1000,
  author = {},
  title = "Test Article",
  year = 1000,
  DOI = "1.1/5.86",
  file = "Full Text:testauthor.pdf:PDF",
  URL = "https://doi.org/1.1/5.86",
  crossref = "DBLP:conf/testconf/1000",
  timestamp = "Tue, 06 Nov 1000 16:59:25 +0000",
  biburl = "https://dblp.org/rec/conf/testconf/Author.bib",
  bibsource = "dblp computer science bibliography, https://dblp.org",
  xdata = {},
  note = {},
  annotation = {},
  abstract = {},
  keywords = {}
}
This example is more interesting, because it has some interactive fields.
So, the file field is fairly easy, it is just the path to the PDF of the article. What are note, annotation, abstract, xdata, keywords?
keywords
Okay, keywords should be used for tags, I guess? Where do I get those tags? Surely they can’t be coming from bibsources?
xdata
Is not xdata a link to the external piece of data?
note, annotation, and abstract
ebib people believe that annotation is a long-ish text, basically what I consider to be “reverse-engineering”, however, putting one into a biblatex field sounds insane. Maybe it should be a path to an annotation file?
And what is note? http://bibtex.com is claiming that note is used for “various remarks”.
external note
External note is a pseudo-header created by ebib, to keep a full-fledged file with notes. This seems important, because I want to reverse-engineer poorly-written PDFs into something readable, so this “external note” is where potentially org-noter could go.
- @String and @Preamble
@String is a special syntax for abbreviations, used like @string{DEK = {Donald E. Knuth}}.
@Preamble is a special syntax that is prepended to the bibliography when used with latex, and might not be too useful for our purposes. Example: @preamble { "Maintained by " # maintainer }.
- Extensions to Emacs support for biblatex.
There are quite a few. Too many to fit into the margins. However, aside from those which clean up files, it is too early to think about them, until a working pipeline is established.
- Using Biblatex as ground truth
The question is: “do you want to use biblatex as ground truth for your readings?”.
Ebib seems to imply that. Use a bib file as a repository of all the papers and books you ever encounter. Convert them to PDF, and annotate with org-noter.
How would you attach PDF files? It is certainly possible to write paths to them in ebib, but renaming them would make those paths invalid.
How would you mark read/unread files? Why do I have to keep all external notes in a single directory?
All of that does not sound too promising.
Anyway, let us go on.
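To make the “biblatex as ground truth” layout concrete, here is a minimal sketch of the Ebib side. All paths and the file name are hypothetical placeholders; only the variable names come from Ebib.

```elisp
;; Where Ebib looks for .bib files given as relative names.
(setq ebib-bib-search-dirs '("~/Literature/bib/"))
;; Databases opened automatically when Ebib starts.
(setq ebib-preload-bib-files '("main.bib"))
;; Directories searched for the files named in the "file" field,
;; so entries can hold relative names instead of absolute paths.
(setq ebib-file-search-dirs '("~/Literature/pdfs/"))
;; Directory for the per-entry "external note" files.
(setq ebib-notes-directory "~/Literature/notes/")
```

Keeping only relative names in the file field softens the renaming problem a little: moving the whole PDF directory then needs one changed variable, though renaming an individual PDF still breaks its entry.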
1.2.4. org-inlinetodos
Inlinetodos are added with C-c C-x t, and are really nice.
It is good that I watched that video.
1.2.5. reftex
Reftex is built into Emacs, and is a mode for managing references, citations, labels, and index in LaTeX.
Let us see if it can be abused for helping us do research in org-mode.
Seemingly, it assumes that you are editing a LaTeX project, and it can index files in that project, look for labels, and offer them for auto-completion, as well as lookup citations in the bib file.
In order for that to work, we need to agree on a certain pattern on using labels, and on what goes into the bib file.
As a side-note, the previous paper I did in org-mode didn’t have any labels.
C-c ( creates a label.
C-c ) lets you choose a label, and inserts it.
C-c [ inserts a citation with a key from a bib file.
I believe, there are some customisation variables in Emacs, which would let us insert org targets, links, and, for example, cite:bla links, although I am not sure it is the best way to do it.
Reftex has features for quick navigation within the project, C-c =, which in org can be partially substituted by (1) imenu, (2) collapsing org trees with TAB, (3) grep and isearch.
Reftex has support for index. Use C-c < to index an entry.
I think that indexing is quite underused in 2023. I have never seen scientific papers include an index, and I seldom use the index in a PDF book if I can search instead. However, I think that an index, or, rather, a glossary, is the tool of data organisation that can best be used to establish cross-document relationships. As an example, let us assume someone is studying a certain subject, or, more realistically, tries to break through a difficult paper, and is trying to build the solid foundation needed to understand it. One would necessarily read several books on the same subject, as well as on subjects building on top of previous subjects, and establish inferential links.
A book has a natural representation as a graph with headings, paragraphs, and theorems as nodes. However, wandering over many non-isomorphic graphs is stressful for the brain. Marking certain places as “interesting” would help to establish relationships between the graphs, and a certain “concept” node could serve as a point of link concentration for them. This is why I am talking about a glossary (with bodies), not just an index.
- Jump to definition.
Okay, in a LaTeX document, there might be more than one interpretation of what is “jump to definition” that programmers are so used to.
Firstly I will mention one that is not that frequently thought of by programmers: jumping to a word definition in a dictionary, or a translation in a foreign language dictionary. So far I do not know whether such a feature exists in Emacs, let alone reftex.
Reftex can show references of labels in the document itself, using C-c &.
I wonder, why does it not just directly plug into the Emacs xref framework? The one called with C-. and C-S-.?
- Cooperating with Org
reftex has a variable called reftex-default-bibliography, which should be pointing to a bibliography (say, also pointed to by ebib?), from which you might use citations in non-LaTeX buffers. It is probably required to customise reftex’s reftex-cite-format, or something like that. I actually did that for an “older” setup.
Can I use it in the “newer” setup with labels too?
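A minimal sketch of the cooperation described above, under two assumptions of mine: the bibliography path is a hypothetical placeholder, and the helper function name is invented for illustration. The format alist itself mirrors the one from my older org.el, with one extra entry for the newer [cite:@key] syntax.

```elisp
(require 'reftex)

;; Hypothetical path; reftex falls back to this list when the buffer
;; has no \bibliography{...} statement, as is the case in org files.
(setq reftex-default-bibliography '("~/Literature/main.bib"))

(defun my/org-reftex-cite-setup ()
  "Make `reftex-citation' insert org-style citations in this buffer."
  (setq-local reftex-cite-format
              '((?o . "cite:%l")        ; ox-bibtex style link
                (?h . "\\cite{%l}")     ; raw LaTeX citation
                ;; Assumption: only sensible for a single key, since
                ;; several keys are joined with commas, not with ";@".
                (?c . "[cite:@%l]"))))

(add-hook 'org-mode-hook #'my/org-reftex-cite-setup)
```

After this, M-x reftex-citation in an org buffer should ask for a format character and then for a search string. I left the key binding out on purpose, because C-c [ is already taken in org-mode.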
1.2.6. ol-bibtex / org-bibtex
So, the idea of ol-bibtex is that you keep a reading list, together with bibtex metadata, in an org file.
The nice thing here is that you can
- use org’s search machinery to filter headings by properties and tags,
- group books/articles by subtrees, say:
* Differential Equations
** Paper 1 :interesting:
** Paper 2 :boring:
* Measure Theory
** Paper 3 :interesting:
** Paper 4 :boring:
Not sure how org-bibtex machinery will work with tree-like headings.
org-bibtex does not seem to export org tags as a “keywords” field in the resulting bib file.
(There seems to be a parameter for that: org-bibtex-tags-are-keywords.)
Bodies are exported neither as a “note” field, nor as an “annotation” field.
In general, I am doubtful that having all annotations for all papers in an org file is practical.
On the other hand, org-noter seems to integrate with this approach not too badly, as it expects a heading to keep all annotations for a file.
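A minimal sketch of the tags-as-keywords setting mentioned above. The heading in the comment is a made-up example, and the tag name “noexport” is only an illustration.

```elisp
(require 'ol-bibtex)

;; Write org tags into the "keywords" field on export, and turn
;; keywords back into tags on import.
(setq org-bibtex-tags-are-keywords t)
;; Tags that should stay in org and not end up among the keywords.
(setq org-bibtex-no-export-tags '("noexport"))

;; Hypothetical heading that org-bibtex can export (org syntax, hence
;; shown as a comment):
;;
;;   ** Paper 1                                  :interesting:
;;      :PROPERTIES:
;;      :BTYPE:     article
;;      :CUSTOM_ID: doe2020paper
;;      :AUTHOR:    Jane Doe
;;      :JOURNAL:   Some Journal
;;      :YEAR:      2020
;;      :END:
```

With such headings in the buffer, M-x org-bibtex asks for a file name and writes all entries of the current org file into it.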
- TODO Understand org-bibtex-key-property
- END
There is some magic for “global links”, but I am not sure how it works.
org-bibtex also allows capturing links to bib files, with nice-ish navigation.
Schulte et al. 2012
Again, I am not sure how I would use this, since, it seems, org-bibtex’s approach is to keep everything in org files, and only export things to bib when necessary. But, I guess, this can be useful when you are reading someone’s paper, say, from Arxiv, in the tex source, and it includes a bib file.
There is a function (org-bibtex-search) that I have not yet fully understood, which searches for “bibliographical entries” in agenda files. I guess, again, the workflow that this package is suggesting is to use an org file for each new project, list the required reading in those files, add those files to the agenda file list, and then, when there is a dedicated “reading time”, search for “bibliographic entries” in the agenda?
1.2.7. TODO org-bibtex-extras.el
Okay, I still have not understood what this package does. It should somehow improve org exports to html, but I am still not sure how exactly.
1.2.8. ox-bibtex
This is a very straightforward package.
It defines org links in the format of cite:key, which are later formatted either as html links, or as latex citations.
You can also add the #+bibliography: plain t code piece, which will either use latex machinery to format bibliography, or use bibtex2html.
I have to stress, it expects a bibtex-file bibliography, not an org-file.
Perhaps, if we want to keep all our “ground truth” as org-files, we can add a hook to convert those files into bib files with org-bibtex.
Also, maybe, even a three-stage process might be useful: keep a bibliography in a file, use org-transclusion to only include the required bibliographic entries into an org file, then export this file into a bib file, and then org-export will be able to build both an html and a pdf from it.
1.2.9. Org-Mode’s own citation machinery, oc.el
This section comes from org#Citation handling. This machinery in Org is relatively new, circa 2021.
So, firstly, we are getting one more NIH way of adding citations.
That is, instead of relying on standard Emacs’ reftex, org developers introduced yet another key combo to insert citations, C-c C-x @.
Why exactly is beyond comprehension, since reftex was written by Carsten Dominik, the guy who wrote the actual org-mode.
It surely would have been much easier to update reftex, rather than write one more extra system doing the same thing.
It is looking at a file pointed to by #+bibliography: File.bib, which is at least a little consistent with the older ways.
Citations look like [cite:@key] links, and why exactly they need the @ symbol, I do not know, but in this case it works as expected, as we can keep our older ox-bibtex’s cite:key links without opening function conflicts.
C-c C-o works, and opens an entry in File.bib.
oc.el supports some kind of “citation styles”, which are, I guess, useful to some people.
Exporting citations is handled by the keyword #+cite_export: basic author author-year.
Using the csl processor makes the exporter format everything manually, both in html, which is, I guess, okay, or, at least, I have not found it to be much worse than bibtex2html, but in LaTeX it looks super weird.
Writing #+cite_export: biblatex makes more sense, at least it is writing out \addbibresource{File.bib} and replacing citations with \autocite{key}.
Exports to html, however, are not supported, and the exporter just prints LaTeX commands in place of proper references.
I guess, it might not be too hard to patch it to do basically what ox-bibtex did in the past, exporting via bibtex2html.
Note (!) export to html, conflicts with ox-bibtex.
If your config still includes ox-bibtex, you might want to remove it, or somehow tweak and debug.
So, in general, I have mixed feelings about all this new citations machinery in org. It is good that there is now a system with pluggable backends, but so far, documentation is lacking, and making it do what you expect, is not straightforward.
In particular, I wanted to write a disappointed comment here, but, looking in the source code for oc.el, I found the variable org-cite-export-processors.
This variable does not do what you expect it to do from the name.
You might actually want to set it to something like this:
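;; Choose a citation export processor per export backend.
;; The list must be quoted, otherwise Emacs tries to call it as a function.
(setf org-cite-export-processors '((beamer natbib)
                                   (latex biblatex)
                                   (t csl)))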
Which means that beamer exports will handle citations with natbib, latex (and therefore pdf) exports will use biblatex, and html and all the seldom used formats will use csl.
Looks almost as good as the “old” approach with ox-bibtex and bibtex2html.
A backend for org-bibtex, seemingly, still needs to be written.
I can understand why they rejected reftex.
After all, reftex is bibtex-only.
What I do not understand is why they introduced a new kind of link, inserted with a separate keybinding, C-c C-x @, rather than just reusing org-link machinery.
For example, there could be a cite@: link, and, depending on the value of #+bibliography:, C-c C-l would insert a citation from that database, using normal tab completion.
But anyway, it seems that with the new version, it should be possible to rewrite the old citation mechanism using just oc.el.
This would make bibtex2html not needed, as well as ox-bibtex.
1.2.10. TODO bibtex-completion
bibtex-completion lives here: https://github.com/tmalsburg/helm-bibtex
This entry is not finished, but before I make any progress with the review, I need to understand what the author means by:
Org-bibtex users can also specify org-mode bibliography files, in which case it will be assumed that a BibTeX file exists with the same name and extension bib instead of org. If the bib file has a different name, use a cons cell
("orgfile.org" . "bibfile.bib")
instead:
Really? org-bibtex? Not ol-bibtex?
I asked a question: https://github.com/tmalsburg/helm-bibtex/issues/438
Apparently, bibtex-completion is neither bibtex, nor completion.
It is an abuse of the completion framework to be used as a GUI to the bibliographic database.
1.2.11. TODO ebib
I will try to avoid copying all of Ebib’s manual. In this subtree I will try to outline the main scenarios for it.
1.2.12. org-ebib
Small, simple, straightforward.
Do (org-link-set-parameters "cite" :follow 'org-ebib-open) to make cite: links open in ebib.
Nice if you use ebib.
1.2.13. TODO evince
1.3. TODO Concepts
1.3.1. Hypertext
1.3.3. Capturing
1.3.4. Tracking reading
1.3.5. Keywords
1.3.6. Search
1.3.8. Annotation
1.3.11. Indexing
1.3.12. Mathematics
1.3.13. Graphics
1.3.14. Animation
1.3.15. Table of contents
1.3.16. Spell checking.
1.3.17. Text Highlighting
1.3.18. Sticky Notes
1.4. Use-Cases and Use-Scenarios
1.4.1. A piece of knowledge (document), and its branches
After reviewing 17 out of 50 documents, I started to get some thoughts about what a research system should do.
I am tempted to say that a basic unit of the system is an “article”. An article can have several forms:
- HTML
- TeX
- Semantic markup, such as org or markdown, which is the most useful option.
The interesting thing is that these kinds of content can be transformed one into another. Semantic markup can be freely compiled into PDF or HTML, and PDF can be converted into Semantic markup using OCR (mathpix.com does amazing things).
We need a system which can track at least those three kinds of content.
But that is not enough.
In addition to “different presentations”, a piece of knowledge has metadata. Supposedly, we can deal with biblatex’s fields to “describe” a document exhaustively.
Among the metadata, at least “annotation” is a very useful field which needs to be written for each processed paper. You might call it “lightly embedded” into the brain, because often the depth needed to write a decent annotation is still not profound enough to understand all of the paper. You can speculate that an “annotation” is what “Mathematical Reviews” or “ZbMATH” are doing. I guess, if a written annotation is not to be openly published, you can just type in a URL into the annotation field.
- Most papers are incomprehensible and what to do with it
One important thing to note about modern science is that most of the papers are written either to pass irrelevant review tests, or at the writers’ own pleasure with no quality control. Therefore we can safely assume that most of the papers are garbage.
This characterisation is not to denigrate the work that has been invested into them, as the authors are playing by extant rules. But this means that almost no paper is ready to be consumed as a good software library, with a well-defined interface and layered design.
Reading papers is essentially like reverse-engineering binaries. Those were written for the machine, not for you. And therefore we need to use tools that are frequently seen in binary analysis and bytecode debugging.
- Instrumentation
Admittedly, papers are slightly better than binary, they are, after all, written in a human language, so the decompilation part can be skipped. But we need a thing that in dynamic languages is called “instrumentation”.
In fact, there is nothing new in instrumentation applied to texts. It is called “interlineation”, and consists of inserting text in-between the lines of the text that is being studied. When most studies were in the humanities, especially theology-related ones, this seemed perfectly natural to people, but for some reason nowadays people seem to have largely forgotten this approach.
- How to do interlineation if source is available?
That is already a big question. Even if we have LaTeX source, this is not a trivial task. While having LaTeX source lets us edit the document at will, we cannot:
- Throw away the old document (as it may have links to it).
- Blindly write text in between the sentences of the original document, as it might break indexing and page navigation, and if not made somehow visually different from the old text, might confuse the reader.
As a quick-and-dirty approach, I have just defined an environment in LaTeX, which displays its contents in grey.
\begin{mycomment} Interlineation. \end{mycomment}
This is not a very good approach though, as it does not work cleanly with LaTeX’s paragraphs.
A better approach is to enumerate all thoughts in a document, giving each thought a separate numbered clause, and writing the explanation for it in the clause body below the clause text. (Yes, I know, it is a little messy to distinguish a “clause body” from a “clause text”, but I have no better wording.) See some thoughts on this subject here: https://gitlab.com/Lockywolf/study_notes/-/tree/master/2023-07-02_numbered-well-structured-LaTeX/2023-04-11_improvised-method
In this write-up I do not want to spend a lot of effort on describing how to transform a bad paper first into an interlineated paper, and later into a good paper. For this I have a separate article, which is not yet finished: How to write papers in LaTeX.
- How to do interlineation if source is not available?
That is an even bigger issue, is it not?
I am giving the following pairwise incomparable options:
1. Reverse-engineer your paper with mathpix or other OCR.
2. Use org-noter to attach annotations to certain pieces of the document,
   2.1. and leave it as-is
   2.2. and burn-in the notes as PDF sticky notes or highlighted text
   2.3. and burn-in the notes as actual interlineary text into the pages, increasing page sizes to be greater than A4
3. Reverse engineer the paper into a set of image tiles, possibly on the basis of intensity analysis of the lines of text, and typeset your annotations between the tiles, thus keeping the A4 size, but potentially losing some of the page navigation.
So far, solutions implementing options 2.2, 2.3, and 3 are unknown to me. Options 1 and 2.1 are incomparable, because they require an incomparable amount of work. Option 1 is far more flexible, but option 2 allows you to start annotating right away.
- Annotating HTML
This is an interesting use-case. I have not seen papers written in HTML originally, with the exception of SRFI documents of the Scheme Community Process. HTML opens a lot of opportunities for annotation which are better than those of TeX, such as text expandable on click (which is much better than text-on-hover, or text-on-sticky-notes). Still, there probably will be a need for at least three versions of the paper: original, annotated, and improved.
- Why instrumentation is not a good answer
For the same reason Richard Stallman started the Free Software movement.
Wasting time on reverse-engineering computer games and device drivers, even though it is also stupid, at least has some motivation behind it, after all, computers run binary.
There is no reason why articles, especially those which are published as TeX on Arxiv, or those which are published at the author’s expense under the Open Access model, should be set in stone once a “release” is done.
Articles should follow the software development model, with pull-requests, patch review, automatic testing for consistency, and a set of guidelines on what is an API/ABI breakage and versioning.
Moreover, retracting a paper should not merely be a stamp of disapproval, but a peer-reviewed patch, which highlights exactly the place where there is a flaw, with the typology of the flaw indicated, so that automatic search for similarly-flawed articles can be conducted.
- Summing up this section.
When making a database of “pieces of knowledge”, we need an entry to have at least the following fields or field groups:
- Bibliographic metadata
- Original PDF/HTML
- Original TeX (empty unless Arxiv)
- Reverse-engineered TeX
- Annotated TeX/HTML
- org-noter notes
- Annotated PDF (with burned-in notes)
- Improved PDF/HTML
- Semantic Version (org, or sTeX)
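The list above can be sketched as a record type. This is only an illustration of the shape of one database entry, not a proposal for the actual storage format; the struct name my/knowledge-piece, its slot names, and the helper my/knowledge-piece-missing are all made up for this example.

```emacs-lisp
(require 'cl-lib)

;; Hypothetical record type; the slots mirror the list above.
(cl-defstruct (my/knowledge-piece (:constructor my/knowledge-piece-create))
  bibliography      ; alist of biblatex fields, e.g. ((title . "...") (year . "..."))
  original          ; path to the original PDF/HTML
  original-tex      ; path to the original TeX, nil unless available
  reversed-tex      ; path to the reverse-engineered TeX
  annotated-source  ; path to the annotated TeX/HTML
  noter-notes       ; path to the org-noter notes file
  annotated-pdf     ; path to the PDF with burned-in notes
  improved          ; path to the improved PDF/HTML
  semantic)         ; path to the semantic version (org or sTeX)

(defun my/knowledge-piece-missing (piece)
  "Return the names of the file slots of PIECE that are still empty."
  (cl-loop for slot in '(original original-tex reversed-tex annotated-source
                         noter-notes annotated-pdf improved semantic)
           unless (cl-struct-slot-value 'my/knowledge-piece slot piece)
           collect slot))
```

For a freshly captured paper, (my/knowledge-piece-missing (my/knowledge-piece-create :original "paper.pdf")) lists everything that still has to be produced, which is roughly the “reading progress” of that entry.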
1.4.2. Linking pieces of knowledge
If we have a database of “articles”, a database of pieces of knowledge, we, quite naturally, might want to interlink them.
This would be mimicking the Web, or human (or, rather, artificial) brain, or some other semantic network.
- Dependent articles
Sometimes articles are released as “version 2.0”, and books quite often get a “Second Edition”. Another example of a dependent article is a solution book for a problem book, or a conference presentation for a paper.
- Bibliographic references
That is what bibtex was originally for. If you have tex sources for many articles, with bib files included, you can draw a network of citations. I am not sure how exactly you would do that for articles for which bib files are not available, which is the case for most articles other than Arxiv ones, so the usefulness of this feature is dubious.
What I do want, however, is to be able to cite articles from the database using a hotkey, similar to reftex, and assemble a bib file for later upload to Arxiv.
- Reading lists
Reading lists quite naturally go hand in hand with the concept of a “project”. What is a “project”? It is hard to define a project precisely, but for theorists and for humanities scholars, a “project” is most likely to include a set of books or articles to read, and a set of claims to prove or discursively argue for or against. (For experimental disciplines things are more involved.)
From the paragraph above, it is already quite visible that Org-mode is quite naturally mapping the concept of a project.
When you have a project, say, you want to prove a certain theorem in Engineering Communications Theory (imaginary field), you might want to grind through a set of articles studying this field, which are usually on Arxiv, so you can annotate them in-place, and more importantly, place indexing markers in some interesting places.
Very often you will not be able to understand some theorems from a paper without background reading, so very soon you will, quite naturally, arrive at a graph of concepts. (I am not sure whether it can be called a “knowledge graph”, as I have seen that term used to describe a specific thing.) A theorem from a paper would require some (linked) reading to be understood. That “linked reading” would be in some other paper or book. If that book is not available as source, linking is likely to be done to the annotation file, or an annotated pdf.
So, a “project” will be a “concept graph”, which will be referring to the concepts of the underlying papers/books somehow. Making this graph is, seemingly, much easier than making a bibliographic citation graph, because, even if you have zero metadata about the paper or book you are reading, you are very likely to read through at least the table of contents, and re-coding the table of contents into a file is negligible in time, compared to the time needed to understand the concepts themselves.
Aha! I have mentioned something without explicitly saying. A Table of Contents is one of the most natural ways of breaking a paper into a skeleton, similar to org-mode’s outline. See the next paragraph.
- Concept maps
So, I have mentioned a few ways for grinding through scientific material, which eventually should lead to the creation of a new piece of knowledge.
A “project” is a set of articles to read, and a set of concepts to define. Ideas for new concepts arise from consumed articles, and the need to read more articles arises from the need to understand concepts, from reading an “incoming” list, and from citations by other articles.
When we want to visualise what is going on, we will quite naturally see three kinds of links between “Pieces of Knowledge”. (I am abusing notation here. From now on, a “Piece of Knowledge” is not just an article, it may be any piece of text that deserves independent study, for example, a chapter, or a section.)
These three kinds of links are:
- Constituent links: a chapter is linked to its sections. Unidirectional.
- Soft links: a theorem requires some background knowledge to be understood. “To understand this statement, I needed to read that place in that book”. Might be bi-directional, for example, if a theorem is described in two places, and understanding it might require reading both explanations. (See Scheme’s letrec.)
- Indexing links: Two “Pieces of Knowledge” are describing the same concept, but I did not actually need to read one to understand the other.
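The three kinds of links can be sketched as a plain list of typed, directed edges. This is a toy model under my own assumptions: the variable my/concept-links, the function my/concept-links-from, and all the piece names are invented for the example, and a bi-directional soft link is simply stored as two entries.

```emacs-lisp
(require 'cl-lib)

;; Each link is a list (KIND SOURCE TARGET); the piece names are made up.
(defvar my/concept-links
  '((constituent "Chapter 1" "Section 1.1")
    (soft "Theorem 3" "Book A, p. 42")
    (soft "Book A, p. 42" "Theorem 3")
    (indexing "Section 1.1" "Paper B, Sec. 2"))
  "Example concept map as a list of typed, directed links.")

(defun my/concept-links-from (piece kind)
  "Return the targets of all links of KIND that start at PIECE."
  (cl-loop for (k src dst) in my/concept-links
           when (and (eq k kind) (equal src piece))
           collect dst))
```

With this representation, (my/concept-links-from "Theorem 3" 'soft) answers the question “what did I need to read to understand this statement?”.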
How exactly a “Concept Map” would map onto a “Ready-made article” is a debatable subject. In some sense, its value is that of the debugging symbols for a binary program. It should greatly improve understanding, but most probably will not happen to be the skeleton of the final paper.
1.5. References
- Ludwig Wittgenstein, Tractatus