How to create a decent file system structure and not become insane.

I wanted to turn this file into an essay on how to make
a decent file system tree. I failed, as this task turned
out to be unimaginably harder than I had expected.
I thus promote this file to the status of a "living
document", to which I may add features as I happen to
find them convenient.

Computers are unreliable, misleading and oftentimes overtly lying. It is possible to make computers efficiently assist you in everything you do. However, learning to do so requires managing a large amount of material and takes a lot of time. This file proposes a few guidelines that the author found helpful in managing the computer’s file system structure. Even though there are many services that provide “outsourced” management of certain kinds of computer resources (such as Google Photos, Gmail, WordPress), and they may be used when appropriate, it is still necessary to understand the underlying principles of data management. Ignoring them leads to chaos that is hard to navigate; with online services, the chaos is merely offloaded onto the public computing system.

1 Certain disclaimers, combating illusions.

1.1 Everything will be slow.

Sometimes you see certain numbers advertised by equipment manufacturers, shops, service providers, et cetera. Do not believe them; test everything yourself. Your HDDs will be 1.8 TB instead of 2 TB. Not a huge lie, is it? No more than 10%. But if you have your system planned out to the byte, buying a disk that doesn’t fit your requirements is going to be a huge waste of money and time.

Somebody promises you a 150 Mb/s speed on a wired channel? You’re probably already aware of the fact that official numbers are exaggerated, right? So you reasonably make a discount of, how much? Like, 10%, right? 135 Mb/s? You are wrong. In an adversarial case, that is, a real case created by the various interacting components of your system, you are going to attain 1.5 Mb/s at best. Divide every marketing promise by 100; that’s going to lead to less disappointment.

1.2 Everything will be unreliable.

The only reliable medium nowadays is paper. Yeah, if you’re Byron or Mo Yan, your oeuvres may happen to be mirrored by human memory, but I wouldn’t rely on that.

HDDs, SSDs, everything fails. Moreover, everything fails often.

1.3 The cloud is just another person’s computer.

Don’t get me wrong, having someone take care of your data is a great stress relief. Just do not overestimate the reliability of those services. At some point you will buy a holiday trip to a country where your cloud provider is not even blocked, just choked by the low quality of the network.

1.4 Redundancy.

Everything needs to be copied five times:

  • Master copy. That is your computer and/or phone.
  • Local backup. SSD, HDD, USB-stick, home server, etc.
  • Local snapshot.
  • Oligarch’s cloud (Google, Apple, Baidu, Yandex, whatever).
  • Your own cloud.

2 What is data

2.1 Our life in data.

In this world, there is a digital model of you. It is not a single model, and it is not entirely electronic: much of it may very well be on paper, or rather, papers, held by various institutions that you happened to give your data to.

Your schools probably hold a few files on you, and so do your work units and your military service. The police, even if you haven’t done anything wrong and just drive a car, have a profile on you. Of course, Google and Alibaba have something. So do your boyfriends and girlfriends, relatives and pals.

Of course, the person who holds the most on you is probably your enemy, if you are honoured to have one. Oh, how exciting it would be to find out what your enemy has on you. He’s not you. While for most people the choice between sorting out a personal archive and having a game of Dota is not at all obvious (even for the most conscientious of us), for your enemy it is crystal clear: nobody else would be mining data on you with the same diligence.

We do not usually think about this data in terms of a file system. However, from a data engineer’s perspective, it is a distributed virtual system of data blocks. Not every data block is on an actual disk, but each one has some kind of an “address” via which it can be reached.

These addresses are usually incompatible, but why can’t they be made compatible? At least some of them can.

Now I have to get a bit technical. For example, the subsystem called “FUSE” allows a programmer, with a bit of work, to make a huge variety of addresses compatible with the addresses that your files have on your computers.

Is the file system’s system of addresses exhaustive or the best possible? Likely, no. But, surprisingly, the set of primitives that we have developed for working with files, while staying minimal, is still tremendously powerful. You can exceed this power, indeed, but this requires a giant increase in complexity.

2.2 Mind mapping.

“Mind mapping” as a name was invented by Tony Buzan to describe his own paper-based protocol for recording data.

It is often known under the name of “concept mapping” and is frequently praised as a “totally different way of thinking”, but it is really just a popular presentation of graph theory.

Which does not make it worse, of course. In fact, noticing this immediately made me think: “hang on, but my file system is also a graph”.

Indeed, under relatively mild restrictions (mind map conditions), a lot of things can be represented as graphs, and graphs easily map onto a file system structure. Conversely, this immediately leads us to the conclusion that our data can be visualised as a graph, and this can give us useful insights.

With this thought in mind, I tried drawing “my life” as a mind-map-like graph, placing various aspects of my life on that sketch.

Many things went there: studying, job, friends, relatives, hobbies, various government and society-related things.

And immediately it became apparent to me that:

  1. Each program is a tiny brain, and its structure is completely different from mine. It is, thus, very counterproductive to just naively join them. (This is what you can call a “DOS Way”.)
  2. The file system structure typically proposed by a pre-made device (such as Pictures, Videos, Documents) is also mostly inadequate to describe such a complex creature as a human. (You can call it a “Unix Way”.)

Okay, so the two most popular approaches are not working and will never work.

What is left to do?

Well, that is why this document is called a “living document”. I haven’t found the answer. However, I found a few tricks that have made my life easier.

2.3 Basic heuristics

There are a few tricks that are worth considering when designing a human-interacting system.

2.3.1 Important items should be close to where you can see them.

There is a noticeable disparity between the places we expect to see things, and where we really see them.

You may have an excellent task manager, but it will not be of use if you do not open it. And conversely, if you see unexpected things where they should not be, you are more likely to react upon them.

Imagine your wife leaving you a message inside the code file or the document you are currently working on. You are much more likely to do something!

Actionable insight: I am configuring my system to put reminders and notifications right in the home directory.
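
One way to try this out, as a minimal sketch: a shell function that drops an empty, loudly named file into the home directory, so that the reminder shows up in every listing. The function name remind, the REMINDER_DIR variable and the 0-TODO prefix are assumptions made for illustration, not part of any existing tool.

```sh
# remind TEXT: create an empty file whose name is the reminder itself.
# REMINDER_DIR (hypothetical) defaults to the home directory.
remind() {
    dir=${REMINDER_DIR:-$HOME}
    # The leading "0-" tends to sort near the top of a listing.
    # TEXT must not contain "/" because it becomes part of a file name.
    : > "$dir/0-TODO_$(date +%Y-%m-%d)_$1"
}
```

Calling remind call-the-bank leaves a file named 0-TODO_<today>_call-the-bank in the home directory; deleting the file closes the reminder.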

2.3.2 Human attention is limited to 7 items.

The title is a little bit clickbait-ish; in fact, human attention volume can be larger, say, up to 14 items, but the scale is about that large. If you have more items in a directory, your brain will select its own native number of items (7-14) and will ignore the rest.

In particular, it means that each directory you make should, in general, have no more than “your natural” number of items. Self-check: my home directory (“~/”) has 28 items, and I ignore most of them, except about 7. However, I do notice all unexpected files in a directory quite quickly.

This number is trainable, as most human skills are, but not extensively. You, perhaps, can raise it from 7 to 14, but not to 50.

However, I know two ways of “tricking” this number.

The first way is to give shorter names, or even hide the directories that “you know are there”. Since you remember that they are there, your brain ignores them when it sees them, but you can still see them at the visualisation interface (make sure you have one).

The second exception to this rule is the case when items are somehow dependent on each other. If the items have some natural ordering (perhaps according to some date, or a human name), you can have more than your “fixed number”.

Why am I so keen on increasing this number? Can’t we just make groups and subgroups? The answer is “not really”.

Each time you go inside a directory, you are having a context switch, which means you are losing a bit of context. In other words, the depth of your file system tree also matters. It matters less than breadth, but still. Keep your brain data structures tight.
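
To check a tree against this rule, here is a minimal sketch that lists directories holding more visible entries than a chosen limit. The function name crowded_dirs and the default limit of 14 are assumptions; hidden entries are skipped, in line with the "hide what you know is there" trick. It relies on the -mindepth and -maxdepth options of GNU or BSD find, and it miscounts file names that contain newlines.

```sh
# crowded_dirs [ROOT] [LIMIT]: print "count<TAB>directory" for every
# directory under ROOT with more than LIMIT visible direct children.
crowded_dirs() {
    root=${1:-.}
    limit=${2:-14}
    find "$root" -type d | while IFS= read -r dir; do
        # Count direct children only, leaving out hidden ones.
        count=$(( $(find "$dir" -mindepth 1 -maxdepth 1 ! -name '.*' | wc -l) ))
        if [ "$count" -gt "$limit" ]; then
            printf '%s\t%s\n' "$count" "$dir"
        fi
    done
}
```

Running crowded_dirs ~ 14 | sort -rn puts the most overloaded directories first.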

2.3.3 A dashboard.

A dashboard is a misleading thing. Remember the trick that I have given in the previous chapter that can be used to increase the number of items in a directory? (Adding “implicit” items that your brain ignores.)

Here we see the same effect, but in the opposite direction. If you have a dashboard, you are getting “a feeling” that you are up to date with the information, but in reality, your brain starts to ignore things it is getting used to. At least give your dashboard more contrast.

I still have one, and I do have a habit of checking it, but it is less useful than I hoped.

2.3.4 Notifications.

Notifications are vital, which also means that they are extremely expensive. Notifications can save you a lot of grey hair if they arrive in time and warn you about something important, but too many notifications will blow your mind, because each one is very expensive to process.

Opinion point: This is why “free”, ad-funded services are in reality much more expensive than those you pay for. Paid services just eat your money, and you can make more. “Costless” ones are eating your life, and you are not getting a new one.

Heuristic: if you cannot keep your notifications in their place, the (bad) trick is to subscribe to too many. Yes, you are losing important ones, and losing the ability to get early notices, but this is still better than having your life eaten by ads.

Another important point is to get notifications “when and where” you need them. It is not much help to get an important notification from your server while you are driving your car. You cannot react to it, and thus you are: (1) losing energy on processing this notification, (2) losing energy on rescheduling it, (3) maybe wasting time on mitigating it.

2.3.5 Notifications are turning into your TODO items.

Is that obvious?

Essentially, there are two ways of getting new “TODO” items into your list:

  1. Notifications
  2. Exploration

TODO items are what the skeleton of your life consists of. It is important to notice that an organism does not consist of the skeleton alone. The “taste” of life, the “moments of happiness”, cannot be planned, but if you do not have a solid skeleton, those “happy moments” have nothing to get entangled in and hooked upon.

2.3.6 Items are directories in your “virtual file system”.

This is not obvious! Why aren’t “files” those items? Informally, because files can be seen as different “faces”, “views” of the same “thing”.

In fact, you never know when things that you are experiencing in your life are going to grow in abstraction, and turn from a file into a directory. It is better to always start from directories. (The WWW Consortium agrees with me.)

But the point is – you never know where they will go. If you are going to a dancing party and making a directory for a ticket purchase, it may later turn into a directory for dancing textbooks and videos, or maybe into a directory of cocktail recipes, or a directory of cool dancing places.

But you would still want to also keep this directory in the “tickets” catalogue.

(This is what you need symbolic links for.)
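
A minimal sketch of that arrangement, assuming GNU ln (for the -r option): the item lives in one real directory and is also listed in a second catalogue through a relative symbolic link. The function name file_under and all paths are hypothetical.

```sh
# file_under ITEM CATALOGUE: keep ITEM as the real directory and make it
# visible inside CATALOGUE under the same name.
file_under() {
    item=$1        # the real directory, e.g. ~/events/dance-party
    catalogue=$2   # the extra catalogue, e.g. ~/tickets
    mkdir -p "$item" "$catalogue"
    # -r writes a relative link, so the link keeps working when the
    # whole tree is moved or copied elsewhere (GNU coreutils only).
    ln -s -r -n "$item" "$catalogue/$(basename "$item")"
}
```

After file_under ~/events/dance-party ~/tickets, the directory ~/tickets/dance-party is a link pointing at ../events/dance-party, so anything the party later grows into stays reachable from both places.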

2.3.7 But my virtual file system does not match my disks and clouds?!

Yes! And this is a problem!

I am trying to use both symbolic links and hard links in order to make the VFS (virtual file system) match my brain, not the distribution of data on hard drives and clouds.

It does not work very well! Suggestions welcome! But so far I have created a fairly reasonable structure from symlinks, bind mounts, and regular copies of file trees with rsync.

2.3.8 Context and tools cannot be avoided.

Even if your file system structure is decent, you will forget where you put stuff, and you will find yourself exploring your mind map as if it is alien to you. (Sometimes this is also exciting.)

Thus… help your “future self”. Annotate everything that can be annotated, you will thank yourself a million times later.

Context will also help your automatic tools be more productive. I will say a bit more of that later.

The most obvious place to add context to your files is their name. Yes, it is not very flexible, and frankly quite bodgy, but it is the only place that is at least remotely reliable in computing.

There are other places, but they are more specialised.

One more place worth considering is your file headers. You can often put the vital context information there.

Context includes:

  1. Creation date.
  2. Modification date.
  3. Refiling date.
  4. Publication date.
  5. Author.
  6. Language.
  7. Category. (One per file)
  8. Tags. (many per file)
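
As an illustration of putting context into the name, here is a minimal sketch that builds a name from the modification date, an author and some tags. The field order, the underscore separators and the function name contextual_name are assumed conventions, and date -r FILE is a GNU date feature. The function only prints the proposed name and renames nothing.

```sh
# contextual_name FILE AUTHOR TAGS: print FILE's base name prefixed with
# its modification date, the author and the tags.
contextual_name() {
    file=$1; author=$2; tags=$3
    stamp=$(date -r "$file" +%Y-%m-%d)   # GNU date: -r FILE reads the mtime
    printf '%s_%s_%s_%s\n' "$stamp" "$author" "$tags" "$(basename "$file")"
}
```

For a file notes.txt last modified on 2020-11-20, contextual_name notes.txt alice backup+howto prints 2020-11-20_alice_backup+howto_notes.txt. Once the name looks right, the result can be fed to mv.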

2.4 File system tree

2.4.1 TODO Brain data structures

How does a human’s brain work?

We have “Projects”, “Events”, and “Categories”. Projects are limited in time and scope. Events are limited in time only. Categories are limited in scope only.

There are also “tags”.

Suppose you are studying Chinese. This gives you a category “Chinese”, under which you would be creating your stuff.

Suppose you are joining the University of Edinburgh. This would give you a category “Uni”.

In year 2014, autumn semester, you are joining an introductory course in Chinese, in Edinburgh.

This course is definitely a project.

You’re studying badly, pass your exam so-so, and get the artefact, the diploma.

You leave Edinburgh, but still keep studying Chinese. In your spare time you are working on the exercises from the same textbook.

Can you write your solutions into the same project? Apparently, no, as the project is already closed.

2.4.3 TODO Modelling human life with a file system.

The heuristic here is: first build your hardware/software synchronisation, later build the semantic harness.

Things to consider:

  1. Pictures and “daily data chronology”.
  2. Other People. You will be sharing some subtrees of the file system, as if you are having some common parts of the brain.
  3. Downloaded vs Personal files. You usually do not care about losing downloaded files that much.
  4. Repositories.
  5. Raw data. It will be badly displayed on your “file system map”, so you have to think in advance how to store it.
  6. Projects by time, status, and class.
    1. By time: these are usually projects you do not expect to last long, for example, buying a theatre ticket.
    2. By status: these are projects that occupy so much of your effort that you cannot just put them into a category. You would have <7 open ones, and would make your reminder system remind you about them as often as possible. As they are being closed, you would reduce their level of annoyance and move them either into the “topical project directories”, or into the “projects by time”. You can make symlinks too.
    3. By area/category. Some areas of life naturally occupy a significant volume of your life. Put your projects there.
  7. Official documents and government interaction. You don’t want those just lying around your file system, as they are more sensitive.
  8. Your medical information. This is as important as a “by status” project, but by nature is a “by category” project. This data is also sensitive.
  9. Financial information. Same as above.
  10. “Incoming” directories with stuff to read and digest. Those tend to take a lot of space, so be careful.

3 Technical detail

3.1 TL;DR

run everything with time, it’s all gonna be slow.
forget cp, it will screw up your dates and perms.
sync root
SSD’s are shit, swap kills SSDs, networks are slow. rsync root and home to a backup magnetic spinner.
annotate everything because you will forget.

3.2 Concepts

3.2.1 Backups

The main difference between data dumps (3.2.2) and backups is that backups are restorable objects.

3.2.2 Data Dumps

Data Dumps are file system subtrees or, sometimes, archives, that usually appear as a result of using a non-specialised tool for “saving” some data in a dangerous situation, instead of using a special-purpose backup tool.

They are usually non-restorable.

3.2.3 Version Control Systems

Git and friends. Try to store all your text files in a VCS, it pays off.

3.2.4 Synchronisation

Reconciling the differences between two copies. Often used as opposed to merging (two conflicting copies).

3.2.5 Merging

Taking two versions of the same file, developed separately, and combining to create a single one.

3.2.6 Volatile Storage

Storage that is frequently emptied. For example, a tmpfs.

3.2.7 Acceptable time

For a personal laptop, every non-resumable operation is limited to an 8-hour time window, because realistically every operation should take at most one night. Every resumable operation is limited to 7*8=56 hours, as that is the amount of time available during the week. Practically, a backup that is more than a week old is useless.
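
These windows turn into a data budget once multiplied by a measured speed. A minimal sketch of the arithmetic, using integer maths and taking 1 GB as 1000 MB; the function name window_capacity_gb is hypothetical, and the speed should be one you measured yourself, not an advertised one.

```sh
# window_capacity_gb SPEED_MB_S HOURS: how many GB fit into the window.
window_capacity_gb() {
    speed_mb_s=$1   # measured throughput in MB/s
    hours=$2        # 8 for one night, 56 for a resumable weekly job
    echo $(( speed_mb_s * hours * 3600 / 1000 ))
}

window_capacity_gb 30 8    # prints 864
window_capacity_gb 30 56   # prints 6048
```

At 30 MB/s, one night moves less than 1 TB, so a larger disk cannot be copied in a single non-resumable run at that speed.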

3.2.8 Subtree merge

When you have two “more or less similar” copies of a single directory tree, you are in big trouble. Now you have to combine them somehow, and get a “master copy”. Not easy.

3.2.9 Automatic maintenance

A well-tuned computer needs to run tasks for self-maintenance. On Windows, many people were used to defragmentation and disk checking. On Linux we still need disk checking, file system checking, and several other upkeep operations.

3.2.10 Manual maintenance

Some things cannot be done by a machine, for example, connecting a backup HDD. You need to plan those tasks in advance and enforce them on yourself. This is hard, but worth learning how to do.

3.3 Software

3.3.1 QDirStat

A not so bad tool to find which of your directories take up too much space.

3.3.2 ncdu

A console version of 3.3.1

3.3.3 fdupes

Do not use fdupes.

3.3.4 rdfind

Do not use rdfind.

3.3.5 fsck + badblocks

Checks your file system for errors.

# -c: run badblocks read-only; -c -c: run badblocks non-destructive read-write; -C 0: show progress; -f: force check
# -k: keep the old badblocks list; -y: answer yes to all repairs; -t -t: print timing; -v: verbose
echo time fsck.ext4 -c -c -C 0 -f -k -y -t -t -v /dev/sdc1

3.3.6 rmlint

<2020-11-20 Fri 13:55> I found rmlint recently. It is a bit weird, but at the end of the day turned out to be more reliable and tunable.

It is an excellent tool to use for 3.2.8. Highly recommend.

3.3.7 speedtest from ookla

Exists for ARM64, and has a huge database of servers.

speedtest -s 26850 would do a test to some server in Wuxi, China

3.3.8 find

If you still do not use it – it is time to start. Learn it well, and it will help you a lot when you “kinda” know where your file should be.

3.3.9 grep

The excellent “regular expression search tool” to use for content search within files you “kind of” know where they are.

3.3.10 locate

Learn it and start using it. It’s a great tool for super fast search of “stuff that was out there somewhere”.

3.3.11 recoll

It’s an amazing, very fast and efficient desktop search tool. It takes time, maybe, days to index your drive, but contrary to Gnome’s Tracker and KDE’s beagle, it actually works. The database is huge and you probably need an SSD for it.

I do not use it that much, because with a good FS structure you can be doing find/grep many times more often, and with good context you can just get by with “locate”. But in those cases when you “do not really remember”, recoll helps you “recoll”.

3.3.12 rsync

Rsync is an extremely versatile tool with an extremely fragile syntax.

The following will copy everything from root to the backup root.

Combined with rmlint, it can be used as a 3.2.8 tool. In general, it is hard to use, but much, much better than just cp or scp. It lets you resume your transfers, do incremental backups, fetch backups from remote machines, and a lot of similar things.

echo time rsync --links --partial --fuzzy -arHAXyh --info=progress2 /
echo time rsync -v --archive --hard-links --acls --xattrs --inplace --one-file-system --del --fuzzy --human-readable --info=progress2 --partial --dry-run from/ to
  1. Multithreading rsync is not yet implemented

    Indeed, see bug

    This is very important for modern restrictive ISPs.

  2. Rsync does not have an “--n-tries” argument or anything similar.

    I fake it with the following code:

    time while ! rsync <...> ; do sleep 30 ; done
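
The loop above retries forever. A bounded variant, as a minimal sketch: the function name retry and its argument order are assumptions, not rsync features.

```sh
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds, giving up after MAX_TRIES attempts.
retry() {
    max_tries=$1; delay=$2; shift 2
    try=1
    until "$@"; do
        if [ "$try" -ge "$max_tries" ]; then
            echo "giving up after $try tries: $*" >&2
            return 1
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}

# Usage with hypothetical paths:
#   time retry 20 30 rsync --archive --partial from/ to/
```

Together with --partial, every new attempt continues from what has already been transferred, so the retries are cheap.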

3.3.13 aria2

A very versatile tool for downloading all kinds of stuff. I recommend it. It can download through ssh too! (sftp is actually ssh.)

time a2 --max-tries=0 --ftp-user=username --ftp-passwd=<scrubbed> s

Where a2 is an alias: alias a2='aria2c -l /tmp/RAMFS/2021-01-06T13:37:11+08:00-aria2-download.log -x120 --min-split-size=148576 --split=120 --auto-file-renaming=false'

You need an aria2-nitro patch to allow 120 connections with 100k splits.

Important! By default, sshd has a built-in DDoS protection setting, MaxStartups 10:30:100.

You may want to set it to 120:30:220 or something. But be wary of a real DDoS.

3.3.14 git

Keep all your literary work in git. I use magit on Emacs, console git in the console, and mgit on Android for my diary synchronisation.

  1. git
  2. mgit
  3. magit

3.3.15 Syncthing

A kinda fragile, but still extremely useful tool for synchronisation of machines, and it can also protect you from a bit of regrets after deleting things automatically. Put in Syncthing things that you cannot put on git.

3.3.16 exiftool

It is the tool that lets you extract all the valuable metadata from your photographs.

3.3.17 org-mode

This is not really about organising files, but rather about creating files, but I cannot avoid mentioning it here, because org-mode is a very versatile tool, and you can build a lot of your personal information management system on top of it. You can add to your files with ease. You can also add cross-references without much difficulty.

  1. Emacs org-mode

    I use it to plan my tasks on the desktop and to write documents and articles.

  2. orgzly

    I use it on Android to display the Agenda (todo-list) on the main screen of my Android phone. I fetch the files with, and do a 3.2.5.

  3. Markor

    I use it on Android to edit org-mode files. They are actually Markdown files, but for my purposes it doesn’t matter. Markor is really worth having a look at, because it lets you take photos and other notes with . Meanwhile tags are created as well.

  4. orgro – is a great stand-alone viewer for org-mode

3.3.18 cron

Lets you run tasks periodically. Worth learning.

3.3.19 Dropbox (Google Drive, Baidu Pan, Yandex.Disk)

Those are relatively nice tools that let you donate all your data to an oligarch in the name of his business interests. They are also quite convenient and work as an additional backup for your files. The killer feature is making the files available on your phone without synchronisation.

If you can, avoid it with the help of NextCloud. But probably you won’t be able to.

3.3.20 NextCloud

A replacement for Dropbox.

3.3.21 TODO mbsync+maildir+mu+mu4e

A way to keep your email locally and read it without internet. Is there a way to use it as a file system? Or visualise?

3.3.22 TODO vdir+carddav+vdirsyncer+khard+ebdb(in progress)

A way to keep your contacts on the local disk just the same as they are on your phone. Is there a way to use it as a file system?

3.3.23 TODO vdir-cal+caldav+vdirsyncer+khal+calfw(in progress)

A way to keep your diary records on your local disk for indexing and search. Is there a way to use it as a file system?

3.3.24 Skyperious (or other chat log backup tool)

Lets you backup Skype.

3.3.25 RSS

A relatively unified format for keeping archives of things that have

3.3.26 Conky

A tool to create a 2.3.3 on your local computer. Useful for outputting the scripts that check that all of your synchronisation machinery works.

3.3.27 etckeeper

Lets you backup your /etc with the help of 3.3.14.

3.3.28 eXtended ATTRibutes

A way to add valuable metadata to your files. The tools are called setfattr and getfattr.

3.3.29 smartctl+smartd

You must have them configured. They are the only thing that can give you at least something of a warning before your HDD dies.

3.3.31 TWRP backups that are like data.ext4.win000

are actually tar files, and can be unpacked as tar xvf

3.3.32 libguestfs lets you mount Windows vhdx backups

It needs hivex and supermin. Damn, a lot of work. I build hivex from github tarballs, but redhat disapproves this.

guestmount --add yourVirtualDisk.vhdx --inspector --ro /mnt/anydirectory

I didn’t manage to mount this with qemu 5.0. Trying to build 5.4.0rc4. Also failed. And Windows 10 in vbox failed. Perhaps, the file is actually broken.

3.4 Hardware

3.4.1 A VPS.

Living without a VPS is hardly possible nowadays. You need it for every task which needs a public address.

You may have to keep some data there, but it is usually expensive, and it is “another person’s machine”. So ideally, the file system should be encrypted (from the hosting company’s technicians).

3.4.2 A NAS (home server).

You need it, because you cannot keep all your data with you all the time. This is where you will keep most of your data, as well as backups.

3.4.3 Your laptop.

You have to be prepared that you may drop it and it will die. Or your root SSD will die. Or your home SSD will die. But you will still keep your useful data there.

  1. Root ssd.
  2. Home ssd.

3.4.4 Your phone.

It is worth designing your system in such a way that losing your phone is sad, but not too much. You need to backup or synchronise at least:

  1. Contacts (see 3.3.22)
  2. Logins (you can use some special service, but Android built-in is probably enough)
  3. Data (look at 3.3.15)

Encrypting your phone will likely make it a huge pain to extract data from it if the screen is broken. I used to believe that having a regular backup (a nandroid or a dd image) is a good idea, but not any more. Keep your stuff on your laptop or NAS, not your phone.

If it has locked forever – just break it with a hammer and destroy the memory.

3.4.5 Oligarch’s cloud (Baidu, Yandex, Google).

Is usually integrated with a messaging/scheduling service, and is thus convenient. You will need it as an interface to your friends who are controlled by the machine anyway.

3.4.6 Backup disks.

For root drives, it is enough to keep a backup root drive in every machine and do an rsync backup every night. You can just plug the drive instead of the main one if the main one dies. For the laptop, you have to carry a drive with you, because laptops usually don’t have 4 drive bays.

You need at least two backup drives for your laptop, because SSDs tend to die “suddenly”. So you will have an SSD to back up your home partition quickly every day, and a more reliable HDD to back up the data overnight.

3.4.7 USB Sticks

Those tend to die extremely quickly, so have a few in your pocket, ideally, identical. It’s worth having them linux-bootable, and have a few root directories:

  1. Windows installers
  2. Windows portable software
  3. Portable documents (which you delete at each plug-in)
  4. Permanent documents (your passport photo and such), that you may urgently need to show to the police and other officers.

3.5 Canned tricks

3.5.1 TODO Mount some stuff on Android as ramfs

Doesn’t work yet for “all apps”, only root ones

while [ "$(getprop sys.boot_completed)" != 1 ]; do
  sleep 1
done

su -mm -c mount -v none -t tmpfs -o size=4g,nosuid,nodev,noexec,noatime,context=u:object_r:sdcardfs:s0,uid=0,gid=9997,mode=0777 /mnt/runtime/write/emulated/0/Download/tmpfs-cleared-on-reboot > /sbin/.magisk/img/.core/mount.stdout.magisk.log 2> /sbin/.magisk/img/.core/mount.stderr.magisk.log

3.5.2 Make a data flow for your Personal Cloud

I have a separate “howto”, which is more of an example, on how to draw a data flow diagram for your cloud. Visit the article.

3.5.3 Use to see what your personal brain is like

3.5.4 Mount /tmp as tmpfs, or at least /tmp/RAMFS as tmpfs

tmpfs            /tmp/RAMFS       tmpfs  nosuid,nodev,noexec,sync,dirsync,size=4G,mode=1777         0    0

Note: I mount /tmp as tmpfs, and /tmp/RAMFS as tmpfs-noexec. /tmp needs to be exec for building packages.

3.5.5 Set up your torrent client to pick up files from ramfs (to avoid storing them)

Transmission->Edit->Preferences->Downloading->Automatically add files from-> /tmp/RAMFS

Transmission keeps .torrent files in ~/.config/transmission/torrents

3.5.6 Set up Firefox to download stuff to ramfs

I didn’t find how to do this, I basically mounted /tmp as ramfs. Firefox creates a directory called “${usename}_firefox0”

3.5.8 Kill desktop.ini

echo find . -iname 'desktop.ini' -delete

3.5.9 Remove empty files

find . -type f -empty -delete

3.5.10 Remove empty directories

Worth doing after 3.5.9

find . -type d -empty -delete

3.5.11 Remove duplicates

echo rmlint -T "df" -c progressbar:fancy --progress --no-crossdev --match-basename --keep-all-tagged --hidden --must-match-tagged ~/Incoming/ // ~/good-dir

Run on about 400 GB.

lockywolf@delllaptop:~/Incoming$ rmlint -T "df" -c progressbar:fancy --progress --no-crossdev --match-basename --keep-all-tagged --hidden --must-match-tagged ~/Incoming/ ~/BACKUP/ //  ~/books/ ~/Data/
⦃⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⦄              Traversing (585350 usable files / 0 + 0 ignored files / folders)
⦃⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⦄                  Preprocessing (reduces files to 107623 / found 0 other lint)
⦃⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⌿⦄   Matching (59120 dupes of 44798 originals; 0 B to scan in 0 files, ETA: 50s)

==> In total 585350 files, whereof 59120 are duplicates in 44798 groups.
==> This equals 81.36 GB of duplicates which could be removed.
==> Scanning took in total  1h 29m 10.406s.

Wrote a sh file to: /home/lockywolf/Incoming/
Wrote a json file to: /home/lockywolf/Incoming/rmlint.json

3.5.13 badblocks 2.73 Tb HDD 3543 minutes ~59 hours

declare DISK
cd /root
# -b block_size -w destructive -s show_percentage -v verbose
#echo time badblocks -b 4096 -n -s -v -o /root/"$(date --iso)"-sdc.badblocks "/dev/sdc"
echo time badblocks -b 4096 -w -s -v -o /root/"$(date --iso)"-"$DISK".badblocks "/dev/$DISK"

With USB2.0 estimated time – ~75 hours? Hm…

Didn’t write down the USB3 speed.

eSATA speed: 0.1% in 1:50. Which makes it ~33 hours to just write one pattern?

3.5.14 Smartphone TWRP backup

Manual TWRP without “sdcard”, all partitions, via TWRP to an OTG disk takes:

3389 seconds, that is roughly 1 hour, for 41 GB in total.

Storage is not yet measured. Started <2020-06-19 Fri 20:28>; by <2020-06-20 Sat 00:10> it had not finished. 4 hours is not enough, and the phone battery is dead.

3.5.15 Measuring disk speed

echo dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
echo dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync
  1. Practical USB 2.0 speed on my machine – 30 MB/s
  2. Maximal USB 3.0 speed on my machine > 109 MB/s
  3. Practical eSATA speed on my machine 91 MB/s

3.5.17 rsync

  1. rsync root->/mnt/hd 184 minutes after a long time without backups
  2. rsync root->/mnt/hd 24 minutes after a chrome update
    echo time rsync --archive --hard-links --acls --xattrs --inplace --one-file-system --delete-before --fuzzy --human-readable --info=progress2 --partial / /mnt/hd/ --exclude='/tmp/*'
  3. rsync root->/mnt/backup_root/ (SSD->SSD) ≈ 6 min
  4. rsync home incremental backup ≈ 22Gb, 25 min

3.5.20 TODO Tag files from packages with an eXtended ATTRibute

TODO: rewrite it so that the calls are batched, e.g. with xargs.

So I added the following into /sbin/installpkg at line 659 (after the untar-if):

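# $TMP/$shortname holds the list of unpacked paths, one per line, relative to $ROOT
while read -r
do
  if [[ -f "$ROOT/$REPLY" ]]
  then
    setfattr -n user.beta.slackware.pkgtools.package \
      -v "$shortname" \
      -h \
      "$ROOT/$REPLY"
  fi
done < <(cat "$TMP/$shortname")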
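
The batching wished for at the top of this section could look like the sketch below. It has not been tried inside installpkg. The function name tag_package_files and the SETFATTR override are assumptions made for illustration; the attribute name is the one used in the loop above. setfattr accepts several paths per call, so xargs can hand it many files at once.

```shell
# tag_package_files ROOT LISTFILE PKGNAME
# Tags every regular file named in LISTFILE (one path per line, relative to ROOT)
# with the package name, calling setfattr once per batch, not once per file.
# Paths containing newlines are not supported, because the list is line-based.
tag_package_files() {
    root="$1"; listfile="$2"; pkg="$3"
    while IFS= read -r path; do
        # Same filter as the loop above: only existing regular files are tagged.
        if [ -f "$root/$path" ]; then
            printf '%s\0' "$root/$path"
        fi
    done < "$listfile" |
        xargs -0 -r "${SETFATTR:-setfattr}" \
            -n user.beta.slackware.pkgtools.package -v "$pkg" -h
}
```

Inside installpkg the call would presumably be tag_package_files "$ROOT" "$TMP/$shortname" "$shortname". Setting SETFATTR=echo shows what would be executed without touching any attributes.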

TODO: do a package-time tagging

3.5.22 TODO Every file should be tagged:

  • by the package it comes from
  • by the program that has created it
  • by the user who last used it: uid and username

Should be doable with a kernel module, or something.

Kprobes? uprobes? systemtap?

3.5.23 TODO Meaningful tagging, automatic

Supposedly, tracker should be tagging stuff automatically. Didn’t try it.

3.6 Organised Protocol

This section roughly outlines how the things above should be implemented. A good file system structure almost automatically implies a good backup system, because you have to fetch the “subtrees” from all the devices that you have, and/or export the data from “silos”. And if you have that tree-building system in place, you may just as well add a backup system too.

3.6.1 Automatic backups

You will likely have to back up everything that is listed in the “software” above. When doing backups you have to consider the following:

  1. When to back up. You cannot fetch data when you are offline, and too often you cannot guarantee that you will be online.
  2. How to back up. You do not want to clone your hard drive over a metered connection.
  3. How to rotate backups. If your backups occupy too much space, this space is lost.
  4. How to make sure that your backups succeed. A good backup is, by definition, unnoticeable: you shouldn’t remember that it exists unless you need it. But this means that, by default, nothing reminds you when your backup fails.
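
Point 3 can be as simple as keeping the newest N snapshots. A minimal sketch, assuming one subdirectory per snapshot whose name starts with an ISO date, so that lexical order equals chronological order. This layout and the function name rotate_backups are assumptions, not a description of my setup. Like the commands in the measurement sections, it only echoes the deletions.

```shell
# rotate_backups DIR KEEP
# Prints an "rm" command for every snapshot directory in DIR except the KEEP newest.
rotate_backups() {
    dir="$1"; keep="$2"
    total=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0
    # Names sort chronologically, so the first $excess entries are the oldest.
    find "$dir" -mindepth 1 -maxdepth 1 -type d | sort | head -n "$excess" |
        while IFS= read -r old; do
            echo rm -rf -- "$old"
        done
}
```

Review the printed commands first; piping the output into sh would perform the deletion.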

I generally try to support each backup procedure with two auxiliary subprograms.

The first one runs together with the backup itself and notifies me if something goes wrong. The backup itself does not notify me on every failure, because individual runs fail often. E.g., some backups I run every 5 minutes, but they succeed only once or twice a day, because only then is the required device nearby. But there should still be a service that sends you an email if backups have been failing for too long.
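
One way to build such a watchdog is to let every successful backup touch a stamp file, and to have a separate job complain when the stamp gets too old. A minimal sketch; the stamp path in the usage comment, the threshold and the function name backup_is_stale are hypothetical, and stat -c %Y assumes GNU coreutils:

```shell
# backup_is_stale STAMP_FILE MAX_AGE_SECONDS
# Succeeds (exit status 0) if the stamp is missing or older than MAX_AGE_SECONDS.
backup_is_stale() {
    stamp="$1"; max_age="$2"
    [ -e "$stamp" ] || return 0
    now=$(date +%s)
    mtime=$(stat -c %Y "$stamp")   # last modification time, seconds since the epoch
    [ $((now - mtime)) -gt "$max_age" ]
}

# Usage, e.g. from cron: complain if the phone backup has not succeeded for two days.
#   backup_is_stale "$HOME/BACKUP/.stamps/phone" $((2 * 24 * 3600)) &&
#       echo "phone backup is stale"
```

The backup job itself would end with a touch of the stamp file on success. Cron usually mails whatever a job prints, so printing the complaint may already be enough to get the email.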

The second procedure updates my Dashboard and paints things that have obsolete backups bright red.

I have a “BACKUP” directory on my laptop with a roughly two-level structure, class/application, e.g. 01_Messengers/ICQ. This directory receives “scheduled backups” whenever it can, at intervals ranging from 5 minutes to 1 day.

My rough list:

  • 00_Etc-Git-All-Machines
  • 01_Browser-settings-sessions
  • 02_Hardware-Data-Stock-Setups-Manuals
  • 03_Contacts
  • 04_Organiser
  • 05_Lists-of-Apps-If-I-Forget-Where-Data-Was-Stored
  • 06_SMS-Message-Databases
  • 08_Chat-Logs
  • 10_News-Feeds
  • 11_Voice-Message-Recordings
  • 12_Call-Logs
  • 13_Directories-Lists
  • 15_Diaries-Paper
  • 99_Data-Dumps

Additional operations to do overnight:

  1. Reindex locate
  2. Reindex recoll
  3. Update softlinks on the basis of xattrs
  4. Run BOINC (in the name of the greater good)
  5. Update file tags semantically – run text OCR on PDFs, images, entity detection, all the cool “NN-related” stuff.
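
The overnight jobs are independent of each other, so one failing step should not stop the rest. A minimal sketch of a runner; the names run_step and failed are made up for illustration, and only the first two steps of the list are shown in the usage comment:

```shell
# run_step NAME COMMAND [ARGS...]
# Runs one overnight job; a failure is recorded and reported, later steps still run.
failed=""
run_step() {
    name="$1"; shift
    if ! "$@"; then
        failed="$failed $name"
        echo "overnight step failed: $name" >&2
    fi
}

# Usage (updatedb normally needs root, recollindex runs as the owner of the index):
#   run_step locate updatedb
#   run_step recoll recollindex
#   [ -z "$failed" ] || echo "failed steps:$failed"
```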

3.6.2 Yearly plan.

Not everything can be done automatically. The easiest example is the portable SSD that you back up your laptop to. You need to plug it in by hand, as modern laptops just do not have a bay for a separate drive. This is getting even more important as new laptops have the NVMe storage soldered onto the motherboard, so if you break your laptop, you cannot even take the drive out.

This section has a rough outline of what is probably worth doing.

  1. At hardware purchase.
    • Add it to your hardware database (list).
    • Do badblocks. (Takes days.)
    • Enable “smart” monitoring and schedule scans.
    • Do an “extended” smart scan.
  2. Every January.
    1. Badblocks on all devices.
    2. fsck on all file systems.
    3. Revise address book.
    4. Revise calendar for the last year.
    5. Revise file system / Run filesystem linter-mapper, revise FS.
    6. Revise your backup system, manually rotate backups that need manual rotation.
    7. Re-check your Data Flow.
  3. Every first day of month.
    1. Regenerate and print your File System Map.
    2. Backup heavy systems that take a lot of space.
    3. Update your phone. (If your phone manufacturer is not crazy and you do not risk killing your phone.)
    4. Clean your phone.
    5. If you are running a “rolling release” system (e.g. Windows 10), update your system.
  4. Every Evening
    1. rsync your laptop hdd to the NAS
    2. rsync your laptop home to a spinner
    3. rsync your laptop root to a spinner
  5. Every Morning
    1. rsync home to backup ssd
    2. rsync root to backup ssd
    3. rsync your laptop root to a failsafe partition
    4. patch your systems for vulnerabilities
      1. laptop
      2. nas
      3. vps
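
The “at hardware purchase” steps for a disk can be collected into one helper. A sketch that follows this document's habit of echoing commands so that nothing runs by accident. The function name new_disk_checklist is an assumption, smartctl comes from smartmontools, and badblocks -w destroys all data on the device:

```shell
# new_disk_checklist DEVICE
# Prints the commands to run on a freshly bought disk, in order.
new_disk_checklist() {
    dev="$1"
    # Destructive surface test; the list of bad blocks goes to a dated file in /root.
    echo badblocks -b 4096 -w -s -v -o "/root/$(date +%F)-${dev##*/}.badblocks" "$dev"
    # Turn SMART on, start an extended self-test, then show the full report.
    echo smartctl -s on "$dev"
    echo smartctl -t long "$dev"
    echo smartctl -a "$dev"
}

new_disk_checklist /dev/sdX
```

The extended self-test runs in the background inside the drive, so the final report is only meaningful after the test has finished. Scheduling recurring tests is usually done in the smartd configuration and is not shown here.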