How to create a decent file system structure and not go insane.
I wanted to turn this file into an essay on how to make a decent file system tree. I failed, as this task turned out to be unimaginably harder than I had expected. I have thus promoted this file to the status of a "living document", to which I may keep adding features as I happen to find them convenient.
Computers are unreliable, misleading, and oftentimes overtly lying. It is possible to make computers efficiently assist you in everything you do, but learning to do so requires managing a large amount of material and takes a lot of time. This file proposes a few guidelines that the author found helpful in managing the computer’s file system structure. Even though there are many services that provide “outsourced” management of certain kinds of computer resources (such as Google Photos, Gmail, WordPress), and they may be used when appropriate, it is still necessary to understand the underlying principles of data management. Ignoring them leads to a chaos that is hard to navigate; in the case of online services the chaos is merely offloaded into the public computing system.
Generally speaking, this document consists of three intertwined topics: brain modelling with a file system graph, making backups, and managing tasks. No doubt it leans heavily toward the author’s own style of managing personal data, but hopefully the reader may find a way to reuse the ideas for their own benefit.
1. Certain disclaimers, combating illusions.
1.1. Everything will be slow.
Sometimes you see certain numbers advertised by equipment manufacturers, shops, service providers, et cetera. Do not believe them; test everything yourself. Your HDDs will be 1.8 Tb instead of 2 Tb. Not a huge lie, is it, no more than 10%? But if you have your system planned out to the byte, it is going to be a huge waste of money and time to buy a disk that does not fit your requirements.
Somebody promises you 150 Mb/s on a wired channel? You are probably already aware that official numbers are exaggerated, right? So you reasonably apply a discount of, how much? Like, 10%? 130 Mb/s? You are wrong. In an adversarial case, that is, a real case created by the various interacting components of your system, you are going to attain 1.5 Mb/s at best. Divide every marketing promise by 100; that is going to lead to less disappointment.
1.2. Everything will be unreliable.
The only reliable medium nowadays is paper. Yes, if you are Byron or Mo Yan, your oeuvres may happen to be mirrored by human memory, but I would not rely on that.
HDDs, SSDs, everything fails. Moreover, everything fails often.
1.3. The cloud is just another person’s computer.
Don’t get me wrong, having someone take care of your data is a great stress relief. Just do not overestimate the reliability of those services. At some point you will buy a holiday trip to a country where your cloud provider is, not even blocked, just choked by the low quality of the network.
1.4. Reservation.
Everything needs to be copied five times:
- Master copy. That is your computer and/or phone.
- Local backup. SSD, HDD, USB-stick, home server, etc.
- Local snapshot.
- Oligarch’s cloud (Google, Apple, Baidu, Yandex, whatever).
- Your own cloud.
1.5. Threat model and emergencies.
The previous section, Reservation, may seem a bit radical to many people, but it is justified by convenience more than by paranoia. When you need to access your data, you frequently need the easiest, most convenient way of reaching it. Redundancy here is a way to achieve ease.
Naturally, redundancy is the opposite of deniability. The more copies of data you have, the harder it is to clear up the evidence.
Therefore, the two biggest threats that this article is considering, are:
- Hardware failures
- The lack of connectivity
Unauthorised access is considered a problem, but of less importance than losing data.
In particular, this manual suggests daily backups of the whole disk of your computer. The easiest way to achieve this is to have a spare backup SSD in your backpack, and do a disk synchronisation every morning at the beginning of the day. With rsync and USB3 this should be fast enough.
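For illustration, a minimal sketch of such a morning sync, assuming the backup SSD is mounted at /mnt/backup_ssd (the path is hypothetical; the flags match the rsync invocations further below):

# mirror the home partition onto the pocket SSD; --del removes files deleted from the master
time rsync --archive --hard-links --acls --xattrs --one-file-system --del --human-readable --info=progress2 /home/ /mnt/backup_ssd/home/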
This is not automatic, but doing things manually is less likely to silently fail. However, this also means that everyone who has access to your backpack can steal all of your important data by stealing the backup SSD, which is way more portable than your laptop.
In addition, although SSDs are fast, they also die quickly. Therefore, a second backup to a spinning magnetic medium is recommended too.
But magnetic media are slow, so I would recommend doing magnetic backups overnight. Although a lot of technical data on the laptop disk changes at night, when indexers and disk upkeep utilities are doing their job, the important data will still be saved.
The magnetic disk is better left at home. This means that, again, a burglar may get access to it, but you are at least partly insured against losing all your data together with the laptop, which you may simply forget somewhere.
1.6. Restoring.
The best backup is the one that is easy to restore.
Restoring a laptop backup, ideally, should involve only replacing the internal SSD with a backup copy.
Restoring a dead NAS should, ideally, only involve replacing a dead root SSD/SD card with a nightly backup.
On servers, perhaps, a RAID-1 (mirror) is good enough, if you have a rebuild command written in some very accessible place.
Unfortunately, doing a backup of a smartphone is much harder. Although it is possible to make disk images with netcat and dd (google it), restoring those images may prove to be infeasible due to encryption and other digital-signature mindfuck. The answer is to never keep anything on your smartphone that is not integrated into your main brain model. (Be it a laptop HDD or a cloud drive.)
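For reference, a sketch of the netcat-and-dd recipe; it assumes a rooted phone, and the block device path and port number here are assumptions, not universal values:

# on the phone (adb shell, as root): stream the whole flash to a listening socket
dd if=/dev/block/mmcblk0 | busybox nc -l -p 5555
# on the laptop: forward the port and collect the image
adb forward tcp:5555 tcp:5555
nc 127.0.0.1 5555 > phone-image.img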
So, when you lose/break your phone, you still have to reinstall the apps you had, but that is not much of a problem, since most of that data is in the cloud anyway. The rest can be synced back with Syncthing.
An Oligarch’s cloud is likely to be good enough to keep your phone data (Google, Huawei, Samsung), but it is prone to banning. Therefore, having a personal cloud (even a far less performant one) allows you to quickly switch to an alternative storage when you are banned.
The parts of your brain that are not in an Oligarch’s cloud can be fetched from the phone over WiFi with Syncthing, as long as it gets connected to the mother-ship at least once in a while.
2. What is data
2.1. Our life in data.
In this world, there is a digital model of you. It is not a single model, and it is not entirely an electronic model; it may very well be on paper, or rather papers, held by the various institutions that you happened to give your data to.
Your schools probably hold some files on you, as do your work units and your military service. The police, even if you have not done anything wrong and just drive a car, have a profile on you. Of course, Google and Alibaba have something. Your boyfriends and girlfriends, relatives and pals too.
Of course, the person who holds the most on you is probably your enemy, if you are honoured to have one. Oh, how exciting it would be to find out what your enemy has on you. He is not you: while for most people (even the most conscientious of us) the choice between sorting out a personal archive and having a game of Dota is not at all obvious, for your enemy it is crystal-clear. Nobody will be mining data on you with the same diligence.
We do not usually think about this data in terms of a file system. However, from a data engineer’s perspective, it is a distributed virtual system of data blocks. Not every data block is on an actual disk, but each one has some kind of an “address” via which it can be reached.
These addresses are usually incompatible, but why can’t they be made compatible? At least some of them can.
Now I have to get a bit technical. For example, the subsystem called “fuse” allows a programmer, with a bit of work, to make a huge variety of addresses compatible with the addresses that your files have on your computers.
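A small sketch of the idea with sshfs, a popular FUSE file system (the host and paths are made up): a remote machine’s files become reachable through the ordinary file primitives of your local tree.

sshfs user@server.example.com:/home/user/Books /mnt/server-books
ls /mnt/server-books    # remote "addresses" now answer to ls, cp, grep...
fusermount -u /mnt/server-books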
Is the file system’s system of addresses exhaustive, or the best possible? Likely not. But, surprisingly, the set of primitives that we have developed for working with files, while staying minimal, is still tremendously powerful. You can exceed this power, indeed, but this requires a giant increase in complexity.
2.2. Mind mapping.
“Mind mapping” as a name was invented by Tony Buzan to describe his own paper-based protocol for recording data.
It is often known under the name of “concept mapping”, and is frequently praised as a “totally different way of thinking”, but it really is just a popular explanation of graph theory.
Which does not make it any worse, of course, and in fact, noticing this immediately made me think “hang on, but my file system is also a graph”.
Indeed, under relatively mild restrictions (mind map conditions), a lot of things can be represented as graphs, and graphs map easily onto a file system structure. This immediately leads us to the conclusion that our data can be visualised as a graph, and that this can give us useful insights.
With this thought in mind, I tried sketching “my life” as a mind-map-like graph, placing various aspects of my life on that sketch.
Many things went there: studying, job, friends, relatives, hobbies, various government and society-related things.
And it immediately became apparent to me that:
- Each program is a tiny brain, and its structure is completely different from mine. It is, thus, very counterproductive to just naively join them. (This is what you can call the DOS Way.)
- The file system structure typically proposed by a pre-made device (such as Pictures, Videos, Documents) is also mostly inadequate for describing such a complex creature as a human. (You can call it the “Unix Way”.)
Okay, so the two most popular approaches do not work and never will.
What is left to do?
Well, that is why this document is called a “living document”. I haven’t found the answer. However, I found a few tricks that have made my life easier.
2.3. Basic heuristics
There are a few tricks that are worth considering when designing a human-interacting system.
2.3.1. Important items should be close to where you can see them.
There is a noticeable disparity between the places where we expect to see things and the places where we really see them.
You may have an excellent task manager, but it will not be of any use if you do not open it. Conversely, if you see unexpected things where they should not be, you are more likely to react to them.
Imagine your wife leaving you a message inside the code file or the document you are currently working on. You are much more likely to do something!
Actionable insight: I am configuring my system to put reminders and notifications right in the home directory.
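A trivial sketch of the idea (the file name is made up): a reminder dropped right where tomorrow-you will stumble over it.

echo 'Renew the passport!' > ~/00-TODO-renew-passport.txt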
2.3.2. Human attention is limited to 7 items.
The title is a little clickbait-ish; in fact, human attention volume can be larger, say, up to 14 items, but the scale is about that large. If you have more items in a directory, your brain will select its own native number of items (7-14) and ignore the rest.
In particular, it means that each directory you make should, in general, have no more than “your natural” number of items. Self-check: my home directory (“~/”) has 28 items, and I ignore most of them, except about 7. However, I do notice all unexpected files in a directory quite quickly.
This number is trainable, as most human skills are, but not extensively. You can, perhaps, raise it from 7 to 14, but not to 50.
However, I know two ways of “tricking” this number.
The first way is to give shorter names to, or even hide, the directories that “you know are there”. Since you remember that they are there, your brain ignores them when it sees them, but you can still see them in the visualisation interface (make sure you have one).
The second exception to this rule is the case when items somehow depend on each other. If the items have some natural ordering (perhaps by some date, or by a human name), you can have more than your “fixed number”.
Why am I so keen on increasing this number? Can’t we just make groups and subgroups? The answer is “not really”.
Each time you go inside a directory, you have a context switch, which means you lose a bit of context. In other words, the depth of your file system tree also matters. It matters less than breadth, but it still matters.
Keep your brain data structures tight.
2.3.3. A dashboard.
A dashboard is a misleading thing. Remember the trick from the previous section that can be used to increase the number of items in a directory? (Adding “implicit” items that your brain ignores.)
Here we see the same effect, but in the opposite direction. If you have a dashboard, you get “a feeling” that you are up to date with the information, but in reality your brain starts to ignore the things it is getting used to. At the very least, give your dashboard more contrast.
I still have one, and I do have a habit of checking on it, but it is less useful than I hoped.
2.3.4. Notifications.
Notifications are vital, which also means that they are extremely expensive. Notifications can save you a lot of grey hair if they arrive timely and warn you about something important, but too many of them will blow your mind: they are very expensive to process.
Opinion point: this is why “free”, commercials-funded services are in reality much more expensive than those you pay for. Paid services just eat your money, and you can make more money. “Costless” ones eat your life, and you are not getting a new one.
Heuristic: if you cannot keep your notifications in their place, the (bad) trick is to subscribe to too many. Yes, you lose the important ones, and you lose the ability to get early notices, but this is still better than having your life eaten by ads.
Another important point is to get notifications “when and where” you need them. It does not help much to get an important notification from your server while you are driving your car. You cannot react to it, and thus you are: (1) losing energy on processing this notification, (2) losing energy on rescheduling it, (3) maybe wasting time on mitigating it.
2.3.5. Notifications are turning into your TODO items.
Is that obvious?
Essentially, there are two ways of getting new “TODO” items into your list:
- Notifications
- Exploration
TODO items are what the skeleton of your life consists of. It is important to notice that an organism does not consist of the skeleton only. The “taste” of life, the “moments of happiness”, are impossible to plan, but if you do not have a solid skeleton, those “happy moments” have nothing to get entangled in and hooked upon.
2.3.6. Items are directories in your “virtual file system”.
This is not obvious! Why aren’t “files” those items? Informally, because files can be seen as different “faces”, or “views”, of the same “thing”.
In fact, you never know when the things that you are experiencing in your life are going to grow in abstraction and turn from a file into a directory. It is better to always start with directories. (The WWW Consortium agrees with me: https://www.w3.org/Addressing/)
But the point is: you never know where they will go. If you are going to a dancing party and make a directory for a ticket purchase, it may later turn into a directory for dancing textbooks and videos, or maybe into a directory of cocktail recipes, or a directory of cool dancing places.
But you would still want to also keep this directory in the “tickets” catalogue.
(This is what you need symbolic links for.)
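A sketch with hypothetical paths: the directory lives where it semantically belongs, and the “tickets” catalogue gets a symbolic link to it.

mkdir -p ~/Dancing/2014-09-party ~/Tickets
ln -s ~/Dancing/2014-09-party ~/Tickets/2014-09-dancing-party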
2.3.7. But my virtual file system does not match my disks and clouds?!
Yes! And this is a problem!
I am trying to use both symbolic links and hard links in order to make the system VFS (virtual file system) match my brain, not the distribution of data across hard drives and clouds.
It does not work very well! Suggestions are welcome! But so far I have created a fairly reasonable structure from symlinks, bind mounts, and regular copies of file trees made with rsync.
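A sketch of the latter two mechanisms, with hypothetical paths: a bind mount re-attaches an existing directory at a second point of the tree, and rsync mirrors subtrees that links cannot reach (such as those on another machine).

mount --bind /mnt/hd/Archive/Photos ~/Pictures/Archive   # as root
time rsync --archive --hard-links --human-readable nas:/srv/Books/ ~/Books/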
2.3.8. Context and tools cannot be avoided.
Even if your file system structure is decent, you will forget where you put stuff, and you will find yourself exploring your mind map as if it were alien to you. (Sometimes this is also exciting.)
Thus… help your “future self”. Annotate everything that can be annotated; you will thank yourself a million times later.
Context will also help your automatic tools be more productive. I will say a bit more about that later.
The most obvious place to add context to your files is their name. Yes, it is not very flexible, and frankly quite bodgy, but it is the only place that is at least remotely reliable in computing. (A naming sketch follows the list below.)
There are other places, but they are more specialised.
One more place that is worth considering is your file headers. You can often put the vital context information there.
Context includes:
- Creation date.
- Modification date.
- Refiling date.
- Publication date.
- Author.
- Language.
- Category. (One per file)
- Tags. (many per file)
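For example (a made-up name), a file name alone can carry the date, language, category, and tags:

~/Reviews/2021-01-06_en_books_three-body-problem_scifi.org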
2.4. File system tree
2.4.1. TODO Brain data structures
How does a human’s brain work?
We have “Projects”, “Events”, and “Categories”. Projects are limited in time and scope. Events are limited in time only. Categories are limited in scope only.
There are also “tags”.
Suppose you are studying Chinese. This gives you a category “Chinese”, under which you would be creating your stuff.
Suppose you are joining the University of Edinburgh. This would give you a category “Uni”.
In the autumn semester of 2014, you join an introductory course in Chinese, in Edinburgh.
This course is definitely a project.
You study badly, pass your exam so-so, and get the artefact: the diploma.
You leave Edinburgh, but still keep studying Chinese. In your spare time you are working on the exercises from the same textbook.
Can you write your solutions into the same project? Apparently not, as the project is already closed.
2.4.2. TODO Directory hard links
A file system is a tree. A git repo is a DAG, Directed Acyclic Graph.
You can traverse a tree naively. You can traverse a DAG in a smart way.
However, the human brain is not a tree, and not even a DAG. It is a general-purpose DG, a directed graph.
You can traverse a DG too, but you need to be much smarter than usual.
How would you like to organise your brain? Keep in mind that there should be some data structures available for shared usage with other people and robots.
Let’s take a simple example.
You have a directory called ~/Music, where you put your music. You have a directory called ~/People/Mom, where you put things that are related to your mom.
For example, your Mom likes a band called Gogol Bordello, and you also like the same band. Would you put it into ~/Music/Gogol-Bordello, or into ~/People/Mom/Gogol-Bordello? The problem is exacerbated by the fact that you may need to update the names of directories.
In the file system visualiser, I use soft links to associate directories. But in general there seems to be no good solution yet.
2.4.3. TODO Modelling human life with a file system.
The heuristic here is: first build your hardware/software synchronisation, later build the semantic harness.
Things to consider:
- Pictures and “daily data chronology”.
- Other People. You will be sharing some subtrees of the file system, as if you had some common parts of the brain.
- Downloaded vs Personal files. You usually do not care that much about losing downloaded files.
- Repositories.
- Raw data. It will be displayed badly on your “file system map”, so you have to think in advance about how to store it.
- Projects by time, status, and class.
- By time: these are usually projects you do not expect to last long, for example, buying a theatre ticket.
- By status: these are projects that occupy so much of your effort that you cannot just put them into a category. You would have <7 open ones, and would make your reminder system remind you about them as often as possible. As they are closed, you reduce their level of annoyance and move them either into the “topical project directories”, or into the “projects by time”. You can make symlinks too.
- By area/category. Some areas of life naturally occupy some significant volume of your life. Put your projects there.
- Official documents and government interaction. You do not want those just lying around your file system, as they are more sensitive.
- Your medical information. This is as important as a “by status” project, but by nature it is a “by category” project. This data is also sensitive.
- Financial information. Same as above.
- “Incoming” directories with stuff to read and digest. Those tend to take a lot of space, so be careful.
3. Technical details
3.1. TL;DR
- time: run everything with time, it’s all gonna be slow.
- rsync: forget cp, it will screw up your dates and perms.
- sync root: SSDs are shit, swap kills SSDs, networks are slow. rsync root and home to a backup magnetic spinner.
- annotate: annotate everything, because you will forget.
3.2. Concepts
3.2.1. Backups
The main difference between Data Dumps (3.2.2) and backups is that backups are restorable objects.
3.2.2. Data Dumps
Data Dumps are file system subtrees or, sometimes, archives, that usually appear as a result of using a non-specialised tool for “saving” some data in a dangerous situation, instead of using a special-purpose backup tool.
They are usually non-restorable.
3.2.3. Version Control Systems
Git and friends. Try to store all your text files in a VCS; it pays off.
3.2.4. Synchronisation
Reconciling the differences between two copies. Often used as opposed to merging (which deals with two conflicting copies).
3.2.5. Merging
Taking two versions of the same file, developed separately, and combining to create a single one.
3.2.6. Volatile Storage
Storage that is frequently emptied. For example, a tmpfs.
3.2.7. Acceptable time
For a personal laptop, every non-resumable operation is limited by an 8-hour time window, because realistically, every operation should be doable overnight at most. Every resumable operation is limited by 7*8=56 hours, as that is the overnight time available during a week. Practically, a backup that is more than a week old is useless.
3.2.8. Subtree merge
When you have two “more or less similar” copies of a single directory tree, you are in big trouble. Now you have to combine them somehow and get a “master copy”. Not easy.
3.2.9. Automatic maintenance
A well-tuned computer needs to run tasks for self-maintenance. On Windows, many people are used to defragmentation and disk checking. On Linux we still need disk checking, file system checking, and several other upkeep operations.
3.2.10. Manual maintenance
Some things cannot be done by a machine, for example, connecting a backup HDD. Those tasks you need to plan in advance and force yourself to do. This is hard, but worth learning.
3.3. Software
3.3.1. QDirStat
A not-so-bad tool for finding which of your directories take up too much space.
3.3.3. fdupes
Do not use fdupes.
3.3.4. rdfind
Do not use rdfind.
3.3.5. fsck + badblocks
Checks your file system for errors.
# switches to fsck.ext4 mean the following:
#   -c     run badblocks (read-only)
#   -c -c  run badblocks (non-destructive read-write)
#   -C 0   show progress
#   -f     force check
#   -k     keep old badblocks list
#   -y     auto-repair (answer "yes")
#   -t -t  print timing
#   -v     verbose
echo time fsck.ext4 -c -c -C 0 -f -k -y -t -t -v /dev/sdc1
3.3.6. rmlint
I have switched to rmlint recently. It is a bit weird, but at the end of the day it turned out to be more reliable and tunable.
It is an excellent tool to use for a Subtree Merge (3.2.8). Highly recommended.
3.3.7. speedtest from ookla
https://www.speedtest.net/apps/cli
It exists for ARM64, and has a huge database of servers.
speedtest -s 26850 would do a test to some server in Wuxi, China
3.3.8. find
If you still do not use it, it is time to start. Learn it well, and it will help you a lot when you “kinda” know where your file should be.
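For instance, “a file with ‘invoice’ in its name, somewhere under ~/Documents, touched in the last three months” becomes (names hypothetical):

time find ~/Documents -iname '*invoice*' -mtime -90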
3.3.9. grep
An excellent “regular expression search tool” for content search within files when you “kind of” know where they are.
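E.g., a recursive, case-insensitive search with line numbers (the directory and pattern are made up):

grep -rin 'gogol bordello' ~/People/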
3.3.10. locate
Learn it and start using it. It is a great tool for super-fast search of “stuff that was out there somewhere”.
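E.g. (the pattern is made up); remember that its database is rebuilt by updatedb, usually from cron:

locate -i bordello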
3.3.11. recoll
It’s an amazing, very fast and efficient desktop search tool. It takes time, maybe days, to index your drive, but contrary to GNOME’s Tracker and KDE’s Baloo, it actually works. The database is huge, and you probably need an SSD for it.
I do not use it that much, because with a good FS structure you end up using find/grep many times more often, and with good context you can often just get by with locate. But in those cases when you “do not really remember”, recoll helps you “recoll”.
3.3.12. rsync
Rsync is an extremely versatile tool with an extremely fragile syntax.
The following will copy everything from root to the backup root.
Combined with rmlint, it can be used as a Subtree Merge (3.2.8) tool.
In general, it is hard to use, but much, much better than plain cp or scp.
It lets you resume your transfers, do incremental backups, fetch backups from remote machines, and a lot of similar things.
echo time rsync --links --partial --fuzzy -arHAXyh --info=progress2 /
echo time rsync -v --archive --hard-links --acls --xattrs --inplace --one-file-system --del --fuzzy --human-readable --info=progress2 --partial --dry-run from/ to
- Multithreaded rsync is not yet implemented.
Indeed, see bug https://github.com/WayneD/rsync/issues/131
This is very important for modern restrictive ISPs.
- Rsync does not have a “--n-tries” or similar argument.
I fake it with the following code:
time while ! rsync <...> ; do sleep 30 ; done
3.3.13. aria2
A very versatile tool for downloading all kinds of stuff. I recommend it. It can download through ssh too! (sftp is actually ssh.)
time a2 --max-tries=0 --ftp-user=username --ftp-passwd=<scrubbed> sftp://server.lockywolf.net:22/mnt/hd/file.txt
Where a2 is an alias: alias a2='aria2c -l /tmp/RAMFS/2021-01-06T13:37:11+08:00-aria2-download.log -x120 --min-split-size=148576 --split=120 --auto-file-renaming=false'
You need an aria2-nitro patch to allow 120 connections with 100k splittings.
Important! By default, sshd has a built-in DDoS protection setting: MaxStartups 10:30:100.
You want to set it to 120:30:220 or something, but be wary of a real DDoS.
3.3.14. git
Keep all your literary work in git. I use magit in Emacs, console git in the console, and mgit on Android for my diary synchronisation.
3.3.15. Syncthing
A somewhat fragile, but still extremely useful tool for synchronising machines, and it can also protect you from a bit of regret when things get deleted automatically. Put into Syncthing the things that you cannot put into git.
3.3.16. exiftool
It is the tool that lets you extract all the valuable metadata from your photographs.
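For example, a common trick (a sketch; check the tag names against what your camera actually writes) is renaming photographs by their embedded creation date:

exiftool '-FileName<CreateDate' -d '%Y-%m-%d_%H-%M-%S%%-c.%%e' ~/Pictures/Incoming/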
3.3.17. org-mode
This is not really about organising files, but rather about creating them; still, I cannot avoid mentioning it here, because org-mode is a very versatile tool, and you can build a lot of your personal information management system on top of it. You can add context to your files with ease. You can also add cross-references without much difficulty.
- Emacs org-mode
I use it to plan my tasks on the desktop, and to write documents and articles.
- orgzly
- Markor
I use it on Android to edit org-mode files. They are actually Markdown files, but for my purposes it does not matter. Markor is really worth having a look at, because it lets you take photos and other notes with ease, and tags are created as well.
- orgro is a great stand-alone viewer for org-mode files.
3.3.18. cron
Lets you run tasks periodically. Worth learning.
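A sketch of the overnight magnetic backup from section 1.5 as a crontab entry (crontab -e; the paths are hypothetical):

# every night at 03:00, mirror home to the spinner
0 3 * * * rsync --archive --hard-links --del /home/ /mnt/hd/home/ >> /var/log/home-backup.log 2>&1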
3.3.19. Dropbox (Google Drive, Baidu Pan, Yandex.Disk)
Those are the relatively nice tools that let you donate all your data to an oligarch in the name of his business interests. They are also quite convenient, and work as an additional backup for your files. The killer feature is making the files available on your phone without synchronisation.
If you can, avoid them with the help of NextCloud. But you probably won’t be able to.
3.3.20. NextCloud
A replacement for Dropbox.
3.3.21. TODO mbsync+maildir+mu+mu4e
A way to keep your email locally and read it without the internet. Is there a way to use it as a file system? Or to visualise it?
3.3.22. TODO vdir+carddav+vdirsyncer+khard+ebdb(in progress)
A way to keep your contacts on the local disk just the same as they are on your phone. Is there a way to use it as a file system?
3.3.23. TODO vdir-cal+caldav+vdirsyncer+khal+calfw(in progress)
A way to keep your diary records on your local disk for indexing and search. Is there a way to use it as a file system?
3.3.24. Skyperious (or other chat log backup tool)
Lets you back up Skype.
3.3.25. RSS
A relatively unified format for keeping archives of things that have a natural order in time. Consider exploring RSSBridge or other anything-to-rss portals to avoid Internet Giants’ trickery.
3.3.26. Conky
A tool for creating a dashboard (2.3.3) on your local computer. Useful for displaying the output of the scripts that check that all of your synchronisation machinery works.
3.3.28. eXtended ATTRibutes
A way to add valuable metadata to your files. The tools are called setfattr and getfattr.
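A small sketch; attributes in the user. namespace are writable without root, and the names below are made up:

setfattr --name=user.category --value=finances ~/Documents/2020-tax-report.pdf
getfattr --name=user.category ~/Documents/2020-tax-report.pdf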
3.3.29. smartctl+smartd
You must have them configured. They are the only thing that can give you at least something of a warning before your HDD dies.
3.3.31. TWRP backups that are like data.ext4.win000
are actually tar files, and can be unpacked with tar xvf.
3.3.32. libguestfs lets you mount Windows vhdx backups
https://stackoverflow.com/questions/36819474/how-can-i-attach-a-vhdx-or-vhd-file-in-linux
https://download.libguestfs.org/1.43-development/libguestfs-1.43.3.tar.gz
It needs hivex and supermin. Damn, a lot of work. I build hivex from GitHub tarballs, but Red Hat disapproves of this.
guestmount --add yourVirtualDisk.vhdx --inspector --ro /mnt/anydirectory
I did not manage to mount this with qemu 5.0. I tried building 5.4.0rc4; that also failed. Windows 10 in VirtualBox failed too. Perhaps the file is actually broken.
3.4. Hardware
3.4.1. A VPS.
Living without a VPS is hardly possible nowadays. You need it for every task that needs a public address.
You may have to keep some data there, but it is usually expensive, and it is “another person’s machine”. So, ideally, the file system should be encrypted (against the hosting company’s technicians).
3.4.2. A NAS (home server).
You need it, because you cannot keep all your data with you all the time. This is where you will keep most of your data, as well as backups.
3.4.3. Your laptop.
You have to be prepared for the chance that you drop it and it dies. Or your root SSD dies. Or your home SSD dies. But you will still keep your useful data there.
3.4.4. Your phone.
It is worth designing your system in such a way that losing your phone is sad, but not too much. You need to backup or synchronise at least:
- Contacts (see 3.3.22)
- Logins (you can use some special service, but Android built-in is probably enough)
- Data (look at 3.3.15)
Encrypting your phone will likely make it a huge pain to extract data from it if the screen is broken. I used to believe that having a regular backup (a nandroid or a dd image) is a good idea, but not any more. Keep your stuff on your laptop or NAS, not on your phone.
If it gets locked forever, just break it with a hammer and destroy the memory.
3.4.5. Oligarch’s cloud (Baidu, Yandex, Google).
It is usually integrated with a messaging/scheduling service, and is thus convenient. You will need it as an interface to your friends, who are controlled by the machine anyway.
3.4.6. Backup disks.
For root drives, it is enough to keep a backup root drive in every machine and do an rsync backup every night. You can just plug in the backup drive instead of the main one if the main one dies. For the laptop, you have to carry a drive with you, because laptops usually do not have 4 drive bays.
You need at least two backup drives for your laptop, because SSDs tend to die “suddenly”. So you will have an SSD to back up your home partition quickly every day, and a more reliable HDD to back up the data overnight.
3.4.7. USB Sticks
Those tend to die extremely quickly, so have a few in your pocket, ideally identical. It is worth having them Linux-bootable, with a few root directories:
- Windows installers
- Windows portable software
- Portable documents (which you delete at each plug-in)
- Permanent documents (your passport photo and such), that you may urgently need to show to the police and other officers.
3.5. Canned tricks
In this section I have collected several tricks for helping me keep my file system tidy. Most of them can be classified into two groups:
- Remove waste
- Unpack opaque containers that have data inside.
3.5.1. TODO Mount some stuff on Android as ramfs
Doesn’t work yet for “all apps”, only root ones
#!/system/bin/sh
while [ "$(getprop sys.boot_completed)" != 1 ]; do
    sleep 1
done
su -mm -c mount -v none -t tmpfs -o size=4g,nosuid,nodev,noexec,noatime,context=u:object_r:sdcardfs:s0,uid=0,gid=9997,mode=0777 /mnt/runtime/write/emulated/0/Download/tmpfs-cleared-on-reboot > /sbin/.magisk/img/.core/mount.stdout.magisk.log 2> /sbin/.magisk/img/.core/mount.stderr.magisk.log
3.5.2. Make a data flow for your Personal Cloud
I have a separate “howto”, which is more of an example, on how to draw a data flow diagram for your cloud. Visit the article.
Unfortunately, such a file is hard (if not impossible) to generate automatically, and updating it is a pain.
But even so, getting a high-level overview of what your digital brain looks like is priceless.
3.5.3. Use https://gitlab.com/Lockywolf/scsh-xattr-mindmap to see what your personal brain is like
I generate the file system map and print it on a giant three-by-one-metre poster on a wall.
3.5.4. Mount /tmp as tmpfs, or at least /tmp/RAMFS as tmpfs
tmpfs /tmp/RAMFS tmpfs nosuid,nodev,noexec,sync,dirsync,size=4G,mode=1777 0 0
Note: I mount /tmp as tmpfs, and /tmp/RAMFS as tmpfs-noexec. /tmp needs to be exec for building packages.
Why would you want that?
Because files that you download steal space on your HDD and, what is worse, steal your attention when you are browsing your disk and/or searching in it.
You do not want to keep any useless file for longer than needed.
3.5.5. Set up your torrent client to pick up files from ramfs (to avoid storing them)
Transmission->Edit->Preferences->Downloading->Automatically add files from-> /tmp/RAMFS
Transmission keeps .torrent files in ~/.config/transmission/torrents
This is the same idea as in the previous paragraph.
Do not store anything worthless, and save time on navigating Transmission’s interface.
3.5.6. Set up Firefox to download stuff to ramfs
I didn’t find a way to do this, so I basically mounted /tmp as ramfs. Firefox creates a directory called “${username}_firefox0” and downloads stuff there.
3.5.8. Kill desktop.ini
echo find . -iname 'desktop.ini' -delete
3.5.9. Remove empty files
find . -type f -empty -delete
3.5.10. Remove empty directories
Worth doing after 3.5.9
find . -type d -empty -delete
3.5.11. Remove duplicates
echo rmlint -T "df" -c progressbar:fancy --progress --no-crossdev --match-basename --keep-all-tagged --hidden --must-match-tagged ~/Incoming/ // ~/good-dir
Run on about 400 GB.
lockywolf@delllaptop:~/Incoming$ rmlint -T "df" -c progressbar:fancy --progress --no-crossdev --match-basename --keep-all-tagged --hidden --must-match-tagged ~/Incoming/ ~/BACKUP/ // ~/books/ ~/Data/
Traversing (585350 usable files / 0 + 0 ignored files / folders)
Preprocessing (reduces files to 107623 / found 0 other lint)
Matching (59120 dupes of 44798 originals; 0 B to scan in 0 files, ETA: 50s)
==> In total 585350 files, whereof 59120 are duplicates in 44798 groups.
==> This equals 81.36 GB of duplicates which could be removed.
==> Scanning took in total 1h 29m 10.406s.
Wrote a sh file to: /home/lockywolf/Incoming/rmlint.sh
Wrote a json file to: /home/lockywolf/Incoming/rmlint.json
3.5.13. TotalCommander for Android has an sftp client
You can use it to share files over your VPS right from your phone.
3.5.14. TODO Tag files from packages with an eXtended ATTRibute
Rewrite it as a batch with xargs or something.
With pkgtools-15.0-noarch-41, the installpkg can be modified to obtain the $SUBJ behaviour by adding
( # line 670
  cd $ROOT/
  grep -v '^install' "$TMP/$shortname" > "$TMP/$shortname"_noinst
  xargs --arg-file="$TMP/$shortname"_noinst --delim='\n' setfattr --name=user.slackware_v1.installpkg.package_name --value="$shortname"
  rm -f "$TMP/$shortname"_noinst
)
to line 670 of /sbin/installpkg
and makepkg can be modified by adding
find ./ -type f -exec setfattr --name=trusted.slackware_v1.makepkg.package_name "--value=${TAR_NAME}" {} + # line 414
to line 414 of /sbin/makepkg
3.5.15. TODO a hardlink trick (this is unreliable)
Oftentimes I face the following situation: I am working on something that can be published later. For example, I am reading a book and writing a review, which produces two files. These files I keep in a git repository for “book reading”.
There is also a repository for the blog posts. I do not want to synchronise the same file between two different repositories. I also do not want to check pictures into a git repository, because git is not very efficient at working with binary data.
So I make a hard link of the review file. This way the git-versioned file gets all the changes instantly, even if the file is modified from a different machine. Moreover, if I update the file from the git side (adding something to the review), the changes automatically get into the Google Drive directory.
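The trick itself is a single command (the paths are hypothetical):

# both names now point at the same inode; edits through either one are shared
ln ~/Books/three-body-problem/review.org ~/GoogleDrive/blog/three-body-problem-review.org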
There is a huge caveat here:
Some programs, notably Emacs, by default make backup files by renaming. That is, a file is renamed to be called filename.bak, and it is the backup that stays hard-linked with the old “primary brother”, while the freshly written file is a new inode, so the link is silently broken.
In order for Emacs to not be bad here, you need:
(set-variable 'version-control nil) ; should be t, but breaks hardlinks
(set-variable 'backup-by-copying-when-linked t) ; --//--
3.5.16. TODO Every file should be tagged:
- by the package it comes from
- by the program that has created it
- by the user who last used it: uid and username
Should be doable with a kernel module, or something.
Kprobes? uprobes? SystemTap?
3.5.17. TODO Meaningful tagging, automatic
Supposedly, tracker should be tagging stuff automatically. Didn’t try it.
3.6. Canned tricks with speed tests.
3.6.2. badblocks 2.73 Tb HDD 3543 minutes ~59 hours
declare DISK
DISK=sdc
cd /root
# -b block_size, -w destructive, -s show_percentage, -v verbose
# echo time badblocks -b 4096 -n -s -v -o /root/"$(date --iso)"-"$DISK".badblocks "/dev/$DISK"
# echo time badblocks -b 4096 -w -s -v -o /root/"$(date --iso)"-"$DISK".badblocks "/dev/$DISK"
With USB 2.0 the estimated time is ~75 hours? Hm…
Didn’t write down the USB3 speed.
eSATA speed: 0.1% in 1:50, which makes it ~33 hours to just write one pattern?
3.6.3. Smartphone TWRP backup
A manual TWRP backup without the “sdcard”, all partitions, to an OTG disk, takes 3389 seconds, that is roughly ~1 hour, 41 Gb in total.
Storage speed is not yet measured. Started, but not finished: 4 hours is not enough, and the phone battery is dead.
3.6.4. Measuring disk speed
echo dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
echo dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync
3.6.5. fsck
3.6.6. rsync
- rsync root->/mnt/hd: 184 minutes, after a long time without backups
- rsync root->/mnt/hd: 24 minutes, after a Chrome update
echo time rsync --archive --hard-links --acls --xattrs --one-file-system --delete-before --fuzzy --human-readable --info=progress2 --partial / /mnt/hd/ --exclude='/tmp/*'
- rsync root->/mnt/backup_root/ (SSD->SSD) ≈ 6 min
- rsync home incremental backup ≈ 22Gb, 25 min
3.7. Organised Protocol
This section tries to roughly outline the manner in which the things above should be implemented. A good file system structure almost automatically implies a good backup system, because you have to fetch the “subtrees” from all the devices that you have, and/or export the data from “silos”. And if you have that tree-building system in place, you may just as well add a backup system too.
3.7.1. Automatic backups
You will likely have to backup everything that is listed in the “software” above. When doing backups you have to consider the following:
- When to back up. You cannot fetch data when you are offline, and often you cannot guarantee that you will be online.
- How to back up. You do not want to clone your hard drive over a metered connection.
- How to rotate backups. If your backups occupy too much space, this space is lost.
- How to make sure that your backups succeed. A good backup is, by definition, unnoticeable; you shouldn’t remember it exists unless you need it. But this means that you have no reminder (by default) when your backup fails.
I generally try to support each backup procedure with two auxiliary subprograms.
One runs together with the backup itself, and notifies me if something goes wrong. The backup itself does not notify me, because a backup often goes wrong: e.g., some backups I run every 5 minutes, but they succeed only once or twice a day, because only at those times is the required device nearby. But there should still be a service that sends you an email if backups have been failing for too long.
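A sketch of such a watchdog, assuming a working mail command and a three-day staleness threshold:

# complain if nothing under ~/BACKUP has been refreshed in the last 3 days
if [ -z "$(find ~/BACKUP -type f -mtime -3 -print -quit)" ]; then
    echo 'Backups are stale!' | mail -s 'backup watchdog' me@example.com
fi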
The second procedure updates my Dashboard and paints bright red the things that have obsolete backups.
I have a “BACKUP” directory on my laptop, which is roughly 2-level structured, as in class/application, e.g. 01_Messengers/ICQ.
This directory is getting “scheduled backups” every time it can, with regularity ranging from 5 minutes to 1 day.
My rough list:
- 00_Etc-Git-All-Machines
- 01_Browser-settings-sessions
- 02_Hardware-Data-Stock-Setups-Manuals
- 03_Contacts
- 04_Organiser
- 05_Lists-of-Apps-If-I-Forget-Where-Data-Was-Stored
- 06_SMS-Message-Databases
- 08_Chat-Logs
- 10_News-Feeds
- 11_Voice-Message-Recordings
- 12_Call-Logs
- 13_Directories-Lists
- 15_Diaries-Paper
- 99_Data-Dumps
Additional operation to do overnight:
- Reindex
locate
- Reindex
recoll
- Update softlinks on the basis of xattrs
- Run BOINC (in the name of the greater good)
- Update file tags semantically: run text OCR on PDFs and images, entity detection, all the cool “NN-related” stuff.
3.7.2. Yearly plan.
Not everything can be done automatically. The easiest example is the portable SSD that you back up your laptop to. You need to plug it in, as modern laptops just do not have a bay for a separate backup drive. This is getting even more important as new laptops have the NVMe storage soldered onto the motherboard, so if you break your laptop, you cannot even take the drive out.
This section has a rough outline of what is probably worth doing.
- At hardware purchase.
- Add it to your hardware database (list).
- Do badblocks. (Takes days.)
- Enable “smart” monitoring and schedule scans.
- Do an “extended” smart scan.
- Every January.
- Badblocks on all devices.
- fsck on all file systems.
- Revise address book.
- Revise calendar for the last year.
- Revise file system / Run filesystem linter-mapper, revise FS.
- Revise your backup system, manually rotate backups that need manual rotation.
- Re-check your Data Flow.
- Every first day of month.
- Regenerate and print your File System Map.
- Backup heavy systems that take a lot of space.
- Update your phone. (If your phone manufacturer is not crazy and you do not risk killing your phone.)
- Clean your phone.
- If you are running a “rolling release” system (e.g. Windows 10), update your system.
- Every Evening
- rsync your laptop hdd to the NAS
- rsync your laptop home to a spinner
- rsync your laptop root to a spinner
- Every Morning
- rsync home to backup ssd
- rsync root to backup ssd
- rsync your laptop root to a failsafe partition
- patch your systems for vulnerabilities
- laptop
- nas
- vps