• 2 Posts
  • 7 Comments
Joined 2 years ago
Cake day: June 6th, 2023


  • Thanks for the lead!

    It looks like the Buddy Read feature does in fact start with a specific book and organize a group around it, but it invites me to specify all the people that will ever be in the group right away, at group creation time. I get three ways to invite people:

    • “Machine-learning powered reading buddy recommendations” - Unspecified voodoo. Three users are shown.
    • “Community members who have this book on their radar” - Probably folks that have this on their public ‘to read’ list. Three users are shown.
    • Specifying users directly by username

    This doesn’t quite fit the “I’m up for this, let me know when it starts” mechanic.

    I could create a new group & invite all three of the users with this book in their public ‘to read’ list, but I think folks treat the ‘to read’ list very, very casually – not at the “I’m ready to commit to a reading group” level. These three users have 723, 2749, and 3771 books on their ‘to read’ lists respectively. I see that I somehow have 46 books on mine, & haven’t been thinking of it as a ‘ready to commit to reading group’ list.





  • There are so many ways to handle backups, so many tools, etc. You’ll find something that works for you.

    In the spirit of sharing a neat tool that works well for me, addressing many of the concerns you raised, in case it might work for you too: Maybe check out git annex. Especially if you already know git, and maybe even if you don’t yet.

    I have one huge git repository that in spirit holds all my stuff. All my storage devices have a check-out of this git repo. So all my storage devices know about all my files, but only contain some of them (files not present show up as dangling symlinks). git annex tracks which drives have which data and enforces policies like “all data must live on at least two drives” and “this more-important data must live on at least three drives” by refusing to delete copies unless it can verify that enough other copies exist elsewhere.

    • I can always see everything I’m tracking – the filenames and directory structure for everything are on every drive.
    • I don’t have to keep track of where things live. When I want to find something, I can just ask which drives it’s on.
      • (I also have one machine with a bunch of drives in it which I union-mount together, then NFS mount from other machines, as a way to care even less where individual files live)
    • Running git annex fsck on a drive will verify that
      • All the content that’s supposed to live on that drive is in fact present and has the correct sha256 checksum, and
      • All policies are satisfied – all files have enough copies.
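For anyone curious what that workflow looks like in practice, here’s a rough sketch of the commands involved (the repo description, file names, and numcopies values are made-up examples, not my actual setup):

```shell
# In an existing git repo (or a fresh `git init` one):
git annex init "desktop"          # register this checkout as an annex repo
git annex numcopies 2             # global policy: every file needs >= 2 copies

# A stricter policy for a more-important directory, via gitattributes:
mkdir -p important
echo '* annex.numcopies=3' > important/.gitattributes

git annex add big-file.iso        # content moves into the annex; a symlink
                                  # pointing at it is committed in its place
git annex whereis big-file.iso    # list which repos hold the content
git annex drop big-file.iso       # only succeeds if it can verify that
                                  # enough copies exist in other repos
git annex fsck                    # verify checksums + numcopies here
```

The `drop` refusing to run unless it can verify enough remote copies is the enforcement mechanism described above.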

  • The benefit of using something fancier than rsync is that you get a point-in-time recovery capability.

    For example, if you switch the enclosures weekly, rsync gives you two recovery options: restore to yesterday’s state (from the enclosure not in the safe) and restore to a state from 2-7 days ago (from the one in the safe, depending on when it went into the safe).

    Daily incremental backups with a fancy tool like dar let you restore to any previous state. Instead of two options, you have hundreds of options, one for each day. This is useful when you mess up something in the archive (eg: accidentally delete or overwrite it) and don’t notice right away: it appeared, was ok for a while, then it was bad/gone, and that bad/gone state was backed up. It’s nice to be able to jump back in time to the brief it-was-ok state & pluck the content back out.

    If you have other protections against accidental overwrite (like you only back up git repos that already capture the full history, and you git fsck them regularly) — then the fancier tools don’t provide much benefit.

    I just assumed that you’d want this capability because many folks do and it’s fairly easy to get with modern tools, but if rsync is working for you, no need to change.
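To make the point-in-time bit concrete, a restore with dar looks roughly like this (paths, archive names, and dates are invented for illustration; `-x` extracts, `-R` sets the restore root, `-w` suppresses the overwrite prompt while layering incrementals):

```shell
# Rebuild the tree as it looked on 2023-07-05: extract the full backup,
# then replay each daily incremental up to the target date, in order.
mkdir -p /restore
dar -x /mnt/enclosure/full-2023-07-01 -R /restore
for day in 2023-07-02 2023-07-03 2023-07-04 2023-07-05; do
    dar -x "/mnt/enclosure/incr-$day" -R /restore -w
done
```

Stopping the loop one day earlier gives you the tree as of the day before – that’s the whole “hundreds of options” mechanic.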


  • Sounds fine?

    Yes: Treat the two enclosures independently and symmetrically, such that you can fully restore from either one (the only difference would be that the one in the safe is slightly stale) and the ongoing upkeep is just:

    1. Think: “Oh, it’s been awhile since I did a swap” (or use a calendar or something)
    2. Unplug the drive at the computer.
    3. Carry it to the safe.
    4. Open the safe.
    5. Take the drive in the safe out.
    6. Put the other drive in the safe.
    7. Close the safe.
    8. Carry the other drive to the computer.
    9. Plug it in.
    10. (Maybe: authenticate for the drive encryption if you use normal full-disk encryption & don’t cache the credential)

    If I assume a normal incremental backup setup, both enclosures would have a full backup and a pile of incremental backups. For example, if swapped every three days:

    Enclosure A        Enclosure B
    -----------------  ---------------
    a-full-2023-07-01
    a-incr-2023-07-02
    a-incr-2023-07-03
                       b-full-2023-07-04
                       b-incr-2023-07-05
                       b-incr-2023-07-06
    a-incr-2023-07-07
    a-incr-2023-07-08
    a-incr-2023-07-09
                       b-incr-2023-07-10
                       b-incr-2023-07-11
                       b-incr-2023-07-12
    a-incr-2023-07-13
    ....
    

    The thing taking the backups need not even detect or care which enclosure is plugged in – it just uses the last incremental on that enclosure to determine what’s changed & needs to be included in the next incremental.

    Nothing need care about the number or identity of enclosures: You could add a third if, for example, you found an offsite location you trust. Or when one of them eventually fails, you’d just start using a new one & everything would Just Work. Or, if you want to discard history (eg: to get back the storage space used by deleted files), you could just wipe one of them & let it automatically make a new full backup.

    Are you asking for help with software? This could be as simple as dar and a shell script.

    My personal preference is to tell the enclosure to not try any fancy RAID stuff & just present all the drives directly to the host, and then let the host do the RAID stuff (with lvm or zfs or whatever), but I understand opinions differ. I like knowing I can always use any other enclosure or just plug the drives in directly if/when the enclosure dies.

    I notice you didn’t mention encryption, maybe because that’s obvious these days? There’s an interesting choice here, though: You can do normal full-disk encryption, or you could encrypt the archives individually.

    Dar actually has an interesting feature here I haven’t seen in any other backup tool: If you keep a small --aux file with the metadata needed for determining what will need to go in the next incremental, dar can encrypt the backup archives asymmetrically to a GPG key. This allows you to separate the capability of writing backups from the capability of reading backups.

    This is neat, but mostly unimportant, because the backup is mostly just a copy of what’s on the host. It comes into play only when accessing historical files that have been deleted on the host but are still recoverable via point-in-time restore from the incremental archives – this becomes possible only with the private key, which is not used or needed by any of the backup automation, and so is not kept on the host.

    (You could also, of course, do both full-disk encryption and per-archive encryption if you want the neat separate-credential trick for deleted files and also don’t want to leak metadata about when backups happen and how large the incremental archives are / how much changed.)

    (If you don’t full-disk-encrypt the enclosure & rely only on the per-archive encryption, you’d want to keep the small --aux files on the host, not on the enclosure. The automation would need to keep one --aux file per enclosure, & for this narrow case, it would need to identify the enclosures to make sure it uses that enclosure’s --aux file when generating the incremental archive.)
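If you want to try that variant, the invocation looks roughly like this – paths, dates, and the recipient address are all invented, and you should check `man dar` for the exact `-K gnupg:` syntax on your version:

```shell
# Create an incremental encrypted to a GPG public key. The on-fly
# isolated catalogue from the previous run (written with -@ and kept on
# the host) serves as the reference, so the backup automation never
# needs the private key. This run writes a fresh catalogue for next time.
dar -c /mnt/enclosure/incr-2023-07-14 -R /home \
    -A /var/backups/catalogues/incr-2023-07-13 \
    -@ /var/backups/catalogues/incr-2023-07-14 \
    -K gnupg:backup@example.org
```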