How to split short term and long term VictoriaMetrics storage

farcaller@fstab.sh · 24 days ago

I have a dedicated vm for things that are crucial to the home network, either latency-critical or network related.

That’d be my DNS resolver (I enforce it across VLANs by hijacking anyone trying to do DNS to other resolvers, like random IoT devices), homebridge for less important home automation and my own matter controller for the most important home automation (controlling the lights).

My router of choice is RouterOS in another VM. I tried opnsense, pfsense, vyatta, and a bunch of others (even a containerized Cisco router), and I settled on ROS, because it was the only one that could do IPv6 properly (apart from Cisco, but that has other issues).

For the less important things I run them on k8s and really, there are only two bits worth mentioning as essential: ArgoCD and nixhelm. Together, they provide effortless and mostly automated software updates with very easy rollbacks. I don’t have to go and manually update every single bit of software and that saves huge amounts of time.

farcaller@fstab.sh · 1 month ago

That’s a good point. Mind that in most production environments you’d be firewalled rather hard (especially when it comes to log processing, which oftentimes ends up having PII). I wouldn’t trust any service that tries to use DoT or DoH in there that I couldn’t snoop on. Many deployments nowadays allow you to “punch” firewall holes based on the outgoing DNS requests to an allowlisted domain, so chances are you actually want to use the glibc resolver and not try to be fancy.
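To illustrate what "use the glibc resolver" means in practice, here is a minimal sketch in Python (the project discussed is not written in Python; this is only to show the idea). `socket.getaddrinfo` delegates to the C library's `getaddrinfo`, so on a glibc system the lookup goes through whatever DNS server the environment configures, where a firewall can observe it. The function name `resolve_via_system` is made up for this example.

```python
import socket


def resolve_via_system(host: str, port: int = 443) -> list[str]:
    """Resolve a host through the OS resolver instead of an in-process DoH/DoT client."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" is used here only because it resolves without network access.
    print(resolve_via_system("localhost"))
```

A resolver that speaks DoH or DoT on its own would bypass the local DNS server entirely, and the DNS-driven firewall rules would never see the request.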

That said, smaller images are always good in my book!

farcaller@fstab.sh · 1 month ago

You’re nailing your goal then!

I would still steer you slightly towards documenting your architectural decisions more. It’s a good skill to have and will help you in the long run.

You have dozens of crate dependencies and only you know why they are in there. A high-level document on how your system interconnects and how the algorithms under the hood work will be a huge help to anyone who comes looking through your source code. We become better programmers not by reading the source code, but by understanding what it actually does.

Here’s some random trivia: your server depends on trust-dns-resolver. Why? Why wasn’t the stock resolver enough? Is that a design choice, or did you just want to have fun? There is no wrong answer, but without the design notes it’s hard to figure out your intent.

farcaller@fstab.sh · 1 month ago

This looks nice, but there are plenty of free alternatives in this space, which warrants a section in the readme comparing it to other products.

You mention ram usage, but it’s oftentimes a product of event size. Based on your numbers, your average event size is about 800 bytes. Let’s call it 1kb. That’s one million events per day. It surely sounds more promising than Elastic, but it’s not reaching Loki numbers and, if you focus on efficiency, it’s way behind VictoriaMetrics Logs (based on peeking at their benches).
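To spell out that arithmetic, here is a rough sketch. The 1 GB/day volume is my assumption, implied by one million events at about 1 kB each; plug in the real figures from the readme. The function name is made up for this example.

```python
# Back-of-the-envelope estimate of daily event count from ingested volume.
# Both inputs are assumptions for illustration, not measured values.

def events_per_day(daily_bytes: int, avg_event_bytes: int) -> int:
    """Return how many events of the given average size fit into the daily volume."""
    return daily_bytes // avg_event_bytes


DAILY_BYTES = 1_000_000_000   # assumed: about 1 GB of logs per day
AVG_EVENT_BYTES = 1_000       # about 800 bytes, rounded up to 1 kB

if __name__ == "__main__":
    print(events_per_day(DAILY_BYTES, AVG_EVENT_BYTES))  # 1000000
    print(events_per_day(DAILY_BYTES, 800))              # 1250000
```

Rounding 800 bytes up to 1 kB only shifts the estimate by a quarter, so the order of magnitude holds either way.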

I think the important bits you need to add are how you store the logs (i.e. which indices you build) and what your trade-offs are. Grep is an efficient logs processor which barely uses any ram but incurs dramatic I/O costs, after all.

Enterprises will be looking at different numbers and they have lots of SaaS products to choose from. Homelab users are absolutely your target audience and you can have it by making a better UI than the alternative (victoriametrics logs aren’t that comfortable to work with) or making resource usage lower (people run k8s clusters on RPis, they sure wonder about every megabyte of ram lost) or making the deployment easier (fire and forget, and when you come to it, it works).

It sounds like lots of things and I don’t want to be discouraging. What you started there is really nice-looking. Good job!

farcaller@fstab.sh · 1 month ago

You can enforce an always-on VPN (for at least ipsec) via an MDM profile. This kind of feature isn’t found in the casual user setup options, but there are plenty of knobs to tune in the enterprise profile configurator.

And yes, you can easily install that profile on your phone after.

farcaller@fstab.sh · 1 month ago

It is pretty bad. After this thread I tried using Element X again only to learn that its “favorites” aren’t the same as Element’s “favorites” and, moreover, you can’t set someone as a favorite in E-X, at least not if your server is Conduit. It’s just silently ignored.

farcaller@fstab.sh · 1 month ago

I would absolutely recommend a file system with snapshot capabilities for a home server. One of btrfs mirror, dm-raid (raid5) with btrfs, or zfs would work. The practical differences would be negligible at this scale and you can just pick whatever you fancy.

farcaller@fstab.sh · 1 month ago

I’ve been having sync issues with conduit lately, takes minutes for the mobile app to catch up. No way to purge old media, or to use something S3-compatible for its storage either.

Also, element x doesn’t support spaces, so if you want to bridge other chats into matrix they all are going to be messed up together.

I like matrix as a concept, but both servers and clients are in a bit of a shitshow state (same as xmpp was years ago).

farcaller@fstab.sh · 2 months ago

For the last 10 days tailscale clocked 1% battery on my phone. I honestly didn’t even consider turning it off for battery savings.

farcaller@fstab.sh · 2 months ago

If tailscale inside a container allows you to talk to it via “direct” connection and not a derp proxy, then it will offer you better service isolation (can set the tailscale ACLs for this specific service) without sacrificing performance.

Tailscale pushes for it because it just ties you in more. It allows you to utilize the ACLs better, to see your thing in their service mesh, and every service will count against the free node limit.

In practice, I often do both. E.g. I’ll have my http ingress exposed to tailscale and route a bunch of different services through it at a single tailscale node, where the access control is done by services individually. But I’ll also run a pod-to-pod tailscale between two k8s clusters because tailscale ACL is just convenient.

farcaller@fstab.sh · 2 months ago

ECC is slightly more required for ZFS because its ARC is generally more aggressive than the usual linux caching subsystem. That said, it’s not a hard requirement. My current NAS was converted from my old windows box (which apparently worked for years with bad ram). ZFS uncovered the problem in the first 2 days by reporting the (recoverable) data corruption in the pool. When I fixed the ram issue and hash-checked against the old backup, all the data was good. So, effectively, ZFS uncovered memory corruption and remained resilient against it.

farcaller@fstab.sh · 3 months ago

TIL, thanks!

farcaller@fstab.sh · 3 months ago

I had exactly the same use case and I ended up with a 40G DAC fiber for that case. It ended up cheaper than converting the whole lan to 10G.

That said, it feels like used 10G equipment is easier to come by than 2.5G for now, and if you have 2G fiber uplink and only 1G past the router then it’s a waste.

farcaller@fstab.sh · 3 months ago

Garage is trivial to get up and running and it’s more lightweight than minio nowadays.

farcaller@fstab.sh · 3 months ago

No. It’s my in-cluster storage that I only use for things that are easier to work with via S3 api, and I do backups outside of the k8s scope (it’s a bunch of various solutions that boil down to offsite zfs replication, basically). I’d suggest you take a look at garage’s replication features if you want it to be durable.

farcaller@fstab.sh · 4 months ago

Actual public services run there, yeah. If any of them is compromised, it can only access limited internal resources, and an attacker would have to fully compromise the cluster to get the secrets to access those in the first place.

I really like garage. I remember when minio was straightforward and easy to work with. Garage is that thing now. I use it because it’s just so much easier to handle file serving where you have s3-compatible uploads even when you don’t do any real clustering.

farcaller@fstab.sh · 4 months ago

I’ve dealt with exactly the same dilemma in my homelab. I used to have 3 clusters, because you’d always want to have an “infra” cluster which others can talk to (for monitoring, logs, docker registry, etc. workloads). In the end, I decided it’s not worth it.

I separated on the public/private boundary and moved everything publicly facing to a separate cluster. It can only talk to my primary cluster via specific endpoints (via tailscale ingress), and I no longer do a multi-cluster mesh (I used to have istio for that, then cilium). This way, the public cluster doesn’t have to be too large capacity-wise, e.g. all the S3 api needs are served by garage from the private cluster, but the public cluster will reverse-proxy into it for specific needs.

farcaller@fstab.sh · 4 months ago

Any OAuth provider (I use kanidm) plus oauth2-proxy solves that, and now you can easily use passkeys to log into your intranet resources.

farcaller@fstab.sh · 4 months ago

The biggest certainty is that just having an open port for an SMTP server dangling out there means you will 100% be attacked.

True.

Not just sometimes, non-stop.

True

So you don’t want to host on a machine with anything else on it, cuz security.

I don’t think “cuz security” is a proper argument, or no one would ever be listening on the public internet. Are there risks? Yes.

So you need a dedicated host for that portion

Bullshit. You do not need a dedicated host for smtp ingress. It won’t be attacked that much.

and a very capable and restrictive intrusion detection system (let’s say crowdsec), which is going to take some amount of resources to run, and stop your machine from toppling over.

That’s not part of the mail pipeline the OP asked for.

Here, I brought receipts. There are two spikes of attempted connections in the last month, but it’s all negligible traffic.

Self-hosting mail servers is tricky, same as self-hosting ssh, http, or whatever else. But it’s totally doable even on an aging RPi. No, you don’t need to train expensive spam detection because it’s enough to have very strict rules on where you get mail from and drop 99% of the traffic because it won’t be compliant. No, you don’t need to run crowdsec for a server that accepts bytes and stores them for another server (IMAP) to offer them to you. You don’t even need an antivirus, it’s not part of mail hosting, really.

Instead of bickering and posturing, you could have spent your time better educating OP on the best practices, e.g. like this.

farcaller@fstab.sh · 4 months ago

I won’t quote the bit of your post again, but no, if you have an open smtp port then you won’t get constantly attacked. Again, I have a fully qualified smtp server and it receives about 40 connections per hour (mostly the spam ones). That’s trivial to process.

It doesn’t matter that I forward emails from another server, because, in the end, mine is still public on the internet.

If you are trying to make a point that it’s tricky to run a corporate-scale smtp and make sure that end users are protected, then it’s clearly not what the OP was looking for.

farcaller@fstab.sh · 6 months ago

How to split short term and long term VictoriaMetrics storage

farcaller@fstab.sh · 7 months ago

Self-hosted alternative to synology drive?