First, congratulations on what you built; this is amazing and would probably be a perfect fit for my workloads.
I read many of your technical blog posts (I kinda like to understand stuff before using it), especially the one about how you achieved scaling (Scaling serverless Postgres: How we implement autoscaling - Neon). Very clever!
I was wondering if it was possible to scale down the DBs, because I couldn’t figure out how you could possibly force postgres to release memory that was acquired when more RAM was allocated. I must be missing something…
Hey! Glad to hear you’ve been enjoying the posts! Scaling down is possible — it’s also interesting from a technical perspective because you’re right that normally we’d have to release all the memory allocated in a memory slot.
However: the Linux kernel supports marking memory slots as “migratable”, so that the physical pages backing them can be migrated away from the slot while the addresses exposed to user software like postgres stay the same. There’s some more information about that here: Memory Hot(Un)Plug — The Linux Kernel documentation
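To make that concrete, here’s a minimal sketch (Go, not our actual code) of driving the sysfs interface described in those kernel docs — the block number is made up:

```go
// Minimal sketch: onlining a hotplugged memory block into ZONE_MOVABLE via
// sysfs, per the kernel's Memory Hot(Un)Plug documentation. The block
// number (42) is hypothetical; this requires root on a real system.
package main

import (
	"fmt"
	"os"
)

// setMemoryBlockState writes the desired state to the memory block's sysfs
// "state" file. "online_movable" places the block in ZONE_MOVABLE, keeping
// its pages migratable so the block can be offlined later; "offline"
// attempts to unplug it (the kernel migrates its pages away first).
func setMemoryBlockState(block int, state string) error {
	path := fmt.Sprintf("/sys/devices/system/memory/memory%d/state", block)
	return os.WriteFile(path, []byte(state), 0)
}

func main() {
	// Online block 42 as movable so it can be cleanly offlined when we
	// scale down later.
	if err := setMemoryBlockState(42, "online_movable"); err != nil {
		fmt.Fprintln(os.Stderr, "online failed:", err)
	}
}
```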
Back to autoscaling: Currently scaling decisions (aside from immediate reactions to almost-OOM) are made by the autoscaler-agent, which right now only looks at load average. So as long as the load average is sufficiently lower than what it should be for the current allocation, we’ll try to downscale (provided postgres’ memory usage is low enough, validated by the vm-informant). And because we have a fixed ratio between CPU and memory, downscaling always includes unplugging some memory.
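For illustration, a rough sketch of that decision logic — the names and exact policy here are assumptions, not the real autoscaler-agent code:

```go
// Toy sketch of load-average-driven downscaling with a fixed CPU:memory
// ratio. The real agent has more inputs (e.g. almost-OOM signals) and a
// different policy; the constants below are assumptions.
package main

import "fmt"

const (
	memPerCU   = 4.0 // GiB of RAM per compute unit (assumed fixed ratio)
	cpuPerCU   = 1.0 // vCPU per compute unit
	loadFactor = 0.9 // target: keep load average under ~90% of vCPU count
)

// desiredComputeUnits picks the smallest allocation whose vCPU count still
// comfortably covers the observed load average. Because CPU and memory
// scale together, lowering the CU count also unplugs memory.
func desiredComputeUnits(loadAvg float64, currentCU int) int {
	cu := currentCU
	for cu > 1 && loadAvg < cpuPerCU*float64(cu-1)*loadFactor {
		cu--
	}
	return cu
}

func main() {
	// A load average of 0.3 on a 4-CU VM suggests scaling down to 1 CU,
	// subject to the vm-informant approving the memory reduction.
	fmt.Println(desiredComputeUnits(0.3, 4))
}
```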
Thankfully, all the stuff around migrating memory pages is handled by QEMU + the kernel, so we don’t need to deal with it directly.
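If you’re curious, from our side unplugging a DIMM boils down to a single QMP command to QEMU; here’s a hypothetical sketch (the socket path and device id are made up):

```go
// Sketch of asking QEMU to unplug a hot-plugged pc-dimm via its QMP
// socket. Not our actual code; socket path and device id "dimm1" are
// hypothetical. On success, QEMU and the guest kernel cooperate to
// migrate pages off the affected memory before it is removed.
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("unix", "/tmp/qmp.sock") // hypothetical path
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	r := bufio.NewReader(conn)
	r.ReadString('\n') // consume the QMP greeting line

	// QMP requires capability negotiation before any other command.
	fmt.Fprintln(conn, `{"execute": "qmp_capabilities"}`)
	r.ReadString('\n')

	// Request removal of a previously hot-plugged memory device.
	fmt.Fprintln(conn, `{"execute": "device_del", "arguments": {"id": "dimm1"}}`)
	resp, _ := r.ReadString('\n')
	fmt.Print(resp)
}
```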
I have read the documentation you linked to, but I fail to see how this would help… let’s say the PG process uses 3GB of RAM out of 4GB available, and the downscale process “unplugs” 2GB to scale down to 2GB available. That is 1GB that was used by PG which suddenly “disappears”.
It would most probably be catastrophic anyway to tell any process that a random part of its allocated memory is now unavailable, so the data must be relocated somewhere else. Where does this data go? Is it relocated to swap?
(Looks like there is some clever magic happening; maybe this would be a nice addition to your blog post.)
In short: if postgres is using 3GB of RAM, we won’t scale down below that (even if the load average would indicate doing so), because yeah, suddenly making its memory unavailable would be catastrophic.
(although it’s worth noting that the “catastrophe” here is probably just “postgres crashed, so your endpoint restarted” — not as bad as it could be!)
As for how it works internally, the vm-informant is responsible for checking this: every time we want to scale down, the request must be approved by the vm-informant, which will deny it if scaling down might have… catastrophic consequences.
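Conceptually, the check looks something like this — a toy sketch, not the real vm-informant interface, and the safety margin is an assumption:

```go
// Illustrative sketch of the vm-informant's downscale check. The real
// interface and thresholds differ; all names here are hypothetical.
package main

import "fmt"

// approveDownscale denies any request that would leave less memory than
// postgres currently uses (plus a safety margin), since yanking memory
// out from under postgres would crash it.
func approveDownscale(requestedMemGiB, pgUsageGiB float64) (bool, string) {
	const headroomGiB = 0.5 // assumed safety margin
	if pgUsageGiB+headroomGiB > requestedMemGiB {
		return false, fmt.Sprintf(
			"denied: postgres uses %.1fGiB, requested only %.1fGiB",
			pgUsageGiB, requestedMemGiB)
	}
	return true, "approved"
}

func main() {
	// The 3GiB-used / scale-to-2GiB example from above gets denied:
	ok, reason := approveDownscale(2, 3)
	fmt.Println(ok, reason)
}
```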
So, (un)fortunately, no magic! Always happy to answer questions about autoscaling!
So let’s say at some point my PG instance uses 14GB of RAM for its shared buffers; I would be stuck paying for 4 vCPUs even if I’m down to 1 small query per minute, waiting for PG to eventually release that memory, correct?
You are probably much more of an expert than me regarding PG internals (shouldn’t be difficult), but to me it makes no sense to release cached data until:
the cache is full and it needs to make room for other data (most probably)
the associated data has been deleted from the DB (maybe)
Am I correct? Is that the way PG shared buffers work?
If yes, then it would mean downscaling is unlikely to happen once the shared buffers are filled.
Ah yeah, so this would be an issue if shared buffers were large. We actually purpose-built the local file cache to solve this issue — we needed a cache similar to shared buffers (in both function and speed) that we were able to safely resize at runtime. Because of this, we tend to keep shared buffers small — around 128MiB.
In general, yup! This is why we keep shared buffers small.
That’s why it’s so important for us to have a cache that we are able to forcefully downsize (which is only really possible because it’s read-only, due to our storage architecture).
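To illustrate why read-only matters: shrinking such a cache is just dropping entries, since nothing is ever dirty. A toy sketch (not our actual local file cache code):

```go
// Toy model of why a read-only cache can be forcefully downsized. Because
// entries are never dirty, shrinking just drops them; nothing needs to be
// written back first. A write-back cache (like ordinary shared buffers)
// would have to flush dirty pages before shrinking, which is what makes
// forceful downsizing unsafe there.
package main

import "fmt"

type pageID uint64

type readOnlyCache struct {
	limit int
	pages map[pageID][]byte
}

// resize lowers (or raises) the page limit; on shrink, it simply evicts
// arbitrary pages, since any evicted page can be re-read from storage.
func (c *readOnlyCache) resize(newLimit int) {
	c.limit = newLimit
	for id := range c.pages {
		if len(c.pages) <= c.limit {
			break
		}
		delete(c.pages, id) // safe: clean data, still present in storage
	}
}

func main() {
	c := &readOnlyCache{limit: 4, pages: map[pageID][]byte{
		1: nil, 2: nil, 3: nil, 4: nil,
	}}
	c.resize(2)
	fmt.Println("pages after shrink:", len(c.pages))
}
```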