Is scaling down possible?

Hi Neon!

First, congratulations on what you built; this is amazing and would probably be a perfect fit for my workloads.
I read many of your technical blog posts (I kinda like to understand stuff before using it :slight_smile:), especially the one about how you achieved scaling (Scaling serverless Postgres: How we implement autoscaling - Neon). Very clever!

I was wondering if it was possible to scale the DBs down, because I couldn’t figure out how you could possibly force postgres to release memory that was acquired when more RAM was allocated. I must be missing something…

Cheers!

Hey! Glad to hear you’ve been enjoying the posts! Scaling down is possible — it’s also interesting from a technical perspective because you’re right that normally we’d have to release all the memory allocated in a memory slot.

However: the Linux kernel supports marking memory slots as “migratable”, so that the physical pages backing a slot can be moved away from it when it’s unplugged, while the virtual addresses exposed to user software like postgres stay the same. There’s some more information about that here: Memory Hot(Un)Plug — The Linux Kernel documentation
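For the curious, here’s a toy version of what the guest-side part of that looks like through the sysfs interface described in that doc. This is not our real tooling (in practice udev rules / hypervisor machinery handle it), and the block number is made up:

```go
// Illustrative sketch only: onlining a hotplugged memory block as "movable"
// via sysfs so its pages stay migratable, then offlining it on scale-down.
// The block number (42) is an assumption, not a real value.
package main

import (
	"fmt"
	"os"
)

const sysMemory = "/sys/devices/system/memory"

// setBlockState writes the desired state ("online_movable" or "offline")
// for one memory block, e.g. /sys/devices/system/memory/memory42/state.
func setBlockState(block int, state string) error {
	path := fmt.Sprintf("%s/memory%d/state", sysMemory, block)
	return os.WriteFile(path, []byte(state), 0)
}

func main() {
	// When the hypervisor hotplugs a block, online it as movable so that
	// its pages can be migrated away before a later unplug.
	if err := setBlockState(42, "online_movable"); err != nil {
		fmt.Fprintln(os.Stderr, "online failed:", err)
	}

	// When scaling down, offlining the block triggers page migration out
	// of it; guest virtual addresses (e.g. postgres's) are unaffected.
	if err := setBlockState(42, "offline"); err != nil {
		fmt.Fprintln(os.Stderr, "offline failed:", err)
	}
}
```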

Back to autoscaling: currently, scaling decisions (aside from immediate reactions to almost-OOM) are made by the autoscaler-agent, which right now only looks at load average. So as long as the load average is sufficiently below what would justify the current allocation, we’ll try to downscale (provided postgres’s memory usage is low enough, which the vm-informant validates). And because we have a fixed ratio between CPU and memory, downscaling always includes unplugging some memory.
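To make that concrete, here’s a toy version of the decision. The real autoscaler-agent is more involved; the names, the headroom factor, and the 4 GiB-per-vCPU ratio here are just for illustration:

```go
// Toy sketch of a load-average-based downscaling decision, not the actual
// autoscaler-agent logic. All constants and names are illustrative.
package main

import "fmt"

const (
	memPerCPU    = 4.0 // GiB per vCPU: fixed ratio, so memory follows CPU
	loadHeadroom = 0.9 // target load average per allocated vCPU (assumed)
)

// desiredCPUs picks a vCPU count from the 1-minute load average.
func desiredCPUs(loadAvg float64) int {
	cpus := int(loadAvg/loadHeadroom) + 1
	if cpus < 1 {
		cpus = 1
	}
	return cpus
}

func main() {
	current := 4   // currently allocated vCPUs
	loadAvg := 0.3 // measured 1-minute load average
	goal := desiredCPUs(loadAvg)

	if goal < current {
		// Downscaling means unplugging memory too, because of the fixed ratio.
		fmt.Printf("propose downscale: %d vCPU / %.0f GiB -> %d vCPU / %.0f GiB\n",
			current, float64(current)*memPerCPU, goal, float64(goal)*memPerCPU)
		// In the real system this proposal must still be approved by the
		// vm-informant, which checks postgres's memory usage first.
	}
}
```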

Thankfully, all the stuff around migrating memory pages is handled by QEMU + the kernel, so we don’t need to deal with it directly :sweat_smile:

Thanks for your blazingly fast answer!

I have read the documentation you linked to, but I fail to see how this would help… Let’s say the PG process uses 3GB of RAM out of 4GB available, and the downscale process “unplugs” 2GB to scale down to 2GB available. That is 1GB that was used by PG which suddenly “disappears”.
It would most probably be catastrophic anyway to tell any process that a random part of its allocated memory is now unavailable, so the data must be relocated somewhere else. Where does this data go? Is it relocated to swap?

(Looks like there is clever magic happening, maybe this would be a nice addition to your blog post :smile: )

Ah, ok! I think I misunderstood the question

In short: if postgres is using 3GB RAM, we won’t scale down below that (even if the load average would indicate doing so) — because yeah, suddenly making its memory unavailable would be catastrophic :sweat_smile:

(although it’s worth noting that the “catastrophe” here is probably just “postgres crashed, so your endpoint restarted” — not as bad as it could be!)

As for how it works internally, the vm-informant is responsible for checking this: every time we want to scale down, it must be approved by the vm-informant, which will deny scaling down if doing so might have… catastrophic consequences.
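Roughly, the check looks something like this toy sketch (the real protocol and thresholds are more involved, and the names here are invented):

```go
// Rough sketch of a downscale-approval check, not the real vm-informant.
// Field names and the headroom value are assumptions for illustration.
package main

import "fmt"

type Resources struct {
	CPUs   int
	MemGiB float64
}

// approveDownscale denies a request if postgres (plus some reserved headroom)
// would no longer fit in the proposed allocation.
func approveDownscale(proposed Resources, pgMemGiB, headroomGiB float64) (bool, string) {
	if pgMemGiB+headroomGiB > proposed.MemGiB {
		return false, fmt.Sprintf(
			"denied: postgres uses %.1f GiB (+%.1f headroom) but only %.1f GiB proposed",
			pgMemGiB, headroomGiB, proposed.MemGiB)
	}
	return true, "approved"
}

func main() {
	ok, why := approveDownscale(Resources{CPUs: 2, MemGiB: 8}, 3.0, 1.0)
	fmt.Println(ok, why) // true approved

	ok, why = approveDownscale(Resources{CPUs: 1, MemGiB: 4}, 3.5, 1.0)
	fmt.Println(ok, why) // false denied: ...
}
```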

So, (un)fortunately, no magic! Always happy to answer questions about autoscaling! :smile:

Ahhhh I see! Makes much more sense to me :smile:

So let’s say at some point my PG instance uses 14GB of RAM for its shared buffers. I will be stuck paying for 4 vCPUs even if I’m down to 1 small query per minute, waiting for PG to eventually release that memory, correct?

You are probably much more of an expert than me regarding PG internals (that shouldn’t be difficult :stuck_out_tongue: ), but to me it makes no sense to release cached data until:

  • the cache is full and it needs to make room for other data (most probably)
  • the associated data has been deleted from the DB (maybe)

Am I correct? Is that the way PG shared buffers work?
If yes, then it would mean downscaling is unlikely to happen once the shared buffers are filled.

Ah yeah, so this would be an issue if shared buffers were large. We actually purpose-built the local file cache to solve this issue — we needed a cache similar to shared buffers (in both function and speed) that we were able to safely resize at runtime. Because of this, we tend to keep shared buffers small — around 128MiB.

In general, yup! This is why we keep it small :slight_smile:

That’s why it’s so important for us to have a cache that we are able to forcefully downsize (which is only really possible because it’s read-only, due to our storage architecture).
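If it helps, here’s a toy sketch of why a read-only cache can be shrunk forcefully: evicting an entry never loses data, because the cache never holds the only copy of a page. This isn’t the real local file cache, and the names are made up:

```go
// Simplified sketch of a forcefully-resizable read-only cache. Evicted
// entries hold no dirty data, so they can always be re-fetched from storage.
// Not the actual local file cache implementation; names are invented.
package main

import "fmt"

type pageID uint64

type readOnlyCache struct {
	maxPages int
	pages    map[pageID][]byte
}

// Resize lowers (or raises) the capacity. Shrinking simply drops entries;
// nothing is lost because the cache is read-only.
func (c *readOnlyCache) Resize(maxPages int) {
	c.maxPages = maxPages
	for id := range c.pages {
		if len(c.pages) <= c.maxPages {
			break
		}
		delete(c.pages, id) // eviction order doesn't matter for correctness
	}
}

func main() {
	c := &readOnlyCache{maxPages: 4, pages: map[pageID][]byte{
		1: {}, 2: {}, 3: {}, 4: {},
	}}
	c.Resize(2) // forceful downsize: just evict until we fit
	fmt.Println("cached pages after resize:", len(c.pages))
}
```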


Interesting video and thread

I saw your video but wasn’t sure how writing to the postgres WAL (“Safekeepers”?) invalidates the file caches on compute nodes.

Thanks :slight_smile: