How to enable GC? Tenant files not being removed, piling up

Good afternoon. Thank you for this wonderful product. It is exciting.

Setup: Running Neon locally via Docker (not the docker-compose quick start). I'm supplying config values to Neon and Neon compute node containers that I built locally.

What works:

  • supplied my own GCS bucket configuration in pageserver.toml
  • supplied a [tenant_config] in .neon/tenants/<tenant_id>/config with gc_horizon, gc_period, and pitr_interval values.
  • have the pageserver, 3 safekeepers, storage broker, and compute node running well, pushing/pulling to the GCS bucket.

What doesn’t work:

  • data files keep piling up under .neon/tenants/<tenant_id>/, and nothing removes them. I tried setting pitr_interval, gc_period, and gc_horizon low, as noted here.
  • I added a disk_usage_based_eviction setting to my pageserver.toml as well, but it didn't change anything.
  • the pageserver disk keeps filling up, and the pageserver crashes after large inserts.

Needs:

  • How do I configure GC so that it actually removes files?
  • What is the difference between GC and the disk_usage_based_eviction feature?

Logs:
I see gc_loop crop up in the pageserver logs, but it's as if it never runs. I'm not sure how to dial in the configuration values (and which ones) to speed this up. I saw a reference to the do_gc pageserver API endpoint (in the link above) for force-calling GC, but I'm not sure how to trigger that endpoint.
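My best guess, pieced together from the reference above, would be something like the following; the method, path, and JSON body are assumptions on my part, so please correct me. The port matches the listen_http_addr in my pageserver.toml below:

  # Guess at the management API shape: force one GC pass on a timeline,
  # with an explicit gc_horizon in bytes. Substitute real tenant/timeline IDs.
  curl -X PUT \
    "http://localhost:9898/v1/tenant/<tenant_id>/timeline/<timeline_id>/do_gc" \
    -H "Content-Type: application/json" \
    -d '{"gc_horizon": 64}'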

  # .neon/tenants/<tenant_id>/config

  [tenant_config]
  checkpoint_distance = 268435456
  checkpoint_timeout = "10m"
  compaction_target_size = 134217728
  compaction_period = "20s"
  compaction_threshold = 10
  gc_horizon = 64
  gc_period = "30m"
  image_creation_threshold = 3
  pitr_interval = "30m"
  eviction_policy = { kind = "LayerAccessThreshold", period = "20m", threshold = "20m" }
  min_resident_size_override = 1000
  evictions_low_residence_duration_metric_threshold = "20m"
  gc_feedback = true
  # .neon/pageserver.toml
  id = 1
  pg_distrib_dir = '/usr/local'
  http_auth_type = 'Trust'
  pg_auth_type = 'Trust'
  listen_http_addr = '0.0.0.0:9898'
  listen_pg_addr = '0.0.0.0:6400'
  broker_endpoint = 'http://storage_broker:50051/'
  disk_usage_based_eviction = { max_usage_pct = 10, min_avail_bytes = 1000, period = "10s" }
  background_task_maximum_delay = '10s'

  [remote_storage]
  endpoint = 'https://storage.googleapis.com'
  bucket_name = 'OMITTED'
  bucket_region = 'us'
  prefix_in_bucket = '/pageserver/'

Thank you for any guidance on this matter!

If necessary, tenant_files_delete can be called from detach_tenant.
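For example, something like this against the management API should trigger it; I'm writing the path from memory, so treat it as a sketch and double-check it:

  # Detach the tenant from this pageserver; detaching should also remove
  # the tenant's local files (endpoint path from memory, please verify).
  curl -X POST "http://localhost:9898/v1/tenant/<tenant_id>/detach"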

Neon keeps the WAL until the user-configured Point-in-Time Recovery (PiTR) window has passed. After that, the garbage collection process should remove all layer files that fall outside the PiTR window.
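The window can also be shrunk for an already-attached tenant at runtime. A sketch, assuming the management API's tenant config endpoint accepts the same keys as the [tenant_config] section above:

  # Assumed endpoint/body shape: update the PiTR window and GC cadence at
  # runtime instead of editing the config file and restarting.
  curl -X PUT "http://localhost:9898/v1/tenant/config" \
    -H "Content-Type: application/json" \
    -d '{"tenant_id": "<tenant_id>", "pitr_interval": "30m", "gc_period": "30m"}'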

It seems like there are still tasks pending, or the target_timeline_id is not set, but I need to do some more digging.

Thank you.

Think I'm getting close, but I would like some confirmation. I found the code in pageserver/src/tenant/timeline.rs that seems to be where the values get decided upon here, checking whether latest_gc_cutoff >= new_gc_cutoff. In the second code link, new_gc_cutoff gets set as let new_gc_cutoff = Lsn::min(horizon_cutoff, pitr_cutoff);.

If pitr_interval isn't set (first code link), its default is "7 days", which then returns a pitr_cutoff equal to the latest_gc_cutoff. In other words, latest_gc_cutoff and new_gc_cutoff always compare equal, so the latest_gc_cutoff >= new_gc_cutoff check always passes and Nothing to GC... always runs. I set pitr_interval in my [tenant_config] to "0s", which (first code link) makes the pitr_cutoff fall back to the gc_cutoff (last_record_lsn minus the gc_horizon value).

I then saw GC Starting and the cutoff LSN updating and progressing. However, it completes by logging GC completed removing 0 layers, cutoff 0/2BEA920.
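To watch the cutoff move between runs, I've been polling the timeline detail endpoint; the latest_gc_cutoff_lsn field name is my reading of the response, so take it with a grain of salt:

  # Assumed response field: pull the GC cutoff LSN out of the timeline
  # detail JSON to compare it across GC runs.
  curl -s "http://localhost:9898/v1/tenant/<tenant_id>/timeline/<timeline_id>" \
    | grep -o '"latest_gc_cutoff_lsn":"[^"]*"'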

My concern is the pageserver disk filling with layer files that never get GC'ed. My testing creates several tables and inserts 100,000 to 1,000,000 records. I'm wondering if there is something obvious I'm missing here. Lastly, am I thinking about the gc_horizon value correctly, in that the smaller it is, the more will be GC'ed every gc_period?

Sincerely,

John

Update, in case anyone else is confused by this:

After following line #4181 of the gc_timeline function in pageserver/src/tenant/timeline.rs with RUST_LOG=debug on, I was able to see that GC wasn't removing any layers because of messages like keeping ... because it's newer than horizon cutoff and keeping ... because it's the latest layer.
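For anyone reproducing this: I passed RUST_LOG through to my locally built pageserver container, roughly like so (the image and mount names are placeholders for my setup; the part that matters is the environment variable):

  # Placeholder image/volume names; RUST_LOG=debug is what surfaces the
  # per-layer "keeping ..." GC decisions in the pageserver logs.
  docker run \
    -e RUST_LOG=debug \
    -v "$PWD/.neon:/data/.neon" \
    my-local-pageserver-image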

I set pitr_interval = "0sec", image_creation_threshold = 0, gc_horizon = 64, and gc_period = "50sec", triggering many fast GC runs, and then it started to actually remove the layers.

Curiously, many older layers are still sticking around. I suspect the delta layers are the problem: the cutoff_horizon LSN advances each GC interval, but delta layers lump everything together, which makes those layers un-removable. Any suggestions?