OpenObserve Storage Space Issue and Sled Corruption Recovery

TLDR: Mike ran out of storage space while running OpenObserve with sled, which left the sled database corrupted. Prabhat helped identify the cause, a high metrics ingestion rate filling a 10 GB PVC, and suggested using a bigger PVC or an object store such as S3. How to recover the corrupted sled database was left unresolved.

Mike
Sun, 11 Jun 2023 17:30:28 UTC

Hey all, digging OpenObserve so far. I'm running it as a single instance with sled right now. However, I enabled metrics collection and filled up my storage space quicker than I expected :boom: Sled complains of an empty/corrupt snapshot:

```
[2023-06-11T17:00:29Z INFO openobserve] Starting OpenObserve v0.4.7
[2023-06-11T17:00:29Z WARN sled::pagecache::snapshot] empty/corrupt snapshot file found
thread 'tokio-runtime-worker' panicked at 'sled db dir create failed: Corruption { at: None, bt: () }', src/infra/db/sled.rs:313:50
stack backtrace:
   0: 0x55b9da001b87 - <unknown>
```

Is there any way to recover from this? Sled docs are... lacking :slightly_smiling_face:. I've cleared space hoping it would self-recover, but no dice.

Prabhat
Sun, 11 Jun 2023 17:34:54 UTC

Hey Mike, Thanks for trying out OpenObserve.

Prabhat
Sun, 11 Jun 2023 17:36:23 UTC

> enabled metrics collection

Are you capturing metrics using Prometheus remote write?

Mike
Sun, 11 Jun 2023 17:36:38 UTC

Correct

Prabhat
Sun, 11 Jun 2023 17:37:07 UTC

Got it. How fast are you capturing it? How much space was consumed?

Mike
Sun, 11 Jun 2023 17:39:01 UTC

I've also added a filter to remove the noisiest metrics to prevent this in the future. Two in particular were huge:

• apiserver_request_slo_duration_seconds_bucket
• apiserver_request_duration_seconds_bucket

Both of these were taking 3.5+ GB for a couple of days of retention.
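
The thread doesn't show Mike's actual filter. In Prometheus, a common way to do this is a `write_relabel_configs` rule on the remote-write target with `source_labels: [__name__]` and `action: drop`. Relabel regexes are anchored at both ends, so a pattern can be sanity-checked ahead of time with Python's `re.fullmatch`. The pattern below is an illustrative assumption covering the two series above, not the filter Mike used.

```python
import re

# Hypothetical drop pattern covering the two apiserver histogram series above.
# Prometheus anchors relabel regexes at both ends, so re.fullmatch mirrors that.
DROP_PATTERN = r"apiserver_request_(slo_)?duration_seconds_bucket"


def should_drop(metric_name: str, pattern: str = DROP_PATTERN) -> bool:
    """Return True if a drop rule using this regex would discard the series."""
    return re.fullmatch(pattern, metric_name) is not None


if __name__ == "__main__":
    for name in [
        "apiserver_request_slo_duration_seconds_bucket",
        "apiserver_request_duration_seconds_bucket",
        "apiserver_request_duration_seconds_count",
        "node_cpu_seconds_total",
    ]:
        print(f"{name}: {'drop' if should_drop(name) else 'keep'}")
```

Because of the anchoring, `apiserver_request_duration_seconds_count` is kept even though it shares a prefix with a dropped series.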

Prabhat
Sun, 11 Jun 2023 17:39:22 UTC

The way it works is that OpenObserve first stores incoming data in the WAL (write-ahead log) as uncompressed JSON files, and then periodically converts it into compressed Parquet files.
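
A toy model of that two-stage flow, assuming nothing about OpenObserve's real file layout or code: records are appended to an uncompressed JSON-lines file, and a flush step compresses the batch and deletes the WAL file. gzip stands in for Parquet so the sketch runs on the standard library alone. The point it illustrates is that WAL space is reclaimed only after a flush succeeds, so a backlog of unflushed files keeps growing on disk.

```python
import gzip
import json
import tempfile
from pathlib import Path


def append_to_wal(wal_file: Path, record: dict) -> None:
    """Append one record as an uncompressed JSON line (the 'WAL' stage)."""
    with wal_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def flush_wal(wal_file: Path, out_dir: Path) -> Path:
    """Compress the WAL contents into one output file, then delete the WAL file.

    gzip-compressed JSON is a stand-in for Parquet in this sketch.
    """
    data = wal_file.read_bytes()
    out_file = out_dir / (wal_file.stem + ".json.gz")
    with gzip.open(out_file, "wb") as f:
        f.write(data)
    wal_file.unlink()  # WAL space is freed only after the compressed copy exists
    return out_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        wal = root / "batch-0001.json"
        for i in range(1000):
            append_to_wal(wal, {"__name__": "apiserver_request_duration_seconds_bucket", "le": "0.5", "value": i})
        raw_size = wal.stat().st_size
        out = flush_wal(wal, root)
        print(f"WAL bytes: {raw_size}, compressed bytes: {out.stat().st_size}, WAL left: {wal.exists()}")
```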

Mike
Sun, 11 Jun 2023 17:39:26 UTC

I was using the default 10 GB PVC from your reference StatefulSet and exhausted it within 48 hours.

Mike
Sun, 11 Jun 2023 17:40:33 UTC

Ah, hang on, there's a second one (copy/paste error). Both are related to the apiserver.

Prabhat
Sun, 11 Jun 2023 17:41:18 UTC

oh

Prabhat
Sun, 11 Jun 2023 17:41:45 UTC

Looks like you are generating a lot of data.

Prabhat
Sun, 11 Jun 2023 17:41:57 UTC

Are you running in the cloud or in your own data center?

Mike
Sun, 11 Jun 2023 17:42:16 UTC

This is a small homelab: a k3s cluster with 3 nodes, 1 master. The size surprised me.

Mike
Sun, 11 Jun 2023 17:43:13 UTC

k8s v1.24.3

Prabhat
Sun, 11 Jun 2023 17:47:16 UTC

Is there a way for you to check whether the data has moved out of the WAL or is still sitting in the WAL on the PVC?

Prabhat
Sun, 11 Jun 2023 17:48:02 UTC

I suspect that the data got into the WAL, for some reason was not cleared out of it, and filled up the space.

Mike
Sun, 11 Jun 2023 17:48:05 UTC

Checking. Spinning up a pod with the volume attached.

Prabhat
Sun, 11 Jun 2023 17:48:09 UTC

This also caused the sled corruption.

Mike
Sun, 11 Jun 2023 17:49:05 UTC

There's about 785 MB of data in `/data/wal`, mostly under `/data/wal/files`.
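
For a quick check from a debug pod, `du -sh /data/wal` is enough. The Python sketch below does the same walk but also counts `.json` files, which (per this thread) are WAL files not yet converted to Parquet. The `/data/wal` default comes from the thread; taking a different path as the first command-line argument is just a convenience of this sketch.

```python
import os
import sys


def wal_usage(wal_dir: str) -> tuple[int, int]:
    """Return (total_bytes, json_file_count) for every file under wal_dir."""
    total = 0
    json_files = 0
    for dirpath, _dirnames, filenames in os.walk(wal_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                continue  # file disappeared between listing and stat
            if name.endswith(".json"):
                json_files += 1
    return total, json_files


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/data/wal"
    size, count = wal_usage(target)
    print(f"{target}: {size / 1024**2:.1f} MiB in WAL, {count} .json files")
```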

Mike
Sun, 11 Jun 2023 17:51:20 UTC

Mostly metrics. I suppose this was ingested data not yet persisted to Parquet?

Prabhat
Sun, 11 Jun 2023 17:51:45 UTC

Are the files in .json format?

Mike
Sun, 11 Jun 2023 17:51:52 UTC

yes

Prabhat
Sun, 11 Jun 2023 17:52:31 UTC

Perfect. You are right: this is the data that could not be persisted to Parquet.

Prabhat
Sun, 11 Jun 2023 17:52:49 UTC

With your rate of data generation, you need a bigger PVC than 10 GB.
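
A back-of-the-envelope sizing estimate using the numbers in this thread: 10 GB filled in roughly 48 hours is an average of about 5 GB/day. The 7-day window and 1.5x headroom factor below are illustrative assumptions, and the estimate assumes linear growth. With the noisy series now filtered, the real rate should be lower.

```python
def fill_rate_gb_per_day(capacity_gb: float, hours_to_fill: float) -> float:
    """Average rate at which a volume of capacity_gb filled up, in GB/day."""
    return capacity_gb * 24 / hours_to_fill


def pvc_size_gb(rate_gb_per_day: float, days: float, headroom: float = 1.5) -> float:
    """Volume size needed to hold `days` of data at the given rate, padded by `headroom`."""
    return rate_gb_per_day * days * headroom


if __name__ == "__main__":
    rate = fill_rate_gb_per_day(10, 48)  # 10 GB PVC exhausted in ~48 hours
    print(f"observed fill rate: {rate:.1f} GB/day")  # 5.0 GB/day
    print(f"7 days with 1.5x headroom: {pvc_size_gb(rate, 7):.1f} GB")  # 52.5 GB
```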

Prabhat
Sun, 11 Jun 2023 17:54:17 UTC

The WAL actually works as a kind of buffer. If data arrives faster than you have the compute power to process it, the data is stored in the WAL and gradually converted to Parquet.

Mike
Sun, 11 Jun 2023 17:54:36 UTC

I've filtered the excessively noisy metrics from remoteWrite to avoid sending them, but I will also enlarge the PVC.

Mike
Sun, 11 Jun 2023 17:55:04 UTC

Can the WAL be cleared, or the existing DB recovered somehow, now that space is freed?

Prabhat
Sun, 11 Jun 2023 17:56:58 UTC

If sled is not corrupted, then the next time OpenObserve starts it will see that the data is still in the WAL and start processing it.

Prabhat
Sun, 11 Jun 2023 17:59:34 UTC

I am not sure how to recover sled. Let me check that. Even for a homelab, if you have the option to push data to S3 (enough bandwidth and internet speed), you should use it together with sled. Your performance should be fine, since OpenObserve caches recent data in memory on the node.

Mike
Sun, 11 Jun 2023 18:05:07 UTC

Understood. I'll look at changing that to S3 or another S3-compatible store.

Mike
Sun, 11 Jun 2023 18:05:53 UTC

Thanks for taking the time to chat about this, by the way. Appreciate the openness.

Mike
Sun, 11 Jun 2023 18:08:46 UTC

By the way, I also created my own single-node-focused Helm chart based on the single-node manifest. I wanted to use Helm, with a bit of added flexibility, to kick the tires without all the components in the official chart.

Mike
Sun, 11 Jun 2023 18:08:55 UTC

Prabhat
Sun, 11 Jun 2023 18:09:56 UTC

Ah, I see, you are that awesome guy from Reddit.

Mike
Sun, 11 Jun 2023 18:10:26 UTC

The intention is not to compete with the official chart, just to provide another option for simpler deployment. I'll add a bit to the README there about PVC sizing and using an object store as best practices, even for a homelab :slightly_smiling_face:

Mike
Sun, 11 Jun 2023 18:20:17 UTC

Guilty as charged :slightly_smiling_face: I had set up a few dashboards and an alert rule. I'd love to recover at least the config if possible, but if not, no big deal; I can start fresh. Just let me know if you find a way to run some kind of sled DB recovery.

Prabhat
Sun, 11 Jun 2023 18:50:47 UTC

I haven't done a sled recovery yet. I'll give it a shot, though first I'll need to figure out how to corrupt it.