OpenObserve On-Premise Deployment Issue and Resolution

TLDR Andrew had trouble using the OpenObserve app despite a successful deployment in his k8s cluster, receiving 500 errors and being unable to query logs. With assistance from Prabhat, they restarted the router and then the querier, which fixed the issue. Andrew went on to ingest data with fluentbit and the OTEL operator.

Andrew
Sat, 07 Oct 2023 03:20:59 UTC

Hey all, I'm demoing OpenObserve on-premise, deployed in my k8s cluster (v1.28.1) using Helm (chart version 0.6.4). The app is deployed behind an nginx ingress. Setup seems fine, pods are all up and healthy, but when I try to use the app, I get 500 errors in the web console. For now, focusing on logs: when I try to query logs I get a 503 error from `` with a response of `No online querier nodes`. My querier pod is up. I'll add some more details in the thread, with some log snippets.

Andrew
Sat, 07 Oct 2023 03:21:54 UTC

Router logs, RUST_LOG is set to debug

Andrew
Sat, 07 Oct 2023 03:22:11 UTC

Prabhat
Sat, 07 Oct 2023 03:31:44 UTC

Looks like somehow querier is not connected to etcd

Prabhat
Sat, 07 Oct 2023 03:31:49 UTC

how are etcd pods doing?

Andrew
Sat, 07 Oct 2023 03:32:02 UTC

healthy

Andrew
Sat, 07 Oct 2023 03:32:11 UTC

Pods are up, no errors in the logs

Prabhat
Sat, 07 Oct 2023 03:33:23 UTC

I see logs from fluentbit. So the UI and ingestion are working fine. Only querying fails, right?

Andrew
Sat, 07 Oct 2023 03:33:40 UTC

Yup, the UI is just blank

Andrew
Sat, 07 Oct 2023 03:34:04 UTC

Prabhat
Sat, 07 Oct 2023 03:34:28 UTC

can we have a quick call? I can DM you.

Andrew
Sat, 07 Oct 2023 03:34:48 UTC

Sure

Andrew
Sat, 07 Oct 2023 04:11:28 UTC

We fixed this on a Google Meet call. We discussed a possible race condition. Restarted the router, then restarted the querier.
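
For anyone hitting the same `No online querier nodes` symptom, the restart order described above (router first, then querier) could be scripted roughly like this. This is only a sketch: the namespace and deployment names are hypothetical placeholders, not something stated in this thread, and it assumes both components run as Deployments and that `kubectl` is already pointed at the right cluster.

```python
import subprocess

# Hypothetical names: replace with whatever your Helm release actually created.
NAMESPACE = "openobserve"
RESTART_ORDER = ["openobserve-router", "openobserve-querier"]  # router first, then querier


def build_restart_commands(namespace, deployments):
    """Return kubectl argv lists that restart each deployment in order,
    with a rollout-status wait after each restart."""
    commands = []
    for name in deployments:
        target = f"deployment/{name}"
        commands.append(["kubectl", "rollout", "restart", target, "-n", namespace])
        commands.append(
            ["kubectl", "rollout", "status", target, "-n", namespace, "--timeout=180s"]
        )
    return commands


def restart_in_order(namespace=NAMESPACE, deployments=RESTART_ORDER, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_restart_commands(namespace, deployments):
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence if a restart or rollout wait fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    restart_in_order(dry_run=True)
```

With `dry_run=True` the script only prints the commands, so the names can be checked before anything is restarted. Waiting on `rollout status` between the two restarts keeps the querier from coming back before the router is ready, which matters if the cause really was a startup race.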

Andrew
Sat, 07 Oct 2023 14:53:55 UTC

Logs from fluentbit and metrics from the otel collector are flowing in quite nicely. That's about 9 hours of logs. I'm impressed by the compression to disk :slightly_smiling_face:

Prabhat
Sat, 07 Oct 2023 15:05:43 UTC

:tada: :rocket:

Prabhat
Sat, 07 Oct 2023 15:07:15 UTC

Since you are using both otel-collector and fluentbit, would you want to give this OpenObserve collector Helm chart a try? It is basically an OTEL collector that helps you gather both logs and metrics.

Andrew
Sat, 07 Oct 2023 15:14:32 UTC

I did check out that Helm chart, but in the end used the OTEL operator to set it all up. I already had OTEL running, so shifting it to OpenObserve was as easy as adding the right endpoints. Fluentbit is collecting the logs from my nodes and physical infrastructure over syslog and TCP JSON and sending them into OpenObserve as well. This is something I wasn't able to accomplish with groundcover :smile:
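
As a rough illustration of what "adding the right endpoints" can look like, the sketch below builds an `otlphttp` exporter entry for an OpenTelemetry Collector config as a plain Python dict. Everything specific here is an assumption rather than something from this thread: the base URL, organization name and credentials are made up, and the `/api/<org>` path with HTTP Basic auth should be checked against the OpenObserve ingestion docs for your version.

```python
import base64


def otlphttp_exporter(base_url, org, user, password):
    """Build an otlphttp exporter entry pointing at an OpenObserve instance.

    Assumption: OpenObserve accepts OTLP/HTTP under <base_url>/api/<org>
    and authenticates with HTTP Basic auth.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {
        "otlphttp/openobserve": {
            "endpoint": f"{base_url.rstrip('/')}/api/{org}",
            "headers": {"Authorization": f"Basic {token}"},
        }
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    exporter = otlphttp_exporter(
        "https://openobserve.example.internal", "default", "user@example.com", "secret"
    )
    print(exporter)
```

The resulting dict would sit under the `exporters:` section of the collector config and be referenced from the logs, metrics or traces pipelines that should ship to OpenObserve.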

Prabhat
Sat, 07 Oct 2023 21:28:00 UTC

BTW, the Helm chart leverages the OTEL operator. It is configured to capture logs, metrics and traces (via automatic instrumentation) for Node.js, Python, Java, .NET and Go.