Connectivity Testing Issues and Solutions with OpenObserve Router

TLDR: Dylan experienced connectivity issues while testing the OpenObserve router. Prabhat suggested the errors pointed to etcd issues and to the possibility of the Istio service mesh causing the connectivity problems. Reinstallation and increasing memory were also suggested, but the issue remained unresolved.

Photo of Dylan
Dylan
Thu, 16 Nov 2023 18:50:14 UTC

Hey guys! I'm doing some connectivity testing from our development namespace to our OpenObserve router, which is in an openobserve ns, and I'm getting some strange behavior. Wonder if you guys have any ideas? I'm able to successfully curl test data from a pod in a separate development ns to the router with:

```
curl -X POST "" -H "Content-Type: application/json" -H "Authorization: Basic XXX" -d '[{"level":"info","job":"test","log":"test message for openobserve"}]'
```

However, when I try to use `openobserve-development-router.openobserve.svc.cluster.local` I get:

```
curl -X POST "openobserve-development-router.openobserve.svc.cluster.local:5080/api/default/default/_json" -H "Authorization: Basic XXX" -H "Content-Type: application/json" -d '[{"level":"info","job":"test","log":"test message for openobserve"}]'
curl: (52) Empty reply from server
```

The router starts logging a ton of gRPC errors:

```
[2023-11-16T18:45:07Z ERROR openobserve::service::router] : Failed to connect to host: Internal error: connector has been disconnected
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.950588
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.943231
[2023-11-16T18:45:07Z ERROR openobserve::service::router] : Failed to connect to host: Internal error: connector has been disconnected
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.928565
[2023-11-16T18:45:07Z ERROR openobserve::service::router] : Failed to connect to host: Internal error: connector has been disconnected
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.915907
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 14.004568
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.919026
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 14.138667
[2023-11-16T18:45:07Z ERROR openobserve::service::router] : Failed to connect to host: Internal error: connector has been disconnected
[2023-11-16T18:45:07Z ERROR openobserve::service::router] : Failed to connect to host: Internal error: connector has been disconnected
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.976337
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.907410
[2023-11-16T18:45:07Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.945222
[2023-11-16T18:45:08Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T18:45:08Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T18:45:09Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T18:45:09Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T18:45:10Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
```

and the pod goes into an OOM status and restarts. Running `nslookup openobserve-development-router.openobserve.svc.cluster.local` gives:

```
Server:   172.20.0.10
Address:  172.20.0.10#53

Name:     openobserve-development-router.openobserve.svc.cluster.local
Address:  172.20.167.149
```

and using that address seems to work fine:

```
curl -X POST "" -H "Content-Type: application/json" -H "Authorization: Basic XXX" -d '[{"level":"info","job":"test","log":"test message for openobserve"}]'
{"code":200,"status":[{"name":"default","successful":1,"failed":0}]}
```

I'm also seeing an etcd-2 connectivity issue that is likely its own issue: `openobserve-development-etcd-2 1/2 CrashLoopBackOff 531 (59s ago) 44h`

Essentially I want to point the logstash output to `openobserve-development-router.openobserve.svc.cluster.local` to start getting real data from our microservices, but I'm not confident in doing that while this curl is failing:

```
output: |-
  if [fields][type] == "application-logs" {
    elasticsearch {
      hosts => [""]
      user => ""
      password => "${}"
      index => "application-logs-%{+YYYY.MM.dd}"
      ilm_enabled => false
      manage_template => false
      pool_max => 65536
    }
    http {
      url => [""]
      format => "json"
      http_method => "post"
      content_type => "application/json"
      headers => ["Authorization", "Basic XXX"]
      mapping => {
        "@timestamp" => "%{[@timestamp]}"
        "source" => "%{[source]}"
        "tags" => "%{[tags]}"
        "logdate" => "%{[logdate]}"
        "level" => "%{[level]}"
        "thread" => "%{[thread]}"
        "class" => "%{[class]}"
        "line" => "%{[line]}"
        "msg" => "%{[msg]}"
        "server" => "%{[fields][service]}"
        "log_type" => "%{[fields][logType]}"
        "host_name" => "%{[host][name]}"
      }
    }
  }
```
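
A quick sanity check here, as a sketch using the names already in this thread, is to compare what the DNS name resolves to against the router Service and its backing endpoints:

```
# Compare DNS resolution with the Service's ClusterIP and pod endpoints
nslookup openobserve-development-router.openobserve.svc.cluster.local
kubectl --context=development -n openobserve get svc openobserve-development-router
kubectl --context=development -n openobserve get endpoints openobserve-development-router
```

If the DNS name and the ClusterIP match (as they do here), the difference in behavior is happening on the request path rather than in DNS.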

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:17:12 UTC

What version of OpenObserve are you using?

Photo of Dylan
Dylan
Thu, 16 Nov 2023 19:17:55 UTC

`0.7.0`

```
- name: openobserve-{{ .Environment.Name }}
  chart: openobserve/openobserve
  version: 0.7.0
  labels:
    service-name: openobserve
    is-elk: true
  namespace: {{ .Environment.Name }}
  values:
    - ./openobserve/values/{{ .Environment.Name }}.yaml
```

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:18:35 UTC

brb

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:28:17 UTC

```[2023-11-16T18:45:07Z ERROR openobserve::service::router] : Failed to connect to host: Internal error: connector has been disconnected```

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:28:27 UTC

This basically indicates an issue with etcd
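
One way to confirm that, sketched with the pod names from this thread and assuming the Bitnami etcd container in each pod is named `etcd`, is to ask etcd itself about cluster health:

```
# Check whether every etcd member is reachable and healthy
kubectl --context=development -n openobserve exec openobserve-development-etcd-0 -c etcd -- \
  etcdctl endpoint health --cluster
# List members to spot one that is missing or stale
kubectl --context=development -n openobserve exec openobserve-development-etcd-0 -c etcd -- \
  etcdctl member list
```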

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:29:10 UTC

Are you already ingesting a lot of data?

Photo of Dylan
Dylan
Thu, 16 Nov 2023 19:29:58 UTC

> Are you already ingesting a lot of data?

Not yet, just sending test curls atm. Once the logstash output is pointed at the router there will be a good amount of data ingested. Gotcha, I do have etcd-2 failing; a team member suggested looking into

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:30:29 UTC

yeah, etcd is turning out to be a problem

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:30:37 UTC

maintaining it is hard

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:30:46 UTC

We will look at replacing it in the long term.

Photo of Dylan
Dylan
Thu, 16 Nov 2023 19:31:13 UTC

here's the error from etcd-2:

```
k --context=development -n openobserve logs openobserve-development-etcd-2
etcd 19:30:23.36
etcd 19:30:23.36 Welcome to the Bitnami etcd container
etcd 19:30:23.36 Subscribe to project updates by watching
etcd 19:30:23.36 Submit issues and feature requests at
etcd 19:30:23.37
etcd 19:30:23.37 INFO ==> ** Starting etcd setup **
etcd 19:30:23.39 INFO ==> Validating settings in ETCD_* env vars..
etcd 19:30:23.40 WARN ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 19:30:23.40 INFO ==> Initializing etcd
etcd 19:30:23.40 INFO ==> Generating etcd config file using env variables
etcd 19:30:23.42 INFO ==> Detected data from previous deployments
etcd 19:30:23.58 INFO ==> Updating member in existing cluster
{"level":"warn","ts":"2023-11-16T19:30:23.64264Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: member not found"}
Error: etcdserver: member not found
```

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:58:33 UTC

try deleting the etcd pod along with its pvc

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:58:36 UTC

that should help

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:58:42 UTC

it's not able to join the cluster

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 19:58:54 UTC

you have only 1 etcd pod up and running right now, right?

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:00:10 UTC

so what seems to have happened is that the etcd info of the crashing pod is out of sync with the cluster, for whatever reason

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:00:30 UTC

recreating the pod along with the associated pvc should rectify that
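
As a sketch of that step, assuming the Bitnami etcd chart's usual `data-<pod-name>` PVC naming (worth confirming with `kubectl get pvc` first):

```
kubectl --context=development -n openobserve get pvc
# Delete the stale data volume and the crashing pod; the StatefulSet recreates
# both and the member rejoins the cluster with a fresh data directory
kubectl --context=development -n openobserve delete pvc data-openobserve-development-etcd-2
kubectl --context=development -n openobserve delete pod openobserve-development-etcd-2
```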

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:01:17 UTC

The ns looks like this; only etcd-2 (1/2) seems to be failing. I'll try deleting the etcd pod and pvc like you suggested:

```
NAME                                                     READY   STATUS             RESTARTS          AGE
devops-shell-20231116172852                              1/1     Running            0                 150m
openobserve-development-alertmanager-5d6679958d-dq6hk    2/2     Running            3 (46h ago)       46h
openobserve-development-compactor-7cb5cf5bb-txs9v        2/2     Running            2 (46h ago)       46h
openobserve-development-etcd-0                           2/2     Running            0                 47h
openobserve-development-etcd-1                           2/2     Running            0                 47h
openobserve-development-etcd-2                           1/2     CrashLoopBackOff   552 (3m35s ago)   46h
openobserve-development-ingester-0                       2/2     Running            2 (46h ago)       46h
openobserve-development-querier-6b446774bb-vg4cj         2/2     Running            2 (46h ago)       46h
openobserve-development-router-8669cfc86d-wr45p          2/2     Running            15 (73m ago)      46h
```

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:01:47 UTC

so much better. only 1 pod failing

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:02:08 UTC

delete the etcd-2 pod along with the pvc

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:02:50 UTC

`552 restarts` :eyes:

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:02:59 UTC

:laughing:

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:04:20 UTC

Let's solve the etcd problem first and then we can look at the connectivity problem

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:12:07 UTC

alright, the pod and pvc were terminated and the new ones are booting up

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:15:36 UTC

quite interesting, now etcd-0 has that same error. etcd-1 and etcd-0 terminated and restarted automatically:

```
k --context=development -n openobserve logs openobserve-development-etcd-0
etcd 20:14:53.64
etcd 20:14:53.64 Welcome to the Bitnami etcd container
etcd 20:14:53.64 Subscribe to project updates by watching
etcd 20:14:53.64 Submit issues and feature requests at
etcd 20:14:53.65
etcd 20:14:53.65 INFO ==> ** Starting etcd setup **
etcd 20:14:53.66 INFO ==> Validating settings in ETCD_* env vars..
etcd 20:14:53.67 WARN ==> You set the environment variable ALLOW_NONE_AUTHENTICATION=yes. For safety reasons, do not use this flag in a production environment.
etcd 20:14:53.67 INFO ==> Initializing etcd
etcd 20:14:53.67 INFO ==> Generating etcd config file using env variables
etcd 20:14:53.69 INFO ==> Detected data from previous deployments
etcd 20:14:53.89 INFO ==> Updating member in existing cluster
{"level":"warn","ts":"2023-11-16T20:14:53.994342Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"","attempt":0,"error":"rpc error: code = NotFound desc = etcdserver: member not found"}
Error: etcdserver: member not found
```

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:16:05 UTC

:sigh: :laughing:

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:16:48 UTC

```
k --context=development get pods -n openobserve -w
NAME                                                     READY   STATUS             RESTARTS        AGE
devops-shell-20231116172852                              1/1     Running            0               167m
openobserve-development-alertmanager-5d6679958d-dq6hk    2/2     Running            3 (47h ago)     47h
openobserve-development-compactor-7cb5cf5bb-txs9v        2/2     Running            2 (47h ago)     47h
openobserve-development-etcd-0                           1/2     CrashLoopBackOff   4 (58s ago)     2m49s
openobserve-development-etcd-1                           2/2     Running            1 (3m57s ago)   4m13s
openobserve-development-etcd-2                           2/2     Running            0               5m43s
openobserve-development-ingester-0                       2/2     Running            2 (47h ago)     47h
openobserve-development-querier-6b446774bb-vg4cj         2/2     Running            2 (47h ago)     47h
openobserve-development-router-8669cfc86d-wr45p          2/2     Running            15 (90m ago)    47h
```

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:16:49 UTC

You did not delete etcd-0 though. It happened on its own after you tried fixing etcd-2. is that right?

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:16:55 UTC

yep

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:18:05 UTC

Try doing the same with etcd-0

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:18:11 UTC

:+1:

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:18:12 UTC

delete pvc and pod

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:20:19 UTC

alright we're in the rebooting process

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:21:30 UTC

looking good atm:

```
NAME                                                     READY   STATUS    RESTARTS        AGE
devops-shell-20231116172852                              1/1     Running   0               172m
openobserve-development-alertmanager-5d6679958d-dq6hk    2/2     Running   3 (47h ago)     47h
openobserve-development-compactor-7cb5cf5bb-txs9v        2/2     Running   2 (47h ago)     47h
openobserve-development-etcd-0                           2/2     Running   0               95s
openobserve-development-etcd-1                           2/2     Running   1 (8m32s ago)   8m48s
openobserve-development-etcd-2                           2/2     Running   0               10m
openobserve-development-ingester-0                       2/2     Running   2 (47h ago)     47h
openobserve-development-querier-6b446774bb-vg4cj         2/2     Running   2 (47h ago)     47h
openobserve-development-router-8669cfc86d-wr45p          2/2     Running   15 (95m ago)    47h
```

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:22:33 UTC

Now let's run `curl`

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:23:16 UTC

yep, looking like the same output, OOMKilled status for the router

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:23:21 UTC

lemme check the logs

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:23:35 UTC

then give it more memory

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:24:18 UTC

that should fix it
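
For reference, a sketch of raising the router's memory limit in the environment values file that the helmfile above points at; the `router.resources` key is an assumption here, so confirm it against the chart's values.yaml for openobserve 0.7.0:

```
# ./openobserve/values/development.yaml (sketch; path follows the helmfile pattern above)
router:
  resources:
    limits:
      cpu: 2
      memory: 2Gi      # raised from 1Gi
    requests:
      cpu: 10m
      memory: 40Mi
```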

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:24:33 UTC

what is the current limit ?

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:24:50 UTC

for router

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:26:50 UTC

pretty low requests:

```
Limits:
  cpu:     2
  memory:  1Gi
Requests:
  cpu:     10m
  memory:  40Mi
```

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:27:38 UTC

that is a generous enough limit

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:28:21 UTC

Is it still getting OOM killed?

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:29:29 UTC

no, happily running, and like previously:

```
# Works
curl -X POST "" -H "Content-Type: application/json" -H "Authorization: Basic XXX" -d '[{"level":"info","job":"test","log":"test message for openobserve"}]'
{"code":200,"status":[{"name":"default","successful":1,"failed":0}]}

# OOM
curl -X POST "openobserve-development-router.openobserve.svc.cluster.local:5080/api/default/default/_json" -H "Authorization: Basic XXX" -H "Content-Type: application/json" -d '[{"level":"info","job":"test","log":"test message for openobserve"}]'
curl: (52) Empty reply from server
```

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:32:01 UTC

`helm -n openobserve ls` what is the output of this ?

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:32:23 UTC

and `kubectl -n openobserve get svc`

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:32:55 UTC

locally it is:

```
NAME                      NAMESPACE     REVISION   UPDATED                                STATUS   CHART               APP VERSION
openobserve-development   openobserve   2          2023-11-14 16:15:51.923371 -0500 EST   failed   openobserve-0.7.0   v0.7.0

kubectl -n openobserve get svc
NAME                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
openobserve-development-alertmanager    ClusterIP   172.20.53.221    <none>        5080/TCP            47h
openobserve-development-compactor       ClusterIP   172.20.16.64     <none>        5080/TCP            47h
openobserve-development-etcd            ClusterIP   172.20.110.205   <none>        2379/TCP,2380/TCP   47h
openobserve-development-etcd-headless   ClusterIP   None             <none>        2379/TCP,2380/TCP   47h
openobserve-development-ingester        ClusterIP   172.20.196.128   <none>        5080/TCP            47h
openobserve-development-querier        ClusterIP   172.20.75.176     <none>        5080/TCP            47h
openobserve-development-router          ClusterIP   172.20.167.149   <none>        5080/TCP            47h
```

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:33:19 UTC

`status=failed` :hmm:

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:34:02 UTC

> `status=failed` :hmm:

of what?

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:34:08 UTC

ah got it

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:34:15 UTC

I see it

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:34:23 UTC

maybe you want to reinstall from scratch

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:34:41 UTC

something is wrong with your installation

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:34:53 UTC

it will probably be a lot easier to just reinstall
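
A sketch of that reinstall, using the release and chart names from this thread (adapt to however the helmfile normally applies this release):

```
helm -n openobserve uninstall openobserve-development
# Drop the old etcd volumes too so the cluster starts clean; the label selector
# is an assumption based on the Bitnami chart's standard labels
kubectl --context=development -n openobserve delete pvc -l app.kubernetes.io/name=etcd
helm -n openobserve install openobserve-development openobserve/openobserve --version 0.7.0 \
  -f ./openobserve/values/development.yaml   # values path follows the helmfile pattern above
```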

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:35:01 UTC

gotcha, will do

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:53:10 UTC

I reinstalled and now seeing:

```
NAME                      NAMESPACE     REVISION   UPDATED                               STATUS     CHART               APP VERSION
openobserve-development   openobserve   1          2023-11-16 15:43:56.23242 -0500 EST   deployed   openobserve-0.7.0   v0.7.0
```

```
k --context=development get pods -n openobserve -w
NAME                                                     READY   STATUS    RESTARTS        AGE
devops-shell-20231116172852                              1/1     Running   0               3h21m
openobserve-development-alertmanager-5d6679958d-xp2xs    2/2     Running   2 (6m32s ago)   6m40s
openobserve-development-compactor-7cb5cf5bb-l89mp        2/2     Running   2 (6m32s ago)   6m40s
openobserve-development-etcd-0                           2/2     Running   0               6m40s
openobserve-development-etcd-1                           2/2     Running   0               6m40s
openobserve-development-etcd-2                           2/2     Running   0               6m40s
openobserve-development-ingester-0                       2/2     Running   2 (6m28s ago)   6m40s
openobserve-development-querier-6b446774bb-x7g89         2/2     Running   2 (6m32s ago)   6m40s
openobserve-development-router-8669cfc86d-gtddg          2/2     Running   3 (3m25s ago)   6m40s
```

but unfortunately I'm seeing the same behavior:

```
#works
curl -X POST "" -H "Content-Type: application/json" -H "Authorization: Basic cm9vdEBleGFtcGxlLmNvbTpvSHpleUhIb0syeFFPRWNZ" -d '[{"level":"info","job":"test","log":"test message for openobserve"}]'
{"code":200,"status":[{"name":"default","successful":1,"failed":0}]}

#Router dying w/OOM
curl -X POST "" -H "Content-Type: application/json" -H "Authorization: Basic cm9vdEBleGFtcGxlLmNvbTpvSHpleUhIb0syeFFPRWNZ" -d '[{"level":"info","job":"test","log":"test message for openobserve"}]'
```

Router logs:

```
[2023-11-16T20:47:12Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.940106
[2023-11-16T20:47:12Z ERROR openobserve::service::router] : Failed to connect to host: Internal error: connector has been disconnected
[2023-11-16T20:47:12Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.965626
[2023-11-16T20:47:12Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.967006
[2023-11-16T20:47:12Z ERROR openobserve::service::router] : Failed to connect to host: Internal error: connector has been disconnected
[2023-11-16T20:47:12Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 74 "68" "-" "curl/7.68.0" 13.974808
[2023-11-16T20:47:12Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.968502
[2023-11-16T20:47:12Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.935848
[2023-11-16T20:47:12Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, get message error: grpc request error: status: Unknown, message: "h2 protocol error: error reading a body from connection: stream closed because of a broken pipe", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:12Z ERROR openobserve::service::router] : Failed to connect to host: Internal error: connector has been disconnected
[2023-11-16T20:47:12Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, get message error: grpc request error: status: Unknown, message: "h2 protocol error: error reading a body from connection: stream closed because of a broken pipe", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:12Z INFO actix_web::middleware::logger] 127.0.0.6 "POST /api/default/default/_json HTTP/1.1" 503 58 "-" "-" "curl/7.68.0" 13.929394
[2023-11-16T20:47:12Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:12Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:13Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:13Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:14Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /web/assets/AppMetrics.2a15a753.js HTTP/1.1" 200 271 "-" "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000153
[2023-11-16T20:47:14Z ERROR openobserve::service::router] : Failed to connect to host: Connection refused (os error 111)
[2023-11-16T20:47:14Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=logs HTTP/1.1" 503 72 "-" "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.001674
[2023-11-16T20:47:14Z ERROR openobserve::service::router] : Failed to connect to host: Connection refused (os error 111)
[2023-11-16T20:47:14Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=metrics&fetchSchema=true HTTP/1.1" 503 72 "-" "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000282
[2023-11-16T20:47:14Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:14Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:15Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:15Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:16Z ERROR openobserve::common::infra::db::etcd] lease 2174829437960054293 keep alive do keeper error: LeaseKeepAliveError("channel closed")
[2023-11-16T20:47:16Z ERROR openobserve::common::infra::db::etcd] lease 2174829437960054293 keep alive error: GRpcStatus(Status { code: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", source: Some(tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))) })
[2023-11-16T20:47:16Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:16Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:17Z ERROR openobserve::common::infra::db::etcd] lease 2174829437960054293 keep alive error: GRpcStatus(Status { code: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", source: Some(tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))) })
[2023-11-16T20:47:17Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:17Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:18Z ERROR openobserve::common::infra::db::etcd] lease 2174829437960054293 keep alive error: GRpcStatus(Status { code: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", source: Some(tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))) })
[2023-11-16T20:47:18Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:18Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, error: grpc request error: status: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", details: [], metadata: MetadataMap { headers: {} }
[2023-11-16T20:47:19Z ERROR openobserve::common::infra::db::etcd] lease 2174829437960054293 keep alive error: GRpcStatus(Status { code: Unavailable, message: "error trying to connect: tcp connect error: Connection refused (os error 111)", source: Some(tonic::transport::Error(Transport, hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })))) })
```

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:53:23 UTC

:confused:

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:54:07 UTC

2 things

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:54:16 UTC

1. Your installation is good now

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:54:27 UTC

2. you seem to have 2 containers for each pod.

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:54:33 UTC

are you using a service mesh?
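
A quick way to confirm the second container is an injected sidecar, sketched against the router pod name from the listing above:

```
# List the containers in the router pod; an Istio-injected pod will show
# an "istio-proxy" container next to the application container
kubectl --context=development -n openobserve get pod openobserve-development-router-8669cfc86d-gtddg \
  -o jsonpath='{.spec.containers[*].name}'
```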

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:55:11 UTC

yes istio

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:55:25 UTC

istio might have something to do with connectivity

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:55:33 UTC

hmm gotcha

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:55:59 UTC

it intercepts everything

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:56:10 UTC

I have not tested with istio or any other service mesh for now

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:56:41 UTC

gotcha, but why would the router die? It looks like it's receiving something

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:58:17 UTC

router has trouble connecting to etcd

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:58:23 UTC

`[2023-11-16T20:47:12Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, get message error: grpc request error: status: Unknown, message: "h2 protocol error: error reading a body from connection: stream closed because of a broken pipe", details: [], metadata: MetadataMap { headers: {} }`

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:58:50 UTC

Can you disable istio for this namespace, to make sure that it is not what is causing the issue?
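
Two common ways to do that, as a sketch (the namespace label is the standard Istio injection switch; the per-pod annotation is an alternative if the chart exposes pod annotations):

```
# 1) Turn off sidecar injection for the whole namespace, then restart workloads
kubectl --context=development label namespace openobserve istio-injection=disabled --overwrite
kubectl --context=development -n openobserve rollout restart deployment
kubectl --context=development -n openobserve rollout restart statefulset
# 2) Or disable injection per pod via the annotation:
#      sidecar.istio.io/inject: "false"
```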

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 20:59:02 UTC

I have a very strong feeling that istio is the trouble here

Photo of Dylan
Dylan
Thu, 16 Nov 2023 20:59:20 UTC

Gotcha, I'll need to check in w/my team on that will get back to you

Photo of Dylan
Dylan
Thu, 16 Nov 2023 22:19:27 UTC

I haven't heard back from my team but I'd just like to thank you Prabhat for all the help in the past few days, very cool project and I'm excited to use it as a potential replacement for our ELK stack!

Photo of Dylan
Dylan
Thu, 16 Nov 2023 22:20:11 UTC

and just as a note, I'll be on vacation for the next 2 weeks, so if you don't hear from me that's why :slightly_smiling_face:

Photo of Prabhat
Prabhat
Thu, 16 Nov 2023 22:25:10 UTC

Thank you, Dylan, and enjoy your vacation. Happy Thanksgiving.