#general

HA Deployment Problem in EKS Cluster

TLDR: Dylan's team was having trouble setting up an HA deployment of OpenObserve in their EKS cluster. Prabhat and Hengfei advised them to upgrade from 0.6.4 to 0.7.0, which fixed the problem.

Nov 14, 2023
Dylan
08:59 PM
Hey guys, my team is running into a problem when attempting to set up the HA deployment in our EKS cluster. Every pod looks like it's running smoothly, and we've port-forwarded to the router UI. We're seeing this failing request (screenshot) when trying to view the sample data we successfully posted to the router.

It looks like it's trying to retrieve the schema from localhost, so there might be some sort of communication failure between the router and the querier.

Any ideas? Thanks!


```yaml
# helmfile.yaml
- name: openobserve-{{ .Environment.Name }}
  chart: openobserve/openobserve
  version: 0.6.4
  labels:
    service-name: openobserve
    is-elk: true
  namespace: {{ .Environment.Name }}
  values:
  - ./openobserve/values/{{ .Environment.Name }}.yaml
```

```yaml
# dev.yaml
config:
  ZO_S3_BUCKET_NAME: ""

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam:::
```
```
kubectl --context development -n openobserve logs openobserve-development-router-7689f4c577-c2v7q | grep 503
[2023-11-14T20:48:11Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=logs&fetchSchema=true HTTP/1.1" 503 43 "-" "http://localhost:5080/web/logs?stream=default&period=15m&refresh=0&org_identifier=default&sql_mode=false" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000134
[2023-11-14T20:48:13Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/summary HTTP/1.1" 503 43 "-" "http://localhost:5080/web/?org_identifier=default" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000230
[2023-11-14T20:48:14Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=logs&fetchSchema=true HTTP/1.1" 503 43 "-" "http://localhost:5080/web/logs?org_identifier=default" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000150
[2023-11-14T20:48:19Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=logs&fetchSchema=true HTTP/1.1" 503 43 "-" "http://localhost:5080/web/logs?org_identifier=default" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000212
[2023-11-14T20:48:38Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=logs&fetchSchema=true HTTP/1.1" 503 43 "-" "http://localhost:5080/web/logs?org_identifier=default" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000303
[2023-11-14T20:49:06Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=logs&fetchSchema=true HTTP/1.1" 503 43 "-" "http://localhost:5080/web/logs?org_identifier=default" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000210
[2023-11-14T20:49:12Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=logs HTTP/1.1" 503 43 "-" "http://localhost:5080/web/metrics?org_identifier=default" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000197
[2023-11-14T20:49:12Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=metrics&fetchSchema=true HTTP/1.1" 503 43 "-" "http://localhost:5080/web/metrics?org_identifier=default" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000182
[2023-11-14T20:49:13Z INFO actix_web::middleware::logger] 127.0.0.1 "GET /api/default/streams?type=logs&fetchSchema=true HTTP/1.1" 503 43 "-" "http://localhost:5080/web/logs?org_identifier=default" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36" 0.000191
```

```
kubectl --context development -n openobserve logs openobserve-development-querier-5f87588bcc-bd8bz
[2023-11-14T20:47:26Z INFO openobserve] Starting OpenObserve v0.6.4
[2023-11-14T20:47:26Z INFO openobserve] System info: CPU cores 64, MEM total 511451 MB, Disk total 511 GB, free 84 GB
[2023-11-14T20:47:27Z INFO openobserve::common::infra::cluster] Start watching node_list
[2023-11-14T20:47:27Z INFO openobserve::common::infra::cluster] [CLUSTER] Register to cluster ok
[2023-11-14T20:47:27Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 3, uuid: "b9b83ece-2f42-4b06-8355-444f5861ab43", name: "openobserve-development-router-7689f4c577-c2v7q", http_addr: "http://10.19.226.124:5080", grpc_addr: "http://10.19.226.124:5081", role: [Router], cpu_num: 64, status: Prepare, broadcasted: false }
[2023-11-14T20:47:27Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 4, uuid: "53f45f30-46e9-4302-a603-426f3cbc44ee", name: "openobserve-development-alertmanager-687dfc4cd5-77mfc", http_addr: "http://10.19.226.240:5080", grpc_addr: "http://10.19.226.240:5081", role: [AlertManager], cpu_num: 64, status: Prepare, broadcasted: false }
[2023-11-14T20:47:28Z INFO openobserve::service::db::user] Start watching user
[2023-11-14T20:47:28Z INFO openobserve::service::db::user] Users Cached
[2023-11-14T20:47:28Z INFO openobserve::service::db::functions] Start watching function
[2023-11-14T20:47:28Z INFO openobserve::service::db::metrics] Start watching prometheus cluster leader
[2023-11-14T20:47:28Z INFO openobserve::service::db::schema] Start watching stream schema
[2023-11-14T20:47:28Z INFO openobserve::service::db::compact::retention] Start watching stream deleting
[2023-11-14T20:47:28Z INFO openobserve::service::db::triggers] Start watching Triggers
[2023-11-14T20:47:28Z INFO openobserve::service::db::alerts::destinations] Start watching alert destinations
[2023-11-14T20:47:28Z INFO openobserve::service::db::alerts] Start watching alerts
[2023-11-14T20:47:28Z INFO openobserve::service::db::alerts::templates] Start watching alert templates
[2023-11-14T20:47:28Z INFO openobserve::service::db::schema] Stream schemas Cached
[2023-11-14T20:47:28Z INFO openobserve::service::db::functions] Functions Cached
[2023-11-14T20:47:28Z INFO openobserve::service::db::metrics] Prometheus cluster leaders Cached
[2023-11-14T20:47:28Z INFO openobserve::service::db::alerts::templates] Alert templates Cached
[2023-11-14T20:47:28Z INFO openobserve::service::db::alerts::destinations] Alert destinations Cached
[2023-11-14T20:47:28Z INFO openobserve::service::db::alerts] Alerts Cached
[2023-11-14T20:47:28Z INFO openobserve::service::db::triggers] Triggers Cached
[2023-11-14T20:47:28Z INFO openobserve::service::db::syslog] SyslogRoutes Cached
[2023-11-14T20:47:28Z INFO openobserve::service::db::syslog] SyslogServer settings Cached
[2023-11-14T20:47:28Z INFO object_store::aws] Using WebIdentity credential provider
[2023-11-14T20:47:28Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 3, uuid: "b9b83ece-2f42-4b06-8355-444f5861ab43", name: "openobserve-development-router-7689f4c577-c2v7q", http_addr: "http://10.19.226.124:5080", grpc_addr: "http://10.19.226.124:5081", role: [Router], cpu_num: 64, status: Online, broadcasted: false }
[2023-11-14T20:47:28Z INFO openobserve::service::db::file_list::remote] Load file_list [file_list/] gets 0 files
[2023-11-14T20:47:28Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 4, uuid: "53f45f30-46e9-4302-a603-426f3cbc44ee", name: "openobserve-development-alertmanager-687dfc4cd5-77mfc", http_addr: "http://10.19.226.240:5080", grpc_addr: "http://10.19.226.240:5081", role: [AlertManager], cpu_num: 64, status: Online, broadcasted: false }
[2023-11-14T20:47:41Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 5, uuid: "be53aeed-6ff7-440d-a358-f525f316baf8", name: "openobserve-development-ingester-0", http_addr: "http://10.19.226.138:5080", grpc_addr: "http://10.19.226.138:5081", role: [Ingester], cpu_num: 64, status: Prepare, broadcasted: false }
[2023-11-14T20:47:42Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 5, uuid: "be53aeed-6ff7-440d-a358-f525f316baf8", name: "openobserve-development-ingester-0", http_addr: "http://10.19.226.138:5080", grpc_addr: "http://10.19.226.138:5081", role: [Ingester], cpu_num: 64, status: Online, broadcasted: false }
```

```
kubectl --context development -n openobserve logs openobserve-development-router-7689f4c577-c2v7q | grep CLUSTER
[2023-11-14T20:47:27Z INFO openobserve::common::infra::cluster] [CLUSTER] Register to cluster ok
[2023-11-14T20:47:27Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 4, uuid: "53f45f30-46e9-4302-a603-426f3cbc44ee", name: "openobserve-development-alertmanager-687dfc4cd5-77mfc", http_addr: "http://10.19.226.240:5080", grpc_addr: "http://10.19.226.240:5081", role: [AlertManager], cpu_num: 64, status: Prepare, broadcasted: false }
[2023-11-14T20:47:28Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 3, uuid: "b9b83ece-2f42-4b06-8355-444f5861ab43", name: "openobserve-development-router-7689f4c577-c2v7q", http_addr: "http://10.19.226.124:5080", grpc_addr: "http://10.19.226.124:5081", role: [Router], cpu_num: 64, status: Online, broadcasted: false }
[2023-11-14T20:47:28Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 4, uuid: "53f45f30-46e9-4302-a603-426f3cbc44ee", name: "openobserve-development-alertmanager-687dfc4cd5-77mfc", http_addr: "http://10.19.226.240:5080", grpc_addr: "http://10.19.226.240:5081", role: [AlertManager], cpu_num: 64, status: Online, broadcasted: false }
[2023-11-14T20:47:41Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 5, uuid: "be53aeed-6ff7-440d-a358-f525f316baf8", name: "openobserve-development-ingester-0", http_addr: "http://10.19.226.138:5080", grpc_addr: "http://10.19.226.138:5081", role: [Ingester], cpu_num: 64, status: Prepare, broadcasted: false }
[2023-11-14T20:47:42Z INFO openobserve::common::infra::cluster] [CLUSTER] join Node { id: 5, uuid: "be53aeed-6ff7-440d-a358-f525f316baf8", name: "openobserve-development-ingester-0", http_addr: "http://10.19.226.138:5080", grpc_addr: "http://10.19.226.138:5081", role: [Ingester], cpu_num: 64, status: Online, broadcasted: false }
```
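Comparing the two `[CLUSTER]` excerpts: the querier's log shows the router, alertmanager, and ingester joining, but the router's log never shows a querier node joining, which may be why the router answers the UI's API calls with 503. The sketch below summarizes the last-seen role and status of every node a pod has logged, so a missing role stands out. It assumes the v0.6.4 log format shown above; `summarize_cluster` is a hypothetical helper, not part of OpenObserve.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenObserve): read OpenObserve log lines on
# stdin and print "ROLE|NODE_NAME|STATUS" for every node that appears in a
# "[CLUSTER] join Node { ... }" line. Assumes the v0.6.4 log format.
summarize_cluster() {
  grep -F '[CLUSTER] join Node' |
    sed -E 's/.*name: "([^"]+)".*role: \[([^]]*)\].*status: ([A-Za-z]+).*/\2|\1|\3/' |
    # Later lines overwrite earlier ones, so each node keeps its latest status.
    awk -F'|' '{ last[$2] = $0 } END { for (n in last) print last[n] }' |
    LC_ALL=C sort
}

# Example usage (names taken from the thread; adjust to your cluster):
#   kubectl --context development -n openobserve \
#     logs openobserve-development-router-7689f4c577-c2v7q | summarize_cluster
# If no output line mentions Querier, this pod has not seen a querier join.
```

Run against the router excerpt above, this would list only AlertManager, Ingester, and Router entries.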
Prabhat
09:01 PM
Are you trying to set up HA or local? You mentioned HA, but the screenshots are for localhost?
Prabhat
09:02 PM
Also, you are using an older version (0.6.4). We shipped 0.7.0, which has a lot more error handling.
Prabhat
09:03 PM
In HA mode, 0.6.4 sometimes throws an error due to a race condition between the router and the querier.
Prabhat
09:03 PM
Try restarting the router pod. That generally resolves the issue.
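Restarting the router pod can also be done through its Deployment, so a replacement pod is created and you can wait for it to become ready. A minimal sketch, assuming the router runs as a Deployment whose name is the pod name minus the `-<pod-template-hash>-<suffix>` that Deployments normally append; `deployment_from_pod` and `restart_deployment_of_pod` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of OpenObserve or its Helm chart.

# Strip the "-<pod-template-hash>-<random suffix>" that a Deployment's
# ReplicaSet appends, e.g. ...-router-7689f4c577-c2v7q -> ...-router
deployment_from_pod() {
  printf '%s\n' "${1%-*-*}"
}

# Restart the Deployment behind a pod and wait for the rollout to finish.
# Arguments: kube context, namespace, pod name.
restart_deployment_of_pod() {
  local ctx=$1 ns=$2 deploy
  deploy=$(deployment_from_pod "$3")
  kubectl --context "$ctx" -n "$ns" rollout restart "deployment/$deploy" &&
    kubectl --context "$ctx" -n "$ns" rollout status "deployment/$deploy" --timeout=180s
}

# Example (values from the thread):
#   restart_deployment_of_pod development openobserve \
#     openobserve-development-router-7689f4c577-c2v7q
```

Pods such as `openobserve-development-ingester-0` appear to belong to a StatefulSet, which uses a different naming scheme, so the name-stripping helper does not apply to them.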
Dylan
09:03 PM
HA in EKS. The local Docker setup works fine.
Dylan
09:04 PM
We did; now it can't discover any other OpenObserve nodes.
Prabhat
09:04 PM
Is this a fresh install?
Dylan
09:04 PM
yes
Prabhat
09:04 PM
Why don't you upgrade to 0.7.0 in that case?
Dylan
09:06 PM
How can we get 0.7.0 using your Helm chart? We just followed the docs.
Prabhat
09:06 PM
Oh, my fault.
Prabhat
09:06 PM
I haven't updated the Helm chart yet.
Prabhat
09:06 PM
Let me do that

Dylan
09:06 PM
Haha thanks 😆
Prabhat
09:14 PM
I updated the Helm chart. You can simply update the tag in your values.yaml file and run helm upgrade:

Prabhat
09:14 PM
```yaml
image:
  repository: public.ecr.aws/zinclabs/openobserve
  pullPolicy: IfNotPresent
  # Overrides the image tag whose default is the chart appVersion.
  tag: "0.7.0"
```
Dylan
09:14 PM
Thanks, will do.
Prabhat
09:14 PM
```shell
helm repo update
helm -n openobserve -f values.yaml upgrade --install zo1 openobserve/openobserve
```
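After the upgrade, one way to confirm which version a component is actually running is to read its startup line, like the `Starting OpenObserve v0.6.4` line in the querier log earlier in the thread. A minimal sketch, assuming the startup message keeps the same format in 0.7.0; `openobserve_version` is a hypothetical helper, and the Deployment name in the usage comment is inferred from the thread's pod names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version from OpenObserve's startup log line,
# e.g. "... INFO openobserve] Starting OpenObserve v0.6.4" -> "v0.6.4".
# Assumes the startup message format seen in the thread's v0.6.4 logs.
openobserve_version() {
  sed -n -E 's/.*Starting OpenObserve (v[0-9][^[:space:]]*).*/\1/p' | tail -n 1
}

# Example usage (kubectl logs deployment/NAME reads from one of its pods):
#   kubectl --context development -n openobserve \
#     logs deployment/openobserve-development-querier | openobserve_version
# Alternatively, list the image each pod is running:
#   kubectl -n openobserve get pods -o \
#     jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```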
Dylan
09:18 PM
Looks like that fixed it, thanks very much, Prabhat!

Dylan
09:18 PM
real fast
Nov 15, 2023
Hengfei
01:18 AM
Prabhat: that is a bug; you need to upgrade to 0.7.0.

OpenObserve

OpenObserve is an open-source, petabyte-scale observability platform for the cloud native realm, offering a 10x cost reduction and 140x less storage use compared to competitors like Elasticsearch or Splunk. Built in Rust for exceptional performance, it offers comprehensive features like logs, metrics, traces, dashboards, and more.
