Encountering Issues with openobserve Deployment in HA Mode

TLDR Shashank is having trouble enabling mTLS with Istio in their openobserve deployment. Hengfei suggested the issue could be with how Istio proxies etcd communication. Prabhat confirmed ongoing Istio-related problems, and West detailed failures of the querier and ingester services in Istio strict mode.

Shashank
Tue, 21 Nov 2023 06:09:45 UTC

Hi Team, I have deployed openobserve in HA mode. I have a security use case where I need to enable mTLS (using Istio). When I do so, calls like: ``` ``` start failing with status code 503. As per this Istio doc: . I have added the grpc port to the following services (compactor, ingester, querier, router), but no luck with that:

```
ports:
  - name: grpc
    port: 5081
    protocol: TCP
    targetPort: grpc
```

I am continuously getting the error below:

```
[2023-11-20T22:20:13Z DEBUG tonic::codec::decode] decoder inner stream error: Status { code: Internal, message: "h2 protocol error: error reading a body from connection: stream error received: not a result of an error", source: Some(hyper::Error(Body, Error { kind: Reset(StreamId(13), NO_ERROR, Remote) })) }
[2023-11-20T22:20:13Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/nodes/, get message error: grpc request error: status: Internal, message: "h2 protocol error: error reading a body from connection: stream error received: not a result of an error", details: [], metadata: MetadataMap { headers: {} }
[2023-11-20T22:20:13Z DEBUG tower::balance::p2c::service] updating from discover
[2023-11-20T22:20:13Z DEBUG tower::buffer::worker] service.ready=true message=processing request
[2023-11-20T22:20:13Z DEBUG hyper::proto::h2::client] client request body error: error writing a body to connection: send stream capacity unexpectedly closed
[2023-11-20T22:20:13Z DEBUG hyper::proto::h2::client] client request body error: error writing a body to connection: send stream capacity unexpectedly closed
[2023-11-20T22:20:13Z DEBUG tonic::codec::decode] decoder inner stream error: Status { code: Internal, message: "h2 protocol error: error reading a body from connection: stream error received: not a result of an error", source: Some(hyper::Error(Body, Error { kind: Reset(StreamId(23), NO_ERROR, Remote) })) }
[2023-11-20T22:20:13Z ERROR openobserve::common::infra::db::etcd] watching prefix: /zinc/observe/user/, get message error: grpc request error: status: Internal, message: "h2 protocol error: error reading a body from connection: stream error received: not a result of an error", details: [], metadata: MetadataMap { headers: {} }
```

Can someone please help me here?

Hengfei
Tue, 21 Nov 2023 06:50:18 UTC

Yep, I am not sure how to resolve it. It looks like Istio is proxying the etcd communication.

Hengfei
Tue, 21 Nov 2023 06:50:25 UTC

etcd uses port 2379.
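If the sidecar is indeed intercepting etcd traffic on port 2379, one thing that could be tried is excluding that port from Envoy's outbound redirection with the pod annotation `traffic.sidecar.istio.io/excludeOutboundPorts`. This is only a sketch and not a fix confirmed in this thread: it assumes the annotation is added to the pod template of each openobserve workload, and that etcd itself does not require mTLS on that port.

```yaml
# Fragment of a workload's pod template (assumption: applied to each
# openobserve component that talks to etcd).
spec:
  template:
    metadata:
      annotations:
        # Outbound connections to port 2379 bypass the Envoy sidecar.
        traffic.sidecar.istio.io/excludeOutboundPorts: "2379"
```

Traffic that bypasses the sidecar is no longer wrapped in Istio mTLS, so this may or may not be acceptable for a strict security use case.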

Prabhat
Tue, 21 Nov 2023 12:17:23 UTC

We have a known problem with Istio. We need more time to study it and find a resolution.

West
Wed, 22 Nov 2023 10:59:38 UTC

More information on this issue: the Querier and Ingester services are failing in Istio strict mode; the rest of the services are fine with mTLS.

`curl -u :xyz -k -d '[{"level":"info","job":"test","log":"test message for openobserve"}]' -vvv`

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Trying 20.75.176.101:443...
* Connected to (20.75.176.101) port 443 (#0)
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* ALPN: offers h2
* ALPN: offers http/1.1
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [15 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [4024 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
* subject: CN=
* start date: Nov 2 11:34:53 2023 GMT
* expire date: Jan 31 11:34:52 2024 GMT
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Server auth using Basic with user ''
* h2h3 [:method: POST]
* h2h3 [:path: /api/default/default/_json]
* h2h3 [:scheme: https]
* h2h3 [:authority: ]
* h2h3 [authorization: Basic YWRtaW5AaHYuY29tOjlQWjI1bFdNeEU4VElsUXk=]
* h2h3 [user-agent: curl/7.84.0]
* h2h3 [accept: */*]
* h2h3 [content-length: 68]
* h2h3 [content-type: application/x-www-form-urlencoded]
* Using Stream ID: 1 (easy handle 0x267039e16c0)
} [5 bytes data]
> POST /api/default/default/_json HTTP/2
> Host:
> authorization: Basic YWRtaW5AaHYuY29tOjlQWjI1bFdNeEU4VElsUXk=
> user-agent: curl/7.84.0
> accept: */*
> content-length: 68
> content-type: application/x-www-form-urlencoded
>
} [5 bytes data]
* We are completely uploaded and fine
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [230 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [230 bytes data]
* old SSL session ID is stale, removing
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 2147483647)!
} [5 bytes data]
< HTTP/2 503
< content-length: 95
< content-type: text/plain
< server: istio-envoy
< date: Wed, 22 Nov 2023 10:49:38 GMT
< x-envoy-upstream-service-time: 49
<
{ [95 bytes data]
100 163 100 95 100 68 111 79 --:--:-- --:--:-- --:--:-- 191
upstream connect error or disconnect/reset before headers. reset reason: connection termination
* Connection #0 to host left intact
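For context, "strict mode" refers to an Istio `PeerAuthentication` policy with `mtls.mode: STRICT`, under which sidecars accept only mutual-TLS traffic from peers. The thread does not show the policy that was actually applied; a namespace-wide one typically looks like the sketch below, where the `openobserve` namespace is an assumption.

```yaml
# Illustrative only: enforce mTLS for every workload in one namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: openobserve   # hypothetical namespace
spec:
  mtls:
    mode: STRICT           # plaintext connections to sidecars are rejected
```

With such a policy in place, a 503 carrying `server: istio-envoy` and "upstream connect error", as in the curl output above, indicates that Envoy could not complete the connection to the upstream workload.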

West
Wed, 22 Nov 2023 11:00:29 UTC

2023-11-22T15:59:52+05:30 2023-11-22T10:29:52.153248Z ERROR openobserve::service::db::file_list::broadcast: [broadcast] send event to node[4cab14ff-283b-43ba-afea-5b636f0bfd20] failed: status: Unavailable, message: "upstream connect error or disconnect/reset before headers. reset reason: connection termination", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "server": "envoy", "date": "Wed, 22 Nov 2023 10:29:51 GMT"} }, retrying...