ZO Kubernetes Issues: Query Log Error and Adding Users

TLDR Sa had issues querying logs in ZO on Kubernetes and creating users. Hengfei provided solutions for both problems, including setting memory cache values and upgrading to a dev image.

Sa
Tue, 23 May 2023 10:10:29 UTC

I tried to set up ZO on Kubernetes with Helm charts. The installation was successful and I can push logs using Fluent Bit too. But now I cannot query logs in the dashboard.

Ashish
Tue, 23 May 2023 10:17:06 UTC

where are you stuck?

Ashish
Tue, 23 May 2023 10:17:18 UTC

are you getting any error?

Ashish
Tue, 23 May 2023 10:17:25 UTC

please share more details

Sa
Tue, 23 May 2023 10:36:28 UTC

I cannot query logs here. The request returns an error.

Sa
Tue, 23 May 2023 10:37:04 UTC

already have data

Ashish
Tue, 23 May 2023 10:37:25 UTC

can you check the querier pod log?

Ashish
Tue, 23 May 2023 10:37:40 UTC

and share what error the querier gives

Sa
Tue, 23 May 2023 10:40:07 UTC

yes. this is the querier log.

Sa
Tue, 23 May 2023 10:40:23 UTC

There is no error trace in querier.

Ashish
Tue, 23 May 2023 10:41:58 UTC

checking..will get back to you

Hengfei
Tue, 23 May 2023 10:42:23 UTC

this is not an error. it just shows that the memory cache is full, so something has to be dropped before a new file can be cached.

Hengfei
Tue, 23 May 2023 10:42:46 UTC

and in the end, the query gave a result after 114s

Hengfei
Tue, 23 May 2023 10:43:38 UTC

Did you try to query all the data? 88GB?

Hengfei
Tue, 23 May 2023 10:44:07 UTC

i guess your node has 8GB of memory.

Ashish
Tue, 23 May 2023 10:46:39 UTC

can you go to the streams page

Ashish
Tue, 23 May 2023 10:46:47 UTC

and share the details of the stream

Sa
Tue, 23 May 2023 10:58:28 UTC

I only query logs from the last 15 mins

Sa
Tue, 23 May 2023 10:58:57 UTC

I have 1 pod for the querier, and the node for this pod has 16GB of RAM

Hengfei
Tue, 23 May 2023 11:00:16 UTC

the logs show it is not only for 15 minutes. can you try the search again, and check what is in the query logs? they will give the query details.

Hengfei
Tue, 23 May 2023 11:00:41 UTC

Do you have alerts?

Sa
Tue, 23 May 2023 11:02:44 UTC

What is the alert?

Sa
Tue, 23 May 2023 11:03:32 UTC

past 15 mins

Sa
Tue, 23 May 2023 11:04:51 UTC

Sa
Tue, 23 May 2023 11:04:54 UTC

the log output

Sa
Tue, 23 May 2023 11:05:00 UTC

when I perform the query

Sa
Tue, 23 May 2023 11:05:15 UTC

no result

Hengfei
Tue, 23 May 2023 11:07:03 UTC

What are your deploy values for the querier? did you set the memory cache to 100MB?

Hengfei
Tue, 23 May 2023 11:08:24 UTC

or did you enable `ZO_MEMORY_CACHE_CACHE_LATEST_FILES`?

Sa
Tue, 23 May 2023 11:11:44 UTC

I'm using the default values from the Helm chart

Sa
Tue, 23 May 2023 11:12:34 UTC

this helm chart value

Ashish
Tue, 23 May 2023 11:14:05 UTC

can you go to the streams menu -> click

Ashish
Tue, 23 May 2023 11:14:18 UTC

a side panel will open

Ashish
Tue, 23 May 2023 11:14:26 UTC

please share a screenshot of it

Hengfei
Tue, 23 May 2023 11:18:40 UTC

```
[2023-05-23T11:03:44Z INFO zincobserve::service::search::grpc::storage] [TRACE] storage->search: load files 339 done
[2023-05-23T11:03:44Z INFO tracing::span] service:search:storage:group_and_calc_files_size;
[2023-05-23T11:03:44Z INFO tracing::span] service:file_list:calculate_files_size;
[2023-05-23T11:03:44Z INFO zincobserve::service::search::grpc::storage] [TRACE] storage->search: load files 339, scan_size 2986565933
```
according to these logs, for 15 minutes it needs to load 339 parquet files. our default configuration is 32MB per file, so it should be about 10GB, but scan_size gives us 2.8GB. and the log
```
[2023-05-23T11:03:44Z INFO zincobserve::infra::cache::file_data] [TRACE] File cache is full 105101150/104857600, can't cache 721844 bytes
```
tells us the total memory cache size is 104857600 bytes = 100MB
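
The arithmetic in the message above can be checked with a short sketch. The byte counts, the file count, and the 32MB-per-file default are taken from the quoted logs and message; the helper name `parse_cache_full` is hypothetical and not part of ZincObserve, and treating "32MB" as 32 MiB is an assumption.

```python
import re

MIB = 1024 * 1024
GIB = 1024 * MIB

def parse_cache_full(line):
    """Return (used, limit, rejected) byte counts from a 'File cache is full' line, or None."""
    m = re.search(r"File cache is full (\d+)/(\d+), can't cache (\d+) bytes", line)
    return tuple(int(g) for g in m.groups()) if m else None

# Log line quoted in the conversation.
line = (
    "[2023-05-23T11:03:44Z INFO zincobserve::infra::cache::file_data] "
    "[TRACE] File cache is full 105101150/104857600, can't cache 721844 bytes"
)
used, limit, rejected = parse_cache_full(line)

file_count = 339                        # "load files 339" in the querier log
expected_bytes = file_count * 32 * MIB  # assumption: 32MB per file means 32 MiB
scan_size = 2986565933                  # scan_size reported in the querier log

print(f"cache limit: {limit / MIB:.0f} MiB, currently over by {used - limit} bytes")
print(f"expected at 32 MiB per file: {expected_bytes / GIB:.1f} GiB")
print(f"reported scan_size: {scan_size / GIB:.1f} GiB")
```

The point of the comparison is that either figure is far larger than a 100MB cache, which is consistent with the repeated "File cache is full" messages.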

Hengfei
Tue, 23 May 2023 11:20:10 UTC

did you add some resource limit for the querier pod?

Sa
Tue, 23 May 2023 14:29:09 UTC

Ashish, this is the screenshot of the stream.

Sa
Tue, 23 May 2023 14:29:46 UTC

Hengfei, there is no limit on the querier pod

Sa
Tue, 23 May 2023 14:31:14 UTC

I don't know why ZO needs to download all the log content from S3. I think the log content could be loaded page by page when performing a query

Hengfei
Tue, 23 May 2023 16:45:50 UTC

it should be. but the logs show something we did not expect. like i said, the memory cache was set to only 100MB.

Hengfei
Tue, 23 May 2023 16:46:47 UTC

it should only load the files within the 15 minutes.

Hengfei
Tue, 23 May 2023 16:51:17 UTC

can you try adding some config:
```
ZO_MEMORY_CACHE_ENABLED=true
ZO_MEMORY_CACHE_CACHE_LATEST_FILES=false
ZO_MEMORY_CACHE_MAX_SIZE=4096
```

Hengfei
Tue, 23 May 2023 16:51:49 UTC

those settings mean: enable the memory cache for queries, and set the max memory cache size to 4GB
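
As a rough illustration of what these settings amount to, the sketch below converts the suggested value into bytes and compares it with the 100MB limit seen in the earlier "File cache is full" log line. It assumes `ZO_MEMORY_CACHE_MAX_SIZE` is expressed in MB, as the message above implies; `cache_limit_bytes` is a hypothetical helper, and returning 0 for a disabled cache is only a convention of this sketch, not documented ZincObserve behavior.

```python
MIB = 1024 * 1024

# Values suggested in the conversation.
settings = {
    "ZO_MEMORY_CACHE_ENABLED": "true",
    "ZO_MEMORY_CACHE_CACHE_LATEST_FILES": "false",
    "ZO_MEMORY_CACHE_MAX_SIZE": "4096",
}

def cache_limit_bytes(env):
    """Memory cache limit in bytes, assuming MAX_SIZE is in MB; 0 if the cache is disabled."""
    if env.get("ZO_MEMORY_CACHE_ENABLED", "false").lower() != "true":
        return 0
    return int(env["ZO_MEMORY_CACHE_MAX_SIZE"]) * MIB

new_limit = cache_limit_bytes(settings)
observed_limit = 104857600  # limit reported in the earlier querier log

print(f"{new_limit / MIB:.0f} MiB configured vs {observed_limit / MIB:.0f} MiB observed")
```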

Sa
Wed, 24 May 2023 02:14:00 UTC

I will try to set these values

Sa
Wed, 24 May 2023 02:17:54 UTC

no logs are loading any more

Hengfei
Wed, 24 May 2023 02:24:02 UTC

can you share the ingester logs, to check that ingestion is working?

Sa
Wed, 24 May 2023 03:10:21 UTC

oh, my agent has stopped. I will restart it now

Sa
Wed, 24 May 2023 08:44:54 UTC

Sa
Wed, 24 May 2023 08:44:59 UTC

this is the error, right?

Hengfei
Wed, 24 May 2023 08:57:07 UTC

this is an error, which version do you use? 0.4.3?

Sa
Wed, 24 May 2023 08:58:06 UTC

yes. the latest version

Hengfei
Wed, 24 May 2023 09:22:30 UTC

i will give you a dev version to fix this issue.

Hengfei
Wed, 24 May 2023 10:03:23 UTC

Can you try this image: ```public.ecr.aws/zinclabs/zincobserve-dev:v0.4.3-dd87ab7-amd64```

Sa
Wed, 24 May 2023 10:07:58 UTC

yes, let me try

Sa
Wed, 24 May 2023 10:08:01 UTC

upgrade the querier?

Hengfei
Wed, 24 May 2023 10:08:25 UTC

you use helm, right? you can upgrade all pods.

Sa
Wed, 24 May 2023 10:35:31 UTC

the error is still there

Hengfei
Wed, 24 May 2023 10:36:33 UTC

Ha...

Hengfei
Wed, 24 May 2023 10:36:42 UTC

Thanks, let me check

Sa
Wed, 24 May 2023 10:39:09 UTC

ok

Hengfei
Wed, 24 May 2023 12:08:44 UTC

```public.ecr.aws/zinclabs/zincobserve-dev:v0.4.3-2e0d7a7-amd64```

Hengfei
Wed, 24 May 2023 12:08:51 UTC

Can you try this version?

Sa
Wed, 24 May 2023 14:05:19 UTC

yes

Sa
Wed, 24 May 2023 14:13:48 UTC

I tried to query the last 2 days but there is no result

Hengfei
Wed, 24 May 2023 14:21:17 UTC

the last 2 days may have a lot of data, which needs more resources.

Hengfei
Wed, 24 May 2023 14:21:29 UTC

can you try querying the last 5 minutes, 15 minutes, 1 hour?

Hengfei
Wed, 24 May 2023 14:21:37 UTC

Does it return results?

Hengfei
Wed, 24 May 2023 14:22:03 UTC

first, let's confirm the query can work. then let's talk about query all data.

Sa
Wed, 24 May 2023 14:27:20 UTC

I have to switch to another node's agent, because the log volume of the current node is too large (2GB/h)

Sa
Wed, 24 May 2023 14:27:37 UTC

I will inform you of the result later

Hengfei
Wed, 24 May 2023 14:35:37 UTC

2GB/h is no problem. our own test cluster has more than 2TB of data and grows by 10GB of logs every hour

Hengfei
Wed, 24 May 2023 14:36:31 UTC

Yes, you can use a small data size to test.

Sa
Thu, 25 May 2023 02:21:27 UTC

Pushing logs from another machine with a lower log volume

Sa
Thu, 25 May 2023 02:47:09 UTC

It's working now.

Sa
Thu, 25 May 2023 02:47:45 UTC

One more bug: I cannot create a new user.

Sa
Thu, 25 May 2023 02:48:07 UTC

there is no button on the Users page

Hengfei
Thu, 25 May 2023 02:48:26 UTC

So, the next problem will be that when you search more data, it may time out.

Hengfei
Thu, 25 May 2023 02:48:54 UTC

Did you log in as the root user? and what is your organization?

Sa
Thu, 25 May 2023 02:49:04 UTC

I will try to push more data over time (maybe 2 days) then try to query the last 2 days.

Sa
Thu, 25 May 2023 02:49:25 UTC

yes. I logged in with the root role in the default organization

Hengfei
Thu, 25 May 2023 02:50:49 UTC

can you give a screenshot of the users page?

Sa
Thu, 25 May 2023 02:51:19 UTC

here you are.

Sa
Thu, 25 May 2023 02:51:32 UTC

there is no create user button.

Hengfei
Thu, 25 May 2023 02:52:39 UTC

Are you using version 0.4.3 or the dev version i gave you yesterday?

Sa
Thu, 25 May 2023 02:54:19 UTC

only the querier pod is using your version. the other components are on 0.4.3

Hengfei
Thu, 25 May 2023 03:03:01 UTC

no one has reported this issue of not being able to create users. i saw you already created 2 users. earlier you could, but now you can't?

Sa
Thu, 25 May 2023 03:03:38 UTC

yes.

Sa
Thu, 25 May 2023 03:03:44 UTC

now, I cannot create more users.

Hengfei
Thu, 25 May 2023 03:04:35 UTC

Can you open the Chrome console, to check whether any js errors are reported?

Hengfei
Thu, 25 May 2023 03:04:46 UTC

Or, try to logout and login again?

Sa
Thu, 25 May 2023 03:05:40 UTC

I opened the console and there is no js error.

Sa
Thu, 25 May 2023 03:05:51 UTC

I logged out and logged in again, but no difference

Sa
Thu, 25 May 2023 03:06:53 UTC

I tried an incognito tab, but no difference

Hengfei
Thu, 25 May 2023 03:12:51 UTC

Okay, this is a bug, i confirm.

Hengfei
Thu, 25 May 2023 03:13:15 UTC

Can you help create an issue on GitHub? we will fix it soon.

Sa
Thu, 25 May 2023 03:35:54 UTC

The bug about creating users?

Hengfei
Thu, 25 May 2023 03:36:04 UTC

yes

Sa
Thu, 25 May 2023 03:54:19 UTC

here you are

Hengfei
Thu, 25 May 2023 03:56:16 UTC

thanks

Hengfei
Thu, 25 May 2023 06:44:32 UTC

we fixed the add user issue, please try this tag: ```public.ecr.aws/zinclabs/zincobserve-dev:v0.4.3-56a4c15-amd64```