Improving Query Performance on Large Datasets

TLDR Joaquin sought advice to improve query speed on a large dataset. Prabhat suggested using partitioning to reduce the search space and increase speed.

Joaquin
Fri, 06 Oct 2023 09:27:44 UTC

Hi folks! I'd appreciate any pointers on improving query performance. Here's the situation: right now we're ingesting about 15TB of logs per month, and searching logs for anything beyond the last few hours is super slow. So far I've tried:
• Increasing resources for the querier nodes. Right now I'm running 4 instances with 4 vCPU and 10GB RAM on K8s.
• Adding a 50GB persistent volume for the disk cache and increasing the cache limit from 50% utilization to 48GB.
Do you have any suggestions on how to improve it?

Prabhat
Fri, 06 Oct 2023 10:56:29 UTC

You should leverage partitioning

Prabhat
Fri, 06 Oct 2023 10:58:25 UTC

E.g. you can partition by namespace. Once the data is partitioned by namespace, filter on namespace first in your queries, before any other search condition. This will increase performance.

Prabhat
Fri, 06 Oct 2023 10:58:56 UTC

The idea is that you reduce the search space.
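
To make the "reduce the search space" point concrete, here is a minimal, generic sketch in Python. It is not the product's actual storage layout or query engine; the record fields, the `build_partitions` and `search` helpers, and the sample data are all hypothetical, chosen only to show that filtering on the partition key first means fewer records get scanned.

```python
from collections import defaultdict

# Hypothetical log records; field names and values are illustrative only.
LOGS = [
    {"namespace": "payments", "message": "timeout calling bank API"},
    {"namespace": "payments", "message": "charge succeeded"},
    {"namespace": "search", "message": "timeout from index shard"},
    {"namespace": "auth", "message": "login failed"},
]


def build_partitions(records, key):
    """Group records by the partition key (assumed here: one bucket per namespace)."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[key]].append(record)
    return partitions


def search(partitions, term, namespace=None):
    """Return (matches, records_scanned).

    With a namespace filter, only that one partition is scanned;
    without it, every partition has to be scanned.
    """
    if namespace is not None:
        candidates = partitions.get(namespace, [])
    else:
        candidates = [r for part in partitions.values() for r in part]
    matches = [r for r in candidates if term in r["message"]]
    return matches, len(candidates)


partitions = build_partitions(LOGS, "namespace")
matches_all, scanned_all = search(partitions, "timeout")
matches_one, scanned_one = search(partitions, "timeout", namespace="payments")
print(scanned_all, scanned_one)  # prints "4 2"
```

In this toy example the unfiltered search touches all 4 records, while the search restricted to the `payments` partition touches only 2. On a real dataset the saving depends on how evenly the chosen field splits the data and how often queries can filter on it.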

Prabhat
Fri, 06 Oct 2023 11:00:12 UTC

This will work for new data going forward after enabling partitioning.

Joaquin
Fri, 06 Oct 2023 12:21:48 UTC

Perfect, thanks! I'll check which field gives me the best partitioning.