High Memory Usage Issue in OpenObserve 0.5.1
TLDR Chris reported high memory usage in OpenObserve 0.5.1 when running certain queries, causing out-of-memory errors. Hengfei suggested trying the query without conditions. Chris will raise an official issue about the problem.
Jul 27, 2023 (4 months ago)
Chris
01:09 PM
Hengfei
01:14 PM
Chris
01:14 PM
Hengfei
01:17 PM
[db::file_list::remote] Load file_list [file_list/] begin
[db::file_list::remote] Load file_list [file_list/] gets 670 files
[db::file_list::remote] Load file_list [file_list/] load 670:565100 done
Chris
01:18 PM
[2023-07-27T13:04:12Z INFO openobserve::service::db::file_list::remote] Load file_list [file_list/] gets 11410 files
[2023-07-27T13:04:14Z INFO openobserve::service::db::file_list::remote] Load file_list [file_list/] load 11410:230511 done
Hengfei
01:21 PM
And what is the memory usage before and after the query?
Hengfei
01:21 PM
Chris
01:22 PM
Chris
01:29 PM
[2023-07-27T13:02:55Z INFO openobserve::service::search::sql] sqlparser: stream_name -> "redacted", fields -> [], partition_key -> [("device_name", "redacted", Eq)], full_text -> [], time_range -> Some((1690376575921000, 1690462975921000)), order_by -> [], limit -> 0,150
[2023-07-27T13:02:55Z INFO tracing::span] service:search:grpc:in_wal;
[2023-07-27T13:02:55Z INFO tracing::span] service:search:grpc:in_storage;
[2023-07-27T13:02:55Z INFO tracing::span] service:search:grpc:storage:enter;
[2023-07-27T13:02:55Z INFO openobserve::service::search::grpc::storage] search->storage: org redacted, stream redacted, load files 2143, scan_size 523899602002, compressed_size 21766337881
[2023-07-27T13:02:55Z INFO tracing::span] service:search:grpc:storage:datafusion;
[2023-07-27T13:02:55Z INFO openobserve::service::search::datafusion::exec] Query sql: select * FROM tbl WHERE (_timestamp >= 1690376575921000 AND _timestamp < 1690462975921000) AND device_name='redacted' ORDER BY _timestamp DESC LIMIT 150
[2023-07-27T13:02:55Z INFO tracing::span] datafusion::storage::nocache::list;
[2023-07-27T13:02:56Z INFO openobserve::service::search] search->grpc: result node: 2, is_querier: false, total: 0, took: 992, files: 5, scan_size: 101
[2023-07-27T13:02:57Z INFO openobserve::service::search] search->grpc: result node: 3, is_querier: false, total: 0, took: 1404, files: 5, scan_size: 104
Chris
01:36 PM
[2023-07-27T13:02:55Z INFO openobserve::service::search::grpc::storage] search->storage: org redacted, stream redacted, load files 2142, scan_size 212242226193, compressed_size 9408088229
[2023-07-27T13:02:56Z INFO openobserve::service::search::grpc::storage] search->storage: org redacted, stream redacted, load files 2142, into memory cache done
Chris
01:55 PM
Hengfei
01:55 PM
select * FROM tbl WHERE (_timestamp >= 1690376575921000 AND _timestamp < 1690462975921000) AND device_name='redacted' ORDER BY _timestamp DESC LIMIT 150
This is your final query.
Hengfei
01:56 PM
Hengfei
01:56 PM
Chris
02:05 PM
[2023-07-27T13:57:08Z INFO openobserve::service::search::sql] sqlparser: stream_name -> "redacted", fields -> [], partition_key -> [], full_text -> [], time_range -> Some((1690379828171000, 1690466228171000)), order_by -> [], limit -> 0,150
[2023-07-27T13:57:08Z INFO openobserve::service::search::grpc::storage] search->storage: org redacted, stream redacted, load files 2066, scan_size 504941830625, compressed_size 20971616709
[2023-07-27T13:57:08Z INFO openobserve::service::search::sql] sqlparser: stream_name -> "redacted", fields -> [], partition_key -> [], full_text -> [], time_range -> Some((1690379828171000, 1690466228171000)), order_by -> [], limit -> 0,150
[2023-07-27T13:57:08Z INFO openobserve::service::search::grpc::storage] search->storage: org redacted, stream redacted, load files 2065, scan_size 236777136354, compressed_size 10373784759
Hengfei
02:05 PM
Chris
02:06 PM
Chris
02:08 PM
Hengfei
02:34 PM
But with a condition, we need to load all the files to search the data. The 21GB we saw is the compressed size; to search, we have to decompress it. The original size is 504941830625 bytes, about 500GB. Even though we never need all 500GB at once, decompression needs well over the 21GB. So 64GB - 20GB (for the cache) - 21GB (downloaded files) leaves only about 23GB usable for decompressing data, which may be what causes the OOM.
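Restating that budget with the arithmetic made explicit, using the figures from the logs above:

    total RAM:                    64 GB
    in-memory cache:             -20 GB
    downloaded compressed files: -21 GB   (compressed_size 20971616709 bytes)
    -------------------------------------
    left for decompression:      ~23 GB

The uncompressed scan_size is 504941830625 bytes, roughly 500 GB, so decompressing even a fraction of it can exhaust that ~23 GB of headroom.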
Can you help create an issue for your case? We will improve it, for example by searching per partition: each pass loads only ~10GB of data, we search multiple times, and at the end we merge the per-partition results. That should make it work for searching large data at scale within a limited memory budget.
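As a rough illustration of that partition-and-merge idea, a minimal Rust sketch (Row and search_partition are hypothetical stand-ins, not OpenObserve APIs):

// Split the query's time range into fixed windows so each pass only
// has to hold one window's data in memory, then merge the per-window
// hits and apply the global ORDER BY/LIMIT.
#[derive(Debug, Clone)]
struct Row {
    timestamp: i64, // microseconds, like _timestamp in the logs above
}

// Hypothetical stand-in for running the real query against only the
// files whose time range overlaps [start, end); assumed to return at
// most `limit` rows, the newest ones in its window.
fn search_partition(start: i64, end: i64, limit: usize) -> Vec<Row> {
    let _ = (start, end, limit);
    Vec::new()
}

fn partitioned_search(start: i64, end: i64, step: i64, limit: usize) -> Vec<Row> {
    let mut merged = Vec::new();
    let mut lo = start;
    while lo < end {
        let hi = (lo + step).min(end);
        // Each partition contributes at most `limit` rows, since the
        // final result keeps only the newest `limit` overall.
        merged.extend(search_partition(lo, hi, limit));
        lo = hi;
    }
    // Merge step: global ORDER BY _timestamp DESC LIMIT `limit`.
    merged.sort_by_key(|r: &Row| std::cmp::Reverse(r.timestamp));
    merged.truncate(limit);
    merged
}

fn main() {
    // Same time range and limit as the query quoted above; one-hour
    // partitions (in microseconds) are an arbitrary choice here.
    let rows = partitioned_search(1690376575921000, 1690462975921000, 3_600_000_000, 150);
    println!("merged {} rows", rows.len());
}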
Chris
03:27 PM
Thinking about it more, though: I would guess that most searches are ordered by time, and I have often noticed that most of the search time is spent determining the number of events in the time period for the histogram and total. Often we don't need either of those, and if fetching them could be switched off (merely hiding the histogram doesn't stop it counting), then just locating the first n events could be hugely faster (possibly caching the first few hundred so you have several screens ready). That might affect the scale/operation of what you describe above?
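Chris's observation corresponds to an early-termination variant of the sketch above: walk the partitions newest-first and stop as soon as n rows are in hand, skipping the total/histogram counting entirely (again using the hypothetical search_partition):

// Assumes search_partition returns its window's newest rows in
// descending _timestamp order. Because the result is
// ORDER BY _timestamp DESC LIMIT n, once the newest partitions have
// yielded n rows, older partitions cannot appear in the result, so
// scanning can stop without ever counting the remaining events.
fn first_n_newest(start: i64, end: i64, step: i64, n: usize) -> Vec<Row> {
    let mut out = Vec::new();
    let mut hi = end;
    while hi > start && out.len() < n {
        let lo = (hi - step).max(start);
        out.extend(search_partition(lo, hi, n - out.len()));
        hi = lo;
    }
    out.truncate(n);
    out
}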
Jul 28, 2023 (4 months ago)
Hengfei
01:46 AM
Hengfei
01:46 AM
Hengfei
01:47 AM
Similar Threads
Query and Issue with Disparity in Stream Stats and Disk Usage
Karan shared a query and noted a disparity in stream stats and disk usage. After troubleshooting, Hengfei identified a bug and advised on stats refresh. Ashish confirmed that the disk size was the accurate measure and they would investigate the stats calculation.
Erroneous Triggering of Alarm in 0.5.1
Chris reported occasional erroneous alert triggers on a specific stream query. Ashish suggested the issue might be linked to a known duplicates bug. Unsure, Chris decided to monitor the issue further and report back if it persisted.
Troubleshooting High CPU and Memory Usage in Zinc Service
Zygimantas experienced high CPU and memory usage in zinc service while querying large datasets. Hengfei offered suggestions to optimize and test for local disk and S3 usage. Gaby and Prabhat discussed SIMD tag performance.
ZO Kubernetes Issues: Query Log Error and Adding Users
Sa had issues with querying logs in ZO on Kubernetes and creating users. Hengfei provided solutions for both problems, including updating with a dev version and setting memory cache values.
OpenObserve issues with FluentBit and Dashboard
Alejandro experienced issues with FluentBit losing its connection to OpenObserve and discarding logs, and an error when saving a chart on the OpenObserve dashboard. Prabhat could not identify the cause of the record loss; however, a potential workaround was suggested: save the dashboard with a string-type filter instead of an integer one.