Erroneous Triggering of Alarm in 0.5.1
TLDR Chris reported occasional erroneous alert triggers on a specific stream query. Ashish suggested the issue might be linked to a known duplicates bug. Unsure, Chris decided to keep monitoring the issue and report back if it persists.
Aug 16, 2023 (3 months ago)
Chris
09:08 AM
select count(message) as occurrences from 'streamname' where message like '%ALERT%'
and a condition of occurrences > 0, duration 5 mins, frequency 5 mins, delay 10 mins.
It was set up a week ago, tested OK, and had been fine until last night, when it fired. There certainly isn't anything in the stream containing ALERT. I did notice some false triggers while I was experimenting, but thought they might just have been due to the rapid changes I was making. I include {timestamp} in the template, and it agrees with when the alert triggered.
Prabhat
09:41 AM
Chris
09:43 AM
Prabhat
09:44 AM
Chris
09:45 AM
Ashish
09:47 AM
Ashish
09:48 AM
Chris
09:48 AM
Ashish
09:49 AM
Chris
09:50 AM
Ashish
09:51 AM
Ashish
09:51 AM
Ashish
09:53 AM
Ashish
09:53 AM
Chris
09:53 AM
Chris
09:54 AM
Chris
10:26 AM
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:in_storage;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:in_wal;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:in_storage;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:storage:enter;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:storage:enter;
[2023-08-15T20:32:16Z INFO openobserve::service::search::grpc::storage] search->storage: org pe, stream streamname, load files 1, scan_size 15170, compressed_size 6814
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:storage:cache_parquet_files;
[2023-08-15T20:32:16Z INFO openobserve::service::search::grpc::storage] search->storage: org pe, stream streamname, load files 4, scan_size 318391, compressed_size 40972
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:storage:cache_parquet_files;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:wal:enter;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:wal:get_file_list;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:wal:enter;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:wal:get_file_list;
[2023-08-15T20:32:16Z INFO openobserve::service::search::sql] sqlparser: stream_name -> "streamname", fields -> [], partition_key -> [("host", "host-b", Eq)], full_text -> [], time_range -> Some((1692127336196919, 1692131536196919)), order_by -> [], limit -> 0,100
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:in_wal;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:in_storage;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:storage:enter;
[2023-08-15T20:32:16Z INFO openobserve::service::search::grpc::storage] search->storage: org pe, stream streamname, load files 4, scan_size 318391, compressed_size 40972
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:storage:cache_parquet_files;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:wal:enter;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:wal:get_file_list;
[2023-08-15T20:32:16Z INFO openobserve::service::search::grpc::wal] wal->search: load files 1, scan_size 19023
[2023-08-15T20:32:16Z INFO openobserve::service::search::grpc::wal] wal->search: load files 1, scan_size 19023
[2023-08-15T20:32:16Z INFO openobserve::service::search::grpc::wal] wal->search: load files 1, scan_size 19023
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:wal:datafusion;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:wal:datafusion;
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:wal:datafusion;
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query sql: select count(*) as occurrences FROM tbl where (_timestamp >= 1692127336196919 AND _timestamp < 1692131536196919) AND host = 'host-b' LIMIT 100
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query sql: select count(*) as occurrences FROM tbl where (_timestamp >= 1692127336196941 AND _timestamp < 1692131536196941) AND host = 'host-c' LIMIT 100
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query sql: select count(message) as occurrences FROM tbl where (_timestamp >= 1692131236196967 AND _timestamp < 1692131536196967) AND message like '%ALARM%' LIMIT 100
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query took 0.001 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query all took 0.002 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query took 0.002 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query all took 0.002 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query took 0.003 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query all took 0.003 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::grpc::storage] search->storage: org pe, stream streamname, load files 1, into memory cache done
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:storage:datafusion;
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query sql: select count(message) as occurrences FROM tbl where (_timestamp >= 1692131236196967 AND _timestamp < 1692131536196967) AND message like '%ALARM%' LIMIT 100
[2023-08-15T20:32:16Z INFO openobserve::service::search::grpc::storage] search->storage: org pe, stream streamname, load files 4, into memory cache done
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:storage:datafusion;
[2023-08-15T20:32:16Z INFO openobserve::service::search::grpc::storage] search->storage: org pe, stream streamname, load files 4, into memory cache done
[2023-08-15T20:32:16Z INFO tracing::span] service:search:grpc:storage:datafusion;
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query sql: select count(*) as occurrences FROM tbl where (_timestamp >= 1692127336196941 AND _timestamp < 1692131536196941) AND host = 'host-c' LIMIT 100
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query sql: select count(*) as occurrences FROM tbl where (_timestamp >= 1692127336196919 AND _timestamp < 1692131536196919) AND host = 'host-b' LIMIT 100
[2023-08-15T20:32:16Z INFO tracing::span] datafusion::storage::memory::list;
[2023-08-15T20:32:16Z INFO tracing::span] datafusion::storage::memory::list;
[2023-08-15T20:32:16Z INFO tracing::span] datafusion::storage::memory::list;
Chris
10:27 AM
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query took 0.002 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query took 0.002 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query all took 0.002 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query all took 0.002 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query took 0.001 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query all took 0.002 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] merge_write_recordbatch took 0.000 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] merge_write_recordbatch took 0.000 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] merge_write_recordbatch took 0.000 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] merge_rewrite_sql took 0.000 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] merge_rewrite_sql took 0.000 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] merge_rewrite_sql took 0.000 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Merge sql: SELECT sum("occurrences") as "occurrences" FROM tbl LIMIT 100
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Merge sql: SELECT sum("occurrences") as "occurrences" FROM tbl LIMIT 100
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Merge sql: SELECT sum("occurrences") as "occurrences" FROM tbl LIMIT 100
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Merge took 0.001 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Merge took 0.001 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Merge took 0.001 seconds.
[2023-08-15T20:32:16Z INFO openobserve::service::search] search->grpc: result node: 1, is_querier: true, total: 0, took: 19, files: 5, scan_size: 0
[2023-08-15T20:32:16Z INFO openobserve::service::search] search->grpc: result node: 1, is_querier: true, total: 0, took: 19, files: 2, scan_size: 0
[2023-08-15T20:32:16Z INFO openobserve::service::search] search->grpc: result node: 1, is_querier: true, total: 0, took: 19, files: 5, scan_size: 0
[2023-08-15T20:32:16Z INFO openobserve::service::search] search_in_cluster: query num_batches: 1
[2023-08-15T20:32:16Z INFO openobserve::service::search] search_in_cluster: query num_batches: 1
[2023-08-15T20:32:16Z INFO openobserve::service::search] search_in_cluster: query num_batches: 1
[2023-08-15T20:32:16Z INFO openobserve::service::search] search->result: total: 1, took: 21, scan_size: 0
[2023-08-15T20:32:16Z INFO openobserve::service::search] search->result: total: 1, took: 21, scan_size: 0
[2023-08-15T20:32:16Z INFO openobserve::service::search] search->result: total: 1, took: 21, scan_size: 0
[2023-08-15T20:32:16Z INFO tracing::span] save_trigger;
Ashish
10:32 AM
[2023-08-15T20:32:16Z INFO openobserve::service::search::datafusion::exec] Query sql: select count(message) as occurrences FROM tbl where (_timestamp >= 1692131236196967 AND _timestamp < 1692131536196967) AND message like '%ALARM%' LIMIT 100
If we check this log line, timestamped 2023-08-15T20:32:16Z, the query was correctly fired over the last 5 mins, which matches the configured duration.
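As a quick sanity check on that window, the two _timestamp bounds can be converted from microseconds since the Unix epoch. This is a minimal sketch in Python, not OpenObserve code: the helper name from_micros is made up for illustration, and the bounds are copied from the logged '%ALARM%' query.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_micros(us: int) -> datetime:
    """Convert integer microseconds since the Unix epoch to an aware UTC datetime."""
    # Adding a timedelta keeps microsecond precision (no float rounding).
    return EPOCH + timedelta(microseconds=us)


# Bounds copied from the logged query: _timestamp >= start AND _timestamp < end.
start_us = 1692131236196967
end_us = 1692131536196967

window = from_micros(end_us) - from_micros(start_us)
print(window)                           # 0:05:00
print(from_micros(end_us).isoformat())  # 2023-08-15T20:32:16.196967+00:00
```

The window is exactly 5 minutes and ends at the second shown in the log prefix, so the time range itself does not look like the source of the false trigger.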
Chris
10:33 AM
Ashish
10:34 AM
Ashish
10:35 AM
Chris
10:41 AM
Chris
10:43 AM
Ashish
10:50 AM
Chris
10:54 AM
Chris
10:57 AM
{
"subject": "{alert_name} detected",
"message": "{alert_name} was detected at {timestamp}"
}
and the text from the one last night was
Alarm was detected at 1692131536219058
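To relate that {timestamp} value to the logs, it can be decoded the same way as the _timestamp bounds, assuming it is also microseconds since the Unix epoch (an assumption, though the magnitude matches the query bounds). A small Python sketch, with variable names chosen only for illustration:

```python
from datetime import datetime, timedelta, timezone

alert_us = 1692131536219058      # {timestamp} value from the notification text
query_end_us = 1692131536196967  # upper _timestamp bound of the logged '%ALARM%' query

# Add integer microseconds as a timedelta to avoid float rounding.
alert_time = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=alert_us)
lag_ms = (alert_us - query_end_us) / 1000

print(alert_time.isoformat())  # 2023-08-15T20:32:16.219058+00:00
print(lag_ms)                  # 22.091
```

Under that assumption the notification was stamped about 22 ms after the end of the query window, which is consistent with it coming from the 20:32:16Z evaluation shown in the logs.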
Ashish
10:59 AM
Chris
11:01 AM
Chris
11:04 AM