Utilizing VRL for Log Grouping and Searching in ETL Jobs

TLDR: West asked for help grouping and searching logs in an ETL context. Prabhat recommended applying VRL at ingestion time and explained its limitations.

West
Wed, 27 Sep 2023 17:21:59 UTC

Hi team, is there any way to group and search logs based on some attribute, keep the result as a temporary view or buffer, and then run search operations on that view?

Prabhat
Wed, 27 Sep 2023 17:23:04 UTC

You could use VRL during search if that helps. It is considerably slower, but it should give you what you need.

West
Wed, 27 Sep 2023 17:26:21 UTC

Will it be faster if we apply VRL at the ingestion level? Also, do you have any examples that would help us?

Prabhat
Wed, 27 Sep 2023 17:27:00 UTC

Yes, using VRL at ingestion time is the recommended way to use it for now while we work on making VRL performant at query time.
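Applied at ingestion, a function runs once per incoming record, so work such as parsing a raw line into fields is paid once rather than on every search. The following is a minimal Python stand-in for that idea, not VRL itself: `parse_at_ingestion` is a hypothetical helper, and the sample line is taken from the ETL logs West shares later in the thread. VRL provides a `parse_key_value` function aimed at this kind of input; check its delimiter options against your actual log format.

```python
Record = dict[str, str]


def parse_at_ingestion(raw: str) -> Record:
    """Turn one raw 'key=value, key=value' line into a structured record.

    Hypothetical stand-in for an ingestion-time transform: it runs once per
    record, so later searches can filter on fields such as jobId or status
    instead of re-parsing the text at query time.
    """
    record: Record = {}
    for part in raw.split(","):
        # partition splits on the first '=' only, so values may contain '='
        key, sep, value = part.strip().partition("=")
        if sep and key:
            record[key] = value
    return record


line = ("jobName=Producer, userContext=xyz, Steps=2, "
        "jobId=bf0db59b-6bcd-4b9a-b0a2-bc8ddccd1fb3,status=progressing")
rec = parse_at_ingestion(line)
print(rec["jobId"], rec["status"])  # bf0db59b-6bcd-4b9a-b0a2-bc8ddccd1fb3 progressing
```

Splitting on "," and stripping whitespace copes with the mixed ", " and "," separators that appear in the example logs.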

Prabhat
Wed, 27 Sep 2023 17:27:59 UTC

Check this for an example -

West
Wed, 27 Sep 2023 17:34:53 UTC

I understand that a VRL function takes a log record/row as input and helps enrich the log entry, one record at a time; please correct me if I am wrong. My requirement is something like this: select all log entries based on some attribute, then run another select on top of that result (a selection on the result of a selection).
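In SQL terms, this requirement is a query run on the result of another query. A minimal Python sketch of that two-stage selection over records already parsed into fields; the `select` helper and the three records are hypothetical, made up for illustration (field names follow the example logs later in the thread).

```python
Record = dict[str, str]


def select(records: list[Record], **criteria: str) -> list[Record]:
    """Return the records whose fields equal every key/value pair in criteria."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Hypothetical parsed log records.
records: list[Record] = [
    {"jobName": "Producer", "jobId": "job-1", "status": "progressing"},
    {"jobName": "Producer", "jobId": "job-1", "status": "complete"},
    {"jobName": "Consumer", "jobId": "job-2", "status": "complete"},
]

# First select: the "temporary view" of all Producer entries.
view = select(records, jobName="Producer")

# Second select, evaluated on the view instead of the full record set.
completed = select(view, status="complete")
print(completed)  # [{'jobName': 'Producer', 'jobId': 'job-1', 'status': 'complete'}]
```

Each stage operates on a whole intermediate result set, not on one record at a time.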

Prabhat
Wed, 27 Sep 2023 17:35:29 UTC

Give me an example.

Prabhat
Wed, 27 Sep 2023 17:35:41 UTC

Enrichment is just one use case for VRL.

West
Wed, 27 Sep 2023 17:52:03 UTC

We have some ETL jobs that emit logs with a unique job ID and a start time (when the job starts); similarly, there is a log entry with the end time (when the job ends). These are multi-step jobs, and each step also logs a status. How do I get the job completion time from the job's start and end times? Example logs look something like this:

startTime=2023-09-21T11:48:17.672252900, jobName=Producer, userContext=xyz, Steps=1,jobId=bf0db59b-6bcd-4b9a-b0a2-bc8ddccd1fb3,status=progressing
jobName=Producer, userContext=xyz, Steps=2, jobId=bf0db59b-6bcd-4b9a-b0a2-bc8ddccd1fb3,status=progressing
jobName=Producer, userContext=xyz, Steps=3, jobId=bf0db59b-6bcd-4b9a-b0a2-bc8ddccd1fb3,status=progressing
jobName=Producer, userContext=xyz, Steps=4, jobId=bf0db59b-6bcd-4b9a-b0a2-bc8ddccd1fb3,status=progressing
endTime=2023-09-21T12:48:17.672252900, jobName=Producer, userContext=xyz, Steps=3, jobId=bf0db59b-6bcd-4b9a-b0a2-bc8ddccd1fb3,status=complete
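Computing a completion time means combining the start and end records of the same job, which is why Prabhat later points to subqueries and joins. One workaround, assuming you can retrieve the matching records and process them outside OpenObserve, is to pair them by `jobId` yourself. A minimal Python sketch, assuming the records are already parsed into fields (a subset of the fields from the example logs above); `parse_ts` and `job_durations` are hypothetical helpers, not part of OpenObserve.

```python
from datetime import datetime, timedelta

Record = dict[str, str]
JOB = "bf0db59b-6bcd-4b9a-b0a2-bc8ddccd1fb3"

# Records from the example above, assumed already parsed into fields.
records: list[Record] = [
    {"startTime": "2023-09-21T11:48:17.672252900", "jobId": JOB, "status": "progressing"},
    {"jobId": JOB, "status": "progressing"},
    {"jobId": JOB, "status": "progressing"},
    {"jobId": JOB, "status": "progressing"},
    {"endTime": "2023-09-21T12:48:17.672252900", "jobId": JOB, "status": "complete"},
]


def parse_ts(ts: str) -> datetime:
    """Parse the ISO timestamps above. strptime's %f accepts at most 6 digits,
    so the nanosecond fraction is truncated (or zero-padded) to microseconds."""
    base, _, frac = ts.partition(".")
    return datetime.strptime(f"{base}.{frac[:6]:0<6}", "%Y-%m-%dT%H:%M:%S.%f")


def job_durations(records: list[Record]) -> dict[str, timedelta]:
    """Pair each job's start record with its end record by jobId (a self-join)
    and return endTime - startTime for every job that has both."""
    starts: dict[str, datetime] = {}
    ends: dict[str, datetime] = {}
    for rec in records:
        job_id = rec.get("jobId")
        if job_id is None:
            continue
        if "startTime" in rec:
            starts[job_id] = parse_ts(rec["startTime"])
        if "endTime" in rec:
            ends[job_id] = parse_ts(rec["endTime"])
    # Jobs still running (no end record yet) are skipped.
    return {j: ends[j] - starts[j] for j in starts if j in ends}


for job_id, duration in job_durations(records).items():
    print(job_id, duration)  # bf0db59b-6bcd-4b9a-b0a2-bc8ddccd1fb3 1:00:00
```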

Prabhat
Wed, 27 Sep 2023 19:57:31 UTC

Is this a single log record, or multiple records?

West
Thu, 28 Sep 2023 04:00:42 UTC

Multiple

Prabhat
Thu, 28 Sep 2023 04:24:52 UTC

You will need subqueries and joins, which OpenObserve does not support yet. We do plan to support them, but it will take a while.

West
Thu, 28 Sep 2023 04:25:38 UTC

Thank you

West
Thu, 28 Sep 2023 04:26:44 UTC

Can we use VRL for this use case?

Prabhat
Thu, 28 Sep 2023 04:28:02 UTC

No. VRL works at the single-record level. It can take 1 record in and produce 1 record out, or take 1 record in and produce multiple records out. It can't take multiple records in and produce 1 record out yet.
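To make the distinction concrete, here is a minimal Python sketch of the three shapes described above. It is not VRL: `apply_per_record`, `one_to_one`, `one_to_many`, `steps_per_job`, the `tags` field, and the sample records are hypothetical names and data chosen for illustration. The per-record functions only ever see one record, so an aggregation such as counting records per `jobId` has to happen in a separate step that sees the whole set.

```python
from collections import Counter

Record = dict[str, str]


def one_to_one(rec: Record) -> list[Record]:
    """1 record in, 1 record out: copy the record and add a derived field."""
    return [{**rec, "is_complete": str(rec.get("status") == "complete").lower()}]


def one_to_many(rec: Record) -> list[Record]:
    """1 record in, N records out: one output per entry in a hypothetical
    ';'-separated 'tags' field (records without tags produce nothing)."""
    return [{**rec, "tag": tag} for tag in rec.get("tags", "").split(";") if tag]


def apply_per_record(records: list[Record], fn) -> list[Record]:
    """Run fn on each record independently; fn never sees two records at once."""
    out: list[Record] = []
    for rec in records:
        out.extend(fn(rec))
    return out


def steps_per_job(records: list[Record]) -> Counter:
    """N records in, 1 result out: counting records per jobId needs the whole
    set at once, which a function that sees one record at a time cannot do."""
    return Counter(rec["jobId"] for rec in records if "jobId" in rec)


records: list[Record] = [
    {"jobId": "job-1", "status": "progressing", "tags": "etl;nightly"},
    {"jobId": "job-1", "status": "complete"},
]
print(apply_per_record(records, one_to_one)[1]["is_complete"])  # true
print([r["tag"] for r in apply_per_record(records, one_to_many)])  # ['etl', 'nightly']
print(steps_per_job(records))  # Counter({'job-1': 2})
```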