Queries run in one of two modes: "Direct", or cached. In Direct mode, the query executes live against the database every time an associated visualization is displayed, so load time is directly proportional to how long the query takes to execute.
Alternatively, query results can be stored seamlessly in the Knowi "ElasticStore". This is particularly useful in the following cases:
- Long-running queries
- Reducing reporting load on your database by offloading it to the ElasticStore
- Using the results as a parent dataset for other derived queries, which is useful when the dataset tracked in the ElasticStore runs into millions of records
- Incrementally updating the ElasticStore for a dataset whose underlying raw data is large, without re-running the entire historical query
In the Warehousing Strategy section, check Direct Query for direct execution. By default, results are stored in the ElasticStore; in that case, specify a schedule for asynchronous query execution.
For more details, see the Elastic Store documentation.
Once ready, click Save to save the query, or click the Save & Run Now button to execute it immediately.
When the ElasticStore is used, the Overwrite strategy gives you fine-grained control over how the dataset is updated when the query runs. It can be one of the following:
i. All: replaces the existing values in the store for this dataset with the incoming values during query execution.
ii. Empty: leaves the existing results as is and appends the incoming data to them.
iii. One or more field names, comma separated: updates the existing dataset with new values where the listed fields match, inserts new records when no match is found, and leaves all other existing records as is. This is useful for incrementally updating the dataset. For example, suppose the raw data that a query operates on contains hundreds of millions of events, and the query groups counts by date and event type. Trolling through all of those records each time the query runs can put significant load on your system. Instead, you can structure the query as something like the following:
select count(*) as Totals, date(eventTime) as Date, EventType
from events  -- hypothetical source table name
where eventTime >= $c9_yesterday
group by EventType, date(eventTime)
with an Overwrite Strategy of Date, EventType, scheduled every few hours. In this example, the query fetches only data from yesterday, and the results are then upserted into the dataset.
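The field-based overwrite strategy behaves like an upsert keyed on the listed fields. The following is a minimal illustrative sketch of that semantics in Python (not Knowi internals; the function and field names are hypothetical):

```python
def overwrite(existing, incoming, key_fields):
    """Upsert incoming rows into existing rows, matching on key_fields.

    Rows whose key matches an existing row replace it; rows with new
    keys are appended; unmatched existing rows are left untouched.
    """
    # Map each existing row's key tuple to its position in the result.
    index = {tuple(row[f] for f in key_fields): i
             for i, row in enumerate(existing)}
    result = list(existing)
    for row in incoming:
        key = tuple(row[f] for f in key_fields)
        if key in index:
            result[index[key]] = row          # update the matching record
        else:
            index[key] = len(result)          # insert a new record
            result.append(row)
    return result

# Example mirroring an Overwrite Strategy of Date, EventType:
existing = [
    {"Date": "2024-05-01", "EventType": "click", "Totals": 40},
    {"Date": "2024-05-01", "EventType": "view",  "Totals": 90},
]
incoming = [
    {"Date": "2024-05-01", "EventType": "click", "Totals": 55},  # updates
    {"Date": "2024-05-02", "EventType": "click", "Totals": 12},  # inserts
]
merged = overwrite(existing, incoming, ["Date", "EventType"])
```

After the merge, the matching (Date, EventType) row carries the new count, the new day's row is appended, and the untouched "view" row is preserved.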