Knowi provides an optional Elastic Store that can store and track Query results. Unlike traditional warehouses that require complex ETL processes and a pre-defined schema, the Elastic Store is a flexible, scalable, and schema-less warehouse.
Elastic Store offers the following:
- Cache/persistent layer for fast analysis and visualizations
- Long-running queries
- Leaves raw data at the source
- Typically used to store query results, results of multi-datasource joins, etc.
- Eliminates complex ETL processing
- Does not require a predefined schema
- Keeps multi-structured data intact
- Shields your raw datasources from your BI/reporting queries
- Builds aggregations and data pipelines from datasets stored in the Elastic Store
- Reduces the load on your database for reporting workloads by offloading it to the Elastic Store
Note: The Knowi Elastic Store is optional and can be used only for the queries that need it. You can also use Direct Query Mode, which bypasses this data layer and interacts with your raw datasource directly.
Overview
The result of any Query (real-time/direct queries or non-direct queries) is considered a dataset. A dataset is a reusable component that can span multiple datasources/queries and abstracts the underlying complexity of execution modalities, runtime parameters, and transformations. Datasets can also be reused as inputs into other queries. For more information, please refer to the Datasets documentation.
You can run Queries in real-time directly against your database, or save the Query results into our Elastic Store via non-direct execution. When the Elastic Store is used, the Overwrite Strategy controls how the data in the dataset is updated each time the Query runs. The options available are as follows:
- Upsert: Replaces the existing values for the same key with the latest data and inserts new records where such keys do not exist (see the illustration after this list).
- Replace All: Replaces all data in the current dataset with the latest run.
- Replace All - Include Empty: Same as Replace All, except that when a query run returns no data, the dataset is updated to be empty (whereas Replace All leaves the dataset unchanged).
- TTL - Time-Based Retention: Drops records outside a specified time window on each run. Example: DateField-3m keeps any records where DateField is within the last 3 months and drops all older records on each run.
- Append: Leaves the existing data as is and adds new records on each run. Caveat: this is uncommon, and you may end up with duplicates if a run returns data that already exists.
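To illustrate the Upsert strategy, here is a hypothetical dataset keyed on the field customer (the records and field names are illustrative only):
Existing dataset:
{"customer":"Acme","total":100}
{"customer":"Globex","total":250}
Latest query run:
{"customer":"Acme","total":120}
{"customer":"Initech","total":75}
Dataset after an Upsert on customer:
{"customer":"Acme","total":120}
{"customer":"Globex","total":250}
{"customer":"Initech","total":75}
The Acme record is replaced with the latest values, Globex is left untouched, and Initech is inserted as a new record.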
Query A Dataset
You can reuse and query any dataset stored within the Elastic Store (the same applies to datasets created from direct queries).
Step 1: Create a new Knowi Warehouse Datasource (if one does not exist) and start Querying.
#1 Enter a name for your datasource for unique identification.
#2 Optionally, check the ACL (Access Control List) option, which lets you show/hide the datasets listed during queries. You can choose the datasets you would like to show on the query page, or use a regex to limit the indexes shown for more complex use cases.
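For example, a regex such as ^sales_.* (a hypothetical pattern) would limit the indexes shown to those whose names start with sales_.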
If the datasource already exists, then you can simply click on the New Query button, select your existing Knowi datasource, and start Querying.
Step 2: Select the Dataset from the dropdown. Fields for the selected dataset will be auto-retrieved for building the Query.
Step 3: Start Querying against the dataset by either using the Visual Builder or by writing the query directly into the Query Editor.
Visual Builder: Drag and drop fields directly to generate Queries.
Query Editor: Write a query in a SQL-like syntax environment using a versatile text editor designed for editing code.
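For example, a minimal query in the Query Editor, modeled on the Cloud9QL syntax used in the agent example later on this page (the field name amount is hypothetical), could look like:
select * where amount is not null ; select sum(amount) as Total Amount
The first select filters the records and the chained select aggregates the filtered result.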
Reuse Existing Datasets
From any dataset/query, you can create a new “Child” Query that takes a subset of the original “Parent” query and saves it as a new, linked dataset.
Cloud9 Agent Configuration
In addition to the UI-based approach, you can also query Knowi using your agent. This can be used to query data in the warehouse, apply custom processors, and build derived datasets. For added security, your API key needs to be pre-approved by Cloud9 Charts (with an optional domain match) to enable this functionality. Please contact us to enable this.
Highlights:
- Your datasets stored within Cloud9 Charts can be queried and transformed.
- Typically used to aggregate data into another dataset.
Datasource Configuration:
| Parameter | Comments |
| --- | --- |
| name | Unique datasource name. |
| datasource | Set value to cloud9charts. |
| exportApiKey | Export API key from Cloud9 Charts. |
Query Configuration:
| Query Config Params | Comments |
| --- | --- |
| entityName | Dataset name identifier. |
| identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
| dsName | Name of the datasource configured in the datasource_XXX.json file to execute the query against. Required. |
| c9ExportEntityName | Entity name of the dataset to pull data from. |
| queryStr | Required. Cloud9QL query to manipulate data in Cloud9 Charts. See the Cloud9QL docs. |
| frequencyType | One of minutes, hours, days, weeks, months. If this is not specified, the query is treated as a one-time query, executed upon Cloud9Agent startup (or when the query is first saved). |
| frequency | Indicates the frequency, if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes. |
| startTime | Optional; specifies when the query should run for the first time. If set, the frequency is determined from that time onwards. For example, if a weekly run is scheduled to start at 07/01/2014 13:30, the first run will be on 07/01/2014 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported date formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss, HH:mm, mm. |
| overrideVals | Specifies the data storage strategy. If this is not defined, the results of the query are added to the existing dataset. To replace all data for this dataset within Cloud9 Charts, specify {"replaceAll":true}. To upsert data, specify "replaceValuesForKey":["fieldA","fieldB"]; this replaces all existing records in Cloud9 Charts with the same fieldA and fieldB with the current data and inserts records where they are not present. |
Datasource Example:
[
  {
    "name":"c9cDatasource",
    "datasource":"cloud9charts",
    "exportApiKey":"hDIdFflk9ilgmdogpgpPMN1Rcsie"
  }
]
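As noted in the dsName description above, this datasource definition is configured in the agent's datasource_XXX.json file and is referenced from query configurations via its name.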
Query Example:
[
  {
    "entityName":"Aggregated Data",
    "dsName":"c9cDatasource",
    "c9ExportEntityName":"Raw Data",
    "queryStr":"select * where a is not null and a < 10800 ; select sum(a) as Total A, sum(b) as Total B, \"Some Type\" as Type order by b desc",
    "overrideVals":{
      "replaceValuesForKey":["Type"]
    },
    "frequencyType":"hours",
    "frequency":"2",
    "startTime":"09"
  }
]
In the example above, the Raw Data dataset is filtered for certain conditions, then aggregated and posted into the "Aggregated Data" dataset on a timed schedule (every 2 hours), upserting on the Type field.
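As a variant (shown here for illustration), to replace the entire "Aggregated Data" dataset on each run instead of upserting on Type, the overrideVals block in the same query configuration could be changed to:
"overrideVals":{
  "replaceAll":true
}
All other settings would remain unchanged.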