Knowi enables analysis, visualization, warehousing, and reporting automation from file-based data across the following file formats and channels:
Note that data from these sources will be created as widgets directly.
Use our agent inside your network to connect and extract data from a variety of file-based data sources securely. See Cloud9Agent to download your agent along with instructions to run it. Under the examples folder, the agent contains datasource_example_csv.json and query_example_csv.json to get you started.
- Edit those to point to your directory and modify the queries to pull your data.
- Move them into the config directory (datasource_XXX.json files first, if the Agent is running).
CSV/Delimited files
Highlights:
- Execute queries against the file when it is modified, one time, or on a schedule.
- Extract/cleanse and query file-based data using Cloud9QL.
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to csv |
url | File path. Example: /usr/local/examples/demoFile.txt |
quoteChar | Quote character for String values, if applicable. Defaults to double quotes if not specified. |
delimChar | Delimiter character. Defaults to comma if not specified. |
eolSymbols | (Optional) End-of-line symbol(s) used in the file. Example: \n |
nullFindString | (Optional) String to search for and replace with nullReplaceWith. Useful for replacing values such as 'NA' or 'None' in a number field. |
nullReplaceWith | (Optional) Replaces the string specified in nullFindString with this value. |
columnTypes | Nested JSON that defines datatypes for fields. Defaults to String if a type is not specified. Datatypes supported: String, Date(), Number, double. See the example and the Date Formats section below. |
Date Formats for Date Fields in columnTypes:
Letter | Date/Time Component |
---|---|
y | Year |
M | Month |
w | Week of year |
W | Week of month |
D | Day of year |
d | Day of month |
F | Day of week in month |
a | AM/PM marker |
H | Hour in day (0-23) |
h | Hour in am/pm (1-12) |
m | Minute in hour |
s | Second in minute |
S | Millisecond |
z | Time zone (general, e.g. PST) |
Z | Time zone (RFC 822, e.g. -0800) |
Example:
... "columnTypes":{ "Week":"Date(MM/dd/yy)", "Date":"Date(MM/dd/yy HH:mm)" } ...
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query used to extract and manipulate the data. Use select * to select all. See the Cloud9QL docs. |
runOnSourceChange | If set to true, the query runs whenever the file is modified. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange; picks up the file on a schedule. One of minutes, hours, days, weeks, months. If this is not specified, the query is treated as one-time, executed upon Cloud9Agent startup (or when the query is first saved). A scheduled sketch follows this table. |
frequency | Indicates the frequency if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes. |
startTime | (Optional) Specifies when the query should run for the first time. If set, the frequency is determined from that time onwards. For example, if a weekly run is scheduled to start at 07/01/2014 13:30, the first run happens on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported date formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss, HH:mm, mm |
overrideVals | Enables data storage strategies to be specified. If this is not defined, the results of the query are added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data, specify "replaceValuesForKey":["fieldA","fieldB"]. This replaces all existing records in Knowi that have the same fieldA and fieldB with the current data, and inserts records where they are not present. |
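For instance, a sketch of a scheduled query against the demoCSV datasource defined below — it picks the file up daily at 03:00 instead of watching for modifications:
[
  {
    "entityName":"Scheduled CSV Demo",
    "dsName":"demoCSV",
    "queryStr":"select *",
    "frequencyType":"days",
    "frequency":1,
    "startTime":"03:00",
    "overrideVals":{
      "replaceAll":true
    }
  }
]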
Datasource Example:
[
{
"name":"demoCSV",
"url":"/usr/local/examples/demo-data.csv",
"datasource":"csv",
"quoteChar":"\"",
"delimChar":",",
"eolSymbols":"\n",
"nullFindString":"None",
"nullReplaceWith":"0",
"columnTypes":{
"Week":"Date(MM/dd/yy)",
"Date":"Date(MM/dd/yy HH:mm)"
}
}
]
Query Example:
[
{
"entityName":"CSV Demo",
"dsName":"demoCSVFile",
"queryStr":"select count(*) where someField like abc",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
}
]
JSON
Highlights:
- Extract data from JSON documents
- Extract/Cleanse Query data using Cloud9QL, including nested documents.
- Trigger query execution once, on a schedule or when a file is modified.
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to json |
url | File path. Example: /usr/local/examples/demoFile.json |
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query used to extract and manipulate the data. Use select * to select all. See the Cloud9QL docs. Nested queries and objects are supported. |
runOnSourceChange | Optional. If set to true, the query runs whenever the file is modified. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange; picks up the file on a schedule. One of minutes, hours, days, weeks, months. If this is not specified, the query is treated as one-time, executed upon Cloud9Agent startup (or when the query is first saved). |
frequency | Indicates the frequency if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes. |
startTime | (Optional) Specifies when the query should run for the first time. If set, the frequency is determined from that time onwards. For example, if a weekly run is scheduled to start at 07/01/2014 13:30, the first run happens on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported date formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss, HH:mm, mm |
overrideVals | Enables data storage strategies to be specified. If this is not defined, the results of the query are added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data, specify "replaceValuesForKey":["fieldA","fieldB"]. This replaces all existing records in Knowi that have the same fieldA and fieldB with the current data, and inserts records where they are not present (see the upsert sketch after this table). |
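A sketch of the upsert strategy against the demoJSON datasource defined below (customer is assumed to be a field in the data — adjust the key to match yours):
[
  {
    "entityName":"JSON Upsert Demo",
    "dsName":"demoJSON",
    "queryStr":"select *",
    "runOnSourceChange":true,
    "overrideVals":{
      "replaceValuesForKey":["customer"]
    }
  }
]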
Datasource Example:
[
{
"name":"demoJSON",
"url":"examples/demo-data.json",
"datasource":"json"
},
{
"name":"demoJSON-Nested",
"url":"examples/demo-nested.json",
"datasource":"json"
}
]
Query Example:
[
{
"entityName":"JSON Demo - 1",
"dsName":"demoJSON",
"queryStr":"select sum(sent) as Sent, customer as Customer group by Customer",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
},
{
"entityName":"JSON Nested Demo",
"dsName":"demoJSON-Nested",
"queryStr":"select nestedObj.a, nestedArr[0], sent where nestedArr[0]=150",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
}
]
Log Files
Highlights:
- Extract data from Logs using a regex based parser
- Extract/Cleanse Query data using Cloud9QL
- Trigger query execution once, on a schedule or when a file is modified.
Note: The examples folder of the agent contains datasource_example_regex.json and query_example_regex.json sample configurations.
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to regex |
url | File path. Example: /usr/local/examples/demoFile.log |
logFormat | Format of the log entries to process. Example: ${date:DATE(MMM d HH:mm:ss)}${host:WORD()}\\S+:${msg} |
Given the following log entries:
Jan 25 04:02:02 abhf sudo: admin : TTY=unknown ; PWD=/home/admin ; USER=root ; COMMAND=/usr/bin/dk
Jan 25 04:02:02 ldklf sshd[27099]: pam_unix(sshd:session): session closed for user admin
the result will be three fields:
date: parsed using the specified date format
host: the host field, parsed after the date
msg: the rest of the log entry
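For instance, the first entry above parses into a record along these lines (an illustrative sketch; in practice the date field is parsed into an actual date value):
[
  {
    "date":"Jan 25 04:02:02",
    "host":"abhf",
    "msg":"admin : TTY=unknown ; PWD=/home/admin ; USER=root ; COMMAND=/usr/bin/dk"
  }
]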
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query used to extract and manipulate the data. Use select * to select all. See the Cloud9QL docs. Nested queries and objects are supported. |
runOnSourceChange | Optional. If set to true, the query runs whenever the file is modified. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange; picks up the file on a schedule. One of minutes, hours, days, weeks, months. If this is not specified, the query is treated as one-time, executed upon Cloud9Agent startup (or when the query is first saved). |
frequency | Indicates the frequency if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes. |
startTime | (Optional) Specifies when the query should run for the first time. If set, the frequency is determined from that time onwards. For example, if a weekly run is scheduled to start at 07/01/2014 13:30, the first run happens on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported date formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss, HH:mm, mm |
overrideVals | Enables data storage strategies to be specified. If this is not defined, the results of the query are added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data, specify "replaceValuesForKey":["fieldA","fieldB"]. This replaces all existing records in Knowi that have the same fieldA and fieldB with the current data, and inserts records where they are not present. |
Datasource Example:
[
{
"name":"demoRegex",
"url":"examples/demo-regex.log",
"datasource":"regex",
"logFormat":"${date:DATE(MMM d HH:mm:ss)}${host:WORD()}\\S+:${msg}"
}
]
Query Example:
[
{
"entityName":"Sample Regex Query",
"dsName":"demoRegex",
"queryStr":"select date(date) as date, count(*) as count group by date(date)",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
}
]
XML Files
Highlights:
- Extract data from XML files
- Traverse/Extract/Cleanse/Aggregate data using Cloud9QL
- Trigger query execution once, on a schedule or when a file is modified.
Note: The examples folder of the agent contains datasource_example_xml.json and query_example_xml.json sample configurations.
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to xml |
url | File path. Example: /usr/local/examples/demoFile.xml |
dataNode | The base node to traverse from |
columnTypes | Nested JSON that defines datatypes for fields. For nested elements, use dot notation. Example: Count.Sent. Defaults to String if a type is not specified. Datatypes supported: String, Date(), Integer, Number, double. See the example and the Date Formats section below. |
Date Formats for Date Fields in columnTypes:
Letter | Date/Time Component |
---|---|
y | Year |
M | Month |
w | Week of year |
W | Week of month |
D | Day of year |
d | Day of month |
F | Day of week in month |
a | AM/PM marker |
H | Hour in day (0-23) |
h | Hour in am/pm (1-12) |
m | Minute in hour |
s | Second in minute |
S | Millisecond |
z | Time zone (general, e.g. PST) |
Z | Time zone (RFC 822, e.g. -0800) |
Example:
... "columnTypes":{ "Week":"Date(MM/dd/yy)", "Date":"Date(MM/dd/yy HH:mm)", "Count.Sent":"Integer" } ...
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query used to extract and manipulate the data. Use select * to select all. See the Cloud9QL docs. |
runOnSourceChange | If set to true, the query runs whenever the file is modified. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange; picks up the file on a schedule. One of minutes, hours, days, weeks, months. If this is not specified, the query is treated as one-time, executed upon Cloud9Agent startup (or when the query is first saved). |
frequency | Indicates the frequency if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes. |
startTime | (Optional) Specifies when the query should run for the first time. If set, the frequency is determined from that time onwards. For example, if a weekly run is scheduled to start at 07/01/2014 13:30, the first run happens on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported date formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss, HH:mm, mm |
overrideVals | Enables data storage strategies to be specified. If this is not defined, the results of the query are added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data, specify "replaceValuesForKey":["fieldA","fieldB"]. This replaces all existing records in Knowi that have the same fieldA and fieldB with the current data, and inserts records where they are not present. |
Datasource Example:
[
{
"name":"demoXML",
"url":"examples/demo-data.xml",
"datasource":"xml",
"dataNode":"data.item",
"columnTypes":{
"id":"Integer",
"Week":"Date(MM/dd/yy)",
"Date":"Date(MM/dd/yy HH:mm)",
"Count.Sent":"Integer"
}
}
]
Query Example:
[
{
"entityName":"XML Demo - 1",
"dsName":"demoXML",
"queryStr":"select *",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
},
{
"entityName":"XML Demo - 2",
"dsName":"demoXML",
"queryStr":"select sum(Count.Sent) as Sent, Customer where recipients.recipient[0] = c001@cloud9charts.com",
"runOnSourceChange":true,
"overrideVals":{
"replaceAll":true
}
}
]
The first query returns the entire document as is with nested elements intact. The second traverses nested relationships to build a simple aggregation, based on specific conditions.
FTP
Highlights:
- Extends the CSV/delimited functionality to connect to and pull files from an FTP site.
- Can be set to process the file only if it differs from the last pulled file.
Datasource and Query configurations are the same as the CSV/JSON processing, except for a customProcessors attribute that enables FTP.
Example CSV Datasource using FTP to pull file:
[
{
"name":"pageViewsLogs",
"url":"/export/home/akjd/file.csv",
"datasource":"csv",
"quoteChar":"\"",
"delimChar":"|",
"eolSymbols":"\n",
"nullFindString":"None",
"nullReplaceWith":"0",
"columnTypes":{
"startCount":"int",
"endCount":"int"
},
"customProcessors":[
{
"className":"com.cloud9.connector.processors.FTPProcessor",
"processOrder":"timed",
"frequencyType":"days",
"frequency":1,
"startTime":"03:15",
"params":{
"host":"ftp.box.com",
"user":"box@cloud9charts.com",
"password":"somePass",
"remoteFile":"/BoxDir/\${date:yyyyMMdd}/page_views_\${date:yyyyMMdd-1d}000000_\${date:yyyyMMdd}000000.csv",
"localFile":"/export/home/akjd/file.csv",
"overwriteOnlyWhenChanged":true
}
}
]
}
]
FTP Processor params configuration:
FTP Processor Parameter | Comments |
---|---|
host | FTP host to connect to |
port | (Optional) Port number |
user | User to connect to the FTP site with. Connects as anonymous if not specified |
password | Password to connect with. Connects as anonymous if a user is not specified |
mode | (Optional) Security mode. Default is empty. Values: ftps or ftpes |
remoteFile | Remote file name. For files with dates in the name, use ${date:dateFormat} for the current date. You can also add or subtract days or weeks. Example: ${date:yyyyMMdd-1d} |
localFile | Where to put the file pulled from the FTP site. Typically, this is set to the same value as the url parameter in the main datasource configuration |
overwriteOnlyWhenChanged | If set to true, overwrites the local copy only if the remote file differs from the local copy. |
Email Attachments
This can be used to turn existing text reports sent via email into widgets and dashboards, with storage and trends.
Highlights:
- Visualize, transform and track your existing CSV/delimited email reports.
- CC your existing email report to us.
Contact us for the mailbox to CC your report to, and we'll take care of the rest.
Excel Files
Highlights:
- Process Excel files when the file is modified, on a schedule, or one-time.
- Extract/Cleanse data from a Worksheet using Cloud9QL.
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to excel |
url | File path. Example: /usr/local/examples/demoFile.xlsx |
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query used to extract and manipulate data from the worksheet. Use select * to select everything from the worksheet. See the Cloud9QL docs. |
excelWorksheetName | Worksheet name to process data from |
runOnSourceChange | If set to true, the query runs whenever the file is modified. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange; picks up the file on a schedule. One of minutes, hours, days, weeks, months. If this is not specified, the query is treated as one-time, executed upon Cloud9Agent startup (or when the query is first saved). |
frequency | Indicates the frequency if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes. |
startTime | (Optional) Specifies when the query should run for the first time. If set, the frequency is determined from that time onwards. For example, if a weekly run is scheduled to start at 07/01/2014 13:30, the first run happens on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported date formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss, HH:mm, mm |
overrideVals | Enables data storage strategies to be specified. If this is not defined, the results of the query are added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data, specify "replaceValuesForKey":["fieldA","fieldB"]. This replaces all existing records in Knowi that have the same fieldA and fieldB with the current data, and inserts records where they are not present. |
Datasource Example:
[
{
"name":"demoExcel",
"url":"/usr/local/abc/files/demo-data.xlsx",
"datasource":"excel"
}
]
Query Example:
[
{
"entityName":"Excel Data - Demo Sent",
"dsName":"demoExcel",
"queryStr":"select sum(sent) as Sent, Customer group by customer",
"runOnSourceChange":true,
"excelWorksheetName":"demo-data",
"overrideVals":{
"replaceAll":true
}
}
]
HTTP File Downloads
Highlights:
- Download and process CSV/JSON files via HTTP.
- Extract/manipulate data using Cloud9QL.
- Pull on a schedule, once, or only when the file has been modified since the last run (a scheduled sketch follows the query example below).
Datasource Example:
[
{
"name":"demoHttpPublic",
"url":"https://s3.amazonaws.com/datatesting/demo-data-public.csv",
"datasource":"http",
"dataType":"csv",
"localFile":"./downloadedFile.csv"
}
]
Query Example:
[
{
"entityName":"HTTP File Demo",
"dsName":"demoHttpPublic",
"queryStr":"select date(date) as Date, sum(sent) as Total Opened, sum(opened) as Opened group by date(date)",
"overrideVals":{
"replaceAll":true
}
}
]
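To pull the file on a schedule instead of just once, the scheduling attributes from the query configuration tables above apply here as well. A sketch:
[
  {
    "entityName":"HTTP File Demo - Daily",
    "dsName":"demoHttpPublic",
    "queryStr":"select *",
    "frequencyType":"days",
    "frequency":1,
    "overrideVals":{
      "replaceAll":true
    }
  }
]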
Amazon S3
Highlights:
- Process CSV/JSON files stored in S3.
- Extract/manipulate data using Cloud9QL.
- Pull on a schedule, once, or process if the file has been modified after last run.
- Process file names with dates dynamically, based on the current date.
Authorization
For files that require authentication/authorization, create credentials from AWS:
- Log in to the AWS IAM console.
- Click 'Create a New Group of Users' and enter a group name. Select 'Read Only Access'.
- Enable "s3:Get*" in the Policy Document and remove all others (see the sketch after this list).
- Click 'Create New Users'. Create a user name with 'Generate an access key for each user' checked.
- Once created, use 'Show Credentials' to note down the credentials for the user.
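A minimal sketch of what that policy document looks like (standard IAM policy syntax; the wildcard Resource is an assumption — scope it to your bucket if preferred):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:Get*",
      "Resource": "*"
    }
  ]
}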
Datasource Configuration:
Parameter | Comments |
---|---|
name | Unique Datasource Name. |
datasource | Set value to s3 |
s3RegionName | Region associated with the bucket. Example: us-east-1 |
s3BucketName | S3 Bucket Name/Identifier |
s3FileKey | S3 file name (key) |
dataType | File format. Supported: json/csv. Defaults to csv if not specified. |
localFile | (Optional) Location and name of the local file upon download. If not specified, the file is downloaded under the same name as the S3 key into the base agent directory. |
authAccessToken | S3 credentials key obtained for the user during the authorization step |
authAccessTokenSecret | S3 credentials secret obtained for the user during the authorization step |
Query Configuration:
Query Config Params | Comments |
---|---|
entityName | Dataset Name Identifier |
identifier | A unique identifier for the dataset. Either identifier or entityName must be specified. |
dsName | Name of the datasource configured in the datasource_XXX.json file to execute the query against. Required. |
queryStr | Cloud9QL query used to extract and manipulate the data. Use select * to select all. See the Cloud9QL docs. |
runOnSourceChange | If set to true, the query runs whenever the file is modified. Possible values: true or false. |
frequencyType | Alternative to runOnSourceChange; picks up the file on a schedule. One of minutes, hours, days, weeks, months. If this is not specified, the query is treated as one-time, executed upon Cloud9Agent startup (or when the query is first saved). |
frequency | Indicates the frequency if frequencyType is defined. For example, if this value is 10 and the frequencyType is minutes, the query will be executed every 10 minutes. |
startTime | (Optional) Specifies when the query should run for the first time. If set, the frequency is determined from that time onwards. For example, if a weekly run is scheduled to start at 07/01/2014 13:30, the first run happens on 07/01 at 13:30, with the next run at the same time on 07/08/2014. The time is based on the local time of the machine running the Agent. Supported date formats: MM/dd/yyyy HH:mm, MM/dd/yy HH:mm, MM/dd/yyyy, MM/dd/yy, HH:mm:ss, HH:mm, mm |
overrideVals | Enables data storage strategies to be specified. If this is not defined, the results of the query are added to the existing dataset. To replace all data for this dataset within Knowi, specify {"replaceAll":true}. To upsert data, specify "replaceValuesForKey":["fieldA","fieldB"]. This replaces all existing records in Knowi that have the same fieldA and fieldB with the current data, and inserts records where they are not present. |
Datasource Example:
[
{
"name":"demoS3CSV",
"datasource":"s3",
"s3RegionName":"us-east-1",
"s3BucketName":"datatesting",
"s3FileKey":"demo-data.csv",
"dataType":"csv",
"authAccessToken":"someToken",
"authAccessTokenSecret":"someSecret",
"localFile":"./downloadedFile.csv"
},
{
"name":"demoS3JSON",
"datasource":"s3",
"s3RegionName":"us-east-1",
"s3BucketName":"datatesting",
"s3FileKey":"demo-data.json",
"dataType":"json",
"authAccessToken":"someToken",
"authAccessTokenSecret":"someSecret"
}
]
For an s3BucketName or s3FileKey with a date-based name, use ${date:<dateFormat>} to resolve the name against the current date.
Examples:
i. If the current date is 08/01/2014 and today's data is identified by 20140801-file.csv in S3, define it as ${date:yyyyMMdd}-file.csv.
ii. If the current date is 08/01/2014 and the file contains multiple days' worth of data, identified by 20140801-20140727-file.csv, define it as ${date:yyyyMMdd}-${date:yyyyMMdd-5d}-file.csv.
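A sketch of a datasource entry using the first pattern (region, bucket, and credentials are placeholders):
[
  {
    "name":"demoS3Daily",
    "datasource":"s3",
    "s3RegionName":"us-east-1",
    "s3BucketName":"datatesting",
    "s3FileKey":"${date:yyyyMMdd}-file.csv",
    "dataType":"csv",
    "authAccessToken":"someToken",
    "authAccessTokenSecret":"someSecret"
  }
]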
Query Example:
[
{
"entityName":"S3 CSV",
"dsName":"demoS3CSV",
"queryStr":"select * limit 10",
"overrideVals":{
"replaceAll":true
}
},
{
"entityName":"S3 JSON",
"dsName":"demoS3JSON",
"queryStr":"select * where sent > 10000",
"overrideVals":{
"replaceAll":true
}
}
]
Public Files
For public files on S3 that do not require S3 credentials, see our HTTP File Downloads section.