Running historical data in chunks from the raw source

When your raw datasets are large, or when queries that span a long time range take a long time to execute, you can split the date range into chunks and run the query once per chunk. The results of all chunks are combined and stored in our ElasticStore.

To run historical data in chunks, use a date range token in your query.

The format of the date range token is:

$c9_<token><op>[<start>,<end>,<increment>]<unit>

- <token>: the same date tokens as before, such as today, yesterday, etc.

- <unit>: the same units as before, such as d, m, y, etc.

- <op>: regular tokens support +, -, * and /; range tokens support only + and -.

- <start>: offset of the first chunk (inclusive; can be negative)

- <end>: offset of the last chunk (inclusive; can be negative)

- <increment>: step between chunk offsets (optional; defaults to +1; can be negative)
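
The offset arithmetic of a single range token can be illustrated with a small Python sketch. It follows the rules listed above (<end> is inclusive, a missing <increment> means +1) and additionally assumes that a negative offset is written by flipping the operator, so `+` with offset -1 becomes `-1`; that rendering rule is an assumption, not documented behavior. `expand_range_token` is a hypothetical helper, not part of the product.

```python
import re

# Hypothetical parser for a single range token:
#   $c9_<token><op>[<start>,<end>,<increment>]<unit>
RANGE_TOKEN = re.compile(
    r"\$c9_(?P<token>\w+)"                 # base token, e.g. lastyear
    r"(?P<op>[+-])"                        # ranges allow only + and -
    r"\[(?P<start>-?\d+),(?P<end>-?\d+)"   # inclusive start and end chunk
    r"(?:,(?P<inc>[+-]?\d+))?\]"           # optional increment
    r"(?P<unit>[A-Za-z]+)"                 # unit, e.g. d, m, y
)


def expand_range_token(text):
    """Return one plain date token per chunk, in chunk order."""
    m = RANGE_TOKEN.fullmatch(text)
    if m is None:
        raise ValueError(f"not a range token: {text!r}")
    start, end = int(m["start"]), int(m["end"])
    inc = int(m["inc"]) if m["inc"] else 1  # increment defaults to +1
    if inc == 0:
        raise ValueError("increment must not be zero")
    # <end> is inclusive, so move the stop bound one step past it
    stop = end + 1 if inc > 0 else end - 1
    tokens = []
    for n in range(start, stop, inc):
        op = m["op"]
        if n < 0:  # assumed rendering: a negative offset flips the operator
            op, n = ("-" if op == "+" else "+"), -n
        tokens.append(f"$c9_{m['token']}{op}{n}{m['unit']}")
    return tokens


print(expand_range_token("$c9_today-[3,1,-1]d"))
# → ['$c9_today-3d', '$c9_today-2d', '$c9_today-1d']
```

A negative increment simply walks the offsets in the opposite direction; the inclusive <end> is why the stop bound is shifted by one step.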

Example:

select * where date >= $c9_lastyear+[0,11]m and date < $c9_lastyear+[1,12]m

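The query above runs one chunk per month, covering all twelve months of last year. Both ranges advance together, so the first chunk covers $c9_lastyear+0m to $c9_lastyear+1m and the last chunk covers $c9_lastyear+11m to $c9_lastyear+12m.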
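
A query may contain several range tokens, as in the example above. The Python sketch below expands such a query into one query per chunk. It assumes that all range tokens advance in lockstep (consistent with note 1, where the last chunk pairs +11m with +12m), that every range yields the same number of chunks, and that negative offsets flip the operator. `chunk_queries` is a hypothetical helper; the actual service may expand tokens differently.

```python
import re

# Range token: $c9_<token><op>[<start>,<end>,<increment>]<unit>
RANGE_TOKEN = re.compile(r"\$c9_(\w+)([+-])\[(-?\d+),(-?\d+)(?:,([+-]?\d+))?\]([A-Za-z]+)")


def _offsets(m):
    """Chunk offsets for one range token (<end> inclusive, increment defaults to +1)."""
    start, end = int(m[3]), int(m[4])
    inc = int(m[5]) if m[5] else 1
    if inc == 0:
        raise ValueError("increment must not be zero")
    return list(range(start, end + 1 if inc > 0 else end - 1, inc))


def _plain(m, n):
    """Render the plain token for offset n (negative offsets flip the operator, assumed)."""
    op = m[2]
    if n < 0:
        op, n = ("-" if op == "+" else "+"), -n
    return f"$c9_{m[1]}{op}{n}{m[6]}"


def chunk_queries(query):
    """Return one query per chunk; all range tokens advance in lockstep (assumed)."""
    ranges = [_offsets(m) for m in RANGE_TOKEN.finditer(query)]
    if not ranges:
        return [query]  # no range token: a single unchunked query
    if len({len(r) for r in ranges}) != 1:
        raise ValueError("all range tokens must yield the same number of chunks")
    queries = []
    for i in range(len(ranges[0])):
        # i-th offset of each token, consumed in the same left-to-right order as sub()
        offsets = iter([r[i] for r in ranges])
        queries.append(RANGE_TOKEN.sub(lambda m: _plain(m, next(offsets)), query))
    return queries


example = "select * where date >= $c9_lastyear+[0,11]m and date < $c9_lastyear+[1,12]m"
for q in chunk_queries(example):
    print(q)
```

The twelve printed queries cover last year month by month, and the final one is exactly the query shown in note 1.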

Notes:

1. If your query contains a chunk token, all other operations on that query (preview, scheduled execution, etc.) currently use only the LAST chunk. This matters: for the example above, a preview effectively executes the following query:

select * where date >= $c9_lastyear+11m and date < $c9_lastyear+12m

2. Chunking takes effect ONLY when you click "Save and Run Now"; it does not apply when you click "Save" or during a scheduled run (see note 1).

3. The replacement policy is applied to every chunk. If the policy is set to replace ALL, each chunk overwrites the data written by the previous one, so only the last chunk's data remains at the end. Make sure your replacement policy is correct. For historical runs on a new query, we strongly recommend leaving the overwrite policy empty: because the date range is already partitioned into chunks, each chunk's data is then simply added to the existing result.

4. After the "Save and Run Now" run, if the query has a schedule, each scheduled run uses only the last chunk (see note 1).
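
The interaction between run mode and replacement policy described in the notes above can be modeled in a few lines of Python. This is a toy model only: `ResultStore`, `run`, the mode names `"save_and_run_now"` and `"scheduled"`, and the policy values `"replace_all"` and `None` (empty) are hypothetical stand-ins, not the product's API or the ElasticStore interface.

```python
from dataclasses import dataclass, field


@dataclass
class ResultStore:
    """Toy stand-in for the stored query result (hypothetical, not the ElasticStore API)."""
    rows: list = field(default_factory=list)

    def write(self, chunk_rows, policy):
        if policy == "replace_all":
            self.rows = list(chunk_rows)  # each write discards everything stored before
        elif policy is None:
            self.rows.extend(chunk_rows)  # empty policy: new rows are appended
        else:
            raise ValueError(f"unknown policy: {policy!r}")


def run(chunk_results, policy, mode, store):
    """chunk_results: rows returned by each chunk, in chunk order."""
    if mode == "save_and_run_now":
        selected = chunk_results        # only this action runs every chunk (note 2)
    elif mode == "scheduled":
        selected = chunk_results[-1:]   # scheduled runs use the last chunk only (notes 1 and 4)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    for rows in selected:
        store.write(rows, policy)


months = [[f"month {i + 1}"] for i in range(12)]

overwritten = ResultStore()
run(months, "replace_all", "save_and_run_now", overwritten)
print(overwritten.rows)  # → ['month 12']

appended = ResultStore()
run(months, None, "save_and_run_now", appended)
print(len(appended.rows))  # → 12
```

With replace ALL, only the last month survives in this model, which is why note 3 recommends an empty policy for historical runs.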