When you create a Data Set with IoT Analytics, the default SQL query that is shown in the SQL editor is something like this;
SELECT * FROM datastore
Whilst there’s nothing wrong with this as a starting point, it’s important to realise that an unconstrained query like this will do a full table scan of your data every time it is executed. That may be OK if you have a short retention period configured for your datastore and you only have a few thousand messages, but imagine having terabytes of data stretching back over years and really you only want to get data for the last few days? The query above may work (it may also time out), but it’s vastly inefficient and you’ll be paying for the privilege of scanning the entire data store every time.
There is a better way.
Your first starting point should always be to use a WHERE clause referencing the system provided __dt column. For ‘dt’, think ‘datetime’ because it is this column that gives you access to the time based partitions that help you make queries on IoT Analytics and the double underscores are used to avoid clashes with any attribute names in your own data.
Here’s an example that just looks at data from the past day.
select * from telescope_data where __dt >= current_date - interval '1' day
__dt has a granularity of 1 day so if you want to look at a narrower time-range, you would add a constraint on whatever timestamp column you had in your data, but always, always, always include the __dt constraint as part of your where clause to maximise query performance and lower query cost.
For example, what if we just want data from the last 15 minutes in our query results?
select * from telescope_data where __dt >= current_date - interval '1' day and from_unixtime(epoch) > now() - interval '15' minute
Notice that we have the __dt clause to restrict data to the most recent day and then have a clause based on the epoch reported in each message to further restrict to the last 15 minutes.