Quick Tip – creating export workflows from AWS IoT Analytics

AWS IoT Analytics provides access to the results of a SQL query as a data set that you download using a pre-signed URL, but what if you want to export the results somewhere else automatically?

Although AWS IoT Analytics doesn’t offer this functionality natively, we can leverage the power of a triggered notebook container to achieve our desired outcome. For flexibility, in this example I’m going to use Amazon Kinesis Firehose to stream the data into S3, and in a future post we’ll look at how we can also use Firehose to stream the data into Redshift.
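
If you don’t already have a suitable delivery stream, you can create one that buffers straight into an S3 bucket. Here’s a minimal boto3 sketch; the stream name, bucket and role ARN are placeholders, and the role must allow Firehose to write to the bucket:

import boto3

firehose = boto3.client('firehose')

# Placeholder names and ARNs; the role must grant Firehose write access to the bucket
firehose.create_delivery_stream(
    DeliveryStreamName='my_export_stream',
    DeliveryStreamType='DirectPut',
    ExtendedS3DestinationConfiguration={
        'RoleARN': 'arn:aws:iam::123456789012:role/my-firehose-role',
        'BucketARN': 'arn:aws:s3:::my-export-bucket',
        'Prefix': 'iot-analytics-exports/'
    }
)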

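The job boils down to two steps: look up the pre-signed URL for the latest data set content, then stream each row of the CSV into Firehose. Here’s a minimal sketch of the first step using boto3, with placeholder names for the data set and the delivery stream:

import codecs
import csv
import json
import urllib.request

import boto3

datasetName = 'my_dataset'       # placeholder: the SQL data set to export
streamName = 'my_export_stream'  # placeholder: the Firehose delivery stream

iotanalytics = boto3.client('iotanalytics')
firehose = boto3.client('firehose')

# Look up the pre-signed URL for the most recent data set content
content = iotanalytics.get_dataset_content(datasetName=datasetName, versionId='$LATEST')
dataset_url = content['entries'][0]['dataURI']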

That’s pretty straightforward.

Let’s take a closer look at the crux of the job: streaming the CSV out of IoT Analytics and putting JSON records into S3 via Firehose.

stream = urllib.request.urlopen(dataset_url)
reader = csv.DictReader(codecs.iterdecode(stream, 'utf-8'))
rows = 0
for row in reader:
    # each CSV row becomes one newline-delimited JSON record
    record = json.dumps(row) + "\n"
    firehose.put_record(DeliveryStreamName=streamName, Record={'Data': record})
    rows += 1
print("Streamed {} records to Firehose".format(rows))

This isn’t especially efficient, as we call Put Record once for each row in our CSV, but it keeps the code simple. If we migrated to the batch API, PutRecordBatch, it would be much faster, but we would have to add complexity to keep each batch within the API limits.
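
For the curious, a batched variant might look something like this sketch; it only caps each batch at 500 records (the PutRecordBatch maximum), and a production version would also need to respect the 4 MB per-call limit and retry anything reported in FailedPutCount:

batch = []
for row in reader:
    batch.append({'Data': json.dumps(row) + "\n"})
    if len(batch) == 500:  # PutRecordBatch accepts at most 500 records per call
        firehose.put_record_batch(DeliveryStreamName=streamName, Records=batch)
        batch = []
if batch:  # flush whatever is left over
    firehose.put_record_batch(DeliveryStreamName=streamName, Records=batch)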

All we need to do now is set up a container notebook triggered from the data set execution, in a similar way to how we did it in an earlier post, and we’ll have the data set streamed into S3 on every execution.
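
If you’d rather script that step than click through the console, something along these lines should work with boto3; the names, image URI and role ARN are all placeholders, and the image is the one built from the containerized notebook:

import boto3

iotanalytics = boto3.client('iotanalytics')

# Placeholder names and ARNs throughout
iotanalytics.create_dataset(
    datasetName='my_export_job',
    actions=[{
        'actionName': 'runExportNotebook',
        'containerAction': {
            'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/my-export-notebook:latest',
            'executionRoleArn': 'arn:aws:iam::123456789012:role/my-iot-analytics-role',
            'resourceConfiguration': {'computeType': 'ACU_1', 'volumeSizeInGB': 2}
        }
    }],
    # run the container whenever the SQL data set produces new content
    triggers=[{'dataset': {'name': 'my_dataset'}}]
)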

You may be wondering why you can’t just do this with a Lambda function. You could, if there were a way to trigger a Lambda function when the data set content has been generated, but that’s not currently possible 🙁