Build a real-time data streaming system with Amazon Kinesis Data Streams and Kinesis Agent

Lab Details

  1. This lab walks you through hosting a sample website on the Apache web server on an EC2 Linux instance and collecting the website's logs in real time in Amazon S3.

  2. You will practice this lab using Kinesis Data Streams, Kinesis Agent, EC2, Kinesis Data Firehose, and S3.

  3. Duration : 1 hour 30 minutes

  4. AWS Region : US East (N. Virginia) us-east-1

Introduction

Amazon Kinesis Data Streams

  • Data streaming technology enables a customer to ingest, process and analyze high volumes of data from a variety of sources.

  • Kinesis Data Streams is one such scalable and durable real-time data streaming service.

  • A Kinesis data stream is an ordered sequence of data records meant to be written to and read from in real time.

  • The pricing of the data streams is on a per-shard basis.

Components

  • Data record - The unit of data stored by Kinesis Data Streams.

  • Data stream - Represents a group of data records. The data records in a data stream are distributed into shards.

  • Retention period - The length of time data records remain accessible after they are added to the stream. A Kinesis data stream retains records for 24 hours by default, and the retention period can be extended up to 365 days.

  • Kinesis Client Library - Ensures that for every shard there is a record processor running and processing the shard.

  • A producer puts data records into shards.

  • A consumer gets data records from shards.

  • Shard - A sequence of data records in a stream.

    • A stream can have more than one shard. The number of shards required is specified when the data stream is created.

    • Total capacity of stream is the sum of capacities of its shards.

    • Ingest rate per shard - 1 MB per second or 1,000 records per second.

    • Data read rate per shard - 2 MB per second.

    • Partition Key - Used to group data by shard within a stream.

  • Stream records can be delivered directly to services such as S3, Redshift, and Elasticsearch (for example, through Kinesis Data Firehose) instead of writing custom consumer applications.
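
The shard limits and the role of the partition key can be illustrated with a small calculation. The following is a minimal sketch, not part of the lab steps: the function names are hypothetical, and shard_for_key assumes the 128-bit hash key space is split evenly across the shards, which may not hold after a stream has been resharded. Kinesis Data Streams hashes each partition key with MD5 to decide which shard receives the record.

```python
import hashlib
import math

# Per-shard limits quoted in the list above.
INGEST_MB_PER_SEC = 1.0
INGEST_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit hash key


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_per_sec / INGEST_MB_PER_SEC),
        math.ceil(records_per_sec / INGEST_RECORDS_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
    )


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index.

    Assumption: the hash key space is divided evenly among shard_count shards.
    """
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE


# 3.5 MB/s written, 2,500 records/s, 4 MB/s read: the write volume dominates.
print(shards_needed(3.5, 2500, 4))  # prints 4
```

Records that share a partition key always hash to the same shard, so a single very busy key can exhaust one shard's limits even when the stream as a whole has spare capacity.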

Amazon Kinesis Agent

  • Amazon Kinesis Agent is a standalone Java application that offers an easy way to collect data and send it to Kinesis Data Streams or Kinesis Data Firehose.

  • The agent continuously monitors a set of files and sends new data to the configured data stream or Kinesis Data Firehose delivery stream.
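
The agent is driven by a JSON configuration file (on Linux, /etc/aws-kinesis/agent.json) that lists the files to monitor and where to send them. The following is a minimal sketch that generates such a file in Python. The log path and the stream name are assumptions for illustration, and build_agent_config is a hypothetical helper, not part of the agent.

```python
import json


def build_agent_config(file_pattern, stream_name):
    """Return an agent configuration with one flow that tails file_pattern
    and sends each new line to a Kinesis data stream."""
    return {
        "cloudwatch.emitMetrics": True,
        "flows": [
            {
                "filePattern": file_pattern,
                "kinesisStream": stream_name,
            }
        ],
    }


config = build_agent_config(
    file_pattern="/var/log/httpd/access_log",  # assumed Apache access log path
    stream_name="example-log-stream",          # hypothetical stream name
)
agent_json = json.dumps(config, indent=2)
print(agent_json)
```

A flow that targets a Kinesis Data Firehose delivery stream would use the "deliveryStream" key in place of "kinesisStream".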

Architecture Diagram

Case Study

  1. Suppose an application is running on the EC2 Instance and it is generating continuous logs.

  2. Those logs are pushed into a Kinesis data stream.

  3. Kinesis Data Firehose consumes the records from the Kinesis data stream.

  4. Kinesis Data Firehose then saves the data into the S3 bucket.
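
In this lab the Kinesis Agent performs step 2, but what a producer does can be sketched directly against the PutRecord API. The following is a minimal sketch under stated assumptions: the stream name, host name, and payload fields are hypothetical, and send_log_line requires the boto3 package and valid AWS credentials, which the lab does not otherwise need.

```python
import json
import time


def build_put_record_args(stream_name, log_line, source_host):
    """Build the keyword arguments for one Kinesis PutRecord call."""
    payload = {
        "host": source_host,
        "ingested_at": int(time.time()),
        "message": log_line.rstrip("\n"),
    }
    return {
        "StreamName": stream_name,
        # Newline-delimited JSON keeps records separable once they land in S3.
        "Data": (json.dumps(payload) + "\n").encode("utf-8"),
        # All lines from one host share a partition key, so they go to the same shard.
        "PartitionKey": source_host,
    }


def send_log_line(stream_name, log_line, source_host):
    """Send one log line to the stream (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above has no dependencies

    client = boto3.client("kinesis", region_name="us-east-1")
    return client.put_record(
        **build_put_record_args(stream_name, log_line, source_host)
    )


args = build_put_record_args(
    "example-log-stream",      # hypothetical stream name
    "GET /index.html 200\n",   # simplified log line for illustration
    "web-1",                   # hypothetical host name
)
print(args["PartitionKey"], len(args["Data"]))
```

Using the host name as the partition key is only one possible choice. With a single web server it sends every record to one shard, which is acceptable for a lab but would limit throughput on a busier system.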

Task Details

  1. Log in to the AWS Management Console

  2. Creating an IAM Role

  3. Launching an EC2 Instance

  4. SSH into EC2 Instance

  5. Host a sample website

  6. Set file permissions to httpd

  7. Creating Kinesis data stream

  8. Creating an S3 Bucket

  9. Creating Kinesis Data Firehose

  10. Creating and configuring Kinesis Agent

  11. Testing the real-time streaming of data

  12. Checking the CloudWatch metrics of Kinesis Data Streams and Data Firehose

  13. Deleting AWS Resources