This lab walks you through hosting a sample website with the Apache web server on an EC2 Linux instance and streaming the website's real-time logs to Amazon S3.
You will practice this lab using Kinesis Data Streams, Kinesis Agent, EC2, Kinesis Data Firehose, and S3.
Duration : 1 hour 30 minutes
AWS Region : US East (N. Virginia) us-east-1
Data streaming technology enables a customer to ingest, process and analyze high volumes of data from a variety of sources.
Amazon Kinesis Data Streams is one such scalable and durable real-time data streaming service.
A Kinesis data stream is an ordered sequence of data records meant to be written to and read from in real time.
Kinesis Data Streams is priced on a per-shard basis.
Data record - The unit of data stored by Kinesis Data Streams.
Data stream - A group of data records. The data records in a data stream are distributed across shards.
Retention period - The length of time data records are accessible in a stream. A Kinesis data stream stores records for 24 hours by default, and this can be extended up to 365 days.
Kinesis Client Library - Ensures that for every shard there is a record processor running and processing the shard.
Producer - Puts data records into the shards of a stream (see the sketch after these definitions).
Consumer - Gets data records from the shards of a stream.
Shard - A sequence of data records in a stream. A stream can have one or more shards, and the required number of shards is specified when the data stream is created.
The total capacity of a stream is the sum of the capacities of its shards.
Ingest rate per shard - 1 MB per second or 1,000 records per second.
Data read rate per shard - 2 MB per second.
Partition Key - Used to group data by shard within a stream.
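
To make these terms concrete, here is a minimal boto3 sketch of a producer and a consumer; the stream name, region, and partition key are placeholder values for illustration, not resources created in this lab.

    import json
    import boto3

    # Assumed stream name and region for illustration only.
    kinesis = boto3.client("kinesis", region_name="us-east-1")
    STREAM = "demo-stream"

    # Producer: put a data record; the partition key decides which shard receives it.
    kinesis.put_record(
        StreamName=STREAM,
        Data=json.dumps({"message": "hello"}).encode("utf-8"),
        PartitionKey="web-server-1",
    )

    # Consumer: read records back from the first shard of the stream.
    shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(
        StreamName=STREAM,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest record still in the retention period
    )["ShardIterator"]
    records = kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]
    for record in records:
        print(record["PartitionKey"], record["Data"])
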
Instead of writing custom consumer applications, stream records can be delivered directly to services such as S3, Redshift, and Elasticsearch through Kinesis Data Firehose.
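
For example, a Firehose delivery stream that reads from a Kinesis data stream and writes to S3 could be created with a boto3 call along the following lines; the stream ARN, role ARN, bucket ARN, and names are placeholders for resources you would create yourself.

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    # Placeholder ARNs and names; replace with your own stream, role, and bucket.
    firehose.create_delivery_stream(
        DeliveryStreamName="demo-delivery-stream",
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/demo-stream",
            "RoleARN": "arn:aws:iam::123456789012:role/demo-firehose-role",
        },
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/demo-firehose-role",
            "BucketARN": "arn:aws:s3:::demo-log-bucket",
            "Prefix": "website-logs/",
        },
    )
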
Amazon Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Data Streams and Kinesis Data Firehose.
The agent continuously monitors a set of files and sends new data to your stream or delivery stream.
Suppose an application running on an EC2 instance is continuously generating logs.
Those logs are pushed into a Kinesis data stream.
Kinesis Data Firehose consumes the records from the data stream.
Firehose then delivers the data to an S3 bucket.
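
The agent is driven by the configuration file /etc/aws-kinesis/agent.json. As a rough sketch of what that configuration looks like for this kind of pipeline, the Python snippet below writes a minimal config that tails the Apache access log and sends new lines to a data stream; the endpoint and stream name are assumed placeholder values, and the exact settings used in the lab may differ.

    import json

    # Sketch of a Kinesis Agent configuration: tail the Apache access log
    # (default path on Amazon Linux) and push each new line into a Kinesis
    # data stream. "demo-stream" is a placeholder name.
    agent_config = {
        "kinesis.endpoint": "kinesis.us-east-1.amazonaws.com",
        "flows": [
            {
                "filePattern": "/var/log/httpd/access_log*",
                "kinesisStream": "demo-stream",
                "partitionKeyOption": "RANDOM",
            }
        ],
    }

    with open("/etc/aws-kinesis/agent.json", "w") as f:
        json.dump(agent_config, f, indent=2)

Once the configuration is in place, restarting the agent (for example, sudo service aws-kinesis-agent restart) makes it begin tailing the log file.
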
Logging into the AWS Management Console
Creating an IAM Role
Launching an EC2 Instance
SSHing into the EC2 Instance
Hosting a sample website
Setting file permissions for httpd
Creating a Kinesis data stream
Creating an S3 Bucket
Creating Kinesis Data Firehose
Creating and configuring Kinesis Agent
Testing the real-time streaming of data
Checking the CloudWatch metrics of Kinesis Data Streams and Data Firehose
Deleting AWS Resources