Introduction to Dataproc

Lab Details:

  1. This lab walks you through GCP Dataproc. You will solve a mathematical problem by submitting it to a cluster as a job and retrieving the precise result.

  2. Region: us-central1

  3. Duration: 1 hour

Note: Do not refresh the page after you click Start Lab; wait a few seconds for the credentials to appear.
If Google asks for verification while you are logging in, enter your mobile number and verify with the OTP. Don't worry, this Google account will be deleted after the lab.

What is Dataproc?

Dataproc is Google Cloud's managed service for data analysis workloads. Apache Spark is used to run queries on large datasets, build data pipelines, process graphs, and handle other big data tasks. To run it well, you need an in-memory cache and substantial computing power, along with capacity for terabytes or petabytes of data. Dataproc in Google Cloud meets these requirements, and with it you can typically create a cluster in under 90 seconds.

The underlying stack for Dataproc consists of Apache Spark, Apache Hive, Apache Pig, and Apache Hadoop.

What are Clusters in Dataproc?

Think of a cluster as a group of one or more computers (nodes) connected to each other in a single VPC (Virtual Private Cloud). The benefit of using a cluster is that it pools the memory capacity and computational power of its nodes, which increases the performance of the system.

What is a Job in Dataproc?

As the name suggests, a job in Dataproc is any piece of work you assign to a Dataproc cluster, for example calculating pi or counting the number of words in a document.
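To make the two example jobs concrete, here is a minimal sketch in plain Python of the kind of computation such a job performs. It is not Dataproc or Spark code: the function names are hypothetical, and the "partitions" run one after another in a single process, whereas on a real cluster each partition would be handled by a worker node.

```python
import random
from collections import Counter


def count_hits(num_samples: int, seed: int) -> int:
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # seeded so each partition is reproducible
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(samples_per_partition: int, num_partitions: int = 4) -> float:
    """Estimate pi by combining hit counts from several independent partitions.

    Assumption: partitions stand in for cluster workers; here they run
    sequentially in one process.
    """
    total_hits = sum(
        count_hits(samples_per_partition, seed=p) for p in range(num_partitions)
    )
    total_samples = samples_per_partition * num_partitions
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * total_hits / total_samples


def count_words(text: str) -> Counter:
    """Count whitespace-separated words, ignoring case."""
    return Counter(text.lower().split())


if __name__ == "__main__":
    print(estimate_pi(50_000))
    print(count_words("the quick fox jumps over the lazy dog"))
```

More samples give a closer estimate of pi, which is why this kind of job benefits from being spread across the nodes of a cluster.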

Advantages of using Dataproc:

  • Dataproc is integrated with BigQuery and other core services, so your data can be kept outside the cluster and you won't need to worry about losing it.

  • If you have a budget constraint, you can scale the cluster up or down, even while jobs are running.

  • You can switch off a cluster when you don't need it, reducing your billing charges.

Lab Tasks:

  1. Creating a Cluster and Job using the Cloud Shell.

  2. Submitting a job to the Cluster.

  3. Updating the Cluster using the Console.

  4. Deleting the Cluster.
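As a rough preview of these tasks, the sketch below builds the corresponding gcloud command lines as Python argument lists and prints them without running anything. The cluster name is hypothetical, the region is the one given in the lab details, and the SparkPi class and examples jar path are assumptions based on what standard Dataproc images normally ship. The lab itself updates the cluster in the Console; the update command here is only the command-line equivalent.

```python
import shlex

REGION = "us-central1"        # region given in the lab details
CLUSTER = "example-cluster"   # hypothetical cluster name


def create_cluster_cmd(cluster: str, region: str) -> list[str]:
    """Task 1: create a cluster with default settings."""
    return ["gcloud", "dataproc", "clusters", "create", cluster,
            f"--region={region}"]


def submit_pi_job_cmd(cluster: str, region: str, tasks: int = 1000) -> list[str]:
    """Task 2: submit the Spark example job that estimates pi.

    Assumption: the examples jar exists at this path on the cluster image.
    """
    return ["gcloud", "dataproc", "jobs", "submit", "spark",
            f"--cluster={cluster}", f"--region={region}",
            "--class=org.apache.spark.examples.SparkPi",
            "--jars=file:///usr/lib/spark/examples/jars/spark-examples.jar",
            "--", str(tasks)]


def update_cluster_cmd(cluster: str, region: str, num_workers: int) -> list[str]:
    """Task 3 (CLI equivalent): change the number of worker nodes."""
    return ["gcloud", "dataproc", "clusters", "update", cluster,
            f"--region={region}", f"--num-workers={num_workers}"]


def delete_cluster_cmd(cluster: str, region: str) -> list[str]:
    """Task 4: delete the cluster so it stops incurring charges."""
    return ["gcloud", "dataproc", "clusters", "delete", cluster,
            f"--region={region}"]


if __name__ == "__main__":
    commands = [
        create_cluster_cmd(CLUSTER, REGION),
        submit_pi_job_cmd(CLUSTER, REGION),
        update_cluster_cmd(CLUSTER, REGION, num_workers=4),
        delete_cluster_cmd(CLUSTER, REGION),
    ]
    for cmd in commands:
        print(shlex.join(cmd))  # print only; nothing is executed
```

The printed lines can be pasted into Cloud Shell once you have replaced the cluster name with your own.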


