Apache Spark with Scala for High-Performance Big Data Processing

To embark on a successful career in data science, an understanding of Apache Spark is essential. This course helps solidify that understanding.

Enroll Now

Big Data eLearning Apache Spark with Scala: Become a Spark Expert in 21 Days. The objective of this course is to provide comprehensive training on Apache Spark. If you are an absolute beginner, this Apache Spark tutorial is perfect for your needs; if you already understand the basics of Apache Spark and would like to deepen your knowledge, this course is also an apt choice.

Note that this course does not teach the Scala programming language in depth; however, a basic understanding of Scala is useful for this course.

From the fundamentals of RDDs to advanced features like Spark SQL and Streaming, you will learn everything you need to know about Apache Spark in 21 days. 

Enroll Now

Course Objectives

In this course, we will cover a wide variety of Spark topics, including:

  • An introduction to Spark and Spark installation
  • RDD programming and key-value pair RDDs
  • Partitioning in Spark
  • Spark SQL tutorial
  • The latest Spark abstractions, DataFrame and Dataset
  • An exploration of which file systems Spark supports
  • Spark UDFs
  • An understanding of shared variables: accumulators and broadcast variables
  • How to tune and debug Spark applications
  • Various concepts related to structured streaming
  • An introduction to machine learning and how the MLlib library can be used for machine learning use cases

Who Should Take This Course?

The course is designed to provide a solid foundation in Apache Spark for those who aspire to make a career move to big data. Anyone looking to become a Spark expert or deepen their understanding of Apache Spark will also benefit from this course.

The course is ideal for:

  • Spark/Big Data aspirants
  • Big data engineers
  • Big data developers
  • Big data architects
  • Data scientists
  • Analytics professionals

Pre-Requisites for This Course

A basic understanding of Scala is a plus. However, it is not mandatory.

Enroll Now

Why Learn Apache Spark?

More and more companies are adopting big data technologies such as Apache Hive, Spark, and Pig to process ever-growing volumes of data, derive valuable insights, and make better business decisions.

Spark processes huge volumes of data with ease: there is no need to write complex MapReduce programs, and development time is significantly reduced, as the word count sketch below shows. Spark can also handle the full range of ETL tasks, making it a versatile tool for big data.
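
For a sense of how concise Spark code can be compared to classic MapReduce, here is the canonical word count example in Scala. This is a minimal sketch; the input path is a placeholder.

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("WordCount")
          .master("local[*]")
          .getOrCreate()

        // The entire word count fits in one short pipeline
        val counts = spark.sparkContext.textFile("input.txt")
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)
        spark.stop()
      }
    }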

Curriculum

Apache Spark 2 with Scala

  • Apache Spark Introduction 
  • How to Install Spark on Windows 
  • Getting Familiar with the Scala and Python Shells 
  • Persistence and Storage Levels 
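
As a quick taste of the shell and persistence topics above, here is a minimal sketch you could run in the Scala shell (spark-shell), which provides a ready-made session named `spark`:

    import org.apache.spark.storage.StorageLevel

    val nums = spark.sparkContext.parallelize(1 to 1000000)

    // Cache in memory, spilling to disk if the data does not fit
    nums.persist(StorageLevel.MEMORY_AND_DISK)

    println(nums.sum())   // the first action computes and caches the RDD
    println(nums.max())   // later actions reuse the cached data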

Key-Value Pair RDDs

  • What Are Pair RDDs?
  • How to Create Pair RDDs 
  • Transformations of Single Pair RDDs 
  • Combine by Key Pair RDD Transformation 
  • Transformations on Multi-Pair RDDs 
  • Actions on Pair RDDs
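
To give a flavor of this module, here is a minimal pair RDD sketch. The data is illustrative, and `sc` is the SparkContext the shell provides:

    val sales = sc.parallelize(Seq(("apples", 3), ("oranges", 5), ("apples", 2)))

    // Transformation on a single pair RDD: aggregate values by key
    val totals = sales.reduceByKey(_ + _)

    // Transformation on two pair RDDs: join on the key
    val prices = sc.parallelize(Seq(("apples", 1.5), ("oranges", 2.0)))
    val joined = totals.join(prices)   // (key, (quantity, price))

    // Action on a pair RDD
    joined.collectAsMap().foreach(println)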

Partitioning in Spark - RDD Partitioning  

  • What Is RDD Partitioning? 
  • Why Should We Partition the Data in Spark? 
  • How to Create Partitions in RDD 
  • Determining the Number of Partitions
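
For illustration, here is a minimal sketch of repartitioning a pair RDD by key (the data and partition count are illustrative):

    import org.apache.spark.HashPartitioner

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))

    // Hash-partition the RDD by key into four partitions
    val partitioned = pairs.partitionBy(new HashPartitioner(4))

    println(partitioned.getNumPartitions)   // 4
    println(partitioned.partitioner)        // Some(org.apache.spark.HashPartitioner@...)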

Spark SQL

  • What Is Spark SQL?
  • What Is a DataFrame?
  • How to Programmatically Specify a Schema 
  • DataFrame Operations 
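
Here is a minimal sketch of specifying a schema programmatically and running a DataFrame operation, assuming an existing SparkSession named `spark` (the data is illustrative):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age",  IntegerType, nullable = true)
    ))

    val rows = spark.sparkContext.parallelize(Seq(Row("Alice", 34), Row("Bob", 28)))
    val people = spark.createDataFrame(rows, schema)

    people.filter(col("age") > 30).show()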

Loading and Saving Your Data

  • How to Create a DataFrame from a JSON File 
  • How to Create a DataFrame from an RDBMS 
  • How to Create a DataFrame from Elasticsearch 
  • How to Create a DataFrame from a Parquet File 
  • How to Create a DataFrame from a Text File 
  • How to Create a DataFrame from a CSV File
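
Spark's reader API keeps all of these sources behind a uniform interface. A minimal sketch, assuming a session named `spark` (paths and connection details are placeholders):

    val fromJson    = spark.read.json("people.json")
    val fromCsv     = spark.read.option("header", "true").csv("people.csv")
    val fromParquet = spark.read.parquet("people.parquet")
    val fromText    = spark.read.textFile("notes.txt")   // returns a Dataset[String]

    // Reading from an RDBMS over JDBC (the connection details are illustrative)
    val fromJdbc = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/shop")
      .option("dbtable", "orders")
      .option("user", "reader")
      .option("password", "secret")
      .load()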

Programming with Datasets 

  • What Is a Dataset? 
  • How to Create a Dataset: Four Ways 
  • RDD vs DataFrame vs Dataset 
  • How to Join Two Datasets 
  • How to Remove the Duplicate Column When Joining Datasets
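
A minimal sketch of creating Datasets from case classes and joining them without a duplicate key column (names and data are illustrative; assumes a session named `spark`):

    import spark.implicits._

    case class Employee(id: Int, name: String)
    case class Salary(id: Int, amount: Double)

    val employees = Seq(Employee(1, "Alice"), Employee(2, "Bob")).toDS()
    val salaries  = Seq(Salary(1, 90000.0), Salary(2, 75000.0)).toDS()

    // Joining on a Seq of column names keeps a single copy of the join key,
    // avoiding the duplicate "id" column a plain join condition would produce
    employees.join(salaries, Seq("id")).show()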

File Systems Supported by Apache Spark

  • Most Commonly Used File Systems 
  • Which File System to Use: HDFS or Amazon S3? 
  • Spark User-Defined Functions 
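
As a quick illustration of the UDF topic above, here is a minimal sketch (the function and data are illustrative, and a session named `spark` is assumed). Note that switching between file systems is largely a matter of the path scheme:

    import org.apache.spark.sql.functions.udf
    import spark.implicits._

    // A user-defined function that upper-cases a column value
    val toUpper = udf((s: String) => s.toUpperCase)

    val df = Seq("hdfs", "s3").toDF("fs")
    df.select(toUpper($"fs").alias("fs_upper")).show()

    // File system choice shows up in the path scheme (paths are placeholders):
    // spark.read.text("hdfs:///data/logs/")
    // spark.read.text("s3a://my-bucket/logs/")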

Accumulators and Broadcast Variables

  • Spark Accumulators 
  • Spark Broadcast Variables
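
A minimal sketch showing both shared-variable types together (the lookup data is illustrative; assumes a session named `spark`):

    val sc = spark.sparkContext

    // Accumulator: a write-only counter that tasks update and the driver reads
    val badRecords = sc.longAccumulator("badRecords")

    // Broadcast variable: a read-only lookup table shipped once per executor
    val countryNames = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))

    val resolved = sc.parallelize(Seq("US", "IN", "??")).flatMap { code =>
      val name = countryNames.value.get(code)
      if (name.isEmpty) badRecords.add(1)
      name
    }

    resolved.collect().foreach(println)
    println(s"Unresolved records: ${badRecords.value}")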

Tuning and Debugging Spark

  • Spark-Submit Commands and Flags 
  • How to Set Executor Cores, the Number of Executors, and Executor Memory for Maximum Performance 
  • 5 Strategies to Improve Spark Application Performance 
  • How to Improve Spark Performance: Tuning the Levels of Parallelism 
  • How to Improve Spark Performance: Use Kryo Serializer 
  • How to Improve Spark Performance: Tweaking Memory Parameters 
  • How to Improve Spark Performance: Improving Cache Policy 
  • How to Improve Spark Performance: Tweaking Cluster Sizing Parameters 
  • How to Debug a Spark Application 
  • Spark Web UI 
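
Several of these tuning levers are plain configuration settings. A minimal sketch; the values shown are illustrative, not recommendations:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("TunedApp")
      // Use the Kryo serializer instead of default Java serialization
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Tune the level of parallelism for shuffles
      .config("spark.sql.shuffle.partitions", "200")
      // Tweak executor memory (also settable via spark-submit flags)
      .config("spark.executor.memory", "4g")
      .getOrCreate()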

Spark Structured Streaming

  • What Is Spark Structured Streaming? 
  • DStreams vs Streaming Datasets 
  • How to Create a Streaming Dataset from a Socket Source 
  • How to Create a Streaming Dataset from a Continuously-Updated Directory 
  • How to Create a Streaming DataFrame from an S3 File 
  • How to Write the Contents of a Streaming Dataset onto a Console 
  • How to Perform Window Operations on a Streaming DataFrame/Dataset 
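
A minimal sketch of a streaming word count over a socket source, written to the console sink (assumes a session named `spark`; run `nc -lk 9999` locally to feed it):

    import spark.implicits._

    // Streaming Dataset from a socket source
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Aggregate and write the running results to the console
    val query = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()
      .writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()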

Machine Learning in Spark Using MLlib

  • What Is Machine Learning?
  • How to Perform Spam Email Classification Using the MLlib Library in Spark
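
A minimal sketch of the kind of pipeline involved in spam classification with MLlib (the training data and feature choices are illustrative; assumes a session named `spark`):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
    import spark.implicits._

    val training = Seq(
      ("win a free prize now", 1.0),
      ("meeting at 10am tomorrow", 0.0)
    ).toDF("text", "label")

    // Tokenize the text, hash tokens into feature vectors, then fit a model
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression()

    val model = new Pipeline().setStages(Array(tokenizer, hashingTF, lr)).fit(training)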

Enroll now and gain the skills you need to excel in the world of big data, making yourself a valuable asset to any company. Sign up for our free Apache Spark course today by clicking the link below:

Enroll Now