Azure Data Engineer training in Hyderabad

Jacinta InfoTech

AZURE DATA ENGINEER & Demo video

Azure Data Engineer + Databricks Developer

Databricks

Spark

PySpark

Spark SQL

Delta Lake

Azure Data Factory

 Azure Synapse DW (Dedicated SQL POOL)

 Azure ADF & Databricks Projects

Azure Databricks Concepts.

1) Azure Databricks Introduction

  1. Databricks Architecture
  1. Databricks Components overview
  1. Benefits for data engineers and data scientists

2) Azure Databricks concepts

  1. Workspace – Creation and managing workspace.
  1. Notebook – creating notebooks, calling and managing different notebooks.
  1. Library – installing and managing libraries.

YouTube Channel: techlake

3) Data Management

  1. Databricks File System (DBFS) – copying and managing files with DBFS commands.
  1. Database – creating and managing databases and tables.
  1. Table – creating tables, dropping tables, loading data.
  1. Metastore – managing metadata; creating and managing Delta tables.

4) Computation Management

  1. Cluster – creating and managing clusters.
  1. Pool – creating pools and using pools for autoscaling.
  1. Databricks Runtime – understanding and choosing runtimes based on requirements.
  1. Jobs – creating jobs from notebooks and assigning cluster types to jobs.
  1. Workload – monitoring jobs and managing loads.
  1. Execution Context – understanding execution contexts.

5) Databricks Advanced topics.

  1. Databricks Workflows – calling one notebook from another notebook.

  1. Creating global variables (widgets) and using them in Azure ADF pipelines.
  1. How to implement parallelism in notebook execution.
  1. Mounting Azure Blob Storage and Data Lake Storage accounts.
  1. Integrating source code (notebooks) with GitHub.
  1. Calling Databricks notebooks from Azure Data Factory.
  1. Monitoring Databricks cluster logs.

SPARK Concepts

1) Introduction to Spark – Getting Started

  1. What is Spark and what is its purpose?
  1. Components of the Spark unified stack
  1. Resilient Distributed Dataset (RDD)
  1. Downloading and installing Spark standalone
  1. Scala and Python overview
  1. Launching and using Spark’s Scala and Python shells

2) Resilient Distributed Dataset and DataFrames

  1. Understand how to create parallelized collections and external datasets
  1. Work with Resilient Distributed Dataset (RDD) operations
  1. Utilize shared variables and key-value pairs

3) Spark application programming

  1. Understand the purpose and usage of the Spark Context
  1. Initialize Spark with the various programming languages
  1. Describe and run some Spark examples
  1. Pass functions to Spark
  1. Create and run a Spark standalone application
  1. Submit applications to the cluster

4) Introduction to Spark libraries

  1. Understand and use the various Spark libraries

5) Spark configuration, monitoring and tuning

  1. Understand components of the Spark cluster
  1. Configure Spark properties, environment variables, and logging
  1. Monitor Spark using the web UIs, metrics, and external instrumentation
  1. Understand performance tuning considerations
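As a sketch of the configuration topic above, common Spark properties can be set in `conf/spark-defaults.conf`; the values here are illustrative, not recommendations:

```
# conf/spark-defaults.conf -- illustrative values only
spark.master                    local[4]
spark.executor.memory           2g
spark.sql.shuffle.partitions    200
spark.eventLog.enabled          true
spark.eventLog.dir              /tmp/spark-events
```

The same properties can also be passed with `--conf` on `spark-submit` or set on the `SparkSession` builder.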

PySpark Content

Introduction to PySpark

1) What is a SparkSession?

2) How to create a SparkSession

3) What is a SparkContext?

4) How to create a SparkContext

5) What is a SQLContext?

How to Use Jupyter Notebooks & Databricks notebooks for Python Development.

Install and configure PySpark in Local System for development.

Introduction to Big Data and Apache Spark

Apache Spark Framework & Execution Process

Introduction To RDDs

1) Different Ways to Create RDDs in PySpark

2) RDD Transformations

3) RDD Actions

4) RDD Cache & Persist

Introduction to DataFrame.

1) Different Ways to Create DataFrames in PySpark

2) DataFrame Transformations

3) DataFrame Actions

4) DataFrame Cache & Persist

Different types of Big Data File systems.

1) Difference Between Row Store and Column Store Formats

2) Avro Files

3) Parquet Files

4) ORC Files

Reading and Writing Different Types of Files using Dataframe.

1) CSV Files

2) JSON Files

3) XML Files

4) Excel Files

5) Complex (Nested) JSON Files

6) Avro Files

7) Parquet Files

8) ORC Files

Need for Spark SQL

What is Spark SQL

1) SQL Table Creation

2) SQL Join Types

3) SQL Nested Queries

4) SQL DML Operations

5) SQL Merge Scripts

6) SQL SCD Type 2 Implementation

User-Defined Functions

Performance Tuning

Spark-Hive Integration

PySpark Project with Execution

1) End-to-End PySpark Project Implementation

2) Executing a PySpark Project in Databricks

3) Executing a PySpark Project via Azure ADF

DELTA LAKE

1) Delta Lake usage in Databricks.

  1. Delta Lake Architecture
  1. Delta Lake Storage Understanding
  1. Delta lake table creation and API options
  1. Delta Lake DML Operations usage.
  1. Delta Lake partitions
  1. Delta Lake Schema Enforcement
  1. Delta Lake Schema Evolution
  1. Delta Lake Versions
  1. Delta Lake Time Travel
  1. Delta Lake VACUUM
  1. Delta Lake Merge (SCD Type 1 and SCD Type 2)
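A sketch of the MERGE statement behind an SCD Type 1 load on a Delta table; the table and column names are hypothetical, and in Databricks the string would be executed with `spark.sql(merge_sql)`:

```python
# Hypothetical dimension/staging table names for an SCD Type 1 merge.
merge_sql = """
MERGE INTO dim_customer AS tgt
USING stg_customer AS src
ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
  UPDATE SET tgt.city = src.city          -- overwrite in place (Type 1)
WHEN NOT MATCHED THEN
  INSERT (customer_id, city) VALUES (src.customer_id, src.city)
"""
```

SCD Type 2 uses the same MERGE skeleton but, instead of updating in place, closes the current row (end-date it) and inserts a new version.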

Azure Data Engineer

1) Overview of the Microsoft Azure Platform

  1. Introduction to Azure
  1. Basics of Cloud computing
  1. Azure Infrastructure
  1. Walkthrough of Azure Portal
  1. Overview of Azure Services

2) Azure Data Architecture

  1. Traditional RDBMS workloads.
  1. Data Warehousing Approach
  1. Big data architectures.
  1. Transferring data to and from Azure

3) Azure Storage options

  1. Blob Storage
  1. ADLS Gen1 & Gen2
  1. RDBMS
  1. Hadoop
  1. NoSQL
  1. Disk

4) Blob Storage

  1. Azure Blob Resources
  1. Types of Blobs in Azure
  1. Azure storage account data objects
  1. Azure storage account types and Options
  1. Replication options
  1. Secure access to an application’s data

  1. Azure Import/Export service
  1. Storage Explorer
  1. Practical section on Blob Storage

5) Azure Data Factory

  1. Azure Data Factory Architecture
  1. Creating an ADF resource and using it in the Azure cloud
  1. Pipeline Creation and Usage Options
  1. Copy Data Tool in the ADF Portal and its usage
  1. Linked Service Creation in ADF
  1. Dataset Creation, Connection Reuse
  1. Staging Dataset with Azure Storage
  1. ADF Pipeline Deployments
  1. Pipeline Orchestration using Triggers
  1. ADF Transformations and other tools integration.
  1. Processing different types of files using ADF.
  1. Integration Runtime
  1. Monitoring ADF Jobs
  1. Managing IRs and Linked Services.

6) Azure SQL Database Service

  1. Introduction to Azure SQL Database
  1. Relational Data Services in the Cloud
  1. Azure SQL Database Service Tiers
  1. Database Transaction Units (DTU)
  1. Scalable performance and pools
  1. Creating and Managing SQL Databases
  1. Azure SQL Database Tools
  1. Migrating data to Azure SQL Database

7) Azure Data Lake Gen1 & Gen2

  1. Explore the Azure Data Lake enterprise-class security features.
  1. Understand storage account keys.
  1. Understand shared access signatures.
  1. Understand transport-level encryption with HTTPS.
  1. Understand Advanced Threat Protection.
  1. Control network access.
  1. Differences between Gen1 & Gen2

8) Azure Synapse SQL DW (Dedicated SQL POOL)

  1. What is Azure Synapse DW (Dedicated SQL Pool)?
  1. Synapse DW Architecture
  1. Creating an internal table with default distribution
  1. Creating an external table in Synapse DW
  1. Loading data from Databricks to Azure Synapse DW
  1. Loading data from ADLS Gen2 to Azure Synapse DW
  1. What is a dedicated SQL pool?
  1. Data Warehouse Unit (DWU) overview
  1. Distributed tables with examples
  1. Hash distribution with example
  1. Round-robin distribution with example
  1. Replicated distribution with example
  1. Types of indexes with examples
  1. Clustered index with example
  1. Non-clustered index with example
  1. Clustered columnstore index with example
  1. Heap table with example
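As a sketch of the distribution and index options above (the table and columns are made up), a dedicated SQL pool fact table is typically hash-distributed on a join key; the DDL below would run against Synapse, not locally:

```python
# Hypothetical fact table DDL illustrating distribution and index choices.
create_fact = """
CREATE TABLE dbo.FactSales
(
    SaleId   INT NOT NULL,
    StoreId  INT NOT NULL,
    Amount   DECIMAL(18, 2)
)
WITH
(
    DISTRIBUTION = HASH(StoreId),   -- co-locates rows sharing a join key
    CLUSTERED COLUMNSTORE INDEX     -- default index for large fact tables
)
"""
```

Small dimension tables are usually created with `DISTRIBUTION = REPLICATE` instead, and staging tables with `ROUND_ROBIN`.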

SPARK SQL:

1) Introduction to Spark SQL.

2) Spark SQL Create database

3) Drop databases

4) Create internal table

5) Create external table

6) Create partitioned table

7) Create partitioned and bucketed table

8) Spark SQL DML: INSERT, UPDATE, DELETE and MERGE operations

9) Spark SQL SELECT queries with different clauses

10) Spark SQL MERGE With SCD Type 1 and SCD Type 2

11) Spark SQL WHERE Clause, Group By Clause and Having Clauses

12) Spark SQL Order by, Sort By clauses

13) Spark SQL join types, Window, Pivot, Limit and Like

14) Spark SQL Grouping Sets, Rollup and Cube

15) Spark SQL CLUSTER BY and DISTRIBUTE BY

16) Spark SQL CASE expressions, WITH (common table expressions) and TABLESAMPLE