I recently sat for the Amazon AWS Certified Big Data Specialty exam and passed it! In this article I would like to provide an outline of the topics covered and my learning path towards certification. The exam is for anyone who wishes to validate their technical skills and experience in designing and implementing big data solutions using AWS cloud services.
For anyone interested in gaining this certification, an existing AWS Associate-level certification is required, and some experience with data analysis is recommended. In my case, I had recently obtained the Certified Solutions Architect Associate and Certified Developer Associate certifications, along with at least 5 years' experience in the business intelligence and data analytics field.
The exam is 3 hours long with 65 multiple-choice questions. The questions are all scenario-based and require an understanding of how multiple services are interconnected to solve a big data problem. Refer to the exam blueprint and sample questions.
Enrol in Training Courses
A Cloud Guru’s AWS Big Data Certification course was my main source of training. The course is regularly updated to keep up with the fast-changing pace of AWS, and its topics cover the range of AWS services that are examinable. Once you have an idea of which AWS services make up the big data ecosystem, proceed to deep-dive into each service via the documentation and whitepapers, and study further via YouTube videos. Note that this course alone won’t get you through the exam, though it helps a lot.
Read the Documentation
There is a plethora of online documentation and resources on the AWS website alone. I looked at the Developer Guides for each AWS service covered in the exam. The Big Data Blog was another source of learning material; however, I only paid attention to the blog posts mentioned in the A Cloud Guru course. In hindsight, it would have been beneficial to read more of the articles there.
The whitepapers are a good complement to the online documentation and guides, with useful material particularly for understanding use cases and working through problem-solving scenarios. Below are the minimum papers one should go through:
Watch AWS Videos
Lastly, I watched numerous YouTube videos from AWS re:Invent and AWS Summit sessions, which provided customer use cases and real-world examples of big data architectures. The deep-dive videos gave me a further understanding of the AWS services, as well as newer features announced on top of the services covered by A Cloud Guru, though these newer features were not examinable. Below are some of the videos I went through:
- AWS re:Invent 2017: Advanced Design Patterns for Amazon DynamoDB (DAT403-R)
- AWS re:Invent 2015 | (DAT401) Amazon DynamoDB Deep Dive
- AWS re:Invent 2017: Analyzing Streaming Data in Real Time with Amazon Kinesis (ABD301)
- AWS re:Invent 2017: Best Practices for Data Warehousing with Amazon Redshift & Redshift Spectrum (ABD304)
- Amazon Redshift Masterclass
- Amazon EMR Masterclass
- AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (BDM401)
- AWS re:Invent 2017: Building Visualizations and Dashboards with Amazon QuickSight (ABD206)
- AWS re:Invent 2017: Deploying Business Analytics at Enterprise Scale with Amazon QuickSight (ABD311)
Overview of Exam Topics
Below is an outline of the topics that I covered and that may come up in the exam. Pay extra attention to services like Kinesis, Redshift, and EMR, and in particular how they integrate with S3.
Kinesis Streams
- KPL, KCL, Kinesis Agent, Kinesis API, Connector Library
- Sharding, Retention period, autoscaling
- Differences between Kinesis Streams and SQS
- Batching, Aggregation, Collection (see the batching sketch after this list)
- KCL checkpointing
- Monitoring and exceptions
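To make the batching and error-handling points above concrete, here is a minimal sketch, using boto3 rather than the KPL, of writing a batch of records to a stream and retrying only the records that fail (for example, due to per-shard throttling). The stream name and payloads are made up for illustration.

```python
import json
import time
import boto3

kinesis = boto3.client("kinesis")

def put_batch(stream_name, events, max_retries=3):
    records = [
        {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": e["user_id"]}
        for e in events
    ]
    for attempt in range(max_retries):
        resp = kinesis.put_records(StreamName=stream_name, Records=records)
        if resp["FailedRecordCount"] == 0:
            return
        # Keep only the records that were rejected (e.g. ProvisionedThroughputExceededException)
        records = [r for r, res in zip(records, resp["Records"]) if "ErrorCode" in res]
        time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError(f"{len(records)} records still failing after {max_retries} retries")

put_batch("clickstream", [{"user_id": "u1", "page": "/home"}, {"user_id": "u2", "page": "/cart"}])
```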
Kinesis Firehose
- Integration with S3/Redshift/ElasticSearch (see the sketch after this list)
- Kinesis Agent, Kinesis API
- Monitoring and exceptions
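As a quick illustration of the producer side, here is a minimal boto3 sketch of pushing a batch of records to a Firehose delivery stream that is assumed to be configured (separately) to buffer and deliver into S3 or Redshift. The delivery stream name is hypothetical.

```python
import json
import boto3

firehose = boto3.client("firehose")

response = firehose.put_record_batch(
    DeliveryStreamName="web-logs-to-s3",  # hypothetical delivery stream
    Records=[
        {"Data": (json.dumps({"path": "/home", "status": 200}) + "\n").encode("utf-8")},
        {"Data": (json.dumps({"path": "/cart", "status": 404}) + "\n").encode("utf-8")},
    ],
)
print("Failed records:", response["FailedPutCount"])
```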
IoT — general knowledge
Data Pipeline — Integration with AWS services
S3 — Integration with AWS services
Glacier — general knowledge
DynamoDB
- Integration with AWS services
- Choice of Partition/Sort Key, LSI/GSI (see the table-definition sketch after this list)
- Partitioning size
- Throttling reads/writes, and mitigations
- DynamoDB streams — general knowledge
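To ground the key-design bullets above, here is a minimal boto3 sketch of creating a table with a partition key, a sort key, and a global secondary index; the table, attribute, and index names are hypothetical.

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "order_status", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},  # sort key
    ],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "status-date-index",
            "KeySchema": [
                {"AttributeName": "order_status", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)
```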
Lambda — Integration with AWS services
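The integration pattern that comes up most often is Lambda consuming a stream. Below is a minimal sketch of a handler for a Kinesis Streams event source (DynamoDB Streams delivers a differently shaped record); the downstream action is left as a comment.

```python
import base64
import json

def handler(event, context):
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # ... transform and forward, e.g. write to DynamoDB, S3, or Firehose ...
        print(payload)
    return {"processed": len(event["Records"])}
```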
EMR
- Instance types, storage and compression
- Consistent view
- S3DistCp (see the step sketch after this list)
- Resizing and autoscaling a cluster
- Hadoop ecosystem with Hive, HBase, Presto, Spark
- Spark integration with Kinesis
- File formats Text/Parquet/ORC/AVRO — general knowledge
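As an example of the S3DistCp item above, here is a minimal boto3 sketch of adding an S3DistCp step to a running cluster to copy and aggregate small files from S3 into HDFS; the cluster ID, bucket, and groupBy pattern are hypothetical.

```python
import boto3

emr = boto3.client("emr")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # hypothetical cluster ID
    Steps=[
        {
            "Name": "Copy raw logs from S3 to HDFS",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "s3-dist-cp",
                    "--src", "s3://my-log-bucket/raw/",
                    "--dest", "hdfs:///input/",
                    "--groupBy", ".*(\\d{4}-\\d{2}-\\d{2}).*",  # combine small files by date
                ],
            },
        }
    ],
)
```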
Redshift
- Node slice
- Distribution styles
- Sort Key
- Data Types
- Compression
- Constraints
- Workload Management / Queues
- Data loading techniques, encryption and compression
- Upsert (staging-table pattern; see the sketch after this list)
- Vacuum and Deep Copy
- Snapshots, Cross Region Snapshots, Restore from Snapshots
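To tie a few of these together, below is a minimal sketch of the Redshift patterns I revised most: a table with an explicit DISTKEY and SORTKEY, a compressed COPY from S3, and an upsert via a staging table. The table names, columns, S3 path, and IAM role are hypothetical, and the statements would be run through any PostgreSQL-compatible client.

```python
# DDL: distribution and sort keys chosen for joins on customer_id and date-range scans
CREATE_SALES = """
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12,2)
)
DISTKEY (customer_id)
SORTKEY (sale_date);
"""

# Load: COPY gzip-compressed, pipe-delimited files from S3 using an IAM role
COPY_SALES = """
COPY sales
FROM 's3://my-data-bucket/sales/2018/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
GZIP DELIMITER '|';
"""

# Upsert: load into a staging table first, then delete matching rows and insert in one transaction
UPSERT_SALES = """
BEGIN;
DELETE FROM sales USING sales_staging
 WHERE sales.sale_id = sales_staging.sale_id;
INSERT INTO sales SELECT * FROM sales_staging;
END;
"""
```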
ElasticSearch — general knowledge
Data Visualisation
- QuickSight
- Zeppelin, Jupyter, D3.js, MicroStrategy — general knowledge
Athena / Glue — general knowledge
Machine Learning — general knowledge
Security
- Data at rest/in-transit
- SSE/CSE (see the S3 sketch after this list)
- KMS
- Private Subnet / VPC endpoints
- Redshift Security
- EMR Security
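As one concrete example of protecting data at rest, here is a minimal boto3 sketch of uploading an object to S3 with SSE-KMS; the bucket, key, and KMS key alias are hypothetical, and omitting SSEKMSKeyId would fall back to the default aws/s3 key.

```python
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="my-secure-bucket",
    Key="exports/report.csv",
    Body=b"id,amount\n1,42\n",
    ServerSideEncryption="aws:kms",     # SSE-KMS for data at rest
    SSEKMSKeyId="alias/my-data-key",    # hypothetical customer-managed CMK alias
)
```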
Good luck to all on your journey to AWS certification!
Originally posted on LinkedIn: https://www.linkedin.com/pulse/my-path-aws-big-data-speciality-certification-simon-lee/