Certified Big Data Science Analyst (CBDSA)

Introduction and Overview

Business analytics is one of the fastest-growing technology areas, and organizations across industries are adopting it. Gartner predicted that by 2014, 30 percent of analytic applications would use proactive, predictive and forecasting capabilities; the software market for business intelligence, analytics and corporate performance management grew by 13.4% in 2010 to $10.5 billion and was expected to continue growing. At the same time, modern businesses accumulate an astonishing amount of digital data, which can be leveraged to unlock new sources of economic value or to provide fresh insights into business trends.

Big Data Analytics delivers competitive advantage over the traditional analytical model in two ways. First, it enables the efficient use of a simple model applied to volumes of data that would be too large for the traditional analytical environment. Research suggests that a simple algorithm with a large volume of data can be more accurate than a sophisticated algorithm with little data. In that case the algorithm is not the competitive advantage; the ability to apply it to huge amounts of data without compromising performance generates the competitive edge. Second, Big Data Analytics refers to the sophistication of the model itself. Increasingly, analysis algorithms are provided directly by database management system (DBMS) vendors. To pull away from the pack, companies must go well beyond what is provided and innovate by using newer, more sophisticated statistical analysis.

The Certified Big Data Science Analyst (CBDSA) course covers business analytics and big data technologies and their strategic importance to an organization. Participants are introduced to business analytics with the big data technologies Hadoop, Hive and HBase. The course covers the basic principles, concepts, techniques and tools used for big data and business analytics, including data mining, Hadoop (HDFS and MapReduce), Apache HBase and Apache Hive. It also covers different types of business analytics through real-life use cases, including association rule mining and regression models. Participants will gain a clear picture of these concepts and how they interconnect in an organizational context.


  • 40 Hours (5 Days) Classroom Training

Who Should Attend?

  • Data Analyst – Statistics and Mining
  • Big Data Analyst
  • Operations Research Analyst
  • Senior Data Analyst – Statistics and Mining
  • Data Scientist


Participants should preferably have at least 2 years of experience in software development in a Java/Unix/Linux environment and a good understanding of data/business analytics.

Assessment and Certification

  • Component 1: Written Examination (MCQ)
    • 40 Questions
    • 1 Hour duration
    • Closed Book
    • Score 70% to pass
  • Component 2: Project Work Component (PWC)
    • Individual work
    • To be completed within 2 weeks of the last day of the course
    • Score 70% to pass
  • Certification
    • Upon passing the course, you will be awarded the “Certified Big Data Science Analyst” (CBDSA) certification
    • Certification body – Global Science and Technology Forum


Course Outcome

  • Understand business analytics and big data technologies and their impact on enterprises
  • Learn data mining concepts and techniques through an open source data mining (DM) tool
  • Understand the role of big data technologies (Hadoop, HBase, Hive) in business analytics
  • Acquire working knowledge of Hadoop (HDFS and MapReduce), HBase and Hive, and learn to use them

Course Outline

Introduction to Business Analytics

  • The concept of Business Analytics
  • Data, Information, Knowledge and Wisdom
  • Data as Unique Enterprise Asset
  • Data, Information and Analytics Lifecycle
  • Business Analytics – Current Context
  • Types of Analytics
    • Descriptive Analytics
    • Predictive Analytics
    • Prescriptive Analytics

Data/Information Architecture for Business Analytics

  • Data/Information Architecture
  • Concept of Data Warehouse/Enterprise Data Warehouse (EDW)
  • ETL – Key Process
  • Concept of Data Mart
  • Business Intelligence
  • Data Mining

Data Mining Tool

  • Understand the open source DM tool RapidMiner
  • Explore the various features of RapidMiner
  • Walk through a RapidMiner demo with different scenarios

Data Mining Techniques

  • Understand the various data mining techniques
  • Understand how a correlation matrix works
  • Understand how association rule mining works
  • Understand the predictive analytics technique
  • Understand the forecasting technique
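
As a rough illustration of association rule mining, the sketch below computes the three usual rule metrics (support, confidence and lift) over a handful of shopping baskets. It is plain Python rather than RapidMiner, and the function names and the sample baskets are hypothetical, chosen only for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's own support.

    Values above 1 suggest the items occur together more often than
    they would if they were independent.
    """
    consequent_support = support(transactions, consequent)
    if consequent_support == 0:
        return 0.0
    return confidence(transactions, antecedent, consequent) / consequent_support


if __name__ == "__main__":
    # Hypothetical sample data: four shopping baskets.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print("support:", support(baskets, {"bread", "milk"}))
    print("confidence:", confidence(baskets, {"bread"}, {"milk"}))
    print("lift:", lift(baskets, {"bread"}, {"milk"}))
```

A mining tool typically searches all itemsets above a minimum support threshold and then keeps the rules whose confidence clears a second threshold; the sketch only scores one rule that is given by hand.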

Introduction to Big Data

  • What is Big Data? Why Big Data?
  • The 3 Vs of Big Data (Volume, Velocity, Variety)
  • The Rapid Growth of Unstructured Data
  • Big Data Market Forecast
  • Big Data Analytics
  • Big Data in Business
  • Big Data Types & Architecture

Introduction to Hadoop

  • Big Data – Current Industry Trends
  • Why Process Big Data?
  • Challenges in Data Processing
  • Why Hadoop?
  • What Does Hadoop Offer?
  • Hadoop Network Structure
  • Hadoop Eco-System
  • Hadoop Core Components
  • Hadoop – Features
  • Hadoop – Relevance
  • Hadoop in Action

Hadoop HDFS & MapReduce

  • Hadoop HDFS
    • What does HDFS Facilitate?
    • HDFS Architecture
    • Hadoop Network and Server Infrastructure
    • NameNode, Secondary NameNode and DataNode
    • Ensuring Data Correctness
    • Data Pipelining while Loading Data
    • fs Operations
  • Hadoop MapReduce
    • MapReduce Conceptualization
    • MapReduce – Overview
    • MapReduce – Programming Model
    • MapReduce – Execution Overview
    • Hadoop – Application Examples
    • Word Count – Example
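
The Word Count example is the customary first MapReduce program. The sketch below imitates its three stages (map, shuffle, reduce) in plain Python on a single machine. It does not use the Hadoop API, and the function names and sample lines are hypothetical; it is meant only to show the shape of the programming model.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input: two short lines of text.
    sample = ["big data big insight", "data beats algorithms"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle over the network; here all three stages run in one process so the data flow is easy to follow.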

Apache HBase

  • What is HBase?
  • HBase Architecture
  • ZooKeeper
  • HBase Data Model
  • HBase Deployment
  • HBase Cluster Architecture
  • Indexes in HBase
  • Scaling HBase
  • Data Locality, Coherence and Concurrency, Fault Tolerance
  • Hadoop Integration
  • High-Level Architecture
  • Replication of Data Across Data Centres
  • HBase Applications
  • Advantages and Disadvantages

Apache Hive

  • What is Hive?
  • Why Hive?
  • Where to use Hive?
  • Hive Architecture
  • Hive: Benefits
  • Hive: Tradeoffs
  • Hive: Real-World Examples