Introduction and Overview
“Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few.” – IBM
Business analytics is a fast-growing discipline that organizations across industries are adopting. Gartner predicted that by 2014, 30 percent of analytic applications would use proactive, predictive and forecasting capabilities. The software market for business intelligence, analytics and corporate performance management grew by 13.4% in 2010 to $10.5 billion and was expected to continue growing. At the same time, modern businesses accumulate an astonishing amount of digital data, which can be leveraged to unlock new sources of economic value or to provide fresh insights into business trends.
Big Data Analytics delivers competitive advantage over the traditional analytical model in two ways. First, it allows a simple model to be applied efficiently to volumes of data that would be too large for a traditional analytical environment. Research suggests that a simple algorithm with a large volume of data can be more accurate than a sophisticated algorithm with little data. The algorithm is not the competitive advantage; the ability to apply it to huge amounts of data without compromising performance generates the competitive edge. Second, Big Data Analytics refers to the sophistication of the model itself. Increasingly, analysis algorithms are provided directly by database management system (DBMS) vendors. To pull away from the pack, companies must go well beyond what is provided and innovate by using newer, more sophisticated statistical analysis.
The role of a Certified Big Data Science Analyst (CBDSA):
A big data science analyst’s main objective is to organize and analyze large amounts of data, often using software specifically designed for the task. The result of the analysis needs to be simple enough for all stakeholders to understand, especially those working outside of IT.
A data scientist’s approach to data analysis depends on their industry and the specific needs of the business or department they are working for. Before a data scientist can find meaning in structured or unstructured data, business leaders and department managers must communicate what they’re looking for. As such, a data scientist must have enough business domain expertise to translate company goals into data-based deliverables such as prediction engines, pattern detection analysis or even optimization algorithms.
The Certified Big Data Science Analyst (CBDSA) course covers business analytics and big data technologies and their strategic importance to an organization. Participants will be introduced to business analytics with the big data technologies Hadoop, Hive and HBase. The course deals with the basic principles, concepts, techniques and tools used for big data and business analytics, including data mining, Hadoop (HDFS and MapReduce), Apache HBase and Apache Hive. It also covers different types of business analytics with real-life use cases, including association rule mining and regression models. Participants will get a good picture of these concepts and how they are interconnected in an organizational context.
- 40 Hours (5 Days) Classroom Training
Who Should Attend?
- Data Analyst – Statistics and Mining
- Big Data Analyst
- Operations Research Analyst
- Senior Data Analyst – Statistics and Mining
- Data Scientist
Participants should preferably have at least 2 years of software development experience in a Java/Unix/Linux environment and a good understanding of data/business analytics.
Assessment and Certification
- Component 1: Written Examination (MCQ)
  - 40 Questions
  - 1 Hour duration
  - Closed Book
  - Score 70% to pass
- Component 2: Project Work Component (PWC)
  - Individual work
  - 2 weeks to complete from the last day of course
  - Score 70% to pass
- Upon passing the course, you will be awarded the “Certified Big Data Science Analyst” certification
- Certification body – Global Science and Technology Forum
- PIC Grant (IRAS)
- Get 60% Cash Payout, or 400% Tax Rebate on the total Training Amount Spent by the Company
- Find out more about the PIC Grant, which is available for training your employees and improving their productivity, on the IRAS website: http://www.iras.gov.sg/irashome/PIcredit.aspx#About_Productivity_and_Innovation_Credit
- Understand business analytics and big data technologies and their impact on enterprises
- Learn data mining concepts and techniques through an open source data mining (DM) tool
- Understand the role of big data technologies (Hadoop, HBase, Hive) in business analytics
- Acquire the knowledge and learn to use Hadoop (HDFS and MapReduce), HBase and Hive
Introduction to Business Analytics
- The concept of Business Analytics
- Data, Information, Knowledge and Wisdom
- Data as Unique Enterprise Asset
- Data, Information and Analytics Lifecycle
- Business Analytics – Current Context
- Types of Analytics
- Descriptive Analytics
- Predictive Analytics
- Prescriptive Analytics
Data/Information Architecture for Business Analytics
- Data/Information Architecture
- Concept of Data Warehouse/Enterprise Data Warehouse (EDW)
- ETL – Key Process
- Concept of Data Mart
- Business Intelligence
- Data Mining
Data Mining Tool
- Understand the open source DM tool RapidMiner
- Explore the various features of RapidMiner
- Walk through a RapidMiner demo with different scenarios
Data Mining Techniques
- Understand the various data mining techniques
- Understand how a correlation matrix works
- Understand how association rule mining works
- Understand the predictive analytics technique
- Understand the forecasting technique
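As a rough illustration of the association rule mining topic listed above, the sketch below computes the two standard measures of a rule, support and confidence, in plain Python. It is not taken from the course material or from RapidMiner; the function names and the toy shopping baskets are assumptions made up for this example.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Confidence of the rule antecedent -> consequent.

    Defined as support(antecedent and consequent) / support(antecedent).
    """
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, antecedent | consequent) / base


# Hypothetical toy data: four shopping baskets (illustration only).
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

if __name__ == "__main__":
    rule_support = support(baskets, frozenset({"bread", "milk"}))
    rule_confidence = confidence(baskets, frozenset({"bread"}), frozenset({"milk"}))
    print(f"support(bread, milk) = {rule_support:.2f}")
    print(f"confidence(bread -> milk) = {rule_confidence:.2f}")
```

A rule is usually kept only when both measures exceed thresholds chosen by the analyst; those thresholds depend on the business question.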
Introduction to Big Data
- What is Big Data? Why Big Data?
- The 3 Vs of Big Data
- The Rapid Growth of Unstructured Data
- Big Data Market Forecast
- Big Data Analytics
- Big Data in Business
- Big Data Types & Architecture
Introduction to Hadoop
- Big Data – Current Industry Trends
- Why Process Big Data?
- Challenges in Data Processing
- Why Hadoop?
- What does Hadoop offer?
- Hadoop Network Structure
- Hadoop Eco-System
- Hadoop Core Components
- Hadoop – Features
- Hadoop – Relevance
- Hadoop in Action
Hadoop HDFS & MapReduce
- Hadoop HDFS
- What does HDFS Facilitate?
- HDFS Architecture
- Hadoop Network and Server Infrastructure
- NameNode, Secondary NameNode and DataNode
- Ensuring Data Correctness
- Data Pipelining while Loading Data
- fs Operations
- Hadoop MapReduce
- MapReduce Conceptualization
- MapReduce – Overview
- MapReduce – Programming Model
- MapReduce – Execution Overview
- Hadoop – Application Examples
- Word Count – Example
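The word count example listed above is commonly used to introduce the MapReduce programming model. The sketch below imitates the map, shuffle and reduce phases in a single Python process. It is an illustration of the model only, not Hadoop's Java API and not the course's own example; all function names are assumptions chosen for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the values collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big insight", "data data"]))
```

On a real cluster the map and reduce functions run in parallel on many nodes, and the grouping step is carried out by the framework rather than by user code.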
Apache HBase
- What is HBase?
- HBase Architecture
- HBase Data model
- HBase Deployment
- HBase Cluster Architecture
- Indexes in HBase
- Scaling HBase
- Data Locality, Coherence and Concurrency, Fault Tolerance
- Hadoop Integration
- High-Level Architecture
- Replication of Data Across Data Centres
- HBase Applications
- Advantages and Disadvantages
Apache Hive
- What is Hive?
- Why Hive?
- Where to use Hive?
- Hive Architecture
- Hive: Benefits
- Hive: Tradeoffs
- Hive: Real world Examples