Section 1
##### Introduction to Analytics

Section 2
##### Understanding Probability and Probability Distribution

1. Introduction to probability theory
2. Types of probability distributions: discrete and continuous distributions
3. Understanding the probability mass function and the probability density function
4. The normal distribution and the standard normal distribution
5. Understanding the binomial and Poisson distributions
6. Applications of the binomial distribution
7. Applications of the normal distribution

Section 3
##### Introduction to Sampling Theory and Estimation

1. Concept of population and sample
2. Introduction to some important terminologies
3. Parameter and statistic
4. Properties of a good estimator
5. Standard deviation and standard error
6. Point and interval estimation
7. Confidence level and level of significance
8. Constructing confidence intervals
9. Formulating null and alternative hypotheses and performing simple tests of hypothesis

Section 4
##### Introduction to Segmentation Techniques: Factor Analysis

1. Introduction to factor analysis and its various techniques
2. Principal Component Analysis (PCA) and Exploratory Factor Analysis (EFA)
3. The KMO measure of sampling adequacy and Bartlett's test of sphericity
4. The mineigen criterion and the scree plot
5. Introduction to the factor loading matrix and rotation techniques such as varimax
6. Application of the technique to a case study
7. Interpretation of the results

Section 5
##### Introduction to Segmentation Techniques: Cluster Analysis

1. Introduction to cluster analysis and its various techniques
2. Hierarchical and non-hierarchical clustering techniques
3. Using hierarchical clustering in R
4. Performing k-means clustering in R
5. Divisive and agglomerative clustering
6. Applications of cluster analysis in analytics, with examples including profiling and interpretation of the clusters
7. Application of the techniques to a case study
8. Interpretation of the results

Section 6
##### Correlation and Linear Regression

1. Introduction to Pearson's correlation coefficient
2. Correlation and causation; fitting a simple linear regression model
3. Introduction to the Classical Linear Regression Model (CLRM)
4. Assumptions of the CLRM
5. Understanding the Multiple Linear Regression Model (MLRM) technique
6. Understanding the statistics related to linear regression
7. Goodness-of-fit tests for linear regression
8. Importing a dataset into R to apply linear regression
9. Splitting the dataset into training and testing sets
10. Conducting several tests to understand the results obtained
11. Checking the accuracy of the linear regression model
12. Assessing collinearity, heteroskedasticity, and autocorrelation

Section 7
##### Introduction to Categorical Data Analysis and Logistic Regression

1. Comparison between linear regression and logistic regression
2. Performing goodness-of-fit tests on the model
3. Introduction to percent concordant, AIC, SC, and the Hosmer-Lemeshow test
4. The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC)
5. Interpreting the model: overall fit, and identifying influential variables using the odds-ratio criterion
6. Understanding ROC testing
7. Checking the accuracy of the model
8. Application and interpretation using a case study

Section 8
##### Introduction to Time Series Analysis

1. What time series analysis is; objectives and assumptions of time series
2. Identifying patterns in time series data: decomposition of the time series
3. Introduction to various smoothing techniques: simple moving average and weighted moving average
4. Exponential smoothing and Holt's linear exponential smoothing; examples of seasonality and detecting seasonality in time series data
5. Autoregressive and moving average models; introduction to the Box-Jenkins methodology
6. Introduction to the Autoregressive Moving Average (ARMA) model and the Autoregressive Integrated Moving Average (ARIMA) model
7. Building an ARIMA model
8. Detecting stationarity and seasonality in an ARIMA model
9. Detecting the AR and MA orders of an ARIMA model
10. The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)
11. Detecting the order using the AIC and BIC criteria
12. Estimation and forecasting using R

Section 9
##### Text Mining

1. Introduction to text mining
2. The importance of applying this technique
3. Packages required in R for text mining
4. Understanding the word-cloud methodology
5. Performing text mining analysis on a dataset
6. Understanding sentiment analysis
7. Application of the technique to a dataset
8. Interpretation of the results

Section 10
##### Market Basket Analysis

Section 11
##### Statistical Significance: T-Tests, Chi-Square Tests, and Analysis of Variance

1. Performing a test of one sample mean
2. Testing the difference between two group means (independent samples)
3. Testing the difference between two group means (paired samples)
4. Performing chi-square tests: the test of independence
5. Descriptive statistics and inferential statistics
6. T-tests and their application to case studies
7. ANOVA testing and its application to case studies
8. Interpretation of the test results
9. The chi-square test of independence
10. Tests for correlation and partial correlation
11. Performing post-hoc multiple-comparison tests in R using Tukey's HSD
12. Performing two-way ANOVA with and without interactions

Descriptive statistics is the discipline of quantitatively describing the main features of a given data set. It provides simple summary measures of the observations that have been made in the set. These summary measures may form the initial description of the data as part of a more extensive statistical analysis, or they may suffice in themselves for a particular statistical investigation.

The most commonly used descriptive statistics in statistical analysis are:

- **Measures of Central Tendency**, which yield a representative value for a set of observations.
- **Measures of Dispersion**, which show how much, on average, the observations in a data set differ from the central value.
- **Measures examining the shape of a given data distribution.**
- **Measures aimed at examining the most unusual observations of a data set.**

In this module we’ll learn about the different measures of central tendency.

**Measures of Central Tendency**

Central tendency refers to the propensity of quantitative data to cluster around a particular value. The particular value around which the observations in the data set fluctuate is called the central value. It is a representative value of the set of given observations. The objective of the analyst is to find out functional forms based on the observations of the data set which would give a ‘good’ representative central value. Such functional forms are known as measures of central tendency. The most widely used measures of central tendency are: Mean, Median and Mode.

For example:

The vice president of marketing of a fast-food chain is studying the sales performance of the 100 stores in the eastern part of the country. He would look at the distribution with an eye toward getting information about its central tendency, so as to compare the eastern region with other parts of the country. Central tendency is essentially the centralmost value of a distribution. Now how do we know which one is the centralmost value?

There are three standard ways to find the central value: the mean, the median, and the mode.

**Mean**

As a measure of central tendency, the mean gives the average value of a set of observations. The idea of an average is a familiar one. Suppose we say, "Germans live longer than Indians." This does not mean that every German lives longer than every Indian. All we mean is that the average longevity of a typical German is greater than that of a typical Indian.

**Properties**

- The sum of the deviations of the given values of a variable from its mean is necessarily zero.
- The arithmetic mean of a sample of observations depends on both a change of scale and a change of origin: if y = a + bx, then the mean of y is a + b times the mean of x.
- The combined arithmetic mean of two groups with means x1 and x2 and with n1 and n2 observations respectively is given by (n1·x1 + n2·x2) / (n1 + n2).
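The first and third properties are easy to verify numerically. A minimal sketch in Python, using hypothetical sales figures (the data and variable names are illustrative, not from the text):

```python
from statistics import mean

# A small sample of observations (hypothetical store sales figures)
sales = [12.0, 15.5, 9.0, 20.5, 13.0]
x_bar = mean(sales)

# Property 1: the deviations from the mean sum to zero
deviations = [x - x_bar for x in sales]
print(round(sum(deviations), 10))  # 0.0

# Property 3: the combined mean of two groups is the weighted
# average (n1*x1 + n2*x2) / (n1 + n2) of the group means
group1, group2 = [10.0, 14.0], [8.0, 12.0, 16.0]
combined = (len(group1) * mean(group1) + len(group2) * mean(group2)) / (
    len(group1) + len(group2)
)
print(combined == mean(group1 + group2))  # True
```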

**Median**

The median of a set of statistical observations is the middlemost value of the data set when the observations are arranged in increasing order of magnitude. The median is that value of the variable which divides the group into two equal parts, one comprising all the values greater than the median and the other all the values less than it.

**Properties**

Based on its construction and the nature of its operation, the median, as a measure of central tendency, exhibits the following important properties:

- The median obeys linearity, i.e. it changes along with a change of scale and origin: if y = a + bx with b > 0, then the median of y is a + b times the median of x.
- The combined median of two groups lies between the medians of the two individual groups.
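The mechanics of finding the median, and the linearity property above, can be sketched as follows (the data sets here are hypothetical):

```python
from statistics import median

# Odd number of observations: the median is the middle value
# once the data are sorted
odd = [7, 1, 5, 9, 3]          # sorted: 1 3 5 7 9
print(median(odd))             # 5

# Even number of observations: the median is the average of
# the two middle values
even = [7, 1, 5, 9, 3, 11]     # sorted: 1 3 5 7 9 11
print(median(even))            # 6.0

# Linearity: for y = a + b*x (b > 0), median(y) = a + b*median(x)
a, b = 2, 3
print(median([a + b * x for x in odd]) == a + b * median(odd))  # True
```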

**Mode**

The mode, for a given set of observations, is that value of the variable which occurs with the highest frequency. It is considered to represent the true characteristics of a frequency distribution and is referred to as the most typical or fashionable value of the variate.
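A minimal illustration of the mode as the most frequent value, using a hypothetical set of shoe sizes:

```python
from statistics import mode
from collections import Counter

# The mode is the value that occurs with the highest frequency
shoe_sizes = [7, 8, 8, 9, 8, 10, 7, 8]
print(mode(shoe_sizes))          # 8

# The same result read off a frequency table:
# most_common(1) gives the (value, count) pair with the top count
freq = Counter(shoe_sizes)
print(freq.most_common(1)[0])    # (8, 4)
```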