Section 1
##### Introduction to Analytics

1. Introduction to Excel
2. Conditional Formatting
3. Data Summarization techniques
4. Graphical summary using SAS/GRAPH: Introduction to Bar graph
5. Graphical summary using SAS/GRAPH: Introduction to Pie graph
6. Graphical summary using SAS/GRAPH: Introduction to Histogram, Box plots, Scatter diagram
7. Descriptive Statistics: Introduction to various measures of Central Tendency
8. Introduction to the measures of Dispersion: Range, Mean Deviation, Standard Deviation

Section 2
##### Understanding Probability and Probability Distribution

1. Introduction to Probability theory
2. Types of probability distribution: Discrete Distribution and Continuous Distribution
3. Understanding Probability Mass Function and Probability Density Function
4. Normal Distribution and Standard Normal Distribution
5. Normal plot using the PROC GPLOT procedure in SAS
6. Application of Normal distribution in Analytics with real-life examples
7. Binomial Distribution and Binomial plot using the PROC GPLOT procedure in SAS
8. Poisson Distribution and Poisson plot using the PROC GPLOT procedure in SAS
9. Application of Binomial and Poisson distributions in Analytics with real-life examples

Section 3
##### Introduction to Sampling Theory and Estimation

1. Concept of Population and Sample
2. Use of the PROC SURVEYSELECT procedure in SAS
3. Introduction to some important terminology
4. Parameter and Statistic
5. Properties of a good estimator
6. Standard Deviation and Standard Error
7. Point and Interval Estimation
8. Confidence level and level of Significance
9. Constructing Confidence Intervals
10. Formulation of Null and Alternative hypotheses
11. Performing a simple test of Hypothesis

Section 4

Section 5
##### Statistical Significance: T-Tests, Chi-Square Tests and Analysis of Variance

1. Performing a test of one sample mean using PROC TTEST
2. Difference between two group means (independent samples) using PROC TTEST
3. Difference between two group means (paired samples) using PROC TTEST
4. Performing Chi-square tests: Test of Independence
5. Performing one-way ANOVA with the PROC ANOVA and PROC GLM procedures
6. Performing post-hoc multiple comparison tests in PROC GLM using Tukey's test

Section 6
##### Introduction to Segmentation Techniques: Factor Analysis

1. Introduction to Factor Analysis and various techniques
2. Principal Component Analysis (PCA) and Exploratory Factor Analysis (EFA)
3. Application of Factor Analysis using the PROC FACTOR procedure
4. Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy (MSA), Bartlett's Test of Sphericity
5. The MINEIGEN criterion, Scree plot
6. Introduction to the Factor Loading Matrix
7. Various rotation techniques such as Varimax

Section 7
##### Introduction to Segmentation Techniques: Cluster Analysis

1. Introduction to Cluster Analysis and various techniques
2. Hierarchical and Non-Hierarchical Clustering techniques
3. Using Hierarchical Clustering with the PROC TREE procedure in SAS
4. Performing K-means Clustering in SAS
5. Divisive Clustering, Agglomerative Clustering
6. Application of Cluster Analysis in Analytics, with profiling and interpretation of the clusters

Section 8
##### Correlation and Linear Regression

1. Introduction to Pearson's Correlation coefficient using the PROC CORR procedure
2. Correlation and Causation: Fitting a simple linear regression model with the PROC REG procedure
3. Understanding the concepts of Multiple Regression
4. Using automated model selection techniques in PROC REG to choose the best model
5. Interpretation of the model: overall fit of the model and finding the influential variables
6. Linear Regression diagnostics
7. Examining Residuals
8. Assessing Collinearity, Heteroskedasticity and Autocorrelation

Section 9
##### Introduction to Categorical Data Analysis and Logistic Regression

1. Comparison between Linear Regression and Logistic Regression
2. Performing Logistic Regression using the PROC LOGISTIC procedure in SAS
3. Assessing the Goodness of fit of the model
4. Introduction to Percent Concordant, AIC, SC, and the Hosmer-Lemeshow test
5. Receiver Operating Characteristic (ROC) Curve and Area Under the Curve (AUC)
6. Interpretation of the model: overall fit of the model and finding the influential variables using the Odds ratio criterion
7. Using automated model selection techniques in PROC LOGISTIC to choose the best model using the AIC criterion

Section 10
##### Introduction to Time Series Analysis

1. What is Time Series Analysis? Objectives and Assumptions of Time Series
2. Identifying patterns in Time Series data: Decomposition of time series data and general aspects of the analysis
3. Introduction to various Smoothing techniques: Simple Moving Average, Weighted Moving Average, Exponential Smoothing, Holt's Linear Exponential Smoothing
4. Examples of Seasonality and detecting Seasonality in Time Series data
5. Introduction to PROC FORECAST to generate forecasts for time series data
6. Autoregressive models and the Stepwise Autoregression (STEPAR) method
7. Autoregressive and Moving Average models and Introduction to the Box-Jenkins Methodology
8. Introduction to the Autoregressive Moving Average (ARMA) model
9. Autoregressive Integrated Moving Average (ARIMA) model
10. Building an ARIMA Model
11. Detection of Stationarity and Seasonality in an ARIMA Model
12. Detecting the order of the AR and MA terms of an ARIMA model using the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)
13. Detecting the order by using the AIC and BIC criteria
14. Estimation and forecasting using PROC ARIMA in SAS

**Measures of Dispersion**

Dispersion is the extent to which the individual observations in a data set vary. Measures of dispersion capture the degree of heterogeneity of a set of statistical observations around a central value. Measuring heterogeneity involves constructing estimators that provide a standard or representative value of the scatter, as a function of all the sample observations. Heterogeneity in the data, however, adversely affects the efficiency of an estimator: the greater the dispersion in a data set, the lower the efficiency of the estimator. Therefore, to form an estimator of sufficient efficiency, it is necessary to have an idea of the dispersion present in the data. The two main classes of measures of dispersion are:

- *Absolute Measure of Dispersion*
- *Relative Measure of Dispersion*

**Absolute Measure of Dispersion**

Absolute measures of dispersion are expressed in the units of the observations, so they depend on the unit of measurement. They are therefore useful for comparing the variability of two or more distributions measured in the same unit. The three main absolute measures of dispersion are:

- *Range*
- *Mean Deviation*
- *Standard Deviation*

**Range**

The range of a set of statistical observations is the difference between the highest and the lowest values in the set. It is the simplest measure of dispersion:

$$\text{Range}(X) = X_{\max} - X_{\min}$$

where $X$ is the set of observations $x_1, x_2, \ldots, x_n$, $X_{\max}$ is the maximum value in $X$, and $X_{\min}$ is the minimum value in $X$. The range can compare the variability of two or more distributions measured in the same units, but it cannot be used to compare distributions given in different units of measurement.
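As a small illustration of the definition of range, the following Python sketch computes $X_{\max} - X_{\min}$ for a list of observations. The function name `data_range` and the marks in `marks` are hypothetical and exist only for this example.

```python
def data_range(values):
    """Return Range(X) = X_max - X_min for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)


# Hypothetical marks obtained by five students
marks = [45, 62, 38, 71, 55]
print(data_range(marks))  # 33
```

Only the two extreme values enter the result, which is why the range is simple to compute but ignores how the remaining observations are spread.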

**Mean Deviation**

Mean deviation is the arithmetic average of the absolute deviations of the observations from a measure of central tendency, which may be the mean, median or mode. In practice it is usually calculated about the mean or the median, although it can be calculated about any arbitrary average $A$:

$$\text{MD}_A = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - A \rvert$$
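The definition of mean deviation can be sketched in Python as the average of the absolute deviations about a chosen point $A$. The helper `mean_deviation` and the data values are hypothetical; the mean and median come from the standard `statistics` module.

```python
from statistics import mean, median


def mean_deviation(values, about):
    """Arithmetic mean of the absolute deviations |x_i - A| about the point A."""
    return sum(abs(x - about) for x in values) / len(values)


# Hypothetical observations (mean = 6, median = 5)
data = [2, 4, 4, 5, 7, 9, 11]
print(mean_deviation(data, mean(data)))    # about the mean
print(mean_deviation(data, median(data)))  # about the median
print(mean_deviation(data, 8))             # about an arbitrary value A = 8
```

For this data the mean deviation is 18/7 about the mean, 17/7 about the median and 22/7 about A = 8. The smaller value about the median is no accident: the sum of absolute deviations is smallest when taken about the median.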

**Standard Deviation**

Standard deviation is considered an improvement over mean deviation. Instead of removing the signs of the deviations by taking their absolute values, it squares the deviations of the variable about $A$. The standard deviation is the positive square root of the arithmetic mean of these squared deviations, i.e. the root-mean-square deviation about $A$. It is conventionally measured about the arithmetic mean $\bar{x}$, because the root-mean-square deviation is smallest when taken about the mean:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

This minimal property is a striking feature of standard deviation as a measure of dispersion.
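The following Python sketch computes the root-mean-square deviation about an arbitrary point $A$ and checks numerically that it is smallest about the arithmetic mean. The function names and data are hypothetical. It uses the divisor $n$ (population standard deviation), matching the "arithmetic mean of the squared deviations" definition; sample-based estimates often divide by $n - 1$ instead.

```python
import math


def rms_deviation(values, about):
    """Root-mean-square deviation of the values about the point A."""
    return math.sqrt(sum((x - about) ** 2 for x in values) / len(values))


def standard_deviation(values):
    """Population standard deviation: the RMS deviation about the arithmetic mean."""
    mean = sum(values) / len(values)
    return rms_deviation(values, mean)


# Hypothetical observations with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))  # 2.0

# The RMS deviation about any point other than the mean is larger.
for a in (3, 4, 6, 7):
    print(a, rms_deviation(data, a))
```

The loop illustrates the identity $\frac{1}{n}\sum (x_i - A)^2 = \sigma^2 + (\bar{x} - A)^2$: moving $A$ away from the mean can only increase the result.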

**Relative Measure of Dispersion**

Relative measures of dispersion are independent of the units of measurement and are used to compare the dispersion of two or more distributions given in different units. Some of the most important relative measures of dispersion are:

- *Coefficient of Range*
- *Coefficient of Variation*
- *Coefficient of Mean Deviation*

**Coefficient of Range**

To compare the variability of one distribution with another when the measurements are given in different units, the absolute measure (the range) cannot be used. The relative version of the range, used for such comparisons, is called the coefficient of range. It is the ratio of the difference between the two extreme observations of the distribution to their sum:

$$\text{Coefficient of Range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}}$$
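A minimal Python sketch of the coefficient of range, assuming all observations are positive so that the denominator $X_{\max} + X_{\min}$ is non-zero. The function name and the height and weight data are hypothetical.

```python
def coefficient_of_range(values):
    """(X_max - X_min) / (X_max + X_min): a unit-free version of the range.

    Assumes the observations are positive, so the denominator is non-zero.
    """
    x_max, x_min = max(values), min(values)
    return (x_max - x_min) / (x_max + x_min)


# Hypothetical measurements of the same group, recorded in different units
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 60, 70, 90]
print(coefficient_of_range(heights_cm))
print(coefficient_of_range(weights_kg))
```

Here the heights give 30/330 (about 0.09) and the weights give 40/140 (about 0.29), so the weights are relatively more variable even though the two ranges (30 cm and 40 kg) are not directly comparable. Rescaling the data, for example converting centimetres to millimetres, leaves the coefficient unchanged.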

**Coefficient of Variation**

The relative measure of dispersion based on the standard deviation is called the coefficient of variation (C.V.). It is usually expressed as a percentage:

$$\text{C.V.} = \frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} \times 100$$

It is a pure number, independent of the units of measurement, and is therefore suitable for comparing the variability, homogeneity or uniformity of two or more distributions. A distribution with a smaller C.V. is said to be more homogeneous (less variable) than the other, and a distribution with a larger C.V. is said to be more heterogeneous.
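The coefficient of variation can be sketched in Python as 100 times the standard deviation divided by the mean. This sketch assumes the population standard deviation (divisor $n$) and a positive mean; the function name and the batting scores are hypothetical.

```python
import math


def coefficient_of_variation(values):
    """C.V. = 100 * (population standard deviation / arithmetic mean), in percent.

    Assumes the arithmetic mean is positive.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return 100 * sd / mean


# Hypothetical runs scored by two batsmen over five innings
batsman_a = [40, 50, 60, 50, 50]
batsman_b = [10, 90, 30, 70, 50]
print(coefficient_of_variation(batsman_a))
print(coefficient_of_variation(batsman_b))
```

Both batsmen average 50 runs, but A's C.V. (about 12.6%) is much smaller than B's (about 56.6%), so A is the more consistent, i.e. more homogeneous, scorer.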

**Coefficient of Mean Deviation**

The coefficient of mean deviation is the relative measure associated with mean deviation. It is defined as the ratio of the mean deviation to the average about which it has been calculated.
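A short Python sketch of the coefficient of mean deviation: the mean deviation about an average $A$, divided by $A$ itself. It assumes $A$ is non-zero; the function name and data are hypothetical, and the mean and median come from the standard `statistics` module.

```python
from statistics import mean, median


def coefficient_of_mean_deviation(values, about):
    """Mean deviation about the average A, divided by A (A must be non-zero)."""
    md = sum(abs(x - about) for x in values) / len(values)
    return md / about


# Hypothetical observations (mean = 6, median = 5)
data = [2, 4, 4, 5, 7, 9, 11]
print(coefficient_of_mean_deviation(data, mean(data)))    # about the mean
print(coefficient_of_mean_deviation(data, median(data)))  # about the median
```

For this data the coefficient is (18/7)/6 = 3/7 about the mean and (17/7)/5 = 17/35 about the median. The divisor must be the same average used to compute the mean deviation.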