Section 1
##### Introduction to Analytics

1

Introduction to Excel

2

Conditional Formatting

3

Data Summarization techniques

4

Graphical summary using SAS/GRAPH: Introduction to Bar graph

5

Graphical summary using SAS/GRAPH: Introduction to Pie graph

6

Graphical summary using SAS/GRAPH: Introduction to Histogram, Box plots, Scatter diagram

7

Descriptive Statistics-Introduction to various measures of Central Tendency

8

Introduction to the measures of Dispersion: Range, Mean Deviation, Standard Deviation

Section 2
##### Understanding Probability and Probability Distribution

1

Introduction to Probability theory

2

Types of probability distribution – Discrete Distribution and Continuous distribution

3

Understanding Probability Mass Function and Probability Density Function

4

Normal Distribution and Standard Normal Distribution

5

Normal plot using Proc GPLOT procedure in SAS

6

Application of Normal distribution in Analytics with real life examples

7

Binomial Distribution and Binomial plot using PROC GPLOT procedure in SAS

8

Poisson distribution and Poisson plot using Proc GPLOT procedure in SAS

9

Application of Binomial and Poisson distribution in Analytics with real life examples

Section 3
##### Introduction to Sampling Theory and Estimation

1

Concept of Population and Sample

2

Use of PROC SURVEYSELECT procedure in SAS

3

Introduction to Some important terminologies

4

Parameter and Statistic

5

Properties of a good estimator

6

Standard Deviation and Standard Error

7

Point and Interval Estimation

8

Confidence level and level of Significance

9

Constructing Confidence Intervals

10

Formulation of Null and Alternative hypothesis

11

Performing simple test of Hypothesis

Section 4

Section 5
##### Statistical Significance of T-Tests, Chi-Square Tests and Analysis of Variance

1

Performing test of one sample mean using Proc ttest

2

Difference between two group means (independent sample) using Proc ttest

3

Difference between two group means (paired sample) using Proc ttest

4

Performing Chi-square tests: Test of Independence

5

Performing one-way ANOVA with PROC ANOVA and PROC GLM procedure

6

Performing post-hoc multiple comparisons tests in PROC GLM using Tukey’s mean test

Section 6
##### Introduction to Segmentation Techniques: Factor Analysis

1

Introduction to Factor Analysis and various techniques

2

Principal Component Analysis (PCA) and Exploratory Factor Analysis (EFA)

3

Application of Factor Analysis using Proc Factor procedure

4

KMO MSA test, Bartlett’s Test of Sphericity

5

The Mineigen Criterion, Scree plot

6

Introduction to Factor Loading Matrix

7

Various rotation techniques like Varimax

Section 7
##### Introduction to Segmentation Techniques: Cluster Analysis

1

Introduction to Cluster Analysis and various techniques

2

Hierarchical and Non-Hierarchical Clustering techniques

3

Using Hierarchical Clustering by Proc Tree procedure in SAS

4

Performing K-means Clustering in SAS

5

Divisive Clustering, Agglomerative Clustering

6

Application of Cluster Analysis in Analytics with profiling of the clusters and interpretation of the clusters

Section 8
##### Correlation and Linear Regression

1

Introduction to Pearson’s Correlation coefficient using PROC CORR procedure

2

Correlation and Causation – Fitting a simple linear regression model with the Proc REG procedure

3

Understanding the concepts of Multiple Regression

4

Using automated model selection techniques in PROC REG to choose the best model

5

Interpretation of the model: overall fit of the model and finding out the influential variables

6

Linear Regression diagnostics

7

Examining Residuals

8

Assessing Collinearity, Heteroskedasticity and Autocorrelation

Section 9
##### Introduction to Categorical Data Analysis and Logistic Regression

1

Comparison between Linear Regression and Logistic Regression

2

Performing Logistic regression using Proc Logistic Procedure in SAS

3

Performing Goodness of fit of the model

4

Introduction to Percent Concordant, AIC, SC, and Hosmer-Lemeshow

5

Receiver Operating Characteristics (ROC) Curve and Area under Curve (AUC)

6

Interpretation of the model: overall fit of the model and finding out the influential variables using Odds ratio criteria

7

Using automated model selection techniques in PROC Logistic to choose the best model using AIC criteria

Section 10
##### Introduction to Time Series Analysis

1

What is Time series Analysis, Objectives and Assumptions of Time Series

2

Identifying pattern in Time series data: Decomposition of the time series data and general aspect of the analysis

3

Introduction to Various Smoothing techniques: Simple Moving Average, Weighted Moving Average, Exponential Smoothing, Holt’s Linear Exponential Smoothing

4

Examples of Seasonality and detecting Seasonality in Time series data

5

Introduction to Proc Forecast to generate forecast for time series data

6

Autoregressive models and Stepwise Autoregression (STEPAR) procedure

7

Autoregressive and Moving Average models and Introduction to Box Jenkins Methodology

8

Introduction to Autoregressive Moving Average (ARMA) model

9

Autoregressive Integrated Moving Average (ARIMA) model

10

Building an ARIMA Model

11

Detection of Stationarity, Seasonality in ARIMA Model

12

Detecting the order of AR and MA of ARIMA model by Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)

13

Detecting the order by using AIC and BIC criterion

14

Estimation and forecast using Proc ARIMA in SAS

__Histogram__

__Quantitative Data__

Quantitative data consists of numerical observations such as sales or profits. The main techniques for presenting quantitative data are:

- Histogram
- Scatter Plot

In this section we will look at histograms in depth and then see how to create them in SAS.

__What is a Histogram?__

A histogram is a graphical representation of the distribution of data. It gives an estimate of the probability distribution of a continuous variable, usually in the form of a bar graph, and was first introduced by Karl Pearson in 1891.

The first step in creating a histogram is to divide the entire range of values into a series of intervals called “bins” and then to “drop” each individual value into the bin it belongs to. Bin widths are chosen to cover the range of the data and may or may not be equal. If the bins are of equal width, the height of each bar represents the frequency of occurrence for that bin. If the bins are not of equal width, the area of each bar represents the frequency, while the vertical axis represents the density (frequency per unit of bin width). In both cases, adjacent bars touch to indicate that the variable is continuous.
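As a small worked illustration of the unequal-width case, the density of bin $i$ can be written in terms of its frequency $f_i$ and its width $w_i$. The numbers below are hypothetical and chosen only to keep the arithmetic simple. Some texts also divide by the total number of observations so that the areas sum to 1; the unnormalized form is assumed here.

```latex
\[
  d_i = \frac{f_i}{w_i}, \qquad \text{area}_i = d_i \, w_i = f_i
\]
% Hypothetical example: two bins that each contain 20 observations.
% Bin [0, 10):  w_1 = 10, f_1 = 20  =>  d_1 = 20 / 10 = 2
% Bin [10, 30): w_2 = 20, f_2 = 20  =>  d_2 = 20 / 20 = 1
```

Both bars have the same area (20) even though the first is twice as tall, which is why area rather than height carries the frequency when bins differ in width.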

A histogram can be used to visualize any data or phenomenon that has both a continuous factor and an occurrence factor. For example, it can show the commute times of people going to work: the horizontal axis represents time, divided into bins, while the vertical axis represents the number of people whose travel time falls in each bin.

In its most common form, a histogram uses rectangles to show the frequency of data items in successive numerical intervals of equal size. The independent variable is plotted along the horizontal axis and the dependent variable (the frequency) along the vertical axis. The data appear as colored or shaded rectangles of varying area.

__Applications of Histograms__

- **Identifying the most common process outcome:** Collecting all data on the final state of a process and organizing it in a histogram makes the most frequent outcomes, and any unusual trends, quickly apparent.
- **Identifying data symmetry:** A histogram helps us judge whether a variable is roughly symmetric and bell-shaped (approximately normal) or skewed. This matters in analytics because many techniques assume approximately normally distributed variables, so the check should come before those techniques are applied.
- **Spotting deviations:** A histogram is one of the most useful tools for spotting oddities and worrying trends. Keeping the histograms produced in the course of your work and referring back to them makes analysis easier, because you can tell whether a deviation stems from an old issue or from a recent change in your operations.
- **Spotting areas that require little effort:** A histogram can also show when you are spending too much effort or resources on a specific task. Sometimes a part of your process does not need as much attention as you think, and a histogram of the current resource allocation can reveal that immediately.

Let’s now turn to how we can create histograms in SAS.

**PROC UNIVARIATE** DATA=mylib.CANDY_SALES_SUMMARY;

VAR SALE_AMOUNT;

HISTOGRAM SALE_AMOUNT;

**RUN;**

This code produces a graphical representation of quantitative data. PROC UNIVARIATE generates the key descriptive statistics for a variable; here the variable, named in the VAR statement, is SALE_AMOUNT. The HISTOGRAM statement requests a histogram of that variable, which is drawn as a two-dimensional diagram by default.
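The HISTOGRAM statement also accepts options after a slash. The following is a minimal sketch, not part of the original course code, that reuses the same data set and variable. The bin boundaries passed to ENDPOINTS= are hypothetical, since the actual range of SALE_AMOUNT is not given here, and should be adapted to the data.

```sas
/* Sketch only: the bin boundaries are assumed values, not taken from the course data. */
PROC UNIVARIATE DATA=mylib.CANDY_SALES_SUMMARY NOPRINT;  /* NOPRINT suppresses the statistics tables */
   VAR SALE_AMOUNT;
   /* ENDPOINTS= sets the bin boundaries (hypothetical: 0 to 1000 in steps of 100).   */
   /* VSCALE=COUNT labels the vertical axis with frequencies instead of percentages.  */
   /* NORMAL overlays a fitted normal density curve to help judge symmetry.           */
   HISTOGRAM SALE_AMOUNT / ENDPOINTS=0 TO 1000 BY 100
                           VSCALE=COUNT
                           NORMAL;
RUN;
```

Comparing the bars with the overlaid normal curve gives a quick visual check of whether the variable is roughly symmetric or skewed.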

**PROC UNIVARIATE** DATA=mylib.CANDY_SALES_SUMMARY;

CLASS SUBCATEGORY;

VAR SALE_AMOUNT;

HISTOGRAM SALE_AMOUNT;

**RUN;**

As before, PROC UNIVARIATE generates the descriptive statistics for the variable SALE_AMOUNT in the data set CANDY_SALES_SUMMARY, and the HISTOGRAM statement constructs a histogram for the same variable. The CLASS statement splits the analysis by SUBCATEGORY, so the statistics and the histogram are produced separately for each subcategory rather than once for the whole data set.
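When a CLASS variable is present, the histograms for the class levels are arranged as panels of one comparative histogram. As a hedged sketch, the layout of those panels can be adjusted with the NROWS= option. The value 3 below is an assumption, because the number of distinct SUBCATEGORY values is not stated here.

```sas
/* Sketch only: NROWS=3 is a hypothetical choice; set it to suit the number of subcategories. */
PROC UNIVARIATE DATA=mylib.CANDY_SALES_SUMMARY NOPRINT;
   CLASS SUBCATEGORY;          /* one histogram panel per subcategory         */
   VAR SALE_AMOUNT;
   /* NROWS= sets how many panels are stacked vertically in the comparative   */
   /* histogram; VSCALE=PERCENT shows each bar as a percentage within a level. */
   HISTOGRAM SALE_AMOUNT / NROWS=3
                           VSCALE=PERCENT;
RUN;
```

Because the panels share a common horizontal axis, stacking them makes it easier to compare the shape and spread of sales across subcategories.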

The diagram below would be the output of the above code.