Section 1
##### Introduction to Analytics

1. Introduction to Excel
2. Conditional Formatting
3. Data Summarization techniques
4. Graphical summary using SAS/GRAPH: Introduction to Bar graph
5. Graphical summary using SAS/GRAPH: Introduction to Pie graph
6. Graphical summary using SAS/GRAPH: Introduction to Histogram, Box plots, Scatter diagram
7. Descriptive Statistics: Introduction to various measures of Central Tendency
8. Introduction to the measures of Dispersion: Range, Mean Deviation, Standard Deviation

Section 2
##### Understanding Probability and Probability Distribution

1. Introduction to Probability theory
2. Types of probability distribution – Discrete distribution and Continuous distribution
3. Understanding Probability Mass Function and Probability Density Function
4. Normal Distribution and Standard Normal Distribution
5. Normal plot using PROC GPLOT procedure in SAS
6. Application of Normal distribution in Analytics with real life examples
7. Binomial Distribution and Binomial plot using PROC GPLOT procedure in SAS
8. Poisson distribution and Poisson plot using PROC GPLOT procedure in SAS
9. Application of Binomial and Poisson distribution in Analytics with real life examples

Section 3
##### Introduction to Sampling Theory and Estimation

1. Concept of Population and Sample
2. Use of PROC SURVEYSELECT procedure in SAS
3. Introduction to some important terminologies
4. Parameter and Statistic
5. Properties of a good estimator
6. Standard Deviation and Standard Error
7. Point and Interval Estimation
8. Confidence level and level of Significance
9. Constructing Confidence Intervals
10. Formulation of Null and Alternative hypothesis
11. Performing simple test of Hypothesis

Section 4

Section 5
##### Statistical Significance of T-Tests, Chi-Square Tests, and Analysis of Variance

1. Performing test of one sample mean using PROC TTEST
2. Difference between two group means (independent sample) using PROC TTEST
3. Difference between two group means (paired sample) using PROC TTEST
4. Performing Chi-square tests: Test of Independence
5. Performing one-way ANOVA with PROC ANOVA and PROC GLM procedure
6. Performing post-hoc multiple comparisons tests in PROC GLM using Tukey’s mean test

Section 6
##### Introduction to Segmentation Techniques: Factor Analysis

1. Introduction to Factor Analysis and various techniques
2. Principal Component Analysis (PCA) and Exploratory Factor Analysis (EFA)
3. Application of Factor Analysis using PROC FACTOR procedure
4. KMO MSA test, Bartlett’s Test of Sphericity
5. The Mineigen Criterion, Scree plot
6. Introduction to Factor Loading Matrix
7. Various rotation techniques like Varimax

Section 7
##### Introduction to Segmentation Techniques: Cluster Analysis

1. Introduction to Cluster Analysis and various techniques
2. Hierarchical and Non-Hierarchical Clustering techniques
3. Using Hierarchical Clustering by PROC TREE procedure in SAS
4. Performing K-means Clustering in SAS
5. Divisive Clustering, Agglomerative Clustering
6. Application of Cluster Analysis in Analytics with profiling and interpretation of the clusters

Section 8
##### Correlation and Linear Regression

1. Introduction to Pearson’s Correlation coefficient using PROC CORR procedure
2. Correlation and Causation – Fitting a simple linear regression model with the PROC REG procedure
3. Understanding the concepts of Multiple Regression
4. Using automated model selection techniques in PROC REG to choose the best model
5. Interpretation of the model: overall fit of the model and finding out the influential variables
6. Linear Regression diagnostics
7. Examining Residuals
8. Assessing Collinearity, Heteroskedasticity and Autocorrelation

Section 9
##### Introduction to Categorical Data Analysis and Logistic Regression

1. Comparison between Linear Regression and Logistic Regression
2. Performing Logistic regression using PROC LOGISTIC procedure in SAS
3. Performing Goodness of fit of the model
4. Introduction to Percent Concordant, AIC, SC, and Hosmer-Lemeshow
5. Receiver Operating Characteristics (ROC) Curve and Area under Curve (AUC)
6. Interpretation of the model: overall fit of the model and finding out the influential variables using Odds ratio criteria
7. Using automated model selection techniques in PROC LOGISTIC to choose the best model using AIC criteria

Section 10
##### Introduction to Time Series Analysis

1. What is Time series Analysis, Objectives and Assumptions of Time Series
2. Identifying pattern in Time series data: Decomposition of the time series data and general aspect of the analysis
3. Introduction to Various Smoothing techniques: Simple Moving Average, Weighted Moving Average, Exponential Smoothing, Holt’s Linear Exponential Smoothing
4. Examples of Seasonality and detecting Seasonality in Time series data
5. Introduction to PROC FORECAST to generate forecast for time series data
6. Autoregressive models and Stepwise Autoregression (STEPAR) procedure
7. Autoregressive and Moving Average models and Introduction to Box Jenkins Methodology
8. Introduction to Autoregressive Moving Average (ARMA) model
9. Autoregressive Integrated Moving Average (ARIMA) model
10. Building an ARIMA Model
11. Detection of Stationarity, Seasonality in ARIMA Model
12. Detecting the order of AR and MA of ARIMA model by Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)
13. Detecting the order by using AIC and BIC criterion
14. Estimation and forecast using PROC ARIMA in SAS

**Qualitative Data**
This kind of data comprises attributes and qualitative variables such as gender, race, or age group. Some important techniques for representing qualitative data are:

- Bar Charts
- Pie-Charts
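
As a minimal sketch of the second technique, a pie chart of a qualitative variable can be drawn with the PIE statement of PROC GCHART. The example assumes the SASHELP.CLASS sample table that ships with SAS and its character variable Sex; neither belongs to this tutorial's own data.

```sas
/* Assumption: SASHELP.CLASS is the sample table shipped with SAS; */
/* it is used here only for illustration.                          */
PROC GCHART DATA = SASHELP.CLASS;
    /* One slice per value of Sex; TYPE=PERCENT sizes each slice */
    /* by its share of the observations.                         */
    PIE SEX / TYPE=PERCENT;
RUN;
QUIT;
```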

**Quantitative Data**
Quantitative data consists of numerical observations such as sales or profits. The main techniques for presenting quantitative data are:

- Histogram
- Scatter Plot
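
Both techniques can be sketched in a few lines of SAS. The example below is one possible approach, not the only one: it assumes the SASHELP.CLASS sample table shipped with SAS and its numeric variables Height and Weight, and it uses the HISTOGRAM statement of PROC UNIVARIATE for the histogram and PROC GPLOT for the scatter plot.

```sas
/* Assumption: SASHELP.CLASS (sample table shipped with SAS) with */
/* numeric variables HEIGHT and WEIGHT, used for illustration.    */

/* Histogram of a single quantitative variable. */
/* NOPRINT suppresses the statistics tables.    */
PROC UNIVARIATE DATA = SASHELP.CLASS NOPRINT;
    HISTOGRAM HEIGHT;
RUN;

/* Scatter plot of two quantitative variables.            */
/* The PLOT request is written as vertical*horizontal.    */
SYMBOL1 VALUE=DOT;
PROC GPLOT DATA = SASHELP.CLASS;
    PLOT WEIGHT*HEIGHT;
RUN;
QUIT;
```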

**Bar Charts**
A bar graph or a bar chart represents data visually using bars of different heights or lengths. Data is graphed either horizontally or vertically, allowing viewers to compare different values and draw conclusions quickly and easily. A typical bar graph has labels, axes, scales, and bars, which represent measurable values such as amounts or percentages. Bar graphs are used to display all kinds of data, from quarterly sales and job growth to seasonal rainfall and crop yields.

The bars on a bar graph may be the same color, though different colors are sometimes used to distinguish between groups or categories to make the data easier to read and interpret. Bar graphs have a labeled x-axis (horizontal axis) and y-axis (vertical axis). When experimental data is graphed, the independent variable is graphed on the x-axis, while the dependent variable is graphed on the y-axis.

**Types of Bar Graphs**
Bar graphs take different forms depending on the type and complexity of the data they represent. In some cases they can be as simple as two bars, such as a graph representing the vote totals of two competing political candidates. As the information becomes more complex, so does the graph, which may take the form of a grouped (clustered) bar graph or a stacked bar graph.

**Single:** Single bar graphs are used to convey the discrete value of one item for each category shown on the opposing axis. An example would be a representation of the number of males in grades 4-6 for each of the years 1995 to 2010. The actual number (discrete value) could be represented by a bar sized to scale, with the scale appearing on the X-axis. The Y-axis would display the corresponding years. The longest bar on the graph would represent the year between 1995 and 2010 in which the number of males in grades 4-6 reached its greatest value. The shortest bar would represent the year in which that number reached its lowest value.

**Grouped:** A grouped or clustered bar graph is used to represent discrete values for more than one item that share the same category. In the single bar graph example above, only one item (the number of males in grades 4-6) is represented. But one could very easily modify the graph by adding a second value that includes the number of females in grades 4-6. The bars representing each gender by year would be grouped together and color-coded to make it clear which bars represent the male and female values. This grouped bar graph would then allow readers to easily compare the number of students enrolled in grades 4-6 both by year and by gender.

**Stacked:** Some bar graphs have each bar divided into subparts that represent the discrete values for items that constitute a portion of the whole group. For instance, in the examples above, students in grades 4-6 are grouped together and represented by a single bar. This bar could be broken into subsections to represent the proportion of students in each grade. Again, color coding would be needed to make the graph readable.

To construct bar charts in SAS, we first assign a library reference (libref) to the folder that contains the datasets, using the LIBNAME statement below.

LIBNAME mylib "E:\Study Material\Datasets";
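
Before charting, it can help to confirm that the libref was assigned and that the expected data set is visible. A minimal sketch, assuming the `mylib` libref from the statement above:

```sas
/* List the members of the MYLIB library.                   */
/* NODS suppresses the per-data-set variable descriptions,  */
/* so only the directory of members is printed.             */
PROC CONTENTS DATA = MYLIB._ALL_ NODS;
RUN;
```

If CANDY_SALES_SUMMARY does not appear in the listing, the path in the LIBNAME statement is the first thing to check.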

PROC GCHART DATA = MYLIB.CANDY_SALES_SUMMARY;

HBAR3D SUBCATEGORY;

RUN;

The above code creates a horizontal 3D bar chart based on the variable “Subcategory” in the data set “Candy_Sales_Summary” (stored in the “mylib” library). HBAR3D is the PROC GCHART statement that generates the horizontal 3D bar graph. Because no SUMVAR= option is specified, the length of each bar shows the frequency of the corresponding subcategory. This form of representation is useful when we are representing spatial data.

PROC GCHART DATA = MYLIB.CANDY_SALES_SUMMARY;

VBAR3D SUBCATEGORY/SUM SUMVAR=SALE_AMOUNT;

RUN;

This code generates a 3D vertical bar graph for the variable Subcategory. SUMVAR=SALE_AMOUNT makes the height of each bar the total sale amount for that subcategory, and the SUM option prints that total on top of each bar.
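
The totals printed on the bars can be cross-checked against a tabular summary. The sketch below uses PROC MEANS with the same data set and variables as the chart; it is offered as one convenient check and is not part of the original walkthrough.

```sas
/* Total SALE_AMOUNT for each value of SUBCATEGORY.        */
/* The SUM keyword restricts the output to the sum only.   */
PROC MEANS DATA = MYLIB.CANDY_SALES_SUMMARY SUM;
    CLASS SUBCATEGORY;
    VAR SALE_AMOUNT;
RUN;
```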

PROC GCHART DATA = MYLIB.CANDY_SALES_SUMMARY;

VBAR3D FISCAL_YEAR/SUM SUMVAR=SALE_AMOUNT

GROUP=CATEGORY SUBGROUP=SUBCATEGORY;

RUN;

This code generates a sub-divided multiple bar diagram. The chart variable FISCAL_YEAR defines the bars, GROUP=CATEGORY draws a separate group of bars for each category, and SUBGROUP=SUBCATEGORY divides each bar into segments that show the sales of each subcategory for the given fiscal year.
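
To connect this chart back to the stacked and grouped types described earlier, the two options can also be used separately. The following is a minimal sketch on the same data set. The DISCRETE option is an assumption added here: it supposes FISCAL_YEAR is stored as a numeric variable and asks for one bar per distinct year instead of binned midpoints.

```sas
/* Stacked only: one bar per fiscal year, divided into */
/* segments by subcategory.                            */
PROC GCHART DATA = MYLIB.CANDY_SALES_SUMMARY;
    VBAR FISCAL_YEAR / DISCRETE SUMVAR=SALE_AMOUNT
                       SUBGROUP=SUBCATEGORY;
RUN;
QUIT;

/* Grouped only: one cluster of fiscal-year bars per   */
/* category, with no subdivision inside the bars.      */
PROC GCHART DATA = MYLIB.CANDY_SALES_SUMMARY;
    VBAR FISCAL_YEAR / DISCRETE SUMVAR=SALE_AMOUNT
                       GROUP=CATEGORY;
RUN;
QUIT;
```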