Section 1
##### Introduction to Analytics

1

Introduction to Excel

2

Conditional Formatting

3

Data Summarization techniques

4

Graphical summary using SAS/GRAPH: Introduction to Bar graph

5

Graphical summary using SAS/GRAPH: Introduction to Pie graph

6

Graphical summary using SAS/GRAPH: Introduction to Histogram, Box plots, Scatter diagram

7

Descriptive Statistics-Introduction to various measures of Central Tendency

8

Introduction to the measures of Dispersion: Range, Mean Deviation, Standard Deviation

Section 2
##### Understanding Probability and Probability Distribution

9

Introduction to Probability theory

10

Types of probability distribution – Discrete Distribution and Continuous distribution

11

Understanding Probability Mass Function and Probability Density Function

12

Normal Distribution and Standard Normal Distribution

13

Normal plot using Proc GPLOT procedure in SAS

14

Application of Normal distribution in Analytics with real life examples

15

Binomial Distribution and Binomial plot using PROC GPLOT procedure in SAS

16

Poisson distribution and Poisson plot using Proc GPLOT procedure in SAS

17

Application of Binomial and Poisson distribution in Analytics with real life examples

Section 3
##### Introduction to Sampling Theory and Estimation

18

Concept of Population and Sample

19

Use of PROC SURVEYSELECT procedure in SAS

20

Introduction to Some important terminologies

21

Parameter and Statistic

22

Properties of a good estimator

23

Standard Deviation and Standard Error

24

Point and Interval Estimation

25

Confidence level and level of Significance

26

Constructing Confidence Intervals

27

Formulation of Null and Alternative hypothesis

28

Performing simple test of Hypothesis

Section 4

Section 5
##### Statistical Significance of T-Tests, Chi-Square Tests and Analysis of Variance

29

Performing test of one sample mean using Proc ttest

30

Difference between two group means (independent sample) using Proc ttest

31

Difference between two group means (paired sample) using Proc ttest

32

Performing Chi-square tests: Test of Independence

33

Performing one-way ANOVA with PROC ANOVA and PROC GLM procedure

34

Performing post-hoc multiple comparisons tests in PROC

35

GLM using Tukey’s mean test

Section 6
##### Introduction to Segmentation Techniques: Factor Analysis

36

Introduction to Factor Analysis and various techniques

37

Principal Component Analysis (PCA) and Exploratory Factor Analysis (EFA)

38

Application of Factor Analysis using Proc Factor procedure

39

KMO MSA test, Bartlett’s Test of Sphericity

40

The Mineigen Criterion, Scree plot

41

Introduction to Factor Loading Matrix

42

Various rotation techniques like Varimax

Section 7
##### Introduction to Segmentation Techniques: Cluster Analysis

43

Introduction to Cluster Analysis and various techniques

44

Hierarchical and Non-Hierarchical Clustering techniques

45

Using Hierarchical Clustering by Proc Tree procedure in SAS

46

Performing K-means Clustering in SAS

47

Divisive Clustering, Agglomerative Clustering

48

Application of Cluster Analysis in Analytics with profiling of the clusters and interpretation of the clusters

Section 8
##### Correlation and Linear Regression

49

Introduction to Pearson’s Correlation coefficient using PROC CORR procedure

50

Correlation and Causation – Fitting a simple linear regression model with the Proc REG procedure

51

Understanding the concepts of Multiple Regression

52

Using automated model selection techniques in PROC REG to choose the best model

53

Interpretation of the model: overall fit of the model and finding out the influential variables

54

Linear Regression diagnostics

55

Examining Residual

56

Assessing Collinearity, Heteroskedasticity and Autocorrelation

Section 9
##### Introduction to Categorical Data Analysis and Logistic Regression

57

Comparison between Linear Regression and Logistic Regression

58

Performing Logistic regression using Proc Logistic Procedure in SAS

59

Performing Goodness of fit of the model

60

Introduction to Percent Concordant, AIC, SC, and Hosmer-Lemeshow

61

Receiver Operating Characteristics (ROC) Curve and Area under Curve (AUC)

62

Interpretation of the model: overall fit of the model and finding out the influential variables using Odds ratio criteria

63

Using automated model selection techniques in PROC Logistic to choose the best model using AIC criteria

Section 10
##### Introduction to Time Series Analysis

64

What is Time series Analysis, Objectives and Assumptions of Time Series

65

Identifying pattern in Time series data: Decomposition of the time series data and general aspect of the analysis

66

Introduction to Various Smoothing techniques: Simple Moving Average, Weighted Moving Average, Exponential Smoothing, Holt’s Linear Exponential Smoothing

67

Examples of Seasonality and detecting Seasonality in Time series data

68

Introduction to Proc Forecast to generate forecast for time series data

69

Autoregressive models and Stepwise Autoregression (STEPAR) procedure

70

Autoregressive and Moving Average models and Introduction to Box Jenkins Methodology

71

Introduction to Autoregressive Moving Average (ARMA) model

72

Autoregressive Integrated Moving Average (ARIMA) model

73

Building an ARIMA Model

74

Detection of Stationarity, Seasonality in ARIMA Model

75

Detecting the order of AR and MA of ARIMA model by Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)

76

Detecting the order by using AIC and BIC criterion

77

Estimation and forecast using Proc ARIMA in SAS

**Qualitative Data**
This kind of data comprises attributes and qualitative variables such as gender, race, or age group. Some important techniques for representing qualitative data are:

- Bar Charts
- Pie-Charts

**Quantitative Data**
Quantitative data consists of numerical observations such as sales or profits. The main techniques for presenting quantitative data are:

- Histogram
- Scatter Plot

**Bar Charts**
A bar graph or bar chart represents data visually using bars of different heights or lengths. Data is graphed either horizontally or vertically, allowing viewers to compare different values and draw conclusions quickly and easily. A typical bar graph has labels, axes, scales, and bars, which represent measurable values such as amounts or percentages. Bar graphs are used to display all kinds of data, from quarterly sales and job growth to seasonal rainfall and crop yields.

The bars on a bar graph may be the same color, though different colors are sometimes used to distinguish between groups or categories to make the data easier to read and interpret. Bar graphs have a labeled x-axis (horizontal axis) and y-axis (vertical axis). When experimental data is graphed, the independent variable is graphed on the x-axis, while the dependent variable is graphed on the y-axis.

**Types of Bar Graphs**
Bar graphs take different forms depending on the type and complexity of the data they represent. They can be as simple, in some cases, as two bars, such as a graph representing the vote totals of two competing political candidates. As the information becomes more complex, so will the graph, which may even take the form of a grouped or clustered bar graph or a stacked bar graph.

**Single:** Single bar graphs are used to convey the discrete value of the item for each category shown on the opposing axis. An example would be a representation of the number of males in grades 4-6 for each of the years 1995 to 2010. The actual number (discrete value) could be represented by a bar sized to scale, with the scale appearing on the X-axis. The Y-axis would display the corresponding years. The longest bar on the graph would represent the year from 1995 to 2010 in which the number of males in grades 4-6 reached its greatest value. The shortest bar would represent the year in which the number of males in grades 4-6 reached its lowest value.

**Grouped:** A grouped or clustered bar graph is used to represent discrete values for more than one item that share the same category. In the single bar graph example above, only one item (the number of males in grades 4-6) is represented. But one could very easily modify the graph by adding a second value that includes the number of females in grades 4-6. The bars representing each gender by year would be grouped together and color-coded to make it clear which bars represent the male and female values. This grouped bar graph would then allow readers to easily compare the number of students enrolled in grades 4-6 both by year and by gender.

**Stacked:** Some bar graphs have each bar divided into subparts that represent the discrete values for items that constitute a portion of the whole group. For instance, in the examples above, students in grades 4-6 are grouped together and represented by a single bar. This bar could be broken into subsections to represent the proportion of students in each grade. Again, color coding would be needed to make the graph readable.

To construct bar charts in SAS, we first assign a library reference (libref) to the folder that contains the datasets, using the code below.

LIBNAME mylib "E:\Study Material\Datasets";

**PROC GCHART** DATA = MYLIB.CANDY_SALES_SUMMARY;

HBAR3D SUBCATEGORY;

**RUN;**

The above code creates a horizontal 3D bar chart of the variable “Subcategory” from the data set “Candy_Sales_Summary” (stored in the “mylib” library). HBAR3D is the PROC GCHART statement that generates the horizontal 3D bar graph. Because no summary variable is specified, the length of each bar shows by default the frequency (number of observations) of each subcategory. This form of representation is useful when we are representing spatial data.
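As a quick cross-check on what the horizontal chart displays, the same frequencies can be tabulated. This is a minimal sketch, assuming the `mylib` libref has already been assigned and that `Subcategory` is a categorical variable in `Candy_Sales_Summary`. PROC FREQ is not part of the original walkthrough; it is used here only to verify the bar lengths.

```sas
/* Cross-check: tabulate the counts that HBAR3D plots by default. */
/* Assumes the LIBNAME statement above has already been run.      */
PROC FREQ DATA = mylib.candy_sales_summary;
  TABLES subcategory / NOCUM;  /* NOCUM suppresses the cumulative columns */
RUN;
```

Each row of the resulting table should correspond to one bar in the chart.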

**PROC GCHART** DATA = mylib.CANDY_SALES_SUMMARY;

VBAR3D SUBCATEGORY/SUM SUMVAR=SALE_AMOUNT;

**RUN;**

This code generates a 3D vertical bar graph for the variable Subcategory. SUMVAR=SALE_AMOUNT makes the height of each bar equal to the total sale amount for that subcategory, and the SUM option prints that total on top of each bar.
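The totals shown on the bars can be verified with a summary table. The sketch below assumes the same `mylib.candy_sales_summary` data set and the variable names used in the chart code. PROC MEANS is an addition for checking purposes and is not part of the original example.

```sas
/* Cross-check: total sale amount per subcategory,       */
/* i.e. the values printed above the bars in the chart.  */
PROC MEANS DATA = mylib.candy_sales_summary SUM MAXDEC=2;
  CLASS subcategory;   /* one row per subcategory        */
  VAR sale_amount;     /* variable being summed          */
RUN;
```

The SUM column of this output should match the figures displayed on top of the vertical bars.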

**PROC GCHART** DATA = MYLIB.CANDY_SALES_SUMMARY;

VBAR3D FISCAL_YEAR/SUM SUMVAR=SALE_AMOUNT

GROUP=CATEGORY SUBGROUP=SUBCATEGORY;

**RUN;**

This code generates a subdivided multiple bar diagram. The GROUP= option produces a separate set of bars for each category, with one bar per fiscal year, and the SUBGROUP= option divides each bar into segments that show the sales of each subcategory for that fiscal year.
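A plain stacked bar graph, without the grouping by category, can be sketched by keeping SUBGROUP= and dropping GROUP=. The example below rests on an assumption the source does not confirm: that `Fiscal_Year` is stored as a numeric variable. In that case PROC GCHART may treat it as continuous and bin the years into midpoint ranges, so the DISCRETE option is added to request one bar per distinct year.

```sas
/* Stacked bars only: one bar per fiscal year, split by subcategory. */
/* Assumes the mylib libref has been assigned as shown earlier.      */
PROC GCHART DATA = mylib.candy_sales_summary;
  VBAR3D fiscal_year / DISCRETE               /* one bar per distinct year value */
                       SUMVAR = sale_amount   /* bar height = total sales        */
                       SUBGROUP = subcategory;/* segments within each bar        */
RUN;
QUIT;  /* PROC GCHART supports RUN-group processing; QUIT ends the procedure */
```

Comparing this output with the grouped version shows how GROUP= separates the bars by category while SUBGROUP= only subdivides them.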