
Introduction to Analytics, Lecture 4

Graphical Summary Using SAS/GRAPH: Introduction to Bar Graphs

Qualitative Data
Qualitative (categorical) data consist of attributes that place each observation in a category, such as gender, race, or age group. Important techniques for presenting qualitative data are:

  • Bar Charts
  • Pie-Charts
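
As a minimal sketch of both techniques, the code below charts a qualitative variable. It assumes the sample data set SASHELP.CLASS, which ships with SAS and is not part of the lecture's data, and uses its character variable SEX.

```sas
/* Sketch: bar chart and pie chart of a qualitative variable.        */
/* Assumes SASHELP.CLASS (SAS sample data) is available; SEX is the  */
/* categorical variable being summarized.                            */
proc gchart data=sashelp.class;
   vbar sex;   /* one bar per category; bar height = frequency     */
   pie sex;    /* one slice per category; slice size = frequency   */
run;
quit;
```

Both action statements sit in the same RUN group, so GCHART draws two separate charts from one procedure call.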

Quantitative Data
Quantitative data consist of numerical observations, such as sales or profits. The main techniques for presenting quantitative data are:

  • Histogram
  • Scatter Plot
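
A short sketch of both techniques in SAS/GRAPH follows. As above, it assumes the sample data set SASHELP.CLASS and uses its numeric variables HEIGHT and WEIGHT purely for illustration.

```sas
/* Histogram-style chart: GCHART groups a numeric chart variable    */
/* into midpoints. LEVELS=5 requests five midpoints and SPACE=0     */
/* removes the gaps between adjacent bars.                          */
proc gchart data=sashelp.class;
   vbar height / levels=5 space=0;
run;
quit;

/* Scatter plot: the PLOT statement takes vertical*horizontal,      */
/* so WEIGHT goes on the vertical axis and HEIGHT on the horizontal. */
proc gplot data=sashelp.class;
   plot weight*height;
run;
quit;
```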

Bar Charts
A bar graph, or bar chart, represents data visually using bars of different heights or lengths. Bars can be drawn horizontally or vertically, allowing viewers to compare values and draw conclusions quickly and easily. A typical bar graph has a title, axis labels, scales, and bars that represent measurable values such as amounts or percentages. Bar graphs are used to display all kinds of data, from quarterly sales and job growth to seasonal rainfall and crop yields.

The bars on a bar graph may all be the same color, though different colors are sometimes used to distinguish between groups or categories and make the data easier to read and interpret. Bar graphs have a labeled x-axis (horizontal axis) and y-axis (vertical axis). When experimental data are graphed as vertical bars, the independent variable is plotted on the x-axis and the dependent variable on the y-axis; in a horizontal bar graph the two roles are swapped.
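
To show where the title and axis labels come from in SAS/GRAPH, here is a minimal sketch using TITLE and AXIS statements. It assumes SASHELP.CLASS; the title and label text are arbitrary illustrations.

```sas
/* Sketch: a vertical bar chart with a title and labeled axes.      */
title "Number of Students by Age";
axis1 label=("Age (years)");                   /* midpoint (horizontal) axis */
axis2 label=(angle=90 "Number of students");   /* response (vertical) axis   */

proc gchart data=sashelp.class;
   /* DISCRETE: one bar per distinct age instead of binned midpoints */
   vbar age / discrete maxis=axis1 raxis=axis2;
run;
quit;

title;   /* clear the title so it does not carry over to later charts */
```

Here AGE plays the role of the independent variable on the horizontal axis, and the frequency is the dependent value on the vertical axis.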

Types of Bar Graphs
Bar graphs take different forms depending on the type and complexity of the data they represent. In the simplest case, a graph may have only two bars, for example one showing the vote totals of two competing political candidates. More complex data call for more complex graphs, such as grouped (clustered) or stacked bar graphs.

Single: Single bar graphs are used to convey the discrete value of the item for each category shown on the opposing axis. An example would be a representation of the number of males in grades 4-6 for each of the years 1995 to 2010. The actual number (discrete value) could be represented by a bar sized to scale, with the scale appearing on the X-axis. The Y-axis would display the corresponding years. The longest bar on the graph would represent the year from 1995 to 2010 in which the number of males in grades 4-6 reached its greatest value. The shortest bar would represent the year in which the number of males in grades 4-6 reached its lowest value.
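
A minimal SAS analogue of this single bar graph is sketched below. It assumes SASHELP.CLASS instead of school enrollment data: it counts male students at each age, with ages on the vertical axis and counts on the horizontal axis.

```sas
/* Sketch: single horizontal bar graph of male students per age.   */
proc gchart data=sashelp.class;
   where sex = 'M';               /* keep male students only          */
   hbar age / discrete nostats;   /* one bar per age; NOSTATS hides   */
                                  /* the statistics table beside bars */
run;
quit;
```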

Grouped: A grouped or clustered bar graph is used to represent discrete values for more than one item that share the same category. In the single bar graph example above, only one item (the number of males in grades 4-6) is represented. But one could very easily modify the graph by adding a second value that includes the number of females in grades 4-6. The bars representing each gender by year would be grouped together and color-coded to make it clear which bars represent the male and female values. This grouped bar graph would then allow readers to easily compare the number of students enrolled in grades 4-6 both by year and by gender.
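
The grouped version can be sketched in GCHART with the GROUP= option. Again this assumes SASHELP.CLASS: ages stand in for the years and SEX for the two genders.

```sas
/* Sketch: grouped (clustered) bar graph.                          */
/* GROUP=AGE creates one cluster per age; inside each cluster      */
/* there is one bar per value of SEX. PATTERNID=MIDPOINT gives the */
/* F and M bars different fills so the genders can be told apart.  */
proc gchart data=sashelp.class;
   vbar sex / group=age patternid=midpoint;
run;
quit;
```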

Stacked: Some bar graphs have each bar divided into subparts that represent the discrete values for items that constitute a portion of the whole group. For instance, in the examples above, students in grades 4-6 are grouped together and represented by a single bar. This bar could be broken into subsections to represent the proportion of students in each grade. Again, color coding would be needed to make the graph readable.
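
A stacked bar graph corresponds to the SUBGROUP= option in GCHART. In this sketch, which assumes SASHELP.CLASS, each age has one bar divided into female and male segments, much as the grades 4-6 bar would be divided by grade.

```sas
/* Sketch: stacked bar graph. Each bar (one per age) is split into  */
/* segments by SUBGROUP=SEX; GCHART fills each subgroup differently */
/* and adds a legend identifying the segments.                      */
proc gchart data=sashelp.class;
   vbar age / discrete subgroup=sex;
run;
quit;
```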

To construct bar charts from the course datasets, we first make the datasets folder available to SAS by assigning it a library reference (libref) with a LIBNAME statement, as in the following code.
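/* Assign the libref MYLIB to the folder that holds the course datasets */
Libname mylib "E:\Study Material\Datasets";

/* Horizontal 3-D bar chart: one bar per value of SUBCATEGORY */
PROC GCHART DATA = MYLIB.CANDY_SALES_SUMMARY;
HBAR3D SUBCATEGORY;
RUN;
QUIT;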

The above code creates a horizontal 3-D bar chart of the variable SUBCATEGORY from the data set CANDY_SALES_SUMMARY in the MYLIB library. HBAR3D is the statement that requests horizontal 3-D bars. Because no SUMVAR= variable is given, each bar's length is the number of observations (frequency) in that subcategory. Horizontal bars are convenient when there are many categories or when the category labels are long.
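
Before charting further variables, it can help to confirm that the libref works and to check the variable names and types. A minimal sketch using PROC CONTENTS on the lecture's data set:

```sas
/* List the variables in the data set with their types and lengths, */
/* to confirm that SUBCATEGORY, SALE_AMOUNT, CATEGORY and           */
/* FISCAL_YEAR exist before charting them.                          */
proc contents data=mylib.candy_sales_summary;
run;
```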

PROC GCHART DATA = mylib.CANDY_SALES_SUMMARY;
VBAR3D SUBCATEGORY/SUM SUMVAR=SALE_AMOUNT;
RUN;
QUIT;

This code generates a 3-D vertical bar chart with one bar per subcategory. SUMVAR=SALE_AMOUNT makes each bar's height the total sale amount for that subcategory, and the SUM option prints that total above each bar.
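
As a variation, the chart statistic can be changed from a total to an average. This sketch reuses the lecture's data set and variables; TYPE=MEAN sets the statistic used for the bar height and the MEAN option displays it above each bar.

```sas
/* Variation: average (rather than total) sale amount per subcategory. */
proc gchart data=mylib.candy_sales_summary;
   vbar3d subcategory / sumvar=sale_amount type=mean mean;
run;
quit;
```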

PROC GCHART DATA = MYLIB.CANDY_SALES_SUMMARY;
VBAR3D FISCAL_YEAR/SUM SUMVAR=SALE_AMOUNT
GROUP=CATEGORY SUBGROUP=SUBCATEGORY;
RUN;
QUIT;

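This code generates a grouped, subdivided (stacked) bar chart. GROUP=CATEGORY creates one cluster of bars per category; within each cluster there is one bar per fiscal year, whose height is the total SALE_AMOUNT. SUBGROUP=SUBCATEGORY divides each bar into segments showing how much each subcategory contributed to that year's sales, and the SUM option prints the total above each bar.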
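
For comparison, dropping GROUP= gives one stacked bar per fiscal year for all categories combined. This is a sketch using the lecture's variable names. If FISCAL_YEAR is stored as a numeric variable, GCHART may bin its values into midpoints; adding the DISCRETE option requests one bar per distinct year.

```sas
/* Sketch: one bar per fiscal year (all categories combined),       */
/* still split into SUBCATEGORY segments. Add DISCRETE after the    */
/* slash if FISCAL_YEAR is numeric and the bars come out binned.    */
proc gchart data=mylib.candy_sales_summary;
   vbar3d fiscal_year / sum sumvar=sale_amount
                        subgroup=subcategory;
run;
quit;
```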
