
Introduction to SAS, Lecture 4

Introduction to Key Concepts on SAS Data Sets

SAS Datasets

• A SAS data set is a SAS file that holds data.
• Data must be in the form of a SAS data set to be processed.
• Many of the data processing tasks access data in the form of a SAS data set and analyze, manage, or present the data.
• A SAS data set can also point to one or more indexes, which enable SAS to locate observations in the data set more efficiently.

Formally, a SAS data set is defined as follows:
A file stored in a SAS library that SAS creates and processes.
It contains data values that are organized as a table of observations (rows) and variables (columns) that can be processed by SAS software.
It also contains descriptor information such as the data types and lengths of the variables, as well as which engine was used to create the data.
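
The descriptor information can be inspected with PROC CONTENTS. The following is a minimal sketch, assuming the sample data set SASHELP.CLASS is present in your installation (it is not named in this lecture); any other data set can be substituted.

```sas
/* Print the descriptor portion of a data set: variable names,  */
/* types, lengths, number of observations, and the engine used. */
/* SASHELP.CLASS is assumed to be available as a sample table.  */
proc contents data=sashelp.class;
run;
```

The output lists each variable with its type (Num or Char) and length, which is the descriptor information described above.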

Rules for SAS data set names
• can be 1 to 32 characters long
• must begin with a letter (A–Z, either uppercase or lowercase) or an underscore (_)
• can continue with any combination of numbers, letters, or underscores

These are examples of valid data set names:
• _sales1
• Datatelecom
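
As an illustration, the two names above can be used directly in a DATA statement. This is only a sketch: the variables amount and calls and their values are made up for the example.

```sas
/* Both names satisfy the naming rules: they start with an      */
/* underscore or a letter and are at most 32 characters long.   */
/* The variables and values below are hypothetical.             */
data _sales1;
    amount = 100;
run;

data Datatelecom;
    calls = 25;
run;

/* A name such as 1sales would be rejected, because it begins   */
/* with a digit.                                                */
```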

Columns in SAS
Columns are generally known as headings or fields, but in SAS columns are called variables. A variable is a collection of values that describe a particular characteristic. In this table, ID, Department, Satisfaction, Years, and Status are the names of the variables in the data set.

Rows in SAS
Rows are sometimes called cases or records, but in SAS they are called observations. An observation is a collection of data values that usually relate to a single object. For example, Accounting and Chemistry are values of the variable Department, each belonging to a different observation.

Missing Values in SAS
If the value of a variable is unknown for a particular observation, a missing value is recorded:
• ‘.’ (a period) indicates a missing value of a numeric variable. Salary, which is a numeric variable, has 3 missing values.
• “ ” (a blank) indicates a missing value of a character variable. In the table above, Department is a character variable and has 1 missing value.
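
The two kinds of missing values can be sketched in a small DATA step. The data set name work.staff and the three rows below are invented for illustration and are not the table from the lecture.

```sas
/* Hypothetical data: row 2 has a missing Department (character) */
/* and row 3 has a missing Salary (numeric).                     */
data work.staff;
    infile datalines dsd truncover;   /* dsd: two adjacent commas mean a missing value */
    input ID Department :$12. Salary;
    datalines;
1,Accounting,52000
2,,48000
3,Chemistry,.
;
run;

/* MISSING() returns 1 for a missing value of either type. */
data work.staff_flags;
    set work.staff;
    dept_missing   = missing(Department);
    salary_missing = missing(Salary);
run;

/* PROC PRINT shows the numeric missing value as a period and */
/* the character missing value as a blank.                    */
proc print data=work.staff_flags;
run;
```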

The data that is available to a SAS program for analysis is referred to as a SAS data set. It is created using the DATA step. SAS can read a variety of files as its data sources, such as CSV, Excel, Access, SPSS, and raw data files. It also has many built-in data sets available for use.
• A data set is called a temporary data set if it is used by the SAS program and then discarded when the session ends. Temporary data sets are stored in the WORK library.
• If it is stored permanently for future use, it is called a permanent data set. All permanent data sets are stored under a specific library.
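
The difference between the two can be sketched as follows. The libref mylib and the folder path are placeholders chosen for this example; replace the path with a folder you can write to.

```sas
/* Temporary: a one-level name is stored in the WORK library    */
/* and is deleted when the SAS session ends.                    */
data scores;
    x = 1;
run;

/* Permanent: assign a libref to a folder, then use a two-level */
/* name. Both 'mylib' and the path are hypothetical.            */
libname mylib '/path/to/folder';

data mylib.scores;
    set work.scores;
run;
```

After the session ends, work.scores is gone, while mylib.scores remains in the folder and can be used again once the libref is reassigned.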

A SAS data set is stored in the form of rows and columns and is also referred to as a SAS data table. Below we see examples of permanent data sets, both built-in and read from external sources.

SAS Built-In Data Sets
These data sets are already available in the installed SAS software. They can be explored and used to try out sample expressions for data analysis. To explore these data sets, go to Libraries -> My Libraries -> SASHELP. Expanding it shows the list of names of all the built-in data sets available.
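
The same library can also be explored from code. This is a minimal sketch, assuming the sample table SASHELP.CLASS exists in your installation; the exact list of built-in data sets varies by installation.

```sas
/* List the data sets stored in the SASHELP library. */
proc datasets library=sashelp memtype=data;
quit;

/* Print the first five observations of one built-in data set. */
/* SASHELP.CLASS is assumed to be available.                   */
proc print data=sashelp.class(obs=5);
run;
```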
