Transforming Data in SAS
Overview
Often it is necessary to transform data values. Reasons for this include
stabilizing variance within a variable through mathematical functions,
grouping related values into a single value for crosstabulation (e.g. age
might be transformed into three values signifying child, adult, senior),
or simply creating a new variable based on values found in other
variables.
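As a rough sketch of the age grouping mentioned above, the data step below
recodes age into three groups. The data set names (people, grouped), the
variable agegroup, and the cutoffs of 18 and 65 are assumptions made for
illustration only.

```sas
/* Hypothetical input data: id and age */
data people;
    input id age;
    datalines;
1 9
2 35
3 70
4 .
;
run;

data grouped;
    set people;
    length agegroup $ 6;                 /* long enough to hold 'senior' */
    /* Test for missing first: a missing age compares as less than any number */
    if age = . then agegroup = ' ';
    else if age < 18 then agegroup = 'child';   /* assumed cutoff */
    else if age < 65 then agegroup = 'adult';   /* assumed cutoff */
    else agegroup = 'senior';
run;
```

The length statement is there so that the first assignment SAS encounters
does not fix the width of agegroup at a single character.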
Data Manipulation
Often you want to examine only those cases in a data file that meet certain
criteria. Using a subsetting if statement, we can eliminate cases based
on the value of some key variable. For example, if we were researching
senior citizen access to healthcare, we would not want cases in our
data file that were not associated with senior citizens. Assuming we had
a variable age, we could add a line in the data step that reads:
if age > 64;
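The above statement drops all cases where age is less than or equal
to 64. Because SAS treats a missing numeric value as smaller than any
number, cases with a missing age are dropped as well.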
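To see the statement in context, here is a minimal sketch of a complete
program. The data set names patients and seniors, the variable id, and the
data values are made up for this example.

```sas
/* Hypothetical input data: id and age */
data patients;
    input id age;
    datalines;
1 72
2 64
3 65
4 .
5 38
;
run;

data seniors;
    set patients;
    if age > 64;   /* subsetting if: keep a case only when the condition is true */
run;

proc print data=seniors;
run;
```

With these values, only the cases with id 1 and id 3 remain in seniors.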
It is possible to have many if statements in a SAS program, or to create
compound conditions using the keywords and and or. For
example:
if age > 65 or disabled = 'y';
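The sketch below places the compound condition in a full data step. The
data set names (patients, eligible), the variable id, and the data values
are assumptions for illustration; disabled is read as a character variable
holding 'y' or 'n'.

```sas
/* Hypothetical input data: id, age, and a y/n disability flag */
data patients;
    input id age disabled $;
    datalines;
1 72 n
2 40 y
3 40 n
4 66 y
;
run;

data eligible;
    set patients;
    /* keep a case if either condition holds; character comparison is case-sensitive */
    if age > 65 or disabled = 'y';
run;
```

Here the cases with id 1, 2, and 4 are kept. When and and or appear in the
same condition, SAS evaluates and first, so parentheses are worth adding to
make the intended grouping explicit.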
Specifying Missing Values
Missing values arise because you often cannot obtain a data value for
every variable; for example, a subject may not answer a question.
Often we don't want to include cases with missing data because they can
skew our results, so many statistical procedures in SAS ignore cases
where a needed variable is missing.
By default SAS expects a single period "." in the field location
when a data value is missing. Data files from other sources often follow a
different convention and use an impossible numeric value when the variable
is missing, for example -1 if age is not specified. If you have a data file
that uses certain numeric values to signify missing data, you must use data
transformations to inform SAS.
To do this you can use the if-then form of the if statement described
above. Using our example of age, we would specify:
if age = -1 then age = . ;
With this form of the if statement we are assigning a new value based
on some condition (age equal to -1). Specifying age = . ; informs
SAS to treat this value as missing.
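A short sketch of this recoding in a complete program follows. The data set
names survey and survey_clean, the variable id, and the data values are
hypothetical.

```sas
/* Hypothetical input data in which -1 stands for "age not specified" */
data survey;
    input id age;
    datalines;
1 34
2 -1
3 58
;
run;

data survey_clean;
    set survey;
    if age = -1 then age = .;   /* recode the placeholder to SAS missing */
run;

/* n counts non-missing values, nmiss counts missing values */
proc means data=survey_clean n nmiss mean;
    var age;
run;
```

After the recode, proc means should report two non-missing ages, one missing
age, and a mean of 46, rather than letting the -1 pull the mean down.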
Creating New Variables
Often we want to create new variables based on values from other variables.
For example, suppose we have gross sales for each region by quarter. We
would have four variables, each representing the sales for a quarter; let's
call them q1, q2, q3, and q4. If we wanted to analyze yearly sales by region
we would need to introduce a new variable. We could do this with the
transformation:
total = q1 + q2 + q3 + q4;
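The sketch below shows the transformation inside a data step, alongside an
alternative using the sum function. The data set names (sales, yearly), the
variable total_sum, and the sales figures are assumptions for illustration.

```sas
/* Hypothetical quarterly sales; West has no figure for q2 */
data sales;
    input region $ q1 q2 q3 q4;
    datalines;
East 100 120 110 130
West 90 . 95 105
;
run;

data yearly;
    set sales;
    total     = q1 + q2 + q3 + q4;     /* missing if any quarter is missing */
    total_sum = sum(q1, q2, q3, q4);   /* sum function skips missing values */
run;
```

For East both variables are 460. For West, total is missing while total_sum
is 290. Which behavior is appropriate depends on whether a partial year
should count as a yearly total.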
Another use of new variables is in transforming variables through
mathematical functions. For certain types of variables, applying a
mathematical function to the variable's value will assist in analyzing the
variable. For example, taking the natural logarithm of a variable "x"
and creating a new variable "y" is done with:
y = log(x);
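As a minimal sketch, the data step below applies the log transformation
while guarding against values for which the logarithm is undefined. The
data set names raw and transformed and the data values are hypothetical.

```sas
/* Hypothetical input data, including a zero */
data raw;
    input x;
    datalines;
1
10
0
;
run;

data transformed;
    set raw;
    if x > 0 then y = log(x);   /* log is the natural logarithm */
    else y = .;                 /* undefined for x <= 0, so set y to missing */
run;
```

Without the guard, SAS would still set y to missing for the zero value, but
it would also write an invalid-argument note to the log for each such case.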
See also:
Univ of Wis., Introduction to SAS: SAS Data Step. This material
provides a good discussion of how to generate random numbers and use them
in experimental design.
Author - Jack Suess
UMBC University Computing Services
Created - 1/15/96