SASCRUNCH TRAINING

Proc Freq: 7 Ways to Compute Frequency Statistics in SAS​


In this article, we will show you 7 different ways to analyze your data using the FREQ procedure.
 
You will learn how to see frequencies of different variables, find the most/least commonly occurring values in your data, check for missing values, and more.
 
Let's get started!

Software​
Before we continue, make sure you have SAS Studio or SAS 9.4 installed. Don't have the software? Download SAS Studio now. It's free!

Data Sets
The examples used in this article are based on the CARS and HEART data sets from the SASHelp library.
​
Picture
Picture

You can find the CARS and HEART data sets in the SASHelp library:

 
Picture


1. Basic Usage

The most basic usage of Proc Freq is to determine the frequency (number of occurrences) for all values found within each variable of your dataset. 

​Using the CARS dataset as an example, you can determine the frequencies of all variables within your dataset with the following code:
​
Proc freq data = sashelp.cars;
Run;

The code above creates a frequency table for each of the variables in the data set.

For example, below is a frequency table for the variable MAKE. 

Picture

If you scroll down, you will also see the frequency tables for the variables ORIGIN and DRIVETRAIN:

Picture

By default, Proc Freq (with or without a TABLES statement) will output a table which lists the values found within the variable(s) specified, the frequency of each value, the percentage of that value relative to all other values, as well as the cumulative frequencies and cumulative percentages.

The cumulative frequencies and percentages are running totals: each row's cumulative value is that row's frequency (or percentage) added to the cumulative value of the row above it.
Picture

However, using Proc Freq in this manner without any options is usually not recommended, particularly if you have a large dataset which contains variables that have many unique values (levels). 

​A variable such as Model with a large number of unique (distinct) values will produce a very long output which will be difficult to read and not very useful:
​
Picture

A more efficient and effective use of Proc Freq is to use the TABLES statement to limit the variables that are reported on.
 
Here, the TABLES statement is used to only output the frequencies and percentages of the Origin variable to determine how many cars originate from each continent:
​
Proc freq data = sashelp.cars;
 Tables Origin;
Run;

The resulting table from this code is shown here:
​
Picture

2. Sort output to determine the most/least commonly occurring values

You can use Proc Freq to determine the most or least commonly occurring values within one or more variables.

Using the ORDER=FREQ option, which sorts each table by descending frequency, you can easily see the most or least commonly occurring values of both the Type and Origin variables:
​
Proc freq data = sashelp.cars order=freq;
 Tables type origin;
Run;

The resulting tables show the frequencies for each variable, sorted with the most common value on top and the least common value at the bottom:

Picture

3. Check for Missing Values

Proc freq is an excellent tool to check for missing values in your dataset. 

​For this example, the SASHELP.HEART dataset is used. The SASHELP.HEART dataset can be accessed in the same way as the CARS dataset described above.
 
To check for the frequency of missing values in the DeathCause variable from the HEART dataset, you would use the following code:

Proc freq data=sashelp.heart;
 Tables deathcause;
Run;


​Here, you can see the missing values highlighted at the bottom of the table:

Picture


If you would also like to see the percent, cumulative frequency and cumulative percentage of missing values, you can use the MISSING option with the TABLES statement:

Proc freq data=sashelp.heart;
 Tables deathcause /missing;
Run; 

With the MISSING option, you can see the frequencies and the percentage of missing values within the table:

Picture

Finally, there is also a way to include the frequency of missing values within your output table without factoring the percentage of missing values into the calculation of the percent and cumulative percentage of the other values. 

​This can be done with the MISSPRINT option:
​
Proc freq data=sashelp.heart;
 Tables deathcause /missprint;
Run;

Notice that in this table, the percentages of each value are higher than the percentages in the previous table, as the missing values are not factored into this calculation.

Using the Unknown value as an example, the percentage of records that have an Unknown value for Cause of Death is 5.63% with MISSPRINT, compared to only 2.15% in the previous table with the MISSING option:
​
Picture


4. Create an Output Data Set

Frequencies and percentages calculated using Proc Freq can also be saved to an output dataset using the OUT= option on the TABLES statement.

The OUTCUM option can also be added to include the cumulative frequencies and cumulative percentages in the output dataset if desired:



Proc freq data = sashelp.cars order=freq;
 Tables type /out=cars_freq outcum;
Run;
Picture
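
Once saved, the output dataset can be used like any other SAS dataset. As a minimal sketch, the code below re-creates cars_freq without printing the table (NOPRINT), prints the saved dataset, and then sorts a copy by ascending COUNT so that the least common values come first. COUNT, PERCENT, CUM_FREQ and CUM_PCT are the variable names Proc Freq writes to the output dataset; type_freq_asc is a hypothetical dataset name used only for illustration.

```sas
/* Save the frequencies without printing the usual table */
Proc freq data = sashelp.cars order=freq noprint;
 Tables type /out=cars_freq outcum;
Run;

/* View the saved frequencies, percentages and cumulative statistics */
Proc print data = cars_freq noobs;
 Var type count percent cum_freq cum_pct;
Run;

/* Sort a copy by ascending frequency (type_freq_asc is a hypothetical name) */
Proc sort data = cars_freq out = type_freq_asc;
 By count;
Run;

/* The cumulative columns are left out here because they follow the original order */
Proc print data = type_freq_asc noobs;
 Var type count percent;
Run;
```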

5. ​Use the FORMAT statement to categorize and analyze data

When combined with Proc Format and a FORMAT statement, Proc Freq also becomes a powerful tool to categorize and subsequently analyze continuous variables (or variables with a large number of unique values).
 
Using the MSRP (Manufacturer’s Suggested Retail Price) variable in the Cars dataset as an example, you can see that the standard Proc Freq output shown below does not produce very useful information for a variable such as MSRP:

Proc freq data=sashelp.cars;
 Tables msrp;
Run;
Picture

However, by using Proc Format you can create categories (or groups) of MSRPs to see, for example, how many cars fall within a particular price range.
​
Proc format;
 Value msrp_groups
  10000-19999 = '10,000-19,999'
  20000-29999 = '20,000-29,999'
  30000-39999 = '30,000-39,999'
  40000-high = '40,000+'
  ;
Run;

With the new numeric format msrp_groups created, the FORMAT statement can be used together with the Proc Freq call to determine the distribution of MSRPs across the different groups:

Proc freq data = sashelp.cars;
 Tables msrp;
 Format msrp msrp_groups.;
Run;   

As you can see, this produces a much more useful and informative table:

Picture
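
One thing to keep in mind with ranges written as 10000-19999 is that any value falling between two ranges (for example, a non-integer price such as 19999.50) or below the first range is not covered by the format and is displayed as its own level. The sketch below shows one possible way to close those gaps, using the LOW keyword and the -< (up to, but not including) range operator. The format name msrp_groups_alt and its labels are hypothetical, chosen only for this illustration.

```sas
Proc format;
 Value msrp_groups_alt
  low   -< 20000 = 'Under 20,000'
  20000 -< 30000 = '20,000 to under 30,000'
  30000 -< 40000 = '30,000 to under 40,000'
  40000 -  high  = '40,000+'
  ;
Run;

/* Same Proc Freq call as above, with the gap-free format applied */
Proc freq data = sashelp.cars;
 Tables msrp;
 Format msrp msrp_groups_alt.;
Run;
```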


6. Cross-tabulation – Create 2x2 or nxn multi-way tables

Proc freq can also be used to produce 2x2 or higher nxn multi-way tables to determine the distribution (or frequency) of records that fall into 2 or more combinations of categories. 

​For example, if you would like to compare the different car DriveTrain types by the continent of Origin from the Cars dataset, you could use the following code:


Proc freq data=sashelp.cars;
 Tables origin*drivetrain;
Run;

In this example, Origin and DriveTrain each have 3 possible values. As a result, the cross-tabulation produces a 3x3 table which includes a total of 9 combinations (i.e. 3x3 = 9):

Picture

While this table may seem overwhelming at first, let’s walk through it step-by-step to understand what each component refers to.

As shown in the legend, the first row of each box corresponds to the frequencies. For example, the 34 in the top left box indicates that there are 34 cars from Asia that have “All” for DriveTrain.

​Moving from left to right, the 99 in the top middle box indicates that there are 99 cars from Asia that have a “Front” drivetrain, and so on.


Picture

The second row contains the overall percentages, that is, each combination's share of all records in the table. Using the top left box again as an example, the 7.94% indicates that, of all the records across the 9 possible combinations of Origin and DriveTrain, 7.94% have Origin=Asia and DriveTrain=All.

Picture

The third row contains what is known as the row percentages. Starting with the top left box as an example, the 21.52 indicates that of those records with Origin=Asia, 21.52% have DriveTrain=All. Moving across the row from left to right, you can see that for Origin=Asia cars, 62.66% have DriveTrain=Front, and 15.82% have DriveTrain=Rear. Notice that these 3 percentages total 100% when summed (added together) across the row.

Picture

The fourth row contains what is known as the column percentages. Starting with the top left box as an example, the 36.96 indicates that of those records with DriveTrain=All, 36.96% have Origin=Asia. Moving down the column from top to bottom, you can see that for DriveTrain=All cars, 39.13% have Origin=Europe and 23.91% have Origin=USA. Notice that these 3 percentages total 100% when summed (added together) down the column.
​


Picture
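
To make the three percentages concrete, the sketch below recomputes them by hand for the top left box (Origin=Asia, DriveTrain=All). The frequency of 34 comes from the table above; the totals of 158 Asian cars, 92 "All" drivetrain cars and 428 cars overall are assumptions taken from the row, column and table totals of the SASHelp.CARS output, so check them against the totals in your own output.

```sas
Data _null_;
 cell      = 34;   /* Origin=Asia and DriveTrain=All (from the table) */
 row_total = 158;  /* assumed: all Origin=Asia records (34 + 99 + 25) */
 col_total = 92;   /* assumed: all DriveTrain=All records */
 total     = 428;  /* assumed: all records in SASHelp.CARS */

 overall_pct = 100 * cell / total;      /* second row of the box */
 row_pct     = 100 * cell / row_total;  /* third row of the box */
 col_pct     = 100 * cell / col_total;  /* fourth row of the box */

 /* Write the three percentages to the log, rounded to 2 decimals */
 Put overall_pct= 6.2 row_pct= 6.2 col_pct= 6.2;
Run;
```

With these totals, the log shows 7.94, 21.52 and 36.96, matching the second, third and fourth rows of the top left box.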

Depending on the desired results, you can choose to suppress some of these numbers from the output. The NOCOL, NOROW, NOFREQ and NOPERCENT options suppress the column percentages, row percentages, frequencies and overall percentages, respectively. These options can be used independently or in different combinations together.
 
For example, if you wanted to suppress the row and column percentages, but keep the frequencies and overall percentages, you would use the following code:


Proc freq data=sashelp.cars;
 Tables origin*drivetrain /nocol norow;
Run;

This produces the following table, which contains only the frequencies and overall percentages:


Picture
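
As another combination, a minimal sketch that keeps only the row percentages might look like the following. It uses the same suppression options described above, so each box shows only the share of each DriveTrain within a given Origin:

```sas
/* Suppress frequencies, overall percentages and column percentages,
   leaving only the row percentages in each box */
Proc freq data=sashelp.cars;
 Tables origin*drivetrain /nofreq nopercent nocol;
Run;
```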

Two-way or multi-way tables can also be displayed in more of a list format for improved readability. This is especially useful when there are many possible combinations between the two variables. To display a cross tabulation in the long form “list” format, you can simply use the LIST option:


Proc freq data=sashelp.cars;
 Tables origin*drivetrain /list;
Run;

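The frequencies and overall percentages are identical to those produced without the LIST option; the only change is in how the information is displayed. Note that the list format shows cumulative frequencies and cumulative percentages in place of the row and column percentages: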

Picture

7. Produce dot and bar plots

Another useful feature of Proc Freq is the ability to create graphical representations of the frequencies and percentages. 

Within Proc Freq, you have the ability to create either dot or bar plots, which can be created based on either the frequencies or the overall percentages.
 
In the following example, the TABLES statement is used to create both a 1-way frequency table for the Origin variable, and a 3x3 frequency table for the DriveTrain variable crossed with Origin. 

​To produce a dot plot for these variables, the plots=freqplot (type=dot) option is added. In order to produce these graphs, ODS graphics must also be turned ON (and subsequently turned OFF) as shown below:



Ods graphics on;
Proc freq data=sashelp.cars order=freq;
 Tables origin drivetrain*origin / plots=freqplot(type=dot);
Run;
Ods graphics off;  

Along with the frequency tables, the following 2 graphs are produced with the code above. The first graph shows a dot plot of the frequencies for each continent of Origin in the Cars dataset:

Picture

Because the TABLES statement also includes “DriveTrain*Origin”, a second graph is produced, which shows the frequency distribution of DriveTrain for each Origin in a single graph:

Picture

Alternatively, similar code can be used to produce bar plots based on the percentages instead of the frequencies. Of course, you can also mix and match combinations to produce a dot plot of percentages or a bar plot of frequencies if desired.
 
Reusing the msrp_groups format from earlier on this page to group the MSRPs, the type=bar and scale=percent options are added to produce a bar plot of the percentage of cars in each MSRP group.

Proc format;
 Value msrp_groups
  10000-19999 = '10,000-19,999'
  20000-29999 = '20,000-29,999'
  30000-39999 = '30,000-39,999'
  40000-high = '40,000+'
  ;
Run;

Ods graphics on;
Proc freq data=sashelp.cars order=freq;
 Tables msrp / plots=freqplot (type=bar scale=percent);
 Format msrp msrp_groups.;
Run;
Ods graphics off;
Picture
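
As an example of mixing and matching these options, the sketch below combines type=dot with scale=percent to draw a dot plot of the percentages for the Origin variable. It uses only the options already shown in this section:

```sas
Ods graphics on;
/* Dot plot of percentages (rather than frequencies) for Origin */
Proc freq data=sashelp.cars order=freq;
 Tables origin / plots=freqplot(type=dot scale=percent);
Run;
Ods graphics off;
```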


 

Copyright © 2012-2019 SASCrunch.com All rights reserved.