Looking to analyze your data with Proc Means but don't know how to start?
No worries. In this article, we will show you 15 different ways to analyze your data using the MEANS procedure.
You will learn how to compute descriptive statistics and export the analysis results to an external file.
Let's get started!
The examples used in this article are based on the CARS data set from the SASHELP library (i.e., SASHELP.CARS).
The CARS data set contains 15 variables related to the price, cost, make, model and specifications of a list of cars.
In this article, we will show you how you can use Proc Means to analyze the MSRP (i.e., Manufacturer's Suggested Retail Price) for each car maker, model and type of car:
Of course, you will be able to use the same techniques to analyze your own data sets for your work projects.
1. Basic Structure
Let's first run the MEANS procedure on the sashelp.cars data set:
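A minimal sketch of this first run, with no options or statements beyond the input data set:

```sas
/* Basic form: summarize every numeric variable in SASHELP.CARS */
proc means data=sashelp.cars;
run;
```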
The basic form of Proc Means computes a set of descriptive statistics:
The descriptive statistics are computed for all the numeric variables in the data set.
By default, the statistics N, Mean, Standard Deviation, Minimum and Maximum are computed:
2. Selecting Variables for Your Analysis
Sometimes you might be interested in only a few selected variables.
You can add the VAR statement to limit the analysis to only the variables you are interested in analyzing.
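For instance, a sketch that restricts the analysis to the MSRP and INVOICE variables could look like this:

```sas
/* VAR lists the numeric variables to analyze */
proc means data=sashelp.cars;
  var msrp invoice;
run;
```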
The VAR statement above limits the analysis to only the MSRP and INVOICE variables. No results are computed for any other variables in the data set.
Note: You can analyze only numeric variables with the MEANS procedure. Listing a character variable (e.g., MAKE) in the VAR statement will give you an error.
3. Requesting Specific Statistics
Getting the mean, standard deviation, minimum and maximum is nice. However, you might also want to compute additional statistics for your analysis.
You can request additional statistics by adding the corresponding statistics keywords.
Example 1: Lower Quartile, Median and Upper Quartile
Adding the Q1, Median and Q3 keywords tells SAS to compute the Lower Quartile, Median and Upper Quartile:
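As a sketch, the keywords go on the PROC MEANS statement itself. The choice of MSRP as the analysis variable is our assumption here:

```sas
/* Request only the lower quartile, median and upper quartile */
proc means data=sashelp.cars q1 median q3;
  var msrp;
run;
```

Once you list any statistics keywords, only the requested statistics are displayed and the default set is no longer shown.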
Example 2: Mean, Standard Error and 95% Confidence Limits
Adding the MEAN, STDERR and CLM keywords computes the mean, standard error and 95% confidence limits:
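A sketch of this request, again assuming MSRP as the analysis variable:

```sas
/* CLM produces two-sided confidence limits for the mean.
   The default confidence level is 95% (ALPHA=0.05). */
proc means data=sashelp.cars mean stderr clm;
  var msrp;
run;
```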
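The statistics keywords that can be used in Proc Means include:
CLM, CSS, CV, KURTOSIS|KURT, LCLM, MAX, MEAN, MIN, MODE, N, NMISS, RANGE, SKEWNESS|SKEW, STDDEV|STD, STDERR, SUM, SUMWGT, UCLM, USS, VAR, MEDIAN|P50, P1, P5, P10, P25|Q1, P75|Q3, P90, P95, P99, QRANGE, PROBT|PRTT, T.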
4. Display Different Decimal Places
You can also specify the number of decimal places to display for the statistics using the MAXDEC= option.
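For example, a sketch that suppresses decimal places entirely (the VAR statement is our assumption):

```sas
/* MAXDEC=0 rounds the displayed statistics to whole numbers */
proc means data=sashelp.cars maxdec=0;
  var msrp invoice;
run;
```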
The MAXDEC=0 option tells SAS to not display any decimal places.
The analysis results are all displayed as whole numbers. Only the display is rounded; the underlying calculations are not affected.
You can also display, say, 2 decimal places by adding the MAXDEC=2 option:
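The same sketch with two decimal places (the VAR statement is again our assumption):

```sas
/* MAXDEC=2 displays the statistics with two decimal places */
proc means data=sashelp.cars maxdec=2;
  var msrp invoice;
run;
```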
5. Group Your Analysis
The price of a Porsche is likely to be very different from that of a Toyota. Thus, it makes more sense to separate the analysis for each car maker.
A CLASS statement can be added to the MEANS procedure to group your analysis:
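A minimal sketch, using MAKE as the classification variable and MSRP as the analysis variable:

```sas
/* CLASS produces one row of statistics per car maker */
proc means data=sashelp.cars;
  class make;
  var msrp;
run;
```

Unlike the BY statement, the CLASS statement does not require the input data set to be sorted first.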
By specifying the variable MAKE as the classification variable, there will be a separate analysis completed for each car maker.
You can do the same for your own data as well. Use the CLASS statement to separate the analysis for different categories of your data.
6. Adding Multiple Classification Variables
You are not restricted to a single classification variable; the CLASS statement accepts several.
Adding two classification variables to the CLASS statement enables you to group your analysis into multiple levels:
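A sketch with two classification variables, MAKE and TYPE:

```sas
/* One row of statistics per combination of MAKE and TYPE */
proc means data=sashelp.cars;
  class make type;
  var msrp;
run;
```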
By adding both the variables MAKE and TYPE to the CLASS statement, you can analyze the data for each combination of car maker and the types of cars they produce:
7. Changing the Displayed Order of the Classification Variable
There is also an option to change the displayed order of the classification variables.
Example: Order=Freq Option
The ORDER=FREQ option tells SAS to order the variable MAKE from the highest frequency to the lowest.
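One way to sketch this is to put the option on the CLASS statement, after a slash. Specifying ORDER=FREQ on the PROC MEANS statement is an alternative that applies to all classification variables:

```sas
/* Car makers with the most observations are listed first */
proc means data=sashelp.cars;
  class make / order=freq;
  var msrp;
run;
```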
You can also order the classification variable in descending alphabetical order.
Example: Descending Option
The DESCENDING option displays the MAKE variable in descending alphabetical order.
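A sketch of the DESCENDING option, which also goes on the CLASS statement after a slash:

```sas
/* Car makers are listed in reverse alphabetical order */
proc means data=sashelp.cars;
  class make / descending;
  var msrp;
run;
```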
8. Analyze a Subset of the Observations
Let's assume your two favourite car makers are BMW and Audi.
You are hoping to compute the statistics for these two brands only.
The WHERE statement can be used to limit your analysis to the observations from these brands.
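A sketch of such a subset analysis. The CLASS and VAR statements are our assumptions; the WHERE statement is the part that matters:

```sas
/* Character comparisons are case-sensitive, so the values must
   match the data exactly ("BMW" and "Audi"). */
proc means data=sashelp.cars;
  where make in ("BMW", "Audi");
  class make;
  var msrp;
run;
```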
The WHERE statement above restricts the input to the observations where MAKE is "BMW" or "Audi", so only these two brands of cars are analyzed.
9. Create an Output Data Set
You can also save the analysis results in an output data set using the OUTPUT statement.
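A minimal sketch that analyzes the INVOICE variable and writes the results to a data set called OUTSTAT:

```sas
/* OUT= names the output data set (created in the WORK library) */
proc means data=sashelp.cars;
  var invoice;
  output out=outstat;
run;
```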
The OUTPUT statement above creates an output data set called OUTSTAT:
By default, the OUTSTAT data set contains the N, Mean, Standard Deviation, Minimum and Maximum statistics for the INVOICE variable:
When you run the code above, the results are also printed in the Results window by default:
If you don't need the results printed in the Results window, you can suppress them by adding the NOPRINT option.
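A sketch of the same step with the printed output suppressed:

```sas
/* NOPRINT suppresses the printed report; OUTSTAT is still created */
proc means data=sashelp.cars noprint;
  var invoice;
  output out=outstat;
run;
```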
With NOPRINT, no analysis results are printed in the Results window; only the output data set is created.
10. Requesting Additional Statistics in the Output Data Set
The OUTPUT statement also allows you to specify the statistics to be included in the output data set.
Example 1: Mean option
The MEAN=MEAN1 option tells SAS to include the mean in the output data set and to name the resulting variable MEAN1.
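A sketch of this request, assuming INVOICE is still the analysis variable:

```sas
/* Store the mean of INVOICE in a variable named MEAN1 */
proc means data=sashelp.cars noprint;
  var invoice;
  output out=outstat mean=mean1;
run;
```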
Example 2: Q1, Median and Q3 options
You can also request additional statistics such as lower quartile and upper quartile:
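A sketch of the quartile request. The output variable names LOWERQ, MEDIAN and UPPERQ are the ones referred to in #11; INVOICE as the analysis variable is our assumption:

```sas
/* Store the quartiles of INVOICE under user-chosen names */
proc means data=sashelp.cars noprint;
  var invoice;
  output out=outstat q1=lowerq median=median q3=upperq;
run;
```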
The Q1=, Median= and Q3= options compute the lower quartile, median and upper quartile in the output data set:
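The list of statistics keywords is largely the same as in #3 above:
CSS, CV, KURTOSIS|KURT, LCLM, MAX, MEAN, MIN, MODE, N, NMISS, RANGE, SKEWNESS|SKEW, STDDEV|STD, STDERR, SUM, SUMWGT, UCLM, USS, VAR, MEDIAN|P50, P1, P5, P10, P25|Q1, P75|Q3, P90, P95, P99, QRANGE, PROBT|PRTT, T.
One difference: CLM is accepted on the PROC MEANS statement, but in the OUTPUT statement you request the two limits separately with LCLM= and UCLM=.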
11. Autoname the Output Variables
In #10 above, we named the output variables MEAN1, LOWERQ, MEDIAN and UPPERQ.
Naming the variables yourself is optional in Proc Means.
The AUTONAME option can be used and SAS will automatically name the variables for the statistics requested:
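A sketch of the AUTONAME option, which goes after a slash at the end of the OUTPUT statement:

```sas
/* No names are given after MEAN= and STDDEV=;
   AUTONAME builds them from the variable and statistic names. */
proc means data=sashelp.cars noprint;
  var invoice;
  output out=outstat mean= stddev= / autoname;
run;
```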
The code above requests the mean and standard deviation to be computed.
Because the AUTONAME option is added, SAS automatically names the two variables INVOICE_MEAN and INVOICE_STDDEV, respectively.
12. Analyze Multiple Variables Within a Single Output Statement (Advanced)
You can even compute statistics for multiple variables within a single OUTPUT statement.
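A sketch of this syntax. The output variable names (MEAN_MSRP and so on) are hypothetical names chosen for illustration:

```sas
/* Each analysis variable must also appear in the VAR statement */
proc means data=sashelp.cars noprint;
  var msrp invoice horsepower;
  output out=outstat
    mean(msrp)=mean_msrp
    mean(invoice)=mean_invoice
    mean(horsepower)=mean_horsepower;
run;
```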
The Mean(MSRP)= option computes the mean of the MSRP. The same applies to the Mean(Invoice)= option and the Mean(Horsepower)= option.
13. Understand the _TYPE_ Variable in the Output Data Set
The _TYPE_ variable is automatically created in the OUTPUT data set from the MEANS procedure.
It identifies which combination of classification variables was used to compute the statistics in each observation.
Let's look at a few examples.
Example 1: No Classification (i.e., no CLASS statement)
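A sketch of this case. The analysis variable (MSRP) and the output variable name MEAN_MSRP are our assumptions:

```sas
/* No CLASS statement: a single overall row with _TYPE_ = 0 */
proc means data=sashelp.cars noprint;
  var msrp;
  output out=outstat mean=mean_msrp;
run;

proc print data=outstat;
run;
```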
The MEANS procedure above does not have a CLASS statement. The _TYPE_ variable is 0 for the one observation in the output data set.
Example 2: One Classification Variable
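A sketch with ORIGIN as the only classification variable. MSRP and MEAN_MSRP are again our assumptions:

```sas
/* One CLASS variable: _TYPE_ takes the values 0 and 1 */
proc means data=sashelp.cars noprint;
  class origin;
  var msrp;
  output out=outstat mean=mean_msrp;
run;

proc print data=outstat;
run;
```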
The MEANS procedure above has 1 classification variable (i.e., ORIGIN). There are two groups of statistics generated in the output data set:
The observation with _TYPE_ = 0 identifies the "overall" analysis. The statistics are computed across all observations, ignoring the classification variable.
The observations with _TYPE_ = 1 identify the analysis for each classification level. The statistics, in our example, are computed for each car origin:
Example 3: Two Classification Variables
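A sketch with ORIGIN and DRIVETRAIN as classification variables, in that order. MSRP and MEAN_MSRP are our assumptions:

```sas
/* Two CLASS variables: _TYPE_ takes the values 0, 1, 2 and 3.
   The rightmost CLASS variable (DRIVETRAIN) maps to _TYPE_ = 1. */
proc means data=sashelp.cars noprint;
  class origin drivetrain;
  var msrp;
  output out=outstat mean=mean_msrp;
run;

proc print data=outstat;
run;
```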
The MEANS procedure above has a CLASS statement with two classification variables (i.e., ORIGIN and DRIVETRAIN). In total, there are 4 groups of statistics generated in the output data set:
Again, when _TYPE_ = 0, the statistics computed are for the overall analysis. No classification level is used.
When _TYPE_ = 1, the statistics are computed for each of the DRIVETRAIN levels (i.e., All, Front and Rear). The classification from the ORIGIN variable is not considered.
When _TYPE_ = 2, the statistics are computed for each of the ORIGIN levels (i.e., Asia, Europe, USA). The classification from the DRIVETRAIN variable is not considered.
Finally, when _TYPE_ = 3, the statistics are computed for each combination of the ORIGIN and DRIVETRAIN values. Both of the classification variables are used.
14. Simplify the Output Data Set with NWAY Option
More often than not, you want to get statistics for each combination of the classification values only (e.g., ORIGIN x DRIVETRAIN).
You might not care about the overall analysis or any individual classification variable alone (e.g., ORIGIN alone or DRIVETRAIN alone).
You can use the NWAY option to remove these statistics from the output data set.
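A sketch of the NWAY option, which goes on the PROC MEANS statement. MSRP and MEAN_MSRP are our assumptions:

```sas
/* NWAY keeps only the ORIGIN x DRIVETRAIN rows in OUTSTAT */
proc means data=sashelp.cars noprint nway;
  class origin drivetrain;
  var msrp;
  output out=outstat mean=mean_msrp;
run;
```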
The NWAY option tells SAS to keep only the observations where the variable _TYPE_ has the highest value.
In our example, only _TYPE_=3 will be kept:
15. Printing the Results to an External PDF File
You can easily print the statistical results to an external file such as PDF or RTF using ODS (Output Delivery System).
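A sketch of the ODS approach. The file name is hypothetical; replace it with a path you can write to in your own environment:

```sas
/* Open the PDF destination, run the procedure, then close it */
ods pdf file="means_results.pdf";

proc means data=sashelp.cars;
  class origin;
  var msrp;
run;

ods pdf close;
```

The file is only finalized once the destination is closed with ODS PDF CLOSE.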
The ODS statement above prints the results from the MEANS procedure to an external PDF file.
You can also export the results to an RTF file.
Simply replace the PDF keyword with RTF and you will be able to print the results to an RTF file.
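The RTF version of the sketch, again with a hypothetical file name:

```sas
/* Same pattern as PDF, using the RTF destination instead */
ods rtf file="means_results.rtf";

proc means data=sashelp.cars;
  class origin;
  var msrp;
run;

ods rtf close;
```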
That's it! If you have any questions, feel free to leave a comment below.