Using BFI in SAS
Hassan Pazira
2024-09-04
SAS.Rmd
Introduction
The BFI
package is a powerful tool in R
designed to execute the Bayesian Federated Inference
(BFI) methodology, supporting a wide range of
regression models including linear, logistic, and
survival regression. While SAS offers robust
statistical capabilities, it currently lacks a dedicated package for
implementing the BFI method. Consequently, this vignette serves
to bridge that gap by illustrating how to utilize the R package
BFI
within the SAS environment. By seamlessly integrating
R’s BFI
package with the analytical prowess of SAS, users
gain access to a comprehensive suite of statistical techniques,
enhancing their ability to conduct sophisticated data analyses,
particularly when working with small datasets.
In this guide, we’ll explore how you can leverage SAS to effectively
utilize the BFI
package for your data analysis needs.
To utilize R within SAS, it’s assumed that you have access to a SAS server. This access allows you to establish a connection to a SAS session by providing the necessary connection parameters, such as the SAS server hostname, port number, and authentication credentials.
More information about configuring the SAS system to call functions in the R language is documented in the SAS Online Help.
Accessing SAS Services
To access SAS services, you typically need to connect to a SAS server. Here’s how you can do it using SAS Studio, which is a web-based interface for SAS:
Open a web browser and navigate to the URL provided by your SAS administrator for accessing SAS Studio. Enter your credentials to log in to SAS Studio. Once logged in, you can access SAS services such as analysis and reporting through the SAS Studio interface.
Install R and Configure with SAS
SAS requires two configuration options in order to communicate with R. First the RLANG option must be set when SAS is started. This may be set either in a custom configuration file or on the SAS command line. Second, SAS needs an R_HOME environment variable to point it to the correct, available version of R.
The RLANG System Option
The RLANG system option determines whether you have permission to call R from the SAS system. You can determine the value of the RLANG option by submitting the following SAS statements:
proc options option=RLANG;
run;
The result is one of the following statements in the SAS log:
- NORLANG: Do not support access to R language interfaces
If the SAS log contains this statement, it means that R integration is disabled, and you do not have permission to call R from the SAS system :( You may need to consult with your SAS administrator or IT department to enable it.
- RLANG: Support access to R language interfaces
If the SAS log contains this statement, it means that R integration is enabled, and you can call R from the SAS system :)
Install R
Download and install R from the official R website (https://www.r-project.org/). Follow the installation instructions provided for your operating system.
Install SAS/IML Interface to R
The SAS/IML Interface to R allows you to call R
functions from within PROC IML
(Interactive Matrix
Language). Check if the interface is installed by running the following
code within SAS:
proc options option=R_HOME;
run;
If the path to your R installation directory is displayed, the SAS/IML Interface to R is installed. If not, you may need to install or reinstall it.
Using PROC IML
and Rsubmit
You can use R inside SAS through
the use of the PROC IML
procedure. PROC IML
allows you to execute R code within a SAS session,
enabling integration between SAS and R
for data analysis and statistical modeling.
Installing R packages from CRAN and GitHub in SAS:
To install an R package from CRAN and
GitHub, you can use the base
,
stats
and remotes
packages, respectively. It
can be done in R or in SAS. Here’s how you can do it within SAS:
proc iml;
rsubmit;
/* First install and load 'base', 'stats' and 'BFI'from CRAN */
install.packages("base")
install.packages("stats")
install.packages("BFI") /* To install BFI from CRAN */
library(base)
library(stats)
library(BFI)
/* To install BFI from GitHub (if nessecary) */
/* install.packages("remotes") */
/* library(remotes) */
/* remotes::install_github("hassanpazira/BFI", force = TRUE) */
endrsubmit;
quit;
Now that you have the BFI
package installed and
configured, let’s explore its functionality through the following
example.
Example
Now we generate two datasets independently from Gaussian
distribution, and then apply main functions in the BFI
package to these datasets:
Simulate data for two local centers
First generate 30 samples randomly from Gaussian distribution N(0, 1) with p=3 covariates:
proc iml;
/****************************************************************/
/* Center 1: Data simulation for local center 1 with 30 samples */
/****************************************************************/
p = 3; /* Number of variables */
n1 = 30; /* Number of samples for center 1 */
theta = {1, 2, 2, 2, 1.5}; /* Define theta values directly */
X1 = j(n1, p); /* Initialize matrix X1 */
mu1 = j(n1, 1); /* Initialize vector mu1 */
y1 = j(n1, 1); /* Initialize vector y1 */
/* Generate data for center 1 */
call randseed(1123);
X1 = randfun(n1 || p, "Normal", 0, 1);
mu1 = theta[1] + X1 * theta[2:4];
y1 = randfun(n1 || 1, "Normal", mu1, sqrt(theta[5]));
/* Create dataset for center 1 */
create y1 var {"y1"};
append;
close y1;
create X1 from X1[colname={"X1_1" "X1_2" "X1_3"}];
append from X1;
close X1;
call ExportMatrixToR(X1, "X1");
call ExportMatrixToR(y1, "y1");
quit;
Now generate 50 samples randomly from N(0, 1) with 3 covariates:
proc iml;
/****************************************************************/
/* Center 2: Data simulation for local center 2 with 50 samples */
/****************************************************************/
p = 3; /* Number of variables */
n2 = 50; /* Number of samples for center 2 */
theta = {1, 2, 2, 2, 1.5}; /* Define theta values directly */
X2 = j(n2, p);
mu2 = j(n2, 1);
y2 = j(n2, 1);
/* Generate data for center 2 */
call randseed(1123);
X2 = randfun(n2 || p, "Normal", 0, 1);
mu2 = theta[1] + X2 * theta[2:4];
y2 = randfun(n2 || 1, "Normal", mu2, sqrt(theta[5]));
/* Create dataset for center 1 */
create y2 var {"y2"};
append;
close y2;
create X2 from X2[colname={"X2_1" "X2_2" "X2_3"}];
append from X2;
close X2;
call ExportMatrixToR(X2, "X2");
call ExportMatrixToR(y2, "y2");
quit;
We have transferred SAS data to the R session and are currently
initiating an analysis using the BFI method in R. All communications
with R are facilitated through SAS’s PROC IML
. It’s
important to note that capitalization matters in R, and character
variables are automatically converted into factors.
MAP estimates at the local centers
The following compute the Maximum A Posterior (MAP) estimators of the parameters for center 1:
proc iml;
rsubmit;
#---------------------------
# Inverse Covariance Matrix
#---------------------------
# Creating the inverse covariance matrix for the Gaussian prior distribution:
Lambda <- inv.prior.cov(X1, lambda=0.05, family=gaussian)
#--------------------------
# MAP estimates at center 1
#--------------------------
fit1 <- MAP.estimation(y1, X1, family=gaussian, Lambda)
theta_hat1 <- fit1$theta_hat # intercept and coefficient estimates
A_hat1 <- fit1$A_hat # minus the curvature matrix
summary(fit1, cur_mat=TRUE)
endrsubmit;
quit;
Obtaining the MAP estimators of the parameters for center 2 using the following:
proc iml;
rsubmit;
# Creating the inverse covariance matrix for the Gaussian prior distribution:
Lambda <- inv.prior.cov(X2, lambda=0.05, family=gaussian)
#--------------------------
# MAP estimates at center 2
#--------------------------
fit2 <- MAP.estimation(y2, X2, family=gaussian, Lambda)
theta_hat2 <- fit2$theta_hat
A_hat2 <- fit2$A_hat
summary(fit2, cur_mat=TRUE)
endrsubmit;
quit;
BFI at central center
Now, you can utilize the primary function bfi()
to
acquire the BFI estimates:
proc iml;
rsubmit;
# Creating the inverse covariance matrix for central server:
Lambda <- inv.prior.cov(X1, lambda=0.05, family=gaussian) # the same as other centers
#----------------------
# BFI at central center
#----------------------
A_hats <- list(A_hat1, A_hat2)
theta_hats <- list(theta_hat1, theta_hat2)
bfi <- bfi(theta_hats, A_hats, Lambda)
summary(bfi, cur_mat=TRUE)
endrsubmit;
call ImportMatrixFromR(bfi, "bfi");
quit;
Datasets included in the BFI
package
In order to find and use the datasets available from the
BFI
package, use the following codes:
proc iml;
rsubmit;
# To find a list of all datasets included in the package
print(data(package = "BFI"))
# To use the 'Nurses' data
BFI::Nurses
cat("Dimension of the 'Nurses' data: \n", dim(Nurses))
cat("Colnames of the 'Nurses' data: \n", colnames(Nurses))
# To use the 'trauma' data
BFI::trauma
cat("Dimension of the 'trauma' data: \n", dim(trauma))
cat("Colnames of the 'trauma' data: \n", colnames(trauma))
endrsubmit;
quit;
Importing the data from R
R objects and data may be brought back into SAS as well, for any manipulation you might want to do in SAS. Here, we just grab the bfi object and the Nurses data from R and print the data in SAS.
proc iml;
submit / R; * 'rsubmit' is equivalent to 'submit / R' ;
# Export 'bfi' object
ExportDataSetToSAS(bfi)
# Export dataset 'Nurses'
ExportDataSetToSAS(Nurses)
endsubmit;
run;
proc print data=Nurses;
run;
BFI as a SAS Package
In the near future, we will be releasing the SAS/IML package for BFI,
which can be installed by the PACKAGE INSTALL
statement in
the SAS environment.
Contact
If you find any errors, have any suggestions, or would like to request that something be added, please file an issue at issue report or send an email to: hassan.pazira@radboudumc.nl.