Multi-snpnet-Cox (mrcox) Efficient Group-Sparse Lasso solver for multi-response Cox model

This package provide functions that solve the following multi-response, group-sparse regularized Cox regression:

$\min_{\beta_1,\cdots, \beta_K} \sum_{k=1}^K \frac{1}{n_k} \left[\sum_{i:O_i^k = 1} -\beta_k^T X_i + \log \left(\sum_{j:T^k_j \ge T^k_i} \exp(\beta_k^T X_j)\right)\right] + \lambda \left( \sum_{j=1}^d \|\beta^j\|_1 + \alpha \|\beta^j\|_2 \right)$

Where $\beta_k$ is the coefficient vector of the kth (out of K) responses, and $\beta^j$ are the coefficients of the jth variable on the K responses.

For genetics data in PLINK2 format, we provide a screening procedure similar to the one in this paper.

Installation

Currently mrcox only supports linux and 64-bit intel processors. It requires Intel’s Math Kernel Library (MKL). I suspect you can still get it to run on AMD processors but performance could be significantly worse. To install MKL:

Register and download MKL from https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library/choose-download/linux.html
Under Choose Product to Download, select Intel Math Kernel Library for linux
Untar the downloaded file, run ./install.sh. After installation is done, source intel/mkl/bin/mklvars.sh intel64.

You will also need to install the R dependencies of mrcox (Rcpp, RcppEigen).

Additional dependencies for genetics data

zstd(>=1.4.4). It can be built from source or simply available from conda, pip or brew
PLINK2

library(devtools)
install_github("chrchang/plink-ng", subdir="/2.0/cindex")
install_github("chrchang/plink-ng", subdir="/2.0/pgenlibr")

Once dependencies are installed, run the following in R

devtools::install_github("rivas-lab/multisnpnet-Cox")

Example of Usage

library(mrcox)
# Simulate some data
n = 1000
p = 5000
X = matrix(rnorm(n*p), n, p)
y1 = rexp(n) * exp(X %*% rbinom(p, 1, 0.1))
y2 = rexp(n) * exp(X %*% rbinom(p, 1, 0.1))
s1 = rbinom(n, 1, 0.3)
s2 = rbinom(n, 1, 0.3)
y_list = list(y1, y2)
s_list = list(s1, s2)
# Initialize coefficient matrix at 0
B = matrix(0.0, p, 2)
# Compute residual at B
res = get_residual(X, y_list, s_list, B)
# Compute the gradient of B
g = t(X) %*% res
# Get the dual norm and lambda sequence
alpha = sqrt(2)
dnorm = get_dual_norm(g, alpha)
lambda_max = max(dnorm)
lambda_min = 0.001 * lambda_max
lambda_seq = exp(seq(from = log(lambda_max), to = log(lambda_min), length.out = 100))
# Fit a model
fit = solve_aligned(X, y_list, s_list, lambda_seq, lambda_seq * alpha)
fit = fit[['result']] # fit also contains the residuals at each solution