Regularizations for General Linear Regression

The lasso [Tibshirani, 1996] refers to L1-regularization of linear models:

\Omega_\lambda(\vec{w}) = \lambda \sum_{i=1}^D |w_i|
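As a concrete check, here is a minimal NumPy sketch of this penalty (the function name and the example weights are illustrative, not from any particular library):

```python
import numpy as np

def lasso_penalty(w, lam):
    """L1 regularization term: lam * sum_i |w_i|."""
    return lam * np.sum(np.abs(w))

w = np.array([0.5, -2.0, 0.0, 1.5])
lasso_penalty(w, lam=0.1)  # 0.1 * (0.5 + 2.0 + 0.0 + 1.5) = 0.4
```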

In group lasso [Yuan and Lin, 2006], the D predictors are divided into G groups, and the g-th group has D_g predictors. The regularization term becomes:

\Omega_\lambda(\vec{w}) = \lambda \sum_{g=1}^G \sqrt{D_g} ||\vec{w}_g||_2,

where ||\cdot||_2 denotes the L2-norm: ||\vec{x}||_2 = \sqrt{\sum x_i^2}. So, when all D_g's equal 1, group lasso reduces to lasso.
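The reduction to lasso can be verified directly with a small NumPy sketch (function name and group encoding are illustrative: each group is a list of predictor indices):

```python
import numpy as np

def group_lasso_penalty(w, groups, lam):
    """lam * sum_g sqrt(D_g) * ||w_g||_2, where `groups` lists index sets."""
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(w[g]) for g in groups)

w = np.array([0.5, -2.0, 0.0, 1.5])
# With singleton groups (all D_g = 1), each ||w_g||_2 is just |w_i|,
# so the penalty coincides with the lasso penalty lam * sum_i |w_i|.
singletons = [[i] for i in range(len(w))]
group_lasso_penalty(w, singletons, lam=0.1)  # 0.1 * sum_i |w_i| = 0.4
```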

A further extension of group lasso is sparse group lasso [Simon et al., 2013], which adds an L1 term to promote sparsity within groups as well as across groups:

\Omega_{\lambda,r}(\vec{w}) = \lambda \sum_{g=1}^G \sqrt{D_g}\,||\vec{w}_g||_2 + r\,||\vec{w}||_1
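A sketch of this combined penalty, assuming the L1 term is taken over all of \vec{w} with a single weight r (names and values are illustrative):

```python
import numpy as np

def sparse_group_lasso_penalty(w, groups, lam, r):
    """lam * sum_g sqrt(D_g) * ||w_g||_2  +  r * ||w||_1."""
    group_term = lam * sum(np.sqrt(len(g)) * np.linalg.norm(w[g]) for g in groups)
    l1_term = r * np.sum(np.abs(w))
    return group_term + l1_term

w = np.array([0.5, -2.0, 0.0, 1.5])
groups = [[0, 1], [2, 3]]  # two groups of two predictors each
sparse_group_lasso_penalty(w, groups, lam=0.1, r=0.05)
```

Setting r = 0 recovers group lasso, and setting lam = 0 recovers lasso, so the two earlier penalties are special cases.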

In practice, however, such a complex regularization is often unnecessary.