Centre for Research in Environmental Epidemiology (CREAL)

 Juan R González 

 


 

Software

R libraries

We have developed several R packages in collaboration with researchers from other institutions. Some of these packages are related to genetics and others to survival analysis with recurrent events.

Genetics

Package CNVassoc

We are interested in assessing the association between CNVs and traits using information obtained from MLPA, Illumina, aCGH or any other platform that provides quantitative measurements. To do so, we propose a class of latent models that incorporates the uncertainty that arises when copy number status is inferred. The functions for assessing association are implemented in an R package (tar.gz file (Linux) or zip file (Windows)). The package requires the ‘mixdist’ and ‘mclust’ packages to be installed. We have included two real data sets to illustrate how the model works. They are described in the vignette (the scripts can be downloaded here: MLPA example and aCGH example). The statistical methods and the examples are described in the paper:

·         Gonzalez JR, Subirana I, Escaramis G, Peraza S, Caceres A, Estivill X, Armengol L. Latent Class Model to Assess Association between Copy Number and Disease. BMC Bioinformatics 2009, 10:172.
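To illustrate what "incorporating uncertainty" can mean here, the following is a minimal sketch and not the CNVassoc code. It assumes a normal mixture has already been fitted to the quantitative signal, and it returns the posterior probability of each copy-number class for one individual instead of a single hard call. The function names and all parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def copy_number_posteriors(intensity, means, sds, weights):
    """Posterior probability of each copy-number class given one intensity.

    means, sds and weights describe an already fitted normal mixture,
    one component per copy-number class (an assumption of this sketch).
    """
    joint = [w * normal_pdf(intensity, m, s)
             for m, s, w in zip(means, sds, weights)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical mixture for 0, 1 and 2 copies (illustrative values only).
MEANS = [0.0, 0.5, 1.0]
SDS = [0.1, 0.1, 0.1]
WEIGHTS = [0.1, 0.3, 0.6]

# An intensity lying between two components gets probability mass on both.
posteriors = copy_number_posteriors(0.72, MEANS, SDS, WEIGHTS)
```

An association model can then weight each individual's contribution by these probabilities rather than treating the most likely class as known.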

 

 

Package MLPAstats

 

Multiplex ligation-dependent probe amplification (MLPA) is a semi-quantitative technique for detecting copy number alterations in targeted regions. In this project we are developing statistical models and methods to determine the statistical significance of altered probes. The functions are implemented in an R package (tar.gz file (Linux) or zip file (Windows)) that contains an R GUI application. The package includes two real MLPA data sets that can be analyzed as described in the vignette. The script can be downloaded here.

The statistical methods are described in the paper:

·         Gonzalez JR, Carrasco JL, Armengol L, Villatoro S, Jover L, Yasui Y, Estivill X. Probe-specific mixed-model approach to detect copy number differences using multiplex ligation-dependent probe amplification (MLPA). BMC Bioinformatics 2008, 9:261.

 

Package SNPassoc

This package was built while I was working in Xavier Estivill’s lab at the Center for Genomic Regulation, and it was written in collaboration with Victor Moreno and his colleagues. The R package SNPassoc contains classes and methods to help with the analysis of whole genome association studies. SNPassoc utilizes S4 classes and extends the haplo.stats R package to facilitate haplotype analyses. The package carries out the most common analyses in whole genome association studies: descriptive statistics and exploratory analysis of missing values, tests of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (for either quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation tests and related tests (sum statistic and truncated product) are also implemented.
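As an illustration of one of the analyses listed above, here is a minimal sketch of a Hardy-Weinberg equilibrium check for a single biallelic SNP. It uses the textbook chi-square goodness-of-fit statistic, which is an assumption of this example; SNPassoc itself may use a different procedure, such as an exact test. The function name is hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing genotype counts with HWE expectations.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    # Allele frequency of A, estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    # Skip classes with zero expected count (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Counts exactly in Hardy-Weinberg proportions give a statistic of zero.
in_equilibrium = hwe_chi_square(25, 50, 25)

# A sample with no heterozygotes departs strongly from equilibrium.
no_heterozygotes = hwe_chi_square(50, 0, 50)
```

The statistic is compared with a chi-square distribution with one degree of freedom, so values above about 3.84 indicate a departure from equilibrium at the 5% level.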

The methodology is described in:

·         JR Gonzalez, L Armengol, X Sole, E Guino, JM Mercader, X Estivill, V Moreno (2007). SNPassoc: an R package to perform whole genome association studies. Bioinformatics, 23:644-5

and it has been used in:

·         Mercader JM, Ribasés M, Gratacòs M, González JR, Bayés M, de Cid R, Badía A, Fernández-Aranda F, Estivill X (2007). Altered brain-derived neurotrophic factor blood levels and gene variability are associated with anorexia and bulimia. Genes Brain Behav. [Epub ahead of print]

·         Gratacòs M, Soria V, Urretavizcaya M, González JR, Crespo JM, Bayés M, de Cid R, Menchón JM, Vallejo J, Estivill X (2007). A brain-derived neurotrophic factor (BDNF) haplotype is associated with antidepressant treatment outcome in mood disorders. Pharmacogenomics J. [Epub ahead of print]

This package is available from CRAN (source code, manual and vignettes)

R function dfnbpro

The function requires the BayesMendel R package, in particular a C program that computes the probability of observing the phenotypes (deaf or hearing) for the whole pedigree given the genotype of the proband. This package is available upon request from the BayesMendel lab at The Johns Hopkins University.

The methodology is described in:

·         Gonzalez JR, Wang W, Ballana E, Estivill X (2006). A Recessive Mendelian Model to Predict Carrier Probabilities of DFNB1 for Nonsyndromic Deafness. Human Mutation, 27(11):1135-1142.

 

Packages for dealing with recurrent events

Package frailtypack

This package was written jointly with Virginie Rondeau. Frailtypack can be used to estimate the parameters of a shared gamma frailty model with potentially right-censored, left-truncated and stratified survival data, using maximum penalized likelihood estimation. Time-dependent explanatory variables and the extension of the Cox regression model to recurrent events are also allowed. The program can also be used simply to obtain a smooth estimate of the baseline hazard function.
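For readers unfamiliar with the model, a shared gamma frailty model and its penalized likelihood are usually written as follows. This is the standard textbook formulation, given here as a sketch; the notation ($\omega_i$, $\theta$, $\kappa$) is an assumption of this example and may differ from the notation used in the package documentation.

```latex
% Hazard for subject j in group i, given the group's frailty \omega_i
\lambda_{ij}(t \mid \omega_i) = \omega_i \, \lambda_0(t) \, \exp\!\left(\beta^{\top} X_{ij}\right),
\qquad \omega_i \sim \Gamma\!\left(\tfrac{1}{\theta}, \tfrac{1}{\theta}\right)

% Penalized log-likelihood: the full log-likelihood minus a roughness
% penalty on the baseline hazard, with smoothing parameter \kappa \ge 0
pl(\lambda_0, \beta, \theta) = l(\lambda_0, \beta, \theta)
  - \kappa \int_0^{\infty} \left( \lambda_0''(t) \right)^2 \, dt
```

The frailty $\omega_i$ has mean 1 and variance $\theta$, so $\theta$ measures the strength of the within-group dependence, and larger values of $\kappa$ give a smoother estimate of the baseline hazard $\lambda_0$.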

The methodology is described and the package used in:

·         V Rondeau, JR Gonzalez (2005). Frailtypack: a computer program for the analysis of correlated failure time data using penalized likelihood estimation. Computer Methods and Programs in Biomedicine, 80:154-64.

This package is available from CRAN: source code and manual

A new version of frailtypack will be available soon. It will include functions for analyzing hierarchical (nested) models and recurrent event data with a terminal event, as well as an additive frailty model that jointly models the random treatment × trial interaction and the random trial effect in an individual patient data meta-analysis.

Package gcmrec

This package was written jointly with Edsel A Peña and Elizabeth Slate. Gcmrec estimates the parameters of a general class of models for recurrent event data proposed by Peña and Hollander. The software also estimates a model designed for analyzing relapses in patients diagnosed with cancer, taking into account the effect of interventions after each relapse, as described in Gonzalez JR, Peña E, Slate E (2005).

The methodology is described and the package used in:

·         E Peña, EH Slate, JR Gonzalez (2007). Semiparametric inference for a general class of models for recurrent event data. J Stat Planning Inference, 137:1727-1747.

·         JR Gonzalez, E Peña, E Slate (2005). Modelling intervention effects after cancer relapses. Stat Med, 24:3959-75.

This package is available from CRAN: source code and manual

Package survrec

This package was written jointly with Edsel A Peña and Robert Strawderman. Survrec is designed to estimate the survival function for recurrent event data using the Peña-Strawderman-Hollander and Wang-Chang estimators, and maximum likelihood estimation under a gamma frailty model.

This package is available from CRAN: source code and manual

 
