A simulation study to assess a variable selection method for selecting single nucleotide polymorphisms associated with disease
Saunders, Ian W
In genome-wide association studies, where hundreds of thousands of single nucleotide polymorphisms (SNPs) are genotyped, the potential for false positives is high, and methods for selecting models with only a few SNPs are required. Variable selection methods that yield sets of SNPs associated with disease have been developed, but they are still less common than evaluating individual SNPs one at a time. To assess the potential improvement available from multi-SNP approaches, we examined the performance of the software GeneRaVE as a variable selection method applied to SNP data from case-control studies. The method was assessed via simulations in which a haplotype identified by three SNPs was taken to be associated with the disease. Simulated data sets reflecting different levels and patterns of genetic association with the disease were generated. As a benchmark, we fitted a generalized linear model using only the three disease susceptibility (DS) SNPs, which provides an upper bound on the performance achievable by the selection methods. To investigate the advantage of multivariate variable selection over a single-SNP approach, we applied chi-squared tests to each of the DS SNPs, with correction for multiple testing. Simulation results showed that GeneRaVE performed well and outperformed single-SNP analysis using the chi-squared method in identifying disease-related SNPs. When applied to a large dataset, it identified SNPs known to be associated with disease that were not identified by single-SNP methods.
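The single-SNP comparison described above can be illustrated with a minimal sketch. The abstract does not state which form of chi-squared test or which multiple-testing correction was used, so this example assumes a Pearson test on a 2 x 3 genotype table (cases and controls by 0, 1 or 2 copies of the minor allele) and a Bonferroni correction. The function names, SNP labels and counts are hypothetical and are not taken from the study.

```python
import math


def genotype_chi2(case_counts, control_counts):
    """Pearson chi-squared test on a 2 x 3 genotype table (assumed design).

    case_counts / control_counts: counts of individuals carrying
    0, 1 and 2 copies of the minor allele.
    Returns (statistic, p_value) with 2 degrees of freedom.
    """
    rows = [list(case_counts), list(control_counts)]
    row_totals = [sum(r) for r in rows]
    col_totals = [rows[0][j] + rows[1][j] for j in range(3)]
    if min(col_totals) == 0 or min(row_totals) == 0:
        # With an empty row or column the 2-df reference distribution
        # no longer applies, so refuse rather than return a wrong p-value.
        raise ValueError("every genotype class and both groups must be observed")
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            stat += (rows[i][j] - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-squared survival function
    # is exactly exp(-x / 2), so no external library is needed.
    return stat, math.exp(-stat / 2.0)


def bonferroni(p_values):
    """Bonferroni adjustment (an assumed choice of correction)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical genotype counts: (cases, controls) per SNP.
tables = {
    "snp_a": ([30, 50, 20], [50, 40, 10]),
    "snp_b": ([40, 45, 15], [40, 45, 15]),
}
results = {name: genotype_chi2(*counts) for name, counts in tables.items()}
adjusted = dict(zip(results, bonferroni([p for _, p in results.values()])))

for name, (stat, p) in results.items():
    print(f"{name}: chi2={stat:.3f} raw p={p:.4g} adjusted p={adjusted[name]:.4g}")
```

In a genome-wide setting the number of tests would be the number of SNPs tested, which makes a Bonferroni threshold very strict; this is one reason multi-SNP selection methods are of interest.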
- Faculty of Science