Proceedings of the XLV Italian Society of Agricultural Genetics - SIGA Annual Congress
Salsomaggiore Terme, Italy - 26/29 September, 2001
SEQUENCE DIVERSITY AND SNP MARKER DEVELOPMENT IN NORWAY SPRUCE
DEGLI IVANISSEVICH S.*, MORGANTE M.*,**
* Dipartimento di Produzione Vegetale e Tecnologie Agrarie, Università degli Studi di Udine, via delle Scienze 208, 33100 Udine
** DuPont Crop Genetics, Molecular Genetics Group, Delaware Technology Park 200, Newark, DE, USA
SNPs, molecular markers, Norway spruce, genetic diversity
Direct analysis of genetic variation at the sequence level (Single Nucleotide Polymorphisms, SNPs) offers several advantages over other types of DNA marker systems. SNPs are rapidly becoming the marker of choice for many applications in genome analysis because of their abundance (especially important in linkage disequilibrium based mapping approaches) and because high-throughput genotyping methods are being developed for their analysis. An additional advantage of this approach lies in the phylogenetic information gathered through sequence variation analysis, which allows inferences on allele and population history that cannot be drawn with any of the other marker systems available. However, information on the frequency and distribution of SNPs in plants has so far been limited.
With the aim of developing SNP markers in Norway spruce, we designed 60 primer pairs on cDNA sequences. In such a large (1C = 15 x 10^9 bp) and highly repetitive (80% repetitive DNA) genome, single-copy regions must be found before SNP identification can even be attempted. EST sequences provide an attractive source of such regions, even though the frequency of SNPs may be lower in the protein-coding portions. Introns and untranslated regions should therefore be preferentially targeted, which should also yield locus-specific amplification products more often, an important consideration since large gene families have been reported in conifers.
Norway spruce is an outcrossing, highly heterozygous species with very large effective population sizes. Based on isozyme and microsatellite data, it appears to carry high levels of variability, most of which (>95%) resides within populations. Conifers in general are considered to be among the most genetically variable plant species. We therefore set out first to estimate the levels and distribution of DNA sequence variation in expressed portions of the spruce genome and second to verify the feasibility of SNP marker development from EST sequences. We amplified 300-500 bp fragments from DNA extracted from seed endosperms (megagametophytes), which are haploid tissue, and sequenced the PCR products directly. The use of haploid tissue and direct sequencing offers several advantages: sequence haplotypes (multilocus haploid genotypes) can be identified directly, without statistical reconstruction; true allelic polymorphism can be distinguished from sequence variation between different gene family members (on the assumption that no polymorphism should be observed within an individual haploid tissue); and false SNPs due to mutations introduced by Taq polymerase are almost completely eliminated.
Panels of 12 endosperms, representative of different European spruce populations, were used. The sequences were aligned using specific software, single point mutations were identified, their frequencies were estimated and the haplotypes were determined.
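The SNP identification and haplotype determination steps described above can be illustrated with a minimal Python sketch. It is not the software used in this study (which is not named here); the function names `find_snps` and `haplotypes` and the toy sequences are hypothetical, and the sketch assumes the haploid sequences have already been aligned to equal length.

```python
from collections import Counter


def find_snps(aligned):
    """Return {column index: Counter of bases} for polymorphic columns.

    `aligned` is a list of equal-length strings, one per haploid
    megagametophyte (hypothetical input format).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for pos in range(length):
        column = [seq[pos].upper() for seq in aligned]
        # Skip columns containing gaps or ambiguous base calls.
        if any(base not in "ACGT" for base in column):
            continue
        counts = Counter(column)
        if len(counts) > 1:  # more than one base observed: a candidate SNP
            snps[pos] = counts
    return snps


def haplotypes(aligned, snp_positions):
    """Count haplotypes, i.e. the bases each sequence carries at the SNP columns.

    Because each sequence comes from haploid tissue, the haplotype is read
    off directly and needs no statistical phase reconstruction.
    """
    return Counter("".join(seq[p].upper() for p in snp_positions) for seq in aligned)


# Toy alignment of four haploid sequences (invented data, illustration only).
toy = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTCC",
]
snps = find_snps(toy)
positions = sorted(snps)
haps = haplotypes(toy, positions)
```

In this toy case `positions` holds the 0-based indices of the two variable columns, and `haps` counts the three distinct two-site haplotypes among the four sequences; allele frequencies follow from the per-column counts in `snps`.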
Based on preliminary data from 13 EST loci, the frequency of nucleotide changes appears to be high, with an average of one SNP every 88 bp overall and one SNP every 30 bp in introns. These frequencies, which are more than an order of magnitude greater than those observed in humans, appear to be even higher than those observed in maize, which is commonly considered a species with extremely high levels of variability. We will present data on additional loci, as well as estimates of the relative frequencies of transitions versus transversions, synonymous and nonsynonymous substitutions, insertion/deletion events, different population diversity parameters, and the number and distribution of haplotypes. These data will be discussed in light of the characteristics of spruce populations and of our findings on nucleotide substitution rates in the Pinaceae, and will be used to draw inferences on past population history. The possibilities for developing SNP markers for practical applications, such as genetic mapping of traits using whole-genome linkage disequilibrium based methods or candidate gene association studies, will also be discussed.
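As a rough illustration of how such summaries can be computed, the Python sketch below classifies a substitution as a transition or a transversion and converts an SNP density into Watterson's per-site estimator of diversity. Watterson's estimator is chosen here only as one familiar example of a population diversity parameter; the abstract does not state which parameters were estimated. The function names are hypothetical, and the sample size of 12 is an assumption taken from the panel size, which may not apply to every locus.

```python
PURINES = {"A", "G"}


def is_transition(base_a, base_b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes (A<->G, C<->T)."""
    return base_a != base_b and ((base_a in PURINES) == (base_b in PURINES))


def a_n(n_sequences):
    """Watterson's correction factor: sum of 1/i for i = 1 .. n-1."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta_per_site(segregating_sites, length_bp, n_sequences):
    """Watterson's estimator per site: S / (a_n * L)."""
    return segregating_sites / (a_n(n_sequences) * length_bp)


# One SNP per 88 bp overall and one per 30 bp in introns, as reported above.
# n = 12 is an assumption based on the panel size.
theta_overall = watterson_theta_per_site(1, 88, 12)
theta_introns = watterson_theta_per_site(1, 30, 12)
```

Dividing by `a_n` accounts for the fact that larger samples are expected to reveal more segregating sites, which is why a raw SNP density alone is not directly comparable across studies with different sample sizes.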