Abstract Detail

Molecular Biology

Schoch, Conrad L. [1], Hotton, Carol [1], Klimke, William [1], Federhen, Scott [1].

A Resource for Curating High Quality Reference Sequences for Phylogenetic Analysis in GenBank.

The GenBank/DDBJ/EMBL sequence databases contain an immense amount of primary sequence data (over 100 million sequences to date) of wildly varying quality. GenBank has an obligation to archive these data, but variable annotation quality (particularly with respect to taxonomic identifications) makes it difficult for users to pick out reliable, well-annotated sequences for phylogenetic or other analyses. NCBI has developed subsidiary databases to curate entries derived from the primary GenBank archive, such as RefSeq Genomes (http://www.ncbi.nlm.nih.gov/RefSeq/). The RefSeq Targeted Loci Project will contain curated sets of phylogenetic markers, with 16S sequences from the type strains of prokaryotes as the prototype of this new resource. We propose to extend this reference sequence set, first to the fungi and ultimately to all groups of eukaryotes. Complete sequences of selected genes from type strains of fungal species (or from other reliably identified specimens where type material is not available) will be selected from GenBank for inclusion in this dataset. Minimal annotation will include specimen voucher or culture collection accession (as appropriate), type strain, coequivalent strains, and information about the collection and identification of the specimen. Sequence corrections may also be made (vector/primer removal). The RefSeq Targeted Loci Project datasets will be available by FTP and will include specialized BLAST databases and pre-computed phylogenetic trees. The goal is to produce a gold standard set of reference sequences for various sequence analysis applications. This presentation is intended to invite input on ways to develop and maintain this resource in collaboration with the mycological and botanical communities.

RefSeq Targeted Loci (gold standard sequences for phylogenetic analyses)

1 - National Center for Biotechnology Information (GenBank), National Library of Medicine, National Institutes of Health, 45 Center Drive, MSC 6510, Bethesda, Maryland, 20892, USA

data mining
fungi and plants
data curation

Presentation Type: Poster:Posters for Topics
Session: P1
Location: Event Tent/Cliff Lodge
Date: Monday, July 27th, 2009
Time: 5:30 PM
Number: P1MB001
Abstract ID: 597

