Chunk #4 — THE UNIPROT DATABASES — The UniProt Reference Clusters (UniRef)

Source: The Universal Protein Resource (UniProt) in 2010.
Embedded: yes

Text

sequence and name; the number of members and the lowest common taxonomy node are also included. UniRef100 is one of the most comprehensive non-redundant protein sequence datasets available. The reduced size of the UniRef90 and UniRef50 datasets makes sequence similarity searches faster and reduces research bias in those searches by providing a more even sampling of sequence space. UniRef is used for a broad range of applications in the areas of automated genome annotation, family classification, systems biology, structural genomics, phylogenetic analysis and mass spectrometry.
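
To make the "lowest common taxonomy node" of a cluster concrete, the following is a minimal sketch of one way such a node could be derived from the members' lineages. It is an illustration only, not UniRef's actual procedure: the function name `lowest_common_node`, the simplified root-to-leaf lineages and the `summary` dictionary are all assumptions introduced here.

```python
from typing import Optional, Sequence


def lowest_common_node(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the deepest taxon shared by every lineage, or None if none is shared.

    Each lineage is assumed to be ordered from the root down to the organism.
    """
    if not lineages:
        return None
    common: Optional[str] = None
    # zip stops at the shortest lineage, so no lineage is read past its end.
    for level in zip(*lineages):
        if all(taxon == level[0] for taxon in level):
            common = level[0]
        else:
            break
    return common


# Hypothetical cluster members with simplified lineages (real lineages
# contain many more ranks).
cluster_lineages = [
    ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Primates", "Homo sapiens"],
    ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Rodentia", "Mus musculus"],
    ["Eukaryota", "Metazoa", "Chordata", "Aves", "Gallus gallus"],
]

# Cluster-level summary mirroring the two fields named in the text.
summary = {
    "members": len(cluster_lineages),
    "common_taxon": lowest_common_node(cluster_lineages),
}
print(summary)
```

With these three hypothetical members, the lineages agree down to Chordata and diverge below it, so Chordata is reported as the common node.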