Characterization of the Eukaryotic Selenoproteome
In selenoproteins, incorporation of the amino acid selenocysteine is
specified by the UGA codon, usually a stop signal. The alternative
decoding of UGA is conferred by an mRNA structure, the SECIS element,
located in the 3'-untranslated region of the selenoprotein mRNA
(see figure). Because of
the non-standard use of the UGA codon, current
computational gene prediction methods are unable to identify
selenoproteins in the sequence of eukaryotic genomes.
We are developing methods to predict selenoproteins in genomic
sequences. The methods rely on the prediction of SECIS elements in
coordination with the prediction of genes in which the strong codon
bias characteristic of protein coding regions extends beyond a TGA
codon interrupting the open reading frame. The program geneid is central to our approach, both for its ability to
use external data to tune the predicted genes and for its capacity to predict
genes with an in-frame TGA. However, prediction of selenoprotein genes in genomic sequences is particularly difficult, and misprediction of even a single amino acid (the selenocysteine residue) may lead to misannotation of selenoproteins. As a consequence, eukaryotic selenoproteomes (the sets of all selenoproteins in an organism) remain poorly characterized. The correct identification of selenoproteins is important because they are thought to mediate the biological functions of selenium, which is implicated in processes as diverse as male infertility, prevention of cancer and heart disease, reduction of viral expression, ageing and immune function (Hatfield, 2001).
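The idea of a reading frame that continues past an in-frame TGA can be illustrated with a minimal Python sketch. This is not geneid, and it does no codon-bias scoring, splicing or SECIS search. The function name `orfs_with_inframe_tga` and the `min_codons` threshold are assumptions made only for this illustration. The sketch scans the forward strand for ATG-initiated reading frames, treats only TAA and TAG as terminators, and reports the frames that read through at least one TGA.

```python
# Hypothetical illustration only: a naive forward-strand scan, not the geneid algorithm.
STRICT_STOPS = {"TAA", "TAG"}  # TGA is deliberately NOT treated as a stop here


def orfs_with_inframe_tga(seq, min_codons=10):
    """Return (start, end, tga_positions) for ATG..TAA/TAG frames that
    read through at least one in-frame TGA codon.

    start/end are 0-based, end-exclusive coordinates including the stop codon.
    min_codons is an arbitrary length cutoff chosen for this sketch.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current frame
        tgas = []     # in-frame TGA positions seen since that ATG
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None:
                if codon == "ATG":
                    start, tgas = pos, []
            elif codon in STRICT_STOPS:
                # Frame closed: keep it only if it is long enough and has a TGA.
                if tgas and (pos - start) // 3 >= min_codons:
                    hits.append((start, pos + 3, tgas))
                start = None
            elif codon == "TGA":
                tgas.append(pos)  # candidate selenocysteine codon
    return hits
```

For example, `orfs_with_inframe_tga("ATG" + "GCT" * 4 + "TGA" + "GCT" * 4 + "TAA", min_codons=5)` returns `[(0, 33, [15])]`. A frame found this way is only a candidate: the sketch cannot tell a read-through TGA from a genuine stop, which is why the approach described above also requires coding-potential evidence beyond the TGA and a SECIS element.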
First, in collaboration with two experimental groups, one led by
Montserrat Corominas and Florenci Serras at the Universitat de
Barcelona and one led by Marla J. Berry at Harvard University, we
applied the method to the Drosophila melanogaster genome and predicted
three selenoprotein candidates. One of them belongs to a known
family of selenoproteins (SPS2); we tested the other two predictions
experimentally, with positive results
(Castellano et al. 2001). They belong to the SelH and SelK selenoprotein families.
Second, in collaboration with Vadim Gladyshev's group at the University of Nebraska, we have also used this method, together with a more sophisticated SECIS prediction tool (SECISearch 2.0), to analyze mammalian genomes. Based on gene and SECIS prediction combined with extensive human-rodent comparisons, we believe the human selenoproteome consists of 25 selenoproteins.
SECIS and gene prediction. (A) General form 1 SECIS divided
into structural units. Form 2 has an extra short stem-loop in the
apical loop. (B) PatScan SECIS pattern to search for both form 1 and
form 2 SECIS. The extra stem-loop in form 2 is not taken into account
when searching. (C) The two possible predictions for
an idealized two-exon gene: as a normal gene or as a selenoprotein
gene with an in-frame TGA and a SECIS. Exon-defining signals are shown.
(D) False-positive selenoprotein genes with only an in-frame TGA or only a
SECIS. These partial predictions are not permitted in the gene
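The rule illustrated in panels (C) and (D) can be sketched as a simple consistency check: a selenoprotein gene model is accepted only when it has both an in-frame TGA and a SECIS element downstream of the coding region. The function name, the coordinate convention and the `max_utr_len` window below are hypothetical and chosen only for illustration; they are not taken from geneid or from the PatScan pattern.

```python
# Hypothetical illustration of the "both signals or neither" rule.
def is_valid_selenoprotein_model(tga_positions, cds_end, secis_starts,
                                 max_utr_len=5000):
    """Return True only if the model has an in-frame TGA AND a SECIS hit
    that starts downstream of the coding region.

    tga_positions: positions of in-frame TGA codons in the gene model
    cds_end:       coordinate at which the coding region ends
    secis_starts:  start coordinates of predicted SECIS elements
    max_utr_len:   assumed search window after cds_end (arbitrary value)
    """
    has_tga = len(tga_positions) > 0
    has_secis = any(cds_end <= s <= cds_end + max_utr_len
                    for s in secis_starts)
    # Partial evidence (only one of the two signals) is rejected.
    return has_tga and has_secis
```

Under these assumptions, a model with an in-frame TGA at position 15, a coding region ending at 33 and a SECIS hit at 200 is accepted, whereas the same model without the SECIS hit, or a SECIS hit with no in-frame TGA, is rejected.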
Third, we have screened nonmammalian vertebrate genomes in collaboration with Vadim Gladyshev's group and Alain Krol's lab at the Institut de Biologie Moléculaire et Cellulaire in Strasbourg. Using a comparative gene prediction method between human and a puffer fish (Takifugu rubripes), we found a novel nonmammalian selenoprotein family, termed SelU.
Finally, we have contributed to the annotation of selenoprotein genes in vertebrate genomes (Tetraodon and chicken). In collaboration with Vadim Gladyshev's group, we are testing a potential new selenoprotein family in fishes.
As a result of these studies and previous work from many groups, 19-20 selenoprotein families have so far been identified in eukaryotic genomes, some of them containing several members. Different families show neither sequence similarity nor related functions. Although selenoproteins have been studied in only a few eukaryotic organisms, existing data suggest that selenoprotein genes and their Cys-containing homologs are distributed across the whole eukaryotic spectrum in what appears to be a largely species-specific fashion. In any case, if the results obtained here through the analysis of model organisms are representative of more divergent eukaryotic genomes, the conclusion is that we currently understand only a fraction of the selenium-dependent world.
- International Chicken Genome Sequencing Consortium
Sequencing and comparative analysis of the chicken genome
Nature, in press
- International Tetraodon Genome Sequencing Consortium
Analysis of the Tetraodon nigroviridis genome reveals the vertebrate protokaryotype and its duplication in fish
Nature, in press
- S. Castellano, S.V. Novoselov, G.V. Kryukov, A. Lescure, E. Blanco, A. Krol, V.N. Gladyshev and R. Guigó.
Reconsidering the evolution of eukaryotic selenoproteins: a novel nonmammalian family with scattered phylogenetic distribution.
EMBO Reports, 5(1):71-77 (2004)
- G.V. Kryukov, S. Castellano, S.V. Novoselov, A.V. Lobanov, O. Zehtab, R. Guigó and V.N. Gladyshev
Characterization of mammalian selenoproteomes.
Science, 300(5624):1439-1443 (2003)
- S. Castellano, N. Morozova, M. Morey, M.J. Berry, F. Serras, M. Corominas and R. Guigó.
In silico identification of novel selenoproteins in the Drosophila melanogaster genome.
EMBO Reports, 2(8):697-702 (2001)