No subject


Mon Jun 5 16:42:55 EDT 2006


entries of Human Nuclear DNA including a Gene with Complete CDS and with
more than one exon have been selected according to assessed selection
criteria (file genbank_filtered.inf).

4450 exons and 3752 introns have been extracted from these entries (files
exons.seq and introns.seq).

Several statistics for these exons and introns are reported (files
exons.stat and introns.stat): overall nucleotides, average GC content,
number of exons/introns including non-AGCT bases, number of exons/introns
in which the annotated end is not found, exon/intron minimum length,
exon/intron maximum length, exon/intron average length, exon/intron length
standard deviation, number of introns in which the sequence does not start
with GT, and number of introns in which the sequence does not end with AG.
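
As an illustration only, most of the statistics listed above could be
computed along the following lines. This is a minimal sketch, not the code
used to build HS3D: the function name sequence_stats and the dictionary keys
are hypothetical, GC content is assumed to be pooled over all nucleotides,
and the standard deviation is assumed to be the population one. The count of
sequences whose annotated end is not found is omitted because it needs the
GenBank annotation itself.

```python
from statistics import mean, pstdev

def sequence_stats(seqs, is_intron=False):
    """Summarize a non-empty list of uppercase exon or intron sequences."""
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    stats = {
        "overall_nucleotides": total,
        # Assumption: GC fraction pooled over all nucleotides.
        "gc_content": gc / total if total else 0.0,
        # Sequences containing any symbol other than A, G, C, T.
        "with_non_agct": sum(1 for s in seqs if set(s) - set("AGCT")),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": mean(lengths),
        # Assumption: population standard deviation.
        "length_std": pstdev(lengths),
    }
    if is_intron:
        stats["not_starting_with_gt"] = sum(
            1 for s in seqs if not s.startswith("GT"))
        stats["not_ending_with_ag"] = sum(
            1 for s in seqs if not s.endswith("AG"))
    return stats
```

Calling sequence_stats(introns, is_intron=True) on a list of intron strings
would give the intron-specific counts in addition to the shared ones.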

Then 3762 + 3762 donor and acceptor sites have been extracted as windows of
140 nucleotides around each splice site. After discarding sequences not
including canonical GT-AG junctions (176+191), sequences with insufficient
data (not enough material for a 140-nucleotide window) (590+547), and
sequences including non-AGCT bases (30+32), there are 2955+2992 windows
(files GT_true.seq and AG_true.seq).
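
The filtering just described can be sketched as follows. This is a rough
illustration under stated assumptions, not the actual extraction procedure:
the name extract_site and the returned labels are hypothetical, and the
window is assumed to start 70 nucleotides before the first base of the
GT (or AG) dinucleotide, since the exact placement of the site inside the
140-nucleotide window is not given here.

```python
WINDOW = 140          # window length stated in the text
HALF = WINDOW // 2    # assumption: 70 nucleotides on each side of the site

def extract_site(seq, site, dinuc):
    """Classify one candidate splice site and return (label, window).

    seq   : uppercase sequence of the whole entry
    site  : 0-based index of the first base of the expected dinucleotide
    dinuc : "GT" for donor sites, "AG" for acceptor sites
    """
    # 1. Discard non-canonical junctions.
    if seq[site:site + 2] != dinuc:
        return ("non_canonical", None)
    # 2. Discard sites too close to either end of the entry.
    start, end = site - HALF, site + HALF
    if start < 0 or end > len(seq):
        return ("insufficient", None)
    window = seq[start:end]
    # 3. Discard windows containing symbols other than A, G, C, T.
    if set(window) - set("AGCT"):
        return ("non_agct", None)
    return ("ok", window)
```

The checks are applied in the same order in which the discarded groups are
listed above; a different order would shift sequences between the groups
without changing the set of windows that are kept.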

Information and several statistics about the splice site extraction are
reported (files GT_true.inf, AG_true.inf, GT_true.stat, and AG_true.stat).

Finally, there are 287,296+348,370 windows of false splice sites, selected
by searching for canonical GT-AG pairs in non-splicing positions. The false
sites within a range of +/- 60 nucleotides from a true splice site are
marked as proximal (files GT_false.seq and AG_false.seq; related
information in GT_false.inf and AG_false.inf).
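
A simplified sketch of the false-site search and the proximal marking is
given below. It is only an illustration: the name false_sites is
hypothetical, each dinucleotide (GT or AG) is scanned independently rather
than as a pair, distances are measured between dinucleotide start positions,
and the +/- 60 limit is assumed to be inclusive.

```python
def false_sites(seq, true_sites, dinuc, proximal_range=60):
    """List (position, is_proximal) for non-splicing occurrences of dinuc.

    seq        : uppercase sequence of the whole entry
    true_sites : 0-based start positions of the annotated (true) sites
    dinuc      : "GT" for false donors, "AG" for false acceptors
    """
    true_set = set(true_sites)
    found = []
    pos = seq.find(dinuc)
    while pos != -1:
        if pos not in true_set:
            # Proximal: within proximal_range of any true site (inclusive).
            proximal = any(abs(pos - t) <= proximal_range for t in true_sites)
            found.append((pos, proximal))
        pos = seq.find(dinuc, pos + 1)
    return found
```

Each returned position could then be passed through the same window
extraction used for the true sites to obtain the false windows.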

HS3D is available at the Web server of the University of Sannio
http://www.sci.unisannio.it/docenti/rampone/

-----------
Salvatore Rampone
Facoltà di Scienze MM.FF.NN. and INFM
Università del Sannio
Via Port'Arsa 11
I-82100 Benevento ITALY
E-mail:     rampone at unisannio.it




