Publication: A modified Kohonen network for DNA splice junction classification
Issued Date
2004-12-01
Other identifier(s)
2-s2.0-27944473303
Rights
Mahidol University
Rights Holder(s)
SCOPUS
Bibliographic Citation
IEEE Region 10 Annual International Conference, Proceedings/TENCON. Vol.B, (2004)
Suggested Citation
Thanakorn Naenna, Robert A. Bress, Mark J. Embrechts. A modified Kohonen network for DNA splice junction classification. IEEE Region 10 Annual International Conference, Proceedings/TENCON. Vol.B, (2004). Retrieved from: https://repository.li.mahidol.ac.th/handle/20.500.14594/21290
Abstract
This paper describes an application of Kohonen networks, also known as Self-Organizing Maps (SOMs), to exon/intron classification in DNA using windowed splice junction data. Splice junctions are groups of nucleotides that mark the boundaries between sections of DNA that code for genetic material and sections that do not. Genes are often interrupted by sections of non-coding DNA sequences. The data used for this study are human DNA data taken from the National Center for Biotechnology Information (http://www.ncbi.nih.gov/). The DNA dataset contains 1,424 DNA sequences with 128 descriptors for each sequence. SOMs were used to classify each DNA sequence into one of three categories: sequences that transition from gene (exon) to non-gene (intron), sequences that transition from non-gene (intron) to gene (exon), and sequences with no transition, where the two-basepair code for the splice junction was coincidental. The multidimensional sequences are clustered into a two-dimensional space that is displayed graphically for data exploration and classification. The visual and graphical capabilities of SOMs are applied to classify the DNA dataset. The topographic properties of SOMs keep similar sequences close to each other on the output map. Clusters in the dataset are determined and labeled based on the classes of the output neurons in each cluster. Each output neuron is labeled with the class most frequently mapped onto it. © 2004 IEEE.
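The labeling rule in the abstract (each output neuron takes the class most frequently mapped onto it) can be illustrated with a minimal sketch. This is a plain textbook SOM in pure Python, not the authors' modified Kohonen network, whose details the abstract does not give. The grid size, learning-rate and neighborhood schedules, the three-dimensional toy data standing in for the 128 descriptors, and the class names `EI`, `IE`, and `N` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], x))


def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Train a rectangular SOM online; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # assumed linear decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed shrinking radius
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def label_neurons(weights, data, labels):
    """Label each neuron with the most frequent class among samples mapped to it."""
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(best_matching_unit(weights, x), Counter())[y] += 1
    return {pos: counts.most_common(1)[0][0] for pos, counts in votes.items()}


def classify(weights, neuron_labels, x):
    """Return the label of the closest neuron that received a label."""
    pos = min(neuron_labels, key=lambda p: sq_dist(weights[p], x))
    return neuron_labels[pos]


def make_toy_data(n_per_class=20, seed=1):
    """Hypothetical stand-in data: three noisy clusters, one per class."""
    rng = random.Random(seed)
    centers = {"EI": [1.0, 0.0, 0.0], "IE": [0.0, 1.0, 0.0], "N": [0.0, 0.0, 1.0]}
    data, labels = [], []
    for name, center in centers.items():
        for _ in range(n_per_class):
            data.append([c + rng.uniform(-0.05, 0.05) for c in center])
            labels.append(name)
    return data, labels


if __name__ == "__main__":
    data, labels = make_toy_data()
    weights = train_som(data, rows=3, cols=3)
    neuron_labels = label_neurons(weights, data, labels)
    print(classify(weights, neuron_labels, [0.95, 0.02, 0.0]))
```

Neurons that never win a training sample stay unlabeled in this sketch, so `classify` searches only the labeled neurons. That is one simple way to handle empty map cells and is not necessarily what the paper does.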