Trinucleotides Based Species Identification by Genomic Taxonomy Using Self Organizing Feature Map
G. Manimannan, S. Jasmine Farzana, R. Lakshmi Priya, V. Suriya
Citation: G. Manimannan, S. Jasmine Farzana, R. Lakshmi Priya, V. Suriya, Trinucleotides Based Species Identification by Genomic Taxonomy Using Self Organizing Feature Map. International Journal of Scientific and Innovative Mathematical Research 2019, 7(3): 25-33.
In recent times, the collection of biological data has increased rapidly with the availability of improved technologies. These massive amounts of biological data are organized and maintained to support large-scale experiments and research programs. This study deals with the extraction of genome sequences and the identification of species and their similarities, applying data mining techniques to genomic data and using clustering algorithms to group related species with similar DNA sequences. Each species has its own specific genome sequence, and we represent each DNA string numerically by counting the frequency of every three-letter word of nucleotides. This gives 4^3 = 64 three-letter words, known as trinucleotides. The species are then represented geometrically using polar plots for identification. Finally, an artificial neural network is used to reduce the high dimensionality of the data. The Self-Organizing Feature Map (SOFM) places similar species in close 2D neighborhoods and separates the clusters of dissimilar species.
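The numerical characterization described in the abstract, a 64-component trinucleotide frequency vector per sequence, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes an overlapping sliding window of width 3 (the abstract does not say whether windows overlap), it skips any window containing a character other than A, C, G, or T, and the names `trinucleotide_counts` and `frequency_vector` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"
# All 4^3 = 64 three-letter words in lexicographic order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]


def trinucleotide_counts(sequence):
    """Count each of the 64 trinucleotides in a DNA string.

    Assumption: windows overlap (step of 1). Windows containing any
    character outside A/C/G/T (for example N) are skipped.
    """
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        word = seq[i:i + 3]
        if word in counts:
            counts[word] += 1
    return counts


def frequency_vector(sequence):
    """Return the 64 relative frequencies, ordered as in TRINUCLEOTIDES."""
    counts = trinucleotide_counts(sequence)
    total = sum(counts.values())
    if total == 0:
        # No valid window (sequence too short or fully ambiguous).
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[w] / total for w in TRINUCLEOTIDES]
```

Under these assumptions, each species is reduced to one fixed-length vector regardless of genome length, which is the kind of input a polar plot or a SOFM can consume directly.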