SeqWord Project - genome linguistics approaches for
comparative genomics, phylogenomics and mobilomics
Principal Investigator: Dr Oleg Reva
This project addresses the development of an integrated
research environment for data mining in DNA sequences by using genome
linguistics.
This is a non-commercial academic project supported by the
National Research Foundation of South Africa (NRF).
The project was started at the University of Pretoria in
2006. All tools of the SeqWord project were created by postgraduate students as
part of their Hons, MSc or PhD projects. The following tools are currently available:
- Genome browser – a tool to visualize the genomic loci belonging to different functional categories (gene islands, functionally indispensable regions, non-coding loci, etc.);
- Genomic Island Sniffer – a tool for the automatic identification of horizontally acquired genomic islands in bacterial genomes;
- Sniffer GI Browser – a database and viewer of genomic islands predicted in multiple bacterial genomes;
- GI Databases – databases of predicted genomic islands in i) prokaryotes & archaea; ii) eukaryotes; iii) the human genome;
- Interactive GI maps – several interactive SVG files representing phylogenetic links between genomic islands identified in different bacteria;
- SWPhylo – phylogenetic inference using parametric whole-genome versus gene-based comparison;
- GenomeBarcoder – an interactive program for the creation and application of diagnostic genetic barcodes;
- OligoDBViewer – a Python program with a GUI to count and store the frequencies of signature 8-14-mer oligonucleotides in whole sequences of bacterial chromosomes (beta version);
- MetaLingvo – a general genome linguistic analysis program written in Python (beta version);
- LingvoCom – command-line tools for linguistic analysis (Python 2.5).
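Several of the tools listed above rest on counting short oligonucleotide "words" in a sequence. The sketch below is only a rough illustration of that idea and is not taken from OligoDBViewer or any other SeqWord tool: the function names `count_oligos` and `oligo_frequencies` and the toy sequence are hypothetical, and it assumes overlapping words on a single strand, with any word containing a symbol other than A, C, G or T skipped. It is written for Python 3, not the Python 2.5 mentioned for LingvoCom.

```python
from collections import Counter


def count_oligos(sequence, k):
    """Count overlapping k-mers on one strand of a DNA sequence.

    Words containing anything other than A, C, G or T (e.g. N) are skipped.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def oligo_frequencies(sequence, k):
    """Return each k-mer's share of all counted k-mers (empty dict if none)."""
    counts = count_oligos(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


if __name__ == "__main__":
    toy = "ACGTACGTNACGT"  # hypothetical toy sequence, not real genome data
    print(count_oligos(toy, 4).most_common(3))
```

For the 8-14-mer words mentioned above, the same function would be called with `k` between 8 and 14; on whole chromosomes a plain dictionary of this kind can become large, so a real tool would likely need a more compact storage scheme.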
Visits to the project websites are monitored by Google
Analytics. A graphical report for the last year is shown below: