42 results on '"Rahman, M. Sohel"'
Search Results
2. Succinylated lysine residue prediction revisited.
- Author
-
Ahmed SS, Rifat ZT, Rahman MS, and Rahman MS
- Subjects
- Amino Acids chemistry, Sensitivity and Specificity, Protein Processing, Post-Translational, Lysine metabolism, Algorithms
- Abstract
Lysine succinylation is a kind of post-translational modification (PTM) that plays a crucial role in regulating the cellular processes. Aberrant succinylation may cause inflammation, cancers, metabolism diseases and nervous system diseases. The experimental methods to detect succinylation sites are time-consuming and costly. This thus calls for computational models with high efficacy, and attention has been given in the literature to develop such models, albeit with only moderate success in the context of different evaluation metrics. One crucial aspect in this context is the biochemical and physicochemical properties of amino acids, which appear to be useful as features for such computational predictors. However, some of the existing computational models did not use the biochemical and physicochemical properties of amino acids. In contrast, some others used them without considering the inter-dependency among the properties. The combinations of biochemical and physicochemical properties derived through our optimization process achieve better results than the results achieved by combining all the properties. We propose three deep learning architectures: CNN+Bi-LSTM (CBL), Bi-LSTM+CNN (BLC) and their combination (CBL_BLC). We find that CBL_BLC outperforms the other two. Ensembling of different models successfully improves the results. Notably, tuning the threshold of the ensemble classifiers further improves the results. Upon comparing our work with other existing works on two datasets, we successfully achieve better sensitivity and specificity by varying the threshold value., (© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
- Published
- 2023
- Full Text
- View/download PDF
3. Quantifying imbalanced classification methods for leukemia detection.
- Author
-
Depto DS, Rizvee MM, Rahman A, Zunair H, Rahman MS, and Mahdy MRC
- Subjects
- Humans, Algorithms, Precursor Cell Lymphoblastic Leukemia-Lymphoma diagnosis, Precursor Cell Lymphoblastic Leukemia-Lymphoma pathology
- Abstract
Uncontrolled proliferation of B-lymphoblast cells is a common characterization of Acute Lymphoblastic Leukemia (ALL). B-lymphoblasts are found in large numbers in peripheral blood in malignant cases. Early detection of the cell in bone marrow is essential as the disease progresses rapidly if left untreated. However, automated classification of the cell is challenging, owing to its fine-grained variability with B-lymphoid precursor cells and imbalanced data points. Deep learning algorithms demonstrate potential for such fine-grained classification as well as suffer from the imbalanced class problem. In this paper, we explore different deep learning-based State-Of-The-Art (SOTA) approaches to tackle imbalanced classification problems. Our experiment includes input, GAN (Generative Adversarial Networks), and loss-based methods to mitigate the issue of imbalanced class on the challenging C-NMC and ALLIDB-2 dataset for leukemia detection. We have shown empirical evidence that loss-based methods outperform GAN-based and input-based methods in imbalanced classification scenarios., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2022 Elsevier Ltd. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
4. Multiobjective Formulation of Multiple Sequence Alignment for Phylogeny Inference.
- Author
-
Nayeem MA, Bayzid MS, Rahman AH, Shahriyar R, and Rahman MS
- Subjects
- Phylogeny, Sequence Alignment, Algorithms, Software
- Abstract
Multiple sequence alignment (MSA) is a preliminary task for estimating phylogenies. It is used for homology inference among the sequences of a set of species. Generally, the MSA task is handled as a single-objective optimization process. The alignments computed under one criterion may be different from the alignments generated by other criteria, inferring discordant homologies and thus leading to different hypothesized evolutionary histories relating the sequences. The multiobjective (MO) formulation of MSA has recently been advocated by several researchers, to address this issue. An MO approach independently optimizes multiple (often conflicting) objective functions at the same time and outputs a set of competitive alignments. However, no conceptual or experimental rational from a real-world application perspective has been reported so far for any MO formulation of MSA. This article work investigates the impact of MO formulation in the context of an important scientific problem, namely, phylogeny estimation. Employing popular evolutionary MO algorithms, we show that: 1) trees inferred based on alignments produced by the existing MSA methods used in practice are substantially worse in quality than the trees inferred based on the alignment's output by an MO algorithm and 2) even high-quality alignments (according to popular measures available in the literature) may fail to achieve acceptable accuracy in generating phylogenetic trees. Thus, we essentially ask the following natural question: "can a phylogeny-aware (i.e., application-aware) metric guide in selecting appropriate MO formulations to ensure better phylogeny estimation?" Here, we report a carefully designed extensive experimental study that positively answers this question.
- Published
- 2022
- Full Text
- View/download PDF
5. SSG-LUGIA: Single Sequence based Genome Level Unsupervised Genomic Island Prediction Algorithm.
- Author
-
Ibtehaz N, Ahmed I, Ahmed MS, Rahman MS, Azad RK, and Bayzid MS
- Subjects
- Algorithms, Bacteria genetics, Computational Biology, Gene Transfer, Horizontal, Genome, Bacterial, Genomic Islands
- Abstract
Background: Genomic Islands (GIs) are clusters of genes that are mobilized through horizontal gene transfer. GIs play a pivotal role in bacterial evolution as a mechanism of diversification and adaptation to different niches. Therefore, identification and characterization of GIs in bacterial genomes is important for understanding bacterial evolution. However, quantifying GIs is inherently difficult, and the existing methods suffer from low prediction accuracy and precision-recall trade-off. Moreover, several of them are supervised in nature, and thus, their applications to newly sequenced genomes are riddled with their dependency on the functional annotation of existing genomes., Results: We present SSG-LUGIA, a completely automated and unsupervised approach for identifying GIs and horizontally transferred genes. SSG-LUGIA is a novel method based on unsupervised anomaly detection technique, accompanied by further refinement using cues from signal processing literature. SSG-LUGIA leverages the atypical compositional biases of the alien genes to localize GIs in prokaryotic genomes. SSG-LUGIA was assessed on a large benchmark dataset `IslandPick' and on a set of 15 well-studied genomes in the literature and followed by a thorough analysis on the well-understood Salmonella typhi CT18 genome. Furthermore, the efficacy of SSG-LUGIA in identifying horizontally transferred genes was evaluated on two additional bacterial genomes, namely, those of Corynebacterium diphtheria NCTC13129 and Pseudomonas aeruginosa LESB58. SSG-LUGIA was examined on draft genomes and was demonstrated to be efficient as an ensemble method., Conclusions: Our results indicate that SSG-LUGIA achieved superior performance in comparison to frequently used existing methods. Importantly, it yielded a better trade-off between precision and recall than the existing methods. Its nondependency on the functional annotation of genomes makes it suitable for analyzing newly sequenced, yet uncharacterized genomes. Thus, our study is a significant advance in identification of GIs and horizontally transferred genes. SSG-LUGIA is available as an open source software at https://nibtehaz.github.io/SSG-LUGIA/., (© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
- Published
- 2021
- Full Text
- View/download PDF
6. ADACT: a tool for analysing (dis)similarity among nucleotide and protein sequences using minimal and relative absent words.
- Author
-
Akon M, Akon M, Kabir M, Rahman MS, and Rahman MS
- Subjects
- Phylogeny, Sequence Alignment, Sequence Analysis, DNA, Software, Algorithms, Nucleotides
- Abstract
Motivation: Researchers and practitioners use a number of popular sequence comparison tools that use many alignment-based techniques. Due to high time and space complexity and length-related restrictions, researchers often seek alignment-free tools. Recently, some interesting ideas, namely, Minimal Absent Words (MAW) and Relative Absent Words (RAW), have received much interest among the scientific community as distance measures that can give us alignment-free alternatives. This drives us to structure a framework for analysing biological sequences in an alignment-free manner., Results: In this application note, we present Alignment-free Dissimilarity Analysis & Comparison Tool (ADACT), a simple web-based tool that computes the analogy among sequences using a varied number of indexes through the distance matrix, species relation list and phylogenetic tree. This tool basically combines absent word (MAW or RAW) computation, dissimilarity measures, species relationship and thus brings all required software in one platform for the ease of researchers and practitioners alike in the field of bioinformatics. We have also developed a restful API., Availability and Implementation: ADACT has been hosted at http://research.buet.ac.bd/ADACT/., Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2021
- Full Text
- View/download PDF
7. Protein structure prediction from inaccurate and sparse NMR data using an enhanced genetic algorithm.
- Author
-
Islam ML, Shatabda S, Rashid MA, Khan MGM, and Rahman MS
- Subjects
- Proteins genetics, Algorithms, Nuclear Magnetic Resonance, Biomolecular, Protein Conformation, Proteins chemistry
- Abstract
Nuclear Magnetic Resonance Spectroscopy (most commonly known as NMR Spectroscopy) is used to generate approximate and partial distances between pairs of atoms of the native structure of a protein. To predict protein structure from these partial distances by solving the Euclidean distance geometry problem from the partial distances obtained from NMR Spectroscopy, we can predict three-dimensional (3D) structure of a protein. In this paper, a new genetic algorithm is proposed to efficiently address the Euclidean distance geometry problem towards building 3D structure of a given protein applying NMR's sparse data. Our genetic algorithm uses (i) a greedy mutation and crossover operator to intensify the search; (ii) a twin removal technique for diversification in the population; (iii) a random restart method to recover from stagnation; and (iv) a compaction factor to reduce the search space. Reducing the search space drastically, our approach improves the quality of the search. We tested our algorithms on a set of standard benchmarks. Experimentally, we show that our enhanced genetic algorithms significantly outperforms the traditional genetic algorithms and a previously proposed state-of-the-art method. Our method is capable of producing structures that are very close to the native structures and hence, the experimental biologists could adopt it to determine more accurate protein structures from NMR data., (Copyright © 2019 Elsevier Ltd. All rights reserved.)
- Published
- 2019
- Full Text
- View/download PDF
8. DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.
- Author
-
Rahman MS, Shatabda S, Saha S, Kaykobad M, and Rahman MS
- Subjects
- Amino Acid Sequence, Amino Acids chemistry, Amino Acids genetics, Amino Acids metabolism, DNA chemistry, DNA genetics, DNA metabolism, DNA-Binding Proteins chemistry, DNA-Binding Proteins genetics, Databases, Protein, Models, Molecular, Nucleic Acid Conformation, Protein Domains, Reproducibility of Results, Algorithms, Computational Biology methods, DNA-Binding Proteins metabolism, Support Vector Machine
- Abstract
A DNA-binding protein (DNA-BP) is a protein that can bind and interact with a DNA. Identification of DNA-BPs using experimental methods is expensive as well as time consuming. As such, fast and accurate computational methods are sought for predicting whether a protein can bind with a DNA or not. In this paper, we focus on building a new computational model to identify DNA-BPs in an efficient and accurate way. Our model extracts meaningful information directly from the protein sequences, without any dependence on functional domain or structural information. After feature extraction, we have employed Random Forest (RF) model to rank the features. Afterwards, we have used Recursive Feature Elimination (RFE) method to extract an optimal set of features and trained a prediction model using Support Vector Machine (SVM) with linear kernel. Our proposed method, named as DNA-binding Protein Prediction model using Chou's general PseAAC (DPP-PseAAC), demonstrates superior performance compared to the state-of-the-art predictors on standard benchmark dataset. DPP-PseAAC achieves accuracy values of 93.21%, 95.91% and 77.42% for 10-fold cross-validation test, jackknife test and independent test respectively. The source code of DPP-PseAAC, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/DNABinding. A publicly accessible web interface has also been established at: http://77.68.43.135:8080/DPP-PseAAC/., (Copyright © 2018 Elsevier Ltd. All rights reserved.)
- Published
- 2018
- Full Text
- View/download PDF
9. Absent words and the (dis)similarity analysis of DNA sequences: an experimental study.
- Author
-
Rahman MS, Alatabbi A, Athar T, Crochemore M, and Rahman MS
- Subjects
- Animals, Base Sequence, Humans, Phylogeny, Algorithms, Sequence Analysis, DNA methods
- Abstract
Background: An absent word with respect to a sequence is a word that does not occur in the sequence as a factor; an absent word is minimal if all its factors on the other hand occur in that sequence. In this paper we explore the idea of using minimal absent words (MAW) to compute the distance between two biological sequences. The motivation and rationale of our work comes from the potential advantage of being able to extract as little information as possible from large genomic sequences to reach the goal of comparing sequences in an alignment-free manner., Findings: We report an experimental study on the use of absent words as a distance measure among biological sequences. We provide recommendations to use the best index based on our analysis. In particular, our analysis reveals that the best performers are: the length weighted index of relative absent word sets, the length weighted index of the symmetric difference of the MAW sets, and the Jaccard distance between the MAW sets. We also found that during the computation of the absent words, the reverse complements of the sequences should also be considered., Conclusion: The use of MAW to compute the distance between two biological sequences has potential advantage over alignment based methods. It is expected that this potential advantage would encourage researchers and practitioners to use this as a (dis)similarity measure in the context of sequence comparison and phylogeny reconstruction. Therefore, we present here a comparison among different possible models and indexes and pave the path for the biologists and researchers to choose an appropriate model for such comparisons.
- Published
- 2016
- Full Text
- View/download PDF
10. A Simple, Fast, Filter-Based Algorithm for Approximate Circular Pattern Matching.
- Author
-
Azim MA, Iliopoulos CS, Rahman MS, and Samiruzzaman M
- Subjects
- Algorithms, Computational Biology methods, DNA, Circular analysis, DNA, Circular chemistry, DNA, Circular genetics, Pattern Recognition, Automated methods, Sequence Analysis, DNA methods
- Abstract
This paper deals with the approximate version of the circular pattern matching (ACPM) problem, which appears as an interesting problem in many biological contexts. The circular pattern matching problem consists in finding all occurrences of the rotations of a pattern P of length m in a text T of length n. In ACPM, we consider occurrences with k -mismatches under the Hamming distance model. In this paper, we present a simple and fast filter-based algorithm to solve the ACPM problem. We compare our algorithm with the state of the art algorithms and the results are found to be excellent. In particular, our algorithm runs almost twice as fast than the state of the art. Much of the efficiency of our algorithm can be attributed to its filters that are effective but extremely simple and lightweight.
- Published
- 2016
- Full Text
- View/download PDF
11. Impact of heuristics in clustering large biological networks.
- Author
-
Shafin MK, Kabir KL, Ridwan I, Anannya TT, Karim RS, Hoque MM, and Rahman MS
- Subjects
- Algorithms, Cluster Analysis, Computational Biology methods, Heuristics
- Abstract
Traditional clustering algorithms often exhibit poor performance for large networks. On the contrary, greedy algorithms are found to be relatively efficient while uncovering functional modules from large biological networks. The quality of the clusters produced by these greedy techniques largely depends on the underlying heuristics employed. Different heuristics based on different attributes and properties perform differently in terms of the quality of the clusters produced. This motivates us to design new heuristics for clustering large networks. In this paper, we have proposed two new heuristics and analyzed the performance thereof after incorporating those with three different combinations in a recently celebrated greedy clustering algorithm named SPICi. We have extensively analyzed the effectiveness of these new variants. The results are found to be promising., (Copyright © 2015 Elsevier Ltd. All rights reserved.)
- Published
- 2015
- Full Text
- View/download PDF
12. An Integer Programming Formulation of the Minimum Common String Partition Problem.
- Author
-
Ferdous SM and Rahman MS
- Subjects
- Animals, Computational Biology statistics & numerical data, Humans, Algorithms, Genome, Programming, Linear
- Abstract
We consider the problem of finding a minimum common string partition (MCSP) of two strings, which is an NP-hard problem. The MCSP problem is closely related to genome comparison and rearrangement, an important field in Computational Biology. In this paper, we map the MCSP problem into a graph applying a prior technique and using this graph, we develop an Integer Linear Programming (ILP) formulation for the problem. We implement the ILP formulation and compare the results with the state-of-the-art algorithms from the literature. The experimental results are found to be promising.
- Published
- 2015
- Full Text
- View/download PDF
13. Pattern matching in indeterminate and Arc-annotated sequences.
- Author
-
Aumi MT, Moosa TM, and Rahman MS
- Subjects
- Base Sequence, Chromosomes, Human chemistry, Chromosomes, Human genetics, Computational Biology, Humans, Patents as Topic, Pattern Recognition, Automated, RNA, Untranslated chemistry, Algorithms
- Abstract
In this paper, we present efficient algorithms for finding indeterminate Arc-Annotated patterns in indeterminate Arc-Annotated references. Our algorithms run in O(m+ (nm) w) time where n and m are respectively the length of our reference and pattern strings and w is the target machine word size. Here we have assumed the alphabet size to be constant, because, indeterminate Arc-Annotated sequences are used to model biological sequences. Clearly, for short patterns, our algorithms run in linear time and efficient algorithms for matching short patterns to reference genomes have huge applications in practical settings. We have also applied our algorithms to scan the ncRNAs without pseudoknots. We scanned three whole human chromosomes and it took only 2.5 - 4 minutes to scan one whole chromosome for an ncRNA family. Some relevant patents are discussed in.
- Published
- 2013
- Full Text
- View/download PDF
14. The 1.375 approximation algorithm for sorting by transpositions can run in O(n log n) time.
- Author
-
Firoz JS, Hasan M, Khan AZ, and Rahman MS
- Subjects
- Algorithms, Computational Biology methods
- Abstract
Sorting a permutation by transpositions (SPbT) is an important problem in bioinformtics. In this article, we improve the running time of the best known approximation algorithm for SPbT.
- Published
- 2011
- Full Text
- View/download PDF
15. Palindromic Subsequence Automata and Longest Common Palindromic Subsequence
- Author
-
Hasan, Md. Mahbubul, Islam, A. S. M. Sohidull, Rahman, M. Sohel, and Sen, Ayon
- Published
- 2017
- Full Text
- View/download PDF
16. Protein Folding in 2D-Triangular Lattice Revisited : (Extended Abstract)
- Author
-
Islam, A. S. M. Shohidull, Rahman, M. Sohel, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Lecroq, Thierry, editor, and Mouchard, Laurent, editor
- Published
- 2013
- Full Text
- View/download PDF
17. Indexing a sequence for mapping reads with a single mismatch
- Author
-
Crochemore, Maxime, Langiu, Alessio, and Rahman, M. Sohel
- Published
- 2014
18. Cache Oblivious Algorithms for the RMQ and the RMSQ Problems
- Author
-
Hasan, Masud, Moosa, Tanaeem M., and Rahman, M. Sohel
- Published
- 2010
- Full Text
- View/download PDF
19. Indexing Factors with Gaps
- Author
-
Iliopoulos, Costas S. and Rahman, M. Sohel
- Published
- 2009
- Full Text
- View/download PDF
20. A New Efficient Algorithm for Computing the Longest Common Subsequence
- Author
-
Iliopoulos, Costas S. and Rahman, M. Sohel
- Published
- 2009
- Full Text
- View/download PDF
21. Neural Network-Based Undersampling Techniques.
- Author
-
Arefeen, Md Adnan, Nimi, Sumaiya Tabassum, and Rahman, M. Sohel
- Subjects
MACHINE learning ,KEY performance indicators (Management) ,RETINAL blood vessels ,ALGORITHMS ,SAMPLING (Process) - Abstract
Machine learning models have gained popularity nowadays for their potential to solve real-life issues when trained on pertinent data. In many cases, the real-life data are class imbalanced and hence the corresponding machine learning models trained on the data tend to perform poorly on metrics like precision, recall, AUC, F1, and G-mean score. Since class imbalance issue poses serious challenges to the performance of trained models, a multitude of research works have addressed this issue. Two common data-based sampling techniques have mostly been proposed-undersampling the data of the majority class and oversampling the data of the minority class. In this article, we focus on the former approach. We propose two novel algorithms that employ neural network-based approaches to remove majority samples that are found to reside in the vicinity of the minority samples, thereby undersampling the former to remove (or alleviate) the imbalance issue. We delineate the proposed algorithms and then test the proposed algorithms on some publicly available imbalanced datasets. We then compare the performance of our proposed algorithms to other popular undersampling algorithms. Finally, we conclude that our proposed algorithms outperform most of the existing undersampling approaches on most performance metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
22. Phylogenetic Analyses of SARS-CoV-2 Strains Reveal Its Link to the Spread of COVID-19 Across the Globe.
- Author
-
Sawmya, Shashata, Saha, Arpita, Tasnim, Sadia, Anjum, Naser, Toufikuzzaman, Md., Rafid, Ali Haisam Muhamad, Rahman, Mohammad Saifur, Rahman, M. Sohel, and Alam, Tanvir
- Subjects
PHYLOGENY ,COVID-19 ,SEQUENCE analysis ,WORLD health ,CONFERENCES & conventions ,GENOMES ,FACTOR analysis ,CLUSTER analysis (Statistics) ,SARS disease ,ALGORITHMS - Abstract
This study leveraged the phylogenetic analysis of more than 10K strains of novel coronavirus (SARS-CoV-2) from 67 countries. Due to the requirement of high-end computational power for phylogenetic analysis, we leverage a fast yet highly accurate alignment-free method to develop the phylogenetic tree out of all the strains of novel coronavirus. K-Means clustering and PCA-based dimension reduction technique were used to identify a representative strain from each location. The resulting phylogenetic tree was able to highlight evolutionary relationships of SARS-CoV-2 genome and, subsequently, linked to the interpretation of facts and figures across the globe for the spread of COVID-19. Our analysis revealed that the geographical boundaries could not be explained by the phylogenetic analysis of novel coronavirus as it placed different countries from Asia, Europe and the USA in very close proximity in the tree. Instead, the commute of people from one country to another is the key to the spread of COVID-19. We believe our study will support the policymakers to contain the spread of COVID-19 globally. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
23. Approximation Algorithms for Three Dimensional Protein Folding.
- Author
-
Shaw, Dipan Lal, Karmaker, Shuvasish, Islam, A. S. M. Shohidull, and Rahman, M. Sohel
- Subjects
ALGORITHMS ,PROTEIN folding ,LATTICE theory ,BIOINFORMATICS ,HYDROPHOBIC compounds - Abstract
Predicting the secondary structure of a protein using a lattice model is one of the most studied computational problems in bioinformatics. Here the secondary structure or three dimensional structure of a protein is predicted from its amino acid sequence. The secondary structure refers to the local sub-structures of a protein. Simplified energy models have been proposed in the literature on the basis of interaction of amino acid residues in proteins. We focus on a well researched model known as the Hydrophobic-Polar (HP) energy model. In this paper, we propose the hexagonal prism lattice with diagonals that can overcome the problems of other lattice structures, e.g., parity problem. We give two approximation algorithms for protein folding on this lattice using HP model. Our first algorithm leads us to a similar structure of helix structure that is commonly found in a protein structure. This motivates us to propose the next algorithm with a better approximation ratio. Finally, we analyze the algorithms on the basis of intensity of the chemical forces along the different types of edges of hexagonal prism lattice with diagonals. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
24. Computing a Longest Common Subsequence of two strings when one of them is Run Length Encoded
- Author
-
Ahsan, Shegufta Bakht, Moosa, Tanaeem M., Rahman, M. Sohel, and Shahriyar, Shampa
- Subjects
run length encoded strings ,longest common subsequence ,algorithms - Abstract
Given two strings, the longest common subsequence (LCS) problem computes a common subsequence that has the maximum length. In this paper, we present new and efficient algorithms for solving the LCS problem for two strings one of which is run length encoded (RLE). We first present an algorithm that runs in O(gN) time, where g is the length of the RLE string and N is the length of uncompressed string. Then based on the ideas of the above algorithm we present another algorithm that runs in O(Rlog(log g)+N) time, where R is the total number of ordered pairs of positions at which the two strings match. Our first algorithm matches the best algorithm in the literature for the same problem. On the other hand, for R < gN/ log(log)g, our second algorithm outperforms the best algorithms in the literature.
- Published
- 2011
25. CRISPRpred: A flexible and efficient tool for sgRNAs on-target activity prediction in CRISPR/Cas9 systems.
- Author
-
Rahman, Md. Khaledur and Rahman, M. Sohel
- Subjects
- *
GENOME editing , *CRISPRS , *VIRUS diseases , *DNA , *RNA - Abstract
The CRISPR/Cas9-sgRNA system has recently become a popular tool for genome editing and a very hot topic in the field of medical research. In this system, Cas9 protein is directed to a desired location for gene engineering and cleaves target DNA sequence which is complementary to a 20-nucleotide guide sequence found within the sgRNA. A lot of experimental efforts, ranging from in vivo selection to in silico modeling, have been made for efficient designing of sgRNAs in CRISPR/Cas9 system. In this article, we present a novel tool, called CRISPRpred, for efficient in silico prediction of sgRNAs on-target activity which is based on the applications of Support Vector Machine (SVM) model. To conduct experiments, we have used a benchmark dataset of 17 genes and 5310 guide sequences where there are only 20% true values. CRISPRpred achieves Area Under Receiver Operating Characteristics Curve (AUROC-Curve), Area Under Precision Recall Curve (AUPR-Curve) and maximum Matthews Correlation Coefficient (MCC) as 0.85, 0.56 and 0.48, respectively. Our tool shows approximately 5% improvement in AUPR-Curve and after analyzing all evaluation metrics, we find that CRISPRpred is better than the current state-of-the-art. CRISPRpred is enough flexible to extract relevant features and use them in a learning algorithm. The source code of our entire software with relevant dataset can be found in the following link: . [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
26. An efficient algorithm to detect common ancestor genes for non-overlapping inversion and applications.
- Author
-
Zohora, Fatema Tuz and Rahman, M. Sohel
- Subjects
- *
MOLECULAR genetics , *GEOMETRY , *MATRICES (Mathematics) , *ALGEBRA , *ALGORITHMS - Abstract
In this paper, an algorithm is proposed that detects the existence of common ancestor gene sequences for non-overlapping inversion (reversed complement) metric given two input DNA sequences. Theoretical worst case running time complexity of the algorithm is proven to be O ( n 4 ) , where n is the length of each input sequence. However, by experiment, the running time complexity is found to be O ( n 3 ) for the worst case and O ( n 2 ) for average case. Moreover, the worst case occurs when both input sequences have the similarity of around 90% which is very rare. This work is motivated by the purpose of diagnosing an unknown genetic disease that shows allelic heterogeneity , a case where a normal gene mutates in different orders resulting in two different gene sequences causing two different genetic diseases. Our algorithm can potentially save huge energy and cost of the existing diagnostic approaches. The algorithm can be useful as well in the study of breed-related hereditary conditions to determine the genetic spread of a defective gene in the population. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
27. Algorithms for Longest Common Abelian Factors.
- Author
-
Alatabbi, Ali, Iliopoulos, Costas S., Langiu, Alessio, and Rahman, M. Sohel
- Subjects
ABELIAN groups ,ALGORITHMS ,LINEAR systems ,STRING theory ,QUADRATIC equations - Abstract
In this paper we consider the problem of computing the longest common abelian factor (LCAF) between two given strings. We present a simple time algorithm, where n is the length of the strings and is the alphabet size, and a sub-quadratic running time solution for the binary string case, both having linear space requirement. Furthermore, we present a modified algorithm applying some interesting tricks and experimentally show that the resulting algorithm runs faster. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
28. Finding Patterns in Given Intervals.
- Author
-
Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Kučera, Luděk, Kučera, Antonín, Crochemore, Maxime, Iliopoulos, Costas S., and Rahman, M. Sohel
- Abstract
In this paper, we study the pattern matching problem in given intervals. Depending on whether the intervals are given a priori for pre-processing, or during the query along with the pattern or, even in both cases, we develop solutions for different variants of this problem. In particular, we present efficient indexing schemes for each of the above variants of the problem. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
29. Finding Patterns with Variable Length Gaps or Don't Cares.
- Author
-
Chen, Danny Z., Lee, D. T., Rahman, M. Sohel, Iliopoulos, Costas S., Lee, Inbok, Mohamed, Manal, and Smyth, William F.
- Abstract
In this paper we have presented new algorithms to handle the pattern matching problem where the pattern can contain variable length gaps. Given a pattern P with variable length gaps and a text T our algorithm works in O(n + m + α log(max1<=i<=l(bi-ai))) time where n is the length of the text, m is the summation of the lengths of the component subpatterns, α is the total number of occurrences of the component subpatterns in the text and ai and bi are, respectively, the minimum and maximum number of don't cares allowed between the ith and (i+1)st component of the pattern. We also present another algorithm which, given a suffix array of the text, can report whether P occurs in T in O(m + α loglogn) time. Both the algorithms record information to report all the occurrences of P in T. Furthermore, the techniques used in our algorithms are shown to be useful in many other contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
30. Protein folding in HP model on hexagonal lattices with diagonals.
- Author
-
Shaw, Dipan Lal, Islam, ASM Shohidull, Rahman, M. Sohel, and Hasan, Masud
- Subjects
PROTEIN folding ,ALGORITHMS ,PROTEIN conformation ,ALGEBRA ,BIOINFORMATICS - Abstract
Three dimensional structure prediction of a protein from its amino acid sequence, known as protein folding, is one of the most studied computational problem in bioinformatics and computational biology. Since, this is a hard problem, a number of simplified models have been proposed in literature to capture the essential properties of this problem. In this paper we introduce the hexagonal lattices with diagonals to handle the protein folding problem considering the well researched HP model. We give two approximation algorithms for protein folding on this lattice. Our first algorithm is a 5/3 -approximation algorithm, which is based on the strategy of partitioning the entire protein sequence into two pieces. Our next algorithm is also based on partitioning approaches and improves upon the first algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
31. A graph-theoretic model to solve the approximate string matching problem allowing for translocations.
- Author
-
Ahmed, Pritom, Islam, A.S.M. Shohidull, and Rahman, M. Sohel
- Abstract
Abstract: In this paper, we study the approximate string matching problem under a string distance whose edit operations are translocations of equal length factors. We extend a graph-theoretic approach proposed by Rahman and Illiopoulos (2008) to model our problem. In the sequel, we devise efficient algorithms based on this model to solve a number of variants of the string matching problem with translocations. [Copyright &y& Elsevier]
- Published
- 2013
- Full Text
- View/download PDF
32. Computing a longest common subsequence that is almost increasing on sequences having no repeated elements.
- Author
-
Moosa, Johra Muhammad, Rahman, M. Sohel, and Zohora, Fatema Tuz
- Abstract
Abstract: Given two permutations A and B of and a fixed constant c, we introduce the notion of a longest common almost increasing subsequence (LCAIS) as a longest common subsequence that can be converted to an increasing subsequence by possibly adding a value, that is at most c, to each of the elements. We present an algorithm for computing LCAIS in space, time. [Copyright &y& Elsevier]
- Published
- 2013
- Full Text
- View/download PDF
33. Indeterminate string inference algorithms.
- Author
-
Nazeen, Sumaiya, Rahman, M. Sohel, and Reaz, Rezwana
- Subjects
DIOPHANTINE analysis ,INFERENCE (Logic) ,ALGORITHMS ,ANALYTIC functions ,MATHEMATICAL analysis ,MOLECULAR biology - Abstract
Abstract: Regularities in indeterminate strings have recently been a matter of interest because of their use in the fields of molecular biology, musical text analysis, cryptanalysis and so on. In this paper, we study the problem of reconstructing an indeterminate string from a border array. We present two efficient algorithms to reconstruct an indeterminate string from a valid border array – one using an unbounded alphabet and the other using minimum sized alphabet. We also propose an algorithm for reconstructing an indeterminate string from suffix array and LCP array. [Copyright &y& Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
34. Sub-quadratic time and linear space data structures for permutation matching in binary strings.
- Author
-
Moosa, Tanaeem M. and Rahman, M. Sohel
- Subjects
QUADRATIC fields ,VECTOR spaces ,DATA structures ,PERMUTATIONS ,BINARY number system ,PATTERN perception ,LINEAR systems ,MATHEMATICAL analysis - Abstract
Abstract: Given a pattern P of length n and a text T of length m, the permutation matching problem asks whether any permutation of P occurs in T. Indexing a string for permutation matching seems to be quite hard in spite of the existence of a simple non-indexed solution. In this paper, we devise several time data structures for a binary string capable of answering permutation queries in time. In particular, we first present two time data structures and then improve the data structure construction time to . The space complexity of the data structures remains linear. [Copyright &y& Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
35. Finding Patterns In Given Intervals.
- Author
-
Crochemore, Maxime, Iliopoulos, Costas S., Kubica, Marcin, Rahman, M. Sohel, and Waleń, Tomasz
- Subjects
ALGORITHMS ,DATA structures ,COMPUTER science ,INFORMATION retrieval ,DATA mining - Abstract
In this paper, we study the pattern matching problem in given intervals. Depending on whether the intervals are given a priori for pre-processing, or during the query along with the pattern or, even in both the cases, we develop efficient solutions for different variants of this problem. In particular, we present efficient indexing schemes for each of the above variants of the problem. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
36. Pancake Flipping with Two Spatulas.
- Author
-
Sharmin, Mahfuza, Yeasmin, Rukhsana, Hasan, Masud, Rahman, Atif, and Rahman, M. Sohel
- Subjects
BIOMATHEMATICS ,GENOMES ,BIOLOGICAL variation ,ALGORITHMS ,APPROXIMATION theory ,PERMUTATIONS - Abstract
Abstract: In this paper, we give approximation algorithms for several variations of the pancake flipping problem, which is also well known as the problem of sorting by prefix reversals. We consider the variations in the sorting process by adding prefix transpositions, prefix transreversals etc. along with the prefix reversals. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
37. Finite automata based algorithms on subsequences and supersequences of degenerate strings.
- Author
-
Iliopoulos, Costas, Rahman, M. Sohel, Voráček, Michal, and Vagner, Ladislav
- Subjects
SEQUENTIAL machine theory ,ALGORITHMS ,MATHEMATICAL sequences ,CONSTRAINED optimization ,MATHEMATICAL analysis - Abstract
Abstract: In this paper, we present linear-time algorithms for the construction two novel types of finite automata and show how they can be used to efficiently solve the Longest Common Subsequence (LCS), Shortest Common Supersequence (SCS) and Constrained Longest Common Subsequence (CLCS) problems for degenerate strings. [Copyright &y& Elsevier]
- Published
- 2010
- Full Text
- View/download PDF
38. A multiprocessor based heuristic for multi-dimensional multiple-choice knapsack problem.
- Author
-
Shahriar, Abu Zafar M., Akbar, M. Mostofa, Rahman, M. Sohel, and Newton, Muhammad Abdul Hakim
- Subjects
MULTIPROCESSORS ,KNAPSACK problems ,ALGORITHMS ,HEURISTIC ,INTEGER programming ,METHODOLOGY - Abstract
This paper presents a multiprocessor based heuristic algorithm for the Multi-dimensional Multiple Choice Knapsack Problem (MMKP). MMKP is a variant of the classical 0–1 knapsack problem, where items having a value and a number of resource requirements are divided into groups. Exactly one item has to be picked up from each group to achieve a maximum total value without exceeding the resource constraint of each type. MMKP has many real world applications including admission control in adaptive multimedia server system. Exact solution to this problem is NP-Hard, and hence is not feasible for real time applications like admission control. Therefore, heuristic solutions have been developed to solve the MMKP. M-HEU is one such heuristic, which solves the MMKP achieving a reasonable percentage of optimality. In this paper, we present a multiprocessor algorithm based on M-HEU, which runs in O( T/ p+ s( p)) time, where T is the time required by the algorithm using single processor, p is the number of processors and s( p), a function of p, is the synchronization overhead. We also present the worst-case analysis of our algorithm, the computation of the optimal number of processors as well as the lower bound of the total value that can be achieved by the heuristic. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
39. IDENTIFYING RHYTHMS IN MUSICAL TEXTS.
- Author
-
CHRISTODOULAKIS, MANOLIS, ILIOPOULOS, COSTAS S., RAHMAN, M. SOHEL, and SMYTH, WILLIAM F.
- Subjects
RHYTHM ,SEQUENCES (Musical composition) ,ALGORITHMS ,MUSICAL notation ,AESTHETICS - Abstract
A fundamental problem in music is to classify songs according to their rhythm. A rhythm is represented by a sequence of “Quick” (Q) and “Slow” (S) symbols, which correspond to the (relative) duration of notes, such that S = 2Q. In this paper, we present an efficient algorithm for locating the maximum-length substring of a music text t that can be covered by a given rhythm r. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
40. Computing the longest common almost-increasing subsequence.
- Author
-
Bhuiyan, Mohammad Tawhidul Hasan, Alam, Muhammad Rashed, and Rahman, M. Sohel
- Subjects
- *
PROBLEM solving - Abstract
In this paper, we revisit the problem of computing longest common almost increasing subsequence (LCAIS) where, given two input sequences, the goal is to compute a common subsequence that is 'almost' increasing. Here the concept of an almost increasing subsequence offers an interesting relaxation over the increasing condition. This problem has been studied in the literature albeit with some constraints. Here, we present a number of a number of algorithmic approaches to solve the problem more generally and efficiently. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
41. The swap matching problem revisited.
- Author
-
Ahmed, Pritom, Iliopoulos, Costas S., Islam, A.S.M. Sohidull, and Rahman, M. Sohel
- Subjects
- *
MATCHING theory , *PROBLEM solving , *GRAPH theory , *ALGORITHMS , *STRING theory - Abstract
In this paper, we revisit the much studied problem of Pattern Matching with Swaps (Swap Matching problem, for short). We first present a graph-theoretic model, which opens a new and so far unexplored avenue to solve the problem. Then, using the model, we devise two efficient algorithms to solve the swap matching problem. The resulting algorithms are adaptations of the classic shift-and algorithm. For patterns having length similar to the word-size of the target machine, both the algorithms run in linear time considering a fixed alphabet. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
42. Improved algorithms for the range next value problem and applications
- Author
-
Crochemore, Maxime, Iliopoulos, Costas S., Kubica, Marcin, Rahman, M. Sohel, Tischler, German, and Waleń, Tomasz
- Subjects
- *
COMPUTER algorithms , *COMPUTER science , *PATTERN recognition systems , *CYBERNETICS , *NUMBER theory , *MATCHING theory - Abstract
Abstract: The Range Next Value problem (problem RNV) is a recent interesting variant of the range search problems, where the query is for the immediate next (or equal) value of a given number within a given interval of an array. Problem RNV was introduced and studied very recently by Crochemore et al. [Maxime Crochemore, Costas S. Iliopoulos, M. Sohel Rahman, Finding patterns in given intervals, in: Antonin Kucera, Ludek Kucera (Eds.), MFCS, 22 in: Lecture Notes in Computer Science, vol. 4708, Springer, 2007, pp. 645–656]. In this paper, we present improved algorithms for problem RNV and algorithms for extended versions of the RNV problem. We also show how this problem can be used to achieve optimal query time for a number of interesting variants of the classic pattern matching problems. [Copyright &y& Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.