Author: "Rahman, M. Sohel" / Topic: algorithms - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Rahman, M. Sohel"' showing total 42 results

Start Over Author "Rahman, M. Sohel" Topic algorithms

42 results on '"Rahman, M. Sohel"'

1. MAMMLE: A Framework for Phylogeny Estimation Based on Multiobjective Application-aware Multiple Sequence Alignment and Maximum Likelihood Ensemble.

Author: Nayeem MA, Samudro NA, Rahman MS, and Rahman MS
Subjects: Phylogeny, Sequence Alignment, Algorithms, Software
Abstract: Motivation: Phylogenetic trees are often inferred from a multiple sequence alignment (MSA) where the tree accuracy is heavily impacted by the nature of estimated alignment. Carefully equipping an MSA tool with multiple application-aware objectives positively impacts its capability to yield better trees. Results: We introduce Multiobjective Application-aware Multiple Sequence Alignment and Maximum Likelihood Ensemble (MAMMLE), a framework for inferring better phylogenetic trees from unaligned sequences by hybridizing two MSA tools [i.e., Multiple Sequence Comparison by Log-Expectation (MUSCLE) and Multiple Alignment using Fast Fourier Transform (MAFFT)] with multiobjective optimization strategy and leveraging multiple maximum likelihood hypotheses. In our experiments, MAMMLE exhibits 5.57% (4.77%) median improvement (deterioration) over MUSCLE on 50.34% (37.41%) of instances.
Published: 2023
Full Text: View/download PDF

2. Succinylated lysine residue prediction revisited.

Author: Ahmed SS, Rifat ZT, Rahman MS, and Rahman MS
Subjects: Amino Acids chemistry, Sensitivity and Specificity, Protein Processing, Post-Translational, Lysine metabolism, Algorithms
Abstract: Lysine succinylation is a kind of post-translational modification (PTM) that plays a crucial role in regulating the cellular processes. Aberrant succinylation may cause inflammation, cancers, metabolism diseases and nervous system diseases. The experimental methods to detect succinylation sites are time-consuming and costly. This thus calls for computational models with high efficacy, and attention has been given in the literature to develop such models, albeit with only moderate success in the context of different evaluation metrics. One crucial aspect in this context is the biochemical and physicochemical properties of amino acids, which appear to be useful as features for such computational predictors. However, some of the existing computational models did not use the biochemical and physicochemical properties of amino acids. In contrast, some others used them without considering the inter-dependency among the properties. The combinations of biochemical and physicochemical properties derived through our optimization process achieve better results than the results achieved by combining all the properties. We propose three deep learning architectures: CNN+Bi-LSTM (CBL), Bi-LSTM+CNN (BLC) and their combination (CBL_BLC). We find that CBL_BLC outperforms the other two. Ensembling of different models successfully improves the results. Notably, tuning the threshold of the ensemble classifiers further improves the results. Upon comparing our work with other existing works on two datasets, we successfully achieve better sensitivity and specificity by varying the threshold value., (© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
Published: 2023
Full Text: View/download PDF

3. Quantifying imbalanced classification methods for leukemia detection.

Author: Depto DS, Rizvee MM, Rahman A, Zunair H, Rahman MS, and Mahdy MRC
Subjects: Humans, Algorithms, Precursor Cell Lymphoblastic Leukemia-Lymphoma diagnosis, Precursor Cell Lymphoblastic Leukemia-Lymphoma pathology
Abstract: Uncontrolled proliferation of B-lymphoblast cells is a common characterization of Acute Lymphoblastic Leukemia (ALL). B-lymphoblasts are found in large numbers in peripheral blood in malignant cases. Early detection of the cell in bone marrow is essential as the disease progresses rapidly if left untreated. However, automated classification of the cell is challenging, owing to its fine-grained variability with B-lymphoid precursor cells and imbalanced data points. Deep learning algorithms demonstrate potential for such fine-grained classification as well as suffer from the imbalanced class problem. In this paper, we explore different deep learning-based State-Of-The-Art (SOTA) approaches to tackle imbalanced classification problems. Our experiment includes input, GAN (Generative Adversarial Networks), and loss-based methods to mitigate the issue of imbalanced class on the challenging C-NMC and ALLIDB-2 dataset for leukemia detection. We have shown empirical evidence that loss-based methods outperform GAN-based and input-based methods in imbalanced classification scenarios., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2022 Elsevier Ltd. All rights reserved.)
Published: 2023
Full Text: View/download PDF

4. Multiobjective Formulation of Multiple Sequence Alignment for Phylogeny Inference.

Author: Nayeem MA, Bayzid MS, Rahman AH, Shahriyar R, and Rahman MS
Subjects: Phylogeny, Sequence Alignment, Algorithms, Software
Abstract: Multiple sequence alignment (MSA) is a preliminary task for estimating phylogenies. It is used for homology inference among the sequences of a set of species. Generally, the MSA task is handled as a single-objective optimization process. The alignments computed under one criterion may be different from the alignments generated by other criteria, inferring discordant homologies and thus leading to different hypothesized evolutionary histories relating the sequences. The multiobjective (MO) formulation of MSA has recently been advocated by several researchers, to address this issue. An MO approach independently optimizes multiple (often conflicting) objective functions at the same time and outputs a set of competitive alignments. However, no conceptual or experimental rational from a real-world application perspective has been reported so far for any MO formulation of MSA. This article work investigates the impact of MO formulation in the context of an important scientific problem, namely, phylogeny estimation. Employing popular evolutionary MO algorithms, we show that: 1) trees inferred based on alignments produced by the existing MSA methods used in practice are substantially worse in quality than the trees inferred based on the alignment's output by an MO algorithm and 2) even high-quality alignments (according to popular measures available in the literature) may fail to achieve acceptable accuracy in generating phylogenetic trees. Thus, we essentially ask the following natural question: "can a phylogeny-aware (i.e., application-aware) metric guide in selecting appropriate MO formulations to ensure better phylogeny estimation?" Here, we report a carefully designed extensive experimental study that positively answers this question.
Published: 2022
Full Text: View/download PDF

5. SSG-LUGIA: Single Sequence based Genome Level Unsupervised Genomic Island Prediction Algorithm.

Author: Ibtehaz N, Ahmed I, Ahmed MS, Rahman MS, Azad RK, and Bayzid MS
Subjects: Algorithms, Bacteria genetics, Computational Biology, Gene Transfer, Horizontal, Genome, Bacterial, Genomic Islands
Abstract: Background: Genomic Islands (GIs) are clusters of genes that are mobilized through horizontal gene transfer. GIs play a pivotal role in bacterial evolution as a mechanism of diversification and adaptation to different niches. Therefore, identification and characterization of GIs in bacterial genomes is important for understanding bacterial evolution. However, quantifying GIs is inherently difficult, and the existing methods suffer from low prediction accuracy and precision-recall trade-off. Moreover, several of them are supervised in nature, and thus, their applications to newly sequenced genomes are riddled with their dependency on the functional annotation of existing genomes., Results: We present SSG-LUGIA, a completely automated and unsupervised approach for identifying GIs and horizontally transferred genes. SSG-LUGIA is a novel method based on unsupervised anomaly detection technique, accompanied by further refinement using cues from signal processing literature. SSG-LUGIA leverages the atypical compositional biases of the alien genes to localize GIs in prokaryotic genomes. SSG-LUGIA was assessed on a large benchmark dataset `IslandPick' and on a set of 15 well-studied genomes in the literature and followed by a thorough analysis on the well-understood Salmonella typhi CT18 genome. Furthermore, the efficacy of SSG-LUGIA in identifying horizontally transferred genes was evaluated on two additional bacterial genomes, namely, those of Corynebacterium diphtheria NCTC13129 and Pseudomonas aeruginosa LESB58. SSG-LUGIA was examined on draft genomes and was demonstrated to be efficient as an ensemble method., Conclusions: Our results indicate that SSG-LUGIA achieved superior performance in comparison to frequently used existing methods. Importantly, it yielded a better trade-off between precision and recall than the existing methods. Its nondependency on the functional annotation of genomes makes it suitable for analyzing newly sequenced, yet uncharacterized genomes. Thus, our study is a significant advance in identification of GIs and horizontally transferred genes. SSG-LUGIA is available as an open source software at https://nibtehaz.github.io/SSG-LUGIA/., (© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
Published: 2021
Full Text: View/download PDF

6. ADACT: a tool for analysing (dis)similarity among nucleotide and protein sequences using minimal and relative absent words.

Author: Akon M, Akon M, Kabir M, Rahman MS, and Rahman MS
Subjects: Phylogeny, Sequence Alignment, Sequence Analysis, DNA, Software, Algorithms, Nucleotides
Abstract: Motivation: Researchers and practitioners use a number of popular sequence comparison tools that use many alignment-based techniques. Due to high time and space complexity and length-related restrictions, researchers often seek alignment-free tools. Recently, some interesting ideas, namely, Minimal Absent Words (MAW) and Relative Absent Words (RAW), have received much interest among the scientific community as distance measures that can give us alignment-free alternatives. This drives us to structure a framework for analysing biological sequences in an alignment-free manner., Results: In this application note, we present Alignment-free Dissimilarity Analysis & Comparison Tool (ADACT), a simple web-based tool that computes the analogy among sequences using a varied number of indexes through the distance matrix, species relation list and phylogenetic tree. This tool basically combines absent word (MAW or RAW) computation, dissimilarity measures, species relationship and thus brings all required software in one platform for the ease of researchers and practitioners alike in the field of bioinformatics. We have also developed a restful API., Availability and Implementation: ADACT has been hosted at http://research.buet.ac.bd/ADACT/., Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
Published: 2021
Full Text: View/download PDF

7. Protein structure prediction from inaccurate and sparse NMR data using an enhanced genetic algorithm.

Author: Islam ML, Shatabda S, Rashid MA, Khan MGM, and Rahman MS
Subjects: Proteins genetics, Algorithms, Nuclear Magnetic Resonance, Biomolecular, Protein Conformation, Proteins chemistry
Abstract: Nuclear Magnetic Resonance Spectroscopy (most commonly known as NMR Spectroscopy) is used to generate approximate and partial distances between pairs of atoms of the native structure of a protein. To predict protein structure from these partial distances by solving the Euclidean distance geometry problem from the partial distances obtained from NMR Spectroscopy, we can predict three-dimensional (3D) structure of a protein. In this paper, a new genetic algorithm is proposed to efficiently address the Euclidean distance geometry problem towards building 3D structure of a given protein applying NMR's sparse data. Our genetic algorithm uses (i) a greedy mutation and crossover operator to intensify the search; (ii) a twin removal technique for diversification in the population; (iii) a random restart method to recover from stagnation; and (iv) a compaction factor to reduce the search space. Reducing the search space drastically, our approach improves the quality of the search. We tested our algorithms on a set of standard benchmarks. Experimentally, we show that our enhanced genetic algorithms significantly outperforms the traditional genetic algorithms and a previously proposed state-of-the-art method. Our method is capable of producing structures that are very close to the native structures and hence, the experimental biologists could adopt it to determine more accurate protein structures from NMR data., (Copyright © 2019 Elsevier Ltd. All rights reserved.)
Published: 2019
Full Text: View/download PDF

8. DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.

Author: Rahman MS, Shatabda S, Saha S, Kaykobad M, and Rahman MS
Subjects: Amino Acid Sequence, Amino Acids chemistry, Amino Acids genetics, Amino Acids metabolism, DNA chemistry, DNA genetics, DNA metabolism, DNA-Binding Proteins chemistry, DNA-Binding Proteins genetics, Databases, Protein, Models, Molecular, Nucleic Acid Conformation, Protein Domains, Reproducibility of Results, Algorithms, Computational Biology methods, DNA-Binding Proteins metabolism, Support Vector Machine
Abstract: A DNA-binding protein (DNA-BP) is a protein that can bind and interact with a DNA. Identification of DNA-BPs using experimental methods is expensive as well as time consuming. As such, fast and accurate computational methods are sought for predicting whether a protein can bind with a DNA or not. In this paper, we focus on building a new computational model to identify DNA-BPs in an efficient and accurate way. Our model extracts meaningful information directly from the protein sequences, without any dependence on functional domain or structural information. After feature extraction, we have employed Random Forest (RF) model to rank the features. Afterwards, we have used Recursive Feature Elimination (RFE) method to extract an optimal set of features and trained a prediction model using Support Vector Machine (SVM) with linear kernel. Our proposed method, named as DNA-binding Protein Prediction model using Chou's general PseAAC (DPP-PseAAC), demonstrates superior performance compared to the state-of-the-art predictors on standard benchmark dataset. DPP-PseAAC achieves accuracy values of 93.21%, 95.91% and 77.42% for 10-fold cross-validation test, jackknife test and independent test respectively. The source code of DPP-PseAAC, along with relevant dataset and detailed experimental results, can be found at https://github.com/srautonu/DNABinding. A publicly accessible web interface has also been established at: http://77.68.43.135:8080/DPP-PseAAC/., (Copyright © 2018 Elsevier Ltd. All rights reserved.)
Published: 2018
Full Text: View/download PDF

9. Absent words and the (dis)similarity analysis of DNA sequences: an experimental study.

Author: Rahman MS, Alatabbi A, Athar T, Crochemore M, and Rahman MS
Subjects: Animals, Base Sequence, Humans, Phylogeny, Algorithms, Sequence Analysis, DNA methods
Abstract: Background: An absent word with respect to a sequence is a word that does not occur in the sequence as a factor; an absent word is minimal if all its factors on the other hand occur in that sequence. In this paper we explore the idea of using minimal absent words (MAW) to compute the distance between two biological sequences. The motivation and rationale of our work comes from the potential advantage of being able to extract as little information as possible from large genomic sequences to reach the goal of comparing sequences in an alignment-free manner., Findings: We report an experimental study on the use of absent words as a distance measure among biological sequences. We provide recommendations to use the best index based on our analysis. In particular, our analysis reveals that the best performers are: the length weighted index of relative absent word sets, the length weighted index of the symmetric difference of the MAW sets, and the Jaccard distance between the MAW sets. We also found that during the computation of the absent words, the reverse complements of the sequences should also be considered., Conclusion: The use of MAW to compute the distance between two biological sequences has potential advantage over alignment based methods. It is expected that this potential advantage would encourage researchers and practitioners to use this as a (dis)similarity measure in the context of sequence comparison and phylogeny reconstruction. Therefore, we present here a comparison among different possible models and indexes and pave the path for the biologists and researchers to choose an appropriate model for such comparisons.
Published: 2016
Full Text: View/download PDF

10. A Simple, Fast, Filter-Based Algorithm for Approximate Circular Pattern Matching.

Author: Azim MA, Iliopoulos CS, Rahman MS, and Samiruzzaman M
Subjects: Algorithms, Computational Biology methods, DNA, Circular analysis, DNA, Circular chemistry, DNA, Circular genetics, Pattern Recognition, Automated methods, Sequence Analysis, DNA methods
Abstract: This paper deals with the approximate version of the circular pattern matching (ACPM) problem, which appears as an interesting problem in many biological contexts. The circular pattern matching problem consists in finding all occurrences of the rotations of a pattern P of length m in a text T of length n. In ACPM, we consider occurrences with k -mismatches under the Hamming distance model. In this paper, we present a simple and fast filter-based algorithm to solve the ACPM problem. We compare our algorithm with the state of the art algorithms and the results are found to be excellent. In particular, our algorithm runs almost twice as fast than the state of the art. Much of the efficiency of our algorithm can be attributed to its filters that are effective but extremely simple and lightweight.
Published: 2016
Full Text: View/download PDF

11. Impact of heuristics in clustering large biological networks.

Author: Shafin MK, Kabir KL, Ridwan I, Anannya TT, Karim RS, Hoque MM, and Rahman MS
Subjects: Algorithms, Cluster Analysis, Computational Biology methods, Heuristics
Abstract: Traditional clustering algorithms often exhibit poor performance for large networks. On the contrary, greedy algorithms are found to be relatively efficient while uncovering functional modules from large biological networks. The quality of the clusters produced by these greedy techniques largely depends on the underlying heuristics employed. Different heuristics based on different attributes and properties perform differently in terms of the quality of the clusters produced. This motivates us to design new heuristics for clustering large networks. In this paper, we have proposed two new heuristics and analyzed the performance thereof after incorporating those with three different combinations in a recently celebrated greedy clustering algorithm named SPICi. We have extensively analyzed the effectiveness of these new variants. The results are found to be promising., (Copyright © 2015 Elsevier Ltd. All rights reserved.)
Published: 2015
Full Text: View/download PDF

12. An Integer Programming Formulation of the Minimum Common String Partition Problem.

Author: Ferdous SM and Rahman MS
Subjects: Animals, Computational Biology statistics & numerical data, Humans, Algorithms, Genome, Programming, Linear
Abstract: We consider the problem of finding a minimum common string partition (MCSP) of two strings, which is an NP-hard problem. The MCSP problem is closely related to genome comparison and rearrangement, an important field in Computational Biology. In this paper, we map the MCSP problem into a graph applying a prior technique and using this graph, we develop an Integer Linear Programming (ILP) formulation for the problem. We implement the ILP formulation and compare the results with the state-of-the-art algorithms from the literature. The experimental results are found to be promising.
Published: 2015
Full Text: View/download PDF

13. Pattern matching in indeterminate and Arc-annotated sequences.

Author: Aumi MT, Moosa TM, and Rahman MS
Subjects: Base Sequence, Chromosomes, Human chemistry, Chromosomes, Human genetics, Computational Biology, Humans, Patents as Topic, Pattern Recognition, Automated, RNA, Untranslated chemistry, Algorithms
Abstract: In this paper, we present efficient algorithms for finding indeterminate Arc-Annotated patterns in indeterminate Arc-Annotated references. Our algorithms run in O(m+ (nm) w) time where n and m are respectively the length of our reference and pattern strings and w is the target machine word size. Here we have assumed the alphabet size to be constant, because, indeterminate Arc-Annotated sequences are used to model biological sequences. Clearly, for short patterns, our algorithms run in linear time and efficient algorithms for matching short patterns to reference genomes have huge applications in practical settings. We have also applied our algorithms to scan the ncRNAs without pseudoknots. We scanned three whole human chromosomes and it took only 2.5 - 4 minutes to scan one whole chromosome for an ncRNA family. Some relevant patents are discussed in.
Published: 2013
Full Text: View/download PDF

14. The 1.375 approximation algorithm for sorting by transpositions can run in O(n log n) time.

Author: Firoz JS, Hasan M, Khan AZ, and Rahman MS
Subjects: Algorithms, Computational Biology methods
Abstract: Sorting a permutation by transpositions (SPbT) is an important problem in bioinformtics. In this article, we improve the running time of the best known approximation algorithm for SPbT.
Published: 2011
Full Text: View/download PDF

15. Palindromic Subsequence Automata and Longest Common Palindromic Subsequence

Author: Hasan, Md. Mahbubul, Islam, A. S. M. Sohidull, Rahman, M. Sohel, and Sen, Ayon
Published: 2017
Full Text: View/download PDF

16. Protein Folding in 2D-Triangular Lattice Revisited : (Extended Abstract)

Author: Islam, A. S. M. Shohidull, Rahman, M. Sohel, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Lecroq, Thierry, editor, and Mouchard, Laurent, editor
Published: 2013
Full Text: View/download PDF

17. Indexing a sequence for mapping reads with a single mismatch

Author: Crochemore, Maxime, Langiu, Alessio, and Rahman, M. Sohel
Published: 2014

18. Cache Oblivious Algorithms for the RMQ and the RMSQ Problems

Author: Hasan, Masud, Moosa, Tanaeem M., and Rahman, M. Sohel
Published: 2010
Full Text: View/download PDF

19. Indexing Factors with Gaps

Author: Iliopoulos, Costas S. and Rahman, M. Sohel
Published: 2009
Full Text: View/download PDF

20. A New Efficient Algorithm for Computing the Longest Common Subsequence

Author: Iliopoulos, Costas S. and Rahman, M. Sohel
Published: 2009
Full Text: View/download PDF

21. Neural Network-Based Undersampling Techniques.

Author: Arefeen, Md Adnan, Nimi, Sumaiya Tabassum, and Rahman, M. Sohel
Subjects: MACHINE learning, KEY performance indicators (Management), RETINAL blood vessels, ALGORITHMS, SAMPLING (Process)
Abstract: Machine learning models have gained popularity nowadays for their potential to solve real-life issues when trained on pertinent data. In many cases, the real-life data are class imbalanced and hence the corresponding machine learning models trained on the data tend to perform poorly on metrics like precision, recall, AUC, F1, and G-mean score. Since class imbalance issue poses serious challenges to the performance of trained models, a multitude of research works have addressed this issue. Two common data-based sampling techniques have mostly been proposed-undersampling the data of the majority class and oversampling the data of the minority class. In this article, we focus on the former approach. We propose two novel algorithms that employ neural network-based approaches to remove majority samples that are found to reside in the vicinity of the minority samples, thereby undersampling the former to remove (or alleviate) the imbalance issue. We delineate the proposed algorithms and then test the proposed algorithms on some publicly available imbalanced datasets. We then compare the performance of our proposed algorithms to other popular undersampling algorithms. Finally, we conclude that our proposed algorithms outperform most of the existing undersampling approaches on most performance metrics. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

22. Phylogenetic Analyses of SARS-CoV-2 Strains Reveal Its Link to the Spread of COVID-19 Across the Globe.

Author: Sawmya, Shashata, Saha, Arpita, Tasnim, Sadia, Anjum, Naser, Toufikuzzaman, Md., Rafid, Ali Haisam Muhamad, Rahman, Mohammad Saifur, Rahman, M. Sohel, and Alam, Tanvir
Subjects: PHYLOGENY, COVID-19, SEQUENCE analysis, WORLD health, CONFERENCES & conventions, GENOMES, FACTOR analysis, CLUSTER analysis (Statistics), SARS disease, ALGORITHMS
Abstract: This study leveraged the phylogenetic analysis of more than 10K strains of novel coronavirus (SARS-CoV-2) from 67 countries. Due to the requirement of high-end computational power for phylogenetic analysis, we leverage a fast yet highly accurate alignment-free method to develop the phylogenetic tree out of all the strains of novel coronavirus. K-Means clustering and PCA-based dimension reduction technique were used to identify a representative strain from each location. The resulting phylogenetic tree was able to highlight evolutionary relationships of SARS-CoV-2 genome and, subsequently, linked to the interpretation of facts and figures across the globe for the spread of COVID-19. Our analysis revealed that the geographical boundaries could not be explained by the phylogenetic analysis of novel coronavirus as it placed different countries from Asia, Europe and the USA in very close proximity in the tree. Instead, the commute of people from one country to another is the key to the spread of COVID-19. We believe our study will support the policymakers to contain the spread of COVID-19 globally. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

23. Approximation Algorithms for Three Dimensional Protein Folding.

Author: Shaw, Dipan Lal, Karmaker, Shuvasish, Islam, A. S. M. Shohidull, and Rahman, M. Sohel
Subjects: ALGORITHMS, PROTEIN folding, LATTICE theory, BIOINFORMATICS, HYDROPHOBIC compounds
Abstract: Predicting the secondary structure of a protein using a lattice model is one of the most studied computational problems in bioinformatics. Here the secondary structure or three dimensional structure of a protein is predicted from its amino acid sequence. The secondary structure refers to the local sub-structures of a protein. Simplified energy models have been proposed in the literature on the basis of interaction of amino acid residues in proteins. We focus on a well researched model known as the Hydrophobic-Polar (HP) energy model. In this paper, we propose the hexagonal prism lattice with diagonals that can overcome the problems of other lattice structures, e.g., parity problem. We give two approximation algorithms for protein folding on this lattice using HP model. Our first algorithm leads us to a similar structure of helix structure that is commonly found in a protein structure. This motivates us to propose the next algorithm with a better approximation ratio. Finally, we analyze the algorithms on the basis of intensity of the chemical forces along the different types of edges of hexagonal prism lattice with diagonals. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

24. Computing a Longest Common Subsequence of two strings when one of them is Run Length Encoded

Author: Ahsan, Shegufta Bakht, Moosa, Tanaeem M., Rahman, M. Sohel, and Shahriyar, Shampa
Subjects: run length encoded strings, longest common subsequence, algorithms
Abstract: Given two strings, the longest common subsequence (LCS) problem computes a common subsequence that has the maximum length. In this paper, we present new and efficient algorithms for solving the LCS problem for two strings one of which is run length encoded (RLE). We first present an algorithm that runs in O(gN) time, where g is the length of the RLE string and N is the length of uncompressed string. Then based on the ideas of the above algorithm we present another algorithm that runs in O(Rlog(log g)+N) time, where R is the total number of ordered pairs of positions at which the two strings match. Our first algorithm matches the best algorithm in the literature for the same problem. On the other hand, for R < gN/ log(log)g, our second algorithm outperforms the best algorithms in the literature.
Published: 2011

25. CRISPRpred: A flexible and efficient tool for sgRNAs on-target activity prediction in CRISPR/Cas9 systems.

Author: Rahman, Md. Khaledur and Rahman, M. Sohel
Subjects: *GENOME editing, *CRISPRS, *VIRUS diseases, *DNA, *RNA
Abstract: The CRISPR/Cas9-sgRNA system has recently become a popular tool for genome editing and a very hot topic in the field of medical research. In this system, Cas9 protein is directed to a desired location for gene engineering and cleaves target DNA sequence which is complementary to a 20-nucleotide guide sequence found within the sgRNA. A lot of experimental efforts, ranging from in vivo selection to in silico modeling, have been made for efficient designing of sgRNAs in CRISPR/Cas9 system. In this article, we present a novel tool, called CRISPRpred, for efficient in silico prediction of sgRNAs on-target activity which is based on the applications of Support Vector Machine (SVM) model. To conduct experiments, we have used a benchmark dataset of 17 genes and 5310 guide sequences where there are only 20% true values. CRISPRpred achieves Area Under Receiver Operating Characteristics Curve (AUROC-Curve), Area Under Precision Recall Curve (AUPR-Curve) and maximum Matthews Correlation Coefficient (MCC) as 0.85, 0.56 and 0.48, respectively. Our tool shows approximately 5% improvement in AUPR-Curve and after analyzing all evaluation metrics, we find that CRISPRpred is better than the current state-of-the-art. CRISPRpred is enough flexible to extract relevant features and use them in a learning algorithm. The source code of our entire software with relevant dataset can be found in the following link: . [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

26. An efficient algorithm to detect common ancestor genes for non-overlapping inversion and applications.

Author: Zohora, Fatema Tuz and Rahman, M. Sohel
Subjects: *MOLECULAR genetics, *GEOMETRY, *MATRICES (Mathematics), *ALGEBRA, *ALGORITHMS
Abstract: In this paper, an algorithm is proposed that detects the existence of common ancestor gene sequences for non-overlapping inversion (reversed complement) metric given two input DNA sequences. Theoretical worst case running time complexity of the algorithm is proven to be O ( n 4 ) , where n is the length of each input sequence. However, by experiment, the running time complexity is found to be O ( n 3 ) for the worst case and O ( n 2 ) for average case. Moreover, the worst case occurs when both input sequences have the similarity of around 90% which is very rare. This work is motivated by the purpose of diagnosing an unknown genetic disease that shows allelic heterogeneity , a case where a normal gene mutates in different orders resulting in two different gene sequences causing two different genetic diseases. Our algorithm can potentially save huge energy and cost of the existing diagnostic approaches. The algorithm can be useful as well in the study of breed-related hereditary conditions to determine the genetic spread of a defective gene in the population. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

27. Algorithms for Longest Common Abelian Factors.

Author: Alatabbi, Ali, Iliopoulos, Costas S., Langiu, Alessio, and Rahman, M. Sohel
Subjects: ABELIAN groups, ALGORITHMS, LINEAR systems, STRING theory, QUADRATIC equations
Abstract: In this paper we consider the problem of computing the longest common abelian factor (LCAF) between two given strings. We present a simple time algorithm, where n is the length of the strings and is the alphabet size, and a sub-quadratic running time solution for the binary string case, both having linear space requirement. Furthermore, we present a modified algorithm applying some interesting tricks and experimentally show that the resulting algorithm runs faster. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

28. Finding Patterns in Given Intervals.

Author: Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Kučera, Luděk, Kučera, Antonín, Crochemore, Maxime, Iliopoulos, Costas S., and Rahman, M. Sohel
Abstract: In this paper, we study the pattern matching problem in given intervals. Depending on whether the intervals are given a priori for pre-processing, or during the query along with the pattern or, even in both cases, we develop solutions for different variants of this problem. In particular, we present efficient indexing schemes for each of the above variants of the problem. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

29. Finding Patterns with Variable Length Gaps or Don't Cares.

Author: Chen, Danny Z., Lee, D. T., Rahman, M. Sohel, Iliopoulos, Costas S., Lee, Inbok, Mohamed, Manal, and Smyth, William F.
Abstract: In this paper we have presented new algorithms to handle the pattern matching problem where the pattern can contain variable length gaps. Given a pattern P with variable length gaps and a text T our algorithm works in O(n + m + α log(max1<=i<=l(bi-ai))) time where n is the length of the text, m is the summation of the lengths of the component subpatterns, α is the total number of occurrences of the component subpatterns in the text and ai and bi are, respectively, the minimum and maximum number of don't cares allowed between the ith and (i+1)st component of the pattern. We also present another algorithm which, given a suffix array of the text, can report whether P occurs in T in O(m + α loglogn) time. Both the algorithms record information to report all the occurrences of P in T. Furthermore, the techniques used in our algorithms are shown to be useful in many other contexts. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

30. Protein folding in HP model on hexagonal lattices with diagonals.

Author: Shaw, Dipan Lal, Islam, ASM Shohidull, Rahman, M. Sohel, and Hasan, Masud
Subjects: PROTEIN folding, ALGORITHMS, PROTEIN conformation, ALGEBRA, BIOINFORMATICS
Abstract: Three dimensional structure prediction of a protein from its amino acid sequence, known as protein folding, is one of the most studied computational problem in bioinformatics and computational biology. Since, this is a hard problem, a number of simplified models have been proposed in literature to capture the essential properties of this problem. In this paper we introduce the hexagonal lattices with diagonals to handle the protein folding problem considering the well researched HP model. We give two approximation algorithms for protein folding on this lattice. Our first algorithm is a 5/3 -approximation algorithm, which is based on the strategy of partitioning the entire protein sequence into two pieces. Our next algorithm is also based on partitioning approaches and improves upon the first algorithm. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

31. A graph-theoretic model to solve the approximate string matching problem allowing for translocations.

Author: Ahmed, Pritom, Islam, A.S.M. Shohidull, and Rahman, M. Sohel
Abstract: Abstract: In this paper, we study the approximate string matching problem under a string distance whose edit operations are translocations of equal length factors. We extend a graph-theoretic approach proposed by Rahman and Illiopoulos (2008) to model our problem. In the sequel, we devise efficient algorithms based on this model to solve a number of variants of the string matching problem with translocations. [Copyright &y& Elsevier]
Published: 2013
Full Text: View/download PDF

32. Computing a longest common subsequence that is almost increasing on sequences having no repeated elements.

Author: Moosa, Johra Muhammad, Rahman, M. Sohel, and Zohora, Fatema Tuz
Abstract: Abstract: Given two permutations A and B of and a fixed constant c, we introduce the notion of a longest common almost increasing subsequence (LCAIS) as a longest common subsequence that can be converted to an increasing subsequence by possibly adding a value, that is at most c, to each of the elements. We present an algorithm for computing LCAIS in space, time. [Copyright &y& Elsevier]
Published: 2013
Full Text: View/download PDF

33. Indeterminate string inference algorithms.

Author: Nazeen, Sumaiya, Rahman, M. Sohel, and Reaz, Rezwana
Subjects: DIOPHANTINE analysis, INFERENCE (Logic), ALGORITHMS, ANALYTIC functions, MATHEMATICAL analysis, MOLECULAR biology
Abstract: Abstract: Regularities in indeterminate strings have recently been a matter of interest because of their use in the fields of molecular biology, musical text analysis, cryptanalysis and so on. In this paper, we study the problem of reconstructing an indeterminate string from a border array. We present two efficient algorithms to reconstruct an indeterminate string from a valid border array – one using an unbounded alphabet and the other using minimum sized alphabet. We also propose an algorithm for reconstructing an indeterminate string from suffix array and LCP array. [Copyright &y& Elsevier]
Published: 2012
Full Text: View/download PDF

34. Sub-quadratic time and linear space data structures for permutation matching in binary strings.

Author: Moosa, Tanaeem M. and Rahman, M. Sohel
Subjects: QUADRATIC fields, VECTOR spaces, DATA structures, PERMUTATIONS, BINARY number system, PATTERN perception, LINEAR systems, MATHEMATICAL analysis
Abstract: Abstract: Given a pattern P of length n and a text T of length m, the permutation matching problem asks whether any permutation of P occurs in T. Indexing a string for permutation matching seems to be quite hard in spite of the existence of a simple non-indexed solution. In this paper, we devise several time data structures for a binary string capable of answering permutation queries in time. In particular, we first present two time data structures and then improve the data structure construction time to . The space complexity of the data structures remains linear. [Copyright &y& Elsevier]
Published: 2012
Full Text: View/download PDF

35. Finding Patterns In Given Intervals.

Author: Crochemore, Maxime, Iliopoulos, Costas S., Kubica, Marcin, Rahman, M. Sohel, and Waleń, Tomasz
Subjects: ALGORITHMS, DATA structures, COMPUTER science, INFORMATION retrieval, DATA mining
Abstract: In this paper, we study the pattern matching problem in given intervals. Depending on whether the intervals are given a priori for pre-processing, or during the query along with the pattern or, even in both the cases, we develop efficient solutions for different variants of this problem. In particular, we present efficient indexing schemes for each of the above variants of the problem. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF

36. Pancake Flipping with Two Spatulas.

Author: Sharmin, Mahfuza, Yeasmin, Rukhsana, Hasan, Masud, Rahman, Atif, and Rahman, M. Sohel
Subjects: BIOMATHEMATICS, GENOMES, BIOLOGICAL variation, ALGORITHMS, APPROXIMATION theory, PERMUTATIONS
Abstract: Abstract: In this paper, we give approximation algorithms for several variations of the pancake flipping problem, which is also well known as the problem of sorting by prefix reversals. We consider the variations in the sorting process by adding prefix transpositions, prefix transreversals etc. along with the prefix reversals. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF

37. Finite automata based algorithms on subsequences and supersequences of degenerate strings.

Author: Iliopoulos, Costas, Rahman, M. Sohel, Voráček, Michal, and Vagner, Ladislav
Subjects: SEQUENTIAL machine theory, ALGORITHMS, MATHEMATICAL sequences, CONSTRAINED optimization, MATHEMATICAL analysis
Abstract: Abstract: In this paper, we present linear-time algorithms for the construction two novel types of finite automata and show how they can be used to efficiently solve the Longest Common Subsequence (LCS), Shortest Common Supersequence (SCS) and Constrained Longest Common Subsequence (CLCS) problems for degenerate strings. [Copyright &y& Elsevier]
Published: 2010
Full Text: View/download PDF

38. A multiprocessor based heuristic for multi-dimensional multiple-choice knapsack problem.

Author: Shahriar, Abu Zafar M., Akbar, M. Mostofa, Rahman, M. Sohel, and Newton, Muhammad Abdul Hakim
Subjects: MULTIPROCESSORS, KNAPSACK problems, ALGORITHMS, HEURISTIC, INTEGER programming, METHODOLOGY
Abstract: This paper presents a multiprocessor based heuristic algorithm for the Multi-dimensional Multiple Choice Knapsack Problem (MMKP). MMKP is a variant of the classical 0–1 knapsack problem, where items having a value and a number of resource requirements are divided into groups. Exactly one item has to be picked up from each group to achieve a maximum total value without exceeding the resource constraint of each type. MMKP has many real world applications including admission control in adaptive multimedia server system. Exact solution to this problem is NP-Hard, and hence is not feasible for real time applications like admission control. Therefore, heuristic solutions have been developed to solve the MMKP. M-HEU is one such heuristic, which solves the MMKP achieving a reasonable percentage of optimality. In this paper, we present a multiprocessor algorithm based on M-HEU, which runs in O( T/ p+ s( p)) time, where T is the time required by the algorithm using single processor, p is the number of processors and s( p), a function of p, is the synchronization overhead. We also present the worst-case analysis of our algorithm, the computation of the optimal number of processors as well as the lower bound of the total value that can be achieved by the heuristic. [ABSTRACT FROM AUTHOR]
Published: 2008
Full Text: View/download PDF

39. IDENTIFYING RHYTHMS IN MUSICAL TEXTS.

Author: CHRISTODOULAKIS, MANOLIS, ILIOPOULOS, COSTAS S., RAHMAN, M. SOHEL, and SMYTH, WILLIAM F.
Subjects: RHYTHM, SEQUENCES (Musical composition), ALGORITHMS, MUSICAL notation, AESTHETICS
Abstract: A fundamental problem in music is to classify songs according to their rhythm. A rhythm is represented by a sequence of “Quick” (Q) and “Slow” (S) symbols, which correspond to the (relative) duration of notes, such that S = 2Q. In this paper, we present an efficient algorithm for locating the maximum-length substring of a music text t that can be covered by a given rhythm r. [ABSTRACT FROM AUTHOR]
Published: 2008
Full Text: View/download PDF

40. Computing the longest common almost-increasing subsequence.

Author: Bhuiyan, Mohammad Tawhidul Hasan, Alam, Muhammad Rashed, and Rahman, M. Sohel
Subjects: *PROBLEM solving
Abstract: In this paper, we revisit the problem of computing longest common almost increasing subsequence (LCAIS) where, given two input sequences, the goal is to compute a common subsequence that is 'almost' increasing. Here the concept of an almost increasing subsequence offers an interesting relaxation over the increasing condition. This problem has been studied in the literature albeit with some constraints. Here, we present a number of a number of algorithmic approaches to solve the problem more generally and efficiently. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

41. The swap matching problem revisited.

Author: Ahmed, Pritom, Iliopoulos, Costas S., Islam, A.S.M. Sohidull, and Rahman, M. Sohel
Subjects: *MATCHING theory, *PROBLEM solving, *GRAPH theory, *ALGORITHMS, *STRING theory
Abstract: In this paper, we revisit the much studied problem of Pattern Matching with Swaps (Swap Matching problem, for short). We first present a graph-theoretic model, which opens a new and so far unexplored avenue to solve the problem. Then, using the model, we devise two efficient algorithms to solve the swap matching problem. The resulting algorithms are adaptations of the classic shift-and algorithm. For patterns having length similar to the word-size of the target machine, both the algorithms run in linear time considering a fixed alphabet. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

42. Improved algorithms for the range next value problem and applications

Author: Crochemore, Maxime, Iliopoulos, Costas S., Kubica, Marcin, Rahman, M. Sohel, Tischler, German, and Waleń, Tomasz
Subjects: *COMPUTER algorithms, *COMPUTER science, *PATTERN recognition systems, *CYBERNETICS, *NUMBER theory, *MATCHING theory
Abstract: Abstract: The Range Next Value problem (problem RNV) is a recent interesting variant of the range search problems, where the query is for the immediate next (or equal) value of a given number within a given interval of an array. Problem RNV was introduced and studied very recently by Crochemore et al. [Maxime Crochemore, Costas S. Iliopoulos, M. Sohel Rahman, Finding patterns in given intervals, in: Antonin Kucera, Ludek Kucera (Eds.), MFCS, 22 in: Lecture Notes in Computer Science, vol. 4708, Springer, 2007, pp. 645–656]. In this paper, we present improved algorithms for problem RNV and algorithms for extended versions of the RNV problem. We also show how this problem can be used to achieve optimal query time for a number of interesting variants of the classic pattern matching problems. [Copyright &y& Elsevier]
Published: 2012
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

42 results on '"Rahman, M. Sohel"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources