43 results for "Srinivasan, Parthasarathy"
Search Results
2. Revisiting Link Prediction on Heterogeneous Graphs with a Multi-view Perspective
- Author
- Anasua Mitra, Priyesh Vijayan, Sanasam Ranbir Singh, Diganta Goswami, Srinivasan Parthasarathy, and Balaraman Ravindran
- Published
- 2022
3. DistMILE: A Distributed Multi-Level Framework for Scalable Graph Embedding
- Author
- Yuntian He, Saket Gurukar, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda, and Srinivasan Parthasarathy
- Published
- 2021
4. Semi-Supervised Community Detection Using Structure and Size
- Author
- Srinivasan Parthasarathy, Kannan Srinivasan, and Arjun Bakshi
- Subjects
- Training set, Social network, Computer science, Feature vector, Feature extraction, Machine learning, Graph, Artificial intelligence
- Abstract
In recent years there have been a few semi-supervised community detection approaches that use community membership information or node metadata to improve their performance. However, communities have traditionally been thought of as clique-like structures, while the idea of finding and leveraging other patterns in communities is relatively unexplored. Online social networks provide a corpus of real communities in large graphs which can be used to understand dataset-specific community patterns. In this paper, we design a way to represent communities concisely in an easy-to-compute feature space. We design an efficient community detection algorithm that uses size and structural information of communities from a training set to find communities in the rest of the graph. We show that our approach achieves 10% higher F1 scores on average compared to several other methods on large real-world graph datasets, even when the training set is small.
- Published
- 2018
5. Characterizing I/O optimization opportunities for array-centric applications on HDFS
- Author
- Srinivasan Parthasarathy, Spyros Blanas, Donghe Kang, Yang Wang, Kalyan Khandrika, and Vedang Patel
- Subjects
- Computer science, Interface (computing), Distributed computing, Context (language use), Object (computer science), POSIX, Analytics, Object-relational impedance mismatch, Data ingestion, Distributed File System
- Abstract
An impedance mismatch exists between the increasing sophistication of array-centric analytics and the bytestream-based POSIX interface of parallel file systems. This mismatch is particularly acute in data-intensive scientific applications. This paper examines performance bottlenecks and describes optimizations to alleviate them in the context of computational astronomy pipelines and the Hadoop distributed file system (HDFS). We find that fast data ingestion and intelligent object consolidation promise to accelerate I/O performance by two orders of magnitude.
- Published
- 2018
6. A Pareto Framework for Data Analytics on Heterogeneous Systems: Implications for Green Energy Usage and Performance
- Author
- Aniket Chakrabarti, Srinivasan Parthasarathy, and Christopher Stewart
- Subjects
- Computer science, Distributed computing, Skew, Pareto principle, Workload, Energy consumption, Partition (database), Renewable energy, Analytics, Distributed algorithm, Data analysis, Energy source
- Abstract
Distributed algorithms for data analytics partition their input data across many machines for parallel execution. At scale, it is likely that some machines will perform worse than others because they are slower, power constrained or dependent on undesirable, dirty energy sources. It is challenging to balance analytics workloads across heterogeneous machines because the algorithms are sensitive to statistical skew in data partitions. A skewed partition can slow down the whole workload or degrade the quality of results. Sizing partitions in proportion to each machine's performance may introduce or further exacerbate skew. In this paper, we propose a scheme that controls the statistical distribution of each partition and sizes partitions according to the heterogeneity of the computing environment. We model heterogeneity as a multi-objective optimization, with the objectives being functions for execution time and dirty energy consumption. We use stratification to control skew. Experiments show that our computational heterogeneity-aware (Het-Aware) partitioning strategy speeds up running time by up to 51% over the stratified partitioning scheme baseline. We also have a heterogeneity and energy aware (Het-Energy-Aware) partitioning scheme which is slower than the Het-Aware solution but can lower the dirty energy footprint by up to 26%. For some analytic tasks, there is also a significant qualitative benefit when using such partitioning strategies.
- Published
- 2017
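The partitioning scheme described in the abstract above, sizing partitions in proportion to machine capability while stratifying to control statistical skew, can be sketched in a few lines. This is a minimal illustration under assumed names (`strata_key` labels the stratum of an item, `speeds` is a per-machine capability score); the paper's framework additionally optimizes the dirty-energy objective:

```python
import random

def stratified_partition(items, strata_key, speeds, seed=0):
    """Split items across machines in proportion to `speeds`, while keeping
    each partition's strata mix close to the global mix, which hedges
    against statistical skew in any one partition."""
    rng = random.Random(seed)
    total = sum(speeds)
    parts = [[] for _ in speeds]
    # Group items into strata, then deal each stratum out proportionally,
    # so every partition sees roughly the same data distribution.
    strata = {}
    for it in items:
        strata.setdefault(strata_key(it), []).append(it)
    for bucket in strata.values():
        rng.shuffle(bucket)
        i = 0
        for m, speed in enumerate(speeds):
            take = round(len(bucket) * speed / total)
            parts[m].extend(bucket[i:i + take])
            i += take
        parts[rng.randrange(len(speeds))].extend(bucket[i:])  # rounding leftovers
    return parts
```

With `speeds = [3, 1]`, the faster machine receives roughly three quarters of the data, yet both partitions contain every stratum.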
7. Role Discovery in Graphs Using Global Features: Algorithms, Applications and a Novel Evaluation Strategy
- Author
- Balaraman Ravindran, Srinivasan Parthasarathy, and Pratik Vinay Gupte
- Subjects
- Clique, Evaluation strategy, Theoretical computer science, Computer science, Feature extraction, Graph partition, Graph, Scalability, Algorithm design, Data mining, Social network analysis, Algorithm
- Abstract
In social network analysis, the fundamental idea behind the notion of roles is to discover actors who have similar structural signatures. Actors performing the same role have similar behavioural and functional characteristics. A few examples of structural roles are bridge nodes, clique members and star centers. Role discovery involves partitioning the nodes in a network based on their structural characteristics. The notion of roles is complementary to the notion of community detection, which involves partitioning the network into cohesive subgroups. In this paper we propose a novel algorithm RIDεRs (Role Identification and Discovery using ε-equitable Refinements): a graph partitioning approach for extracting soft roles in networks. RIDεRs discovers structural roles based on the global graph characteristics of a network. Because evaluating the quality of discovered roles is nontrivial due to the lack of ground-truth role datasets, we present a novel framework for evaluating and comparing various role discovery approaches. We also demonstrate the effectiveness of RIDεRs on diverse graph mining tasks: role identification/discovery and finding the top-k nodes most similar to a given node. Further, an empirical scalability analysis of our proposed algorithm on random power-law graphs shows that our approach is highly scalable.
- Published
- 2017
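The ε-equitable refinements behind RIDεRs relax classic equitable-partition (color) refinement, where node classes are split until every node in a class sees the same multiset of neighbor classes. A minimal sketch of plain color refinement, shown as an illustration of the underlying idea rather than the paper's ε-relaxed variant:

```python
def color_refinement(adj):
    """Refine node classes until the partition is equitable: every node in
    a class sees the same multiset of neighbor classes. `adj` maps each
    node to its list of neighbors."""
    color = {v: 0 for v in adj}
    while True:
        # signature = own class plus the sorted multiset of neighbor classes
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
               for v in adj}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: ids[sig[v]] for v in adj}
        if new == color:       # stable partition reached
            return color
        color = new
```

On a star graph this separates the hub from the leaves, i.e., it recovers the two obvious structural roles (star center and spoke).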
8. Connecting Opinions to Opinion-Leaders: A Case Study on Brazilian Political Protests
- Author
- Bortik Bandyopadhyay, Dárlinton Barbosa Feres Carvalho, Srinivasan Parthasarathy, Renato Ferreira, Ramon Vieira, Fernando Mourão, Alan Neves, and Leonardo Rocha
- Subjects
- Computer science, Process (engineering), Sentiment analysis, Opinion leadership, Context (language use), Sample (statistics), Public relations, Social media, Data mining
- Abstract
Social media applications have assumed an important role in the decision-making process of users, affecting their choices about products and services. In this context, understanding and modeling opinions, as well as opinion-leaders, has implications for several tasks, such as recommendation, advertising, and brand evaluation. Despite the intrinsic relation between opinions and opinion-leaders, most recent works focus exclusively on either understanding opinions, through Sentiment Analysis (SA) proposals, or identifying opinion-leaders, through Influential Users Detection (IUD). This paper presents a preliminary evaluation of a combined analysis of SA and IUD. To this end, we propose a methodology to quantify factors in real domains that may affect such an analysis, as well as the potential benefits of combining SA methods with IUD ones. Empirical assessments on a sample of tweets about the Brazilian president reveal that the collective opinion and the set of top opinion-leaders over time are inter-related. Further, we were able to identify distinct characteristics of opinion propagation, and found that the collective opinion can be accurately estimated using only a few top-k opinion-leaders. These results point to the combined analysis of SA and IUD as a promising research direction to be further explored.
- Published
- 2016
9. Green- and heterogeneity-aware partitioning for data analytics
- Author
- Christopher Stewart, Aniket Chakrabarti, and Srinivasan Parthasarathy
- Subjects
- Computer science, Skew, Webgraph, Partition (database), Data modeling, Analytics, Distributed algorithm, Data analysis, Data mining, Energy source
- Abstract
Distributed algorithms for analytics partition their input data across many machines for parallel execution. At scale, it is likely that some machines will perform worse than others because they are slower, power constrained or dependent on undesirable, dirty energy sources. It is even more challenging to balance analytics workloads as the algorithms are sensitive to statistical skew across data partitions and not just partition size. Here, we propose a lightweight framework that controls the statistical distribution of each partition and sizes partitions according to the heterogeneity of the environment. We model heterogeneity as a multi-objective optimization, with the objectives being functions for execution time and dirty energy consumption. We use stratification to control data skew and model heterogeneity-aware partitioning. We then discover Pareto-optimal partitioning strategies. We built our partitioning framework atop Redis and measured its performance on data mining workloads with realistic data sets. Our framework simultaneously achieved 34% reduction in time and 21% reduction in dirty energy usage for a popular webgraph compression algorithm using 8 partitions.
- Published
- 2016
10. Towards a parameter-free and parallel itemset mining algorithm in linearithmic time
- Author
- Gregory Buehrer, David Fuhry, Srinivasan Parthasarathy, and Roberto L. de Oliveira
- Subjects
- Data set, Theoretical computer science, Complete lattice, Computer science, Scalability, Proxy (statistics), Time complexity, Data mining algorithm
- Abstract
Extracting interesting patterns from large data stores efficiently is a challenging problem in many domains. In the data mining literature, pattern frequency has often been touted as a proxy for interestingness and has been leveraged as a pruning criterion to realize scalable solutions. However, while there exist many frequent pattern algorithms in the literature, all scale exponentially in the worst case, restricting their utility on very large data sets. Furthermore, as we theoretically argue in this article, the problem is hard to approximate within a reasonable factor by any polynomial-time algorithm. As a counterpoint to this theoretical result, we present a practical algorithm called Localized Approximate Miner (LAM) that scales linearithmically with the input data. Instead of fully exploring the top of the search lattice to a user-defined point, as traditional mining algorithms do, we efficiently explore different parts of the complete lattice. The key to this efficient exploration is the reliance on min-wise independent permutations to collect the data into highly similar subsets of a partition. The algorithm is straightforward to implement and scales to very large data sets. We illustrate its utility on a range of data sets, and demonstrate that it finds more patterns of higher utility in much less time than several state-of-the-art algorithms. Moreover, we realize a natural multi-level parallelization of LAM that further reduces runtimes by up to 193-fold when leveraging 256 CMP cores spanning 32 machines.
- Published
- 2015
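The key step named in the abstract above, using min-wise independent permutations to gather highly similar transactions into the same group, can be sketched roughly as follows. The hash construction is an assumption for illustration (XOR masks over Python's built-in `hash`); LAM itself then mines each resulting group locally:

```python
import random

def minhash_signature(itemset, hash_fns):
    # one signature coordinate per hash function: the minimum hash over items
    return tuple(min(h(x) for x in itemset) for h in hash_fns)

def group_by_minhash(transactions, num_hashes=4, seed=1):
    """Bucket transactions by their min-hash signatures so that highly
    similar transactions tend to collide into the same group."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(32) for _ in range(num_hashes)]
    hash_fns = [lambda x, m=m: hash(x) ^ m for m in masks]
    groups = {}
    for t in transactions:
        groups.setdefault(minhash_signature(t, hash_fns), []).append(t)
    return groups
```

Identical transactions always share a signature; near-identical ones collide with probability roughly equal to their Jaccard similarity per hash.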
11. A fast implementation of MLR-MCL algorithm on multi-core processors
- Author
- P. Sadayappan, Qingpeng Niu, S. M. Faisal, Srinivasan Parthasarathy, and Pai-Wei Lai
- Subjects
- Speedup, Computer science, Correlation clustering, Markov process, Parallel computing, Graph, Scalability, Canopy clustering algorithm, Algorithm design, Cluster analysis, Algorithm, Clustering coefficient, Sparse matrix
- Abstract
Widespread use of stochastic flow-based graph clustering algorithms, e.g. Markov Clustering (MCL), has been hampered by their lack of scalability and fragmentation of output. Multi-Level Regularized Markov Clustering (MLR-MCL) is an improvement over MCL, providing faster performance and better-quality clusters for large graphs. However, a closer look at MLR-MCL's performance reveals potential for further improvement. In this paper we present a fast parallel implementation of the MLR-MCL algorithm via static work partitioning based on analysis of memory footprints. By parallelizing the most time-consuming region of the sequential MLR-MCL algorithm, we report up to 10.43x (5.22x on average) speedup on CPU, using 8 datasets from SNAP and 3 PPI datasets. In addition, our algorithm can be adapted to perform general sparse matrix-matrix multiplication (SpGEMM); our experimental evaluation shows up to 3.50x (1.92x on average) speedup on CPU, and up to 5.12x (2.20x on average) speedup on MIC, compared to the SpGEMM kernel provided by the Intel Math Kernel Library (MKL).
- Published
- 2014
12. Hash in a flash: Hash tables for flash devices
- Author
- Srinivasan Parthasarathy, S. M. Faisal, Charu C. Aggarwal, Shirish Tatikonda, and Tyler Clemons
- Subjects
- Hash tree, Theoretical computer science, Computer engineering, Hash list, HAT-trie, Computer science, Dynamic perfect hashing, Hash function, Data structure, Hash table, Randomness
- Abstract
Conservative estimates place the amount of data expected to be created by mankind this year at several thousand exabytes. Given this enormous data deluge, and in spite of recent advances in main memory capacities, there is a clear and present need to move beyond algorithms that assume in-core (main-memory) computation. One fundamental task in information retrieval and text analytics is the maintenance of local and global term frequencies from within large enterprise document corpora. This can be done with a counting hash table, which associates keys with frequencies. In this paper, we study the design landscape for an out-of-core counting hash table targeted at flash storage devices. Flash devices have clear benefits over traditional hard drives in terms of access latency and energy efficiency. However, due to intricacies in their design, random writes can be relatively expensive and can degrade the life of the flash device. Counting hash tables are a challenging case for flash drives because the data structure is inherently dependent on the randomness of the hash function: frequency updates land at random locations and may incur expensive random writes. We demonstrate how to overcome this challenge by designing a hash table with two related hash functions, one of which exhibits a data placement property with respect to the other. Specifically, we focus on three designs and evaluate the trade-offs among them along the axes of query performance, insert and update times, and I/O time, using real-world data and an implementation of TF-IDF.
- Published
- 2013
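The write-pattern problem described above can be illustrated with a toy buffered counter: updates accumulate in memory and are applied to the device-resident table in sorted batches, trading a little RAM for far fewer random writes. This is a hypothetical sketch of the general flash-friendly idea, not one of the paper's three two-hash designs:

```python
from collections import Counter

class FlashFriendlyCounter:
    """Buffer frequency updates in memory; apply them to the (simulated)
    flash-resident table in sorted batches, so many random single-key
    writes become a few sequential batch writes."""

    def __init__(self, flush_threshold=4):
        self.buffer = Counter()        # in-RAM staging area
        self.table = Counter()         # stands in for the flash-resident table
        self.flushes = 0
        self.flush_threshold = flush_threshold

    def add(self, key, n=1):
        self.buffer[key] += n
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # a real device would write the buffered deltas sequentially here;
        # iterating in sorted key order models the sequential access pattern
        for k in sorted(self.buffer):
            self.table[k] += self.buffer[k]
        self.buffer.clear()
        self.flushes += 1

    def count(self, key):
        return self.table[key] + self.buffer[key]
```

Queries merge the staged and on-device counts, so results stay exact even between flushes.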
13. Stratification driven placement of complex data: A framework for distributed data analytics
- Author
- Ye Wang, P. Sadayappan, and Srinivasan Parthasarathy
- Subjects
- Complex data type, Service layer, Database, Computer science, Distributed computing, Data analysis, The Internet, Programmer, XML, Agile software development
- Abstract
With the increasing popularity of XML data stores, social networks, and Web 2.0 and 3.0 applications, complex data formats, such as trees and graphs, are becoming ubiquitous. Managing and processing such large and complex data stores on modern computational ecosystems, to realize actionable information efficiently, is an important challenge. A critical element at the heart of this challenge relates to the placement, storage, and access of such tera- and peta-scale data. In this work we develop a novel distributed framework to ease the burden on the programmer and propose an agile and intelligent placement service layer as a flexible yet unified means to address this challenge. Central to our framework is the notion of stratification, which seeks to initially group structurally (or semantically) similar entities into strata. Subsequently, strata are partitioned within this ecosystem according to the needs of the application to maximize locality, balance load, or minimize data skew. Results on several real-world applications validate the efficacy and efficiency of our approach.
- Published
- 2013
14. GADBMS: A Framework for Scalable Array Analytics
- Author
- Tyler Clemons, Srinivasan Parthasarathy, and P. Sadayappan
- Subjects
- Database, Computer science, Analytics, Scalability, Global Arrays, Petabyte, Database design, Array DBMS, Data administration
- Abstract
With the help of advancing technology, the scientific and data mining communities are producing an increasing amount of complex data. This data can be stored in multidimensional arrays and is known to scale into the petabyte range. An obvious solution is to distribute the data across many nodes and work in parallel. However, optimizing storage for space limitations and access, as well as optimizing in-memory execution, is not intuitive. Array Database Management Systems (ADBMS) can be used to store these large datasets. This position paper presents an ADBMS supported by the Global Arrays framework that allows users in both the scientific and data mining communities to efficiently store, access, and operate over large datasets in an easy-to-use framework we call GADBMS (Global-arrays Array Database Management System).
- Published
- 2012
15. Extracting Analyzing and Visualizing Triangle K-Core Motifs within Networks
- Author
- Srinivasan Parthasarathy and Yang Zhang
- Subjects
- Theoretical computer science, Computational complexity theory, Computer science, Graph drawing, Graph theory, Upper and lower bounds, Graph, Visualization
- Abstract
Cliques are topological structures that usually provide important information for understanding the structure of a graph or network. However, detecting and extracting cliques efficiently is known to be very hard. In this paper, we define and introduce the notion of a Triangle K-Core, a simpler topological structure and one that is more tractable and can moreover be used as a proxy for extracting clique-like structure from large graphs. Based on this definition we first develop a localized algorithm for extracting Triangle K-Cores from large graphs. Subsequently we extend the simple algorithm to accommodate dynamic graphs (where edges can be dynamically added and deleted). Finally, we extend the basic definition to support various template pattern cliques with applications to network visualization and event detection on graphs and networks. Our empirical results reveal the efficiency and efficacy of the proposed methods on many real world datasets.
- Published
- 2012
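The Triangle K-Core notion above lends itself to a simple peeling procedure: repeatedly delete edges supported by fewer than k triangles until no such edge remains. The sketch below is a naive reading of that definition, not the paper's localized or dynamic algorithm:

```python
def triangle_kcore(edges, k):
    """Keep deleting edges that close fewer than k triangles; the surviving
    edges form the triangle k-core."""
    E = {frozenset(e) for e in edges}
    changed = True
    while changed:
        # rebuild adjacency over the surviving edges
        adj = {}
        for e in E:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        changed = False
        for e in list(E):
            u, v = tuple(e)
            # common neighbors of u and v = triangles supported by edge (u, v)
            if len(adj[u] & adj[v]) < k:
                E.discard(e)
                changed = True
    return E
```

With k = 2 this keeps the clique K4 intact (every K4 edge lies in two triangles) while peeling off a pendant edge, illustrating how the structure acts as a tractable proxy for cliques.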
16. A Spectral Framework for Detecting Inconsistency across Multi-source Object Relationships
- Author
- Jiawei Han, Wei Fan, Jing Gao, Deepak S. Turaga, and Srinivasan Parthasarathy
- Subjects
- Pattern recognition, Object (computer science), Object detection, Spectral clustering, Set (abstract data type), Graph (abstract data type), Anomaly detection, Artificial intelligence, Data mining, Cluster analysis, Multi-source, Mathematics
- Abstract
In this paper, we propose to conduct anomaly detection across multiple sources to identify objects that have inconsistent behavior across these sources. We assume that a set of objects can be described from various perspectives (multiple information sources). The underlying clustering structure of normal objects is usually shared by multiple sources. However, anomalous objects belong to different clusters when considering different aspects. For example, there exist movies that are expected to be liked by kids by genre, but are liked by grown-ups based on user viewing history. To identify such objects, we propose to compute the distance between the eigen decomposition results of the same object with respect to different sources as its anomaly score. We also give interpretations from the perspectives of constrained spectral clustering and random walks over graphs. Experimental results on several UCI datasets, as well as DBLP and MovieLens, demonstrate the effectiveness of the proposed approach.
- Published
- 2011
17. Approximation algorithms for throughput maximization in wireless networks with delay constraints
- Author
- Srinivasan Parthasarathy, Anil Vullikanti, Guanhong Pei, and Aravind Srinivasan
- Subjects
- Mathematical optimization, Computer Networks and Communications, Computer science, Wireless network, End-to-end delay, Approximation algorithm, Throughput, Topology, Throughput maximization, Computer Science Applications, Flow (mathematics), Session (computer science), Electrical and Electronic Engineering, Algorithm, Software
- Abstract
We study the problem of throughput maximization in multihop wireless networks with end-to-end delay constraints for each session. This problem has received much attention starting with the work of Grossglauser and Tse (2002), and it has been shown that there is a significant tradeoff between the end-to-end delays and the total achievable rate. We develop algorithms to compute such tradeoffs with provable performance guarantees for arbitrary instances, with general interference models. Given a target delay bound Δ(c) for each session c, our algorithm gives a stable flow vector with a total throughput within a factor of O(log Δm / log log Δm) of the maximum, so that the per-session (end-to-end) delay is O(((log Δm / log log Δm) · Δ(c))²), where Δm = max_c {Δ(c)}. Note that these bounds depend only on the delays, and not on the network size; to our knowledge, this is the first such result.
- Published
- 2011
18. Locality Sensitive Outlier Detection: A ranking driven approach
- Author
- Ye Wang, Shirish Tatikonda, and Srinivasan Parthasarathy
- Subjects
- Artificial neural network, Computer science, Locality, Machine learning, Ranking (information retrieval), Locality-sensitive hashing, Ranking, Outlier, Anomaly detection, Artificial intelligence, Data mining, Cluster analysis
- Abstract
Outlier detection is fundamental to a variety of database and analytic tasks. Recently, distance-based outlier detection has emerged as a viable and scalable alternative to traditional statistical and geometric approaches. In this article we explore the role of ranking for the efficient discovery of distance-based outliers from large high dimensional data sets. Specifically, we develop a light-weight ranking scheme that is powered by locality sensitive hashing, which reorders the database points according to their likelihood of being an outlier. We provide theoretical arguments to justify the rationale for the approach and subsequently conduct an extensive empirical study highlighting the effectiveness of our approach over extant solutions. We show that our ranking scheme improves the efficiency of the distance-based outlier discovery process by up to 5-fold. Furthermore, we find that using our approach the top outliers can often be isolated very quickly, typically by scanning less than 3% of the data set.
- Published
- 2011
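The ranking idea above, reordering points by their likelihood of being an outlier using locality sensitive hashing, can be sketched with sign-hash (random hyperplane) LSH: a point that rarely shares a bucket with others is ranked as more outlier-like. All parameters here are assumptions for illustration; the paper's scheme is considerably more refined:

```python
import random

def lsh_outlier_ranking(points, planes=None, num_tables=8, seed=0):
    """Rank points by how rarely they share an LSH bucket with others:
    sparsely populated buckets suggest outliers. Each table uses one
    hyperplane (a 1-bit sign hash); pass `planes` for determinism."""
    rng = random.Random(seed)
    dim = len(points[0])
    if planes is None:
        planes = [[rng.gauss(0, 1) for _ in range(dim)]
                  for _ in range(num_tables)]
    counts = [0] * len(points)
    for w in planes:
        keys = [sum(wi * pi for wi, pi in zip(w, p)) > 0 for p in points]
        sizes = {}
        for key in keys:
            sizes[key] = sizes.get(key, 0) + 1
        for i, key in enumerate(keys):
            counts[i] += sizes[key] - 1   # co-bucketed neighbors this table
    # fewest collisions first: most outlier-like
    return sorted(range(len(points)), key=lambda i: counts[i])
```

A real scheme would concatenate several bits per table and then verify only the top-ranked candidates with exact distance computations, which is where the reported 5-fold speedup comes from.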
19. Hashing tree-structured data: Methods and applications
- Author
- Srinivasan Parthasarathy and Shirish Tatikonda
- Subjects
- Theoretical computer science, Universal hashing, Computer science, Dynamic perfect hashing, Hash function, 2-choice hashing, Hash table, K-independent hashing, Locality-sensitive hashing, Hopscotch hashing, Tree (data structure), Open addressing, Locality preserving hashing, Data mining, Feature hashing, Perfect hash function
- Abstract
In this article we propose a new hashing framework for tree-structured data. Our method maps an unordered tree into a multiset of simple wedge-shaped structures referred to as pivots. By coupling our pivot multisets with the idea of min-wise hashing, we realize a fixed-size signature-sketch of the tree-structured datum, yielding an effective mechanism for hashing such data. We discuss several potential pivot structures, study some of their theoretical properties, and discuss their implications for tree edit distance and for perfect hashing. We then empirically demonstrate the efficacy and efficiency of the overall approach on a range of real-world datasets and applications.
- Published
- 2010
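The pipeline above, pivot multiset plus min-wise hashing, can be sketched as follows. The abstract does not pin down the exact pivot shape, so this sketch assumes a pivot is a parent label together with an unordered pair of its child labels; the sketch construction mirrors standard min-hashing:

```python
import random
from itertools import combinations

def pivot_multiset(labels, children):
    """Enumerate wedge-shaped 'pivots' of an unordered labeled tree.
    Assumption for illustration: a pivot is (parent label, unordered pair
    of child labels)."""
    pivots = []
    for v, kids in children.items():
        for a, b in combinations(sorted(labels[k] for k in kids), 2):
            pivots.append((labels[v], a, b))
    return pivots

def minwise_sketch(pivots, num_hashes=4, seed=7):
    """Fixed-size signature: one min-hash per random mask over the multiset."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(32) for _ in range(num_hashes)]
    return tuple(min(hash(p) ^ m for p in pivots) for m in masks)
```

Because pivots ignore child order and node identity, two isomorphic unordered trees produce the same pivot multiset and hence the same sketch.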
20. Distributed Strategies for Channel Allocation and Scheduling in Software-Defined Radio Networks
- Author
- Bo Han, Aravind Srinivasan, Srinivasan Parthasarathy, Madhav V. Marathe, and V. S. A. Kumar
- Subjects
- Schedule, Channel allocation schemes, Wireless network, Computer science, Distributed computing, Hash function, Throughput, Software-defined radio, Scheduling (computing), Channel capacity, Distributed algorithm, Wireless, Radio resource management, Computer network, Communication channel
- Abstract
Equipping wireless nodes with multiple radios can significantly increase the capacity of wireless networks, by making these radios simultaneously transmit over multiple non-overlapping channels. However, due to the limited number of radios and available orthogonal channels, designing efficient channel assignment and scheduling algorithms in such networks is a major challenge. In this paper, we present provably-good distributed algorithms for simultaneous channel allocation of individual links and packet-scheduling, in software-defined radio (SDR) wireless networks. Our distributed algorithms are very simple to implement, and do not require any coordination even among neighboring nodes. A novel access hash function or random oracle methodology is one of the key drivers of our results. With this access hash function, each radio can know the transmitters' decisions for links in its interference set for each time slot without introducing any extra communication overhead between them. Further, by utilizing the inductive-scheduling technique, each radio can also backoff appropriately to avoid collisions. Extensive simulations demonstrate that our bounds are valid in practice.
- Published
- 2009
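The access hash function described above works because every radio evaluates the same deterministic hash, so all nodes agree on each link's decision for a slot without exchanging messages. A toy sketch of that coordination trick (the hash, probability, and channel-selection rule here are assumptions for illustration; the paper derives them from the interference structure):

```python
import hashlib

def access_decision(link_id, slot, num_channels, access_prob=0.5):
    """Shared 'access hash': any node can compute, from (link, slot) alone,
    whether a link attempts transmission this slot and on which channel,
    with zero coordination overhead."""
    digest = hashlib.sha256(f"{link_id}:{slot}".encode()).digest()
    r = int.from_bytes(digest[:8], "big")
    transmit = (r % 10_000) < access_prob * 10_000  # Bernoulli(access_prob)
    channel = (r >> 20) % num_channels              # pseudo-random channel
    return transmit, channel
```

A radio that knows the links in its interference set can therefore predict their slot decisions locally and back off accordingly, which is the inductive-scheduling step the abstract mentions.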
21. Improvement on coined solder surface on organic substrate for flip chip attach yield improvement
- Author
- W. S. Ooi, Azlina Nayan, D. H. Ding, Robert Newman, X. L. Zhao, and Srinivasan Parthasarathy
- Subjects
- Thermal copper pillar bump, Land grid array, Materials science, Soldering, Surface roughness, Nanotechnology, Substrate (electronics), Integrated circuit packaging, Composite material, Flip chip, Eutectic system
- Abstract
This paper describes improvements to the coined solder surface of organic substrates to reduce flip chip assembly defects, namely chip misalignment and contact non-wet. Roughening the eutectic solder surface of the substrate helped to reduce bump misalignment for all packages, especially for the tight-bump-pitch package. An additional pin reflow process for land grid array (LGA) substrates proved to eliminate the contact non-wet issue. The surface morphology of the eutectic Sn/Pb bumps in these evaluations is characterized by Scanning Electron Microscopy (SEM), Atomic Force Microscopy (AFM), and X-ray Photoelectron Spectroscopy (XPS). The condition of the solder joint is confirmed by chip pull test, X-ray, and electrical test using an open/short test program.
- Published
- 2008
22. Capacity of Asynchronous Random-Access Scheduling in Wireless Networks
- Author
- Aravind Srinivasan, Dave Levin, Vineet Kumar, Srinivasan Parthasarathy, Deepti Chafekar, and Madhav V. Marathe
- Subjects
- Rate-monotonic scheduling, Wireless network, Computer science, Distributed computing, Radio Link Protocol, Processor scheduling, Throughput, Dynamic priority scheduling, Round-robin scheduling, Fair-share scheduling, Scheduling (computing), Asynchronous communication, Two-level scheduling, Maximum throughput scheduling, Random access, Computer network
- Abstract
We study the throughput capacity of wireless networks which employ (asynchronous) random-access scheduling as opposed to deterministic scheduling. The central question we answer is: how should we set the channel-access probability for each link in the network so that the network operates close to its optimal throughput capacity? We design simple and distributed channel-access strategies for random-access networks which are provably competitive with respect to the optimal scheduling strategy, which is deterministic, centralized, and computationally infeasible. We show that the competitiveness of our strategies is nearly the best achievable via random-access scheduling, thus establishing fundamental limits on the performance of random access. A notable outcome of our work is that random access compares well with deterministic scheduling when link transmission durations differ by small factors, and much worse otherwise. The distinguishing aspects of our work include modeling and rigorous analysis of asynchronous communication, asymmetry in link transmission durations, and hidden terminals under arbitrary link-conflict-based wireless interference models.
- Published
- 2008
23. Association Control in Mobile Wireless Networks
- Author
- Srinivasan Parthasarathy, Hao Yang, Zhen Liu, Minkyong Kim, and Dimitrios Pendarakis
- Subjects
- Mobile radio, Competitive analysis, Wireless network, Computer science, Quality of service, Distributed computing, Mobile computing, Handover, Wireless, Algorithm design, Mobile telephony, Online algorithm, Greedy algorithm, Mobile device, Computer network
- Abstract
As mobile nodes roam in a wireless network, they continuously associate with different access points and perform handoff operations. However, frequent handoffs can incur unacceptable delays and even interruptions for interactive applications. To alleviate these negative impacts, we present novel association control algorithms that minimize the frequency of handoffs experienced by mobile devices. Specifically, we show that a greedy LookAhead algorithm is optimal in the offline setting, where the user's future mobility is known. Inspired by this optimality, we further propose two online algorithms, LookBack and Track, that operate without any future mobility information. Instead, they seek to predict the lifetime of an association using randomization and statistical approaches, respectively. We evaluate the performance of these algorithms using both analysis and trace-driven simulations. The results show that the simple LookBack algorithm surprisingly has a competitive ratio of (log k + 2), where k is the maximum number of APs that a user can hear at any time, and that the Track algorithm can achieve near-optimal performance in practical scenarios.
- Published
- 2008
24. Approximation Algorithms for Computing Capacity of Wireless Networks with SINR Constraints
- Author
-
V.S.A. Kumart, Deepti Chafekar, Aravind Srinivasan, Madhav V. Marathe, and Srinivasan Parthasarathy
- Subjects
Mathematical optimization ,Wireless network ,Computer science ,business.industry ,Computer Science::Networking and Internet Architecture ,Signal-to-interference-plus-noise ratio ,Approximation algorithm ,Wireless ,Throughput ,business ,Throughput (business) ,Time complexity ,Computer Science::Information Theory - Abstract
A fundamental problem in wireless networks is to estimate their throughput capacity: given a set of wireless nodes and a set of connections, what is the maximum rate at which data can be sent on these connections? Most of the research in this direction has focused on either random distributions of points, or has assumed simple graph-based models for wireless interference. In this paper, we study the capacity estimation problem using the more general Signal to Interference Plus Noise Ratio (SINR) model for interference, on arbitrary wireless networks. The problem becomes much harder in this setting because of the non-locality of the SINR model. Recent work by Moscibroda et al. (2006) has shown that the throughput in this model can differ significantly from that of graph-based models. We develop polynomial time algorithms to provably approximate the total throughput in this setting.
- Published
- 2008
25. Power Efficient Throughput Maximization in Multi-Hop Wireless Networks
- Author
-
Deepti Chafekar, V.S.A. Kumar, Madhav V. Marathe, and Srinivasan Parthasarathy
- Subjects
Schedule ,Optimization problem ,Linear programming ,Computer science ,Wireless network ,Distributed computing ,Throughput ,Software-defined radio ,Throughput maximization ,Throughput (business) ,Wireless sensor network ,Hop (networking) ,Power control - Abstract
We study the problem of total throughput maximization in arbitrary multi-hop wireless networks, with constraints on the total power usage (denoted PETM), when nodes have the capability to adaptively choose their power levels, as is the case with software-defined radio devices. The underlying interference graph changes when power levels change, making PETM a complex cross-layer optimization problem. We develop a linear programming formulation for this problem that leads to a constant-factor approximation to the total throughput rate, for any given bound on the total power usage. Our result is a rigorously provable worst-case approximation guarantee, which holds for any instance. Our formulation is generic and can accommodate different interference models and objective functions. We complement our theoretical analysis with simulations and compute the explicit tradeoffs between fairness, total throughput, and power usage.
- Published
- 2008
26. Local Probabilistic Models for Link Prediction
- Author
-
Srinivasan Parthasarathy, Chao Wang, and Venu Satuluri
- Subjects
Social network ,Semantic similarity ,business.industry ,Computer science ,Probabilistic logic ,Data mining ,Graphical model ,computer.software_genre ,business ,Social network analysis ,computer ,Graph ,Probability measure - Abstract
One of the core tasks in social network analysis is to predict the formation of links (i.e. various types of relationships) over time. Previous research has generally represented the social network in the form of a graph and has leveraged topological and semantic measures of similarity between two nodes to evaluate the probability of link formation. Here we introduce a novel local probabilistic graphical model method that can scale to large graphs to estimate the joint co-occurrence probability of two nodes. Such a probability measure captures information that is not captured by either topological measures or measures of semantic similarity, which are the dominant measures used for link prediction. We demonstrate the effectiveness of the co-occurrence probability feature by using it both in isolation and in combination with other topological and semantic features for predicting co-authorship collaborations on real datasets.
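The local probabilistic model itself is too involved for a short sketch, but the topological similarity measures it is combined with are standard; the following illustrates three of them on an adjacency-set graph representation (the function name and feature set are assumptions for illustration, not the paper's exact feature vector):

```python
import math

def topological_features(adj, u, v):
    """Standard topological link-prediction scores between nodes u and v.
    adj maps each node to the set of its neighbors. A learned co-occurrence
    probability feature would be computed separately and concatenated
    with these."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: shared neighbors that are themselves rare count more
        "adamic_adar": sum(1.0 / math.log(len(adj[z]))
                           for z in common if len(adj[z]) > 1),
    }
```

In a link-prediction pipeline these scores become columns of a feature matrix over candidate node pairs, fed to any standard classifier.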
- Published
- 2007
27. Efficient Design of End-to-End Probes for Source-Routed Networks
- Author
-
Rajeev Rastogi, Srinivasan Parthasarathy, and Marina Thottan
- Subjects
Voice over IP ,End-to-end principle ,Computer science ,business.industry ,Network packet ,Heuristic (computer science) ,Network delay ,Topology (electrical circuits) ,Service provider ,business ,Network topology ,Computer network - Abstract
Migration to a converged network has caused service providers to deploy real-time applications such as voice over IP (VoIP). From the provider's perspective, the success of such emerging multimedia services over IP networks depends on mechanisms which help understand the network-wide end-to-end performance dynamics. In this work, we present a mechanism to design efficient probes for measuring end-to-end performance impairments such as network delay and loss for a specific service in the provider network. We address two main issues related to deploying network probes: (1) the need for correlating the topology data with the measured values and (2) reducing the amount of probe traffic. We use explicitly routed probe packets to alleviate the need for correlation with topology measurements. We also present a 3.5-approximation algorithm for designing probe-sets which cover all the edges in the network. Further, we explore techniques for using observed performance degradations in a given set of probes to isolate the miscreant edge which caused the degradations. We state a precise characterization of probe-sets which isolate miscreant edges in the network; this also suggests a natural heuristic for miscreant-edge detection. Simulations on ISP topologies obtained from the RocketFuel project show that our algorithms perform much better than the analytically guaranteed bounds and are near-optimal in practice with respect to probe costs.
- Published
- 2007
28. Knowledge and Cache Conscious Algorithm Design and Systems Support for Data Mining Algorithms
- Author
-
Shirish Tatikonda, Tahsin Kurc, Matthew Goyder, Gregory Buehrer, Joel H. Saltz, Srinivasan Parthasarathy, Xiaodong Zhang, and Amol Ghoting
- Subjects
Knowledge extraction ,Data stream mining ,Computer science ,Middleware ,Locality ,Leverage (statistics) ,Probabilistic analysis of algorithms ,Algorithm design ,Cache ,Data mining ,Cluster analysis ,computer.software_genre ,computer - Abstract
The knowledge discovery process is interactive in nature, and therefore minimizing query response time is imperative. The compute- and memory-intensive nature of data mining algorithms makes this task challenging. We propose to improve the performance of data mining algorithms by re-architecting algorithms and designing effective systems support. From the viewpoint of re-architecting algorithms, knowledge-conscious and cache-conscious design strategies are presented. Knowledge-conscious algorithm designs try to reuse computation repeated between iterations and across executions of a data mining algorithm. Cache-conscious algorithm designs, on the other hand, reduce execution time by maximizing data locality and reuse. The design of systems support that allows a variety of data mining algorithms to leverage knowledge-caching and cache-conscious placement with minimal implementation effort is also presented.
- Published
- 2007
29. Adaptive Parallel Graph Mining for CMP Architectures
- Author
-
Srinivasan Parthasarathy, Yen-Kuang Chen, and Gregory Buehrer
- Subjects
Speedup ,Shared memory ,Computer science ,Distributed computing ,Scalability ,Problem statement ,Commodity computing ,Parallel computing ,Graph - Abstract
Mining graph data is an increasingly popular challenge, with practical applications in many areas, including molecular substructure discovery, web link analysis, fraud detection, and social network analysis. The problem statement is to enumerate all subgraphs occurring in at least σ graphs of a database, where σ is a user-specified parameter. Chip Multiprocessors (CMPs) provide true parallel processing, and are expected to become the de facto standard for commodity computing. In this work, building on the state of the art, we propose an efficient approach to parallelizing such algorithms for CMPs. We show that an algorithm which adapts its behavior based on the runtime state of the system can improve system utilization and lower execution times. Most notably, we incorporate dynamic state management to allow memory consumption to vary based on availability. We evaluate our techniques on current-day shared memory systems (SMPs) and expect similar performance for CMPs. We demonstrate excellent speedup, 27-fold on 32 processors, for several real-world datasets. Additionally, we show our dynamic techniques afford this scalability while consuming up to 35% less memory than static techniques.
- Published
- 2006
30. Effective Pre-Processing Strategies for Functional Clustering of a Protein-Protein Interactions Network
- Author
-
Srinivasan Parthasarathy, Chao Wang, Duygu Ucar, and Sitaram Asur
- Subjects
Theoretical computer science ,Computer science ,Graph partition ,computer.software_genre ,Weighting ,ComputingMethodologies_PATTERNRECOGNITION ,Transformation (function) ,Key (cryptography) ,False positive paradox ,Cluster (physics) ,Preprocessor ,Data mining ,Cluster analysis ,computer - Abstract
In this article we present novel preprocessing techniques, based on topological measures of the network, to identify clusters of proteins from protein-protein interaction (PPI) networks, wherein each cluster corresponds to a group of functionally similar proteins. The two main problems with analyzing protein-protein interaction networks are their scale-free property and the large number of false positive interactions that they contain. Our preprocessing techniques use a key transformation and separate weighting functions to effectively eliminate suspect edges, potential false positives, from the graph. A useful side effect of this transformation is that the resulting graph is no longer scale-free. We then examine the application of two well-known clustering techniques, namely hierarchical and multilevel graph partitioning, on the reduced network. We define suitable statistical metrics to evaluate our clusters meaningfully. From our study, we discover that applying clustering to the pre-processed network results in significantly improved, biologically relevant, and balanced clusters when compared with clusters derived from the original network. We strongly believe that our strategies will prove invaluable to future studies on the prediction of protein functionality from PPI networks.
- Published
- 2006
31. A Multi-Level Approach to SCOP Fold Recognition
- Author
-
Keith Marsolo, Chris Ding, and Srinivasan Parthasarathy
- Subjects
Alternative methods ,Protein function ,business.industry ,Pattern recognition ,Structural Classification of Proteins database ,Biology ,Machine learning ,computer.software_genre ,Naive Bayes classifier ,Local optimum ,Protein structure ,Classification methods ,Artificial intelligence ,business ,computer ,Classifier (UML) - Abstract
The classification of proteins based on their structure can play an important role in the deduction or discovery of protein function. However, the relatively low number of solved protein structures and the unknown relationship between structure and sequence requires an alternative method of representation for classification to be effective. Furthermore, the large number of potential folds causes problems for many classification strategies, increasing the likelihood that the classifier will reach a local optimum while trying to distinguish between all of the possible structural categories. Here we present a hierarchical strategy for structural classification that first partitions proteins based on their SCOP class before attempting to assign a protein fold. Using a well-known dataset derived from the 27 most-populated SCOP folds and several sequence-based descriptor properties as input features, we test a number of classification methods, including Naive Bayes and Boosted C4.5. Our strategy achieves an average fold recognition of 74%, which is significantly higher than the 56-60% previously reported in the literature, indicating the effectiveness of a multi-level approach.
- Published
- 2006
32. Alternate Representation of Distance Matrices for Characterization of Protein Structure
- Author
-
Srinivasan Parthasarathy and Keith Marsolo
- Subjects
Discrete wavelet transform ,business.industry ,Stationary wavelet transform ,Second-generation wavelet transform ,Wavelet transform ,Pattern recognition ,Cascade algorithm ,Wavelet packet decomposition ,Combinatorics ,Wavelet ,Artificial intelligence ,business ,Harmonic wavelet transform ,Mathematics - Abstract
The most suitable method for the automated classification of protein structures remains an open problem in computational biology. In order to classify a protein structure with any accuracy, an effective representation must be chosen. Here we present two methods of representing protein structure. One involves representing the distances between the Cα atoms of a protein as a two-dimensional matrix and creating a model of the resulting surface with Zernike polynomials. The second uses a wavelet-based approach. We convert the distances between a protein's Cα atoms into a one-dimensional signal, which is then decomposed using a discrete wavelet transformation. Using the Zernike coefficients and the approximation coefficients of the wavelet decomposition as feature vectors, we test the effectiveness of our representation with two different classifiers on a dataset of more than 600 proteins taken from the 27 most-populated SCOP folds. We find that the wavelet decomposition greatly outperforms the Zernike model. With the wavelet representation, we achieve an accuracy of approximately 56%, roughly 12% higher than results reported on a similar, but less-challenging dataset. In addition, we can couple our structure-based feature vectors with several sequence-based properties to increase accuracy another 5-7%. Finally, we use a multi-stage classification strategy on the combined features to increase performance to 78%, an improvement in accuracy of more than 15-20% and 34% over the highest reported sequence-based and structure-based classification results, respectively.
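A minimal sketch of the wavelet side of this representation, assuming the Haar wavelet and keeping only the low-pass (approximation) branch; the actual work presumably used a standard wavelet library and possibly a different wavelet family:

```python
import math

def haar_approx(signal, levels=1):
    """Approximation coefficients of a Haar discrete wavelet transform:
    each level replaces adjacent pairs by their scaled average, yielding
    a coarse, fixed-length summary of the input signal (here, the
    one-dimensional C-alpha distance signal)."""
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) % 2:              # pad odd-length signals by repetition
            coeffs.append(coeffs[-1])
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs
```

Each extra level halves the feature-vector length, which is how proteins of different sizes can be mapped to comparable fixed-length representations before classification.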
- Published
- 2006
33. Mining Spatial and Spatio-Temporal Patterns in Scientific Data
- Author
-
Srinivasan Parthasarathy and Hui Yang
- Subjects
Data set ,Process (engineering) ,Computer science ,Intrusion detection system ,Data mining ,Focus (optics) ,computer.software_genre ,Data science ,computer ,Variety (cybernetics) ,Personalization ,Domain (software engineering) - Abstract
Data mining is the process of discovering hidden and meaningful knowledge in a data set. It has been successfully applied to many real-life problems, for instance, web personalization, network intrusion detection, and customized marketing. Recent advances in computational sciences have led to the application of data mining to various scientific domains, such as astronomy and bioinformatics, to facilitate the understanding of different scientific processes in the underlying domain. In this thesis work, we focus on designing and applying data mining techniques to analyze spatial and spatio-temporal data originating in scientific domains. Examples of spatial and spatio-temporal data in scientific domains include data describing protein structures and data produced by protein folding simulations, respectively. Specifically, we have proposed a generalized framework to effectively discover different types of spatial and spatio-temporal patterns in scientific data sets. Such patterns can be used to capture a variety of interactions among objects of interest and the evolutionary behavior of such interactions. We have applied the framework to analyze data originating in the following three application domains: bioinformatics, computational molecular dynamics, and computational fluid dynamics. Empirical results demonstrate that the discovered patterns are meaningful in the underlying domain and can provide important insights into various scientific phenomena.
- Published
- 2006
34. Similarity Searching in Peer-to-Peer Databases
- Author
-
Srinivas Kashyap, Indrajit Bhattacharya, and Srinivasan Parthasarathy
- Subjects
Pastry ,Information retrieval ,Database ,Computer science ,Nearest neighbor search ,Search engine indexing ,Peer-to-peer ,Load balancing (computing) ,computer.software_genre ,Cluster analysis ,Chord (peer-to-peer) ,computer ,Database index - Abstract
We consider the problem of handling similarity queries in peer-to-peer databases. We propose an indexing and searching mechanism which, given a query object, returns the set of objects in the database that are semantically related to the query. We propose an indexing scheme which clusters data such that semantically related objects are partitioned into a small set of clusters, allowing for a simple and efficient similarity search strategy. Our indexing scheme also decouples object and node locations. Our adaptive replication and randomized lookup schemes exploit this feature and ensure that the number of copies of an object is proportional to its popularity and all replicas are equally likely to serve a given query, thus achieving perfect load balancing. The techniques developed in this work are oblivious to the underlying DHT topology and can be implemented on a variety of structured overlays such as CAN, CHORD, Pastry, and Tapestry. We also present DHT-independent analytical guarantees for the performance of our algorithms in terms of search accuracy, cost, and load-balance; the experimental results from our simulations confirm the insights derived from these analytical models.
- Published
- 2005
35. Provable Algorithms for Parallel Sweep Scheduling on Unstructured Meshes
- Author
-
V. S. Anil Kumar, Aravind Srinivasan, Madhav V. Marathe, Sibylle Zust, and Srinivasan Parthasarathy
- Subjects
Computer science ,Computation ,Parallel algorithm ,Processor scheduling ,Algorithm design ,Polygon mesh ,Heuristics ,Algorithm ,Time complexity ,Randomized algorithm ,Scheduling (computing) - Abstract
We present provably efficient parallel algorithms for sweep scheduling on unstructured meshes. Sweep scheduling is a commonly used technique in radiation transport problems, and involves inverting an operator by iteratively sweeping across a mesh. Each sweep involves solving the operator locally at each cell. However, each direction induces a partial order in which this computation can proceed. On a distributed computing system, the goal is to schedule the computation so that the length of the schedule is minimized. Several heuristics have been proposed for this problem, but none of them have worst-case performance guarantees. We present a simple, almost linear time randomized algorithm which (provably) gives a schedule of length at most O(log^2 n) times the optimal schedule for instances with n cells, when the communication cost is not considered, and a slight variant, which, coupled with a much more careful analysis, gives a schedule of (expected) length O(log m log log log m) times the optimal schedule for m processors. These are the first such provable guarantees for this problem. We also design a priority-based list schedule using these ideas, with the same theoretical guarantee but much better performance in practice. We complement our theoretical results with extensive empirical analysis. The results show that (i) our algorithm performs very well and has a significantly better performance guarantee in practice, and (ii) the algorithm compares favorably with other natural and efficient parallel algorithms proposed in the literature [S. Pautz (2002), S. Plimpton et al. (2001)].
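The random-priority list-scheduling idea can be sketched for a single sweep direction, i.e. one precedence DAG of unit-time cells; the data layout and function name are illustrative assumptions, and the paper's actual algorithm handles multiple directions and communication costs:

```python
import random

def random_priority_schedule(deps, m, seed=0):
    """List scheduling with random priorities: deps[c] is the set of cells
    that must complete before cell c (unit-time tasks, m processors).
    At each step, among the ready cells, the m cells with the lowest
    random keys (highest priority) are executed.
    Returns the schedule as a list of per-step batches."""
    rng = random.Random(seed)
    prio = {c: rng.random() for c in deps}   # random tie-breaking priorities
    done, schedule = set(), []
    while len(done) < len(deps):
        ready = [c for c in deps if c not in done and deps[c] <= done]
        batch = sorted(ready, key=lambda c: prio[c])[:m]
        schedule.append(batch)
        done.update(batch)
    return schedule
```

The schedule length is the number of batches; the randomized priorities are what the paper's analysis turns into an expected approximation guarantee.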
- Published
- 2005
36. A Services Oriented Framework for Next Generation Data Analysis Centers
- Author
-
Shirish Tatikonda, Srinivasan Parthasarathy, Amol Ghoting, Hao Wang, Gregory Buehrer, Joel H. Saltz, and Tahsin Kurc
- Subjects
Software framework ,Service (systems architecture) ,Iterative and incremental development ,Knowledge extraction ,Database ,Data stream mining ,Computer science ,Dynamic data ,Middleware ,Middleware (distributed applications) ,computer.software_genre ,Data science ,computer - Abstract
Over the past decade, advances in computational and sensor technology have enabled us to dynamically collect vast amounts of data from observations, health screening tests, simulations, and experiments at an ever-increasing pace. Knowledge discovery and data mining is an iterative process concerned with deriving interesting, non-obvious, and useful patterns and models from such large volumes of data. Although inexpensive storage is conducive to maintaining said data, accessing and managing it for knowledge discovery and data mining becomes a performance issue when datasets are large, dynamic, and distributed. In this work, we present our vision of a software framework consisting of middleware services to support interactive data mining over dynamic data at data analysis centers built on top of heterogeneous clusters. The design of a sampling service for dynamic data, together with initial performance results, are also presented.
- Published
- 2005
37. Correlation Preserving Discretization
- Author
-
Sameep Mehta, Srinivasan Parthasarathy, and Hui Yang
- Subjects
Structure (mathematical logic) ,Multivariate statistics ,Discretization ,business.industry ,Computer science ,Pattern recognition ,computer.software_genre ,Missing data ,Data warehouse ,Correlation ,ComputingMethodologies_PATTERNRECOGNITION ,Principal component analysis ,Artificial intelligence ,Data mining ,business ,computer ,Computer Science::Databases - Abstract
Discretization is a crucial preprocessing primitive for a variety of data warehousing and mining tasks. In this article we present a novel PCA-based unsupervised algorithm for the discretization of continuous attributes in multivariate datasets. The algorithm leverages the underlying correlation structure in the dataset to obtain the discrete intervals, and ensures that the inherent correlations are preserved. The approach also extends easily to datasets containing missing values. We demonstrate the efficacy of the approach on real datasets and as a preprocessing step for both classification and frequent item set mining tasks. We also show that the intervals are meaningful and can uncover hidden patterns in data.
- Published
- 2005
38. LOADED: Link-Based Outlier and Anomaly Detection in Evolving Data Sets
- Author
-
Matthew Eric Otey, Amol Ghoting, and Srinivasan Parthasarathy
- Subjects
Computer science ,business.industry ,Outlier ,Pattern recognition ,Anomaly detection ,Data mining ,Artificial intelligence ,computer.software_genre ,Link (knot theory) ,business ,computer ,Categorical variable - Abstract
In this paper, we present LOADED, an algorithm for outlier detection in evolving data sets containing both continuous and categorical attributes. LOADED is a tunable algorithm, wherein one can trade off computation for accuracy so that domain-specific response times are achieved. Experimental results show that LOADED provides very good detection and false positive rates, which are several times better than those of existing distance-based schemes.
- Published
- 2005
39. Facilitating interactive distributed data stream processing and mining
- Author
-
Amol Ghoting and Srinivasan Parthasarathy
- Subjects
Database ,Computer science ,Data stream mining ,Process (computing) ,Data mining ,computer.software_genre ,computer ,Data stream processing - Abstract
Summary form only given. The past few years have seen the emergence of application domains that need to process data elements arriving as a continuous stream. Recently, several architectures to process database queries over these data streams have been proposed in the literature. Although these architectures may be suitable for general-purpose query processing in a centralized setting, they have serious limitations when it comes to supporting data mining queries in a distributed setting. Data mining is an interactive process, and it is crucial that we provide the user with interactive response times. In addition, many data mining applications, such as network intrusion detection, need to process data streams arriving at distributed end-points. Centralized processing of data streams for network intrusion detection would be overwhelming. We address these fundamental issues for data mining over data streams. Our schemes give controlled interactive response times when processing data streams in a distributed setting.
- Published
- 2004
40. A slacker coherence protocol for pull-based monitoring of on-line data sources
- Author
-
Tahsin Kurc, Joel H. Saltz, Mario Lauria, Srinivasan Parthasarathy, and R. Sundaresan
- Subjects
Distributed database ,Computer science ,business.industry ,Distributed computing ,Context (language use) ,Coherence (statistics) ,computer.software_genre ,Data warehouse ,Grid computing ,The Internet ,Polling ,business ,computer ,Protocol (object-oriented programming) - Abstract
An increasing number of online applications operate on data from disparate, and often widespread, data sources. This paper studies the design of a system for the automated monitoring of on-line data sources. In this system a number of ad-hoc data warehouses, which maintain client-specified views, are interposed between clients and data sources. We present a model of coherence, referred to here as slacker coherence, to address the freshness problem in the context of pull-based protocols. We experimentally examine various techniques for estimating update rates and polling adaptively. We also look at the impact of the request scheduling algorithm at the source on the performance of the coherence model.
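The abstract does not specify the update-rate estimators; one simple assumption is an exponentially weighted moving average of observed inter-update gaps, with the poller scheduled at a fraction of the estimated gap to stay within the coherence slack (class and parameter names are illustrative):

```python
class AdaptivePoller:
    """Sketch of adaptive pull-based polling: estimate a source's
    inter-update gap with an exponentially weighted moving average,
    and poll a bit faster than the estimated gap."""

    def __init__(self, initial_gap, alpha=0.3, slack=0.5):
        self.est_gap = initial_gap   # estimated seconds between source updates
        self.alpha = alpha           # EWMA smoothing weight for new observations
        self.slack = slack           # poll at this fraction of the estimated gap

    def observe_gap(self, gap):
        """Fold a newly observed inter-update gap into the estimate."""
        self.est_gap = self.alpha * gap + (1 - self.alpha) * self.est_gap

    def next_poll_interval(self):
        return self.slack * self.est_gap
```

A smaller `slack` trades extra probe traffic for fresher views, which is exactly the tension the slacker coherence model is meant to manage.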
- Published
- 2003
41. Scalable trigger processing
- Author
-
L. Huang, J.B. Park, Srinivasan Parthasarathy, A. Vernon, Eric N. Hanson, C. Carnes, L. Noronha, and M. Konyala
- Subjects
Theoretical computer science ,Non-lock concurrency control ,Computer science ,Concurrency ,Distributed concurrency control ,Multiversion concurrency control ,Database trigger ,Active database ,Database index ,Timestamp-based concurrency control ,Concurrency control ,Scalability ,Isolation (database systems) ,Tuple ,Optimistic concurrency control - Abstract
Current database trigger systems have extremely limited scalability. This paper proposes a way to develop a truly scalable trigger system. Scalability to large numbers of triggers is achieved with a trigger cache to use the main memory effectively, and a memory-conserving selection predicate index based on the use of unique expression formats called expression signatures. A key observation is that if a very large number of triggers are created, many will have the same structure, except for the appearance of different constant values. When a trigger is created, tuples are added to special relations created for expression signatures to hold the trigger's constants. These tables can be augmented with a database index or main-memory index structure to serve as a predicate index. The design presented also uses a number of types of concurrency to achieve scalability, including token (tuple)-level, condition-level, rule action-level and data-level concurrency.
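A toy sketch of the expression-signature idea for equality predicates, assuming one trigger per (signature, constant) pair; a real predicate index supports many predicate shapes and duplicate constants:

```python
from collections import defaultdict

class TriggerIndex:
    """Group triggers by expression signature; each signature owns a
    constant table mapping a trigger's constant to its trigger id, so
    matching an update is a hash lookup rather than a scan over all
    triggers sharing that structure."""

    def __init__(self):
        # signature -> {constant: trigger_id}
        self.signatures = defaultdict(dict)

    def create_trigger(self, trigger_id, signature, constant):
        """Register a trigger whose condition has this structure
        (signature) instantiated with this constant."""
        self.signatures[signature][constant] = trigger_id

    def match(self, signature, value):
        """Return ids of triggers fired by an update with this value."""
        hit = self.signatures.get(signature, {}).get(value)
        return [hit] if hit is not None else []
```

With a million triggers of the form `price = <constant>`, only one signature entry exists and each update costs a single dictionary probe, which is the scalability argument the paper makes.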
- Published
- 1999
42. Reliability Study on Copper Pillar Bumping with Lead Free Solder
- Author
-
Jinhua Yu, Ashok Anand, YC Mui, Parthasarathy Srinivasan, and Raj Master
- Published
- 2007
- Full Text
- View/download PDF
43. New parallel algorithms for frequent itemset mining in very large databases.
- Author
-
A. Veloso, W. Meira Jr., and Srinivasan Parthasarathy
- Published
- 2003
- Full Text
- View/download PDF