Annotator Expertise and Information Quality in Annotation-based Retrieval
- Author
Wern Han Lim and Mark James Carman
- Subjects
Bookmarking, Computer science, Information quality, Popularity, World Wide Web, Annotation, Resource (project management), Wisdom of the crowd, Leverage (statistics), Quality (business)
- Abstract
This paper investigates the annotation-based retrieval (AR) of World Wide Web (WWW) resources that have been annotated by users on Collaborative Tagging (CT) platforms as a form of user-generated content (UGC). Previous approaches have simply weighted WWW resources according to their popularity, in order to leverage the inherent wisdom of the crowd (WotC). In this paper, we argue that popularity alone is not a sufficient indicator of quality since (1) some users are better annotators than others; (2) resource popularity can be easily inflated by malicious users; and (3) high-quality but highly specific resources may exhibit lower popularity than more general ones. Thus, we investigate indicators of information quality for WWW resources, particularly the user annotations that describe them. This research estimates the expertise of content annotators in order to infer the information quality of their contributions, by exploring the various signals available on social bookmarking platforms, such as the temporal information of annotations. The evaluation of retrieval performance on social bookmarking data shows significant improvements when using the estimated user expertise and inferred information quality.
- Published
2017