Author: "Gaul W" / Topic: business information systems - Searchworks@Jio Institute Digital Library Search Results

1. A Comparison of Data Mining Methods and Logistic Regression to Determine Factors Associated with Death Following Injury.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: A comparison of techniques for analysing trauma injury data collected over ten years at a hospital trauma unit in the U.K. is reported. The analysis includes a comparison of four data mining techniques to determine factors associated with death following injury. The techniques include a classification and regression tree algorithm, a classification algorithm, a neural network and logistic regression. As well as techniques within the data mining framework, conventional logistic regression modelling is also included for comparison. Results are compared in terms of sensitivity, specificity, positive predictive value and negative predictive value. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

2. A Non-Homogeneous Poisson Based Model for Daily Rainfall Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In this paper we report some results of the application of a new stochastic model to daily rainfall data. The Poisson models, characterized only by the expected rate of events (impulse occurrences, that is the mean number of impulses per unit time) and the assigned probability distribution of the phenomenon magnitude, do not take into consideration the duration of the occurrences, which is fundamental from a hydrological point of view. In order to describe the phenomenon in a way that adheres more closely to its physical nature, we propose a new model that is simple and manageable. This model takes into account another random variable, representing the duration of the rainfall due to the same occurrence. Estimated parameters of both models and related confidence regions are obtained. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

3. Simple Component Analysis Based on RV Coefficient.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Published: 2006
Full Text: View/download PDF

4. Variable Architecture Bayesian Neural Networks: Model Selection Based on EMC.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: This work addresses the problem of selecting appropriate architectures for Bayesian Neural Networks (BNN). Specifically, it proposes a variable architecture model where the number of hidden units is selected by using a variant of the real-coded Evolutionary Monte Carlo algorithm developed by Liang and Wong (2001) for inference and prediction in fixed architecture Bayesian Neural Networks. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

5. Approaches to Asymmetric Multidimensional Scaling with External Information.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In this paper some possible approaches to asymmetric multidimensional scaling with external information are presented to analyze asymmetric proximity matrices graphically. In particular, a proposal to incorporate external information in the biplot method is provided. The methods considered allow joint or separate analyses of symmetry and skew-symmetry. A final application to Morse code data is performed to emphasize advantages and shortcomings of the different methods proposed. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

6. Nonparametric Clustering of Seismic Events.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In this paper we propose a clustering technique, based on the maximization of the likelihood function defined from the generalization of a model for seismic activity (ETAS model, (Ogata (1988))), iteratively changing the partitioning of the events. In this context it is useful to apply models requiring the distinction between independent events (i.e. the background seismicity) and strongly correlated ones. This technique develops nonparametric estimation methods of the point process intensity function. To evaluate the goodness of fit of the model, from which the clustering method is implemented, residuals process analysis is used. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

7. Credit Risk Management Through Robust Generalized Linear Models.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In this work, a robust methodology is developed for the classification of a sample of small and medium firms on the basis of their default probability. The importance of this classification procedure is emphasized by the New Basel Capital Accord (Basel II) for the capital adequacy of internationally active banks. The Basel accord introduces the possibility of adopting internal rating models for the estimation of the default probability of banks' customers. The reference framework of this paper is the class of generalized linear models, which allows units to be classified while avoiding strict assumptions such as those required by linear discriminant analysis. Another advantage of generalized linear models is the possibility to explore different links between the expected value of the dependent variable and the linear predictor. Parameters are estimated using balance ratios and data coming from Centrale dei Rischi for a set of firms which are customers of a medium sized bank of Northern Italy. Finally, we perform a robust analysis of the model estimates through the forward search in order to monitor the influence of outliers on the final classification. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

8. Classification of Financial Returns According to Thresholds Exceedances.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Properties of a panel of financial time series are explored, aiming at classifying market shares according to their extremal returns behaviour. Existing methods for optimal portfolio selection involve estimation of the correlation coefficient, whose properties for measuring dependence in financial time series are questionable. Alternatively, for stationary processes of financial returns, the mean size of clusters of threshold exceedances leads to a measure of extremal dependence more accurate than correlation. Further functionals that might help optimal portfolio selection are, for instance, the total loss incurred by a stock during an extreme event or the duration of a loss in a stress period. By combining functionals of financial returns it is possible to cluster shares properly and set up a tool for portfolio selection. The performance of this method is assessed, through an application to real financial time series, by means of standard Markowitz theory of optimal selection of shares. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

9. Using CATPCA to Evaluate Market Regulation.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: One of the most interesting research areas in economics concerns the measurement of the relative competitiveness of different economic systems. Among the several proposed indicators, a particularly relevant one is the Product Market Regulation (PMR) proposed by the OECD, calculated on the basis of a rich database. This paper uses the same database to compute alternative indicators. The main difference from the OECD indicator is that we propose a less invasive statistical methodology (CATPCA), suitable for the treatment of qualitative data. In addition we remove several arbitrary manipulations of basic data. The calculation delivered a new ranking of the 21 countries analyzed and some new interesting evidence. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

10. The Impact of the New Labour Force Survey on the Employed Classification.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Regulation n. 577/1998 of the European Council gives the rules to be used by the Community countries to design and conduct the Labour Force Survey (LFS). In order to apply this regulation the Italian LFS has been completely revised regarding several aspects of the survey such as frequency, definitions, questionnaire, survey design, interviewers network. All these changes caused a break in the time series of the main labour force estimates. The aim of this work is to describe and evaluate these differences and their impact on the employed classification. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

11. A Spatial Mixed Model for Sectorial Labour Market Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: A vast literature has been recently concerned with the analysis of variation in overdispersed counts across geographical areas. In this paper, we extend the univariate semiparametric models introduced by Biggeri et al. (2003) to the analysis of multiple spatial counts. The proposed approach is applied to modeling the geographical distribution of employees by economic sectors of the manufacturing industry in Teramo province (Abruzzo) during 2001. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

12. Multidimensional Versus Unidimensional Models for Ability Testing.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Over the last few years the need for an objective way of evaluating student performance has rapidly increased due to the growing call for the evaluation of tests administered at the end of a teaching process and during the guidance phases. Performance evaluation can be achieved by using the Item Response Theory (IRT) approach. In this work we compare the performance of an IRT model defined first on a multidimensional ability space and then on a unidimensional one. The aim of the comparison is to assess the results obtained in the two situations through a simulation study in terms of student classification based on ability estimates. The comparison is made using the two-parameter model defined within the more general framework of the Generalized Linear Latent Variable Models (GLLVM) since it allows the inclusion of more than one ability (latent variables). The simulation highlights that the importance of the dimensionality of the ability space increases when the number of items referring to more than one ability increases. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

13. Testing Procedures for Multilevel Models with Administrative Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Recent Relative Effectiveness studies of the Health Sector have strongly criticized hierarchical ranking in hospitals. As an alternative, they propose a multi-faceted approach which evaluates the quality and characteristics of Hospital services. In this direction, the use of administrative data has proven highly useful. This data is less precise than clinical data but performs more effectively in describing general situations. The numerosity of the population renders all the parameters significant in linear model tests. We must therefore utilize resampling schemes in order to verify the hypotheses concerning the significance of the parameters in opportunely drawn subsamples. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

14. Determinants of Secondary School Dropping Out: a Structural Equation Model.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In this work we present the main results of a research program on dropping out in secondary school, carried out for the Labor Bureau of Campania Region in Italy. We exploited structural equation modeling to identify determinants of the phenomenon under study. We adopt a social system perspective, considering data coming from official statistics related to the 103 Italian Provinces. We provide some details for the model specification and the estimated parameters. Some relevant issues related to the estimation process due to the small sample size and the non-normality of the variables are also discussed. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

15. Archetypal Analysis for Data Driven Benchmarking.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In this work, adopting an exploratory and graphical approach, we suggest considering archetypal analysis as a basis for a data driven benchmarking procedure. The procedure is aimed at defining some reference performers, at understanding their features, and at comparing observed performances with them. Since archetypes are extreme points, we propose to consider them as reference performers. Then, we offer a set of graphical tools in order to describe these archetypal benchmarks, and to evaluate the observed performances with respect to them. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

16. Analyzing Evaluation Data: Modelling and Testing for Homogeneity.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In the evaluation process of a given service, different issues are worthy of analysis. In the first instance, it is interesting to assess how the evaluation responses change over time and whether there is an effect of the raters' features. Secondly, when the service is made up of different items, it is important to verify whether the satisfaction of the users/consumers is the same with respect to all the dimensions. For this purpose, the paper proposes a modelling approach for analyzing and testing ordinal/rating data. Some evidence from University services evaluation shows the usefulness of this procedure in a real case-study. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

17. Customer Satisfaction Evaluation: An Approach Based on Simultaneous Diagonalization.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Several methods have been proposed in the literature for service quality evaluation. These models measure the gap between customers' expectations for excellence and their perceptions of the actual service offered. In this paper we propose an extension of a technique which allows the expectations and perceptions data to be analyzed jointly. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

18. Sensitivity of Attributes on the Performance of Attribute-Aware Collaborative Filtering.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Collaborative Filtering (CF), the most commonly used technique for recommender systems, does not make use of object attributes. Several hybrid recommender systems have been proposed that aim at improving the recommendation quality by incorporating attributes in a CF model. In this paper, we conduct an empirical study of the sensitivity of attributes for several existing hybrid techniques using a movie dataset with an augmented movie attribute set. In addition, we propose two attribute selection measures to select informative attributes for attribute-aware CF filtering algorithms. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

19. Boosted Incremental Tree-based Imputation of Missing Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Tree-based procedures have been recently considered as nonparametric tools for missing data imputation when dealing with large data structures and no probability assumption. A previous work used an incremental algorithm based on cross-validated decision trees and a lexicographic ordering of the single data to be imputed. This paper considers an ensemble method where a tree-based model is used as learner. Furthermore, the incremental imputation concerns missing data of each variable in turn. As a result, the proposed method allows more accurate imputations through a more efficient algorithm. A simulation case study shows the overall good performance of the proposed method against some competitors. A MatLab implementation enriches Tree Harvest Software for non-standard classification and regression trees. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

20. Variable Selection Using Random Forests.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: One of the main topics in the development of predictive models is the identification of variables which are predictors of a given outcome. Automated model selection methods, such as backward or forward stepwise regression, are classical solutions to this problem, but are generally based on strong assumptions about the functional form of the model or the distribution of residuals. In this paper an alternative selection method, based on the technique of Random Forests, is proposed in the context of classification, with an application to a real dataset. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

21. Evolutionary Algorithms for Classification and Regression Trees.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Optimization Problems represent a topic of growing importance for many statistical methodologies. This is particularly true for Data Mining. For a particular class of problems, it is not feasible to examine all possible solutions exhaustively. This has turned researchers' attention towards a particular class of algorithms called Heuristics. Some of these Heuristics (in particular Genetic Algorithms and Ant Colony Optimization Algorithms), which are inspired by natural phenomena, have captured the attention of the scientific community in many fields. In this paper Evolutionary Algorithms are presented in order to address two well-known problems that affect Classification and Regression Trees. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

22. A Software Tool via Web for the Statistical Data Analysis: R-php.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: The spread of the Internet and the growing demand for services from web users have changed and are still changing the way work and study are organized. Nowadays, most information and many services are on the web, and software is moving in the same direction: in fact, the use of software implemented via web is ever-increasing, with a client-server logic that enables the "centralized" use of software installed on a server. In this paper we describe the structure and the running of R-php, an environment for statistical analysis, freely accessible through the World Wide Web, based on the statistical environment R. R-php is based on two modules: a base module and a point-and-click module. In the point-and-click module, the statistical analyses implemented so far include ANOVA, linear regression and some data analysis methods such as cluster analysis and PCA (Principal Component Analysis). [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

23. Building Recommendations from Random Walks on Library OPAC Usage Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In this contribution we describe a new way of building a recommender service based on OPAC web-usage histories. The service is based on a clustering approach with restricted random walks. This algorithm has some properties of single linkage clustering and suffers from the same deficiency, namely bridging. By introducing the idea of a walk context (see Franke and Thede (2005) and Franke and Geyer-Schulz (2004)) the bridging effect can be considerably reduced and small clusters suitable as recommendations are produced. The resulting clustering algorithm scales well for the large data sets in library networks. It complements behavior-based recommender services by supporting the exploration of the revealed semantic net of a library network's documents and it offers the user the choice of the trade-off between precision and recall. The architecture of the behavior-based system is described in Geyer-Schulz et al. (2003). [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

24. Procrustes Techniques for Text Mining.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: This paper aims at exploring the capability of the so called Latent Semantic Analysis applied to a multilingual context. In particular we are interested in weighing how it could be useful in solving linguistic problems, moving from a statistical point of view. Here we focus on the possibility of evaluating the goodness of a translation by comparing the latent structures of the original text and its version in another natural language. Procrustes rotations are introduced in a statistical framework as a tool for reaching this goal. An application on one year of Le Monde Diplomatique and the corresponding Italian edition will show the effectiveness of our proposal. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

25. Robust Multivariate Calibration.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Multivariate calibration uses an estimated relationship between a multivariate response Y and an explanatory vector X to predict unknown X in future from further observed responses. Up to now very little has been written about robust calibration. An approach can be based on the outliers deletion methods. An alternative is to employ robust procedures. The purpose of this paper is to present multivariate calibration methods which are able to detect and investigate those observations which differ from the bulk of the data or to identify subgroups of observations. Particular attention will be paid to the forward search approach. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

26. A Projection Method for Robust Estimation and Clustering in Large Data Sets.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: A projection method for robust estimation of shape and location in multivariate data and cluster analysis is presented. The key idea of the procedure is to search for heterogeneity in univariate projections on directions that are obtained both randomly, using a modification of the Stahel-Donoho procedure, and by maximizing and minimizing the kurtosis coefficient of the projected data, as proposed by Peña and Prieto (2005). We show in a Monte Carlo study that the resulting procedure works well for robust estimation. Also, it preserves the good theoretical properties of the Stahel-Donoho method. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

27. A Forward Search Method for Robust Generalised Procrustes Analysis.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: One drawback of Procrustes Analysis is the lack of robustness. To overcome this limitation a procedure that applies the Generalised Procrustes methods, by way of a progressive sequence inspired by the "forward search", was developed. Starting from an initial centroid, defined by the partial point configuration satisfying the LMS principle, this is extended by joining, at every step, a restricted subset of the remaining points. At every insertion, the updated centroid, redetermined by the newly considered points, is compared with the previous one by way of the common elements. If significant variations of the similarity transformation parameters occur, they reveal the presence of outliers or non stationary points among the new elements just inserted. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

28. An R Package for the Forward Analysis of Multivariate Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: We describe the R package Rfwdmv (R package for the forward multivariate analysis) which implements the forward search for the analysis of multivariate data. The package provides functions useful for detecting atypical observations and/or subsets in the data and for testing in a robust way whether the data should be transformed. Additionally, the package contains functions for performing robust principal component analyses and robust discriminant analyses as well as a range of graphical tools for interactively assessing fitted forward searches on multivariate data. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

29. The Forward Search Method Applied to Geodetic Transformations.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In geodesy, one of the most frequent problems to solve is the coordinate transformation. This means that it is necessary to estimate the coefficients of the equations that transform the planimetric coordinates, defined in a reference system, to the corresponding ones in a second reference system. This operation, performed in a 2D-space, is called planar transformation. The main problem is that if outliers are included in the data, adopting non-robust methods of adjustment to calculate the coefficients causes an arbitrarily large change in the estimate. Traditional methods require a closer analysis, by the operator, of the computation progress and of the statistic indicator provided in order to identify possible outliers. In this paper the application of the Forward Search in geodesy is discussed and the results are compared with those computed with traditional adjustment methods. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

30. Robust Transformation of Proportions Using the Forward Search.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: The aim of this work is to detect the best transformation parameters to normality when data are proportions. To this purpose we extend the forward search algorithm introduced by Atkinson and Riani (2000), and Atkinson et al. (2004) to the transformation proposed by Aranda-Ordaz (1981). The procedure, implemented by authors with R package, is applied to the analysis of a particular characteristic of Tuscany industries. The data used derive from the Italian industrial census conducted in the year 2001 by the Italian National Statistical Institute (ISTAT). [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

31. Random Start Forward Searches with Envelopes for Detecting Clusters in Multivariate Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: During a forward search the plot of minimum Mahalanobis distances of observations not in the subset provides a test for outliers. However, if clusters are present in the data, their simple identification requires that there are searches that initially include a preponderance of observations from each of the unknown clusters. We use random starts to provide such searches, combined with simulation envelopes for precise inference about clustering. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

32. Calibration Confidence Regions Using Empirical Likelihood.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: The literature on multivariate calibration shows an increasing interest in non-parametric or semiparametric methods. Using Empirical Likelihood (EL), we present a semiparametric approach to find multivariate calibration confidence regions and we show how a unique optimum calibration point may be found weighting the EL profile function. In addition, a freeware VBA for Excel© program has been implemented to solve the many relevant computational problems. An example taken from a process of a semiconductor industry is presented. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

33. The Effects of MEP Distributed Random Effects on Variance Component Estimation in Multilevel Models.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: An in-depth investigation on maximum likelihood estimators for variance components is proposed, where the reference is a multilevel model with misspecifications on random effect distribution. The multivariate distributions here introduced for the random effects belong to the family of the Multivariate Exponential Power (MEP) distributions. Our primary interest is devoted to the variability of such estimators, since the MEPs have a noteworthy influence upon it. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

34. A Generalization of the Polychoric Correlation Coefficient.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: The polychoric correlation coefficient is a measure of association between two ordinal variables. It is based on the assumption that two latent bivariate normally distributed random variables generate couples of ordinal scores. Categories of the two ordinal variables correspond to intervals of the corresponding continuous variables. Thus, measuring the association between ordinal variables means estimating the product moment correlation between the underlying normal variables (Olsson, 1979). When the hypothesis of latent bivariate normality is empirically or theoretically implausible, other distributional assumptions can be made. In this paper a new and more flexible polychoric correlation coefficient is proposed assuming that the underlying variables are skew-normally distributed (Roscino, 2005). The skew normal (Azzalini and Dalla Valle, 1996) is a family of distributions which includes the normal distribution as a special case, but with an extra parameter to regulate the skewness. As for the original polychoric correlation coefficient, the new coefficient was estimated by the maximization of the log-likelihood function with respect to the thresholds of the continuous variables, the skewness and the correlation parameters. The new coefficient was then tested on samples from simulated populations differing in the number of ordinal categories and the distribution of the underlying variables. The results were compared with those of the original polychoric correlation coefficient. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

35. Automatic Discount Selection for Exponential Family State-Space Models.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In a previous paper (Pastore, 2004), a method for selecting the discount parameter in a Gaussian state-space model was introduced. The method is based on a sequential optimization of a Bayes factor and is intended for on-line modelling purposes. In this paper, these results are extended to state-space models where the distribution of the observable variable belongs to the exponential family. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

36. Visualizing Dependence of Bootstrap Confidence Intervals for Methods Yielding Spatial Configurations.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Several techniques (like MDS and PCA) exist for summarizing data by means of a graphical configuration of points in a low-dimensional space. Usually, such analyses are applied to data for a sample drawn from a population. To assess how accurate the sample based plot is as a representation for the population, confidence intervals or ellipsoids can be constructed around each plotted point, using the bootstrap procedure. However, such a procedure ignores the dependence of variation of different points across bootstrap samples. To display how the variations of different points depend on each other, we propose to visualize bootstrap configurations in a bootstrap movie. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

37. Monotone Constrained EM Algorithms for Multinormal Mixture Models.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: We investigate the spectral decomposition of the covariance matrices of a multivariate normal mixture distribution in order to construct constrained EM algorithms which guarantee the monotonicity property. Furthermore we propose different sets of constraints which can be simply implemented. These procedures have been tested on the ground of many numerical experiments. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

38. Baum-Eagon Inequality in Probabilistic Labeling Problems.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: This work illustrates an approach to the study of labeling, aka "object classification". This kind of parallel computing problem is well suited to AI applications (pattern recognition, edge detection, etc.). Our target consists in simplifying an overly computationally costly algorithm proposed by Faugeras and Berthod: using the Baum-Eagon theorem, we obtained a reduced algorithm which produces results comparable with other more complex approaches. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

39. Missing Data in Optimal Scaling.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: We propose a procedure to assess a measure for a latent phenomenon, starting from the observation of a wide set of ordinal variables affected by missing data. The proposal is based on Nonlinear PCA technique to be jointly used with an ad hoc imputation method for the treatment of missing data. The procedure is particularly suitable when dealing with ordinal, or mixed, variables, which are strongly interrelated and in the presence of specific patterns of missing observations. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

40. Regularized Sliced Inverse Regression with Applications in Classification.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Consider the problem of classifying a number of objects into one of several groups or classes based on a set of characteristics. This problem has been extensively studied under the general subject of discriminant analysis in the statistical literature, or supervised pattern recognition in the machine learning field. Recently, dimension reduction methods, such as SIR and SAVE, have been used for classification purposes. In this paper we propose a regularized version of the SIR method which is able to gain information from both the structure of class means and class variances. Furthermore, the introduction of a shrinkage parameter allows the method to be applied in under-resolution problems, such as those found in gene expression microarray data. The REGSIR method is illustrated on two different classification problems using real data sets. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

41. Sequential Decisional Discriminant Analysis.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: We describe here a sequential discriminant analysis method whose aim is essentially to classify evolutionary data. This method of decision-making is based on the research of principal axes of a configuration of points in the individual-space with a relational inner product. We are in the presence of a discriminant analysis problem, in which the decision must be taken as the partial knowledge evolutionary information of the observations of the statistical unit, which we want to classify. We show here how the knowledge from the observation of the global testimony sample carried out during the entire period can be of particular benefit to the classifying decision on supplementary statistical units, about which we only have partial information. An analysis using real data is here described using this method. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

42. Estimation of the Structural Mean of a Sample of Curves by Dynamic Time Warping.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Cerioli, Andrea, and Riani, Marco
Abstract: Following our previous works where an improved dynamic time warping (DTW) algorithm has been proposed and motivated, especially in the multivariate case, for computing the dissimilarity between curves, in this paper we modify the classical DTW in order to obtain discrete warping functions and to estimate the structural mean of a sample of curves. With the suggested methodology we analyze series of daily measurements of some air pollutants in Emilia-Romagna (a region in Northern Italy). We compare results with those obtained with other flexible and non parametric approaches used in functional data analysis. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

43. Graphical Representation of Functional Clusters and MDS Configurations.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: We deal with graphical representations of results of functional clustering and functional multidimensional scaling (MDS). Ramsay and Silverman (1997, 2005) proposed functional data analysis. Functional data analysis enlarges the range of statistical data analysis. But, it is not easy to represent results of functional data analysis techniques. We focus on two methods of functional data analysis: functional clustering and functional MDS. We show graphical representations for functional hierarchical clustering and functional k-means method in the first part of this paper. Then, in the second part, graphical representation of results of functional MDS, functional configuration, is presented. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

44. Growing Clustering Algorithms in Market Segmentation: Defining Target Groups and Related Marketing Communication.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: This paper outlines innovative techniques for the segmentation of consumer markets. It compares a new self-controlled growing neural network with a recent growing k-means algorithm. A critical issue is the identification of the "right" number of clusters, which is externally validated by the JUMP-criterion. The empirical application counters several objections recently raised against the use of cluster analysis for market segmentation. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

45. On the Choice of the Kernel Function in Kernel Discriminant Analysis Using Information Complexity.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: In this short paper we shall consider the Kernel Fisher Discriminant Analysis (KFDA) and extend the idea of Linear Discriminant Analysis (LDA) to nonlinear feature space. We shall present a new method of choosing the optimal kernel function and its effect on the KDA classifier using information-theoretic complexity measure. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

46. Genetic Algorithms-based Approaches for Clustering Time Series.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Zani, Sergio, and Cerioli, Andrea
Abstract: Cluster analysis is to be included among the favorite data mining techniques. Cluster analysis of time series has received great attention only recently, mainly because of the several difficult issues involved. Among several available methods, genetic algorithms proved to be able to handle efficiently this topic. Several partitions are considered and iteratively selected according to some adequacy criterion. In this artificial "struggle for survival" partitions are allowed to interact and mutate to improve and produce a "high quality" solution. Given a set of time series, two genetic algorithms are considered for clustering (the number of clusters is assumed unknown). Both algorithms require a model to be fitted to each time series to obtain model parameters and residuals. These methods are applied to a real data set concerned with the visitor flow recorded in state-owned museums with paid admission in the Lazio region of Italy. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

46 results on '"Gaul W"'