69 results for "Data proliferation"
Search Results
2. Technology‐induced bias in the theory of evidence‐based medicine.
- Author
-
Eustace, Scott
- Subjects
*MACHINE learning, *MEDICAL records, *MEDICAL technology, *MEDICAL research, *WORLD Wide Web, *EVIDENCE-based medicine, *SMARTPHONES - Abstract
Abstract: Designing trials and studies to minimize confounding and bias is central to evidence-based medicine (EBM). The widespread use of recent technologies such as machine learning, smartphones, and the World Wide Web to collect, analyse, and disseminate information can improve the efficiency, reliability, and availability of medical research. However, it also has the potential to introduce new sources of significant, technology-induced evidential bias. This paper assesses the extent of that impact by reviewing some of the methods by which, and principles according to which, evidence is collected, analysed, and disseminated in EBM, supported by specific examples. It considers the effect of personal health tracking via smartphones, the current proliferation of research data and the influence of search engine "filter bubbles", the possibility of machine learning-driven study design, and the implications of using machine learning to seek patterns in large quantities of data, for example from observational studies and medical record databases. It concludes that new technology may introduce profound new sources of bias that current EBM frameworks do not accommodate. It also proposes new approaches that could be incorporated into EBM theory to mitigate the most obvious risks, and suggests where further assessment of the practical implications is needed. [ABSTRACT FROM AUTHOR] (An illustrative sketch of pattern-seeking bias follows this record.)
- Published
- 2018
- Full Text
- View/download PDF
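The abstract above points to machine-learning pattern searches over large observational datasets as a source of evidential bias. Purely as an illustration (not taken from the paper), the following Python sketch shows how a naive search over many random "features" yields apparently significant associations by chance alone; the sample sizes and variable names are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_patients, n_features = 500, 200

# Purely random "health record" features and a purely random outcome:
# by construction there is no real signal to find.
X = rng.normal(size=(n_patients, n_features))
y = rng.normal(size=n_patients)

# Correlate every feature with the outcome, as a naive pattern search would.
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])

# Under the null, |r| > ~1.96/sqrt(n) looks "significant" at the nominal 5% level.
r_crit = 1.96 / np.sqrt(n_patients)
false_hits = int(np.sum(np.abs(r) > r_crit))

print(f"{false_hits} of {n_features} pure-noise features look 'significant' "
      f"(~{0.05 * n_features:.0f} expected by chance alone)")
```

Multiple-comparison corrections or pre-registered hypotheses are the usual remedies; the point here is only to make the scale of chance findings concrete.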
3. Gender Transition: Is There a Right to Be Forgotten?
- Author
-
Guilhermina Rego, Mónica Correia, and Rui Nunes
- Subjects
Male ,Health (social science) ,education ,Identity (social science) ,Legislation ,Health data ,03 medical and health sciences ,0302 clinical medicine ,Humans ,media_common.cataloged_instance ,Data Protection Act 1998 ,030212 general & internal medicine ,Sociology ,European union ,media_common ,Law and economics ,Ethics ,030505 public health ,Right to be forgotten ,Health Policy ,Gender Identity ,Gender transition ,Issues, ethics and legal aspects ,Privacy ,Philosophy of medicine ,General Data Protection Regulation ,Female ,Original Article ,0305 other medical science ,Data proliferation ,‘Right to be forgotten’ - Abstract
The European Union (EU) faced high risks to individuals' privacy from the proliferation of personal data. Legislation has emerged that seeks to articulate all interests at stake, balancing the need for data flow from EU countries with protecting personal data: the General Data Protection Regulation. One of the mechanisms established by this new law to strengthen the individual's control over their data is the so-called "right to be forgotten", the right to obtain from the controller the erasure of records. In gender transition, this right represents a powerful form of control over personal data, especially health data that may reveal a gender with which the person no longer identifies and which they reject. Therefore, it is pertinent to discern whether the right to have personal data deleted, in particular health data, is ethically acceptable in gender transition. Towards addressing the ethical dimensions of the right to be forgotten in this case, this study presents relevant concepts, briefly outlines the history, ethics, and law of records in light of the evolution from paper to electronic format and the main aspects of identity construction and gender identity, and explores the relationship between privacy, data protection/information control, and identity projection. It also discusses, in the context of gender transition, the relation between "the right to self-determination", "the right to delete", and "the right to identity and individuality". Conclusions on the ethical admissibility of the 'right to be forgotten' to control gender-affirming information are presented.
- Published
- 2021
4. Analysis of hadoop MapReduce scheduling in heterogeneous environment
- Author
-
Neeraj Gupta and Khushboo Kalia
- Subjects
HDFS ,Scheduling ,Computer science ,Task tracker ,020209 energy ,Distributed computing ,020208 electrical & electronic engineering ,General Engineering ,02 engineering and technology ,Engineering (General). Civil engineering (General) ,Scheduling (computing) ,Hadoop ,0202 electrical engineering, electronic engineering, information engineering ,MapReduce ,TA1-2040 ,Data proliferation - Abstract
Over the last decade, several advancements have been made in distributed and parallel computing. A large amount of data is generated daily from various sources, and this rapid data proliferation has led to the development of frameworks able to handle such huge volumes efficiently, e.g. Microsoft Dryad and Apache Hadoop. Apache Hadoop is an open-source implementation of Google MapReduce and is receiving a lot of attention from researchers. Jobs must be scheduled properly for good performance. Numerous efforts have been made to improve existing MapReduce schedulers and to develop new optimized techniques and algorithms. This paper focuses on the Hadoop MapReduce framework, its shortcomings, the various issues faced when scheduling jobs to nodes, and the algorithms proposed by various researchers. Furthermore, we classify these algorithms by the quality measures that affect MapReduce performance. (An illustrative map/reduce sketch follows this record.)
- Published
- 2021
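As context for the scheduling survey above, here is a minimal local word-count sketch in the MapReduce style. It is a generic illustration, not code from the paper: in a real Hadoop deployment the mapper and reducer run as separate tasks that the schedulers being surveyed must place on cluster nodes.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Emit (word, 1) pairs; map tasks are the units of work a scheduler assigns to nodes.
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Sum counts per word; reduce tasks run after the shuffle of map output.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # Local stand-in for a streaming run:  cat input.txt | python wordcount.py
    print(dict(reducer(mapper(sys.stdin))))
```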
5. Hybrid Smart Systems for Big Data Analysis
- Author
-
Andrey Ostroukh, M. Yu. Karelina, N. G. Kuftinova, E. N. Matyukhina, and E. U. Akhmetzhanova
- Subjects
Smart system ,business.industry ,Computer science ,Mechanical Engineering ,Big data ,Information technology ,Construct (python library) ,computer.software_genre ,Traffic flow ,Industrial and Manufacturing Engineering ,Key (cryptography) ,Data mining ,business ,computer ,Data proliferation ,Spatial analysis - Abstract
In big data analysis at an enterprise, a useful option is the analysis of spatial data by machine learning, using a hybrid smart system. Machine learning permits complex nonlinear prediction with maximum precision and efficiency. A real-time big data prediction network for traffic flow is of great practical value. In such a network, the key problem is to construct an adaptive model on the basis of historical data. Traffic flow prediction is critical to traffic management in information technology. Real-time data proliferation has led to the development of big data analysis, for which the nonlinearity of traffic data is the main source of inaccurate prediction. (A toy nonlinear prediction sketch follows this record.)
- Published
- 2021
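Purely as an illustration of fitting an adaptive nonlinear model to historical traffic data, the sketch below fits a polynomial flow-versus-hour model; the synthetic data, peak shape, and degree are invented assumptions and not the hybrid smart system described in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "historical" data: hourly traffic flow with a daily peak pattern plus noise.
hours = np.tile(np.arange(24), 30)                      # 30 days of hourly records
flow = 400 + 250 * np.sin((hours - 7) * np.pi / 12) + rng.normal(0, 40, hours.size)

# Fit a simple nonlinear (polynomial) model of flow vs. hour of day.
coeffs = np.polyfit(hours, flow, deg=6)
model = np.poly1d(coeffs)

# Predict tomorrow's flow profile and report the expected peak hour.
hour_grid = np.arange(24)
pred = model(hour_grid)
print(f"predicted peak around hour {int(hour_grid[np.argmax(pred)])}, "
      f"flow ~ {pred.max():.0f} vehicles/hour")
```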
6. Delay Reduction Through Optimal Controller Placement to Boost Scalability in an SDDC
- Author
-
Haider Abbas, Malik Muhammad Zaki Murtaza Khan, Waseem Iqbal, Hammad Afzal, Bilal Rauf, and Yawar Abbas Bangash
- Subjects
021103 operations research ,Computer Networks and Communications ,Computer science ,business.industry ,Distributed computing ,0211 other engineering and technologies ,02 engineering and technology ,Computer Science Applications ,Storage area network ,Control and Systems Engineering ,Control theory ,Computer data storage ,Scalability ,Redundancy (engineering) ,Process control ,Single point of failure ,Electrical and Electronic Engineering ,business ,Data proliferation ,Information Systems - Abstract
Software-defined storage (SDS) is an emerging technology that aims to address explosive data proliferation and storage management complexity through the separation of the control and data paths, which are tightly coupled in the traditional storage area network (SAN) model. In SDS, a centralized controller manages and controls the overall storage services and operations. However, the centralized controller approach raises scalability and single-point-of-failure concerns. In such a paradigm, controller placement is an important activity: a random placement of controllers suffers from unpredictable delay in a large enterprise, while multiple SDS controllers mitigate the single point of failure and provide redundancy. We propose a method called controller placement based on the center of gravity (CPCG) to determine optimal controller placement locations that reduce delay. CPCG incorporates the coordinates of switches and analyzes their density (the number of devices attached to a switch) to suggest new optimal sites. For the least number of SDS controllers, we propose an areawise optimal placement location strategy; the areawise approach offloads workload processing from the central controller and also provides a scalable and reliable storage system. Experiments show that the optimal placement drastically reduces delay and, hence, boosts scalability. (An illustrative centre-of-gravity sketch follows this record.)
- Published
- 2020
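The paper's CPCG algorithm is not reproduced here; the following sketch only illustrates the general idea of a density-weighted centre of gravity for a single controller, using hypothetical switch coordinates and attached-device counts.

```python
import numpy as np

# Hypothetical switch coordinates (x, y) and "density" = number of attached devices.
switches = np.array([[2.0, 3.0], [8.0, 1.5], [5.0, 7.0], [9.0, 8.0]])
density  = np.array([120, 40, 200, 80])

# Density-weighted centre of gravity: a candidate controller location that is pulled
# towards heavily loaded switches, reducing the average path length (a delay proxy).
controller = (density[:, None] * switches).sum(axis=0) / density.sum()

# Weighted mean Euclidean distance from switches to the candidate site.
dists = np.linalg.norm(switches - controller, axis=1)
print("controller at", controller.round(2),
      "weighted mean distance", float((density * dists).sum() / density.sum()))
```

In an areawise, multi-controller deployment the same computation would simply be repeated per area.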
7. Advancement Of Deep Learning In Big Data And Distributed Systems
- Author
-
Mustafa Maad Hamdi, Ali Saadoon Ahmed, and Mohammed Salah Abood
- Subjects
Metadata ,business.industry ,Computer science ,Deep learning ,Big data ,Table (database) ,Artificial intelligence ,Architecture ,business ,Data science ,Data proliferation ,Field (computer science) ,Variety (cybernetics) - Abstract
The digital computing space has grown dramatically since the beginning of the 2000s to deal with the increase in data proliferation. These data come from a wide variety of areas. For example, the number of connected devices has exploded with the advent of the Internet of Things; through their interactions with the outside environment and their various sensors, these machines generate a growing volume of data that must be analysed. Social networks are another field in which varied data are produced: interaction data and metadata that provide information on user profiles. All these data require large-capacity storage and extensive analysis. In effect, while the arrival of such quantities of information demanded improvements in storage, significant advances in processing and interpretation were also required and have become feasible. In this paper, the main contributions are summarized in a comparison table (Table 1): the objectives, challenges, and novelty of each paper are clarified, along with the architecture or model and applications used and, finally, the recommendations for each.
- Published
- 2021
8. Data proliferation, reconciliation, and synthesis in viral ecology
- Author
-
Angela L. Rasmussen, Amy R. Sweeny, Tad A. Dallas, Colin J. Carlson, Rory Gibb, Gregory F. Albery, Liam Brierley, Timothée Poisot, Daniel J. Becker, Sadie J. Ryan, Ryan Connor, Maxwell J. Farrell, and Evan A. Eskew
- Subjects
Metadata ,Scope (project management) ,Computer science ,Ecology ,Ecology (disciplines) ,Pandemic ,Human virome ,Evolutionary ecology ,Taxonomy (biology) ,Data proliferation - Abstract
The fields of viral ecology and evolution have rapidly expanded in the last two decades, driven by technological improvements, and motivated by efforts to discover potentially zoonotic wildlife viruses under the rubric of pandemic prevention. One consequence has been a massive proliferation of host-virus association data, which comprise the backbone of research in viral macroecology and zoonotic risk prediction. These data remain fragmented across numerous data portals and projects, each with its own scope, structure, and reporting standards. Here, we propose that synthesis of host-virus association data is a central challenge to improve our understanding of the global virome and develop foundational theory in viral ecology. To illustrate this, we build an open reconciled mammal-virus database from four key published datasets, applying a standardized taxonomy and metadata. We show that reconciling these datasets provides a substantially richer view of the mammal virome than that offered by any one individual database. We argue for a shift in best practice towards the incremental development and use of synthetic datasets in viral ecology research, both to improve comparability and replicability across studies, and to facilitate future efforts to use machine learning to predict the structure and dynamics of the global virome. (A toy reconciliation sketch follows this record.)
- Published
- 2021
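To make the reconciliation idea concrete, here is a toy pandas sketch (not the authors' pipeline) that harmonizes host names across two invented host-virus tables with a small synonym map before merging and de-duplicating them.

```python
import pandas as pd

# Two hypothetical host-virus association datasets with inconsistent host names.
db1 = pd.DataFrame({"host": ["Rousettus aegyptiacus", "Homo sapiens"],
                    "virus": ["Marburg marburgvirus", "Measles morbillivirus"]})
db2 = pd.DataFrame({"host": ["Egyptian fruit bat", "Homo sapiens"],
                    "virus": ["Marburg marburgvirus", "Mumps orthorubulavirus"]})

# A toy taxonomy reconciliation map from synonyms/common names to accepted names.
synonyms = {"Egyptian fruit bat": "Rousettus aegyptiacus"}

for db in (db1, db2):
    db["host"] = db["host"].replace(synonyms)

# Reconciled, de-duplicated association list: richer than either source alone.
merged = (pd.concat([db1.assign(source="db1"), db2.assign(source="db2")])
            .drop_duplicates(subset=["host", "virus"])
            .reset_index(drop=True))
print(merged)
```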
9. AI in Our Society
- Author
-
Ghislain Landry Tsafack Chetsa
- Subjects
business.industry ,Computer science ,media_common.quotation_subject ,Unstructured data ,Data science ,GeneralLiterature_MISCELLANEOUS ,Field (computer science) ,ComputingMethodologies_PATTERNRECOGNITION ,The Internet ,business ,Data proliferation ,Operational strategy ,Reputation ,media_common - Abstract
Up to the early 2000s, artificial intelligence (AI) was perceived as a utopia outside of the restricted AI research and development community, a reputation that AI owed to its relatively poor performance at the time. In the early 2000s, significant progress had been made in the design and development of microprocessors, leading to computers capable of efficiently executing AI tasks. Additionally, the ubiquity of the Internet had led to data proliferation, characterized by the continuous generation of large volumes of structured and unstructured data at an unprecedented rate. The combination of increasing computing power and the availability of large data sets stimulated extensive research in the field of AI, which led to successful deployments of AI technology in various industries. Through such success, AI earned a place in the spotlight, as organizations continue to devote significant effort to integrating it into their day-to-day operational strategy. However, the complex nature of AI often introduces challenges that organizations must efficiently address to fully realize the potential of AI technology.
- Published
- 2021
10. Insight-Driven Sales Management
- Author
-
Hesham O. Dinana
- Subjects
Online and offline ,User experience design ,business.industry ,Analytics ,Order (business) ,Business ,Marketing ,Sales management ,Customer relationship management ,Data proliferation ,Content management - Abstract
In the new VUCA (Volatile, Uncertain, Complex and Ambiguous) world that we live in, there are new rules that will reshape many of the components of sales management, from prospecting, to lead qualification, to closing and relationship management. This chapter will explore the impact of technology, data proliferation, and omni-channel customer touch points on how organizations will manage their sales process and the sales teams in the integrated online and offline worlds (O2O sales). The digital-age consumer and the digital-age sales team will have different communication needs and tools that need to be addressed by sales leaders to ensure their organizations' success and competitiveness in this new landscape. Customer insight is the new name of the game, and it needs to be developed using techniques such as content management, user experience management, performance analytics, machine learning, and artificial intelligence. Effectively and efficiently managing the sales process and the sales practices in the digital age will be the new challenge that organizations need to face as some types of sales jobs might disappear (order takers) and new jobs will need to be developed (sales analysts and data scientists). Today's sales managers need to put science into the art of selling.
- Published
- 2020
11. Optimization of Cost: Storage over Cloud Versus on Premises Storage
- Author
-
Akrati Sharma and Rajesh Sen
- Subjects
Service (systems architecture) ,Government ,Database ,business.industry ,Computer science ,020206 networking & telecommunications ,Cloud computing ,02 engineering and technology ,computer.software_genre ,Object (computer science) ,Term (time) ,Resource (project management) ,On demand ,0202 electrical engineering, electronic engineering, information engineering ,business ,computer ,Data proliferation - Abstract
In today's world of digitalization, the most important object is data. Numerous techniques are available for retrieving data, and various data hardware (DHW) is used to store it over long periods. As data increases day by day, more cost is incurred in storing it. Data proliferation is the term used to refer to the large amount of data, structured or unstructured, created by government and non-government institutions. To minimize the maintenance cost generated by data proliferation, IaaS as a cloud computing service can be implemented. IaaS can be considered a virtualized service used to provide infrastructure 'as a service' to users via cloud vendors. It provides access to database resources on demand. One can pay per use of a resource from the pool of resources, and these resources can be virtualized in a cloud-based environment. Various approaches are used to access the virtualized resources; the private cloud approach is commonly used to maintain the integrity of the data. (An illustrative cost-comparison sketch follows this record.)
- Published
- 2020
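A back-of-the-envelope comparison of the kind the chapter discusses might look like the following sketch; all capacities, durations, and prices are invented placeholders, not vendor figures.

```python
# Hypothetical figures, purely illustrative; real prices vary by vendor and region.
capacity_tb          = 50
years                = 3
onprem_hw_per_tb     = 250.0    # one-off hardware cost per TB
onprem_opex_per_year = 4_000.0  # power, cooling, admin, maintenance
cloud_per_tb_month   = 20.0     # pay-as-you-use storage rate

onprem_total = capacity_tb * onprem_hw_per_tb + years * onprem_opex_per_year
cloud_total  = capacity_tb * cloud_per_tb_month * 12 * years

print(f"on-premises over {years} years: ${onprem_total:,.0f}")
print(f"cloud (IaaS) over {years} years: ${cloud_total:,.0f}")
```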
12. Accurate Decision-making System for Mining Environment using Li-Fi 5G Technology over IoT Framework
- Author
-
Mekala, N. Srinivasu, Gps Varma, and P. Viswanathan
- Subjects
Government ,Scope (project management) ,Computer science ,020206 networking & telecommunications ,02 engineering and technology ,Risk analysis (engineering) ,Range (aeronautics) ,0202 electrical engineering, electronic engineering, information engineering ,Li-Fi ,020201 artificial intelligence & image processing ,Noise (video) ,Data proliferation ,5G ,Communication channel - Abstract
Environmental resources are the backbone of a government's economy. Most existing systems fail to provide secure management and delivery of emergency information to workers during risk conditions within the coverage of Wi-Fi technology. Data communication through radio waves is not possible at the far end of underground mines because of asymmetrical data proliferation and the inadequate frequency range (3 kHz - 300 GHz). Most injuries in underground mines involve rock falls, emissions, and blasts, and miners are affected by lung disease due to inhaling dust and toxic gases emitted in the mining environment. Hence, we address this issue with novel Li-Fi technology, which covers a wide frequency range (430-790 THz). In this paper, we propose an LDM (Light-based Decision Mechanism) based HSM (Human Safety Management) algorithm to evaluate and monitor abnormal conditions through IoT sensor data. A precise machine learning based decision-making computational system is required to evaluate comfort-level conditions and deliver notifications to management remotely so that mining environment conditions can be overseen. This system estimates emitted toxic gas and evaluates the level of its effect during mining processes. The method requires unidirectional communication; therefore the Li-Fi channel is well suited to secure, low-noise data distribution within nanoseconds. Experimental results show that our algorithm performs better than prevailing data analysis algorithms. (An illustrative threshold-decision sketch follows this record.)
- Published
- 2019
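The paper's LDM/HSM algorithm is not given here; the sketch below is only a generic stand-in showing a threshold-based decision over IoT gas-sensor readings, with hypothetical gas names and limits.

```python
# Hypothetical exposure thresholds (ppm); real limits depend on local regulations.
THRESHOLDS = {"CO": 35.0, "CH4": 5000.0, "H2S": 10.0}

def comfort_level(reading: dict) -> str:
    """Classify a set of gas readings; a stand-in for the paper's decision mechanism."""
    worst = max(reading[gas] / limit for gas, limit in THRESHOLDS.items())
    if worst < 0.5:
        return "normal"
    if worst < 1.0:
        return "warning"
    return "evacuate"

sample = {"CO": 12.0, "CH4": 1800.0, "H2S": 11.5}
print("decision:", comfort_level(sample))  # a real system would push this over the Li-Fi channel
```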
13. Data proliferation based estimations over distribution factor in heterogeneous wireless sensor networks
- Author
-
Nagendra Prasad Pathak, Klimis Ntalianis, Vinod Kumar Verma, and Surinder Singh
- Subjects
Scheme (programming language) ,Computer Networks and Communications ,Computer science ,Distributed computing ,010401 analytical chemistry ,Real-time computing ,Pitch factor ,020206 networking & telecommunications ,02 engineering and technology ,01 natural sciences ,0104 chemical sciences ,Key distribution in wireless sensor networks ,Sensor node ,0202 electrical engineering, electronic engineering, information engineering ,Representation (mathematics) ,Data proliferation ,computer ,Wireless sensor network ,computer.programming_language - Abstract
In this paper, a new representation for the chi-squared distribution has been derived over wireless sensor networks (WSN). The underlying correlated data proliferation protocols have a strong influence on the performance of the deployed system. The proposed model has been deployed to investigate the WSN system with respect to data proliferation. Initially, the degree of freedom (DOF) factor has been evaluated with respect to the scalability issue in the proposal. Further, the factors affecting the outcome of the WSN system when assessing data proliferation have been investigated. Moreover, sensor node operations, namely sense count, transmit count, and redundant receive count, have also been evaluated. Finally, extensive simulation analysis has been carried out to prove the validity of the proposed innovative scheme. It has also been observed that the chi-squared distribution for wireless sensor networks becomes intractable in the degrees of freedom when varied with a specific number of nodes. (The standard chi-squared density is recalled after this record for reference.)
- Published
- 2018
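For reference, the standard chi-squared density with k degrees of freedom is recalled below; the paper's WSN-specific representation is not reproduced here.

```latex
% Standard chi-squared density with k degrees of freedom (reference form only).
f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \qquad x > 0,
\qquad \mathbb{E}[X] = k, \quad \operatorname{Var}(X) = 2k.
```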
14. Cloud Computing Issues, Challenges, and Needs: A Survey
- Author
-
Rd. Rohmat Saedudin, Hind Ra'ad Ebraheem, Shams N. Abd-Alwahab, Ronal Hadi, Defni, Mohammad Aljanabi, and Mohd Arfian Ismail
- Subjects
Service (systems architecture) ,Information Systems and Management ,General Computer Science ,Cover (telecommunications) ,business.industry ,Computer science ,Software as a service ,Cloud computing ,Computer security ,computer.software_genre ,Server ,User control ,The Internet ,Statistics, Probability and Uncertainty ,business ,Data proliferation ,computer - Abstract
Cloud computing represents a kind of computing based on sharing computing resources rather than having personal devices or local servers handle applications and tasks. This kind of computing includes three distinct kinds of services provided remotely for clients and accessed over the Internet. Typically, clients pay annual or monthly service fees to suppliers in order to gain access to systems that deliver infrastructure as a service, platforms as a service, and software as a service to any subscriber. In this paper, the usefulness and the abuse of cloud computing are briefly discussed and presented by highlighting the influence of cloud computing in different areas. Moreover, this paper also presents the kinds and services of cloud. In addition, it presents the security issues, covering the cloud security solution requirements and the cloud security issues, which have been among the biggest issues in cloud computing in recent years. The security requirements needed by cloud computing cover privacy, lack of user control, unauthorized secondary usage, and, finally, data proliferation and data flow, while the security issues include ownership of the device, the trust issue, and legal aspects. To overcome the security issues, this paper also presents solutions at the end.
- Published
- 2021
15. Usage of Big Data in decision making process in companies
- Author
-
Gabriel Koman
- Subjects
Engineering ,business.industry ,Data efficiency ,Data quality ,Big data ,Master data ,Unstructured data ,General Medicine ,business ,Data science ,Data proliferation ,Data warehouse ,Data virtualization - Abstract
The rapid development in the field of information and communication technologies recorded in recent years has caused the volume of data in companies to increase by about 40 to 50% in the last year [1]. By analysing large amounts of data, it is possible to obtain information that is important for the enterprise and on the basis of which managers can improve the decision-making process. The main problem in the management and decision-making of enterprises is the constantly growing amount of data generated within the undertaking and its surroundings. These data reach volumes and structures that are not feasible, in terms of time and cost, to manage through current management information systems (MIS). The fastest-increasing volumes are unstructured data, which may contain data with significant information value for decision making in the enterprise. Given the processing principle of existing MIS, i.e. the processing of structured data, companies have to capture, transform, and analyse such data before it can be used. The question of how to process and integrate data of different types is addressed by Big Data technology, which allows different kinds of data from a variety of data sources to be handled in a very short time (milliseconds).
- Published
- 2017
16. Rapid evaluation of operation performance of multi-chiller system based on history data analysis
- Author
-
Yijun Wang, Xing Fang, and Xinqiao Jin
- Subjects
Chiller ,Engineering ,060102 archaeology ,business.industry ,020209 energy ,Mechanical Engineering ,06 humanities and the arts ,02 engineering and technology ,Building and Construction ,Energy consumption ,Reliability engineering ,Chiller boiler system ,Evaluation methods ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,0601 history and archaeology ,Electrical and Electronic Engineering ,business ,Data proliferation ,Simulation ,Civil and Structural Engineering - Abstract
This paper develops a rapid evaluation method for chillers' operation strategies based on historical data analysis, identifying their potential to improve the chillers' performance. The data proliferation method and the moving-maximum method are developed to obtain the approximate best-performance line of the overall chillers, and this line is taken as the evaluation benchmark. Three indexes, I_select, I_load, and I, are defined to evaluate the selection of the active chillers, the load allocation among the active chillers, and the operation performance of the overall chillers, respectively. With the index I, the percentage of potential energy saving and the least energy consumption during a period can also be calculated. The method is validated by evaluating two operation strategies of chillers, and the results identify the potential improvement of the strategies. The evaluation method also has low computational cost and can be executed rapidly. (An illustrative moving-maximum sketch follows this record.)
- Published
- 2017
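As a rough illustration of a moving-maximum benchmark line and an overall index (not the paper's exact I_select, I_load, and I definitions), consider the following sketch over synthetic part-load and COP history.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical history: part-load ratio (0-1) and measured plant COP for each hour.
load = rng.uniform(0.2, 1.0, 2000)
cop  = 3.0 + 2.0 * load - 1.5 * load**2 + rng.normal(0, 0.25, load.size)

# Moving maximum over load bins approximates the "best achievable" performance line.
bins = np.linspace(0.2, 1.0, 17)
idx = np.digitize(load, bins)
best_line = np.array([cop[idx == i].max() if np.any(idx == i) else np.nan
                      for i in range(1, len(bins))])

# Illustrative overall index: mean ratio of actual COP to the benchmark for its bin.
benchmark = best_line[np.clip(idx - 1, 0, len(best_line) - 1)]
I = float(np.nanmean(cop / benchmark))
print(f"overall performance index I ~ {I:.2f} (1.0 would match the benchmark line)")
```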
17. Big Data and Business Analytics: Trends, Platforms, Success Factors and Applications
- Author
-
Henry Friday Nweke and Ifeyinwa Angela Ajah
- Subjects
Computer science ,Big data ,02 engineering and technology ,lcsh:Technology ,business intelligence ,Management Information Systems ,Open research ,Business analytics ,Artificial Intelligence ,big data ,020204 information systems ,big data tools ,0202 electrical engineering, electronic engineering, information engineering ,Hadoop ecosystem ,Social media ,business.industry ,lcsh:T ,business analytics ,review and business value ,020206 networking & telecommunications ,Unstructured data ,Business value ,Data science ,Computer Science Applications ,Business intelligence ,business ,Data proliferation ,Information Systems - Abstract
Big data and business analytics are trends that are positively impacting the business world. Past research shows that the data generated in the modern world is huge and growing exponentially. This includes structured and unstructured data that flood organizations daily. Unstructured data constitute the majority of the world's digital data and include text files, web and social media posts, emails, images, audio, movies, etc. Unstructured data cannot be managed in the traditional relational database management system (RDBMS). Therefore, data proliferation requires a rethinking of techniques for capturing, storing, and processing the data. This is the role big data has come to play. This paper, therefore, is aimed at increasing the attention of organizations and researchers to various applications and benefits of big data technology. The paper reviews and discusses the recent trends, opportunities, and pitfalls of big data and how it has enabled organizations to create successful business strategies and remain competitive, based on available literature. Furthermore, the review presents the various applications of big data and business analytics, the data sources generated in these applications, and their key characteristics. Finally, the review not only outlines the challenges for successful implementation of big data projects but also highlights the current open research directions of big data analytics that require further consideration. The reviewed areas of big data suggest that good management and manipulation of the large data sets using the techniques and tools of big data can deliver actionable insights that create business value.
- Published
- 2019
18. MXenes for memristive and tactile sensory systems
- Author
-
Su-Ting Han, Ye Zhou, Guanglong Ding, Ruo-Si Chen, Baidong Yang, and Kui Zhou
- Subjects
010302 applied physics ,Computer science ,Process (engineering) ,General Physics and Astronomy ,02 engineering and technology ,Memristor ,Tactile perception ,021001 nanoscience & nanotechnology ,01 natural sciences ,law.invention ,symbols.namesake ,Neuromorphic engineering ,Human–computer interaction ,law ,0103 physical sciences ,symbols ,0210 nano-technology ,MXenes ,Data proliferation ,Tactile sensor ,Von Neumann architecture - Abstract
One of the most effective approaches to solving the current problem arising from the von Neumann bottleneck in this period of data proliferation is the development of intelligent devices that mimic the human learning process. Information sensing and processing/storage are considered to be the essential processes of learning. Therefore, high-performance sensors, memory/synaptic devices, and relevant intelligent artificial tactile perception systems are urgently needed. In this regard, innovative device concepts and emerging two-dimensional materials have recently received considerable attention. Herein, we discuss the development of MXenes for applications in tactile sensors, memristors, and artificial tactile perception systems. First, we summarize the structures, common properties, and synthesis and assembly techniques of MXenes. We then discuss the applications of MXenes in tactile sensors, memristors, and relevant neuromorphic-based artificial tactile perception systems along with the related working mechanisms. Finally, we present the challenges and prospects related to MXene synthesis, assembly, and application.
- Published
- 2021
19. Technology-induced bias in the theory of evidence-based medicine
- Author
-
Scott Eustace
- Subjects
Biomedical Research ,Computer science ,Biomedical Technology ,Filter (software) ,03 medical and health sciences ,0302 clinical medicine ,Bias ,Humans ,Philosophy, Medical ,Dissemination ,Reliability (statistics) ,Evidence-Based Medicine ,Information Dissemination ,030503 health policy & services ,Health Policy ,Public Health, Environmental and Occupational Health ,Reproducibility of Results ,030208 emergency & critical care medicine ,Evidence-based medicine ,Medical research ,Data science ,Philosophy of medicine ,Research Design ,Observational study ,0305 other medical science ,Data proliferation ,Medical Informatics - Abstract
Designing trials and studies to minimize confounding and bias is central to evidence-based medicine (EBM). The widespread use of recent technologies such as machine learning, smartphones, and the World Wide Web to collect, analyse, and disseminate information can improve the efficiency, reliability, and availability of medical research. However, it also has the potential to introduce new sources of significant, technology-induced evidential bias. This paper assesses the extent of that impact by reviewing some of the methods by which, and principles according to which, evidence is collected, analysed, and disseminated in EBM, supported by specific examples. It considers the effect of personal health tracking via smartphones, the current proliferation of research data and the influence of search engine "filter bubbles", the possibility of machine learning-driven study design, and the implications of using machine learning to seek patterns in large quantities of data, for example from observational studies and medical record databases. It concludes that new technology may introduce profound new sources of bias that current EBM frameworks do not accommodate. It also proposes new approaches that could be incorporated into EBM theory to mitigate the most obvious risks, and suggests where further assessment of the practical implications is needed.
- Published
- 2018
20. Life Cycle Management Considerations of Remotely Sensed Geospatial Data and Documentation for Long Term Preservation
- Author
-
Steven Kempler and Mohammad G. Khayat
- Subjects
Geospatial analysis ,business.industry ,media_common.quotation_subject ,Data management plan ,Usability ,Library and Information Sciences ,computer.software_genre ,Metadata ,World Wide Web ,Common Source Data Base ,Geography ,Documentation ,Quality (business) ,business ,Data proliferation ,computer ,media_common - Abstract
As geospatial missions age, one of the challenges for the usability of data is the availability of relevant and updated metadata with sufficient documentation that can be used by future generations of users to gain knowledge from the original data. Given that remote sensing data undergo many intermediate processing steps, an understanding of, for example, the exact algorithms employed and the quality of the data produced could be key considerations for these users. As interest in global climate data increases, documentation about older data, their origins, and their provenance is valuable to first-time users attempting to perform historical climate research or comparative analysis of global change. Incomplete or missing documentation could be what stands in the way of a new researcher attempting to use the data. Therefore, preservation of documentation and related metadata is sometimes just as critical as the preservation of the original observational data. The Goddard Earth Sciences–Data and Informa...
- Published
- 2015
21. Data journals: A survey
- Author
-
Donatella Castelli, Paolo Manghi, Alice Tani, and Leonardo Candela
- Subjects
Information Systems and Management ,Computer Networks and Communications ,business.industry ,Computer science ,media_common.quotation_subject ,05 social sciences ,Big data ,Data publishing ,Library and Information Sciences ,050905 science studies ,Data science ,Publishing ,Data quality ,Quality (business) ,0509 other social sciences ,Information society ,050904 information & library sciences ,business ,Citation ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,Data proliferation ,Information Systems ,media_common - Abstract
Data occupy a key role in our information society. However, although the amount of published data continues to grow and terms such as data deluge and big data today characterize numerous research initiatives, much work is still needed in the direction of publishing data in order to make them effectively discoverable, available, and reusable by others. Several barriers hinder data publishing, from lack of attribution and rewards, vague citation practices, and quality issues to a rather general lack of a data-sharing culture. Lately, data journals have overcome some of these barriers. In this study of more than 100 currently existing data journals, we describe the approaches they promote for data set description, availability, citation, quality, and open access. We close by identifying ways to expand and strengthen the data journals approach as a means to promote data set access and exploitation.
- Published
- 2015
22. Data Modeling and Data Analytics: A Survey from a Big Data Perspective
- Author
-
André Ribeiro, Afonso Silva, and Alberto Rodrigues da Silva
- Subjects
Analytics ,business.industry ,Computer science ,Data quality ,Big data ,Unstructured data ,business ,Data science ,Data proliferation ,Data warehouse ,Data virtualization ,Data modeling - Abstract
In recent years we have been witnessing tremendous growth in the volume and availability of data. This results primarily from the emergence of a multitude of sources (e.g. computers, mobile devices, sensors, or social networks) that are continuously producing structured, semi-structured, or unstructured data. Database Management Systems and Data Warehouses are no longer the only technologies used to store and analyze datasets, namely due to the volume and complex structure of today's data, which degrade their performance and scalability. Big Data is one of the recent challenges, since it implies new requirements in terms of data storage, processing, and visualization. Despite that, properly analyzing Big Data can bring great advantages, because it allows patterns and correlations in datasets to be discovered. Users can use this processed information to gain deeper insights and to get business advantages. Thus, data modeling and data analytics have evolved so that we are able to process huge amounts of data without compromising performance and availability, instead "relaxing" the usual ACID properties. This paper provides a broad view and discussion of the current state of this subject with a particular focus on data modeling and data analytics, describing and clarifying the main differences between the three main approaches in what concerns these aspects, namely: operational databases, decision support databases, and Big Data technologies.
- Published
- 2015
23. EUDAT - A Pan-European Perspective on Data Management
- Author
-
Johannes Reetz, Shaun de Witt, Mark van de Sanden, and Damien Lecarpentier
- Subjects
Digital curation ,Knowledge management ,020205 medical informatics ,business.industry ,Best practice ,Data management ,010102 general mathematics ,Digital data ,Perspective (graphical) ,02 engineering and technology ,01 natural sciences ,Metadata ,0202 electrical engineering, electronic engineering, information engineering ,0101 mathematics ,business ,Data proliferation ,Dissemination - Abstract
Data management planning - thinking in advance about what will happen to data produced during the research process - is increasingly required by national research funding agencies, and data management guidelines for Horizon 2020 research projects were released by the EU in December 2013 (Guidelines on Data Management in Horizon 2020). Similar guidelines have been issued by the US Department of Energy (Statement on Digital Data Management), Australia (ANDS Data Management Plans), and many other countries. The EUDAT project exists in part to disseminate and promote best practice in data management for twenty-first century research, and to provide support for communities in adopting basic principles such as data registration, metadata creation, and data movement. As part of its mission to help researchers and research communities manage and preserve their data, EUDAT works with the world-recognised Digital Curation Centre on a version of their widely used DMPonline tool, which will capture the H2020 guidelines in a data management planning tool tailored to the emerging needs of European research. EUDAT is building a Collaborative Data Infrastructure (CDI) as a pan-European solution to the challenge of data proliferation and associated management in Europe's scientific and research communities. The CDI will allow researchers to share data within and between communities and enable them to carry out their research effectively. Our mission is to provide a solution that will be affordable, trustworthy, robust, persistent, open, and easy to use.
- Published
- 2017
24. Revisiting Sensemaking: The case of the Digital Decision Network Application (DigitalDNA)
- Author
-
Elizabeth Archer and Glen Barnes
- Subjects
management information systems ,Social Sciences and Humanities ,Computer science ,Big data ,02 engineering and technology ,Education ,World Wide Web ,dashboards ,Data visualization ,big data ,0202 electrical engineering, electronic engineering, information engineering ,Flexibility (engineering) ,LC8-6691 ,business.industry ,05 social sciences ,050301 education ,020207 software engineering ,Sensemaking ,Special aspects of education ,Data science ,data-use ,Management information systems ,Analytics ,Sciences Humaines et Sociales ,User interface ,business ,0503 education ,Data proliferation
During this age of data proliferation, heavy reliance is placed on data visualisation to support users in making sense of vast quantities of information. Informational dashboards have become the must-have accoutrement for Higher Education institutions, with various stakeholders jostling for development priority. Due to time pressure and user demands, the focus of the development process is often on designing for each stakeholder and on the visual and navigational aspects. Dashboards are designed to make data visually appealing and easy to relate to and understand; unfortunately this may mask data issues and create an impression of rigour where it is not justified. This article proposes that the underlying logic behind current dashboard development is limited in the flexibility, scalability, and responsiveness required in the demanding landscape of Big Data and Analytics, and explores an alternative approach to data visualisation and sense making. It suggests that the first step required to address these issues is the development of an enriched database which integrates key indicators from various data sources. The database is designed for problem exploration, allowing users freedom in navigating between various data levels, and can then be overlaid with any user interface for dashboard generation for a multitude of stakeholders. Dashboards merely become tools providing users an indication of the types of data available for exploration. A Design Research approach is shown, along with a case study to illustrate the benefits, showcasing various views developed for diverse stakeholders employing this approach, specifically the Digital Decision Network Application (DigitalDNA) employed at Unisa.
- Published
- 2017
25. Training-based Workforce Development in Advanced Computing for Research and Education (ACoRE)
- Author
-
Yi Jiang, Yubao Wu, Semir Sarajlic, Neranjan Edirisinghe, and Gregori Faroux
- Subjects
Engineering ,Knowledge management ,business.industry ,Big data ,02 engineering and technology ,Supercomputer ,Workforce development ,01 natural sciences ,010305 fluids & plasmas ,HPC Challenge Benchmark ,Data sharing ,Engineering management ,Procurement ,0103 physical sciences ,Management system ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,Data proliferation - Abstract
Data proliferation across all domains is enabled by advances in data sharing and procurement that have led to an explosion in the volume of data worldwide. The exponential growth in data has provided opportunities for machine learning and computational algorithms that benefit from greater availability of data. Access to robust and resilient high performance computing (HPC) resources is available through campus and/or national centers. The growth in demand for advanced computing resources from the non-traditional HPC research community leads to an increasing strain on user support at local centers and/or national programs such as XSEDE, which creates challenges. At Georgia State University (GSU), we have a diverse HPC user community of over 300 users who represent more than 40 disciplines and use about four million CPU hours annually. We implemented a workforce development model to support our HPC community through a combination of local and external workshops as well as the integration of HPC training within classes such as Big Data Programming, Scientific Computing, and Parallel and Distributed Computing (PDC). As part of our approach, we consolidated our three disparate local HPC resources under one management system that provides a user environment similar to that of the national resources that are part of XSEDE, which simplifies user transition from local to national resources.
- Published
- 2017
26. The challenge of OwnData service features: A step towards an informed choice of an OwnData service
- Author
-
Jan T. Frece
- Subjects
Service (business) ,business.industry ,Computer science ,media_common.quotation_subject ,05 social sciences ,Internet privacy ,0507 social and economic geography ,02 engineering and technology ,Information repository ,Data visualization ,020204 information systems ,Data quality ,Computer data storage ,0202 electrical engineering, electronic engineering, information engineering ,media_common.cataloged_instance ,Quality (business) ,Data as a service ,European union ,business ,050703 geography ,Data proliferation ,media_common - Abstract
The goal of this paper is to raise awareness of the fact that the choice of data storage system is an increasingly significant one to make and to propose a number of dimensions to categorize such systems in a simple yet meaningful way. Many data subjects already use some kind of data service to store their messages, pictures, music, videos, etc., and in the light of increasing data production and a growing number of data-based services, this trend is expected to continue. Advancing from storing pop songs to storing personal health or geo-location data, however, requires data subjects to get themselves acquainted with the quality features of data storage providers, should they wish to make an informed decision. The introduction chapter explores the consequences of the GDPR implementation in the European Union regarding the expectations towards storage of personal data, while the subsequent chapter explains the labeling decisions in this paper. The two ensuing chapters present the quality criteria for data storage widely used in contemporary reviews and complete them with additional dimensions advocated for by the author. In a final step, a quick assessment of popular data storage providers is made, using the discussed dimensions, to demonstrate the categorical imbalance in the data storage provider community.
- Published
- 2017
27. Comparing HiveQL and MapReduce methods to process fact data in a data warehouse
- Author
-
Prajyoti Dsilva, Sweedle Mascarnes, and Haince Denis Pen
- Subjects
Database ,Computer science ,business.industry ,Data transformation ,Big data ,Unstructured data ,computer.software_genre ,Data science ,Data warehouse ,Data modeling ,Data quality ,Data architecture ,business ,computer ,Data proliferation - Abstract
Today Big Data is one of the most widely discussed technologies, explored throughout the world by technology enthusiasts and academic researchers. The reason for this is the enormous amount of data generated every second of each day: every webpage visited, every text message sent, every post on social networking websites, check-in information, mouse clicks, etc. is logged. This data needs to be stored and retrieved efficiently; moreover, much of it is unstructured, so traditional methods of storing data fail. There is a need for an efficient, scalable, and robust architecture that stores enormous amounts of unstructured data and can be queried as and when required. In this paper, we propose a novel methodology to build a data warehouse over big data technologies while specifically addressing the issues of scalability and user performance. Our emphasis is on building a data pipeline which can be used as a reference for future research on methodologies to build a data warehouse over big data technologies for either structured or unstructured data sources. We demonstrate the processing of data for retrieving facts from the data warehouse using two techniques, namely HiveQL and MapReduce. (An illustrative fact-aggregation sketch follows this record.)
- Published
- 2017
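To make the comparison concrete, the sketch below shows the kind of fact-table aggregation that either a HiveQL GROUP BY query or a MapReduce job would compute; the table, columns, and the pandas stand-in are illustrative assumptions, not the paper's dataset or code.

```python
import pandas as pd

# Toy fact table (the kind of data either a HiveQL query or a MapReduce job would scan).
facts = pd.DataFrame({
    "store":  ["S1", "S1", "S2", "S2", "S2"],
    "day":    ["2017-01-01", "2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02"],
    "amount": [10.0, 5.5, 7.0, 3.0, 12.5],
})

# Roughly equivalent to:  SELECT store, day, SUM(amount) FROM facts GROUP BY store, day;
# which a MapReduce job would express as map(emit (store, day) -> amount) + reduce(sum).
summary = facts.groupby(["store", "day"], as_index=False)["amount"].sum()
print(summary)
```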
28. Data analytics and web insights in area of data mining and analytics
- Author
-
Abzetdin Adamov
- Subjects
business.industry ,Computer science ,Big data ,Unstructured data ,computer.software_genre ,Data science ,Data warehouse ,World Wide Web ,Analytics ,Business intelligence ,Data architecture ,Data mining ,business ,Data proliferation ,computer ,Data virtualization - Abstract
The extremely fast growth of Internet services, web and mobile applications, and the advance of the related pervasive, ubiquitous, and cloud computing concepts have stimulated the production of tremendous amounts of data, partially available online (call metadata, texts, emails, social media updates, photos, videos, location, etc.). Even with the power of today's modern computers, it is still a big challenge for business and government organizations to manage, search, analyze, and visualize this vast amount of data as information. Data-intensive computing, which is intended to address these problems, has become quite active during the last few years, yielding strong results. A data-intensive computing framework is a complex system which includes hardware, software, communications, and Distributed File System (DFS) architecture. Just a small part of this huge amount is structured (databases, XML, logs) or semi-structured (web pages, email); over 90% of this information is unstructured, which means the data does not have a predefined structure and model. Generally, unstructured data is useless unless data mining and analysis techniques are applied. At the same time, data is only worth anything if you can process and understand it; otherwise it is useless. Two key components of any data-intensive system are data storage and data processing. So, which technologies, techniques, platforms, and tools are best for storing and processing Big Data? How does the Big Data era affect the technological landscape? These and many other questions will be answered during the talk.
- Published
- 2017
29. Research data reusability: conceptual foundations, barriers and enabling technologies
- Author
-
Costantino Thanos
- Subjects
Knowledge management ,Computer science ,Explicit knowledge ,Data discoverability ,02 engineering and technology ,Data publishing ,Library and Information Sciences ,Data abstraction ,lcsh:Communication. Mass media ,Data modeling ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Business and International Management ,Dimension (data warehouse) ,Metadata ,business.industry ,Communication ,lcsh:Information resources (General) ,Data reuse ,data reuse ,data discoverability ,data understandability ,relational thinking ,data abstraction ,data representation ,metadata ,explicit knowledge ,tacit knowledge ,data publishing ,Relational thinking ,Tacit knowledge ,Usability ,lcsh:P87-96 ,Data warehouse ,Computer Science Applications ,Data presentation ,Data quality ,020201 artificial intelligence & image processing ,business ,Data proliferation ,lcsh:ZA3040-5185 ,Data understandability - Abstract
High-throughput scientific instruments are generating massive amounts of data. Today, one of the main challenges faced by researchers is to make the best use of the world’s growing wealth of data. Data (re)usability is becoming a distinct characteristic of modern scientific practice. By data (re)usability, we mean the ease of using data for legitimate scientific research by one or more communities of research (consumer communities) that is produced by other communities of research (producer communities). Data (re)usability allows the reanalysis of evidence, reproduction and verification of results, minimizing duplication of effort, and building on the work of others. It has four main dimensions: policy, legal, economic and technological. The paper addresses the technological dimension of data reusability. The conceptual foundations of data reuse as well as the barriers that hamper data reuse are presented and discussed. The data publication process is proposed as a bridge between the data author and user and the relevant technologies enabling this process are presented.
- Published
- 2017
- Full Text
- View/download PDF
30. Object storage architecture in cloud for unstructured data
- Author
-
S. Samundiswary and Nilma M Dongre
- Subjects
Database ,Computer science ,business.industry ,Big data ,Unstructured data ,Cloud computing ,Information repository ,computer.software_genre ,Object storage ,Converged storage ,business ,Cloud storage ,Data proliferation ,computer - Abstract
Digital data, both structured and unstructured, is everywhere and continues to grow rapidly, approximately doubling every two years. This increasing use of data creates tremendous data storage challenges across industry. Object-based storage can be applied in both public and private cloud service models. Object-based storage is used to store data from various users through a REST API over the HTTP(S) protocol, and it also stores virtual machine images as objects in a cloud environment. Cloud storage services are used by enterprises, home users, and researchers to store files that vary in size and format. With the huge amount of stored data, which is mostly unstructured, comes the related issue of finding the exact data relevant to a user's requirements. This paper covers the need for object storage systems in cloud computing for massive unstructured digital static data, why traditional approaches are insufficient to meet this challenge, the basic architecture of object storage systems, and how object storage supports the cloud storage environment in storing unstructured data effectively. Additionally, we compare and contrast the major object-based cloud storage providers in use today, including Google, Microsoft, and Amazon. (An illustrative REST sketch follows this record.)
- Published
- 2017
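As a schematic of the REST-style access described above, the following sketch PUTs and GETs one object over HTTP(S); the endpoint, bucket, and object key are placeholders, and real providers additionally require authenticated or signed requests.

```python
import requests

# Placeholder endpoint and bucket (will not resolve); real services such as S3,
# Azure Blob Storage, or Google Cloud Storage also require signed/authenticated
# requests, which are omitted here for brevity.
BASE = "https://objectstore.example.com/mybucket"

try:
    # Store an object: the object name is the URL path, the value is the request body.
    resp = requests.put(f"{BASE}/logs/2017-05-01.txt", data=b"sensor log 2017-05-01 ...")
    print("PUT status:", resp.status_code)

    # Retrieve the same object back over HTTP(S).
    resp = requests.get(f"{BASE}/logs/2017-05-01.txt")
    print("GET status:", resp.status_code, "bytes:", len(resp.content))
except requests.exceptions.RequestException as exc:
    print("request failed (placeholder endpoint):", exc)
```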
31. A Study on Big Data Integration with Data Warehouse
- Author
-
Arati Mohapatro and Tapan Kumar Das
- Subjects
Engineering ,Database ,business.industry ,Big data ,Volume (computing) ,Unstructured data ,computer.software_genre ,Data science ,Data warehouse ,Business intelligence ,Leverage (statistics) ,business ,computer ,Data proliferation ,Data virtualization - Abstract
The amount of data in the world is exploding. Data is being collected and stored at unprecedented rates. The challenge is not only to store and manage the vast volume of data, but also to analyze it and extract meaningful value from it. In the last decade, Data Warehousing technology has evolved for efficiently storing data from different sources for business intelligence purposes. In the age of Big Data, it is important to remodel the existing warehouse system to help organizations make the most of unstructured data together with their existing Data Warehouse. As Big Data continues to revolutionize how we use data, this paper addresses how to leverage big data by effectively integrating it into the data warehouse.
- Published
- 2014
32. CertDB: A Practical Data Analysis System on Big Data
- Author
-
Ge Fu, Jianqiang Li, Qi Wang, Bing Jia, Jia Cui, and Bo Wang
- Subjects
Data element ,Database ,Computer science ,Data quality ,Data architecture ,computer.software_genre ,Data science ,Data proliferation ,computer ,Data migration ,Data warehouse ,Data modeling ,Data virtualization - Abstract
With the rapid development of big data technologies, more and more companies focus on the integration and analysis of the large-scale data produced as they operate, in order to discover and solve their internal problems. Therefore, a data analysis system is urgently needed to process the big data generated by heterogeneous data sources (e.g. online and offline structured data, semi-structured data, and unstructured data). Unfortunately, there is no integrated tool which can universally collect heterogeneous data sources and accomplish data ETL, storage, analysis, and knowledge output. In this paper, an integrated tool, CertDB, is proposed, which provides one-stop services of data collection, analysis, and knowledge output. CertDB serves both expert and inexperienced data analysts by providing both a graphical BI interface and a programming interface.
- Published
- 2016
33. Ten Simple Rules for Digital Data Storage
- Author
-
Sarah Mount, Timothée Poisot, Kara H. Woo, David LeBauer, Pauline Barmby, Edmund Hart, Naupaka Zimmerman, Patrick Mulrooney, Jeffrey W. Hollister, and François Michonneau
- Subjects
0301 basic medicine ,Computer and Information Sciences ,010504 meteorology & atmospheric sciences ,Databases, Factual ,Computer science ,QH301-705.5 ,Information Storage and Retrieval ,010501 environmental sciences ,Information repository ,Research and Analysis Methods ,01 natural sciences ,Computer Architecture ,World Wide Web ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Databases ,0302 clinical medicine ,Genetics ,Humans ,Biology (General) ,Molecular Biology ,Ecology, Evolution, Behavior and Systematics ,0105 earth and related environmental sciences ,Data administration ,Data Management ,Taxonomy ,Data Processing ,Data curation ,Ecology ,Ecology and Environmental Sciences ,Biology and Life Sciences ,Biodiversity ,Research Assessment ,Computer Hardware ,Data warehouse ,Reproducibility ,030104 developmental biology ,Editorial ,Data Acquisition ,Computational Theory and Mathematics ,Modeling and Simulation ,Data quality ,Information Technology ,Data proliferation ,030217 neurology & neurosurgery ,Data migration ,Data virtualization - Abstract
Data is the central currency of science, but the nature of scientific data has changed dramatically with the rapid pace of technology. This change has led to the development of a wide variety of data formats, dataset sizes, data complexity, data use cases, and data sharing practices. Improvements in high throughput DNA sequencing, sustained institutional support for large sensor networks, and sky surveys with large-format digital cameras have created massive quantities of data. At the same time, the combination of increasingly diverse research teams and data aggregation in portals (e.g. for biodiversity data, GBIF or iDigBio) necessitates increased coordination among data collectors and institutions. As a consequence, “data” can now mean anything from petabytes of information stored in professionally-maintained databases, through spreadsheets on a single computer, to hand-written tables in lab notebooks on shelves. All remain important, but data curation practices must continue to keep pace with the changes brought about by new forms and practices of data collection and storage.
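One concrete practice in this spirit is to keep data in a plain-text, non-proprietary format with a machine-readable metadata sidecar. The sketch below is a hypothetical example of that pattern; the file names, fields, and license value are assumptions, not rules quoted from the article.

```python
# Hypothetical example: tabular data in plain CSV plus a JSON metadata
# sidecar documenting what the columns mean and how the data may be reused.
import csv, json, datetime

records = [{"site": "A", "count": 12}, {"site": "B", "count": 7}]

with open("observations.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["site", "count"])
    writer.writeheader()
    writer.writerows(records)

metadata = {
    "title": "Example site counts",
    "created": datetime.date.today().isoformat(),
    "columns": {"site": "site identifier", "count": "individuals observed"},
    "license": "CC0",
}
with open("observations.metadata.json", "w") as fh:
    json.dump(metadata, fh, indent=2)
```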
- Published
- 2016
34. Developing a Data Vault
- Author
-
Stuart Lewis, Lorraine Beard, Claire Knowles, Robin Taylor, Thomas Higgins, and Mary McDerby
- Subjects
Database ,Cost efficiency ,Computer science ,Information repository ,computer.software_genre ,Active data ,lcsh:Z ,lcsh:Bibliography. Library science. Information resources ,Data efficiency ,Component (UML) ,Transfer (computing) ,Stewardship ,computer ,Data proliferation - Abstract
Research data is being generated at an ever-increasing rate. This brings challenges in how to store, analyse, and care for the data. A component of this problem is the stewardship of data and associated files that need a safe and secure home for the medium to long-term. As part of typical suites of Research Data Management services, researchers are provided with large allocations of ‘active data storage’. This is often stored on expensive and fast disks to enable efficient transfer and working with large amounts of data. However, over time this active data store fills up, and researchers need a facility to move older but still valuable data to cheaper storage for long-term care. In addition, research funders are increasingly requiring data to be stored in forms that allow it to be described and retrieved in the future. For data that can’t be shared publicly in an open repository, a closed solution is required that can make use of offline or near-line storage for cost efficiency. This paper describes a solution to these requirements, called the Data Vault.
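A minimal sketch of the archiving idea (moving cold files from active storage to a cheaper vault while keeping them findable) might look like the following. The paths, age threshold, and JSON-lines catalogue are assumptions for illustration and do not reflect the actual Data Vault implementation.

```python
# Minimal sketch: move files untouched for N days from the active store to
# an archive directory and record a small metadata entry for later retrieval.
import json, shutil, time
from pathlib import Path

ACTIVE = Path("active_storage")        # hypothetical fast, expensive store
VAULT = Path("data_vault")             # hypothetical cheaper long-term store
CATALOGUE = VAULT / "catalogue.jsonl"
MAX_AGE_DAYS = 365

VAULT.mkdir(exist_ok=True)
now = time.time()
with CATALOGUE.open("a") as cat:
    for path in ACTIVE.glob("**/*"):
        if path.is_file() and now - path.stat().st_mtime > MAX_AGE_DAYS * 86400:
            target = VAULT / path.name
            shutil.move(str(path), target)          # move to cheaper storage
            cat.write(json.dumps({                  # keep it findable
                "file": target.name,
                "original_path": str(path),
                "archived_at": int(now),
            }) + "\n")
```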
- Published
- 2016
35. A Major Threat to Big Data
- Author
-
Arunima Dubey and Satyajee Srivastava
- Subjects
Information privacy ,business.industry ,Computer science ,Big data ,Data security ,Unstructured data ,computer.software_genre ,Computer security ,Relational database management system ,Data quality ,business ,computer ,Transaction data ,Data proliferation - Abstract
Big Data has become one of the most talked-about IT trends. Because it can handle all forms of data, including unstructured data, big data has become the preferred choice over relational database management systems (RDBMS) for analysing huge amounts of data; it supports analysis of petabytes of data, which is not feasible in a conventional database system. But big data comes with drawbacks as well as advantages, and big data analytics faces several challenges: validating end-point input, handling enormous amounts of data at very large scale, ensuring the security of transactional data, ensuring safe and secure storage, sharing variants of data with third parties, and analysing the data without skipping any piece of information when generating reports and drawing conclusions. In this paper, we consider the major challenge to big data: data security. Even with its enormous advantages, industry hesitates to shift from conventional databases to big data because of data privacy concerns, and many large organizations do not consider big data a safe option since the data can be accessed by anyone. Researchers have suggested methods such as encryption-decryption and anonymization-based techniques to address this threat, but because of the three V's of Big Data identified by Gartner (velocity, volume, and variety), these methods have not proved sufficient. This paper discusses the various concerns around big data, with a primary focus on data security.
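As one illustration of the mitigation techniques the abstract mentions, the sketch below pseudonymizes direct identifiers with a salted hash before records enter a big data store. It is only a sketch under assumed record fields; a real deployment would also need key management and a broader security design.

```python
# Minimal anonymization sketch: replace direct identifiers with salted
# pseudonyms before records enter the big data store. The record fields and
# salt handling are illustrative assumptions, not a complete security design.
import hashlib, os

SALT = os.urandom(16)  # in practice the salt/key must itself be protected

def pseudonymize(value: str) -> str:
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:16]

record = {"customer_id": "C-1001", "email": "alice@example.com", "amount": 59.90}
safe_record = {
    "customer_id": pseudonymize(record["customer_id"]),
    "email": pseudonymize(record["email"]),
    "amount": record["amount"],   # non-identifying fields pass through
}
print(safe_record)
```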
- Published
- 2016
36. Shopping for privacy: Purchase details leaked to PayPal
- Author
-
Gunes Acar, Thomas Peetz, Bettina Berendt, and Sören Preibusch
- Subjects
Computer Networks and Communications ,media_common.quotation_subject ,Electronic commerce ,Internet privacy ,02 engineering and technology ,Data minimisation ,020204 information systems ,Management of Technology and Innovation ,PayPal ,0202 electrical engineering, electronic engineering, information engineering ,Web navigation ,Product (category theory) ,Leakage (economics) ,Online payments ,media_common ,Payment providers ,Marketing ,business.industry ,Tracking ,Data leakage ,Advertising ,Payment ,Computer Science Applications ,Data sharing ,Data aggregator ,Privacy ,020201 artificial intelligence & image processing ,Electronic retailing ,business ,Personally identifiable information ,Data proliferation - Abstract
We present a new form of online tracking: explicit, yet unnecessary leakage of personal information and detailed shopping habits from online merchants to payment providers. In contrast to the widely debated tracking of Web browsing, online shops make it impossible for their customers to avoid this dissemination of their data. We record and analyse leakage patterns for the 881 most popular US Web shops sampled from actual Web users' online purchase sessions. More than half of the sites we analysed shared product names and details with PayPal, allowing the payment provider to build up fine-grained and comprehensive consumption profiles about its clients across the sites they buy from, subscribe to, or donate to. In addition, PayPal forwards customers' shopping details to Omniture, a third-party data aggregator with even larger tracking reach than PayPal itself. Leakage to PayPal is commonplace across product categories and includes details of medication or sex toys. We provide recommendations for merchants. (Electronic Commerce Research and Applications, vol. 15, pp. 52-64, http://dx.doi.org/10.1016/j.elerap.2015.11.004)
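Purely as an illustration of the kind of leakage pattern the study reports (and not the authors' measurement tooling), a captured checkout request could be screened for product details sent to third-party payment or analytics domains roughly as follows; the domain list and URL format are assumptions.

```python
# Illustrative leakage check: does a captured checkout request send the
# product name to a third-party payment/analytics domain? Assumed inputs.
from urllib.parse import urlparse, parse_qs

THIRD_PARTIES = {"paypal.com", "omniture.com"}

def leaks_product(request_url: str, product_name: str) -> bool:
    parsed = urlparse(request_url)
    domain = ".".join(parsed.netloc.split(".")[-2:])
    if domain not in THIRD_PARTIES:
        return False
    params = parse_qs(parsed.query)
    return any(product_name.lower() in " ".join(v).lower() for v in params.values())

url = "https://www.paypal.com/checkout?item_name=Allergy+Medication&amount=12.99"
print(leaks_product(url, "allergy medication"))  # True: product detail leaked
```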
- Published
- 2016
37. Data as a Digital Resource
- Author
-
Gintare Surblyte
- Subjects
Data access ,Computer science ,General Data Protection Regulation ,Data quality ,Digital asset ,Data Protection Act 1998 ,Digital economy ,Computer security ,computer.software_genre ,Machine-readable data ,computer ,Data proliferation - Abstract
The digital economy has been termed a data-driven economy. Many digital business models are based on the collection and processing of various types of data. Whereas the protection of personal data falls under the EU legal framework, first and foremost the General Data Protection Regulation, data beyond personal data may seem not to be protected at all. This circumstance matters in light of the EU Commission's considerations on introducing rights related to the protection of non-personal data. Indeed, whether such data, called "industrial data", needs protection, and how access to it should be handled, have been at the centre of a recent academic debate. Yet, whereas the protection of different types of data is highly relevant for the transfer and processing of data, for the question of access to data, data has to be considered more broadly, i.e. as a digital resource. Since in the digital economy it is no longer individual pieces of data (personal and/or non-personal) that play a role, but rather collections of data, digital data sets (which may consist of both personal and non-personal data) have become a digital asset of companies. The value of such data sets derives from the analysis of data, and in particular the analysis of real-time data. Because data is constantly changing, the question that arises as regards access to data in the digital economy relates not so much to access to data as such, but rather to access to the sources of data, since only then can access to real-time data be provided. In light of this, the current debate on so-called "industrial data" may appear too static, and it considers only part of the data relevant to companies operating in the digital economy. The question is what the legal framework for an optimal allocation of data as a digital resource should be, and whether the question of access can be solved with the currently available legal tools.
- Published
- 2016
38. Data Trawling and Security Strategies
- Author
-
Venkata Karthik Gullapalli and Aishwarya Asesh
- Subjects
business.industry ,Computer science ,Trawling ,Volume (computing) ,Cloud computing ,Computer security ,computer.software_genre ,World Wide Web ,Server ,Computer data storage ,Leakage (economics) ,business ,Data proliferation ,computer ,Hacker - Abstract
The amount of data in the world keeps increasing, and computers make it easy to save that data. Companies offer data storage through cloud services, and the amount of data stored on these servers is growing rapidly. In data mining, the data is stored electronically and the search is automated, or at least augmented, by computer. As the volume of data increases, inexorably, the proportion of it that people understand decreases alarmingly. This paper examines the data leakage problem that arises because services like Facebook and Google store all of a user's data unencrypted on their servers, making it easy for them, or for governments and hackers, to monitor that data.
- Published
- 2014
39. Data Mining System and Applications: A Review
- Author
-
Shrinivas Deshpande and Vilas M. Thakare
- Subjects
business.industry ,Data stream mining ,Computer science ,Information technology ,computer.software_genre ,Data science ,Local information systems ,Data warehouse ,Knowledge extraction ,Data mining ,Data architecture ,business ,computer ,Data proliferation ,Data virtualization - Abstract
In the information technology era, information plays a vital role in every sphere of human life. It is very important to gather data from different data sources, store and maintain the data, generate information and knowledge, and disseminate data, information, and knowledge to every stakeholder. Due to the widespread use of computers and electronic devices and the tremendous growth in computing power and storage capacity, data collection has grown explosively. Storing the data in a data warehouse enables the entire enterprise to access a reliable, current database. Analysing this vast amount of data and drawing fruitful conclusions and inferences requires special tools called data mining tools. This paper gives an overview of data mining systems and some of their applications.
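To make the idea of a data mining tool concrete, here is a minimal sketch of one classic task such tools support, frequent pattern mining over transactions. The baskets and support threshold are invented for illustration.

```python
# Minimal frequent-pattern mining sketch: count item pairs that co-occur
# in transactions. Transactions and support threshold are assumptions.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"milk", "butter"},
]
MIN_SUPPORT = 2

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {p: c for p, c in pair_counts.items() if c >= MIN_SUPPORT}
print(frequent_pairs)   # {('bread', 'milk'): 2, ('butter', 'milk'): 2}
```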
- Published
- 2010
40. Overcoming inadequate documentation
- Author
-
Jinfang Niu
- Subjects
Descriptive knowledge ,Documentation ,Knowledge management ,Absorptive capacity ,Computer science ,business.industry ,Library and Information Sciences ,business ,Data proliferation ,Data science ,Information Systems - Abstract
Secondary data users need three types of knowledge to analyze secondary data: knowledge about the data, background knowledge necessary to understand and interpret the data, and data analysis skills. Part of the knowledge about data is provided by the documentation of the data; background knowledge and data analysis skills are internalized as users' absorptive capacity. When documentation and their absorptive capacity are inadequate, users need to seek outside information to use secondary data. This paper analyzes the causes of inadequate documentation and reports why and how secondary users seek outside information. Based on the findings, it discusses implications for facilitating secondary data use.
- Published
- 2009
41. Proliferation framework on input data set to improve memory latency in multicores for optimization
- Author
-
Srinath N K and Sumalatha Aradhya
- Subjects
business.industry ,Computer science ,Data management ,media_common.quotation_subject ,Optimizing compiler ,Thread (computing) ,computer.software_genre ,CAS latency ,Computer architecture ,Debugging ,Compiler ,Multicore architecture ,business ,computer ,Data proliferation ,media_common - Abstract
In today's HPC world, numerous applications execute at high speed, and multicore architectures provide wide scope for exploration in any kind of high-end application. This paper discusses the ability to handle high-end input triggers by managing core utilities at the lower levels of the machine. It covers optimized memory block utilization, debugging proliferation, and data management at the configured-input level. The data proliferation framework model is named COMCO (Component for Compiler Optimization), and it elaborates how to handle configured inputs at the OS level. The paper explores handling configured inputs safely at the multi-thread level using the COMCO strategy.
- Published
- 2015
42. ShareInsights
- Author
-
Mukund Deshpande, Dhruva Ray, Sameer Dixit, and Avadhoot Agasti
- Subjects
Data collection ,business.industry ,Computer science ,Big data ,computer.software_genre ,Data science ,Data warehouse ,Visualization ,Data modeling ,Data visualization ,Data model ,Data quality ,Data architecture ,business ,computer ,Data proliferation ,Data integration ,Data virtualization - Abstract
The field of data analysis seeks to extract value from data for either business or scientific benefit. This field has seen renewed interest with the advent of big data technologies and a new organizational role, the data scientist. Even with this new-found focus, the task of analyzing large amounts of data is still challenging and time-consuming. The essence of data analysis involves setting up data pipelines consisting of several operations chained together, starting from data collection, data quality checks, data integration, and data analysis through to data visualization (including the setting up of interaction paths in that visualization). In our opinion, the challenges stem from the technology diversity at each stage of the data pipeline as well as the lack of process around the analysis. In this paper we present a platform that aims to significantly reduce the time it takes to build data pipelines. The platform attempts to achieve this in the following ways: allow the user to describe the entire data pipeline with a single language and idioms, all the way from data ingestion to insight expression (via visualization and end-user interaction); provide a rich library of parts that lets users quickly assemble a data analysis pipeline in the language; and allow a collaboration model in which multiple users work together on a data analysis pipeline and leverage and extend prior work with minimal effort. We studied the efficacy of the platform in a data hackathon competition conducted in our organization. The hackathon provided a way to study the impact of the approach: rich data pipelines which traditionally took weeks to build were constructed and deployed in hours. Consequently, we believe that the complexity of designing and running a data analysis pipeline can be significantly reduced, leading to a marked improvement in the productivity of data analysts and data scientists.
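A minimal sketch of the "describe the whole pipeline in one place" idea follows: a pipeline is just an ordered list of named stages chained over the data, from ingestion through quality checks and aggregation to a crude visualization. The stage names and toy data are assumptions; this is not the ShareInsights language itself.

```python
# Minimal pipeline-as-a-list sketch: stages chained over the data from
# ingestion to a crude text "visualization". Stage names/data are invented.
from functools import reduce

def ingest(_):
    return [{"country": "IN", "sales": 10}, {"country": "US", "sales": None},
            {"country": "IN", "sales": 30}]

def quality_check(rows):
    return [r for r in rows if r["sales"] is not None]   # drop bad records

def aggregate(rows):
    out = {}
    for r in rows:
        out[r["country"]] = out.get(r["country"], 0) + r["sales"]
    return out

def visualize(summary):
    for country, total in summary.items():
        print(f"{country:3} {'#' * total}")   # crude text chart
    return summary

pipeline = [ingest, quality_check, aggregate, visualize]
reduce(lambda data, stage: stage(data), pipeline, None)
```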
- Published
- 2015
43. An approach towards big data — A review
- Author
-
Palak Gupta and Nidhi Tyagi
- Subjects
Information privacy ,Database ,Distributed database ,business.industry ,Computer science ,Big data ,Unstructured data ,computer.software_genre ,Data science ,Data warehouse ,Data quality ,business ,Data proliferation ,computer ,Data virtualization - Abstract
Big Data refers to the analysis of significantly large collections of data that may contain user data, sensor data, or machine data. Analysing Big Data can deliver new business insights, open new markets, and create competitive advantages. Big Data consists of data sets of large magnitude (volume), with diverse representations including structured, semi-structured, and unstructured data (variety), that arrive fast (velocity). The real value of data is realized after analysis, i.e. after finding patterns, deriving meaning, and making decisions, so that the required data is ultimately available wherever it is needed. This paper discusses the need for Big Data and its analysis with the help of the four V's, reviews the everyday sources of data that generated the requirement for Big Data, and takes into account the associated challenges, privacy, security, and analytics. Because the security of Big Data is a major concern, the different types of cyber attacks and threat detection also need to be discussed.
- Published
- 2015
44. Understanding Azure Storage and Databases
- Author
-
David Gollob, Mike Manning, Marshall Copeland, Julian Soh, and Anthony Puca
- Subjects
World Wide Web ,Work product ,Computer science ,business.industry ,Data management ,Computer data storage ,Pendulum ,Intellectual property ,business ,Database transaction ,Data proliferation - Abstract
The computer storage pendulum has swung from people asking, “What would I do with all that storage?” about a decade ago to people now using personal storage media in amounts once only possible for the largest organizations in the world. As you are probably well aware, the average smartphone today has as much storage in it as a PC did in 1997. Data has proliferated in type, size, and audience over the last few years. This data proliferation over the last decade has jokingly been referred to as “datageddon.” Data storage and data management have become a real challenge for organizations worldwide, as they struggle to keep up with webcam, voice, e-mail, transaction, and other data feeds that they are required to retain as part of their business’s intellectual property or work product.
- Published
- 2015
45. Key Technology Research for Unstructured Data Cloud Storage: New Exploring
- Author
-
Julan Yi
- Subjects
Database ,business.industry ,Computer science ,Master data ,Unstructured data ,Information repository ,computer.software_genre ,Data science ,Computer data storage ,Key (cryptography) ,The Internet ,business ,computer ,Cloud storage ,Data proliferation - Abstract
As the Internet has shifted from traditional data to a mainstream of text documents, pictures, audio, and video, the structure of its data is gradually changing; the rapid growth and wide variety of unstructured data bring new challenges to Internet data storage management. This paper proposes solutions to the storage problems of massive unstructured data of all kinds, summarizes the key issues in achieving unified storage of unstructured data, and designs and implements a batch framework with unified data storage features to solve the problem of treating the various types of unstructured data uniformly.
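One way to picture unified storage of heterogeneous unstructured data is to treat every item as a blob plus a metadata record, as in the hypothetical sketch below; the SQLite layout, column names, and put helper are assumptions rather than the framework described in the paper.

```python
# Unified-storage sketch: every unstructured item (text, image, audio) is
# stored the same way, as a blob plus metadata. Layout/fields are assumed.
import hashlib, json, sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE objects (
    object_id TEXT PRIMARY KEY,   -- content hash
    media_type TEXT,              -- e.g. text/plain, image/png, audio/mpeg
    metadata TEXT,                -- free-form JSON
    payload BLOB)""")

def put(payload: bytes, media_type: str, **metadata) -> str:
    object_id = hashlib.sha256(payload).hexdigest()
    conn.execute("INSERT OR IGNORE INTO objects VALUES (?, ?, ?, ?)",
                 (object_id, media_type, json.dumps(metadata), payload))
    return object_id

oid = put(b"hello, unstructured world", "text/plain", source="web crawl")
row = conn.execute("SELECT media_type, metadata FROM objects WHERE object_id=?",
                   (oid,)).fetchone()
print(oid[:12], row)
```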
- Published
- 2015
46. Research issues in automatic database clustering
- Author
-
Le Gruenwald and Sylvain Guinepain
- Subjects
Database ,Process (engineering) ,Computer science ,Database administrator ,computer.software_genre ,Database design ,Resource (project management) ,Work (electrical) ,Dynamic database ,Cluster analysis ,computer ,Data proliferation ,Software ,Information Systems - Abstract
While a lot of work has been published on clustering of data on storage media, little has been done on automating this process. This is an important area because, with data proliferation, human attention has become a precious and expensive resource. Our goal is to develop an automatic and dynamic database clustering technique that will dynamically re-cluster a database with little intervention from a database administrator (DBA) and maintain an acceptable query response time at all times. In this paper we describe the issues that need to be solved when developing such a technique.
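To make the underlying idea concrete, a workload-driven clusterer could count which attributes are accessed together in a query log and group those whose co-access exceeds a threshold, as in the hedged sketch below; the log, threshold, and greedy grouping are illustrative assumptions, not the authors' technique.

```python
# Workload-driven clustering sketch: count attribute co-access in a query
# log and group frequently co-accessed attributes. Inputs are assumptions.
from collections import Counter
from itertools import combinations

query_log = [
    {"customer_id", "name", "email"},
    {"customer_id", "name"},
    {"order_id", "amount", "customer_id"},
    {"order_id", "amount"},
]
THRESHOLD = 2

co_access = Counter()
for attrs in query_log:
    for pair in combinations(sorted(attrs), 2):
        co_access[pair] += 1

clusters: list[set] = []
for (a, b), count in co_access.items():
    if count < THRESHOLD:
        continue
    home = next((c for c in clusters if a in c or b in c), None)
    if home is None:
        clusters.append({a, b})
    else:
        home.update({a, b})

print(clusters)   # attributes that should be stored close together
```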
- Published
- 2005
47. Managing the three Ps of seismic data: proliferation, pervasiveness, and persistence
- Author
-
Jess B. Kozman
- Subjects
Geographic information system ,Petroleum industry ,Computer science ,business.industry ,Data management ,Best practice ,Computer data storage ,Hierarchical storage management ,General Engineering ,business ,Data science ,Implementation ,Data proliferation - Abstract
The use of large three-dimensional seismic datasets in oil and gas exploration puts enormous strain on data storage systems and strategies. The purpose of this study was to determine the factors that contribute to challenges in storage of large seismic datasets, to collect information about strategies being successfully implemented to meet these challenges at large international oil and gas companies, and to document best practices and procedures for planning for continuing expansion of these datasets. Schlumberger Information Solutions (SIS), as a division of the world's largest provider of services to the oil and gas industry, is in a unique position to collect this information. SIS currently has seismic data management implementations with over 50 organizations in all major oil and gas exploration locations across the world. These engagements have produced a set of strategies for managing large seismic datasets that include hierarchical storage management (HSM), near-line tape robotic systems, and web-based Geographic Information System (GIS) metadata analysis tools. Effective implementation of these strategies allows major exploration organizations to effectively manage and plan for continued growth and propagation of seismic data across the life cycle of their exploration and production projects.
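As a toy illustration of the hierarchical storage management strategy the study surveys, a tier-assignment policy could map each seismic volume to online, near-line, or offline storage by last-access age; the tier boundaries and catalogue fields below are assumptions, not SIS's implementation.

```python
# HSM policy sketch: assign each seismic volume to a storage tier by
# last-access age. Tier boundaries and catalogue entries are assumptions.
import datetime as dt

def assign_tier(last_access, today=None):
    """Map a dataset's last-access date to a storage tier."""
    today = today or dt.date.today()
    age_days = (today - last_access).days
    if age_days <= 90:
        return "online"        # fast disk for active interpretation projects
    if age_days <= 730:
        return "nearline"      # tape robot, minutes to restore
    return "offline"           # vaulted tape, restore on request

catalogue = [
    {"survey": "GoM-3D-2001", "last_access": dt.date(2003, 1, 10)},
    {"survey": "NorthSea-3D-2002", "last_access": dt.date(2003, 6, 2)},
]
for entry in catalogue:
    print(entry["survey"], "->", assign_tier(entry["last_access"], dt.date(2003, 7, 1)))
```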
- Published
- 2003
48. Critical analysis of load balancing strategies for cloud environment
- Author
-
Rajneesh Kumar and Anurag Jain
- Subjects
business.industry ,Computer science ,Computer Networks and Communications ,Distributed computing ,Big data ,020206 networking & telecommunications ,Round-robin DNS ,Cloud computing ,Unstructured data ,02 engineering and technology ,Load balancing (computing) ,Data modeling ,Network Load Balancing Services ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,Data proliferation - Abstract
The future of cloud computing depends on effective installation of infrastructure, efficient utilisation of resources, and dynamic transformation of load. Due to the inundation of tasks at the data centre, a load balancing strategy is needed; it helps achieve a better fault tolerance ratio and higher customer satisfaction. A load balancing strategy plays an important role in distributing the workload dynamically across multiple nodes so that no server is either underutilised or overwhelmed. Also, because of the data proliferation from sources such as social media sites, geographical information systems, and weather forecasting systems, there is exponential growth of structured and unstructured data, which is called big data. Present data models are not capable enough to handle big data, so to process the data efficiently a load balancing strategy is also needed at the data centre level. In this paper, the authors experimentally analyse some popular approaches for load balancing of tasks in a cloud environment using the CloudAnalyst simulator.
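For concreteness, the sketch below contrasts two dispatch policies commonly compared in such analyses, round robin and least-loaded assignment; the VM names and task costs are invented, and this only mirrors the kind of behaviour a simulator models rather than reproducing CloudAnalyst code.

```python
# Two dispatch policies over the same toy workload: round robin vs.
# least-loaded. VM names and task costs are illustrative assumptions.
from itertools import cycle

vms = ["vm-0", "vm-1", "vm-2"]
tasks = [9, 7, 5, 2, 2, 1]          # arbitrary task costs

# Round robin: ignore current load, rotate through VMs.
rr_load = dict.fromkeys(vms, 0)
rr = cycle(vms)
for cost in tasks:
    rr_load[next(rr)] += cost

# Least-loaded: send each task to the currently lightest VM.
ll_load = dict.fromkeys(vms, 0)
for cost in tasks:
    target = min(ll_load, key=ll_load.get)
    ll_load[target] += cost

print("round robin :", rr_load)     # peak load 11 in this toy run
print("least loaded:", ll_load)     # peak load drops to 9 here
```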
- Published
- 2017
49. Data quality: The other face of Big Data
- Author
-
Barna Saha and Divesh Srivastava
- Subjects
Information management ,System of record ,Quality management ,business.industry ,Computer science ,Data management ,Big data ,Enterprise data management ,Data science ,Data warehouse ,Data governance ,Data retrieval ,Data quality ,Scalability ,Data pre-processing ,business ,Data proliferation ,Data administration ,Data virtualization - Abstract
In our Big Data era, data is being generated, collected and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Recent studies have shown that poor quality data is prevalent in large databases and on the Web. Since poor quality data can have serious consequences on the results of data analyses, the importance of veracity, the fourth ‘V’ of big data, is increasingly being recognized. In this tutorial, we highlight the substantial challenges that the first three ‘V’s, volume, velocity and variety, bring to dealing with veracity in big data. Due to the sheer volume and velocity of data, one needs to understand and (possibly) repair erroneous data in a scalable and timely manner. With the variety of data, often from a diversity of sources, data quality rules cannot be specified a priori; one needs to let the “data speak for itself” in order to discover the semantics of the data. This tutorial presents recent results that are relevant to big data quality management, focusing on the two major dimensions of (i) discovering quality issues from the data itself, and (ii) trading off accuracy vs. efficiency, and identifies a range of open problems for the community.
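A tiny example of "letting the data speak for itself" is to profile a table for null rates and candidate keys instead of specifying quality rules a priori, as in the sketch below; the table and the derived zip-to-state rule are illustrative assumptions.

```python
# Data profiling sketch: derive simple quality signals (null rates,
# candidate keys) from the data itself. The table is an assumption.
rows = [
    {"id": 1, "zip": "10001", "state": "NY"},
    {"id": 2, "zip": "10001", "state": "NY"},
    {"id": 3, "zip": "94105", "state": None},
    {"id": 4, "zip": "94105", "state": "CA"},
]
columns = rows[0].keys()

for col in columns:
    values = [r[col] for r in rows]
    non_null = [v for v in values if v is not None]
    null_rate = 1 - len(non_null) / len(values)
    is_candidate_key = len(set(non_null)) == len(values)
    print(f"{col}: null_rate={null_rate:.2f}, candidate_key={is_candidate_key}")

# A discovered rule to validate at scale later: zip -> state should be
# functionally consistent (here 10001 -> NY and 94105 -> CA once nulls are repaired).
```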
- Published
- 2014
50. Bird's eye view on 'big data management'
- Author
-
Jetti Suresh
- Subjects
Computer science ,business.industry ,Data quality ,Big data ,Data analysis ,business ,Data science ,Data proliferation ,Data warehouse ,Data migration ,Data administration ,Data virtualization - Abstract
Data usage has grown immensely in today's data-abundant but information-poor technology age. Automated data collection tools and processes have led to massive amounts of data stored in data marts and data warehouses, yet a considerable amount of this data goes unutilized for information discovery. Data analytics over such enormous data has become increasingly challenging due to the growing variety, velocity, volume, veracity, and value of the underlying data elements; these five V's of data give rise to the term "Big Data". Big data, with silos of zettabytes of data, is now available for decision making in the form of social media, network data, enterprise operations, legacy databases, and more. The objective of this paper is to define Big Data and empirically discern Big Data opportunities, challenges, applications, and technologies.
- Published
- 2014