113 results for "Coding conventions"
Search Results
2. A Model-Driven Approach for the Management and Enforcement of Coding Conventions
- Author
- Elder Rodrigues, Jose D'Abruzzo Pereira, and Leonardo Montecchi
- Subjects
- Coding standards, coding conventions, model-driven engineering, domain-specific languages, static analysis
- Abstract
Coding conventions are a means to improve the reliability of software systems, and they are especially useful to avoid the introduction of known bugs or security flaws. However, coding rules typically come in the form of text written in natural language, which makes them hard to manage and to enforce. Following the model-driven engineering principles, in this paper we propose an approach for the management and enforcement of coding conventions using structured models. We define the Coding Conventions Specification Language (CCSL), a language to define coding rules as structured specifications, from which checkers are derived automatically by code generation. To evaluate our approach, we run a thorough experiment on 8 real open-source projects and 77 coding rules for the Java language, comparing the violations identified by our checkers with those reported by the PMD static analysis tool. The obtained results are promising and confirm the feasibility of the approach. The experiment also revealed that textual coding rules rarely document all the necessary information to write a reliable checker.
- Published
- 2023
- Full Text
- View/download PDF
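Note on the approach above: its core idea is that a coding rule becomes a structured specification from which a checker is derived mechanically. The sketch below illustrates that idea only in miniature; it is not the authors' CCSL or their generated Java checkers, the rule fields and the "EX-001" identifier are hypothetical, and Python's ast module stands in for a Java front end.

    # Illustrative only: a structured naming rule checked mechanically, in the spirit of
    # deriving checkers from rule specifications. Not the authors' CCSL tooling; the rule
    # fields below are hypothetical and Python code stands in for Java.
    import ast
    import re
    from dataclasses import dataclass

    @dataclass
    class NamingRule:
        rule_id: str
        target: str   # which AST node kind the rule constrains
        pattern: str  # regex the identifier must match

    SNAKE_CASE_FUNCTIONS = NamingRule(
        rule_id="EX-001",
        target="FunctionDef",
        pattern=r"^[a-z_][a-z0-9_]*$",
    )

    def check(source: str, rule: NamingRule) -> list[str]:
        """Return human-readable violations of `rule` found in `source`."""
        violations = []
        for node in ast.walk(ast.parse(source)):
            if type(node).__name__ == rule.target and not re.match(rule.pattern, node.name):
                violations.append(
                    f"{rule.rule_id}: line {node.lineno}: '{node.name}' violates {rule.pattern}"
                )
        return violations

    if __name__ == "__main__":
        sample = "def BadName():\n    pass\n\ndef good_name():\n    pass\n"
        print("\n".join(check(sample, SNAKE_CASE_FUNCTIONS)))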
3. Learning natural coding conventions
- Author
- Allamanis, Miltiadis, Sutton, Charles, and Gordon, Andrew
- Subjects
- Machine learning, software engineering, coding conventions
- Abstract
Coding conventions are ubiquitous in software engineering practice. Maintaining a uniform coding style allows software development teams to communicate through code by making the code clear and, thus, readable and maintainable—two important properties of good code since developers spend the majority of their time maintaining software systems. This dissertation introduces a set of probabilistic machine learning models of source code that learn coding conventions directly from source code written in a mostly conventional style. This alleviates the coding convention enforcement problem, where conventions need to first be formulated clearly into unambiguous rules and then be coded in order to be enforced, a tedious and costly process. First, we introduce the problem of inferring a variable’s name given its usage context and address this problem by creating Naturalize — a machine learning framework that learns to suggest conventional variable names. Two machine learning models, a simple n-gram language model and a specialized neural log-bilinear context model, are trained to understand the role and function of each variable and suggest new stylistically consistent variable names. The neural log-bilinear model can even suggest previously unseen names by composing them from subtokens (i.e. sub-components of code identifiers). The suggestions of the models achieve 90% accuracy when suggesting variable names at the top 20% most confident locations, rendering the suggestion system usable in practice. We then turn our attention to the significantly harder method naming problem. Learning to name methods, by looking only at the code tokens within their body, requires a good understanding of the semantics of the code contained in a single method. To achieve this, we introduce a novel neural convolutional attention network that learns to generate the name of a method by sequentially predicting its subtokens. This is achieved by focusing on different parts of the code and potentially directly using body (sub)tokens even when they have never been seen before. This model achieves an F1 score of 51% on the top five suggestions when naming methods of real-world open-source projects. Learning about naming code conventions uses the syntactic structure of the code to infer names that implicitly relate to code semantics. However, syntactic similarities and differences obscure code semantics. Therefore, to capture features of semantic operations with machine learning, we need methods that learn semantic continuous logical representations. To achieve this ambitious goal, we focus our investigation on logic and algebraic symbolic expressions and design a neural equivalence network architecture that learns semantic vector representations of expressions in a syntax-driven way, while solely retaining semantics. We show that equivalence networks learn significantly better semantic vector representations compared to other, existing, neural network architectures. Finally, we present an unsupervised machine learning model for mining syntactic and semantic code idioms. Code idioms are conventional “mental chunks” of code that serve a single semantic purpose and are commonly used by practitioners. To achieve this, we employ Bayesian nonparametric inference on tree substitution grammars. We present a wide range of evidence that the resulting syntactic idioms are meaningful, demonstrating that they do indeed recur across software projects and that they occur more frequently in illustrative code examples collected from a Q&A site. These syntactic idioms can be used as a form of automatic documentation of coding practices of a programming language or an API. We also mine semantic loop idioms, i.e. highly abstracted but semantics-preserving idioms of loop operations. We show that semantic idioms provide data-driven guidance during the creation of software engineering tools by mining common semantic patterns, such as candidate refactoring locations. This gives tool, API, and language designers data-based evidence about general, domain-specific, and project-specific coding patterns; instead of relying solely on their intuition, they can use semantic idioms to achieve greater coverage of their tool, new API, or language feature. We demonstrate this by creating a tool that suggests loop refactorings into functional constructs in LINQ. Semantic loop idioms also provide data-driven evidence for introducing new APIs or programming language features.
- Published
- 2017
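As a rough illustration of the n-gram idea behind Naturalize described above (not the actual framework, its smoothing, or its training data), the toy sketch below scores candidate variable names by how plausibly the surrounding token context reads under a bigram model built from a made-up corpus.

    # Toy illustration only: rank candidate identifiers by how well the surrounding
    # token context "reads" under a bigram language model with add-one smoothing.
    # The corpus and candidates are invented; this is not the Naturalize framework.
    from collections import Counter
    from itertools import pairwise

    corpus_tokens = (
        "for item in items : total += item . price".split()
        + "for item in items : print ( item )".split()
        + "for row in rows : total += row . price".split()
    )
    bigrams = Counter(pairwise(corpus_tokens))
    unigrams = Counter(corpus_tokens)

    def score(tokens: list[str]) -> float:
        """Product of add-one-smoothed bigram probabilities of the token sequence."""
        prob = 1.0
        for a, b in pairwise(tokens):
            prob *= (bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams))
        return prob

    context = "for {name} in items : total += {name} . price"
    candidates = ["item", "x", "row"]
    ranked = sorted(candidates, key=lambda n: score(context.format(name=n).split()), reverse=True)
    print(ranked)  # a real model would suggest the top-ranked, stylistically consistent name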
4. A language-independent static checking system for coding conventions
- Author
- Mount, Sarah and Newman, Robert
- Subjects
- Static analysis, static checker, coding conventions, intermediate language, intermediate format
- Abstract
Despite decades of research aiming to ameliorate the difficulties of creating software, programming still remains an error-prone task. Much work in Computer Science deals with the problem of specification, or writing the right program, rather than the complementary problem of implementation, or writing the program right. However, many desirable software properties (such as portability) are obtained via adherence to coding standards, and therefore fall outside the remit of formal specification and automatic verification. Moreover, code inspections and manual detection of standards violations are time-consuming. To address these issues, this thesis describes Exstatic, a novel framework for the static detection of coding standards violations. Unlike many other static checkers, Exstatic can be used to examine code in a variety of languages, including program code, in-line documentation, markup languages and so on. This means that checkable coding standards adhered to by a particular project or institution can be handled by a single tool. Consequently, a major challenge in the design of Exstatic has been to invent a way of representing code from a variety of source languages. Therefore, this thesis describes ICODE, which is an intermediate language suitable for representing code from a number of different programming paradigms. To substantiate the claim that ICODE is a universal intermediate language, a proof strategy has been developed: for a number of different programming paradigms (imperative, declarative, etc.), a proof is constructed to show that a semantics-preserving translation exists from an exemplar language (such as IMP or PCF) to ICODE. The usefulness of Exstatic has been demonstrated by the implementation of a number of static analysers for different languages. This includes a checker for technical documentation written in Javadoc, which validates documents against the Sun Microsystems (now Oracle) Coding Conventions, and a checker for HTML pages against a site-specific standard. A third system is targeted at a variant of the Python language, written by the author, called python-csp, based on Hoare's Communicating Sequential Processes.
- Published
- 2013
5. Styler: learning formatting conventions to repair Checkstyle violations
- Author
- Loriot, Benjamin, Madeiral, Fernanda, and Monperrus, Martin
- Published
- 2022
- Full Text
- View/download PDF
6. Establishing Clinical Swallowing Assessment Services via Telepractice: A Multisite Implementation Evaluation
- Author
- Lisa Baker, Clare L. Burns, Rukmani Rusch, Jodie Turvey, Amy Gray, Natalie Winter, Brooke Cowie, Elizabeth C. Ward, Sarah Barnes, and Robyn Saxon
- Subjects
- Linguistics and Language, Coding conventions, Implementation evaluation, Deglutition, Speech and Hearing, Otorhinolaryngology, Communication Disorders, Humans, Implementation research, Delivery of Health Care
- Abstract
Purpose While research has confirmed the feasibility and validity of delivering clinical swallowing evaluations (CSEs) via telepractice, challenges exist for clinical implementation. Using an implementation framework, strategies that supported implementation of CSE services via telepractice within 18 regional/rural sites across five health services were examined. Method A coordinated implementation strategy involving remote training and support was provided to 18 sites across five health services (five hub and spoke services) that had identified a need to implement CSEs via telepractice. Experiences of all 10 speech-language pathologists involved at the hub sites were examined via interviews 1 year post implementation. Interview content was coded using the Consolidated Framework for Implementation Research (CFIR) and constructs were rated for strength and direction of influence, using published CFIR coding conventions. Results Services were established and are ongoing at all sites. Although there were site-specific differences, 10 CFIR constructs were positive influencing factors at all five sites. The telepractice model was perceived to provide clear advantages for the service, and clinicians were motivated by positive patient response. Strategies used to support implementation, including having a well-organized implementation resource and an external facilitator who worked closely with the local champions, were highly valued. Two CFIR constructs, Structural Characteristics and Available Resources , were challenges for all sites. Conclusions A complex interplay of factors influenced service implementation at each site. A strong local commitment to improving patient care, and the assistance of targeted strategies to support local implementation were viewed as central to enabling implementation.
- Published
- 2021
7. Using triple graph grammars to realise incremental round‐trip engineering.
- Author
- Buchmann, Thomas and Westfechtel, Bernhard
- Abstract
Model‐driven software engineering is supported with the help of model transformations. At present, the technology for defining and executing uni‐directional batch transformations seems to be fairly well developed, while bidirectional and incremental transformations are more difficult to handle. In this study, the authors present a bidirectional and incremental transformation tool for round‐trip engineering between class diagrams and Java source code. Unlike other approaches, the tool may work with arbitrary Java code rather than only with source code following specific coding conventions. For its realisation, they selected triple graph grammars (TGGs) because they allow bidirectional, incremental transformations to be generated from a single set of undirected rules. When applying TGGs, they observed several strengths and weaknesses, which are also discussed in this study.
- Published
- 2016
- Full Text
- View/download PDF
8. A Look into Programmers’ Heads
- Author
- Sven Apel, André Brechmann, Janet Siegmund, Chris Parnin, Anja Bethmann, Gunter Saake, Norman Peitek, Christian Kästner, and Thomas Leich
- Subjects
- Source code, Coding conventions, Java, Working memory, Program comprehension, Cognition, Human–computer interaction, Task analysis
- Abstract
Program comprehension is an important, but hard to measure cognitive process. This makes it difficult to provide suitable programming languages, tools, or coding conventions to support developers in their everyday work. Here, we explore whether functional magnetic resonance imaging (fMRI) is feasible for soundly measuring program comprehension. To this end, we observed 17 participants inside an fMRI scanner while they were comprehending source code. The results show a clear, distinct activation of five brain regions, which are related to working memory, attention, and language processing, which all fit well to our understanding of program comprehension. Furthermore, we found reduced activity in the default mode network, indicating the cognitive effort necessary for program comprehension. We also observed that familiarity with Java as underlying programming language reduced cognitive effort during program comprehension. To gain confidence in the results and the method, we replicated the study with 11 new participants and largely confirmed our findings. Our results encourage us and, hopefully, others to use fMRI to observe programmers and, in the long run, answer questions, such as: How should we train programmers? Can we train someone to become an excellent programmer? How effective are new languages and tools for program comprehension?
- Published
- 2020
9. Recognizing lines of code violating company-specific coding guidelines using machine learning
- Author
- Regina Hebig, Wilhelm Meding, Miroslaw Ochodek, Miroslaw Staron, and Gert Frost
- Subjects
- Code review, Source lines of code, Source code, Coding conventions, Software development, Static program analysis, Software quality, Codebase
- Abstract
Software developers in big and medium-size companies are working with millions of lines of code in their codebases. Assuring the quality of this code has shifted from simple defect management to proactive assurance of internal code quality. Although static code analysis and code reviews have been at the forefront of research and practice in this area, code reviews are still an effort-intensive and interpretation-prone activity. The aim of this research is to support code reviews by automatically recognizing company-specific code guideline violations in large-scale, industrial source code. In our action research project, we constructed a machine-learning-based tool for code analysis where software developers and architects in big and medium-sized companies can use a few examples of source code lines violating code/design guidelines (up to 700 lines of code) to train decision-tree classifiers to find similar violations in their codebases (up to 3 million lines of code). Our action research project consisted of (i) understanding the challenges of two large software development companies, (ii) applying the machine-learning-based tool to detect violations of Sun’s and Google’s coding conventions in the code of three large open source projects implemented in Java, (iii) evaluating the tool on an evolving industrial codebase, and (iv) finding the best learning strategies to reduce the cost of training the classifiers. We were able to achieve an average accuracy of over 99% and an average F-score of 0.80 for open source projects when using ca. 40K lines for training the tool. We obtained a similar average F-score of 0.78 for the industrial code, but this time using only up to 700 lines of code as a training dataset. Finally, we observed that the tool performed visibly better for rules that require understanding a single line of code or the context of a few lines (often reaching an F-score of 0.90 or higher). Based on these results, we could observe that this approach can provide modern software development companies with the ability to use examples to teach an algorithm to recognize violations of code/design guidelines and thus increase the number of reviews conducted before the product release. This, in turn, leads to increased quality of the final software.
- Published
- 2019
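A minimal sketch of the general setup described above, assuming scikit-learn: a handful of labelled example lines, character n-gram features, and a decision-tree classifier. The example lines and the "no unexplained numeric literals" guideline are invented for illustration and do not reproduce the companies' tool, features, or data.

    # Minimal sketch: train a decision tree to flag guideline-violating lines from a few
    # labelled examples. The lines and the guideline are invented; requires scikit-learn.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    training_lines = [
        ("timeout = 30", 1),               # violation: unexplained numeric literal
        ("retries = 5", 1),
        ("timeout = DEFAULT_TIMEOUT", 0),  # compliant: named constant
        ("retries = MAX_RETRIES", 0),
        ("if size > 1024:", 1),
        ("if size > MAX_SIZE:", 0),
    ]
    lines, labels = zip(*training_lines)

    model = make_pipeline(
        CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams of each line
        DecisionTreeClassifier(random_state=0),
    )
    model.fit(lines, labels)

    for line in ["limit = 99", "limit = UPPER_LIMIT"]:
        print(line, "->", "violation" if model.predict([line])[0] else "ok")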
10. Understanding Code Smell Detection via Code Review: A Study of the OpenStack Community
- Author
- Xiaofeng Han, Peng Liang, Yajing Luo, Steve Counsell, and Amjed Tahir
- Subjects
- Code review, Coding conventions, Code smell, Software quality, Code refactoring, Software quality control
- Abstract
Code review plays an important role in software quality control. A typical review process would involve a careful check of a piece of code in an attempt to find defects and other quality issues/violations. One type of issue that may impact the quality of the software is code smells - i.e., bad programming practices that may lead to defects or maintenance issues. Yet, little is known about the extent to which code smells are identified during code reviews. To investigate the concept behind code smells identified in code reviews and what actions reviewers suggest and developers take in response to the identified smells, we conducted an empirical study of code smells in code reviews using the two most active OpenStack projects (Nova and Neutron). We manually checked 19,146 review comments obtained by keyword search and random selection, and obtained 1,190 smell-related reviews to study the causes of code smells and actions taken against the identified smells. Our analysis found that 1) code smells were not commonly identified in code reviews, 2) smells were usually caused by violation of coding conventions, 3) reviewers usually provided constructive feedback, including fixing (refactoring) recommendations to help developers remove smells, and 4) developers generally followed those recommendations and actioned the changes. Our results suggest that 1) developers should closely follow coding conventions in their projects to avoid introducing code smells, and 2) review-based detection of code smells is perceived to be a trustworthy approach by developers, mainly because reviews are context-sensitive (as reviewers are more aware of the context of the code given that they are part of the project's development team). Comment: The 29th IEEE/ACM International Conference on Program Comprehension (ICPC).
- Published
- 2021
11. Verbesserung der Softwarequalität durch Code Optimierung. Eine Analyse über bestimmte Programmierparadigmen
- Author
- Fleck, Bernhard
- Subjects
- Software quality, Clean Code, Coding conventions, Testing frameworks, Testing tools, BDD, DDD, TDD
- Abstract
We are living in a time in which the production of software has changed tremendously; it is no longer created the way it was many years ago. Back then, a released product merely had to work. Today the requirements differ immensely: a program should ideally contain almost no bugs, new features should be delivered continuously and as quickly as possible, and many further demands are made. To meet these economic demands, quality requirements had to change drastically as well. For this reason, the aim of this thesis is to determine which basic programming knowledge can raise quality during development, so that fewer bugs arise and the source code becomes easier to maintain. A literature review showed that certain conventions, such as clean code, are very useful for this purpose and that, paired with high test coverage, they form the foundation of high-quality software. This thesis therefore presents these concepts from the ground up and rounds them off with the facts that speak for them. They are meant to convey why every software developer should reconsider their way of working and adhere to a certain standard.
- Published
- 2021
12. Approach of a Coding Conventions for Warning and Suggestion in Transpiler for Rust Convert to RTL
- Author
- Keisuke Takano, Tetsuya Oda, and Masaki Kohata
- Subjects
- Coding conventions, Hardware description language, Field-programmable gate array, Rust (programming language), Register-transfer level
- Abstract
Logic circuit design for Field-Programmable Gate Arrays (FPGAs) has few methods for issuing warnings or suggestions based on coding rules. In our previous work, we implemented a transpiler that converts the Rust programming language to Register Transfer Level (RTL) descriptions. The transpiler is designed to encourage learners of hardware description languages (HDLs) to write highly readable code. Rust provides several warnings and suggestions related to coding conventions; however, the transpiler does not cover FPGA-specific errors. In this work, we implement warnings and suggestions to support learning FPGA design.
- Published
- 2020
13. Does code review really remove coding convention violations?
- Author
- Giovanni Rosa, Chaiyong Ragkhitwetsagul, Jens Krinke, DongGyun Han, and Matheus Paixao
- Subjects
- Code review, Coding conventions, Software quality, Technical debt
- Abstract
Many software developers perceive technical debt as one of the biggest problems in their projects. They also perceive code reviews as the most important process to increase code quality. As inconsistent coding style is one source of technical debt, it is no surprise that coding convention violations can lead to patch rejection during code review. However, as most research has focused on developers’ perception, it is not clear whether code reviews actually prevent the introduction of coding convention violations and the corresponding technical debt. Therefore, we investigated how coding convention violations are introduced, addressed, and removed during code review by developers. To do this, we analysed 16,442 code review requests from four projects of the Eclipse community for the introduction of convention violations. Our results show that convention violations accumulate as code size increases despite changes being reviewed. We also manually investigated 1,268 code review requests in which convention violations disappear and observed that only a minority of them have been removed because a convention violation has been flagged in a review comment. The investigation results also highlight that one can speed up the code review process by adopting tools for code convention violation detection.
- Published
- 2020
14. Um metamodelo para apoiar a formalização de convenções de codificação
- Author
- Rodrigues Júnior, Elder de Oliveira, Montecchi, Leonardo, França, Breno Bernard Nicolau de, Lucrédio, Daniel, and Terra, Ricardo (Universidade Estadual de Campinas, Instituto de Computação, Programa de Pós-Graduação em Ciência da Computação)
- Subjects
- Code generators, Object-oriented programming (Computer science), Computer-aided software engineering, Coding conventions, Static analysis
- Abstract
Advisor: Leonardo Montecchi. Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação. Coding conventions are a means to improve the reliability of software systems. They can be established for many reasons, ranging from improving the readability of code to avoiding the introduction of security flaws. However, coding conventions often come in the form of textual documents in natural language, which makes them hard to manage and to enforce. Following model-driven engineering principles, in this dissertation we propose an approach and language for specifying coding conventions using structured models. We call this language the Coding Conventions Specification Language (CCSL). We also propose a model transformation to concretely generate checkers that find violations of the rules specified with our language. To evaluate the proposal, we performed two experiments. The first experiment aims to evaluate the CCSL metamodel, while the other aims to check the capability of the derived checkers to find violations of the specified rules in Java code. The obtained results are promising and suggest that the proposed approach is feasible. However, they also highlight that many challenges still need to be overcome. In the first experiment, we analyzed a total of 216 individual rules from two large sets of existing coding conventions. Overall, it was possible to represent 63% of the considered coding rules using our language. In the second experiment, we selected 53 rules from those implemented in the PMD tool (a popular code analyzer) to compare the results between our tool and PMD on three real projects. In general, we achieved results equal to or better than those of PMD for more than half of the selected rules (79%), while only 6% of the rules could not be specified using our language. There were also cases where PMD performed better than our approach (9%), as well as cases where the results were different for each of the tools (6%). We conclude by discussing directions for future work. (Master's in Computer Science. Funded by FAPESP, grant 2018/11129-8.)
- Published
- 2020
15. Deep Generation of Coq Lemma Names Using Elaborated Terms
- Author
- Milos Gligoric, Karl Palmskog, Junyi Jessy Li, and Pengyu Nie
- Subjects
- Source code, Parsing, Coding conventions, Toolchain, Language model, Abstract syntax tree, Natural language processing
- Abstract
Coding conventions for naming, spacing, and other essentially stylistic properties are necessary for developers to effectively understand, review, and modify source code in large software projects. Consistent conventions in verification projects based on proof assistants, such as Coq, increase in importance as projects grow in size and scope. While conventions can be documented and enforced manually at high cost, emerging approaches automatically learn and suggest idiomatic names in Java-like languages by applying statistical language models on large code corpora. However, due to its powerful language extension facilities and fusion of type checking and computation, Coq is a challenging target for automated learning techniques. We present novel generation models for learning and suggesting lemma names for Coq projects. Our models, based on multi-input neural networks, are the first to leverage syntactic and semantic information from Coq’s lexer (tokens in lemma statements), parser (syntax trees), and kernel (elaborated terms) for naming; the key insight is that learning from elaborated terms can substantially boost model performance. We implemented our models in a toolchain, dubbed Roosterize, and applied it on a large corpus of code derived from the Mathematical Components family of projects, known for its stringent coding conventions. Our results show that Roosterize substantially outperforms baselines for suggesting lemma names, highlighting the importance of using multi-input models and elaborated terms.
- Published
- 2020
16. Technical Debt indication in PLC Code for automated Production Systems: Introducing a Domain Specific Static Code Analysis Tool
- Author
- Safa Bougouffa, Sebastian Diehm, Fabian Gemein, Birgit Vogel-Heuser, and Quang Huan Dong
- Subjects
- Coding conventions, Static program analysis, Software metrics, Technical debt
- Abstract
Nowadays, technical debt (TD) has become a well-known metaphor signifying long-term consequences of short-term benefits in system development. Accumulating TD can cause severe maintenance effort, and thus affect the quality of the system. Identifying and managing TD through appropriate methods and tools can be a first step towards preventing TD accumulation. Static code analysis is a technique widely used to identify TD at code level in software engineering domain and various tools were developed accordingly. However, tools for identifying TD in technical systems such as automated production systems (aPS) that are mainly controlled by Programmable Logic Controller (PLC) implemented in IEC 61131-3 programming languages are rare. Therefore, this paper presents a tool that uses static code analysis with the application of software quality metrics and coding conventions enabling the PLC software developer to identify TD and evaluate it.
- Published
- 2018
17. Using triple graph grammars to realise incremental round‐trip engineering
- Author
- Bernhard Westfechtel and Thomas Buchmann
- Subjects
- Source code, Coding conventions, Java, Round-trip engineering, Class diagram
- Abstract
Model-driven software engineering is supported with the help of model transformations. At present, the technology for defining and executing uni-directional batch transformations seems to be fairly well developed, while bidirectional and incremental transformations are more difficult to handle. In this study, the authors present a bidirectional and incremental transformation tool for round-trip engineering between class diagrams and Java source code. Unlike other approaches, the tool may work with arbitrary Java code rather than only with source code following specific coding conventions. For its realisation, they selected triple graph grammars (TGGs) because they allow bidirectional, incremental transformations to be generated from a single set of undirected rules. When applying TGGs, they observed several strengths and weaknesses, which are also discussed in this study.
- Published
- 2016
18. CocoQa: Question Answering for Coding Conventions Over Knowledge Graphs
- Author
- Tianjiao Du, Wei Li, Qinyue Wu, Junming Cao, Beijun Shen, and Yuting Chen
- Subjects
- Information retrieval, Coding conventions, Question answering, SPARQL, Software quality
- Abstract
Coding conventions play an important role in guaranteeing software quality. However, coding conventions are usually informally presented and inconvenient for programmers to use. In this paper, we present CocoQa, a system that answers programmers' questions about coding conventions. CocoQa answers questions by querying a knowledge graph for coding conventions. It employs 1) a subgraph matching algorithm that parses the question into a SPARQL query, and 2) a machine comprehension algorithm that uses an end-to-end neural network to detect answers from searched paragraphs. We have implemented CocoQa and evaluated it on a coding convention QA dataset. The results show that CocoQa can answer questions about coding conventions precisely. In particular, CocoQa can achieve a precision of 82.92% and a recall of 91.10%. Repository: https://github.com/14dtj/CocoQa/ Video: https://youtu.be/VQaXi1WydAU
- Published
- 2019
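The retrieval half of the approach described above can be pictured as a SPARQL query over a knowledge graph of conventions. The sketch below, assuming rdflib, uses an invented ex: vocabulary and a three-triple graph; CocoQa's actual schema, subgraph matching, and neural reading-comprehension component are not shown.

    # Sketch of the retrieval step only: answer a coding-convention question by querying
    # a tiny, invented RDF knowledge graph with SPARQL. Requires rdflib.
    from rdflib import Graph, Literal, Namespace, RDF

    EX = Namespace("http://example.org/convention/")
    g = Graph()
    g.add((EX.FieldNaming, RDF.type, EX.Convention))
    g.add((EX.FieldNaming, EX.appliesTo, Literal("field")))
    g.add((EX.FieldNaming, EX.text, Literal("Non-constant field names should be written in camelCase.")))

    # A question like "How should I name a field?" would be translated into a query such as:
    query = """
    PREFIX ex: <http://example.org/convention/>
    SELECT ?text WHERE {
        ?rule a ex:Convention ;
              ex:appliesTo ?target ;
              ex:text ?text .
        FILTER (STR(?target) = "field")
    }
    """
    for (text,) in g.query(query):
        print(text)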
19. Constructing a Knowledge Base of Coding Conventions from Online Resources
- Author
- Yuting Chen, Junming Cao, Qinyue Wu, Tianjiao Du, Wei Li, and Beijun Shen
- Subjects
- World Wide Web, Coding conventions, Knowledge base
- Published
- 2019
20. Python Coding Style Compliance on Stack Overflow
- Author
- Robert White, Gazi Oznacar, Martin Cabello Salazar, Niels Boecker, Jens Krinke, Wenjie Boon, and Nikolaos Bafatakis
- Subjects
- Coding conventions, Python (programming language), Recommender system, Stack Overflow, License, Reputation
- Abstract
Software developers all over the world use Stack Overflow (SO) to interact and exchange code snippets. Research also uses SO to harvest code snippets for use with recommendation systems. However, previous work has shown that code on SO may have quality issues, such as security or license problems. We analyse Python code on SO to determine its coding style compliance. From 1,962,535 code snippets tagged with 'python', we extracted 407,097 snippets of at least 6 statements of Python code. Surprisingly, 93.87% of the extracted snippets contain style violations, with an average of 0.7 violations per statement and a huge number of snippets with a considerably higher ratio. Researchers and developers should, therefore, be aware that code snippets on SO may not be representative of good coding style. Furthermore, while user reputation seems to be unrelated to coding style compliance, for posts with vote scores in the range between -10 and 20, we found a strong correlation (r = -0.87, p < 10^-7) between the vote score a post received and the average number of violations per statement for snippets in such posts.
- Published
- 2019
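A rough sketch of the measurement idea above, assuming pycodestyle and scipy: count PEP 8 violations per statement for a snippet, then correlate per-post violation rates with vote scores. The snippet, the scores, and the "statements = non-blank lines" simplification are illustrative; the paper's extraction pipeline and thresholds are not reproduced.

    # Rough sketch: violations per statement via pycodestyle, plus a Pearson correlation
    # between made-up vote scores and violation rates. Requires pycodestyle and scipy.
    import pycodestyle
    from scipy.stats import pearsonr

    def violations_per_statement(snippet: str) -> float:
        lines = snippet.splitlines(keepends=True)
        checker = pycodestyle.Checker(filename="snippet.py", lines=lines)
        errors = checker.check_all()                          # prints and counts style violations
        statements = sum(1 for l in lines if l.strip()) or 1  # crude proxy for statement count
        return errors / statements

    snippet = "import os,sys\nx=1\nif x == 1 :\n    print( x )\n"
    print("violations/statement:", violations_per_statement(snippet))

    # Made-up per-post data: vote score vs. average violations per statement.
    scores = [-5, 0, 3, 8, 15, 20]
    violation_rates = [1.4, 1.1, 0.9, 0.6, 0.4, 0.3]
    r, p = pearsonr(scores, violation_rates)
    print(f"r = {r:.2f}, p = {p:.3f}")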
21. Towards a Structured Specification of Coding Conventions
- Author
- Leonardo Montecchi and Elder Rodrigues
- Subjects
- Domain-specific language, Coding conventions, Static analysis, Readability, Software system, Model-driven architecture, Natural language
- Abstract
Coding conventions are a means to improve the reliability of software systems. They can be established for many reasons, ranging from improving the readability of code to avoiding the introduction of security flaws. However, coding conventions often come in the form of textual documents in natural language, which makes them hard to manage and to enforce. Following model-driven engineering principles, in this paper we propose an approach and language for specifying coding conventions using structured models. We ran a feasibility study, in which we applied our language for specifying 215 coding rules from two popular rulesets. The obtained results are promising and suggest that the proposed approach is feasible. However, they also highlight that many challenges still need to be overcome. We conclude with an overview on the ongoing work for generating automated checkers from such models, and we discuss directions for an objective evaluation of the methodology.
- Published
- 2019
- Full Text
- View/download PDF
22. Semantic Source Code Models Using Identifier Embeddings
- Author
- Vasiliki Efstathiou and Diomidis Spinellis
- Subjects
- Source code, Coding conventions, Software development, Software maintenance, Python (programming language), Formal methods, Identifier, Software Engineering (cs.SE), Natural language
- Abstract
The emergence of online open source repositories in recent years has led to an explosion in the volume of openly available source code, coupled with metadata that relate to a variety of software development activities. As a result, in line with recent advances in machine learning research, software maintenance activities are switching from symbolic formal methods to data-driven methods. In this context, the rich semantics hidden in source code identifiers provide opportunities for building semantic representations of code which can assist tasks of code search and reuse. To this end, we deliver, in the form of pretrained vector space models, distributed code representations for six popular programming languages, namely, Java, Python, PHP, C, C++, and C#. The models are produced using fastText, a state-of-the-art library for learning word representations. Each model is trained on data from a single programming language; the code mined for producing all models amounts to over 13,000 repositories. We indicate dissimilarities between natural language and source code, as well as variations in coding conventions between the different programming languages we processed. We describe how these heterogeneities guided the data preprocessing decisions we took and the selection of the training parameters in the released models. Finally, we propose potential applications of the models and discuss their limitations. (16th International Conference on Mining Software Repositories (MSR 2019), Data Showcase Track.)
- Published
- 2019
- Full Text
- View/download PDF
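A small sketch of training identifier embeddings with subword information, in the spirit of the fastText-based models above. It uses gensim's FastText implementation (4.x API) on an invented toy corpus; the released models were trained on millions of files with different parameters.

    # Toy sketch of subword-aware identifier embeddings with gensim's FastText.
    # The corpus and hyperparameters are illustrative, not those of the released models.
    from gensim.models import FastText

    corpus = [
        ["def", "get_user_id", "(", "user_name", ")", ":", "return", "user_map", "[", "user_name", "]"],
        ["def", "get_order_id", "(", "order_name", ")", ":", "return", "order_map", "[", "order_name", "]"],
        ["user_id", "=", "get_user_id", "(", "user_name", ")"],
        ["order_id", "=", "get_order_id", "(", "order_name", ")"],
    ]

    model = FastText(
        sentences=corpus,
        vector_size=32,    # small, for the toy example
        window=3,
        min_count=1,
        min_n=3, max_n=5,  # character n-grams let unseen identifiers get a vector too
        epochs=100,
    )

    print(model.wv.most_similar("user_id", topn=3))
    print(model.wv["customer_id"][:5])  # out-of-vocabulary identifier, composed from subwords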
23. Compute mindlessly. Not! map consciously
- Author
- Pia Niemelä, V. Mikkolainen, and J. Vuorinen (Tampere University, Computing Sciences)
- Subjects
- Syllabus, Elementary mathematics, Higher education, Coding conventions, Concept map, Knowledge building, Mathematics education, National curriculum, Education
- Abstract
This paper utilizes concept mapping as a tool for conscious and deliberate knowledge building in mathematics and its extension to algorithms. Currently, alleged defects in mathematics education are obvious: instead of conceptual elaboration, everyday praxis relies on routine computations that are likely to lead to alienated concepts with weak connections to prior knowledge. A concept map visualizes the existing conceptual structure, and whenever new information is brought in, it will be placed in the map by clearly explicating its linkage to the previous concepts. In Finnish mathematics education, such new knowledge is programming content that is integrated into elementary school mathematics in the 2014 Finnish National Curriculum. This content is crystallized as the requirements of computational and algorithmic thinking, the utilization of respective data structures, and an adequate amount of hands-on practice to internalize good coding conventions. This study examines secondary (N = 19) and higher education students (N = 10) and their conceptual knowledge of mathematics, concentrating on the domain of algorithms in particular. The concept maps drawn by the students are evaluated using the SOLO taxonomy. To conclude, a consensus map of algorithms is presented and linked to the elementary mathematics syllabus.
- Published
- 2018
24. On the usage of pythonic idioms
- Author
- Carol V. Alexandru, José J. Merchante, Sebastian Proksch, Sebastiano Panichella, Harald C. Gall, and Gregorio Robles (University of Zurich)
- Subjects
- Vocabulary, Source code, Coding conventions, Software development, Python (programming language), Software architecture
- Abstract
Developers discuss software architecture and concrete source code implementations on a regular basis, be it on question-answering sites, online chats, mailing lists or face to face. In many cases, there is more than one way of solving a programming task. Which way is best may be decided based on case-specific circumstances and constraints, but also based on convention. Having strong conventions, and a common vocabulary to express them, simplifies communication and strengthens common understanding of software development problems and their solutions. While many programming ecosystems have a common vocabulary, Python’s relationship to conventions and common language is particularly pronounced. The “Zen of Python”, a famous set of high-level coding conventions authored by Tim Peters, states “There should be one, and preferably only one, obvious way to do it”. This ‘one way to do it’ is often referred to as the ‘Pythonic’ way: the ideal solution to a particular problem. Few other programming languages have coined a unique term to label the quality of craftsmanship gone into a software artifact. In this paper, we explore how Python developers understand the term ‘Pythonic’ by means of structured interviews, build a catalogue of ‘pythonic idioms’ gathered from literature, and conjecture on the effects of having a language-specific term for quality code, considering the potential it could hold for other programming languages and ecosystems. We find that while the term means different things to novice versus experienced Python developers, it encompasses not only concrete implementation, but a way of thinking - a culture - in general.
- Published
- 2018
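A generic example of the kind of contrast such an idiom catalogue captures: the same task written in a C-style way and in the idiomatic ("Pythonic") way. The snippets are illustrative and not taken from the paper's catalogue.

    # The same task, non-idiomatic vs. Pythonic. Generic example, not from the paper.
    names = ["ada", "grace", "alan"]

    # Non-idiomatic: index-based loop and manual accumulation.
    upper = []
    for i in range(len(names)):
        upper.append(names[i].upper())

    # Pythonic: iterate directly and use a list comprehension; use enumerate when the
    # index is genuinely needed.
    upper = [name.upper() for name in names]
    for position, name in enumerate(names, start=1):
        print(position, name)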
25. An agile software engineering course with product hand-off
- Author
- Jason B. Shepherd
- Subjects
- Engineering management, Transformative learning, Documentation, Coding conventions, Peer learning
- Abstract
This paper describes a novel design for an agile software engineering course that emphasizes keeping product artifacts updated throughout development. The signature transformative event in the course is the mid-semester project "hand-off," at which point teams trade projects with other student teams and must make immediate progress despite no prior knowledge of the new project's design, coding conventions, or documentation. Course features are described along with their implementation and assessment.
- Published
- 2018
26. Impacts of coding practices on readability
- Author
- Rodrigo Magalhaes dos Santos and Marco Aurélio Gerosa
- Subjects
- Java, Coding conventions, Readability, Programming style, Code readability, Code comprehension
- Abstract
Several conventions and standards aim to improve maintainability of software code. However, low levels of code readability perceived by developers still represent a barrier to their daily work. In this paper, we describe a survey that assessed the impact of a set of Java coding practices on the readability perceived by software developers. While some practices promoted an enhancement of readability, others did not show statistically significant effects. Interestingly, one of the practices worsened the readability. Our results may help to identify coding conventions with a positive impact on readability and, thus, guide the creation of coding standards.
- Published
- 2018
27. Do Java programmers write better Python? Studying off-language code quality on GitHub
- Author
- Siegfried Horschig, Toni Mattis, and Robert Hirschfeld
- Subjects
- Java, Coding conventions, Best practices, Python (programming language), Software quality, Readability
- Abstract
There are style guides and best practices for many programming languages. Their goal is to promote uniformity and readability of code, consequentially reducing the chance of errors. While programmers who are frequently using the same programming language tend to internalize most of its best practices eventually, little is known about what happens when they casually switch languages and write code in a less familiar language. Insights into the factors that lead to coding convention violations could help to improve tutorials for programmers switching languages, make teachers aware of mistakes they might expect depending on what language students have been using before, or influence the order in which programming languages are taught. To approach this question, we make use of a large-scale data set representing a major part of the open source development activity happening on GitHub. In this data set, we search for Java and C++ programmers that occasionally program Python and study their Python code quality using a lint tool. Comparing their defect rates to those from Python programmers reveals significant effects in both directions: We observe that some of Python's best practices have more widespread adoption among Java and C++ programmers than Python experts. At the same time, python-specific coding conventions, especially indentation, scoping, and the use of semicolons, are violated more frequently. We conclude that programming off-language is not generally associated with better or worse code quality, but individual coding conventions are violated more or less frequently depending on whether they are more universal or language-specific. We intend to motivate a discussion and more research on what causes these effects, how we can mitigate or use them for good, and which related effects can be studied using the presented data set.
- Published
- 2018
28. Towards safer programming language constructs
- Author
- Baráth, Áron
- Subjects
- Syntax (programming languages), Coding conventions, Semantics (computer science), Software development, Backward compatibility, Control flow, Primitive wrapper class, Compile time
- Abstract
Current mainstream programming languages suffer from numerous safety issues: features that were originally introduced to be convenient and practical later turned out to be harmful. Although these programming languages are continuously evolving, they are usually limited by backward compatibility. New languages are being created today to fix these issues or to introduce new paradigms. In thesis 1 we describe various software issues related to language syntax, based on real industrial experience. To avoid these problems we suggest strict coding conventions for existing languages and rigorous syntactical rules for new languages. The rules include preferring immutable memory as the default for function arguments, more expressive control flow, and better ways to define operators. Current mainstream languages also suffer from semantic problems, which we discuss in thesis 2. Implicit conversions are usually allowed in these languages, but they may lead to unwanted behavior; to avoid such cases in C++, we introduced a wrapper-class-based solution. The C++11 move semantics may reduce copy operations when implemented properly, and we specified an algorithm and implemented a prototype tool to detect possible misuse of move semantics. It is well known that the cost of software development is only a fraction of the cost of the whole maintenance cycle. In order to reduce maintenance costs, we may demand various services from programming languages; we analyze such features in thesis 3. As an example, mainstream languages give no support for handling binary compatibility issues among different library versions, and compile-time testing is also rarely supported. We suggest solutions for these problems. In order to prove that the design decisions proposed in the earlier theses are viable, we implemented Welltype, a prototype programming language designed according to our findings. Our language is imperative with additional multi-paradigm language elements and is available as open source with a full development tool-chain.
- Published
- 2018
- Full Text
- View/download PDF
29. Effect Analysis of Coding Convention Violations on Readability of Post-Delivered Code
- Author
- Taek Lee, Jung Been Lee, and Hoh Peter In
- Subjects
- Source code, Java, Coding conventions, Software development, Readability, Software quality, Checklist, Empirical research
- Abstract
Adherence to coding conventions during the code production stage of software development is essential. Benefits include enabling programmers to quickly understand the context of shared code, communicate with one another in a consistent manner, and easily maintain the source code at low costs. In reality, however, programmers tend to doubt or ignore the degree to which the quality of their code is affected by adherence to these guidelines. This paper addresses research questions such as “Do violations of coding conventions affect the readability of the produced code?”, “What kinds of coding violations reduce code readability?”, and “How much do variable factors such as developer experience, project size, team size, and project maturity influence coding violations?” To respond to these research questions, we explored 210 open-source Java projects with 117 coding conventions from the Sun standard checklist. We believe our findings and the analysis approach used in the paper will encourage programmers and QA managers to develop their own customized and effective coding style guidelines. key words: coding conventions, coding style standard, code readability, software quality, empirical study.
- Published
- 2015
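For readers who want to reproduce the kind of measurement described above, violations of the Sun conventions can be counted with the Checkstyle command-line tool, which ships a sun_checks.xml configuration. The sketch below is only an assumption-laden outline: the jar name is a placeholder, the line-counting heuristic is rough, and the paper's own 117-rule checklist and tooling are not reproduced.

    # Rough outline: count Sun-convention violations for one Java file via the Checkstyle
    # CLI. The jar path and the output-counting heuristic are assumptions for illustration.
    import subprocess

    CHECKSTYLE_JAR = "checkstyle-10.12.4-all.jar"   # assumed to be available locally

    def count_sun_violations(java_file: str) -> int:
        result = subprocess.run(
            ["java", "-jar", CHECKSTYLE_JAR, "-c", "/sun_checks.xml", java_file],
            capture_output=True,
            text=True,
        )
        # Checkstyle prints one message line per violation between "Starting audit..."
        # and "Audit done."; count the lines that reference the audited file.
        return sum(1 for line in result.stdout.splitlines() if java_file in line)

    if __name__ == "__main__":
        print(count_sun_violations("MyClass.java"))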
30. Empirical Study of Abnormalities in Local Variables of Change-Prone Java Methods
- Author
- Hirohisa Aman, Tomoyuki Yokogawa, Minoru Kawahara, and Sousuke Amasaki
- Subjects
- Mahalanobis distance, Coding conventions, Java, Local variable, Software quality, Variable (computer science), Empirical research
- Abstract
The naming of local variables is usually at the programmer's discretion. Thus, there is diversity in the naming of local variables, and this may cause variations in code quality. Many coding conventions say that the name of a local variable can or should be short. This paper focuses on such conventions, and aims to explore the trends in local variables' names in Java and to examine whether abnormal local variables have harmful effects on code quality. This paper collected data on local variables (names, scopes and types) from six popular open source products, and proposes to evaluate their abnormality using the notion of the Mahalanobis distance. The empirical results report the following findings: 1) the trend of naming local variables differs according to the variable type; 2) the majority of local variables have short names with narrow scopes, where a name is often a word or its abbreviated form; 3) methods having abnormal local variables are about 1.2-2.5 times more likely to be change-prone than the others. While the naming of local variables depends on who writes the code, there seems to be a common trend in naming. Java methods with deviant local variables tend to be fixed many times after their release and cannot survive unscathed.
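For reference, the Mahalanobis distance used to score abnormality is the standard definition below; here x would be a feature vector describing a local variable (the abstract suggests features such as name length, scope and type), mu the sample mean, and S the sample covariance matrix estimated from the collected variables. A larger distance means a more unusual variable.

```latex
D_M(x) = \sqrt{(x - \mu)^{\top} S^{-1} (x - \mu)}
```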
- Published
- 2017
31. Coding conventions and principles for a National Land-Change Modeling Framework
- Author
-
David I. Donato
- Subjects
Land change ,Coding conventions ,business.industry ,Computer science ,Software engineering ,business
- Published
- 2017
32. Coding for Demographic Categories in the Creation of Legacy Corpora: Asian American Ethnic Identities
- Author
-
Amy Wing-mei Wong and Lauren Hall-Lew
- Subjects
Metadata ,Data sharing ,Linguistics and Language ,Coding conventions ,business.industry ,Data management ,Ethnic group ,Sociology ,business ,On Language ,Sociolinguistics ,Linguistics ,Coding (social sciences) - Abstract
A set of shared coding conventions for speaker ethnicity is necessary for open-source data sharing and cross-study compatibility between linguistic corpora. However, ethnicity, like many other aspects of speaker identity, is continually negotiated and reproduced in discourse, and is therefore a challenge to code representatively. This paper discusses some of the challenges facing researchers who want to use, create, or contribute to existing corpora that are annotated for the ethnic identity of a speaker. We specifically problematize the macro-social label ‘Asian American’ and propose that researchers should consider different levels and types of specificity of ‘Asianness’ in order to ensure that the corpora best represent the reality of ethnic identity in the community sampled. This is particularly important given the limited incorporation of different Asian groups in most existing linguistic research. We argue that more rigorous coding for Asian American ethnicities in corpora will improve the utility of archived corpora and enhance sociolinguistic research on language variation and ethnic identity.
- Published
- 2014
33. A Study of Different Coding Styles Affecting Code Readability
- Author
-
Hoh Peter In, Taek Lee, and Jung Been Lee
- Subjects
Source code ,Coding conventions ,Multimedia ,Computer science ,business.industry ,media_common.quotation_subject ,Software maintenance ,computer.software_genre ,Readability ,Software quality ,Software metric ,Code readability ,Artificial intelligence ,business ,computer ,Software ,Natural language processing ,media_common ,Coding (social sciences) - Abstract
During software programming, code readability is very important because it affects the understanding of the code context, facilitates communication and collaboration between team members, and helps avoid problematic software maintenance. In this study, our research hypothesis was that coding style affects code readability. Developers who do not comply with coding conventions and guidelines may be more likely to produce code with a low level of readability. To test this hypothesis, we investigated coding rule violations in the source code files of five open source projects and correlated the observed violations with the scores rated by a readability estimation function. We tested whether significant violations of coding conventions affected the readability of the developed code and identified the violations that were related specifically to low readability. We consider that our findings will improve the understanding of programmers so they can take appropriate action to generate better quality code.
- Published
- 2013
34. Language design and implementation for the domain of coding conventions
- Author
-
Vadim Zaytsev and Boryana Goncharenko
- Subjects
Coding conventions ,Programming language ,business.industry ,Computer science ,Open problem ,020207 software engineering ,02 engineering and technology ,Ontology (information science) ,computer.software_genre ,Domain (software engineering) ,Consistency (database systems) ,Digital subscriber line ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Domain analysis ,Artificial intelligence ,business ,computer ,Natural language ,Natural language processing - Abstract
Coding conventions are lexical, syntactic or semantic restrictions enforced on top of a software language for the sake of consistency within the source base. Specifying coding conventions is currently an open problem in software language engineering, addressed in practice by resorting to natural language descriptions which complicate conformance verification. In this paper we present an endeavour to solve this problem for the case of CSS — a ubiquitous software language used for specifying appearance of hypertextual content separately from the content itself. The paper contains the results of domain analysis, a short report on an empirically obtained catalogue of 143 unique CSS coding conventions, the domain-specific ontology for the domain of detecting violations, the design of CssCoco, a language for expressing coding conventions of CSS, as well as a description of the tool we developed to detect violations of conventions specified in this DSL.
- Published
- 2016
35. SNPConvert: SNP Array Standardization and Integration in Livestock Species
- Author
-
Ezequiel L. Nicolazzi, Gabriele Marras, and Alessandra Stella
- Subjects
0301 basic medicine ,Coding conventions ,Standardization ,Biomedical Engineering ,Bioengineering ,Single-nucleotide polymorphism ,integration ,array ,Biology ,computer.software_genre ,Biochemistry ,Article ,Set (abstract data type) ,lcsh:Biochemistry ,03 medical and health sciences ,single nucleotide polymorphism ,SNP ,lcsh:QD415-436 ,Genetics ,standardization ,030102 biochemistry & molecular biology ,software ,File format ,030104 developmental biology ,ComputingMethodologies_PATTERNRECOGNITION ,Data mining ,computer ,Biotechnology ,Coding (social sciences) ,SNP array - Abstract
One of the main advantages of single nucleotide polymorphism (SNP) array technology is providing genotype calls for a specific number of SNP markers at a relatively low cost. Since its first application in animal genetics, the number of available SNP arrays for each species has been constantly increasing. However, in contrast to what is observed in whole genome sequence data analysis, SNP array data does not have a common set of file formats or coding conventions for allele calling. Therefore, the standardization and integration of SNP array data from multiple sources have become an obstacle, especially for users with basic or no programming skills. Here, we describe the difficulties related to handling SNP array data, focusing on file formats, SNP allele coding, and mapping. We also present the SNPConvert suite, a multi-platform, open-source, and user-friendly set of tools to overcome these issues. This tool, which can be integrated with open-source and open-access tools already available, is a first step towards an integrated system to standardize and integrate any type of raw SNP array data. The tool is available at: https://github.com/nicolazzie/SNPConvert.git.
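As a rough illustration of the allele-coding problem the abstract describes (the table and function below are hypothetical, not SNPConvert's actual API), converting abstract A/B genotype calls to nucleotide calls is essentially a per-marker lookup:

```cpp
#include <cstdio>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch, not SNPConvert's code: a per-marker table mapping the
// abstract A/B allele coding to the concrete nucleotides of that marker.
struct AlleleMap {
    char a;  // nucleotide reported as allele "A"
    char b;  // nucleotide reported as allele "B"
};

// Convert one genotype call (e.g. "AB") for a given marker into nucleotides ("AG").
std::string ab_to_nucleotides(const std::string& call,
                              const std::map<std::string, AlleleMap>& table,
                              const std::string& marker) {
    const AlleleMap& m = table.at(marker);
    std::string out;
    for (char allele : call) {
        if (allele == 'A')      out += m.a;
        else if (allele == 'B') out += m.b;
        else throw std::runtime_error("unknown allele code");
    }
    return out;
}

int main() {
    std::map<std::string, AlleleMap> table = {{"rs123", {'A', 'G'}}};
    std::printf("%s\n", ab_to_nucleotides("AB", table, "rs123").c_str());  // prints AG
    return 0;
}
```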
- Published
- 2016
36. Inferring Semantic Mapping Between Policies and Code: The Clue is in the Language
- Author
-
Awais Rashid, Chris Weichel, Matthew Edwards, and Pauline Anthonysamy
- Subjects
Source code ,Operationalization ,Coding conventions ,Computer science ,media_common.quotation_subject ,020207 software engineering ,02 engineering and technology ,Computer security ,computer.software_genre ,Semantics ,Code (semiotics) ,Domain (software engineering) ,Semantic mapping ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,computer ,Natural language ,media_common - Abstract
A common misstep in the development of security and privacy solutions is the failure to keep the demands resulting from high-level policies in line with the actual implementation that is supposed to operationalize those policies. This is especially problematic in the domain of social networks, where software typically predates policies and then evolves alongside its user base and any changes in policies that arise from their interactions with and the demands that they place on the system. Our contribution targets this specific problem, drawing together the assurances actually presented to users in the form of policies and the large codebases with which developers work. We demonstrate that a mapping between policies and code can be inferred from the semantics of natural language. These semantics manifest not only in the policy statements but also in coding conventions. Our technique, implemented in a tool called CASTOR, can infer semantic mappings with an F1 accuracy of 70% and 78% for two social networks, Diaspora and Friendica respectively, as compared with a ground truth mapping established through manual examination of the policies and code.
- Published
- 2016
37. High-Level Concurrency Constructs
- Author
-
Peter A. Buhr
- Subjects
Correctness ,Coding conventions ,Programming language ,Non-lock concurrency control ,Computer science ,Concurrency ,Multiversion concurrency control ,Mutual exclusion ,computer.software_genre ,Semaphore ,computer ,Blocking (computing) - Abstract
Chapter 7, p. 313 on threads and locks ended with the readers and writer problem, which is an important concurrent problem. However, it is clear that the solutions to the readers and writer problem were becoming complex; too complex to feel comfortable about the correctness of the solution without extensive analysis. While coding conventions like split-binary semaphores and baton passing give a formal design approach, the resulting programs are still complex, both to manage and to maintain; basically, relying on programmers following coding conventions to assure correctness is a poor approach. As well, there is the subtle problem of properly handling staleness and writing because of the “window” for interruption between releasing the entry semaphore and blocking. In essence, we have arrived at the same impasse that occurred with software solutions for mutual exclusion, that is, the complexity and inefficiency of the locking solution is increasing for more complex critical sections. Therefore, it is worthwhile to search for a different approach to reduce the complexity, as with hardware solutions for mutual exclusion. However, in this case, the programming language, not the hardware, provides the mechanism to simplify the solution.
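The chapter's argument, that a language-provided construct beats a hand-rolled semaphore protocol, is easy to see with C++17's std::shared_mutex, which packages the readers/writer protocol that would otherwise be built from split-binary semaphores and baton passing. The sketch below is an illustrative example, not taken from the chapter.

```cpp
#include <cstdio>
#include <shared_mutex>
#include <thread>
#include <vector>

// Illustrative sketch (not from the chapter): a readers/writer-protected counter.
// std::shared_mutex provides the reader/writer protocol directly, so no manual
// baton passing or entry semaphore is needed.
class SharedCounter {
public:
    void increment() {                 // writer: exclusive access
        std::unique_lock lock(mutex_);
        ++value_;
    }
    int read() const {                 // reader: shared access
        std::shared_lock lock(mutex_);
        return value_;
    }
private:
    mutable std::shared_mutex mutex_;
    int value_ = 0;
};

int main() {
    SharedCounter counter;
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&counter] { for (int j = 0; j < 1000; ++j) counter.increment(); });
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&counter] { (void)counter.read(); });
    for (auto& t : threads) t.join();
    std::printf("final value: %d\n", counter.read());
    return 0;
}
```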
- Published
- 2016
38. Motivating lessons: A classroom-oriented investigation of the effects of content-based instruction on EFL young learners’ motivated behaviours and classroom verbal interaction
- Author
-
Kuei-Min Huang
- Subjects
Scheme (programming language) ,Linguistics and Language ,Coding conventions ,Language and Linguistics ,Education ,Nonverbal communication ,Content-based instruction ,Pedagogy ,ComputingMilieux_COMPUTERSANDEDUCATION ,Young learners ,Language education ,Situational ethics ,Psychology ,Interpersonal interaction ,computer ,computer.programming_language - Abstract
This study investigated the impact of content-based language instruction (CBLI) on EFL young learners’ motivated behaviours, namely attention, engagement, and eager volunteering, and on classroom verbal interaction. Situational factors play vital roles in shaping language learners’ motivation, particularly in EFL contexts. While many private schools implement CBLI programmes in Taiwan, as it has been proved elsewhere that such language programmes improve language learners’ motivation and academic performance in ESL contexts such as the US and Canada, the effects CBLI might have on EFL young learners have never been investigated in Taiwan. Twenty-five six-year-old year-one primary students participated in this study. Both classroom observation implementing Spada and Frohlich’s [Spada, N., Frohlich, M., 1995. COLT Communicative Orientation of Language Teaching Observation Scheme: Coding Conventions and Applications. Macquarie University, National Centre for English Language Teaching and Research, Sydney, Australia.] Communicative Orientation of Language Teaching (COLT) observation scheme and qualitative analysis of classroom videotaping revealed that learners tend to participate more actively in subject-learning classes than in language-input classes and have benefited from the programme in terms of eagerness to volunteer and classroom verbal output. Although the differences in the subjects’ attention level and engagement between content-focused lessons and language-focused lessons were not evident, there was a dramatic improvement in both types of lessons over six weeks.
- Published
- 2011
39. Gamification for enforcing coding conventions
- Author
-
Matthias Jarke and Christian R. Prause
- Subjects
Engineering ,Source code ,Coding conventions ,business.industry ,media_common.quotation_subject ,Software quality ,Code (semiotics) ,Scrum ,Software ,business ,Software engineering ,Information exchange ,Agile software development ,media_common - Abstract
Software is a knowledge-intensive product, which can only evolve if there is effective and efficient information exchange between developers. Complying with coding conventions improves information exchange by improving the readability of source code. However, without some form of enforcement, compliance with coding conventions is limited. We look at the problem of information exchange in code and propose gamification as a way to motivate developers to invest in compliance. Our concept consists of a technical prototype and its integration into a Scrum environment. By means of two experiments with agile software teams and subsequent surveys, we show that gamification can effectively improve adherence to coding conventions.
- Published
- 2015
40. Suggesting accurate method and class names
- Author
-
Charles Sutton, Miltiadis Allamanis, Earl T. Barr, and Christian Bird
- Subjects
Class (computer programming) ,Source code ,Theoretical computer science ,Coding conventions ,Computer science ,business.industry ,media_common.quotation_subject ,Local variable ,Context (language use) ,Space (commercial competition) ,computer.software_genre ,Variable (computer science) ,Language model ,Artificial intelligence ,business ,computer ,Natural language processing ,media_common - Abstract
Descriptive names are a vital part of readable, and hence maintainable, code. Recent progress on automatically suggesting names for local variables tantalizes with the prospect of replicating that success with method and class names. However, suggesting names for methods and classes is much more difficult. This is because good method and class names need to be functionally descriptive, but suggesting such names requires that the model goes beyond local context. We introduce a neural probabilistic language model for source code that is specifically designed for the method naming problem. Our model learns which names are semantically similar by assigning them to locations, called embeddings, in a high-dimensional continuous space, in such a way that names with similar embeddings tend to be used in similar contexts. These embeddings seem to contain semantic information about tokens, even though they are learned only from statistical co-occurrences of tokens. Furthermore, we introduce a variant of our model that is, to our knowledge, the first that can propose neologisms, names that have not appeared in the training corpus. We obtain state of the art results on the method, class, and even the simpler variable naming tasks. More broadly, the continuous embeddings that are learned by our model have the potential for wide application within software engineering.
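To make the "names with similar embeddings are used in similar contexts" idea concrete, here is a toy sketch with hypothetical vectors (not the paper's model): once each name has a learned embedding, the most fitting known name for a query context is simply the one with the highest cosine similarity.

```cpp
#include <cmath>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Toy sketch, not the paper's model: rank known method names by cosine
// similarity between their embedding vectors and a query embedding.
double cosine(const std::vector<double>& a, const std::vector<double>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

int main() {
    // Hypothetical 3-dimensional embeddings of known method names.
    std::vector<std::pair<std::string, std::vector<double>>> names = {
        {"getSize",  {0.9, 0.1, 0.0}},
        {"close",    {0.0, 0.8, 0.6}},
        {"getCount", {0.8, 0.2, 0.1}},
    };
    std::vector<double> query = {0.85, 0.15, 0.05};  // embedding of the method-body context

    for (const auto& [name, vec] : names)
        std::printf("%-9s %.3f\n", name.c_str(), cosine(query, vec));
    return 0;
}
```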
- Published
- 2015
41. Benchmarking of hospital activity data: an international comparison
- Author
-
Mark Booth, Vladimir Stevanovic, and Phil James
- Subjects
Health services ,Geography ,Actuarial science ,Coding conventions ,Strategy and Management ,Sample (statistics) ,Benchmarking ,Business and International Management ,Hospital performance ,Data limitations - Abstract
Purpose – The aim of the study is to examine the feasibility of comparing hospitals internationally and to highlight some of the barriers.Design/methodology/approach – Comparative analysis of anonymised patient‐level data from hospitals in New Zealand and the UK.Findings – Comparisons were made of aggregate statistics. For example, it was found that average length of stay and death rates in New Zealand were lower than in the UK, although average severity was higher. Adverse reactions were higher in the New Zealand sample than in that of the UK.Research limitations/implications – There were data limitations associated with different coding conventions in the two countries. There may also be different coding conventions used when classifying data. The research attempted to correct for this, but some may remain.Originality/value – There are few cross‐national comparisons of hospital performance. This paper shows that such analysis is possible. It is hoped that further effort can be put into addressing some o...
- Published
- 2005
42. Supporting Students in C++ Programming Courses with Automatic Program Style Assessment
- Author
-
Kirsti Ala-Mutka, Hannu-Matti Järvinen, and Toni Uimonen
- Subjects
lcsh:LC8-6691 ,General Computer Science ,Multimedia ,Coding conventions ,lcsh:Special aspects of education ,lcsh:T58.5-58.64 ,business.industry ,Computer science ,lcsh:Information technology ,media_common.quotation_subject ,Pascal (programming language) ,computer.software_genre ,Porting ,Education ,Programming style ,Content analysis ,Programming paradigm ,Software engineering ,business ,Programmer ,computer ,Intentional programming ,media_common ,computer.programming_language - Abstract
Introduction Programming style, i.e., the way to use a programming language and to write program code, is one of the important but too often neglected issues in programming. With habits of writing bad code, it is possible to make a program impossible for other programmers to understand. There may also be language-dependent coding requirements that need special attention for avoiding errors in program functionality. This is especially true for the C++ language, which has several pitfalls for the programmer, e.g., implicit type conversions and copy methods. By following good and systematic programming guidelines, it is possible to avoid these kinds of problems. Common guidelines and coding conventions are essential in real-life software projects, because the projects are usually carried out in groups, where several programmers should be able to work together effectively and pay attention to the quality of the software. For novice programmers, it is often difficult to understand the relevance of programming style guidelines while they try to concentrate on producing their first programs. As noticed by Schorsch (1995), students commonly perceive programming style as secondary, not integrated into program development. Hence, they often clean up a program's style last, or not at all. A small program can run correctly even if it is badly written. The problems of unsystematic or otherwise bad coding habits demonstrate themselves in bigger programs, or in a situation where a program should later be modified by another person or ported to another environment. For this reason, it is essential that students always get feedback on these issues in their work. Students should not be allowed to use bad or inconsistent coding habits freely, because these may be very difficult to change later. Programming courses generally contain lots of practical exercises: the issues to be learned do not become concrete for the student until tried in a program. The first programming courses aim at giving students the basic programming skills on which they can later build more advanced skills and knowledge. In these courses it is especially important to pay attention to programming style and to try to guide students to learn good programming habits. For achieving this goal, students should always be offered as good and thorough feedback on their work as possible. Unfortunately, assessing and giving feedback on these issues is time-consuming and difficult, even for experienced programmers and teachers. Large programming class sizes make the problem even worse. Often the only way to keep the level of required practical programming work and the amount of feedback offered to students at a reasonable level is to develop software solutions to assist teachers in their tutoring and assessing tasks. This article is organized as follows. The "Background" section reviews existing approaches to programming style analysis and assessment tools for it. The third section briefly describes the principles for developing C++ coding guidelines and the assessment tool for them. The fourth section presents the context and practices for using the assessment tool in programming courses. The fifth section presents some feedback and experiences from taking the tool into use in our courses. In the evaluation section, we discuss and evaluate our approach according to a framework defined for evaluating educational technology. Finally, we conclude the paper with some prospects for future work.
Background Good programming style has been debated many times in developing programming languages. Heated discussions have taken place about whether, and in what manner, style can be quantified. Research on measurable programming style definitions was very active in the 1980s, when some well-known basic models and assessment applications were created. One of the early works was published by Rees (1982), who developed a STYLE system to assess programming style of Pascal programs. …
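The C++ pitfalls the excerpt cites (implicit type conversions and copy methods) are easy to demonstrate; the class below is a hypothetical example of the kind of code such a style-assessment tool would flag, with the conventional fixes noted in comments.

```cpp
// Hypothetical example of code a C++ style checker would flag; not taken from the paper.
class Buffer {
public:
    Buffer(int size) : size_(size), data_(new char[size]) {}   // flagged: single-argument
                                                                // constructor should be explicit,
                                                                // otherwise "Buffer b = 42;" compiles.
    ~Buffer() { delete[] data_; }

    // Flagged: no copy constructor / copy assignment although the class owns memory
    // (the "rule of three"); compiler-generated copies would cause a double delete.
    // Conventional fix: define them properly, or delete them:
    //   Buffer(const Buffer&) = delete;
    //   Buffer& operator=(const Buffer&) = delete;

private:
    int size_;
    char* data_;
};

int main() {
    Buffer b = 42;   // compiles only because the constructor is not explicit
    (void)b;
    return 0;
}
```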
- Published
- 2004
43. The Roter interaction analysis system (RIAS): utility and flexibility for analysis of medical interactions
- Author
-
Susan Larson and Debra L. Roter
- Subjects
Predictive validity ,Coding conventions ,business.industry ,Communication Analysis ,Medicine ,General Medicine ,business ,Data science ,Social psychology ,Coding (social sciences) - Abstract
The Roter interaction analysis system (RIAS), a method for coding medical dialogue, is widely used in the US and Europe and has been applied to medical exchanges in Asia, Africa, and Latin America. Contributing to its rapid dissemination and adoption is the system’s ability to provide reasonable depth, sensitivity, and breadth while maintaining practicality, functional specificity, flexibility, reliability, and predictive validity with respect to a variety of patient and provider outcomes. The purpose of this essay is two-fold: first, to broadly overview the RIAS and to present its key capabilities and coding conventions, and second, to address the extent to which the RIAS is consistent with, or complementary to, linguistic-based techniques of communication analysis.
- Published
- 2002
44. Learning natural coding conventions
- Author
-
Christian Bird, Earl T. Barr, Charles Sutton, and Miltiadis Allamanis
- Subjects
FOS: Computer and information sciences ,Source code ,Code review ,Coding conventions ,Computer science ,media_common.quotation_subject ,Object (computer science) ,computer.software_genre ,Syntax ,Code (semiotics) ,Readability ,Software Engineering (cs.SE) ,World Wide Web ,Computer Science - Software Engineering ,Software design pattern ,Programmer ,computer ,media_common ,Codebase - Abstract
Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project's coding conventions. However, one third of reviews of changes contain feedback about coding conventions, indicating that programmers do not always follow them and that project members care deeply about adherence. Unfortunately, programmers are often unaware of coding conventions because inferring them requires a global view, one that aggregates the many local decisions programmers make and identifies emergent consensus on style. We present NATURALIZE, a framework that learns the style of a codebase, and suggests revisions to improve stylistic consistency. NATURALIZE builds on recent work in applying statistical natural language processing to source code. We apply NATURALIZE to suggest natural identifier names and formatting conventions. We present four tools focused on ensuring natural code during development and release management, including code review. NATURALIZE achieves 94% accuracy in its top suggestions for identifier names and can even transfer knowledge about conventions across projects, leveraging a corpus of 10,968 open source projects. We used NATURALIZE to generate 18 patches for 5 open source projects: 14 were accepted.
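A toy illustration of the statistical idea behind such a tool (this is not the NATURALIZE code): with an n-gram model over code tokens, candidate identifier names for a location can be ranked by how well they fit the surrounding token stream. The trigram counts and corpus below are made up for the example.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Toy trigram scorer (not the NATURALIZE implementation): count trigrams in a
// token corpus and score candidate names by how often they complete the
// trigram formed with the two tokens preceding the location.
using Trigram = std::vector<std::string>;

std::map<Trigram, int> count_trigrams(const std::vector<std::string>& tokens) {
    std::map<Trigram, int> counts;
    for (std::size_t i = 0; i + 2 < tokens.size(); ++i) {
        Trigram key = {tokens[i], tokens[i + 1], tokens[i + 2]};
        ++counts[key];
    }
    return counts;
}

int score(const std::map<Trigram, int>& counts, const std::string& prev2,
          const std::string& prev1, const std::string& candidate) {
    Trigram key = {prev2, prev1, candidate};
    auto it = counts.find(key);
    return it == counts.end() ? 0 : it->second;
}

int main() {
    // A tiny tokenized "corpus"; in practice this would be an entire codebase.
    std::vector<std::string> corpus = {
        "for", "(", "int", "i", "=", "0", ";",
        "for", "(", "int", "i", "=", "0", ";",
        "for", "(", "int", "index", "=", "0", ";"
    };
    auto counts = count_trigrams(corpus);

    // Which identifier is more conventional right after "( int"?
    for (const char* candidate : {"i", "index", "foo"})
        std::printf("%-6s %d\n", candidate, score(counts, "(", "int", candidate));
    return 0;
}
```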
- Published
- 2014
45. Understanding understanding source code with functional magnetic resonance imaging
- Author
-
André Brechmann, Thomas Leich, Sven Apel, Chris Parnin, Anja Bethmann, Gunter Saake, Christian Kästner, and Janet Siegmund
- Subjects
Source code ,Coding conventions ,Programming language ,Working memory ,Computer science ,Program comprehension ,media_common.quotation_subject ,Cognition ,Cognitive neuroscience ,computer.software_genre ,Human–computer interaction ,Syntax error ,Programmer ,computer ,media_common - Abstract
Program comprehension is an important cognitive process that inherently eludes direct measurement. Thus, researchers are struggling with providing suitable programming languages, tools, or coding conventions to support developers in their everyday work. In this paper, we explore whether functional magnetic resonance imaging (fMRI), which is well established in cognitive neuroscience, is feasible to soundly measure program comprehension. In a controlled experiment, we observed 17 participants inside an fMRI scanner while they were comprehending short source-code snippets, which we contrasted with locating syntax errors. We found a clear, distinct activation pattern of five brain regions, which are related to working memory, attention, and language processing---all processes that fit well to our understanding of program comprehension. Our results encourage us and, hopefully, other researchers to use fMRI in future studies to measure program comprehension and, in the long run, answer questions, such as: Can we predict whether someone will be an excellent programmer? How effective are new languages and tools for program understanding? How should we train programmers?
- Published
- 2014
46. ICD-10 as a Coding Tool in Psychiatry
- Author
-
Aleksandar Janca
- Subjects
Psychiatry and Mental health ,medicine.medical_specialty ,Coding conventions ,medicine ,ICD-10 ,Psychology ,Psychiatry ,Coding (social sciences) - Abstract
Objective: The aim of this paper is to provide an overview of the structure, construction principles and coding conventions of importance for the application of the Tenth Revision of the International Classification of Diseases (ICD-10) in the coding of psychiatric diagnoses. Method: The method involved an extensive review of the various versions of ICD-10 and ICD-10-related documents, instruments and literature. Results: ICD-10 differs from its predecessors in many ways including the introduction of a novel alphanumerical coding scheme; development of the concept of a “family” of disease and health-related classifications; inclusion of diagnostic guidelines and explicit criteria for mental disorders; production of the mental disorders chapter of the classification in different versions and for different types of users; and provision for incorporation of nationally relevant coding standards which facilitates preparation of national adaptations of the classification such as the Australian Modification of ICD-10 or ICD-10-AM. Conclusions: Relatively late and slow implementation of ICD-10 into psychiatric coding practices in Australia is mainly due to a lack of organised ICD-10 educational programs and activities at the national and State levels. In view of the fact that, as one of the Member States of the World Health Organization, Australia has an obligation to report mortality and morbidity statistics at an international level using the ICD-10, the development of comprehensive ICD-10 educational and training strategies for all types of mental health professionals dealing with coding and recording of psychiatric diagnoses should be seen as a priority in this country.
- Published
- 2001
47. Development strategies for Pythia version 7. A new HEP event generator
- Author
-
Leif Lönnblad
- Subjects
Structure (mathematical logic) ,Physics ,Particle physics ,Development (topology) ,Software ,Coding conventions ,Hardware and Architecture ,business.industry ,General Physics and Astronomy ,Software engineering ,business - Abstract
This document describes the strategies for the development of the Pythia7 program. Both the internal and external structure of the program are discussed. Some comments on the relationship to other software are given, as well as some comments on coding conventions and other technical details.
- Published
- 1999
48. Using Meddra for Adverse Events in Cancer Trials: Experience, Caveats, and Advice
- Author
-
Tremmel, Lothar T. and Scarpone, Lynn
- Published
- 2001
- Full Text
- View/download PDF
49. Program comprehension and software reengineering
- Author
-
Gregor Snelting, Hausi A. Müller, and Thomas Reps
- Subjects
Coding conventions ,Computer science ,Programming language ,business.industry ,Program comprehension ,General Medicine ,computer.software_genre ,Variable (computer science) ,Disk formatting ,Software ,Programmer ,business ,Software architecture ,computer ,Pointer analysis - Abstract
Coping with Software Change Using Information Transparency (William Griswold, University of California, San Diego). Designers are frequently unsuccessful in designing for change using traditional modularity techniques. It is difficult to anticipate exactly how technology will advance, standards will arise, and features from competitors' products will influence future features. Market pressures dictate that most time be invested in the timely release of the current product, not in accommodating future changes. One way to increase the coverage of relevant design decisions is to use a design principle called information transparency: write code such that all uses of an exposed design decision are easily visible to a programmer using available tools. The coding conventions include techniques like variable naming conventions, code formatting, and encoding software architecture into the program source. As a consequence of using such techniques, a programmer can use a searching tool like grep to view all the related objects together, creating locality out of similarity.
Flow-Insensitive Pointer Analysis (Susan Horwitz, University of Wisconsin, Madison). Most static analyses rely on knowing what objects are used and defined at each point in the program. In a language with pointers, determining this information can be non-trivial. Lars Andersen defined a flow-insensitive algorithm for computing points-to information that is O(n^3) in the worst case. More recently, Bjarne Steensgard gave another flow-insensitive algorithm that is faster (essentially O(n)), but less precise (it computes larger points-to sets than Andersen's algorithm). In the talk, we first define a new points-to analysis algorithm that can be “tuned” to provide resolutions that fall all along the spectrum from Steensgard to Andersen (both in terms of runtime and precision). We then present
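A textbook-style example (not taken from the talk) of the precision gap between the two flow-insensitive analyses mentioned above; the comments state what each analysis concludes for the same three assignments.

```cpp
// Illustrative example of subset-based (Andersen) vs. unification-based
// (Steensgard) points-to analysis; the conclusions are noted in comments.
int main() {
    int a = 1, b = 2;
    int* p = &a;   // both analyses: p -> {a}
    int* q = &b;   // both analyses: q -> {b}
    p = q;         // Andersen (subset-based):   p -> {a, b}, q -> {b}
                   // Steensgard (unification):  the points-to sets of p and q are merged,
                   //                            so p -> {a, b} and q -> {a, b}
    return *p + a + b;
}
```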
- Published
- 1998
50. Communicative orientation of language teaching observation scheme: Coding conventions & applications
- Author
-
David Block
- Subjects
Scheme (programming language) ,Linguistics and Language ,Coding conventions ,Computer science ,Mathematics education ,Language education ,Communicative language teaching ,Orientation (graph theory) ,computer ,Language and Linguistics ,Linguistics ,Education ,computer.programming_language - Published
- 1997