Descriptor: "text encoding" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "text encoding" returned 227 results.

1. Text Encoding

Author: Jo, Taeho and Kacprzyk, Janusz, Series Editor
Published: 2024

2. A Knowledge Graph Completion Algorithm Based on the Fusion of Neighborhood Features and vBiLSTM Encoding for Network Security.

Author: Zhang, Wenbo, Wang, Mengxuan, Han, Guangjie, Feng, Yongxin, and Tan, Xiaobo
Subjects: KNOWLEDGE graphs, GRAPH algorithms, COMPUTER network security, INFORMATION networks, DECODERS & decoding, DEEP learning
Abstract: Knowledge graphs in the field of network security can integrate diverse, heterogeneous, and fragmented network security data, further explore the relationships between data, and provide support for deep analysis. Currently, security information in network security knowledge graphs is sparse. The limited information provided by traditional text encoding models leads to insufficient reasoning ability, greatly restricting the development of this field. Starting from text encoding, this paper first addresses the inadequate capabilities of traditional models with the help of a deep learning model. It designs a vBiLSTM model, which combines word2vec and BiLSTM, to process network security texts. Word vector models retain the semantic information in entities and extract key features; the processed data are then fed into BiLSTM networks, which extract higher-level features that better capture and express deeper meanings. This design significantly enhances the understanding and expression of complex semantics in long sentences before the final feature vectors are input into the KGC-N model. The KGC-N model combines these feature vectors with graph structure information to fuse forward and reverse domain features, and then uses a Transformer decoder to decode predictions and complete missing information in the network security knowledge graph. Comparison with other models on evaluation metrics such as MR and MRR demonstrates that the proposed method effectively improves performance on completion tasks and increases comprehension of complex relations, thereby enhancing the accuracy and efficiency of knowledge graph completion. [ABSTRACT FROM AUTHOR]
Published: 2024

3. A Review on Text Sentiment Analysis With Machine Learning and Deep Learning Techniques

Author: Yonatan Mamani-Coaquira and Edwin Villanueva
Subjects: Machine learning, deep learning, word embedding, text encoding, sentiment analysis, text classification, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: Automating sentiment analysis in texts has become an important task in recent years due to the exponential growth of user-generated content, including comments and opinions on products and services. This represents a valuable opportunity for businesses to glean insights into customer sentiment and, in turn, to refine their offerings. Motivated by this, the machine learning field has witnessed a surge of innovation, with new models and tools being introduced to streamline sentiment analysis. This paper offers a thorough review of the recent advancements in machine learning and deep learning approaches for text sentiment analysis. We propose a novel framework for studying these models, distinguishing them by their structural intricacies. Additionally, we delve into the challenges, prospects, and emerging directions in research, as illuminated by our framework. Consequently, this paper equips researchers with a detailed panorama of the cutting-edge machine learning methodologies for dissecting text sentiment, easing the way for future explorations in this vibrant field.
Published: 2024

4. Reading novels in Einaudi: the case of Natalia Ginzburg. Reading opinions under the scrutiny of Digital Humanities.

Author: Laura Antonietti
Subjects: modelling, uml, database, editorial archives, reading reports, history of publishing, contemporary italian literature, natalia ginzburg, einaudi, text encoding, General Works, History of scholarship and learning. The humanities, AZ20-999
Abstract: This paper presents the results of the modelling and analysis of the reading process within the Italian publishing house Einaudi after the Second World War, with a special focus on Natalia Ginzburg (1916-1991). More specifically, the corpus of sources consists of the reading reports concerning contemporary narrative works, which represent a fundamental step in the decision-making process that leads to the publication of a work and therefore to the construction of the catalogue of a publishing house. The examination of Natalia Ginzburg’s reading reports, mostly unpublished, provides the basis for a critical reflection on her editorial activity, a reflection which is rooted in the belief that editorial writing represents a complex critical genre that deserves to be investigated by specific methods. The tools (in the present case UML, PostgreSQL, XML TEI) and methods of the Digital Humanities have made a major contribution to the realisation of the scientific objectives of the research work. On the one hand, they have made it possible to model, represent and interrogate the corpus of documents in a relevant and efficient way; on the other hand, they were fundamental and indispensable from a methodological, heuristic and interpretative point of view.
Published: 2023

5. Integrating Digital Editions and Methods for Text Editing and Analysis in Undergraduate Literary Studies.

Author: Stoyanova, Silvia
Subjects: *LITERARY criticism, *DIGITAL humanities, *ARTISTIC influence, *HOTEL suites, *UNDERGRADUATES, *DIGITAL technology, *ITALIAN literature
Abstract: This article evaluates the integration of digital editions, computational text analysis and digital scholarly editing in the context of an introductory undergraduate course on Italian literature and digital humanities taught at a US university. It offers specific examples of employing the apparatus of several digital platforms dedicated to the study of foundational authors in the Italian literary tradition (Dante, Petrarch, Boccaccio and Leopardi), and of gaining familiarity with a suite of digital tools for text analysis and editing, namely Voyant Tools, Recogito, Oxygen, Gephi, Transkribus Lite and OpenRefine. The discussion of digital project interfaces examines the student user experience of different design approaches, while the illustrations of tool exercises explore how these could support the close attention to a text and facilitate the navigation between its micro and macro frameworks of interpretation. The article furthermore suggests that digital text analysis could reinforce student appreciation of the signifying value of textual form and genre, and that the pedagogical method of digital text editing creates opportunities for situated learning. In conclusion, it argues that the academic work of students at the undergraduate level could be harnessed by the scaffolded methods of faculty-led digital research projects and contribute to the creation of public knowledge. [ABSTRACT FROM AUTHOR]
Published: 2024

6. Handwritten stenography recognition and the LION dataset

Author: Heil, Raphaela and Nauwerck, Malin
Published: 2024

7. The Extended Digital Scholarly Edition of 'The Name of the Rose': modelling, workflow and the IDEA paradigm

Author: Christian D'Agata
Subjects: aiucd2022, digital scholarly edition, umberto eco, text encoding, euporia and domain specific language annotation, didactics and public humanities, computational literature studies, literary lexicography, digital literary hermeneutics, General Works, History of scholarship and learning. The humanities, AZ20-999
Abstract: The paper presents the Digital Scholarly Edition of Umberto Eco's The Name of the Rose, characterized since its conception by the dialogue between Digital Philology and Computational Literary Criticism in an interdisciplinary perspective called the 'IDEA' paradigm (Interpretation, Didactics, Edition, Annotation). By bringing together Authorial Philology, Digital Annotation and Literature Didactics, the IDEA paradigm aims to overcome the objective limitations of publishing contemporary literature content on the Web in open format. The contribution then describes the portal «The Variants of the Rose», a virtual space that presents the critical apparatus of variants encoded in XML-TEI (and visualized through EVT2), some didactic content developed with TRAViz, Storymap JS and Timeline JS, the annotation in a Domain Specific Language developed on Euporia, and further critical insights drawn from Franco Moretti's distant reading and Giuseppe Savoca's Literary Lexicography. Subsequently, a reflection on the modelling of the edition, the concept of Extended Edition and the project workflow is presented. Finally, a road map of future content is proposed with a view to the increasing integration of scientific research and the Public Humanities.
Published: 2023

8. Open Pedagogy and the Archives: Engaging Students in Public Digital Humanities

Author: Conatser, Trey, Schwan, Anne, editor, and Thomson, Tara, editor
Published: 2022

9. Investigation of Input Alphabets of End-to-End Lithuanian Text-to-Speech Synthesizer.

Author: KASPARAITIS, Pijus and ANTANAVIČIUS, Danielius
Subjects: SPEECH synthesis, AUTOMATIC speech recognition, LITHUANIAN language, NATURAL language processing, SPEECH, LITHUANIANS, JUDGMENT (Psychology)
Abstract: The present paper deals with choosing the input alphabet for the end-to-end synthesizer of the Lithuanian language. Tacotron 2 is a state-of-the-art end-to-end speech synthesis model. Characters, phonemes or their combinations can be used as an input of the model. The model was trained on Lithuanian speech recordings using the following five input alphabets: letters, lowercase letters, accented letters, reduced set of accented letters, letters with separate accent marks. Acceptability of the synthesized speech was evaluated on the basis of human listeners' subjective judgment. Experimental testing showed that accent marks significantly improved the quality of the synthesized speech. Reducing the size of the input alphabet also has a slight positive impact. Putting accent marks into the text produced the best results as compared to using the accented letters. [ABSTRACT FROM AUTHOR]
Published: 2023

10. Human face generation from textual description via style mapping and manipulation.

Author: Todmal, Shantanu, Mule, Ashish, Bhagwat, Devang, Hazra, Tanmoy, and Singh, Bhupendra
Subjects: HIGH resolution imaging, COMPUTER vision, GENERATIVE adversarial networks, CRIMINAL investigation, IMAGE retrieval
Abstract: Text-to-Face generation is an interesting and challenging task with great potential for diverse computer vision applications in the public safety domain. There has been far less work on Text-to-Face synthesis than on Text-to-Image synthesis, owing to the diversity of facial visual attributes and their corresponding descriptions. In this paper, we propose a Text-to-Face generative model that can produce high-quality, high-resolution images from a given textual description. The model is also able to produce a range of diverse images for a given description. In the proposed approach, the encoded text input is mapped to the generator to produce high-quality output, which is further manipulated to better reflect the described attributes. In addition to diversity, the model is able to significantly emphasize the facial attributes provided in the description. The applications of the proposed model include criminal investigation, character generation (video games, movies, etc.), manipulating facial attributes according to a brief textual description, text-based style transfer, and text-based image retrieval. [ABSTRACT FROM AUTHOR]
Published: 2023

11. WordCode using WordTrie

Author: R. Sakthi Murugan and V.S. Ananthanarayana
Subjects: Word encoding, Text encoding, WordCode, WordTrie, Electronic computers. Computer science, QA75.5-76.95
Abstract: Computers work with text data by assigning a code for each character, called encoding. Character-encoding techniques emerged in the late 1960s, and a similar type of technique is still used to encode text data. Computers can only understand alphabets, not words. In this article, we develop an approach that enables computers to understand words. We introduce a word-based encoding of text data named WordCode. WordCode encodes the most frequent set of characters (i.e., words) found in Internet directories with a dynamic code combination. Although some dictionary-encoding techniques have been proposed, we still tend to use character encoding, such as Unicode, to encode text data. Dictionary-encoding techniques have not been adopted due to the massive size of the code page and the complexity in accessing the code page. In this article, we introduce a customised trie named WordTrie to store words for faster encoding and decoding. We generate the code combination in such a way that the size of the WordCode for a word is always smaller than the total size of the character coding. Our experimental results from encoding text files from the Gutenberg corpus, Canterbury corpus, large corpus, Calgary corpus and Silesia corpus using WordCode show an up to 19.9% reduction in file size with respect to character-based encoding. This smaller file size means that less storage space is needed and results in faster processing and communication of text data.
Published: 2022

12. Chapter Hesperia, a Database for Palaeohispanic Languages; and AELAW, a Database for the Ancient European Languages and Writings. Challenges, Solutions, Prospects

Author: Estarán, María José, Beltrán, Francisco, Orduña, Eduardo, Gorrochategui, Joaquín, De Santis, Annamaria, and Rossi, Irene
Subjects: Ancient languages, data modelling, digital humanities, epigraphy, grapheme analysis, interoperability, lexicography, palaeography, scripts, text encoding, translation, writing systems, Palaeography, Writing systems, alphabets, Ancient, classical and medieval texts, History, History: theory and methods, Archaeological science, methodology and techniques
Abstract: This miscellaneous volume collects contributions on nineteen projects dealing with Digital Epigraphy – they are diversified in geographic and chronological context, for script and language, and for typology of digital output. The objective is to point out the methodological issues which are specific to the application of information technologies to epigraphy, with a focus on data modelling and text annotation, lexicography and interoperability.
Published: 2022

13. A Review of Generative Adversarial Networks and Their Text-to-Image Synthesis.

Author: 王威, 李玉洁, 郭富林, 刘岩, and 何俊霖
Subjects: GENERATIVE adversarial networks, DEEP learning, COMPUTER vision, RESEARCH & development, EVALUATION methodology
Abstract: No abstract available. Copyright of Journal of Computer Engineering & Applications is the property of Beijing Journal of Computer Engineering & Applications Journal Co Ltd.; the content may be printed, downloaded, or emailed for individual use only.
Published: 2022

14. Appetite comes with eating. The never ending digital edition of the Codex Pelavicino

Author: Enrica Salvatori
Subjects: digital editions, evt, text encoding, xml/tei, digital history, General Works, History of scholarship and learning. The humanities, AZ20-999
Abstract: The paper presents the potential uses of digital edition tools by retracing the process of creating the Codice Pelavicino digital edition. We also explore the related issues of coding, output, ease of use, and relationship with users. The paper reworks a lecture presented at the webinar “Medieval archival sources in digital environment. The challenge of processing and visualizing semi-structured data”, organized on 22-23 June 2020 by the Digital Culture Laboratory (University of Pisa) and by the head of the Engineering Historical Memory project. The aim is to promote future collaboration and experimentation initiatives in the field of processing and extraction of information from semi-structured editions of archival documents.
Published: 2021

15. Introduction: 'Medieval archival sources in the digital world. The challenge of treating and visualizing semi-structured data'

Author: Enrica Salvatori
Subjects: xml/tei, text encoding, information retrieval, digital public humanities, open culture, digital edition, General Works, History of scholarship and learning. The humanities, AZ20-999
Abstract: The webinar “Medieval archival sources in the digital world. The challenge of treating and visualizing semi-structured data” was organized by the Digital Culture Laboratory (UNIPI) and by the Engineering Historical Memory project. This thematic dossier aims to offer a brief account of the results of the workshop, with particular attention to future collaborations and experimentations in the field of processing and extracting information from semi-structured editions of archival documents. These advanced features are essential to broaden the use of digital editions of archival sources and other written documents, making them more flexible and open to research tools. The central problem concerns the relationship between text coding and the creation of good interfaces that users can access intuitively.
Published: 2021

16. Exploring bridge maintenance knowledge graph by leveraging GraphSAGE and text encoding.

Author: Gao, Yan, Xiong, Guanyu, Li, Haijiang, and Richards, Jarrod
Subjects: *LANGUAGE models, *BRIDGE maintenance & repair, *KNOWLEDGE graphs, *BRIDGE design & construction, *DATA mining
Abstract: Knowledge graphs (KGs) are crucial in documenting bridge maintenance expertise. However, existing KG schemas lack integration of bridge design and practical inspection insights. Meanwhile, traditional methods for node feature initialization, relying on meticulous manual encoding or word embeddings, are inadequate for real-world maintenance textual data. To address these challenges, this paper introduces a bridge maintenance-oriented KG (BMKG) schema and approaches for graph data mining, including node-layer classification and link prediction. These methods leverage large language model (LLM)-based text encoding combined with GraphSAGE, demonstrating excellent performance in semantic enrichment and KG completion on deficient BMKGs. Additionally, ablation studies reveal the superiority of the pre-trained BERT text encoder and the L2 distance pairwise scoring calculator. Furthermore, a practical implementation framework integrating these approaches is developed for routine bridge maintenance, which can facilitate various practical applications, such as maintenance planning, and has the potential to enhance the efficiency of engineers' documentation work. • This paper introduces a hierarchical knowledge graph schema based on real-world bridge maintenance reports. • A node-layer classification approach is proposed for semantic enrichment by employing GraphSAGE and LLM-based text encoding. • A link prediction approach is proposed for graph completion by leveraging GraphSAGE and contrastive learning. • An implementation framework integrating the above schema and approaches is designed for practical bridge maintenance. [ABSTRACT FROM AUTHOR]
Published: 2024

17. Teaching the Text Encoding Initiative: Context, Community and Collaboration

Author: Yasmin Faghihi, Matthew Holford, and Huw Jones
Subjects: text encoding, tei, pedagogy, xml, manuscripts, History of scholarship and learning. The humanities, AZ20-999, Language and Literature
Abstract: In common with many technical aspects of digital humanities, the TEI has a reputation for being difficult to teach and difficult to learn, with potential practitioners put off by the large and (at first sight) intimidating set of guidelines, the seemingly complex hierarchical structure and the profusion of angle brackets. One-to-one or small group teaching in the context of a specific project is often the preferred method, where the short but steep learning curve required to engage with the TEI can be addressed in a way which is relevant to the aims and experience of the learner. This, however, is not a particularly efficient way of teaching. In this article, the authors discuss their experience of teaching (and learning) the TEI, and how lessons learned in contexts relating to specific projects might feed into the teaching of TEI in a more general setting – the Digital Humanities at Oxford Summer School being the prime example.
Published: 2022

18. Text Encoding

Author: Jo, Taeho and Kacprzyk, Janusz, Series Editor
Published: 2019

19. Data Anonymization for Privacy Aware Machine Learning

Author: Jaidan, David Nizar, Carrere, Maxime, Chemli, Zakaria, Poisvert, Rémi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Nicosia, Giuseppe, editor, Pardalos, Panos, editor, Umeton, Renato, editor, Giuffrida, Giovanni, editor, and Sciacca, Vincenzo, editor
Published: 2019

20. Encoding Queer Erasure in Oscar Wilde’s 'The Picture of Dorian Gray'

Author: Filipa da Gama Calado
Subjects: Digital Humanities, Queer Studies, Textual Scholarship, Modernism, Text Encoding, TEI, History of scholarship and learning. The humanities, AZ20-999
Abstract: Literary and textual scholars have long speculated about Wilde’s intentions for revising the homoerotic content of his famous novel, The Picture of Dorian Gray (1891). More recently, electronic editing tools enable scholars to explore textual composition histories within a digital space. This project uses the Text Encoding Initiative (TEI) standard, an electronic editing tool that allows researchers to ‘mark up’, or tag, textual elements. Using the TEI, I mark up the first chapter of Wilde’s manuscript of Dorian Gray, which introduces the story’s three main characters, Basil Hallward, Lord Henry Wotton, and Dorian Gray. Drawing from debates in Textual Scholarship and Queer Historiography, I question how electronic editing with the TEI might register the ways that Wilde suppressed the homoeroticism between these three characters during his revision process. My work here pushes against what I identify as the TEI’s main constraint: it is limited to handling discrete data rather than smooth or ambiguous data, like the homoeroticism of this text. I conclude by proposing a TEI customization that marks Wilde’s revisions according to the four homoerotic themes of ‘intimacy’, ‘beauty’, ‘passion’ and ‘fatality’. As an experiment in ‘queer encoding’, this customization shows how strict data structures like the TEI might engage the fluidity and complexity of queerness in text.
Published: 2022

21. Encoding of Text Reuse in the Project Beta maṣāḥǝft

Author: Daria Elagina
Subjects: text encoding, digital edition, Ethiopian Studies, text reuse, narrative units, Computer software, QA76.75-76.765
Abstract: The paper discusses the phenomenon of text reuse in the manuscript tradition of Ethiopia and Eritrea and, by examining examples of different forms of text reuse, presents the possibilities of its encoding in TEI XML within the frame of the project Beta maṣāḥǝft. Text reuse, which consists in the implicit or explicit repetition of text, is attested in a variety of forms, including quotations, allusions, paraphrases, and cross-linguistic text reuse. The documentation of this practice contributes to the study of different aspects of the manuscript tradition, for example, to the impact and relative dating of texts, and history of their transmission. The variety of text reuse forms allows for different approaches to their encoding which are presented in this paper to illustrate the possibilities of text reuse markup within the project’s schema. Additionally, the paper discusses the concept of narrative units and how it differs from the concept of text reuse, as well as the markup of a type of text reuse which involves the reference to external entities.
Published: 2022

22. WordCode using WordTrie.

Author: Sakthi Murugan, R. and Ananthanarayana, V.S.
Subjects: TEXT files, DATA transmission systems
Abstract: Computers work with text data by assigning a code for each character, called encoding. Character-encoding techniques emerged in the late 1960s, and a similar type of technique is still used to encode text data. Computers can only understand alphabets, not words. In this article, we develop an approach that enables computers to understand words. We introduce a word-based encoding of text data named WordCode. WordCode encodes the most frequent set of characters (i.e., words) found in Internet directories with a dynamic code combination. Although some dictionary-encoding techniques have been proposed, we still tend to use character encoding, such as Unicode, to encode text data. Dictionary-encoding techniques have not been adopted due to the massive size of the code page and the complexity in accessing the code page. In this article, we introduce a customised trie named WordTrie to store words for faster encoding and decoding. We generate the code combination in such a way that the size of the WordCode for a word is always smaller than the total size of the character coding. Our experimental results from encoding text files from the Gutenberg corpus, Canterbury corpus, large corpus, Calgary corpus and Silesia corpus using WordCode show an up to 19.9% reduction in file size with respect to character-based encoding. This smaller file size means that less storage space is needed and results in faster processing and communication of text data. [ABSTRACT FROM AUTHOR]
Published: 2022

23. Teaching the Text Encoding Initiative: Context, Community and Collaboration.

Author: FAGHIHI, YASMIN, HOLFORD, MATTHEW, and JONES, HUW
Subjects: DIGITAL humanities, LEARNING curve, EDUCATION, CULTURAL property, LINGUISTICS
Abstract: In common with many technical aspects of digital humanities, the TEI has a reputation for being difficult to teach and difficult to learn, with potential practitioners put off by the large and (at first sight) intimidating set of guidelines, the seemingly complex hierarchical structure and the profusion of angle brackets. One-to-one or small group teaching in the context of a specific project is often the preferred method, where the short but steep learning curve required to engage with the TEI can be addressed in a way which is relevant to the aims and experience of the learner. This, however, is not a particularly efficient way of teaching. In this article, the authors discuss their experience of teaching (and learning) the TEI, and how lessons learned in contexts relating to specific projects might feed into the teaching of TEI in a more general setting – the Digital Humanities at Oxford Summer School being the prime example. [ABSTRACT FROM AUTHOR]
Published: 2022

24. The concept of version in genetically oriented scholarly editing.

Author: Pereira, Elsa
Subjects: *SCHOOL orientation, *DIGITAL libraries, *TEXTUAL criticism, *EDITING, *DIGITAL humanities
Abstract: The idea of textual variation was notably rejected in the early days of critique génétique, but versions have been playing a prominent role in most editorial schools of genetic orientation. This article presents a systematic review of the literature to distinguish the main working definitions and editorial approaches to the notion of version, both in genetic analogue editions and digital archives based on text encoding and computer-assisted collation. [ABSTRACT FROM AUTHOR]
Published: 2021

25. Digital Texts in Practice

Author: Christian Wittern
Subjects: text encoding, XML, TEI, character encoding, Chinese texts, SGML, Computer software, QA76.75-76.765
Abstract: As a student of intellectual, religious, and cultural developments in areas of the Chinese cultural sphere, my initial motivation for engaging with digital texts thirty years ago was to open up the new possibilities that the digital medium offered to researchers, without losing any of the affordances of a traditional printed edition. This requirement includes use of texts for reading, translating, annotating, quoting, and publishing, thus integrating with the whole of the scholarly workflow. At that time theories of electronic texts started to appear and the Text Encoding Initiative had already begun to create a common text model and interchange specification, based mainly on European languages. For East Asian texts, things were much more complicated because of different and quickly evolving character encoding standards, different textual traditions and approaches to text editing, as well as different institutional embedding. In this paper, I will look back at these developments, first to recount some of the history, albeit from a strictly personal perspective, but also to take stock of the situation and consider where we are now, how we got there, and what remains to be done to realize the dream of the universal digital text, easily shared and annotated, but still tractable, verifiable, and authoritative.
Published: 2020

26. Digitizing the Old English Anonymous and Wulfstanian Homilies through the Electronic Corpus of Anonymous Homilies in Old English (ECHOE) Project.

Author: Rudolf, Winfried
Subjects: OLD English prose literature, SERMON (Literary form), PALEOGRAPHY, EDITING, DIGITIZATION
Abstract: This article first outlines the challenges involved in the editing of Old English anonymous and Wulfstanian homilies before introducing the Electronic Corpus of Anonymous Homilies in Old English (ECHOE) project. This new initiative at the University of Göttingen reverses the traditional collation of texts and instead celebrates the book-historical significance of every individual manuscript version, its textual and palaeographical idiosyncrasies, and its revisional layers up through c. 1200 AD. The project provides new forms of display to expose the complex interversional network of textual representations, and develops a range of digital tools to facilitate the identification and swift comparison of related passages. It includes digital facsimiles, palaeographical and rhetorical version profiles, and the Latin sources for each homily, creating opportunities for unprecedented research on the transmission, composition, variation, and performance of the fluid preaching text. [ABSTRACT FROM AUTHOR]
Published: 2021

27. Researching the History of the Croatian Language in the Digital Age (ISTRAŽIVANJE POVIJESTI HRVATSKOGA JEZIKA U DIGITALNO DOBA).

Author: Horvat, Marijana
Subjects: *CROATIAN language, *HISTORICAL linguistics, *REFERENCE books, *BOOKSTORES, *LIBRARY materials, *DIGITAL media
Abstract: Croatian philology has a long tradition of publishing reference books in the field of historical linguistics; it is very important to continue and foster this tradition in the future, as well as to adapt it to the modern digital era. Recent large-scale digitization initiatives have focused heavily on historical texts. Many reasons have led to the increasing digitization of archival documents, library and museum material, and collections of old books and manuscripts stored in monasteries. One of the main reasons for the digitization of old texts is to protect the original document from possible damage resulting from inappropriate handling. The use of digital copies instead of original documents is necessary to protect and preserve old written texts, as they are the most reliable way to back up original documents. Another very important reason for digitizing old texts is to make them more accessible to scholars, experts, and the public. In this way, young people who prefer using digital media will become more easily acquainted with cultural heritage that would otherwise be inaccessible to them. The digitization model of old Croatian grammar books will be discussed, as the existing resources do not include grammar books from the pre-standard period of the Croatian language. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

28. Rosarium – a text encoding project curating popular writing on roses online

Author: Tryon, Julia Rachel
Published: 2017
Full Text: View/download PDF

29. Transcendental Data: Toward a Cultural History and Aesthetics of the New Encoded Discourse

Author: Liu, Alan
Subjects: digital humanities, text encoding, cultural studies, postindustrialism, scientific management, sublime, Studies in Human Society, Language, Communication and Culture, History and Archaeology, Literary Studies
Published: 2004

30. The Extended Digital Scholarly Edition of “The Name of the Rose”: modelling, workflow and the IDEA paradigm

Author: D'Agata, Christian
Abstract: The paper presents the Digital Scholarly Edition of Umberto Eco's The Name of the Rose, characterized since its conception by the dialogue between Digital Philology and Computational Literary Criticism in an interdisciplinary perspective called the 'IDEA' paradigm (Interpretation, Didactics, Edition, Annotation). By bringing together Authorial Philology, Digital Annotation and Literature Didactics, the IDEA paradigm aims to overcome the objective limitations of publishing contemporary literature content on the Web in open format. The contribution then describes the portal «The Variants of the Rose», a virtual space that presents the critical apparatus of variants encoded in XML-TEI (and visualized through EVT2), didactic content created with TRAViz, StoryMap JS and Timeline JS, annotations in a Domain Specific Language developed on Euporia, and further critical insights drawn from Franco Moretti's distant reading and Giuseppe Savoca's literary lexicography. Subsequently, a reflection on the modelling of the edition, the concept of the Extended Edition and the project workflow is presented. Finally, a road map of future content is proposed with a view to the increasing integration of scientific research and the Public Humanities.
Published: 2023

31. Reading novels in Einaudi: the case of Natalia Ginzburg. Reading opinions under the scrutiny of Digital Humanities.

Author: Antonietti, Laura
Abstract: This paper presents the results of the modelling and analysis of the reading process within the Italian publishing house Einaudi after the Second World War, with a special focus on Natalia Ginzburg (1916-1991). More specifically, the corpus of sources consists of the reading reports concerning contemporary narrative works, which represent a fundamental step in the decision-making process that leads to the publication of a work and therefore to the construction of the catalogue of a publishing house. The examination of Natalia Ginzburg’s reading reports, mostly unpublished, provides the basis for a critical reflection on her editorial activity, a reflection which is rooted in the belief that editorial writing represents a complex critical genre that deserves to be investigated by specific methods. The tools (in the present case UML, PostgreSQL, XML TEI) and methods of the Digital Humanities have made a major contribution to the realisation of the scientific objectives of the research work. On the one hand, they have made it possible to model, represent and interrogate the corpus of documents in a relevant and efficient way; on the other hand, they were fundamental and indispensable from a methodological, heuristic and interpretative point of view.
Published: 2023

32. Biography as Compilation: How to Encode Georg Nikolaus Nissen’s Biographie W. A. Mozart’s (1828) in TEI P5

Author: Anja Morgenstern and Agnes Amminger
Subjects: text encoding, digital edition, source criticism, biography, Mozart, Computer software, QA76.75-76.765
Abstract: The project of editing the early Biographie W. A. Mozart’s (1828) by Georg Nikolaus Nissen (Nissen Online) began as part of the Digital Mozart-Edition (DME) at the Mozarteum Foundation Salzburg. The aim of the edition is to reveal the structure of the text by identifying the diverse sources Nissen relied on when writing the biography. These include primary sources such as original letters and documents from the Mozart family, secondary sources such as contemporary literature about Wolfgang Amadeus Mozart, and original text written by the author and later editors. Considering the challenges that arise when creating an edition that tries to define the different strands of a text, this paper describes how XML/TEI markup was applied to encode text passages which often do not correlate with common text structures (paragraphs, chapters); document different types of sources and their authors or editors; and integrate a detailed bibliography of the sources as well as critical annotations for each single text passage.
Published: 2020
Full Text: View/download PDF

33. Tagging Time and Space: TEI and the Canadian Stratford Festival Promptbooks

Author: Janelle Jenstad, Jennifer Roberts-Smith, Joseph Takeda, Liza Giffen, Mark Kaethler, Martin Holmes, and Toby Malone
Subjects: text encoding, TEI, promptbooks, dramatic texts, History of scholarship and learning. The humanities, AZ20-999, Electronic computers. Computer science, QA75.5-76.95
Abstract: This paper presents the first phase in the development of a new, TEI-based protocol for the encoding of promptbooks. Because the principal function of a promptbook is to record spatiotemporal events whose communicative importance supersedes that of the book in which they are recorded, current standards for digital encoding do not always apply. With the stage managerial artifacts of The. John Gray and the Canadian Stratford Festival Archives as case studies, we provide a rationale for exploring additions to the existing TEI guidelines to account for the unique characteristics of promptbooks. Keywords: text encoding; TEI; promptbooks; literary texts
Published: 2019
Full Text: View/download PDF

34. The Rosarium Project: Building a digital collection on the genus Rosa using and the TEI

Author: Tryon, Julia Rachel
Published: 2016
Full Text: View/download PDF

35. Tagging Time and Space: TEI and the Canadian Stratford Festival Promptbooks.

Author: Roberts-Smith, Jennifer, Kaethler, Mark, Malone, Toby, Giffen, Liza, Holmes, Martin, Jenstad, Janelle, and Takeda, Joseph
Subjects: PROMPTBOOKS, TEXT Encoding Initiative (Document type definition), SPACETIME, DRAMA festivals
Published: 2019
Full Text: View/download PDF

36. Qualitative Analysis of Residents’ Perceptions of Tourism Impacts on Historic Districts: A Case Study of Nanluoguxiang in Beijing, China

Author: Linlin Dai, Siyu Wang, Jun Xu, Li Wan, and Bihu Wu
Subjects: historic district, residents’ perceptions, word-frequency analysis, text encoding, nanluoguxiang, Architecture, NA1-9428, Building construction, TH1-9745
Abstract: Tourism is becoming a viable and important economic development strategy in the regeneration of historic districts. Nonetheless, tourism may bring negative impacts to the local communities. As a result, local residents’ perceptions and attitudes toward tourism development are critical to the sustainable development of tourism. This study follows a qualitative research approach, attempting to examine the relationship between local residents’ social-demographic features and their perceptions of tourism development. The framework is applied to the case of Nanluoguxiang in Beijing, China, which is a typical tourism destination benefitting from its traditional urban forms. Data from 24 in-depth interviewees are analyzed using word-frequency analysis through text encoding. The results reveal that the cultural perception of the residents promoted place attachment, which was associated with impact perception, and together, they determined behavioral demand. The stronger the cultural perception of the residents is and the stronger their place attachment is, the more the negative impact of tourism is perceived and the stronger their demand for cultural protection is. Long-term residents, those with occupations unrelated to tourism, and those who live adjacent to the tourism attractions perceived more negative impacts.
Published: 2017
Full Text: View/download PDF

37. Guide d'encodage DHARMA pour les éditions critiques [DHARMA Encoding Guide for Critical Editions]

Author: Griffiths, Arlo, Janiak, Axelle, École française d'Extrême-Orient (EFEO), Centre Asie du Sud-Est (CASE), École des hautes études en sciences sociales (EHESS), Institut National des Langues et Civilisations Orientales (Inalco), Centre National de la Recherche Scientifique (CNRS), Centre d'études sud asiatiques et himalayennes (CESAH), ERC-DHARMA, and European Project: 809994, EC:H2020, Dharma (2019)
Subjects: Digital Humanities, [SHS.INFO]Humanities and Social Sciences/Library and information sciences, Text Encoding Initiative, Digital Editions, Text Encoding, XML-TEI, TEI, South and Southeast Asia, Guidelines, [SHS.HIST]Humanities and Social Sciences/History, [SHS]Humanities and Social Sciences
Abstract: The DHARMA Encoding Guide for Critical Editions is a set of guidelines for creating critical editions of premodern South and Southeast Asian texts written in Sanskrit and/or in vernacular languages heavily impacted by the Sanskrit tradition. Specifically, this Encoding Guide concerns digital editions in XML format compliant with the Text Encoding Initiative (TEI) standard. These guidelines have been developed in the context of the ERC-funded project DHARMA, but it is hoped that they will help to establish new standards in South and Southeast Asian philology also beyond the DHARMA project.
Published: 2023

38. ARABIC WORD PROCESSING.

Author: Becker, Joseph D.
Subjects: *ARABIC language -- Writing, *TEXT processing (Computer science), *COMPUTERIZED typesetting, *DATA entry, *ARABIC document writing, *ELECTRONIC data processing
Abstract: The article focuses on the process of automatically intermixing Arabic writing with text in European or other languages. The predominant language of the Middle East is Arabic. The societies of the Middle East are now aggressively modernizing themselves in the field of computers. Until recently, human interaction with computers in the languages of the Middle East has been hindered by the unusual properties of the native alphabets. Scores of companies are producing computer applications in the Middle East today. A design for Arabic word processing is probably best conceived in the context of Arabic desktop publishing. Automatic layout algorithms can be made remarkably competent; certainly 99.9 percent of all mixed directional text can be laid out correctly by the computer without any kind of directional indications from the typist. The technology of word processing in the Middle East is right now at the exciting phase where the peculiarities of the native scripts are yielding to sophisticated system design. Improved interaction with computers will surely bring economic and cultural gains to the societies of the Middle East, and such developments are bound to hold significance for all.
Published: 1987
Full Text: View/download PDF

39. Resolving the Durand Conundrum

Author: Lou Burnard
Subjects: text encoding, XML, ODD, schema design, Computer software, QA76.75-76.765
Abstract: This paper proposes a minor but significant modification to the TEI ODD language and explores some of its implications. Can we improve on the present compromise whereby TEI content models are expressed in RELAX NG? A very small set of additional elements would permit the ODD language to cut its ties with any existing schema language, and thus permit it to support exactly and only the subset or intersection of their facilities which makes sense in the TEI context. It would make the ODD language an integrated and independent whole rather than an uneasy hybrid, and pave the way for future developments in the management of structured text beyond the XML paradigm.
Published: 2017
Full Text: View/download PDF

40. Data Encoding for SDL in ITU-T Rec. Z.104

Author: Reed, Rick, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Amyot, Daniel, editor, and Williams, Alan W., editor
Published: 2005
Full Text: View/download PDF

41. An Explorative Study on Creating a General Purpose Text Encoding Applied to Multiple Tasks

Author: Aguirrezabal Zabaleta, Manex, Krog, Niels, and Goldbæk, Yasmin Shekari
Abstract: The aim of this thesis is to apply a general-purpose encoding of Danish newspaper articles to authorship attribution, newspaper attribution, and headline generation in order to assess the scalability of the encoding across tasks. Using a novel dataset of 800k Danish online news articles, we test manually and automatically extracted features separately, as well as in combination. Literature on attributing newspapers and on using a handcrafted feature encoding for headline generation is scarce, which motivates an investigation of these fields. We found that manual features, and a combination of manual and automatic features, scale well across classification tasks, with performance consistently above baseline. The results from headline generation were inconclusive, as the model's training time exceeded the thesis deadline.
Published: 2022

42. Encoding sonic devices: what is it good for?

Author: Holmes, Martin David
Subjects: text encoding, rhyme, sonic devices, poetry
Abstract: Presentation at the 2022 Conference of the Text Encoding Initiative Consortium, 2022-09-14, Newcastle, UK.
Published: 2022
Full Text: View/download PDF

43. Adapting CETEIcean for static site building with React and Gatsby

Author: Viglianti, Raffaele
Subjects: Front End Development, Static Sites, Digital Publishing, Text Encoding
Abstract: The JavaScript library CETEIcean, written by Hugh Cayless and Raff Viglianti, relies on the DOM processing of web browsers and HTML5 Custom Elements to publish TEI documents as a component pluggable into any HTML structure. This makes it possible to publish and lightly transform TEI documents directly in the user’s browser, doing away with complex server-side infrastructure for TEI publishing. This lightweight approach to publishing can be valuable in a “text as data” context, where the focus of labor and algorithmic complexity may be more centered on corpus building and analysis as opposed to publication. However, CETEIcean provides a fairly bare-bones API for a fully-fledged TEI publishing solution and, without some additional considerations, TEI documents rendered with CETEIcean can be invisible to search engines. This demonstration will showcase an adaptation of the CETEIcean algorithm as a plugin for the static site generator Gatsby, which relies on the popular framework React for building user interfaces (UI). The static site pages generated with Gatsby will contain embedded TEI data, making it visible to search engines. Two plugins will be shown: gatsby-transformer-ceteicean (https://www.gatsbyjs.com/plugins/gatsby-transformer-ceteicean/) prepares XML to be registered as HTML5 Custom Elements. It also allows users to apply custom transformations before and after processing if the TEI data requires it for publication (the demonstration will show an example related to addSpan elements). gatsby-theme-ceteicean (https://www.npmjs.com/package/gatsby-theme-ceteicean) implements HTML5 Custom Elements for XML publishing, particularly with TEI. It re-implements parts of CETEIcean excluding behaviors; instead, users can define React components to customize the behavior of specific TEI elements. This makes it possible to access powerful React functionalities such as state management for user interaction.
The demonstration will show examples from the Scholarly Editing journal (https://scholarlyediting.org), which published TEI-based small-scale editions with these tools alongside other essay-like content.
Published: 2022
Full Text: View/download PDF

44. The Rosarium Project: A case of merging traditional reference librarian skills with digital humanities technology.

Author: Tryon, Julia R.
Subjects: *LIBRARIES, *DIGITAL humanities, *PUBLIC relations, *ACADEMIC libraries, *ONLINE databases
Abstract: The role of the reference librarian has changed considerably over the past thirty years. Today reference librarians spend as much time on public relations as on answering reference questions and more time solving log-in issues than on helping with research. Despite this, there is still a role for reference librarians to play using their research and curation skills. That role involves the digital humanities, particularly text encoding projects following the guidelines of the Text Encoding Initiative Consortium (TEI). One such TEI project is the Rosarium Project, which curates online popular materials about roses. [ABSTRACT FROM PUBLISHER]
Published: 2017
Full Text: View/download PDF

45. Materiality of TEI Encoding and Decoding: An Analysis of the Western European Union Archives on Armament Policy

Author: Florentina Armaselu, Verónica Martins, and Catherine Emma Jones
Subjects: text encoding, WEU and armament digital edition, text analysis, digital hermeneutics, materiality, Computer software, QA76.75-76.765
Abstract: By combining traditional historical enquiry with TEI XML encoding and decoding in a corpus analysis phase, the project aims at addressing research questions mainly related to the French and British positions on the topics of armament design and production and of armament control within the Western European Union (WEU) from 1954 to 1982. The paper focuses on the annotation of speakers (different countries and institutional representatives) and their discourse in a selection of institutional documents (minutes, notes, studies, memoranda) (encoding phase) and the identification of linguistic patterns on armament issues in their discourse, as well as the interpretation of results (decoding phase). From a larger perspective, the study considers the TEI encoding as adding to the original text a “material” layer that further supports both machine and human interpretation (decoding). In this sense, this study may move closer to the concept of “material hermeneutics,” by understanding code, and digital technology in general, as an instrument we can use in hermeneutic ways to produce knowledge.
Published: 2016
Full Text: View/download PDF

46. Bibliographic and textual studies and the personal library

Author: Wingate, Alexandra, Walsh, John A., Nurkkala, Caroline, Evans, Daniel, Mertka, Alyssa, and Christie, Jennifer
Subjects: book history, text encoding, citation analysis, text analysis, TEI, private libraries
Abstract: This paper will discuss the application of digital humanities methods to understanding private libraries in three contexts: a corpus of post-mortem library inventories from early modern Navarre, Spain, the library of Isaac Newton, and the library of Victorian poet Algernon Charles Swinburne. The particularities of each case facilitate different analysis techniques from the visualization and quantitative analysis of private libraries to comparative textual analysis of full-text corpora of an author’s library and the author’s own works. The number of viable techniques is affected by factors such as how much can be known about the library’s contents, how much is known about the owner, and the availability of quality OCRed texts of the books in the library from sources such as the HathiTrust Digital Library.
Published: 2022
Full Text: View/download PDF

47. Black DH and a Challenge in Document Data Modeling: Anna Julia Cooper's Responses to the Survey of Negro College Graduates

Author: Rong, Alice, Beshero-Bondar, Elisa, and Moody-Turner, Shirley
Subjects: manuscript paleography, Anna Julia Cooper, text encoding, TEI XML, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, black digital humanities, African American history
Abstract: In the 1930s, Dr. Anna Julia Cooper, renowned educator and foremother of Black feminist thought, responded to a Survey of Negro College Graduates. Cooper's response to one question about her "racial philosophy" exceeded the limits of the form and was later published as an essay, removed from its original context. In Fall 2020, Elisa Beshero-Bondar's undergraduate class in Text Encoding worked on the challenge of modeling in TEI the entire survey, its typescript and the handwritten input, with guidance from researchers who digitized Cooper's collection of papers at Howard University (see https://dh.howard.edu/ajcooper). We hoped to represent how a distinctive manuscript was composed on a printed, circulated document, and found this interesting and complicated to model in TEI. We came up with a solution suitable for the semester project, which involved students learning a good deal of code for the first time: TEI, XSLT to transform their TEI to HTML, and CSS highlighting to guide the reading experience. See https://alicer98.github.io/DIGIT-110-AJC-Survey/document.html for our coauthor's digital edition, designed purposefully to look like an interactive PDF form. For the TEI conference, we presented this edition's TEI data model and discussed with the TEI community some alternatives for modeling documents like this, in which the encounter with a primary source gains interest from describing and even documenting tensions between survey questions and respondent answers. We are curious about the possibility of representing a survey as a dialogue between print and manuscript hands. And we are curious about the notion of documenting by how much a survey respondent exceeds the spaces provided on a form to reply to a question. What is the best TEI modeling for a historically significant survey form, with a historically significant response?
Published: 2022
Full Text: View/download PDF

48. Encoding Verse Texts

Author: Chisholm, David, Robey, David, Ide, Nancy, editor, and Véronis, Jean, editor
Published: 1995
Full Text: View/download PDF

49. The Encoding of Spoken Texts

Author: Johansson, Stig, Ide, Nancy, editor, and Véronis, Jean, editor
Published: 1995
Full Text: View/download PDF

50. Speaking with One Voice: Encoding Standards and the Prospects for an Integrated Approach to Computing in History

Author: Greenstein, Daniel, Burnard, Lou, Ide, Nancy, editor, and Véronis, Jean, editor
Published: 1995
Full Text: View/download PDF
