Author: "Najafirad, Peyman" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Najafirad, Peyman"' showing total 105 results

Start Over Author "Najafirad, Peyman"

105 results on '"Najafirad, Peyman"'

1. Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries

Author: Manuel, Dylan, Islam, Nafis Tanveer, Khoury, Joseph, Nunez, Ana, Bou-Harb, Elias, and Najafirad, Peyman
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
Abstract: Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. The limited access to source code in vital systems - such as firmware, drivers, and proprietary software used in Critical Infrastructures (CI) - makes this analysis even more crucial on the binary level. Even with available source code, a semantic gap persists after compilation between the source and the binary code executed by the processor. This gap may hinder the detection of vulnerabilities in source code. That being said, current research on Large Language Models (LLMs) overlooks the significance of decompiled binaries in this area by focusing solely on source code. In this work, we are the first to empirically uncover the substantial semantic limitations of state-of-the-art LLMs when it comes to analyzing vulnerabilities in decompiled binaries, largely due to the absence of relevant datasets. To bridge the gap, we introduce DeBinVul, a novel decompiled binary code vulnerability dataset. Our dataset is multi-architecture and multi-optimization, focusing on C/C++ due to their wide usage in CI and association with numerous vulnerabilities. Specifically, we curate 150,872 samples of vulnerable and non-vulnerable decompiled binary code for the task of (i) identifying; (ii) classifying; (iii) describing vulnerabilities; and (iv) recovering function names in the domain of decompiled binaries. Subsequently, we fine-tune state-of-the-art LLMs using DeBinVul and report on a performance increase of 19%, 24%, and 21% in the capabilities of CodeLlama, Llama3, and CodeGen2 respectively, in detecting binary code vulnerabilities. Additionally, using DeBinVul, we report a high performance of 80-90% on the vulnerability classification task. Furthermore, we report improved performance in function name recovery and vulnerability description tasks.
Published: 2024

2. Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent

Author: Haji, Fatemeh, Bethany, Mazal, Tabar, Maryam, Chiang, Jason, Rios, Anthony, and Najafirad, Peyman
Subjects: Computer Science - Artificial Intelligence
Abstract: Multi-agent strategies have emerged as a promising approach to enhance the reasoning abilities of Large Language Models (LLMs) by assigning specialized roles in the problem-solving process. Concurrently, Tree of Thoughts (ToT) methods have shown potential in improving reasoning for complex question-answering tasks by exploring diverse reasoning paths. A critical limitation in multi-agent reasoning is the 'Reasoner' agent's shallow exploration of reasoning paths. While ToT strategies could help mitigate this problem, they may generate flawed reasoning branches, which could harm the trustworthiness of the final answer. To leverage the strengths of both multi-agent reasoning and ToT strategies, we introduce a novel approach combining ToT-based Reasoner agents with a Thought Validator agent. Multiple Reasoner agents operate in parallel, employing ToT to explore diverse reasoning paths. The Thought Validator then scrutinizes these paths, considering a Reasoner's conclusion only if its reasoning is valid. This method enables a more robust voting strategy by discarding faulty reasoning paths, enhancing the system's ability to tackle tasks requiring systematic and trustworthy reasoning. Our method demonstrates superior performance compared to existing techniques when evaluated on the GSM8K dataset, outperforming the standard ToT strategy by an average 5.6% across four LLMs. The code and related content can be found in: https://github.com/SecureAIAutonomyLab/MA-ToT
Published: 2024

3. Jailbreaking Large Language Models with Symbolic Mathematics

Author: Bethany, Emet, Bethany, Mazal, Flores, Juan Arturo Nolazco, Jha, Sumit Kumar, and Najafirad, Peyman
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Recent advancements in AI safety have led to increased efforts in training and red-teaming large language models (LLMs) to mitigate unsafe content generation. However, these safety mechanisms may not be comprehensive, leaving potential vulnerabilities unexplored. This paper introduces MathPrompt, a novel jailbreaking technique that exploits LLMs' advanced capabilities in symbolic mathematics to bypass their safety mechanisms. By encoding harmful natural language prompts into mathematical problems, we demonstrate a critical vulnerability in current AI safety measures. Our experiments across 13 state-of-the-art LLMs reveal an average attack success rate of 73.6\%, highlighting the inability of existing safety training mechanisms to generalize to mathematically encoded inputs. Analysis of embedding vectors shows a substantial semantic shift between original and encoded prompts, helping explain the attack's success. This work emphasizes the importance of a holistic approach to AI safety, calling for expanded red-teaming efforts to develop robust safeguards across all potential input types and their associated risks.
Published: 2024

4. AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing

Author: Nunez, Ana, Islam, Nafis Tanveer, Jha, Sumit Kumar, and Najafirad, Peyman
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: Recent advancements in automatic code generation using large language models (LLMs) have brought us closer to fully automated secure software development. However, existing approaches often rely on a single agent for code generation, which struggles to produce secure, vulnerability-free code. Traditional program synthesis with LLMs has primarily focused on functional correctness, often neglecting critical dynamic security implications that happen during runtime. To address these challenges, we propose AutoSafeCoder, a multi-agent framework that leverages LLM-driven agents for code generation, vulnerability analysis, and security enhancement through continuous collaboration. The framework consists of three agents: a Coding Agent responsible for code generation, a Static Analyzer Agent identifying vulnerabilities, and a Fuzzing Agent performing dynamic testing using a mutation-based fuzzing approach to detect runtime errors. Our contribution focuses on ensuring the safety of multi-agent code generation by integrating dynamic and static testing in an iterative process during code generation by LLM that improves security. Experiments using the SecurityEval dataset demonstrate a 13% reduction in code vulnerabilities compared to baseline LLMs, with no compromise in functionality., Comment: Accepted to NeurIPS 2024 Workshop on Safe & Trustworthy Agents
Published: 2024

5. Enhancing Source Code Security with LLMs: Demystifying The Challenges and Generating Reliable Repairs

Author: Islam, Nafis Tanveer, Khoury, Joseph, Seong, Andrew, Bou-Harb, Elias, and Najafirad, Peyman
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
Abstract: With the recent unprecedented advancements in Artificial Intelligence (AI) computing, progress in Large Language Models (LLMs) is accelerating rapidly, presenting challenges in establishing clear guidelines, particularly in the field of security. That being said, we thoroughly identify and describe three main technical challenges in the security and software engineering literature that spans the entire LLM workflow, namely; \textbf{\textit{(i)}} Data Collection and Labeling; \textbf{\textit{(ii)}} System Design and Learning; and \textbf{\textit{(iii)}} Performance Evaluation. Building upon these challenges, this paper introduces \texttt{SecRepair}, an instruction-based LLM system designed to reliably \textit{identify}, \textit{describe}, and automatically \textit{repair} vulnerable source code. Our system is accompanied by a list of actionable guides on \textbf{\textit{(i)}} Data Preparation and Augmentation Techniques; \textbf{\textit{(ii)}} Selecting and Adapting state-of-the-art LLM Models; \textbf{\textit{(iii)}} Evaluation Procedures. \texttt{SecRepair} uses a reinforcement learning-based fine-tuning with a semantic reward that caters to the functionality and security aspects of the generated code. Our empirical analysis shows that \texttt{SecRepair} achieves a \textit{12}\% improvement in security code repair compared to other LLMs when trained using reinforcement learning. Furthermore, we demonstrate the capabilities of \texttt{SecRepair} in generating reliable, functional, and compilable security code repairs against real-world test cases using automated evaluation metrics.
Published: 2024

6. Enhancing Event Reasoning in Large Language Models through Instruction Fine-Tuning with Semantic Causal Graphs

Author: Bethany, Mazal, Bethany, Emet, Wherry, Brandon, Chiang, Cho-Yu, Vishwamitra, Nishant, Rios, Anthony, and Najafirad, Peyman
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Event detection and text reasoning have become critical applications across various domains. While LLMs have recently demonstrated impressive progress in reasoning abilities, they often struggle with event detection, particularly due to the absence of training methods that consider causal relationships between event triggers and types. To address this challenge, we propose a novel approach for instruction fine-tuning LLMs for event detection. Our method introduces Semantic Causal Graphs (SCGs) to capture both causal relationships and contextual information within text. Building off of SCGs, we propose SCG Instructions for fine-tuning LLMs by focusing on event triggers and their relationships to event types, and employ Low-Rank Adaptation (LoRA) to help preserve the general reasoning abilities of LLMs. Our evaluations demonstrate that training LLMs with SCG Instructions outperforms standard instruction fine-tuning by an average of 35.69\% on Event Trigger Classification. Notably, our fine-tuned Mistral 7B model also outperforms GPT-4 on key event detection metrics by an average of 31.01\% on Event Trigger Identification, 37.40\% on Event Trigger Classification, and 16.43\% on Event Classification. We analyze the retention of general capabilities, observing only a minimal average drop of 2.03 points across six benchmarks. This comprehensive study investigates multiple LLMs for the event detection task across various datasets, prompting strategies, and training approaches.
Published: 2024

7. Unintentional Security Flaws in Code: Automated Defense via Root Cause Analysis

Author: Islam, Nafis Tanveer, Bethany, Mazal, Manuel, Dylan, Jadliwala, Murtuza, and Najafirad, Peyman
Subjects: Computer Science - Software Engineering, Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Software security remains a critical concern, particularly as junior developers, often lacking comprehensive knowledge of security practices, contribute to codebases. While there are tools to help developers proactively write secure code, their actual effectiveness in helping developers fix their vulnerable code remains largely unmeasured. Moreover, these approaches typically focus on classifying and localizing vulnerabilities without highlighting the specific code segments that are the root cause of the issues, a crucial aspect for developers seeking to fix their vulnerable code. To address these challenges, we conducted a comprehensive study evaluating the efficacy of existing methods in helping junior developers secure their code. Our findings across five types of security vulnerabilities revealed that current tools enabled developers to secure only 36.2\% of vulnerable code. Questionnaire results from these participants further indicated that not knowing the code that was the root cause of the vulnerability was one of their primary challenges in repairing the vulnerable code. Informed by these insights, we developed an automated vulnerability root cause (RC) toolkit called T5-RCGCN, that combines T5 language model embeddings with a graph convolutional network (GCN) for vulnerability classification and localization. Additionally, we integrated DeepLiftSHAP to identify the code segments that were the root cause of the vulnerability. We tested T5-RCGCN with 56 junior developers across three datasets, showing a 28.9\% improvement in code security compared to previous methods. Developers using the tool also gained a deeper understanding of vulnerability root causes, resulting in a 17.0\% improvement in their ability to secure code independently. These results demonstrate the tool's potential for both immediate security enhancement and long-term developer skill growth.
Published: 2024

8. Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?

Author: Karkevandi, Mohammad Bahrami, Vishwamitra, Nishant, and Najafirad, Peyman
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in natural language tasks, but their safety and morality remain contentious due to their training on internet text corpora. To address these concerns, alignment techniques have been developed to improve the public usability and safety of LLMs. Yet, the potential for generating harmful content through these models seems to persist. This paper explores the concept of jailbreaking LLMs-reversing their alignment through adversarial triggers. Previous methods, such as soft embedding prompts, manually crafted prompts, and gradient-based automatic prompts, have had limited success on black-box models due to their requirements for model access and for producing a low variety of manually crafted prompts, making them susceptible to being blocked. This paper introduces a novel approach using reinforcement learning to optimize adversarial triggers, requiring only inference API access to the target model and a small surrogate model. Our method, which leverages a BERTScore-based reward function, enhances the transferability and effectiveness of adversarial triggers on new black-box models. We demonstrate that this approach improves the performance of adversarial triggers on a previously untested language model., Comment: Accepted to AI4CYBER - KDD 2024
Published: 2024

9. Estimating Earthquake Magnitude in Sentinel-1 Imagery via Ranking

Author: Cambrin, Daniele Rege, Corley, Isaac, Garza, Paolo, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Earthquakes are commonly estimated using physical seismic stations, however, due to the installation requirements and costs of these stations, global coverage quickly becomes impractical. An efficient and lower-cost alternative is to develop machine learning models to globally monitor earth observation data to pinpoint regions impacted by these natural disasters. However, due to the small amount of historically recorded earthquakes, this becomes a low-data regime problem requiring algorithmic improvements to achieve peak performance when learning to regress earthquake magnitude. In this paper, we propose to pose the estimation of earthquake magnitudes as a metric-learning problem, training models to not only estimate earthquake magnitude from Sentinel-1 satellite imagery but to additionally rank pairwise samples. Our experiments show at max a 30%+ improvement in MAE over prior regression-only based methods, particularly transformer-based architectures., Comment: Accepted to ECML-PKDD 2024 MACLEAN Workshop
Published: 2024

10. On the Consistency of Fairness Measurement Methods for Regression Tasks

Author: Almajed, Abdalwahab, Tabar, Maryam, and Najafirad, Peyman
Subjects: Computer Science - Machine Learning
Abstract: With growing applications of Machine Learning (ML) techniques in the real world, it is highly important to ensure that these models work in an equitable manner. One main step in ensuring fairness is to effectively measure fairness, and to this end, various metrics have been proposed in the past literature. While the computation of those metrics are straightforward in the classification set-up, it is computationally intractable in the regression domain. To address the challenge of computational intractability, past literature proposed various methods to approximate such metrics. However, they did not verify the extent to which the output of such approximation algorithms are consistent with each other. To fill this gap, this paper comprehensively studies the consistency of the output of various fairness measurement methods through conducting an extensive set of experiments on various regression tasks. As a result, it finds that while some fairness measurement approaches show strong consistency across various regression tasks, certain methods show a relatively poor consistency in certain regression tasks. This, in turn, calls for a more principled approach for measuring fairness in the regression domain., Comment: Accepted at ijcai24 - The Trustworthy AI Workshop
Published: 2024

11. Summarizing Complex Graphical Models of Multiple Chronic Conditions Using the Second Eigenvalue of Graph Laplacian: Algorithm Development and Validation

Author: Faruqui, Syed Hasib Akhter, Alaeddini, Adel, Chang, Mike C, Shirinkam, Sara, Jaramillo, Carlos, NajafiRad, Peyman, Wang, Jing, and Pugh, Mary Jo
Subjects: Computer applications to medicine. Medical informatics, R858-859.7
Abstract: BackgroundIt is important but challenging to understand the interactions of multiple chronic conditions (MCC) and how they develop over time in patients and populations. Clinical data on MCC can now be represented using graphical models to study their interaction and identify the path toward the development of MCC. However, the current graphical models representing MCC are often complex and difficult to analyze. Therefore, it is necessary to develop improved methods for generating these models. ObjectiveThis study aimed to summarize the complex graphical models of MCC interactions to improve comprehension and aid analysis. MethodsWe examined the emergence of 5 chronic medical conditions (ie, traumatic brain injury [TBI], posttraumatic stress disorder [PTSD], depression [Depr], substance abuse [SuAb], and back pain [BaPa]) over 5 years among 257,633 veteran patients. We developed 3 algorithms that utilize the second eigenvalue of the graph Laplacian to summarize the complex graphical models of MCC by removing less significant edges. The first algorithm learns a sparse probabilistic graphical model of MCC interactions directly from the data. The second algorithm summarizes an existing probabilistic graphical model of MCC interactions when a supporting data set is available. The third algorithm, which is a variation of the second algorithm, summarizes the existing graphical model of MCC interactions with no supporting data. Finally, we examined the coappearance of the 100 most common terms in the literature of MCC to validate the performance of the proposed model. ResultsThe proposed summarization algorithms demonstrate considerable performance in extracting major connections among MCC without reducing the predictive accuracy of the resulting graphical models. For the model learned directly from the data, the area under the curve (AUC) performance for predicting TBI, PTSD, BaPa, SuAb, and Depr, respectively, during the next 4 years is as follows—year 2: 79.91%, 84.04%, 78.83%, 82.50%, and 81.47%; year 3: 76.23%, 80.61%, 73.51%, 79.84%, and 77.13%; year 4: 72.38%, 78.22%, 72.96%, 77.92%, and 72.65%; and year 5: 69.51%, 76.15%, 73.04%, 76.72%, and 69.99%, respectively. This demonstrates an overall 12.07% increase in the cumulative sum of AUC in comparison with the classic multilevel temporal Bayesian network. ConclusionsUsing graph summarization can improve the interpretability and the predictive power of the complex graphical models of MCC.
Published: 2020
Full Text: View/download PDF

12. Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually

Author: Bethany, Mazal, Wherry, Brandon, Vishwamitra, Nishant, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Social media platforms are being increasingly used by malicious actors to share unsafe content, such as images depicting sexual activity, cyberbullying, and self-harm. Consequently, major platforms use artificial intelligence (AI) and human moderation to obfuscate such images to make them safer. Two critical needs for obfuscating unsafe images is that an accurate rationale for obfuscating image regions must be provided, and the sensitive regions should be obfuscated (\textit{e.g.} blurring) for users' safety. This process involves addressing two key problems: (1) the reason for obfuscating unsafe images demands the platform to provide an accurate rationale that must be grounded in unsafe image-specific attributes, and (2) the unsafe regions in the image must be minimally obfuscated while still depicting the safe regions. In this work, we address these key issues by first performing visual reasoning by designing a visual reasoning model (VLM) conditioned on pre-trained unsafe image classifiers to provide an accurate rationale grounded in unsafe image attributes, and then proposing a counterfactual explanation algorithm that minimally identifies and obfuscates unsafe regions for safe viewing, by first utilizing an unsafe image classifier attribution matrix to guide segmentation for a more optimal subregion segmentation followed by an informed greedy search to determine the minimum number of subregions required to modify the classifier's output based on attribution score. Extensive experiments on uncurated data from social networks emphasize the efficacy of our proposed method. We make our code available at: https://github.com/SecureAIAutonomyLab/ConditionalVLM
Published: 2024

13. Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text

Author: Bethany, Mazal, Wherry, Brandon, Bethany, Emet, Vishwamitra, Nishant, Rios, Anthony, and Najafirad, Peyman
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: With the recent proliferation of Large Language Models (LLMs), there has been an increasing demand for tools to detect machine-generated text. The effective detection of machine-generated text face two pertinent problems: First, they are severely limited in generalizing against real-world scenarios, where machine-generated text is produced by a variety of generators, including but not limited to GPT-4 and Dolly, and spans diverse domains, ranging from academic manuscripts to social media posts. Second, existing detection methodologies treat texts produced by LLMs through a restrictive binary classification lens, neglecting the nuanced diversity of artifacts generated by different LLMs. In this work, we undertake a systematic study on the detection of machine-generated text in real-world scenarios. We first study the effectiveness of state-of-the-art approaches and find that they are severely limited against text produced by diverse generators and domains in the real world. Furthermore, t-SNE visualizations of the embeddings from a pretrained LLM's encoder show that they cannot reliably distinguish between human and machine-generated text. Based on our findings, we introduce a novel system, T5LLMCipher, for detecting machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering to address the text produced by diverse generators and domains in the real world. We evaluate our approach across 9 machine-generated text systems and 9 domains and find that our approach provides state-of-the-art generalization ability, with an average increase in F1 score on machine-generated text of 19.6\% on unseen generators and domains compared to the top performing existing approaches and correctly attributes the generator of text with an accuracy of 93.6\%.
Published: 2024

14. Code Security Vulnerability Repair Using Reinforcement Learning with Large Language Models

Author: Islam, Nafis Tanveer, Karkevandi, Mohammad Bahrami, and Najafirad, Peyman
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Software Engineering
Abstract: With the recent advancement of Large Language Models (LLMs), generating functionally correct code has become less complicated for a wide array of developers. While using LLMs has sped up the functional development process, it poses a heavy risk to code security. Code generation with proper security measures using LLM is a significantly more challenging task than functional code generation. Security measures may include adding a pair of lines of code with the original code, consisting of null pointer checking or prepared statements for SQL injection prevention. Currently, available code repair LLMs generate code repair by supervised fine-tuning, where the model looks at cross-entropy loss. However, the original and repaired codes are mostly similar in functionality and syntactically, except for a few (1-2) lines, which act as security measures. This imbalance between the lines needed for security measures and the functional code enforces the supervised fine-tuned model to prioritize generating functional code without adding proper security measures, which also benefits the model by resulting in minimal loss. Therefore, in this work, for security hardening and strengthening of generated code from LLMs, we propose a reinforcement learning-based method for program-specific repair with the combination of semantic and syntactic reward mechanisms that focus heavily on adding security and functional measures in the code, respectively.
Published: 2024

15. Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery

Author: Robinson, Caleb, Corley, Isaac, Ortiz, Anthony, Dodhia, Rahul, Ferres, Juan M. Lavista, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Fully understanding a complex high-resolution satellite or aerial imagery scene often requires spatial reasoning over a broad relevant context. The human object recognition system is able to understand object in a scene over a long-range relevant context. For example, if a human observes an aerial scene that shows sections of road broken up by tree canopy, then they will be unlikely to conclude that the road has actually been broken up into disjoint pieces by trees and instead think that the canopy of nearby trees is occluding the road. However, there is limited research being conducted to understand long-range context understanding of modern machine learning models. In this work we propose a road segmentation benchmark dataset, Chesapeake Roads Spatial Context (RSC), for evaluating the spatial long-range context understanding of geospatial machine learning models and show how commonly used semantic segmentation models can fail at this task. For example, we show that a U-Net trained to segment roads from background in aerial imagery achieves an 84% recall on unoccluded roads, but just 63.5% recall on roads covered by tree canopy despite being trained to model both the same way. We further analyze how the performance of models changes as the relevant context for a decision (unoccluded roads in our case) varies in distance. We release the code to reproduce our experiments and dataset of imagery and masks to encourage future research in this direction -- https://github.com/isaaccorley/ChesapeakeRSC., Comment: In submission to IGARSS 2024
Published: 2024

16. LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward

Author: Islam, Nafis Tanveer, Khoury, Joseph, Seong, Andrew, Karkevandi, Mohammad Bahrami, Parra, Gonzalo De La Torre, Bou-Harb, Elias, and Najafirad, Peyman
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence
Abstract: In software development, the predominant emphasis on functionality often supersedes security concerns, a trend gaining momentum with AI-driven automation tools like GitHub Copilot. These tools significantly improve developers' efficiency in functional code development. Nevertheless, it remains a notable concern that such tools are also responsible for creating insecure code, predominantly because of pre-training on publicly available repositories with vulnerable code. Moreover, developers are called the "weakest link in the chain" since they have very minimal knowledge of code security. Although existing solutions provide a reasonable solution to vulnerable code, they must adequately describe and educate the developers on code security to ensure that the security issues are not repeated. Therefore we introduce a multipurpose code vulnerability analysis system \texttt{SecRepair}, powered by a large language model, CodeGen2 assisting the developer in identifying and generating fixed code along with a complete description of the vulnerability with a code comment. Our innovative methodology uses a reinforcement learning paradigm to generate code comments augmented by a semantic reward mechanism. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We further identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub. Our findings underscore that incorporating reinforcement learning coupled with semantic reward augments our model's performance, thereby fortifying its capacity to address code vulnerabilities with improved efficacy.
Published: 2024

17. Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters

Author: Corley, Isaac, Robinson, Caleb, Dodhia, Rahul, Ferres, Juan M. Lavista, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Research in self-supervised learning (SSL) with natural images has progressed rapidly in recent years and is now increasingly being applied to and benchmarked with datasets containing remotely sensed imagery. A common benchmark case is to evaluate SSL pre-trained model embeddings on datasets of remotely sensed imagery with small patch sizes, e.g., 32x32 pixels, whereas standard SSL pre-training takes place with larger patch sizes, e.g., 224x224. Furthermore, pre-training methods tend to use different image normalization preprocessing steps depending on the dataset. In this paper, we show, across seven satellite and aerial imagery datasets of varying resolution, that by simply following the preprocessing steps used in pre-training (precisely, image sizing and normalization methods), one can achieve significant performance improvements when evaluating the extracted features on downstream tasks -- an important detail overlooked in previous work in this space. We show that by following these steps, ImageNet pre-training remains a competitive baseline for satellite imagery based transfer learning tasks -- for example we find that these steps give +32.28 to overall accuracy on the So2Sat random split dataset and +11.16 on the EuroSAT dataset. Finally, we report comprehensive benchmark results with a variety of simple baseline methods for each of the seven datasets, forming an initial benchmark suite for remote sensing imagery.
Published: 2023

18. ZRG: A Dataset for Multimodal 3D Residential Rooftop Understanding

Author: Corley, Isaac, Lwowski, Jonathan, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: A crucial part of any home is the roof over our heads to protect us from the elements. In this paper we present the Zeitview Rooftop Geometry (ZRG) dataset for residential rooftop understanding. ZRG is a large-scale residential rooftop dataset of over 20k properties collected through roof inspections from across the U.S. and contains multiple modalities including high resolution aerial orthomosaics, digital surface models (DSM), colored point clouds, and 3D roof wireframe annotations. We provide an in-depth analysis and perform several experimental baselines including roof outline extraction, monocular height estimation, and planar roof structure extraction, to illustrate a few of the numerous potential applications unlocked by this dataset., Comment: 9 pages, Preprint accepted to WACV 2024
Published: 2023

19. Single-View Height Estimation with Conditional Diffusion Probabilistic Models

Author: Corley, Isaac and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
Abstract: Digital Surface Models (DSM) offer a wealth of height information for understanding the Earth's surface as well as monitoring the existence or change in natural and man-made structures. Classical height estimation requires multi-view geospatial imagery or LiDAR point clouds which can be expensive to acquire. Single-view height estimation using neural network based models shows promise however it can struggle with reconstructing high resolution features. The latest advancements in diffusion models for high resolution image synthesis and editing have yet to be utilized for remote sensing imagery, particularly height estimation. Our approach involves training a generative diffusion model to learn the joint distribution of optical and DSM images across both domains as a Markov chain. This is accomplished by minimizing a denoising score matching objective while being conditioned on the source image to generate realistic high resolution 3D surfaces. In this paper we experiment with conditional denoising diffusion probabilistic models (DDPM) for height estimation from a single remotely sensed image and show promising results on the Vaihingen benchmark dataset.
Published: 2023

20. An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph

Author: Islam, Nafis Tanveer, Parra, Gonzalo De La Torre, Manuel, Dylan, Bou-Harb, Elias, and Najafirad, Peyman
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Over the years, open-source software systems have become prey to threat actors. Even as open-source communities act quickly to patch the breach, code vulnerability screening should be an integral part of agile software development from the beginning. Unfortunately, current vulnerability screening techniques are ineffective at identifying novel vulnerabilities or providing developers with code vulnerability and classification. Furthermore, the datasets used for vulnerability learning often exhibit distribution shifts from the real-world testing distribution due to novel attack strategies deployed by adversaries and as a result, the machine learning model's performance may be hindered or biased. To address these issues, we propose a joint interpolated multitasked unbiased vulnerability classifier comprising a transformer "RoBERTa" and graph convolution neural network (GCN). We present a training process utilizing a semantic vulnerability graph (SVG) representation from source code, created by integrating edges from a sequential flow, control flow, and data flow, as well as a novel flow dubbed Poacher Flow (PF). Poacher flow edges reduce the gap between dynamic and static program analysis and handle complex long-range dependencies. Moreover, our approach reduces biases of classifiers regarding unbalanced datasets by integrating Focal Loss objective function along with SVG. Remarkably, experimental results show that our classifier outperforms state-of-the-art results on vulnerability detection with fewer false negatives and false positives. After testing our model across multiple datasets, it shows an improvement of at least 2.41% and 18.75% in the best-case scenario. Evaluations using N-day program samples demonstrate that our proposed approach achieves a 93% accuracy and was able to detect 4, zero-day vulnerabilities from popular GitHub repositories.
Published: 2023
Full Text: View/download PDF

21. Supervising Remote Sensing Change Detection Models with 3D Surface Semantics

Author: Corley, Isaac and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Remote sensing change detection, identifying changes between scenes of the same location, is an active area of research with a broad range of applications. Recent advances in multimodal self-supervised pretraining have resulted in state-of-the-art methods which surpass vision models trained solely on optical imagery. In the remote sensing field, there is a wealth of overlapping 2D and 3D modalities which can be exploited to supervise representation learning in vision models. In this paper we propose Contrastive Surface-Image Pretraining (CSIP) for joint learning using optical RGB and above ground level (AGL) map pairs. We then evaluate these pretrained models on several building segmentation and change detection datasets to show that our method does, in fact, extract features relevant to downstream applications where natural and artificial surface information is relevant.
Published: 2022

22. The Case for Integrated Advanced Technology in Applied Behavior Analysis

Author: Neely, Leslie, Carnett, Amarie, Quarles, John, MacNaul, Hannah, Park, Se-Woong, Oyama, Sakiko, Chen, Guenevere (Qian), Desai, Kevin, and Najafirad, Peyman
Published: 2023
Full Text: View/download PDF

23. DTwin-TEC: An AI-based TEC district digital twin and emulating security events by leveraging knowledge graph

Author: Wajid, Mohammad Saif, Terashima-Marin, Hugo, Najafirad, Peyman, Pablos, Santiago Enrique Conant, and Wajid, Mohd Anas
Published: 2024
Full Text: View/download PDF

24. Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts

Author: Bendre, Nihar, Desai, Kevin, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: With the ever-increasing amount of data, the central challenge in multimodal learning involves limitations of labelled samples. For the task of classification, techniques such as meta-learning, zero-shot learning, and few-shot learning showcase the ability to learn information about novel classes based on prior knowledge. Recent techniques try to learn a cross-modal mapping between the semantic space and the image space. However, they tend to ignore the local and global semantic knowledge. To overcome this problem, we propose a Multimodal Variational Auto-Encoder (M-VAE) which can learn the shared latent space of image features and the semantic space. In our approach we concatenate multimodal data to a single embedding before passing it to the VAE for learning the latent space. We propose the use of a multi-modal loss during the reconstruction of the feature embedding through the decoder. Our approach is capable to correlating modalities and exploit the local and global semantic knowledge for novel sample predictions. Our experimental results using a MLP classifier on four benchmark datasets show that our proposed model outperforms the current state-of-the-art approaches for generalized zero-shot learning., Comment: 5 pages, 2 figures, 2 tables
Published: 2021

25. Interpretable Self-supervised Multi-task Learning for COVID-19 Information Retrieval and Extraction

Author: Ebadi, Nima and Najafirad, Peyman
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: The rapidly evolving literature of COVID-19 related articles makes it challenging for NLP models to be effectively trained for information retrieval and extraction with the corresponding labeled data that follows the current distribution of the pandemic. On the other hand, due to the uncertainty of the situation, human experts' supervision would always be required to double check the decision making of these models highlighting the importance of interpretability. In the light of these challenges, this study proposes an interpretable self-supervised multi-task learning model to jointly and effectively tackle the tasks of information retrieval (IR) and extraction (IE) during the current emergency health crisis situation. Our results show that our model effectively leverage the multi-task and self-supervised learning to improve generalization, data efficiency and robustness to the ongoing dataset shift problem. Our model outperforms baselines in IE and IR tasks, respectively by micro-f score of 0.08 (LCA-F score of 0.05), and MAP of 0.05 on average. In IE the zero- and few-shot learning performances are on average 0.32 and 0.19 micro-f score higher than those of the baselines.
Published: 2021

26. Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention

Author: Bendre, Nihar, Desai, Kevin, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual Question Answering (VQA) models have achieved significant success in recent times. Despite the success of VQA models, they are mostly black-box models providing no reasoning about the predicted answer, thus raising questions for their applicability in safety-critical such as autonomous systems and cyber-security. Current state of the art fail to better complex questions and thus are unable to exploit compositionality. To minimize the black-box effect of these models and also to make them better exploit compositionality, we propose a Dynamic Neural Network (DMN), which can understand a particular question and then dynamically assemble various relatively shallow deep learning modules from a pool of modules to form a network. We incorporate compositional temporal attention to these deep learning based modules to increase compositionality exploitation. This results in achieving better understanding of complex questions and also provides reasoning as to why the module predicts a particular answer. Experimental analysis on the two benchmark datasets, VQA2.0 and CLEVR, depicts that our model outperforms the previous approaches for Visual Question Answering task as well as provides better reasoning, thus making it reliable for mission critical applications like safety and security., Comment: 7 pages, 4 figures, 3 tables
Published: 2021

27. Adaptive Clustering of Robust Semantic Representations for Adversarial Image Purification

Author: Silva, Samuel Henrique, Das, Arun, Scarff, Ian, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Deep Learning models are highly susceptible to adversarial manipulations that can lead to catastrophic consequences. One of the most effective methods to defend against such disturbances is adversarial training but at the cost of generalization of unseen attacks and transferability across models. In this paper, we propose a robust defense against adversarial attacks, which is model agnostic and generalizable to unseen adversaries. Initially, with a baseline model, we extract the latent representations for each class and adaptively cluster the latent representations that share a semantic similarity. We obtain the distributions for the clustered latent representations and from their originating images, we learn semantic reconstruction dictionaries (SRD). We adversarially train a new model constraining the latent space representation to minimize the distance between the adversarial latent representation and the true cluster distribution. To purify the image, we decompose the input into low and high-frequency components. The high-frequency component is reconstructed based on the most adequate SRD from the clean dataset. In order to evaluate the most adequate SRD, we rely on the distance between robust latent representations and semantic cluster distributions. The output is a purified image with no perturbation. Image purification on CIFAR-10 and ImageNet-10 using our proposed method improved the accuracy by more than 10% compared to state-of-the-art results., Comment: 11 pages, 5 figures, 4 tables
Published: 2021

28. A Self-supervised Approach for Semantic Indexing in the Context of COVID-19 Pandemic

Author: Ebadi, Nima and Najafirad, Peyman
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The pandemic has accelerated the pace at which COVID-19 scientific papers are published. In addition, the process of manually assigning semantic indexes to these papers by experts is even more time-consuming and overwhelming in the current health crisis. Therefore, there is an urgent need for automatic semantic indexing models which can effectively scale-up to newly introduced concepts and rapidly evolving distributions of the hyperfocused related literature. In this research, we present a novel semantic indexing approach based on the state-of-the-art self-supervised representation learning and transformer encoding exclusively suitable for pandemic crises. We present a case study on a novel dataset that is based on COVID-19 papers published and manually indexed in PubMed. Our study shows that our self-supervised model outperforms the best performing models of BioASQ Task 8a by micro-F1 score of 0.1 and LCA-F score of 0.08 on average. Our model also shows superior performance on detecting the supplementary concepts which is quite important when the focus of the literature has drastically shifted towards specific concepts related to the pandemic. Our study sheds light on the main challenges confronting semantic indexing models during a pandemic, namely new domains and drastic changes of their distributions, and as a superior alternative for such situations, propose a model founded on approaches which have shown auspicious performance in improving generalization and data efficiency in various NLP tasks. We also show the joint indexing of major Medical Subject Headings (MeSH) and supplementary concepts improves the overall performance.
Published: 2020

29. Stuttering Speech Disfluency Prediction using Explainable Attribution Vectors of Facial Muscle Movements

Author: Das, Arun, Mock, Jeffrey, Chacon, Henry, Irani, Farzan, Golob, Edward, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition, Quantitative Biology - Neurons and Cognition
Abstract: Speech disorders such as stuttering disrupt the normal fluency of speech by involuntary repetitions, prolongations and blocking of sounds and syllables. In addition to these disruptions to speech fluency, most adults who stutter (AWS) also experience numerous observable secondary behaviors before, during, and after a stuttering moment, often involving the facial muscles. Recent studies have explored automatic detection of stuttering using Artificial Intelligence (AI) based algorithm from respiratory rate, audio, etc. during speech utterance. However, most methods require controlled environments and/or invasive wearable sensors, and are unable explain why a decision (fluent vs stuttered) was made. We hypothesize that pre-speech facial activity in AWS, which can be captured non-invasively, contains enough information to accurately classify the upcoming utterance as either fluent or stuttered. Towards this end, this paper proposes a novel explainable AI (XAI) assisted convolutional neural network (CNN) classifier to predict near future stuttering by learning temporal facial muscle movement patterns of AWS and explains the important facial muscles and actions involved. Statistical analyses reveal significantly high prevalence of cheek muscles (p<0.005) and lip muscles (p<0.005) to predict stuttering and shows a behavior conducive of arousal and anticipation to speak. The temporal study of these upper and lower facial muscles may facilitate early detection of stuttering, promote automated assessment of stuttering and have application in behavioral therapies by providing automatic non-invasive feedback in realtime., Comment: Submitting to IEEE Trans. 10 pages, 7 figures. Final Manuscript
Published: 2020

30. Learning from Few Samples: A Survey

Author: Bendre, Nihar, Marín, Hugo Terashima, and Najafirad, Peyman
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep neural networks have been able to outperform humans in some cases like image recognition and image classification. However, with the emergence of various novel categories, the ability to continuously widen the learning capability of such networks from limited samples, still remains a challenge. Techniques like Meta-Learning and/or few-shot learning showed promising results, where they can learn or generalize to a novel category/task based on prior knowledge. In this paper, we perform a study of the existing few-shot meta-learning techniques in the computer vision domain based on their method and evaluation metrics. We provide a taxonomy for the techniques and categorize them as data-augmentation, embedding, optimization and semantics based learning for few-shot, one-shot and zero-shot settings. We then describe the seminal work done in each category and discuss their approach towards solving the predicament of learning from few samples. Lastly we provide a comparison of these techniques on the commonly used benchmark datasets: Omniglot, and MiniImagenet, along with a discussion towards the future direction of improving the performance of these techniques towards the final goal of outperforming humans., Comment: 17 pages, 10 figures
Published: 2020

31. Opportunities and Challenges in Deep Learning Adversarial Robustness: A Survey

Author: Silva, Samuel Henrique and Najafirad, Peyman
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: As we seek to deploy machine learning models beyond virtual and controlled domains, it is critical to analyze not only the accuracy or the fact that it works most of the time, but if such a model is truly robust and reliable. This paper studies strategies to implement adversary robustly trained algorithms towards guaranteeing safety in machine learning algorithms. We provide a taxonomy to classify adversarial attacks and defenses, formulate the Robust Optimization problem in a min-max setting and divide it into 3 subcategories, namely: Adversarial (re)Training, Regularization Approach, and Certified Defenses. We survey the most recent and important results in adversarial example generation, defense mechanisms with adversarial (re)Training as their main defense against perturbations. We also survey mothods that add regularization terms that change the behavior of the gradient, making it harder for attackers to achieve their objective. Alternatively, we've surveyed methods which formally derive certificates of robustness by exactly solving the optimization problem or by approximations using upper or lower bounds. In addition, we discuss the challenges faced by most of the recent algorithms presenting future research perspectives., Comment: 20 pages, 9 figures, submited to IEEE Transactions on Knowledge and Data Engineering
Published: 2020

32. CBCT-guided adaptive radiotherapy using self-supervised sequential domain adaptation with uncertainty estimation

Author: Ebadi, Nima, Li, Ruiqi, Das, Arun, Roy, Arkajyoti, Nikos, Papanikolaou, and Najafirad, Peyman
Published: 2023
Full Text: View/download PDF

33. Deepfake forensics analysis: An explainable hierarchical ensemble of weakly supervised models

Author: Silva, Samuel Henrique, Bethany, Mazal, Votto, Alexis Megan, Scarff, Ian Henry, Beebe, Nicole, and Najafirad, Peyman
Published: 2022
Full Text: View/download PDF

34. Artificial Intelligence in Tactical Human Resource Management: A Systematic Literature Review

Author: Votto, Alexis Megan, Valecha, Rohit, Najafirad, Peyman, and Rao, H. Raghav
Published: 2021
Full Text: View/download PDF

35. Large Language Model Lateral Spear Phishing: A Comparative Study in Large-Scale Organizational Settings

Author: Bethany, Mazal, Galiopoulos, Athanasios, Bethany, Emet, Karkevandi, Mohammad Bahrami, Vishwamitra, Nishant, Najafirad, Peyman, Bethany, Mazal, Galiopoulos, Athanasios, Bethany, Emet, Karkevandi, Mohammad Bahrami, Vishwamitra, Nishant, and Najafirad, Peyman
Abstract: The critical threat of phishing emails has been further exacerbated by the potential of LLMs to generate highly targeted, personalized, and automated spear phishing attacks. Two critical problems concerning LLM-facilitated phishing require further investigation: 1) Existing studies on lateral phishing lack specific examination of LLM integration for large-scale attacks targeting the entire organization, and 2) Current anti-phishing infrastructure, despite its extensive development, lacks the capability to prevent LLM-generated attacks, potentially impacting both employees and IT security incident management. However, the execution of such investigative studies necessitates a real-world environment, one that functions during regular business operations and mirrors the complexity of a large organizational infrastructure. This setting must also offer the flexibility required to facilitate a diverse array of experimental conditions, particularly the incorporation of phishing emails crafted by LLMs. This study is a pioneering exploration into the use of Large Language Models (LLMs) for the creation of targeted lateral phishing emails, targeting a large tier 1 university's operation and workforce of approximately 9,000 individuals over an 11-month period. It also evaluates the capability of email filtering infrastructure to detect such LLM-generated phishing attempts, providing insights into their effectiveness and identifying potential areas for improvement. Based on our findings, we propose machine learning-based detection techniques for such emails to detect LLM-generated phishing emails that were missed by the existing infrastructure, with an F1-score of 98.96.
Published: 2024

36. Causative Insights into Open Source Software Security using Large Language Code Embeddings and Semantic Vulnerability Graph

Author: Islam, Nafis Tanveer, Parra, Gonzalo De La Torre, Manual, Dylan, Jadliwala, Murtuza, Najafirad, Peyman, Islam, Nafis Tanveer, Parra, Gonzalo De La Torre, Manual, Dylan, Jadliwala, Murtuza, and Najafirad, Peyman
Abstract: Open Source Software (OSS) security and resilience are worldwide phenomena hampering economic and technological innovation. OSS vulnerabilities can cause unauthorized access, data breaches, network disruptions, and privacy violations, rendering any benefits worthless. While recent deep-learning techniques have shown great promise in identifying and localizing vulnerabilities in source code, it is unclear how effective these research techniques are from a usability perspective due to a lack of proper methodological analysis. Usually, these methods offload a developer's task of classifying and localizing vulnerable code; still, a reasonable study to measure the actual effectiveness of these systems to the end user has yet to be conducted. To address the challenge of proper developer training from the prior methods, we propose a system to link vulnerabilities to their root cause, thereby intuitively educating the developers to code more securely. Furthermore, we provide a comprehensive usability study to test the effectiveness of our system in fixing vulnerabilities and its capability to assist developers in writing more secure code. We demonstrate the effectiveness of our system by showing its efficacy in helping developers fix source code with vulnerabilities. Our study shows a 24% improvement in code repair capabilities compared to previous methods. We also show that, when trained by our system, on average, approximately 9% of the developers naturally tend to write more secure code with fewer vulnerabilities.
Published: 2024

37. ZRG: A Dataset for Multimodal 3D Residential Rooftop Understanding

Author: Corley, Isaac, primary, Lwowski, Jonathan, additional, and Najafirad, Peyman, additional
Published: 2024
Full Text: View/download PDF

38. Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods

Author: Wajid, Mohammad Saif, primary, Terashima‐Marin, Hugo, additional, Najafirad, Peyman, additional, and Wajid, Mohd Anas, additional
Published: 2023
Full Text: View/download PDF

39. An Unbiased Transformer Source Code Learning with Semantic Vulnerability Graph

Author: Islam, Nafis Tanveer, primary, De La Torre Parra, Gonzalo, additional, Manuel, Dylan, additional, Bou-Harb, Elias, additional, and Najafirad, Peyman, additional
Published: 2023
Full Text: View/download PDF

40. Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods.

Author: Wajid, Mohammad Saif, Terashima‐Marin, Hugo, Najafirad, Peyman, and Wajid, Mohd Anas
Subjects: DEEP learning, KNOWLEDGE graphs, NATURAL language processing, COMPUTER vision, ARTIFICIAL intelligence
Abstract: Generating an image/video caption has always been a fundamental problem of Artificial Intelligence, which is usually performed using the potential of Deep Learning Methods, Computer Vision, Knowledge Graphs, and Natural Language Processing (NLP). The significant task of image/video captioning is to describe visual content in terms of natural language. Due to a semantic gap, this presents a massive problem in understanding and explaining images or videos syntactically and semantically. The current systems need somewhere to fill the gap between low‐level and high‐level features while mapping. Therefore, to tackle this problem, there is a need to describe the latest research and methods to overcome difficulties and to propose effective solutions. This work thoroughly analyses and investigates the most related methods (deep learning and knowledge graph‐based approaches), benchmark datasets, and evaluation metrics with their benefits and limitations. Here we have also reviewed the state‐of‐the‐art methods related to image/video captioning and their applications in the current scenario. Finally, we provide thorough information on existing research with comparisons of results on benchmark datasets. We have also mentioned the existing challenges and future direction of research. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. ZRG: A High Resolution 3D Residential Rooftop Geometry Dataset for Machine Learning

Author: Corley, Isaac, Lwowski, Jonathan, and Najafirad, Peyman
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
Abstract: In this paper we present the Zeitview Rooftop Geometry (ZRG) dataset. ZRG contains thousands of samples of high resolution orthomosaics of aerial imagery of residential rooftops with corresponding digital surface models (DSM), 3D rooftop wireframes, and multiview imagery generated point clouds for the purpose of residential rooftop geometry and scene understanding. We perform thorough benchmarks to illustrate the numerous applications unlocked by this dataset and provide baselines for the tasks of roof outline extraction, monocular height estimation, and planar roof structure extraction.
Published: 2023

42. Can Hierarchical Transformers Learn Facial Geometry?

Author: Young, Paul, primary, Ebadi, Nima, additional, Das, Arun, additional, Bethany, Mazal, additional, Desai, Kevin, additional, and Najafirad, Peyman, additional
Published: 2023
Full Text: View/download PDF

43. The Case for Integrated Advanced Technology in Applied Behavior Analysis

Author: Neely, Leslie, primary, Carnett, Amarie, additional, Quarles, John, additional, MacNaul, Hannah, additional, Park, Se-Woong, additional, Oyama, Sakiko, additional, Chen, Guenevere, additional, Desai, Kevin, additional, and Najafirad, Peyman, additional
Published: 2022
Full Text: View/download PDF

44. Supervising Remote Sensing Change Detection Models With 3d Surface Semantics

Author: Corley, Isaac, primary and Najafirad, Peyman, additional
Published: 2022
Full Text: View/download PDF

45. Natural Disaster Analytics using High Resolution Satellite Images

Author: Bendre, Nihar, primary, Zand, Neda, additional, Bhattarai, Sujan, additional, Corley, Isaac, additional, Jamshidi, Mo, additional, and Najafirad, Peyman, additional
Published: 2022
Full Text: View/download PDF

46. Multimodal explainable AI predicts upcoming speech behavior in adults who stutter

Author: Das, Arun, primary, Mock, Jeffrey, additional, Irani, Farzan, additional, Huang, Yufei, additional, Najafirad, Peyman, additional, and Golob, Edward, additional
Published: 2022
Full Text: View/download PDF

47. Distributed AI-Driven Search Engine on Visual Internet-of-Things for Event Discovery in the Cloud

Author: Das, Arun, primary, Roopaei, Mehdi, additional, Jamshidi, Mo, additional, and Najafirad, Peyman, additional
Published: 2022
Full Text: View/download PDF

48. Adaptive Clustering of Robust Semantic Representations for Adversarial Image Purification on Social Networks

Author: Silva, Samuel Henrique, primary, Das, Arun, additional, Aladdini, Adel, additional, and Najafirad, Peyman, additional
Published: 2022
Full Text: View/download PDF

49. Show Why the Answer is Correct! Towards Explainable AI using Compositional Temporal Attention

Author: Bendre, Nihar, primary, Desai, Kevin, additional, and Najafirad, Peyman, additional
Published: 2021
Full Text: View/download PDF

50. Generalized Zero-Shot Learning Using Multimodal Variational Auto-Encoder With Semantic Concepts

Author: Bendre, Nihar, primary, Desai, Kevin, additional, and Najafirad, Peyman, additional
Published: 2021
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

105 results on '"Najafirad, Peyman"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources