Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Singapore Management University

Articles 181 - 210 of 7445

Full-Text Articles in Physical Sciences and Mathematics

Beyond A Joke: Dead Code Elimination Can Delete Live Code, Haoxin Tu, Lingxiao Jiang, Debin Gao, He Jiang Apr 2024

Research Collection School Of Computing and Information Systems

Dead Code Elimination (DCE) is a fundamental compiler optimization technique that removes dead code (e.g., unreachable or reachable but whose results are unused) in the program to produce smaller or faster executables. However, since compiler optimizations are typically aggressively performed and there are complex relationships/interplay among a vast number of compiler optimizations (including DCE), it is not known whether DCE is indeed correctly performed and will only delete dead code in practice. In this study, we open a new research problem to investigate: can DCE happen to erroneously delete live code? To tackle this problem, we design a new approach …


Impact Of Government Outsourcing Contracts On High-Tech Vendors: An Empirical Study, Yi Dong, Nan Hu, Yonghua Ji, Chenkai Ni, Jing Xie Apr 2024

Research Collection School Of Computing and Information Systems

Outsourcing is an important strategic decision of high-tech firms. However, while the research has extensively studied the implications of outsourcing to high-tech clients, its impact on high-tech vendors remains underexplored. This study empirically estimates the impact of government outsourcing contracts on high-tech vendors. Employing the earnings-return analyses framework, we find that, for high-tech vendors engaged in government outsourcing contracts, the stock market places a higher value on each unit of unexpected earnings compared to other firms. Additionally, this impact becomes stronger for contracts with longer terms, for contracts outsourced by the U.S. government or by countries with better political and …


Exploring The Potential Of Chatgpt In Automated Code Refinement: An Empirical Study, Guo Qi, Junming Cao, Xiaofei Xie, Shangqing Liu, Xiaohong Li, Bihuan Chen, Xin Peng Apr 2024

Research Collection School Of Computing and Information Systems

Code review is an essential activity for ensuring the quality and maintainability of software projects. However, it is a time-consuming and often error-prone task that can significantly impact the development process. Recently, ChatGPT, a cutting-edge language model, has demonstrated impressive performance in various natural language processing tasks, suggesting its potential to automate code review processes. However, it is still unclear how well ChatGPT performs in code review tasks. To fill this gap, in this paper, we conduct the first empirical study to understand the capabilities of ChatGPT in code review tasks, specifically focusing on automated code refinement based on given …


Out Of Sight, Out Of Mind: Better Automatic Vulnerability Repair By Broadening Input Ranges And Sources, Xin Zhou, Kisub Kim, Bowen Xu, Donggyun Han, David Lo Apr 2024

Research Collection School Of Computing and Information Systems

The advances of deep learning (DL) have paved the way for automatic software vulnerability repair approaches, which effectively learn the mapping from the vulnerable code to the fixed code. Nevertheless, existing DL-based vulnerability repair methods face notable limitations: 1) they struggle to handle lengthy vulnerable code, 2) they treat code as natural language texts, neglecting its inherent structure, and 3) they do not tap into the valuable expert knowledge present in the expert system. To address this, we propose VulMaster, a Transformer-based neural network model that excels at generating vulnerability repairs by comprehensively understanding the entire vulnerable code, irrespective of …


Greening Large Language Models Of Code, Jieke Shi, Zhou Yang, Hong Jin Kang, Bowen Xu, Junda He, David Lo Apr 2024

Research Collection School Of Computing and Information Systems

Large language models of code have shown remarkable effectiveness across various software engineering tasks. Despite the availability of many cloud services built upon these powerful models, there remain several scenarios where developers cannot take full advantage of them, stemming from factors such as restricted or unreliable internet access, institutional privacy policies that prohibit external transmission of code to third-party vendors, and more. Therefore, developing a compact, efficient, and yet energy-saving model for deployment on developers' devices becomes essential. To this aim, we propose Avatar, a novel approach that crafts a deployable model from a large language model of code by optimizing …


Ps3: Precise Patch Presence Test Based On Semantic Symbolic Signature, Qi Zhan, Xing Hu, Zhiyang Li, Xin Xia, David Lo, Shanping Li Apr 2024

Research Collection School Of Computing and Information Systems

During software development, vulnerabilities have posed a significant threat to users. Patches are the most effective way to combat vulnerabilities. In a large-scale software system, testing the presence of a security patch in every affected binary is crucial to ensure system security. Identifying whether a binary has been patched for a known vulnerability is challenging, as there may only be small differences between patched and vulnerable versions. Existing approaches mainly focus on detecting patches that are compiled with the same compiler options. However, it is common for developers to compile programs with very different compiler options in different situations, which …


Coca: Improving And Explaining Graph Neural Network-Based Vulnerability Detection Systems, Sicong Cao, Xiaobing Sun, Xiaoxue Wu, David Lo, Lili Bo, Bin Li, Wei Liu Apr 2024

Research Collection School Of Computing and Information Systems

Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, the lack of explainability poses a critical challenge to deploying black-box models in security-related domains. For this reason, several approaches have been proposed to explain the decision logic of the detection model by providing a set of crucial statements positively contributing to its predictions. Unfortunately, due to weakly robust detection models and suboptimal explanation strategies, they risk revealing spurious correlations and suffer from redundancy issues. In this paper, we propose Coca, a general framework aiming to 1) enhance the robustness of existing GNN-based vulnerability detection models to …


Ppt4j: Patch Presence Test For Java Binaries, Zhiyuan Pan, Xing Hu, Xin Xia, Xian Zhan, David Lo, Xiaohu Yang Apr 2024

Research Collection School Of Computing and Information Systems

The number of vulnerabilities reported in open source software has increased substantially in recent years. Security patches provide the necessary measures to protect software from attacks and vulnerabilities. In practice, it is difficult to identify whether patches have been integrated into software, especially if we only have binary files. Therefore, the ability to test whether a patch is applied to the target binary, a.k.a. patch presence test, is crucial for practitioners. However, it is challenging to obtain accurate semantic information from patches, which could lead to incorrect results. In this paper, we propose a new patch presence test framework named Ppt4J …


Exploiting Library Vulnerability Via Migration-Based Automated Test Generation, Zirui Chen, Xing Hu, Xin Xia, Yi Gao, Tongtong Xu, David Lo, Xiaohu Yang Apr 2024

Research Collection School Of Computing and Information Systems

In software development, developers extensively utilize third-party libraries to avoid implementing existing functionalities. When a new third-party library vulnerability is disclosed, project maintainers need to determine whether their projects are affected by the vulnerability, which requires developers to invest substantial effort in assessment. However, existing tools face a series of issues: static analysis tools produce false alarms, dynamic analysis tools require existing tests, and test generation tools have low success rates when facing complex vulnerabilities. Vulnerability exploits, as code snippets provided for reproducing vulnerabilities after disclosure, contain a wealth of vulnerability-related information. This study proposes a new method based on vulnerability …


Curiosity-Driven Testing For Sequential Decision-Making Process, Junda He, Zhou Yang, Jieke Shi, Chengran Yang, Kisub Kim, Bowen Xu, Xin Zhou, David Lo Apr 2024

Research Collection School Of Computing and Information Systems

Sequential decision-making processes (SDPs) are fundamental for complex real-world challenges, such as autonomous driving, robotic control, and traffic management. While recent advances in Deep Learning (DL) have led to mature solutions for solving these complex problems, SDPs remain vulnerable to learning unsafe behaviors, posing significant risks in safety-critical applications. However, developing a testing framework for SDPs that can identify a diverse set of crash-triggering scenarios remains an open challenge. To address this, we propose CureFuzz, a novel curiosity-driven black-box fuzz testing approach for SDPs. CureFuzz proposes a curiosity mechanism that allows a fuzzer to effectively explore novel and diverse scenarios, …


Context-Aware Representation: Jointly Learning Item Features And Selection From Triplets, Rodrigo Alves, Antoine Ledent Apr 2024

Research Collection School Of Computing and Information Systems

In areas of machine learning such as cognitive modeling or recommendation, user feedback is usually context-dependent. For instance, a website might provide a user with a set of recommendations and observe which (if any) of the links were clicked by the user. Similarly, there is growing interest in the so-called “odd-one-out” learning setting, where human participants are provided with a basket of items and asked which is the most dissimilar to the others. In both of those cases, the presence of all the items in the basket can influence the final decision. In this article, we consider a classification task …


Test Optimization In Dnn Testing: A Survey, Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Lei Ma, Mike Papadakis, Yves Le Traon Apr 2024

Research Collection School Of Computing and Information Systems

This article presents a comprehensive survey on test optimization in deep neural network (DNN) testing. Here, test optimization refers to testing with low data labeling effort. We analyzed 90 papers, including 43 from the software engineering (SE) community, 32 from the machine learning (ML) community, and 15 from other communities. Our study: (i) unifies the problems as well as terminologies associated with low-labeling cost testing, (ii) compares the distinct focal points of SE and ML communities, and (iii) reveals the pitfalls in existing literature. Furthermore, we highlight the research opportunities in this domain.


Large Language Model For Vulnerability Detection: Emerging Results And Future Directions, Xin Zhou, Ting Zhang, David Lo Apr 2024

Research Collection School Of Computing and Information Systems

Previous learning-based vulnerability detection methods relied on either medium-sized pre-trained models or smaller neural networks from scratch. Recent advancements in Large Pre-Trained Language Models (LLMs) have showcased remarkable few-shot learning capabilities in various tasks. However, the effectiveness of LLMs in detecting software vulnerabilities is largely unexplored. This paper aims to bridge this gap by exploring how LLMs perform with various prompts, particularly focusing on two state-of-the-art LLMs: GPT-3.5 and GPT-4. Our experimental results showed that GPT-3.5 achieves competitive performance with the prior state-of-the-art vulnerability detection approach and GPT-4 consistently outperformed the state-of-the-art.


Enhancing Source Code Representations For Deep Learning With Static Analysis, Xueting Guan, Christoph Treude Apr 2024

Research Collection School Of Computing and Information Systems

Deep learning techniques applied to program analysis tasks such as code classification, summarization, and bug detection have seen widespread interest. Traditional approaches, however, treat programming source code as natural language text, which may neglect significant structural or semantic details. Additionally, most current methods of representing source code focus solely on the code, without considering beneficial additional context. This paper explores the integration of static analysis and additional context such as bug reports and design patterns into source code representations for deep learning models. We use the Abstract Syntax Tree-based Neural Network (ASTNN) method and augment it with additional context information …


Going Viral: Case Studies On The Impact Of Protestware, Youmei Fan, Dong Wang, Supastsara Wattanakriengkrai, Hathaichanok Damrongsiri, Christoph Treude, Hideaki Hata, Raula Gaikovina Kula Apr 2024

Research Collection School Of Computing and Information Systems

Maintainers are now self-sabotaging their work in order to take political or economic stances, a practice referred to as "protestware". In this poster, we present our approach to understand how the discourse about such an attack went viral, how it is received by the community, and whether developers respond to the attack in a timely manner. We study two notable protestware cases, i.e., Colors.js and es5-ext, comparing with discussions of a typical security vulnerability as a baseline, i.e., Ua-parser, and perform a thematic analysis of more than two thousand protest-related posts to extract the different narratives when discussing protestware.


Creative And Correct: Requesting Diverse Code Solutions From Ai, Scott Blyth, Markus Wagner, Christoph Treude Apr 2024

Research Collection School Of Computing and Information Systems

AI foundation models have the capability to produce a wide array of responses to a single prompt, a feature that is highly beneficial in software engineering to generate diverse code solutions. However, this advantage introduces a significant trade-off between diversity and correctness. In software engineering tasks, diversity is key to exploring design spaces and fostering creativity, but the practical value of these solutions is heavily dependent on their correctness. Our study systematically investigates this trade-off using experiments with HumanEval tasks, exploring various parameter settings and prompting strategies. We assess the diversity of code solutions using similarity metrics from the code …


Minimon: Minimizing Android Applications With Intelligent Monitoring-Based Debloating, Jiakun Liu, Zicheng Zhang, Xing Hu, Thung Ferdian, Shahar Maoz, Debin Gao, Eran Toch, Zhipeng Zhao, David Lo Apr 2024

Research Collection School Of Computing and Information Systems

The size of Android applications is getting larger to fulfill the requirements of various users. However, not all the features of the applications are needed and desired by a specific user. The unnecessary and non-desired features can increase the attack surface and consume system resources such as storage and memory. To address this issue, we propose a framework, MiniMon, to debloat unnecessary features from an Android app based on the logs of specific users' interactions with the app. However, rarely used features may not be recorded during the data collection, and users' preferences may change slightly over time. To address these …


Code Search Is All You Need? Improving Code Suggestions With Code Search, Junkai Chen, Xing Hu, Zhenhao Li, Cuiyun Gao, Xin Xia, David Lo Apr 2024

Research Collection School Of Computing and Information Systems

Modern integrated development environments (IDEs) provide various automated code suggestion techniques (e.g., code completion and code generation) to help developers improve their efficiency. Such techniques may retrieve similar code snippets from the code base or leverage deep learning models to provide code suggestions. However, how to effectively enhance the code suggestions using code retrieval has not been systematically investigated. In this paper, we study and explore a retrieval-augmented framework for code suggestions. Specifically, our framework leverages different retrieval approaches and search strategies to search similar code snippets. Then the retrieved code is used to further enhance the performance of language …


Streamlining Java Programming: Uncovering Well-Formed Idioms With Idiomine, Yanming Yang, Xing Hu, Xin Xia, David Lo, Xiaohu Yang Apr 2024

Research Collection School Of Computing and Information Systems

Code idioms are commonly used patterns, techniques, or practices that aid in solving particular problems or specific tasks across multiple software projects. They can improve code quality, performance, and maintainability, and also promote program standardization and reuse across projects. However, identifying code idioms is significantly challenging, as existing studies have still suffered from three main limitations. First, it is difficult to recognize idioms that span non-contiguous code lines. Second, identifying idioms with intricate data flow and code structures can be challenging. Moreover, they only extract dataset-specific idioms, so common idioms or well-established code/design patterns that are rarely found in datasets …


Towards Speedy Permission-Based Debloating For Android Apps, Thung Ferdian, Jiakun Liu, Pattarakrit Rattanukul, Shahar Maoz, Eran Toch, Debin Gao, David Lo Apr 2024

Research Collection School Of Computing and Information Systems

Android apps typically include many functionalities that not all users require. These result in software bloat that increases the possible attack surface and app size. Common functionalities that users may not require are related to permissions that they intend to disallow in the first place. As these permissions are disallowed, their related code would never be executed and therefore can be safely removed. Existing work has proposed a solution to debloat Android apps according to the disallowed permissions. However, for large and complex applications, the debloating process could take hours, typically due to the long time that may be needed to construct …


Environmental, Social, And Governance (Esg) And Artificial Intelligence In Finance: State-Of-The-Art And Research Takeaways, Tristan Lim Apr 2024

Research Collection School Of Computing and Information Systems

The rapidly growing research landscape in finance, encompassing environmental, social, and governance (ESG) topics and associated Artificial Intelligence (AI) applications, presents challenges for both new researchers and seasoned practitioners. This study aims to systematically map the research area, identify knowledge gaps, and examine potential research areas for researchers and practitioners. The investigation focuses on three primary research questions: the main research themes concerning ESG and AI in finance, the evolution of research intensity and interest in these areas, and the application and evolution of AI techniques specifically in research studies within the ESG and AI in finance domain. Eight archetypical …


Editorial: Emerging On-Demand Passenger And Logistics Systems: Modelling, Optimization, And Data Analytics, Jintao Ke, Hai Wang, Neda Masoud, Maximilian Schiffer, Goncalo H. A. Correia Apr 2024

Research Collection School Of Computing and Information Systems

The proliferation of smart personal devices and mobile internet access has fueled numerous advancements in on-demand transportation services. These services are facilitated by online digital platforms and range from providing rides to delivering products. Their influence is transforming transportation systems and leaving a mark on changing individual mobility, activity patterns, and consumption behaviors. For instance, on-demand transportation companies such as Uber, Lyft, Grab, and DiDi have become increasingly vital for meeting urban transportation needs by connecting available drivers with passengers in real time. The recent surge in door-to-door food delivery (e.g., Uber Eats, DoorDash, Meituan); grocery delivery (e.g., Amazon Fresh, …


W4-Groups: Modeling The Who, What, When And Where Of Group Behavior Via Mobility Sensing, Akansha Atrey, Camellia Zakaria, Rajesh Krishna Balan, Prashant Shenoy Apr 2024

Research Collection School Of Computing and Information Systems

Human social interactions occur in group settings of varying sizes and locations, depending on the type of social activity. The ability to distinguish group formations based on their purposes transforms how group detection mechanisms function. Not only should such tools support the effective detection of serendipitous encounters, but they can derive categories of relation types among users. Determining who is involved, what activity is performed, and when and where the activity occurs are critical to understanding group processes in greater depth, including supporting goal-oriented applications (e.g., performance, productivity, and mental health) that require sensing social factors. In this work, we …


Unleashing The Power Of Clippy In Real-World Rust Projects, Chunmiao Li, Yijun Yu, Haitao Wu, Luca Carlig, Shijie Nie, Lingxiao Jiang Apr 2024

Research Collection School Of Computing and Information Systems

The error messages generated by the Rust compiler (rustc) are useful for developers to identify and diagnose suspicious code segments. Complementing the compiler, linters can also play an important role in promoting adherence to certain coding style conventions and best practices. Prominent linters utilized in the Rust ecosystem include Clippy [1] and Rustfmt [2]. Among them, the Rust community particularly emphasizes the importance of heeding the warnings provided by Clippy to mitigate common errors and promote the adoption of idiomatic conventions. Clippy provides a set of more than 600 lints in addition to the built-in rustc lints. These …


Bidirectional Paper-Repository Tracing In Software Engineering, Daniel Garijo, Miguel Arroyo, Esteban González Guardia, Christoph Treude, Nicola Tarocco Apr 2024

Research Collection School Of Computing and Information Systems

While computer science papers frequently include their associated code repositories, establishing a clear link between papers and their corresponding implementations may be challenging due to the number of code repositories used in research publications. In this paper, we describe a lightweight method for effectively identifying bidirectional links between papers and repositories from both LaTeX and PDF sources. We have used our approach to analyze more than 14000 PDF and LaTeX files in the Software Engineering category of arXiv, generating a dataset of more than 1400 paper-code implementations and assessing current citation practices on it.


The Impact Of Bug Localization Based On Crash Report Mining: A Developers' Perspective, Marcos Medeiros, Uirá Kulesza, Roberta Coelho, Rodrigo Bonifacio, Christoph Treude, Eiji Adachi Barbosa Apr 2024

Research Collection School Of Computing and Information Systems

Developers often use crash reports to understand the root cause of bugs. However, locating the buggy source code snippet from such information is a challenging task, mainly when the log database contains many crash reports. To mitigate this issue, recent research has proposed and evaluated approaches for grouping crash report data and using stack trace information to locate bugs. The effectiveness of such approaches has been evaluated by mainly comparing the candidate buggy code snippets with the actual changed code in bug-fix commits—which happens in the context of retrospective repository mining studies. Therefore, the existing literature still lacks discussing the …


Githubinclusifier: Finding And Fixing Non-Inclusive Language In Github Repositories, Liam Todd, John Grundy, Christoph Treude Apr 2024

Research Collection School Of Computing and Information Systems

Non-inclusive language in software artefacts has been recognised as a serious problem. We describe a tool to find and fix non-inclusive language in a variety of GitHub repository artefacts. These include various README files, PDFs, code comments, and code. A wide variety of non-inclusive language including racist, ageist, ableist, violent and others are located and issues created, tagging the artefacts for checking. Suggested fixes can be generated using third-party LLM APIs, and approved changes made to documents, including code refactorings, and committed to the repository. The tool and evaluation data are available from: https://github.com/LiamTodd/github-inclusifier


Unveiling Memorization In Code Models, Zhou Yang, Zhipeng Zhao, Chenyu Wang, Jieke Shi, Dongsun Kim, Donggyun Han, David Lo Apr 2024

Research Collection School Of Computing and Information Systems

The availability of large-scale datasets, advanced architectures, and powerful computational resources has led to effective code models that automate diverse software engineering activities. The datasets usually consist of billions of lines of code from both open-source and private repositories. A code model memorizes and produces source code verbatim, which potentially contains vulnerabilities, sensitive information, or code with strict licenses, leading to potential security and privacy issues. This paper investigates an important problem: to what extent do code models memorize their training data? We conduct an empirical study to explore memorization in large pre-trained code models. Our study highlights that simply extracting …


Deep Reinforcement Learning For Dynamic Algorithm Selection: A Proof-Of-Principle Study On Differential Evolution, Hongshu Guo, Yining Ma, Zeyuan Ma, Jiacheng Chen, Xinglin Zhang, Zhiguang Cao, Jun Zhang, Yue-Jiao Gong Apr 2024

Research Collection School Of Computing and Information Systems

Evolutionary algorithms, such as differential evolution, excel in solving real-parameter optimization challenges. However, the effectiveness of a single algorithm varies across different problem instances, necessitating considerable efforts in algorithm selection or configuration. This article aims to address the limitation by leveraging the complementary strengths of a group of algorithms and dynamically scheduling them throughout the optimization progress for specific problems. We propose a deep reinforcement learning-based dynamic algorithm selection framework to accomplish this task. Our approach models the dynamic algorithm selection as a Markov decision process, training an agent in a policy gradient manner to select the most suitable algorithm according …


Discovering Significant Topics From Legal Decisions With Selective Inference, Jerrold Tsin Howe Soh Apr 2024

Research Collection Yong Pung How School Of Law

We propose and evaluate an automated pipeline for discovering significant topics from legal decision texts by passing features synthesized with topic models through penalized regressions and post-selection significance tests. The method identifies case topics significantly correlated with outcomes, topic-word distributions which can be manually interpreted to gain insights about significant topics, and case-topic weights which can be used to identify representative cases for each topic. We demonstrate the method on a new dataset of domain name disputes and a canonical dataset of European Court of Human Rights violation cases. Topic models based on latent semantic analysis as well as language …