Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Full-Text Articles in Physical Sciences and Mathematics
Beyond A Joke: Dead Code Elimination Can Delete Live Code, Haoxin Tu, Lingxiao Jiang, Debin Gao, He Jiang
Research Collection School Of Computing and Information Systems
Dead Code Elimination (DCE) is a fundamental compiler optimization technique that removes dead code (e.g., code that is unreachable, or reachable but whose results are never used) from a program to produce smaller or faster executables. However, since compiler optimizations are typically performed aggressively and there are complex relationships and interplay among a vast number of compiler optimizations (including DCE), it is not known whether DCE is performed correctly and deletes only dead code in practice. In this study, we open a new research problem: can DCE erroneously delete live code? To tackle this problem, we design a new approach …
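To make "reachable but whose results are never used" concrete, here is a minimal sketch of one classic form of DCE: a backward liveness pass over a straight-line block that drops assignments whose targets are never read afterward. The tuple-based instruction format and the `live_out` parameter are illustrative assumptions, and the sketch covers only unused results (not unreachable code). It shows the intended, correct behavior that the study checks real compilers against; it is not the paper's approach.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical instruction format: (target, operands). A target of None marks a
# side-effecting instruction (e.g. a print or store) that must always be kept.
Instr = Tuple[Optional[str], Tuple[str, ...]]


def eliminate_dead_code(block: List[Instr], live_out: Set[str]) -> List[Instr]:
    """Remove assignments whose results are never used later in a straight-line block."""
    live = set(live_out)  # variables still needed after the block ends
    kept: List[Instr] = []
    # Walk backwards so we always know which variables are read later on.
    for target, operands in reversed(block):
        if target is None or target in live:
            # Live instruction: its target is (re)defined here, its operands become live.
            if target is not None:
                live.discard(target)
            live.update(operands)
            kept.append((target, operands))
        # Otherwise the assignment is dead and is simply not kept.
    kept.reverse()
    return kept


program: List[Instr] = [
    ("a", ("x",)),      # a = f(x)
    ("b", ("a",)),      # b = g(a)    -> dead: b is never read
    ("c", ("a", "y")),  # c = h(a, y)
    (None, ("c",)),     # print(c)    -> side effect, always kept
]
print(eliminate_dead_code(program, live_out=set()))
# [('a', ('x',)), ('c', ('a', 'y')), (None, ('c',))]
```

A DCE bug of the kind the study looks for would correspond to this pass dropping an instruction whose result is, in fact, read later or observable.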
Impact Of Government Outsourcing Contracts On High-Tech Vendors: An Empirical Study, Yi Dong, Nan Hu, Yonghua Ji, Chenkai Ni, Jing Xie
Research Collection School Of Computing and Information Systems
Outsourcing is an important strategic decision of high-tech firms. However, while the research has extensively studied the implications of outsourcing to high-tech clients, its impact on high-tech vendors remains underexplored. This study empirically estimates the impact of government outsourcing contracts on high-tech vendors. Employing the earnings-return analysis framework, we find that, for high-tech vendors engaged in government outsourcing contracts, the stock market places a higher value on each unit of unexpected earnings compared to other firms. Additionally, this impact becomes stronger for contracts with longer terms, for contracts outsourced by the U.S. government or by countries with better political and …
Exploring The Potential Of ChatGPT In Automated Code Refinement: An Empirical Study, Guo Qi, Junming Cao, Xiaofei Xie, Shangqing Liu, Xiaohong Li, Bihuan Chen, Xin Peng
Research Collection School Of Computing and Information Systems
Code review is an essential activity for ensuring the quality and maintainability of software projects. However, it is a time-consuming and often error-prone task that can significantly impact the development process. Recently, ChatGPT, a cutting-edge language model, has demonstrated impressive performance in various natural language processing tasks, suggesting its potential to automate code review processes. However, it is still unclear how well ChatGPT performs in code review tasks. To fill this gap, in this paper, we conduct the first empirical study to understand the capabilities of ChatGPT in code review tasks, specifically focusing on automated code refinement based on given …
Out Of Sight, Out Of Mind: Better Automatic Vulnerability Repair By Broadening Input Ranges And Sources, Xin Zhou, Kisub Kim, Bowen Xu, Donggyun Han, David Lo
Research Collection School Of Computing and Information Systems
The advances of deep learning (DL) have paved the way for automatic software vulnerability repair approaches, which effectively learn the mapping from the vulnerable code to the fixed code. Nevertheless, existing DL-based vulnerability repair methods face notable limitations: 1) they struggle to handle lengthy vulnerable code, 2) they treat code as natural language texts, neglecting its inherent structure, and 3) they do not tap into the valuable expert knowledge present in the expert system. To address this, we propose VulMaster, a Transformer-based neural network model that excels at generating vulnerability repairs by comprehensively understanding the entire vulnerable code, irrespective of …
Greening Large Language Models Of Code, Jieke Shi, Zhou Yang, Hong Jin Kang, Bowen Xu, Junda He, David Lo
Research Collection School Of Computing and Information Systems
Large language models of code have shown remarkable effectiveness across various software engineering tasks. Despite the availability of many cloud services built upon these powerful models, there remain several scenarios where developers cannot take full advantage of them, stemming from factors such as restricted or unreliable internet access, institutional privacy policies that prohibit external transmission of code to third-party vendors, and more. Therefore, developing a compact, efficient, and yet energy-saving model for deployment on developers' devices becomes essential. To this aim, we propose Avatar, a novel approach that crafts a deployable model from a large language model of code by optimizing …
Ps3: Precise Patch Presence Test Based On Semantic Symbolic Signature, Qi Zhan, Xing Hu, Zhiyang Li, Xin Xia, David Lo, Shanping Li
Research Collection School Of Computing and Information Systems
During software development, vulnerabilities have posed a significant threat to users. Patches are the most effective way to combat vulnerabilities. In a large-scale software system, testing the presence of a security patch in every affected binary is crucial to ensure system security. Identifying whether a binary has been patched for a known vulnerability is challenging, as there may only be small differences between patched and vulnerable versions. Existing approaches mainly focus on detecting patches in binaries compiled with the same compiler options. However, it is common for developers to compile programs with very different compiler options in different situations, which …
Coca: Improving And Explaining Graph Neural Network-Based Vulnerability Detection Systems, Sicong Cao, Xiaobing Sun, Xiaoxue Wu, David Lo, Lili Bo, Bin Li, Wei Liu
Research Collection School Of Computing and Information Systems
Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, their lack of explainability poses a critical challenge to deploying these black-box models in security-related domains. For this reason, several approaches have been proposed to explain the decision logic of the detection model by providing a set of crucial statements that contribute positively to its predictions. Unfortunately, due to weakly robust detection models and suboptimal explanation strategies, these approaches risk revealing spurious correlations and suffer from redundancy issues. In this paper, we propose Coca, a general framework aiming to 1) enhance the robustness of existing GNN-based vulnerability detection models to …
Ppt4j: Patch Presence Test For Java Binaries, Zhiyuan Pan, Xing Hu, Xin Xia, Xian Zhan, David Lo, Xiaohu Yang
Research Collection School Of Computing and Information Systems
The number of vulnerabilities reported in open source software has increased substantially in recent years. Security patches provide the necessary measures to protect software from attacks and vulnerabilities. In practice, it is difficult to identify whether patches have been integrated into software, especially if we only have binary files. Therefore, the ability to test whether a patch is applied to the target binary, a.k.a. patch presence test, is crucial for practitioners. However, it is challenging to obtain accurate semantic information from patches, which could lead to incorrect results. In this paper, we propose a new patch presence test framework named Ppt4J …
Exploiting Library Vulnerability Via Migration-Based Automated Test Generation, Zirui Chen, Xing Hu, Xin Xia, Yi Gao, Tongtong Xu, David Lo, Xiaohu Yang
Research Collection School Of Computing and Information Systems
In software development, developers extensively utilize third-party libraries to avoid implementing existing functionalities. When a new third-party library vulnerability is disclosed, project maintainers need to determine whether their projects are affected by the vulnerability, which requires developers to invest substantial effort in assessment. However, existing tools face a series of issues: static analysis tools produce false alarms, dynamic analysis tools require existing tests, and test generation tools have low success rates when facing complex vulnerabilities. Vulnerability exploits, as code snippets provided for reproducing vulnerabilities after disclosure, contain a wealth of vulnerability-related information. This study proposes a new method based on vulnerability …
Curiosity-Driven Testing For Sequential Decision-Making Process, Junda He, Zhou Yang, Jieke Shi, Chengran Yang, Kisub Kim, Bowen Xu, Xin Zhou, David Lo
Research Collection School Of Computing and Information Systems
Sequential decision-making processes (SDPs) are fundamental for complex real-world challenges, such as autonomous driving, robotic control, and traffic management. While recent advances in Deep Learning (DL) have led to mature solutions for solving these complex problems, SDMs remain vulnerable to learning unsafe behaviors, posing significant risks in safety-critical applications. However, developing a testing framework for SDMs that can identify a diverse set of crash-triggering scenarios remains an open challenge. To address this, we propose CureFuzz, a novel curiosity-driven black-box fuzz testing approach for SDMs. CureFuzz proposes a curiosity mechanism that allows a fuzzer to effectively explore novel and diverse scenarios, …
Context-Aware Representation: Jointly Learning Item Features And Selection From Triplets, Rodrigo Alves, Antoine Ledent
Research Collection School Of Computing and Information Systems
In areas of machine learning such as cognitive modeling or recommendation, user feedback is usually context-dependent. For instance, a website might provide a user with a set of recommendations and observe which (if any) of the links were clicked by the user. Similarly, there is growing interest in the so-called “odd-one-out” learning setting, where human participants are provided with a basket of items and asked which is the most dissimilar to the others. In both of those cases, the presence of all the items in the basket can influence the final decision. In this article, we consider a classification task …
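In the odd-one-out setting, a participant sees a basket and picks the item most dissimilar to the others. A minimal sketch of that choice rule, assuming each item already has a fixed feature vector and using cosine similarity (both assumptions; the article instead learns the features and the selection jointly), picks the item whose total similarity to the rest of the basket is lowest.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def odd_one_out(basket: Dict[str, List[float]]) -> str:
    """Return the item whose summed similarity to the other items is smallest."""
    def total_similarity(name: str) -> float:
        return sum(cosine(basket[name], vec)
                   for other, vec in basket.items() if other != name)
    return min(basket, key=total_similarity)


# Hypothetical 2-D features: two fruits and one vehicle.
basket = {
    "apple": [1.0, 0.1],
    "pear": [0.9, 0.2],
    "truck": [0.1, 1.0],
}
print(odd_one_out(basket))  # truck
```

Because the score of each item depends on every other item in the basket, the same item can be the odd one out in one basket and not in another, which is the context dependence the abstract describes.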
Test Optimization In DNN Testing: A Survey, Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Lei Ma, Mike Papadakis, Yves Le Traon
Research Collection School Of Computing and Information Systems
This article presents a comprehensive survey on test optimization in deep neural network (DNN) testing. Here, test optimization refers to testing with low data labeling effort. We analyzed 90 papers, including 43 from the software engineering (SE) community, 32 from the machine learning (ML) community, and 15 from other communities. Our study: (i) unifies the problems as well as terminologies associated with low-labeling cost testing, (ii) compares the distinct focal points of SE and ML communities, and (iii) reveals the pitfalls in existing literature. Furthermore, we highlight the research opportunities in this domain.
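One widely used family of low-labeling-cost techniques ranks unlabeled test inputs by how uncertain the model is, so that only the top-ranked inputs are sent for labeling. The sketch below scores inputs by the Gini impurity of the predicted class probabilities (the score popularized by DeepGini); the probabilities and the labeling budget are made-up illustrative values, and this is one example technique rather than a summary of the survey's findings.

```python
from typing import List, Sequence


def gini_impurity(probs: Sequence[float]) -> float:
    """1 - sum(p_i^2): 0 for a fully confident prediction, larger when uncertain."""
    return 1.0 - sum(p * p for p in probs)


def select_for_labeling(predictions: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` most uncertain inputs, most uncertain first."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: gini_impurity(predictions[i]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical softmax outputs of a 3-class DNN on four unlabeled inputs.
preds = [
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # most uncertain
]
print(select_for_labeling(preds, budget=2))  # [3, 1]
```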
Large Language Model For Vulnerability Detection: Emerging Results And Future Directions, Xin Zhou, Ting Zhang, David Lo
Research Collection School Of Computing and Information Systems
Previous learning-based vulnerability detection methods relied on either medium-sized pre-trained models or smaller neural networks trained from scratch. Recent advancements in Large Pre-Trained Language Models (LLMs) have showcased remarkable few-shot learning capabilities in various tasks. However, the effectiveness of LLMs in detecting software vulnerabilities is largely unexplored. This paper aims to bridge this gap by exploring how LLMs perform with various prompts, particularly focusing on two state-of-the-art LLMs: GPT-3.5 and GPT-4. Our experimental results showed that GPT-3.5 achieves performance competitive with the prior state-of-the-art vulnerability detection approach and that GPT-4 consistently outperforms the state of the art.
Enhancing Source Code Representations For Deep Learning With Static Analysis, Xueting Guan, Christoph Treude
Research Collection School Of Computing and Information Systems
Deep learning techniques applied to program analysis tasks such as code classification, summarization, and bug detection have seen widespread interest. Traditional approaches, however, treat programming source code as natural language text, which may neglect significant structural or semantic details. Additionally, most current methods of representing source code focus solely on the code, without considering beneficial additional context. This paper explores the integration of static analysis and additional context such as bug reports and design patterns into source code representations for deep learning models. We use the Abstract Syntax Tree-based Neural Network (ASTNN) method and augment it with additional context information …
Going Viral: Case Studies On The Impact Of Protestware, Youmei Fan, Dong Wang, Supastsara Wattanakriengkrai, Hathaichanok Damrongsiri, Christoph Treude, Hideaki Hata, Raula Gaikovina Kula
Research Collection School Of Computing and Information Systems
Maintainers are now self-sabotaging their work in order to take political or economic stances, a practice referred to as "protestware". In this poster, we present our approach to understanding how the discourse about such an attack went viral, how it is received by the community, and whether developers respond to the attack in a timely manner. We study two notable protestware cases, i.e., Colors.js and es5-ext, compare them with discussions of a typical security vulnerability as a baseline, i.e., Ua-parser, and perform a thematic analysis of more than two thousand protest-related posts to extract the different narratives when discussing protestware.
Creative And Correct: Requesting Diverse Code Solutions From AI, Scott Blyth, Markus Wagner, Christoph Treude
Research Collection School Of Computing and Information Systems
AI foundation models have the capability to produce a wide array of responses to a single prompt, a feature that is highly beneficial in software engineering to generate diverse code solutions. However, this advantage introduces a significant trade-off between diversity and correctness. In software engineering tasks, diversity is key to exploring design spaces and fostering creativity, but the practical value of these solutions is heavily dependent on their correctness. Our study systematically investigates this trade-off using experiments with HumanEval tasks, exploring various parameter settings and prompting strategies. We assess the diversity of code solutions using similarity metrics from the code …
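As a rough illustration of measuring diversity with a similarity metric, the sketch below scores a set of generated solutions by their mean pairwise dissimilarity (1 minus similarity). It uses the character-level `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in; the study's actual similarity metrics are not named in this excerpt, so treat the metric choice as an assumption.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib (a stand-in metric)."""
    return SequenceMatcher(None, a, b).ratio()


def mean_pairwise_diversity(solutions: List[str]) -> float:
    """Average of (1 - similarity) over all unordered pairs of solutions."""
    pairs = list(combinations(solutions, 2))
    if not pairs:
        return 0.0  # a single solution is treated as having no diversity
    return sum(1.0 - similarity(a, b) for a, b in pairs) / len(pairs)


candidates = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return sum([a, b])\n",
]
print(round(mean_pairwise_diversity(candidates), 3))
```

To study the trade-off the abstract describes, one would compute this score alongside a correctness measure (for example, the fraction of candidates passing the task's tests) for each parameter setting or prompting strategy.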
MiniMon: Minimizing Android Applications With Intelligent Monitoring-Based Debloating, Jiakun Liu, Zicheng Zhang, Xing Hu, Thung Ferdian, Shahar Maoz, Debin Gao, Eran Toch, Zhipeng Zhao, David Lo
Research Collection School Of Computing and Information Systems
The size of Android applications is getting larger to fulfill the requirements of various users. However, not all the features of the applications are needed and desired by a specific user. The unnecessary and non-desired features can increase the attack surface and consume system resources such as storage and memory. To address this issue, we propose a framework, MiniMon, to debloat unnecessary features from an Android app based on the logs of specific users' interactions with the app. However, rarely used features may not be recorded during the data collection, and users' preferences may change slightly over time. To address these …
Code Search Is All You Need? Improving Code Suggestions With Code Search, Junkai Chen, Xing Hu, Zhenhao Li, Cuiyun Gao, Xin Xia, David Lo
Research Collection School Of Computing and Information Systems
Modern integrated development environments (IDEs) provide various automated code suggestion techniques (e.g., code completion and code generation) to help developers improve their efficiency. Such techniques may retrieve similar code snippets from the code base or leverage deep learning models to provide code suggestions. However, how to effectively enhance the code suggestions using code retrieval has not been systematically investigated. In this paper, we study and explore a retrieval-augmented framework for code suggestions. Specifically, our framework leverages different retrieval approaches and search strategies to search similar code snippets. Then the retrieved code is used to further enhance the performance of language …
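A minimal sketch of the retrieval-augmented idea: retrieve the snippets most similar to the unfinished code and prepend them to the input given to a code model. Lexical Jaccard similarity over crude tokens, the prompt layout, and the helper names (`retrieve`, `build_prompt`) are all hypothetical choices for illustration; the paper compares several retrieval approaches and search strategies that are not reproduced here.

```python
import re
from typing import List, Set


def tokens(code: str) -> Set[str]:
    """Crude lexical tokenization: identifiers, numbers, and single punctuation marks."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, code_base: List[str], k: int = 1) -> List[str]:
    """Return the k snippets from the code base most lexically similar to the query."""
    q = tokens(query)
    return sorted(code_base, key=lambda s: jaccard(q, tokens(s)), reverse=True)[:k]


def build_prompt(query: str, code_base: List[str], k: int = 1) -> str:
    """Prepend retrieved snippets as context before the unfinished code."""
    context = "\n\n".join(f"# Similar code:\n{s}" for s in retrieve(query, code_base, k))
    return f"{context}\n\n# Complete the following:\n{query}"


code_base = [
    "def read_json(path):\n    with open(path) as f:\n        return json.load(f)",
    "def mean(xs):\n    return sum(xs) / len(xs)",
]
print(build_prompt("def load_json_file(path):\n    with open(path) as", code_base))
```

The resulting prompt would then be passed to whatever code model produces the suggestion; swapping `jaccard` for an embedding-based similarity changes only the retrieval step.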
Streamlining Java Programming: Uncovering Well-Formed Idioms With Idiomine, Yanming Yang, Xing Hu, Xin Xia, David Lo, Xiaohu Yang
Research Collection School Of Computing and Information Systems
Code idioms are commonly used patterns, techniques, or practices that aid in solving particular problems or specific tasks across multiple software projects. They can improve code quality, performance, and maintainability, and also promote program standardization and reuse across projects. However, identifying code idioms is significantly challenging, as existing studies still suffer from three main limitations. First, it is difficult to recognize idioms that span non-contiguous code lines. Second, identifying idioms with intricate data flow and code structures can be challenging. Moreover, they only extract dataset-specific idioms, so common idioms or well-established code/design patterns that are rarely found in datasets …
Towards Speedy Permission-Based Debloating For Android Apps, Thung Ferdian, Jiakun Liu, Pattarakrit Rattanukul, Shahar Maoz, Eran Toch, Debin Gao, David Lo
Research Collection School Of Computing and Information Systems
Android apps typically include many functionalities that not all users require. These result in software bloat that increases the possible attack surface and app size. Common functionalities that users may not require are related to permissions that they intend to disallow in the first place. As these permissions are disallowed, their related code would never be executed and therefore can be safely removed. Existing work has proposed a solution to debloat Android apps according to the disallowed permissions. However, for large and complex applications, the debloating process could take hours, typically due to the long time that may be needed to construct …
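The core reasoning ("disallowed permission, so the related code never executes, so it can be removed") can be sketched as a reachability pass over a call graph that refuses to enter methods requiring a disallowed permission. The call graph, the method-to-permission map, and the method names below are hypothetical simplifications; real tools work on app bytecode and permission-checking APIs, and this sketch says nothing about how the paper speeds up the process.

```python
from collections import deque
from typing import Dict, List, Set


def debloat(call_graph: Dict[str, List[str]],
            required_perms: Dict[str, Set[str]],
            entry_points: List[str],
            disallowed: Set[str]) -> Set[str]:
    """Return the methods to keep: those reachable from an entry point without
    passing through a method that needs a disallowed permission."""
    def blocked(method: str) -> bool:
        return bool(required_perms.get(method, set()) & disallowed)

    keep: Set[str] = set()
    queue = deque(m for m in entry_points if not blocked(m))
    while queue:  # breadth-first traversal of the call graph
        method = queue.popleft()
        if method in keep:
            continue
        keep.add(method)
        for callee in call_graph.get(method, []):
            if callee not in keep and not blocked(callee):
                queue.append(callee)
    return keep  # everything else is a removal candidate


# Hypothetical app: the map feature needs location access, the feed does not.
call_graph = {
    "onCreate": ["showMap", "loadFeed"],
    "showMap": ["getLocation"],
    "loadFeed": ["parse"],
}
required_perms = {
    "showMap": {"ACCESS_FINE_LOCATION"},
    "getLocation": {"ACCESS_FINE_LOCATION"},
}
print(sorted(debloat(call_graph, required_perms, ["onCreate"], {"ACCESS_FINE_LOCATION"})))
# ['loadFeed', 'onCreate', 'parse']
```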
Environmental, Social, And Governance (ESG) And Artificial Intelligence In Finance: State-Of-The-Art And Research Takeaways, Tristan Lim
Research Collection School Of Computing and Information Systems
The rapidly growing research landscape in finance, encompassing environmental, social, and governance (ESG) topics and associated Artificial Intelligence (AI) applications, presents challenges for both new researchers and seasoned practitioners. This study aims to systematically map the research area, identify knowledge gaps, and examine potential research areas for researchers and practitioners. The investigation focuses on three primary research questions: the main research themes concerning ESG and AI in finance, the evolution of research intensity and interest in these areas, and the application and evolution of AI techniques specifically in research studies within the ESG and AI in finance domain. Eight archetypical …
Editorial: Emerging On-Demand Passenger And Logistics Systems: Modelling, Optimization, And Data Analytics, Jintao Ke, Hai Wang, Neda Masoud, Maximilian Schiffer, Goncalo H. A. Correia
Research Collection School Of Computing and Information Systems
The proliferation of smart personal devices and mobile internet access has fueled numerous advancements in on-demand transportation services. These services are facilitated by online digital platforms and range from providing rides to delivering products. Their influence is transforming transportation systems and leaving a mark on changing individual mobility, activity patterns, and consumption behaviors. For instance, on-demand transportation companies such as Uber, Lyft, Grab, and DiDi have become increasingly vital for meeting urban transportation needs by connecting available drivers with passengers in real time. The recent surge in door-to-door food delivery (e.g., Uber Eats, DoorDash, Meituan); grocery delivery (e.g., Amazon Fresh, …
W4-Groups: Modeling The Who, What, When And Where Of Group Behavior Via Mobility Sensing, Akansha Atrey, Camellia Zakaria, Rajesh Krishna Balan, Prashant Shenoy
Research Collection School Of Computing and Information Systems
Human social interactions occur in group settings of varying sizes and locations, depending on the type of social activity. The ability to distinguish group formations based on their purposes transforms how group detection mechanisms function. Such tools should not only support the effective detection of serendipitous encounters but also derive categories of relation types among users. Determining who is involved, what activity is performed, and when and where the activity occurs are critical to understanding group processes in greater depth, including supporting goal-oriented applications (e.g., performance, productivity, and mental health) that require sensing social factors. In this work, we …
Unleashing The Power Of Clippy In Real-World Rust Projects, Chunmiao Li, Yijun Yu, Haitao Wu, Luca Carlig, Shijie Nie, Lingxiao Jiang
Research Collection School Of Computing and Information Systems
The error messages generated by the Rust compiler (rustc) are useful for developers to identify and diagnose suspicious code segments. Complementing the compiler, linters can also play an important role in promoting adherence to certain coding style conventions and best practices. Prominent linters utilized in the Rust ecosystem include Clippy [1] and Rustfmt [2]. Among them, the Rust community particularly emphasizes the importance of heeding the warnings provided by Clippy to mitigate common errors and promote the adoption of idiomatic conventions. Clippy provides a set of more than 600 lints in addition to the built-in rustc lints. These …
Bidirectional Paper-Repository Tracing In Software Engineering, Daniel Garijo, Miguel Arroyo, Esteban González Guardia, Christoph Treude, Nicola Tarocco
Research Collection School Of Computing and Information Systems
While computer science papers frequently include their associated code repositories, establishing a clear link between papers and their corresponding implementations may be challenging due to the number of code repositories used in research publications. In this paper, we describe a lightweight method for effectively identifying bidirectional links between papers and repositories from both LaTeX and PDF sources. We have used our approach to analyze more than 14,000 PDF and LaTeX files in the Software Engineering category of arXiv, generating a dataset of more than 1,400 paper-code implementations and assessing current citation practices on it.
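A lightweight way to approximate both directions of the link is pattern matching: find repository URLs in a paper's LaTeX source, and find arXiv identifiers in a repository's README. The sketch below assumes only GitHub and GitLab hosts and new-style arXiv identifiers (YYMM.NNNNN); the patterns and function names are illustrative, not the paper's method, and PDF sources would first need text extraction.

```python
import re
from typing import List

# Assumed pattern: links to GitHub or GitLab repositories (owner/name), which
# may appear inside \url{...}, \href{...}{...}, or plain text.
REPO_URL = re.compile(r"https?://(?:www\.)?(github\.com|gitlab\.com)/([\w.-]+)/([\w.-]+)")
# Assumed pattern: new-style arXiv identifiers in abs/ or pdf/ links.
ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})", re.IGNORECASE)


def find_repositories(latex_source: str) -> List[str]:
    """Paper -> repository: normalized repo URLs in order, without duplicates."""
    found: List[str] = []
    for host, owner, name in REPO_URL.findall(latex_source):
        name = name.rstrip(".").removesuffix(".git")  # drop trailing period and .git
        url = f"https://{host}/{owner}/{name}"
        if url not in found:
            found.append(url)
    return found


def find_arxiv_ids(readme_text: str) -> List[str]:
    """Repository -> paper: arXiv identifiers mentioned in a README."""
    return ARXIV_ID.findall(readme_text)


# Hypothetical inputs.
tex = (r"Code: \url{https://github.com/example-org/tool}. "
       r"Mirror: \href{https://github.com/example-org/tool.git}{here}.")
print(find_repositories(tex))  # ['https://github.com/example-org/tool']
print(find_arxiv_ids("Paper: https://arxiv.org/abs/2401.01234v2"))  # ['2401.01234']
```

`str.removesuffix` requires Python 3.9 or later.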
The Impact Of Bug Localization Based On Crash Report Mining: A Developers' Perspective, Marcos Medeiros, Uirá Kulesza, Roberta Coelho, Rodrigo Bonifacio, Christoph Treude, Eiji Adachi Barbosa
Research Collection School Of Computing and Information Systems
Developers often use crash reports to understand the root cause of bugs. However, locating the buggy source code snippet from such information is a challenging task, especially when the log database contains many crash reports. To mitigate this issue, recent research has proposed and evaluated approaches for grouping crash report data and using stack trace information to locate bugs. The effectiveness of such approaches has been evaluated by mainly comparing the candidate buggy code snippets with the actual changed code in bug-fix commits—which happens in the context of retrospective repository mining studies. Therefore, the existing literature still lacks discussing the …
Githubinclusifier: Finding And Fixing Non-Inclusive Language In Github Repositories, Liam Todd, John Grundy, Christoph Treude
Research Collection School Of Computing and Information Systems
Non-inclusive language in software artefacts has been recognised as a serious problem. We describe a tool to find and fix non-inclusive language in a variety of GitHub repository artefacts. These include various README files, PDFs, code comments, and code. A wide variety of non-inclusive language, including racist, ageist, ableist, violent and others, is located and issues are created, tagging the artefacts for checking. Suggested fixes can be generated using third-party LLM APIs, and approved changes made to documents, including code refactorings, and committed to the repository. The tool and evaluation data are available from: https://github.com/LiamTodd/github-inclusifier
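The "find" half of such a tool can be approximated by a dictionary-based scan that reports each flagged term with its location and a suggested replacement. The three-entry term list below is illustrative only (the tool's vocabulary covers many more categories), and the tool's own detection method and its LLM-generated fixes are not reproduced here.

```python
import re
from typing import Dict, List, Tuple

# Illustrative term list only; not the tool's actual vocabulary.
SUGGESTIONS: Dict[str, str] = {
    "whitelist": "allowlist",
    "blacklist": "denylist",
    "master": "main",
}

# Whole-word, case-insensitive match of any listed term.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SUGGESTIONS)) + r")\b", re.IGNORECASE)


def find_terms(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, flagged term, suggested replacement) for each match."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            term = match.group(1)
            hits.append((lineno, term, SUGGESTIONS[term.lower()]))
    return hits


readme = "Add your IP to the whitelist.\nPush to the master branch."
print(find_terms(readme))
# [(1, 'whitelist', 'allowlist'), (2, 'master', 'main')]
```

Whole-word matching avoids flagging substrings inside unrelated identifiers, but it also misses inflected forms such as "whitelisted", which is one reason a plain keyword list is only a starting point.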
Unveiling Memorization In Code Models, Zhou Yang, Zhipeng Zhao, Chenyu Wang, Jieke Shi, Dongsun Kim, Donggyun Han, David Lo
Research Collection School Of Computing and Information Systems
The availability of large-scale datasets, advanced architectures, and powerful computational resources has led to effective code models that automate diverse software engineering activities. The datasets usually consist of billions of lines of code from both open-source and private repositories. A code model can memorize and reproduce source code verbatim, and that code may contain vulnerabilities, sensitive information, or code under strict licenses, leading to potential security and privacy issues. This paper investigates an important problem: to what extent do code models memorize their training data? We conduct an empirical study to explore memorization in large pre-trained code models. Our study highlights that simply extracting …
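One simple way to quantify verbatim memorization is to check how many of a generated output's token n-grams also appear in the training corpus. The whitespace tokenization, the value of `n`, and the helper names below are illustrative assumptions; the paper's own memorization criteria are not spelled out in this excerpt.

```python
from typing import Iterable, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(code: str, n: int) -> Set[NGram]:
    """Whitespace-token n-grams of a code string (empty if fewer than n tokens)."""
    toks = code.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def build_index(training_files: Iterable[str], n: int) -> Set[NGram]:
    """Collect every n-gram that occurs anywhere in the training data."""
    index: Set[NGram] = set()
    for source in training_files:
        index |= ngrams(source, n)
    return index


def memorized_fraction(generated: str, index: Set[NGram], n: int) -> float:
    """Fraction of the output's distinct n-grams that occur verbatim in the training data."""
    grams = ngrams(generated, n)
    if not grams:
        return 0.0
    return len(grams & index) / len(grams)


# Hypothetical training corpus and model output.
training = ["def add(a, b):\n    return a + b", "print('hello world')"]
index = build_index(training, n=3)
print(memorized_fraction("def add(a, b):\n    return a + b", index, n=3))  # 1.0
```

At real corpus scale, the in-memory set would be replaced by a disk-backed or probabilistic index, but the overlap measure stays the same.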
Deep Reinforcement Learning For Dynamic Algorithm Selection: A Proof-Of-Principle Study On Differential Evolution, Hongshu Guo, Yining Ma, Zeyuan Ma, Jiacheng Chen, Xinglin Zhang, Zhiguang Cao, Jun Zhang, Yue-Jiao Gong
Research Collection School Of Computing and Information Systems
Evolutionary algorithms, such as differential evolution, excel in solving real-parameter optimization challenges. However, the effectiveness of a single algorithm varies across different problem instances, necessitating considerable efforts in algorithm selection or configuration. This article aims to address this limitation by leveraging the complementary strengths of a group of algorithms and dynamically scheduling them throughout the optimization process for specific problems. We propose a deep reinforcement learning-based dynamic algorithm selection framework to accomplish this task. Our approach models dynamic algorithm selection as a Markov decision process, training an agent in a policy gradient manner to select the most suitable algorithm according …
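A heavily simplified sketch of the policy-gradient idea: a softmax policy over a pool of algorithms is updated with a REINFORCE-style rule (the gradient bandit algorithm, with a running-average reward baseline). Unlike the article's framework, this sketch ignores the optimization state entirely, so it reduces the Markov decision process to a bandit; the reward callables stand in for something like the objective improvement an algorithm achieves in one scheduling interval and are purely hypothetical.

```python
import math
import random
from typing import Callable, List


def softmax(prefs: List[float]) -> List[float]:
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_selector(reward_fns: List[Callable[[random.Random], float]],
                   episodes: int = 2000, lr: float = 0.1, seed: int = 0) -> List[float]:
    """Learn selection probabilities over algorithms with a REINFORCE-style update."""
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_fns)
    baseline = 0.0
    for t in range(1, episodes + 1):
        probs = softmax(prefs)
        a = rng.choices(range(len(prefs)), weights=probs)[0]  # sample an algorithm
        r = reward_fns[a](rng)                                # observe its reward
        baseline += (r - baseline) / t                        # running mean reward
        for k in range(len(prefs)):
            # Gradient of log pi(a) w.r.t. preference k is 1{k == a} - pi(k).
            grad = (1.0 if k == a else 0.0) - probs[k]
            prefs[k] += lr * (r - baseline) * grad
    return softmax(prefs)


# Hypothetical pool: algorithm 0 tends to improve the objective more than algorithm 1.
pool = [lambda rng: rng.gauss(1.0, 0.1), lambda rng: rng.gauss(0.0, 0.1)]
print([round(p, 3) for p in train_selector(pool)])
```

Making the policy depend on features of the optimization state (for example, population diversity or progress so far) is what turns this bandit into the sequential, state-aware selection the article describes.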
Discovering Significant Topics From Legal Decisions With Selective Inference, Jerrold Tsin Howe Soh
Research Collection Yong Pung How School Of Law
We propose and evaluate an automated pipeline for discovering significant topics from legal decision texts by passing features synthesized with topic models through penalized regressions and post-selection significance tests. The method identifies case topics significantly correlated with outcomes, topic-word distributions which can be manually interpreted to gain insights about significant topics, and case-topic weights which can be used to identify representative cases for each topic. We demonstrate the method on a new dataset of domain name disputes and a canonical dataset of European Court of Human Rights violation cases. Topic models based on latent semantic analysis as well as language …