Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Research Collection School Of Computing and Information Systems

Articles 1201 - 1230 of 6891

Full-Text Articles in Physical Sciences and Mathematics

Deconfounded Visual Grounding, Jianqiang Huang, Yu Qin, Jiaxin Qi, Qianru Sun, Hanwang Zhang Mar 2022

We focus on the confounding bias between language and location in the visual grounding pipeline, where we find that the bias is the major visual reasoning bottleneck. For example, the grounding process is usually a trivial language-location association without visual reasoning, e.g., grounding any language query containing sheep to the nearly central regions, because most queries about sheep have ground-truth locations at the image center. First, we frame the visual grounding pipeline into a causal graph, which shows the causalities among image, query, target location and underlying confounder. Through the causal graph, we know how to break the …


Generating Music With Emotions, Chunhui Bao, Qianru Sun Mar 2022

We focus on music generation conditioned on human emotions, specifically positive and negative emotions. There are no existing large-scale music datasets annotated with human emotion labels. It is thus not intuitive how to generate music conditioned on emotion labels. In this paper, we propose an annotation-free method to build a new dataset where each sample is a triplet of lyric, melody and emotion label (without requiring any manual labour). Specifically, we first train the automated emotion recognition model using BERT (pre-trained on the GoEmotions dataset) on the Edmonds Dance dataset. We use it to automatically “label” the music …


Analyzing Offline Social Engagements: An Empirical Study Of Meetup Events Related To Software Development, Abhishek Sharma, Gede Artha Azriadi Prana, Anamika Sawhney, Nachiappan Nagappan, David Lo Mar 2022

Software developers use a variety of social media channels and tools in order to keep themselves up to date, collaborate with other developers, and find projects to contribute to. Meetup is one such social media platform used by software developers to organize community gatherings. In this work, we investigate the dynamics of Meetup groups and events related to software development. Our work differs from previous work in that we focus on the actual event and group data collected using the Meetup API. In this work, we performed an empirical study of events and groups present on Meetup which are related to software development. First, we identified 6,327 Meetup groups related …


MWPToolkit: An Open-Source Framework For Deep Learning-Based Math Word Problem Solvers, Yihuai Lan, Lei Wang, Qiyuan Zhang, Yunshi Lan, Bing Tian Dai, Yan Wang, Dongxiang Zhang, Ee-Peng Lim Mar 2022

While Math Word Problem (MWP) solving has emerged as a popular field of study and made great progress in recent years, most existing methods are benchmarked solely on one or two datasets and implemented with different configurations. In this paper, we introduce the first open-source library for solving MWPs, called MWPToolkit, which provides a unified, comprehensive, and extensible framework for research purposes. Specifically, we deploy 17 deep learning-based MWP solvers and 6 MWP datasets in our toolkit. These MWP solvers are advanced models for MWP solving, covering the categories of Seq2seq, Seq2Tree, Graph2Tree, and Pre-trained Language Models. And these …


Efficient Search Of Live-Coding Screencasts From Online Videos, Chengran Yang, Ferdian Thung, David Lo Mar 2022

Programming videos on the Internet are valuable resources for learning programming skills. To find relevant videos, developers typically search online video platforms (e.g., YouTube) with keywords on topics they wish to learn. Developers often look for live-coding screencasts, in which the videos’ authors perform live coding. Yet, not all programming videos are live-coding screencasts. In this work, we develop a tool named PSFinder to identify live-coding screencasts. PSFinder leverages a classifier to identify whether a video frame contains an IDE window. It uses a sampling strategy to pick a number of frames from an input video, runs the classifier on …


Mask-Guided Deformation Adaptive Network For Human Parsing, Aihua Mao, Yuan Liang, Jianbo Jiao, Yongtuo Liu, Shengfeng He Mar 2022

Due to the challenges of densely compacted body parts, nonrigid clothing items, and severe overlap in crowd scenes, human parsing needs to focus more on multilevel feature representations compared to general scene parsing tasks. Based on this observation, we propose to introduce the auxiliary task of human mask and edge detection to facilitate human parsing. Different from human parsing, which exploits the discriminative features of each category, human mask and edge detection emphasizes the boundaries of semantic parsing regions and the difference between foreground humans and background clutter, which benefits the parsing predictions of crowd scenes and small human parts. …


Learning Variable Ordering Heuristics For Solving Constraint Satisfaction Problems, Wen Song, Zhiguang Cao, Jie Zhang, Chi Xu, Andrew Lim Mar 2022

Backtracking search algorithms are often used to solve the Constraint Satisfaction Problem (CSP), which is widely applied in various domains such as automated planning and scheduling. The efficiency of backtracking search depends greatly on the variable ordering heuristics. Currently, the most commonly used heuristics are hand-crafted based on expert knowledge. In this paper, we propose a deep reinforcement learning based approach to automatically discover new variable ordering heuristics that are better adapted for a given class of CSP instances, without the need of relying on hand-crafted features and heuristics. We show that directly optimizing the search tree size is not …


Innovative Human Motion Sensing With Earbuds, Dong Ma, Andrea Ferlini, Cecilia Mascolo Mar 2022

Earbuds, ear-worn wearables, have attracted growing attention from both industry and academia. This trend has witnessed manufacturers embedding multiple sensors on earbuds to enrich their functionalities. For example, Apple AirPods, Sony WF-1000XM3, and Bose QuietControl 30, have been equipped with accelerometers for tapping interaction or multiple microphones for noise cancellation. On the other hand, the research community regards earbuds as a powerful personal-scale human sensing and computing platform. By integrating sensors like PPG, barometer, and ultrasonic sensors, researchers have been devising a plethora of earable sensing applications, such as blood pressure monitoring [1], facial expression recognition [2], and authentication [3].


Efficient Certificateless Multi-Copy Integrity Auditing Scheme Supporting Data Dynamics, Lei Zhou, Anmin Fu, Guomin Yang, Huaqun Wang, Yuqing Zhang Mar 2022

To improve data availability and durability, cloud users would like to store multiple copies of their original files at servers. The multi-copy auditing technique is proposed to provide users with the assurance that multiple copies are actually stored in the cloud. However, most multi-replica solutions rely on Public Key Infrastructure (PKI), which entails massive overhead of certificate computation and management. In this article, we propose an efficient multi-copy dynamic integrity auditing scheme by employing certificateless signatures (named MDSS), which gets rid of expensive certificate management overhead and avoids the key escrow problem in identity-based signatures. Specifically, we improve the classic …


MWPToolkit: An Open-Source Framework For Deep Learning-Based Math Word Problem Solvers, Yihuai Lan, Lei Wang, Qiyuan Zhang, Yunshi Lan, Bing Tian Dai, Yan Wang, Dongxiang Zhang, Ee-Peng Lim Mar 2022

Duplicate record, see https://ink.library.smu.edu.sg/sis_research/7680/. Developing automatic Math Word Problem (MWP) solvers has been an interest of NLP researchers since the 1960s. Over the last few years, there are a growing number of datasets and deep learning-based methods proposed for effectively solving MWPs. However, most existing methods are benchmarked solely on one or two datasets, varying in different configurations, which leads to a lack of unified, standardized, fair, and comprehensive comparison between methods. This paper presents MWPToolkit, the first open-source framework for solving MWPs. In MWPToolkit, we decompose the procedure of existing MWP solvers into multiple core components and decouple …


Debiasing NLU Models Via Causal Intervention And Counterfactual Reasoning, Bing Tian, Yixin Cao, Yong Zhang, Chunxiao Xing Mar 2022

Recent studies have shown that strong Natural Language Understanding (NLU) models are prone to relying on annotation biases of the datasets as a shortcut, which goes against the underlying mechanisms of the task of interest. To reduce such biases, several recent works introduce debiasing methods to regularize the training process of targeted NLU models. In this paper, we provide a new perspective with causal inference to find out the bias. On the one hand, we show that there is an unobserved confounder for the natural language utterances and their respective classes, leading to spurious correlations from training data. To remove …


Coordinated Delivery To Shopping Malls With Limited Docking Capacity, Ruidian Song, Hoong Chuin Lau, Xue Luo, Lei Zhao Mar 2022

Shopping malls are densely located in major cities such as Singapore and Hong Kong. Tenants in these shopping malls generate a large number of freight orders to their contracted logistics service providers, who independently plan their own delivery schedules. These uncoordinated deliveries and limited docking capacity jointly cause congestion at the shopping malls. A delivery coordination platform centrally plans the vehicle routes for the logistics service providers and simultaneously schedules the dock time slots at the shopping malls for the delivery orders. Vehicle routing and dock scheduling decisions need to be made jointly against the backdrop of travel time and …


CrowdTC: Crowd-Powered Learning For Text Classification, Keyu Yang, Yunjun Gao, Lei Liang, Song Bian, Lu Chen, Baihua Zheng Feb 2022

Text classification is a fundamental task in content analysis. Nowadays, deep learning has demonstrated promising performance in text classification compared with shallow models. However, almost all the existing models do not take advantage of the wisdom of human beings to help text classification. Human beings are more intelligent and capable than machine learning models in terms of understanding and capturing the implicit semantic information from text. In this article, we try to take guidance from human beings to classify text. We propose Crowd-powered learning for Text Classification (CrowdTC for short). We design and post the questions on a crowdsourcing platform …


Broken External Links On Stack Overflow, Jiakun Liu, Xin Xia, David Lo, Haoxiang Zhang, Ying Zou, Ahmed E. Hassan, Shanping Li Feb 2022

Stack Overflow hosts valuable programming-related knowledge with 11,926,354 links that reference third-party websites. The links that reference resources hosted outside the Stack Overflow websites extend the Stack Overflow knowledge base substantially. However, with the rapid development of programming-related knowledge, many resources hosted on the Internet are not available anymore. Based on our analysis of the Stack Overflow data that was released on Jun. 2, 2019, 14.2 percent of the links on Stack Overflow are broken links. The broken links on Stack Overflow can obstruct viewers from obtaining desired programming-related knowledge, and potentially damage the reputation of …


Field Study In Deploying Restless Multi-Armed Bandits: Assisting Non-Profits In Improving Maternal And Child Health, Aditya Mate, Lovish Madan, Aparna Taneja, Neha Madhiwalla, Shresth Verma, Gargi Singh, Aparna Hegde, Pradeep Varakantham, Milind Tambe Feb 2022

The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Yet, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To assist non-profits in optimizing …


Emerging App Issue Identification Via Online Joint Sentiment-Topic Tracing, Cuiyun Gao, Jichuan Zeng, Zhiyuan Wen, David Lo, Xin Xia, Irwin King, Michael R. Lyu Feb 2022

Millions of mobile apps are available in app stores, such as Apple’s App Store and Google Play. For a mobile app, it is increasingly challenging to stand out from the enormous number of competitors and become prevalent among users. Good user experience and well-designed functionalities are the keys to a successful app. To achieve this, popular apps usually schedule their updates frequently. If we can capture the critical app issues faced by users in a timely and accurate manner, developers can make timely updates, and good user experience can be ensured. There exist prior studies on analyzing reviews for detecting emerging …


Modeling Functional Similarity In Source Code With Graph-Based Siamese Networks, Nikita Mehrotra, Navdha Agarwal, Piyush Gupta, Saket Anand, David Lo, Rahul Purandare Feb 2022

Code clones are duplicate code fragments that share (nearly) similar syntax or semantics. Code clone detection plays an important role in software maintenance, code refactoring, and reuse. A substantial amount of research has been conducted in the past to detect clones. A majority of these approaches use lexical and syntactic information to detect clones. However, only a few of them target semantic clones. Recently, motivated by the success of deep learning models in other fields, including natural language processing and computer vision, researchers have attempted to adopt deep learning techniques to detect code clones. These approaches use lexical information (tokens) …


DefectChecker: Automated Smart Contract Defect Detection By Analyzing EVM Bytecode, Jiachi Chen, Xin Xia, David Lo, John Grundy, Xiapu Luo, Ting Chen Feb 2022

Smart contracts are Turing-complete programs running on the blockchain. They are immutable and cannot be modified, even when bugs are detected. Therefore, ensuring smart contracts are bug-free and well-designed before deploying them to the blockchain is extremely important. A contract defect is an error, flaw or fault in a smart contract that causes it to produce an incorrect or unexpected result, or to behave in unintended ways. Detecting and removing contract defects can avoid potential bugs and make programs more robust. Our previous work defined 20 contract defects for smart contracts and divided them into five impact levels. According to …


SeqSeg: A Sequential Method To Achieve Nasopharyngeal Carcinoma Segmentation Free From Background Dominance, Guihua Tao, Haojiang Li, Jiabin Huang, Chu Han, Jiazhou Chen, Guangying Ruan, Wenjie Huang, Yu Hu, Tingting Dan, Bin Zhang, Shengfeng He Feb 2022

Reliable nasopharyngeal carcinoma (NPC) segmentation plays an important role in radiotherapy planning. However, recent deep learning methods fail to achieve satisfactory NPC segmentation in magnetic resonance (MR) images, since NPC is infiltrative and typically has a small or even tiny volume with an indistinguishable border, making it indiscernible from tightly connected surrounding tissues within immense and complex backgrounds. To address such background dominance problems, this paper proposes a sequential method (SeqSeg) to achieve accurate NPC segmentation. Specifically, the proposed SeqSeg is devoted to solving the problem at two scales: the instance level and feature level. At the instance level, SeqSeg is …


Knowledge Graph Embedding By Normalizing Flows, Changyi Xiao, Xiangnan He, Yixin Cao Feb 2022

A key to knowledge graph embedding (KGE) is to choose a proper representation space, e.g., point-wise Euclidean space and complex vector space. In this paper, we propose a unified perspective of embedding and introduce uncertainty into KGE from the view of group theory. Our model can incorporate existing models (i.e., generality), ensure the computation is tractable (i.e., efficiency) and enjoy the expressive power of complex random variables (i.e., expressiveness). The core idea is that we embed entities/relations as elements of a symmetric group, i.e., permutations of a set. Permutations of different sets can reflect different properties of embedding. And the …


Multiscale Generative Models: Improving Performance Of A Generative Model Using Feedback From Other Dependent Generative Models, Changyu Chen, Avinandan Bose, Shih-Fen Cheng, Arunesh Sinha Feb 2022

Realistic fine-grained multi-agent simulation of real-world complex systems is crucial for many downstream tasks such as reinforcement learning. Recent work has used generative models (GANs in particular) for providing high-fidelity simulation of real-world systems. However, such generative models are often monolithic and miss out on modeling the interaction in multi-agent systems. In this work, we take a first step towards building multiple interacting generative models (GANs) that reflect the interaction in the real world. We build and analyze a hierarchical set-up where a higher-level GAN is conditioned on the output of multiple lower-level GANs. We present a technique of using feedback …


Choices Are Not Independent: Stackelberg Security Games With Nested Quantal Response Models, Tien Mai, Arunesh Sinha Feb 2022

The quantal response (QR) model is widely used in Stackelberg security games (SSG) to model a bounded rational adversary. The QR model is a model of human response from among a large variety of prominent models known as discrete choice models. QR is the simplest type of discrete choice model and does not capture commonly observed phenomena such as correlation among choices. We introduce the nested QR adversary model (based on the nested logit model in discrete choice theory) in SSG, which addresses this shortcoming of the QR model. We present a tractable approximation of the resulting equilibrium problem with a nested QR adversary. …


ScriptChecker: To Tame Third-Party Script Execution With Task Capabilities, Wu Luo, Xuhua Ding, Pengfei Wu, Xiaolei Zhang, Qingni Shen, Zhonghai Wu Feb 2022

We present ScriptChecker, a novel browser-based framework to effectively and efficiently restrict third-party script execution according to the host web page’s directives. Different from all existing schemes functioning at the JavaScript layer, ScriptChecker holistically harnesses context separation and the browser’s security monitors to enforce on-demand access controls upon tasks executing untrusted code. The host page can flexibly assign resource-access capabilities to tasks upon their creation. Reaping the benefits of the task capability approach, ScriptChecker outperforms existing techniques in security, usability and performance. We have implemented a prototype of ScriptChecker on Chrome and rigorously evaluated its security against 1373 malicious scripts …


Deep Graph-Level Anomaly Detection By Glocal Knowledge Distillation, Rongrong Ma, Guansong Pang, Ling Chen, Anton Van Den Hengel Feb 2022

Graph-level anomaly detection (GAD) describes the problem of detecting graphs that are abnormal in their structure and/or the features of their nodes, as compared to other graphs. One of the challenges in GAD is to devise graph representations that enable the detection of both locally- and globally-anomalous graphs, i.e., graphs that are abnormal in their fine-grained (node-level) or holistic (graph-level) properties, respectively. To tackle this challenge we introduce a novel deep anomaly detection approach for GAD that learns rich global and local normal pattern information by joint random distillation of graph and node representations. The random distillation is achieved by …


Understanding In-App Advertising Issues Based On Large Scale App Review Analysis, Cuiyun Gao, Jichuan Zeng, David Lo, Xin Xia, Irwin King, Michael R. Lyu Feb 2022

Context: In-app advertising closely relates to app revenue. Reckless ad integration could adversely impact app quality and user experience, leading to loss of income. It is very challenging to balance the ad revenue and user experience for app developers. Objective: Towards tackling the challenge, we conduct a study on analyzing user concerns about in-app advertisement. Method: Specifically, we present a large-scale analysis on ad-related user feedback. The large user feedback data from App Store and Google Play allow us to summarize ad-related app issues comprehensively and thus provide practical ad integration strategies for developers. We first define common ad issues …


Collaborative Curating For Discovery And Expansion Of Visual Clusters, Duy Dung Le, Hady W. Lauw Feb 2022

In many visually-oriented applications, users can select and group images that they find interesting into coherent clusters. For instance, we encounter these in the form of hashtags on Instagram, galleries on Flickr, or boards on Pinterest. The selection and coherence of such user-curated visual clusters arise from a user’s preference for a certain type of content as well as her own perception of which images are similar and thus belong to a cluster. We seek to model such curation behaviors towards supporting users in their future activities such as expanding existing clusters or discovering new clusters altogether. This paper proposes …


Including Everyone, Everywhere: Understanding Opportunities And Challenges Of Geographic Gender-Inclusion In OSS, Gede Artha Azriadi Prana, Denae Ford, Ayushi Rastogi, David Lo, Rahul Purandare, Nachiappan Nagappan Feb 2022

The gender gap is a significant concern facing the software industry as development becomes more geographically distributed. Widely shared reports indicate that gender differences may be specific to each region. However, how complete can these reports be with little to no research reflective of the Open Source Software (OSS) process and the communities in which software is now commonly developed? Our study presents a multi-region geographical analysis of gender inclusion on GitHub. This mixed-methods approach includes quantitatively investigating differences in gender inclusion in projects across geographic regions and investigating these trends over time using data from contributions to 21,456 project repositories. …


Active Learning Of Discriminative Subgraph Patterns For API Misuse Detection, Hong Jin Kang, David Lo Feb 2022

A common cause of bugs and vulnerabilities is the violation of usage constraints associated with Application Programming Interfaces (APIs). API misuses are common in software projects, and while there have been techniques proposed to detect such misuses, studies have shown that they fail to reliably detect misuses while reporting many false positives. One limitation of prior work is the inability to reliably identify correct patterns of usage. Many approaches confuse a usage pattern’s frequency for correctness. Due to the variety of alternative usage patterns that may be uncommon but correct, anomaly detection-based techniques have limited success in identifying misuses. We …


A Deep Dive Into The Impact Of COVID-19 On Software Development, Paulo Anselmo Da Mota Silveira Neto, Umme Ayda Mannan, Eduardo Santana De Almeida, Nachiappan Nagappan, David Lo, Pavneet Singh Kochhar, Cuiyun Gao, Iftekhar Ahmed Feb 2022

The COVID-19 pandemic is considered the most crucial global health calamity of the century. It has impacted different business sectors around the world, and software development is no exception. This study investigates the impact of COVID-19 on software projects and software development professionals. We conducted a mining software repository study based on 100 GitHub projects developed in Java using ten different metrics. Next, we surveyed 279 software development professionals to better understand the impact of COVID-19 on daily activities and wellbeing. We identified 12 observations related to productivity, code quality, and wellbeing. Our findings highlight that the impact …


Post2vec: Learning Distributed Representations Of Stack Overflow Posts, Bowen Xu, Thong Hoang, Abhishek Sharma, Chengran Yang, Xin Xia, David Lo Feb 2022

Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or aid various downstream software engineering tasks. A common step performed by those solutions is to extract suitable representations of posts; typically, in the form of meaningful vectors. These vectors are then used for different tasks, for example, tag recommendation, relatedness prediction, post classification, and API recommendation. Intuitively, the quality of the vector representations of posts determines the effectiveness of the solutions in performing the respective tasks. In this work, to aid existing studies that analyze Stack Overflow posts, we propose a specialized deep …