Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2011


Articles 841 - 870 of 10325

Full-Text Articles in Physical Sciences and Mathematics

SMArTIC: Specification Mining Architecture With Trace Filtering And Clustering, David Lo, Siau-Cheng Khoo Nov 2011

David LO

Improper management of software evolution, compounded by imprecise and changing requirements, along with the "short time to market" requirement, commonly leads to a lack of up-to-date specifications. This can result in software that is characterized by bugs, anomalies and even security threats. Software specification mining is a new technique to address this concern by inferring specifications automatically. In this paper, we propose a novel API specification mining architecture called SMArTIC (Specification Mining Architecture with Trace fIltering and Clustering) to improve the accuracy, robustness and scalability of specification miners. This architecture is constructed based on two hypotheses: (1) Erroneous traces should …


Mining Software Specifications, David Lo, Siau-Cheng Khoo Nov 2011

David LO

No abstract provided.


Model Checking In The Absence Of Code, Model And Properties, David Lo, Siau-Cheng Khoo Nov 2011

David LO

Model checking is a major approach in ensuring software correctness. It verifies a model converted from code against some formal properties. However, difficulties and programmers' reluctance to formalize formal properties have been some hurdles to its widespread industrial adoption. Also, with the advent of commercial off-the-shelf (COTS) components provided by third party vendors, model checking is further challenged as often only a binary version of the code is provided by vendors. Interestingly, the latest instrumentation tools like PIN and Valgrind have enabled execution traces to be collected dynamically from a running program. In this preliminary study, we investigate what can …


Matching Dependence-Related Queries In The System Dependence Graph, Xiaoyin Wang, David Lo, Jiefeng Cheng, Lu Zhang, Hong Mei, Jeffrey Xu Yu Nov 2011

David LO

In software maintenance and evolution, it is common that developers want to apply a change to a number of similar places. Due to the size and complexity of the code base, it is challenging for developers to locate all the places that need the change. A main challenge in locating the places that need the change is that these places share certain common dependence conditions, but existing code searching techniques can hardly handle dependence relations satisfactorily. In this paper, we propose a technique that enables developers to make queries involving dependence conditions and textual conditions on the system dependence graph …


Mining Iterative Generators And Representative Rules For Software Specification Discovery, David Lo, Jinyan Li, Limsoon Wong, Siau-Cheng Khoo Nov 2011

David LO

Billions of dollars are spent annually on software-related cost. It is estimated that up to 45 percent of software cost is due to the difficulty in understanding existing systems when performing maintenance tasks (i.e., adding features, removing bugs, etc.). One of the root causes is that software products often come with poor, incomplete, or even without any documented specifications. In an effort to improve program understanding, Lo et al. have proposed iterative pattern mining which outputs patterns that are repeated frequently within a program trace, or across multiple traces, or both. Frequent iterative patterns reflect frequent program behaviors that likely …


Mining Past-Time Temporal Rules: A Dynamic Analysis Approach, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

David LO

No abstract provided.


Mining Antagonistic Communities From Social Networks, Kuan Zhang, David Lo, Ee Peng Lim Nov 2011

David LO

During social interactions in a community, there are often sub-communities that behave in opposite manners. These antagonistic sub-communities could represent groups of people with opposite tastes, factions within a community distrusting one another, etc. Taking as input a set of interactions within a community, we develop a novel pattern mining approach that extracts a set of antagonistic sub-communities. In particular, based on a set of user-specified thresholds, we extract a set of pairs of sub-communities that behave in opposite ways with one another. To prevent a blow up in this set of pairs, we focus on extracting a …


Automatic Steering Of Behavioral Model Inference, David Lo, Leonardo Mariani, Mauro Pezze Nov 2011

David LO

Many testing and analysis techniques use finite state models to validate and verify the quality of software systems. Since the specification of such models is complex and time-consuming, researchers defined several techniques to extract finite state models from code and traces. Automatically generating models requires much less effort than designing them, and thus eases the verification and validation of large software systems. However, when models are inferred automatically, the precision of the mining process is critical. Behavioral models mined with imprecise processes can include many spurious behaviors, and can thus compromise the results of testing and analysis techniques that use …


Extracting Paraphrases Of Technical Terms From Noisy Parallel Software Corpus, Xiaoyin Wang, David Lo, Jing Jiang, Lu Zhang, Hong Mei Nov 2011

David LO

In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here due to the noisy nature of bug reports. We propose a number of techniques to address the noisy data problem. The empirical evaluation shows that our method significantly improves an existing method by up to 58%.


Efficient Mining Of Recurrent Rules From A Sequence Database, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

David LO

We study a novel problem of mining significant recurrent rules from a sequence database. Recurrent rules have the form "whenever a series of precedent events occurs, eventually a series of consequent events occurs". Recurrent rules are intuitive and characterize behaviors in many domains. An example is in the domain of software specifications, in which the rules capture a family of program properties beneficial to program verification and bug detection. Recurrent rules generalize existing work on sequential and episode rules by considering repeated occurrences of premise and consequent events within a sequence and across multiple sequences, and by removing the "window" …
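As a rough illustration of the rule form quoted above, the following Python sketch estimates how often a rule "premise, then eventually consequent" holds across a small sequence database. It is not the authors' algorithm: it checks only the earliest premise match in each sequence, whereas the abstract describes counting repeated occurrences within a sequence. The function names and the lock/unlock events are hypothetical.

```python
from typing import List, Sequence


def match_end(events: Sequence[str], pattern: Sequence[str], start: int = 0) -> int:
    """Return the index just past the earliest in-order (gapped) match of
    `pattern` in events[start:], or -1 if the pattern does not occur."""
    pos = start
    for wanted in pattern:
        # Skip events until the next wanted event is found.
        while pos < len(events) and events[pos] != wanted:
            pos += 1
        if pos == len(events):
            return -1
        pos += 1
    return pos


def rule_confidence(db: List[Sequence[str]],
                    premise: Sequence[str],
                    consequent: Sequence[str]) -> float:
    """Simplified, sequence-level confidence (an assumption of this sketch):
    among sequences containing the premise, the fraction in which the
    consequent occurs somewhere after the earliest premise match."""
    matched = 0
    satisfied = 0
    for seq in db:
        end = match_end(seq, premise)
        if end == -1:
            continue  # premise never occurs, so the rule is not triggered
        matched += 1
        if match_end(seq, consequent, end) != -1:
            satisfied += 1
    return satisfied / matched if matched else 0.0


if __name__ == "__main__":
    traces = [
        ["lock", "use", "unlock"],
        ["lock", "use"],
        ["open", "close"],
    ]
    print(rule_confidence(traces, ["lock"], ["unlock"]))
```

A rule would then be reported only if its confidence and the number of triggering sequences both exceed user-chosen thresholds.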


Efficient Topological OLAP On Information Networks, Qiang Qu, Feida Zhu, Xifeng Yan, Jiawei Han, Philip Yu, Hongyan Li Nov 2011

David LO

We propose a framework for efficient OLAP on information networks with a focus on the most interesting kind, the topological OLAP (called "T-OLAP"), which incurs topological changes in the underlying networks. T-OLAP operations generate new networks from the original ones by rolling up a subset of nodes chosen by certain constraint criteria. The key challenge is to efficiently compute measures for the newly generated networks and handle user queries with varied constraints. Two effective computational techniques, T-Distributiveness and T-Monotonicity, are proposed to achieve efficient query processing and cube materialization. We also provide a T-OLAP query processing framework into which these …


Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi Nov 2011

David LO

In statistics and data mining communities, there have been many measures proposed to gauge the strength of association between two variables of interest, such as odds ratio, confidence, Yule-Y, Yule-Q, Kappa, and gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is associated positively to a cure of a disease or whether a particular marketing strategy is associated positively to an increase in revenue, etc. This paper models the problem of locating faults as association between the execution or non-execution of particular program elements with failures. There have been …
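To make the idea of treating fault localization as an association problem concrete, here is a minimal Python sketch that computes three of the measures named above (odds ratio, Yule-Q, Yule-Y) from a 2x2 table relating execution of one program element to test failure. The formulas are the standard textbook definitions. How the paper builds its tables is not stated in the abstract, so the table layout, the function name, and the counts are assumptions.

```python
import math
from typing import Dict


def association_measures(ef: int, ep: int, nf: int, np_: int) -> Dict[str, float]:
    """Association between 'element executed' and 'run failed'.

    ef  : failing runs that executed the element
    ep  : passing runs that executed the element
    nf  : failing runs that did not execute the element
    np_ : passing runs that did not execute the element
    """
    ad = ef * np_  # concordant product
    bc = ep * nf   # discordant product

    # Odds ratio: (ef * np_) / (ep * nf); infinite when no discordant runs.
    odds_ratio = ad / bc if bc else math.inf

    # Yule's Q: (ad - bc) / (ad + bc), in [-1, 1].
    yule_q = (ad - bc) / (ad + bc) if (ad + bc) else 0.0

    # Yule's Y: (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc)), in [-1, 1].
    root_sum = math.sqrt(ad) + math.sqrt(bc)
    yule_y = (math.sqrt(ad) - math.sqrt(bc)) / root_sum if root_sum else 0.0

    return {"odds_ratio": odds_ratio, "yule_q": yule_q, "yule_y": yule_y}


if __name__ == "__main__":
    # Hypothetical counts for one statement over 20 test runs.
    print(association_measures(ef=8, ep=2, nf=2, np_=8))
```

Ranking program elements by any one of these scores, highest first, gives a candidate list of suspicious locations to inspect.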


Mining Message Sequence Graphs, Sandeep Kumar, Siau-Cheng Khoo, Abhik Roychoudhury, David Lo Nov 2011

David LO

Dynamic specification mining involves discovering software behavior from traces for the purpose of program comprehension and bug detection. However, in concurrent/distributed programs, the inherent partial order relationships among events occurring across processes pose a big challenge to specification mining. In this paper, we propose a framework for mining partial orders so as to understand concurrent program behavior. Our miner takes in a set of concurrent program traces, and produces a message sequence graph (MSG) to represent the concurrent program behavior. An MSG represents a graph where the nodes of the graph are partial orders, represented as Message Sequence Charts. Mining …


Mining Modal Scenarios From Execution Traces, David Lo, Shahar Maoz, Siau-Cheng Khoo Nov 2011

David LO

Specification mining is a dynamic analysis process aimed at automatically inferring suggested specifications of a program from its execution traces. We describe a method, a framework, and a tool, for mining inter-object scenario-based specifications in the form of a UML2-compliant variant of Damm and Harel's Live Sequence Charts (LSC), which extends the classical partial order semantics of sequence diagrams with temporal liveness and symbolic class level lifelines, in order to generate compact and expressive specifications. Moreover, we use previous research work and tools developed for LSC to visualize, analyze, manipulate, test, and thus evaluate the scenario-based specifications we mine. Our …


Hierarchical Inter-Object Traces For Specification Mining, David Lo, Shahar Maoz Nov 2011

David LO

Major challenges of dynamic analysis approaches to specification mining include scalability over long traces as well as comprehensibility and expressivity of results. We present a novel use of object hierarchies over inter-object traces as an abstraction/refinement mechanism enabling scalable, incremental, top-down mining of scenario-based specifications.


Towards Better Quality Specification Miners, David Lo, Siau-Cheng Khoo Nov 2011

David LO

Software is often built without specifications. Tools to automatically extract specifications from software are needed, and many techniques have been proposed. One type of these specifications – temporal API specification – is often specified in the form of an automaton (i.e., FSA/PFSA). There has been much work on mining software temporal specifications using dynamic analysis techniques; i.e., analysis of software program traces. Unfortunately, the issues of scalability, robustness and accuracy of these techniques have not been comprehensively addressed. In this paper, we describe a framework that enables assessments of the performance of a specification miner in generating temporal specification of software …


Efficient Mining Of Closed Repetitive Gapped Subsequences From A Sequence Database, Bolin Ding, David Lo, Jiawei Han, Siau-Cheng Khoo Nov 2011

David LO

There is a huge wealth of sequence data available, for example, customer purchase histories, program execution traces, DNA, and protein sequences. Analyzing this wealth of data to mine important knowledge is certainly a worthwhile goal. In this paper, as a step forward to analyzing patterns in sequences, we introduce the problem of mining closed repetitive gapped subsequences and propose efficient solutions. Given a database of sequences where each sequence is an ordered list of events, the pattern we would like to mine is called repetitive gapped subsequence, which is a subsequence (possibly with gaps between two successive events within it) …


Mining Interesting Link Formation Rules In Social Networks, Cane Wing-Ki Leung, Ee Peng Lim, David Lo, Jianshu Weng Nov 2011

David LO

Link structures are important patterns one looks out for when modeling and analyzing social networks. In this paper, we propose the task of mining interesting Link Formation rules (LF-rules) containing link structures known as Link Formation patterns (LF-patterns). LF-patterns capture various dyadic and/or triadic structures among groups of nodes, while LF-rules capture the formation of a new link from a focal node to another node as a postcondition of existing connections between the two nodes. We devise a novel LF-rule mining algorithm, known as LFR-Miner, based on frequent subgraph mining for our task. In addition to using a support-confidence framework …


Mining Hierarchical Scenario-Based Specifications, David Lo, Shahar Maoz Nov 2011

David LO

Scalability over long traces, as well as comprehensibility and expressivity of results, are major challenges for dynamic analysis approaches to specification mining. In this work we present a novel use of object hierarchies over traces of inter-object method calls, as an abstraction/refinement mechanism that enables user-guided, top-down or bottom-up mining of layered scenario-based specifications, broken down by hierarchies embedded in the system under investigation. We do this using data mining methods that provide statistically significant sound and complete results modulo user-defined thresholds, in the context of Damm and Harel’s live sequence charts (LSC); a visual, modal, scenario-based, inter-object language. Thus, …


Specification Mining Of Symbolic Scenario-Based Models, David Lo, Shahar Maoz Nov 2011

David LO

Many dynamic analysis approaches to specification mining, which extract behavioral models from execution traces, do not consider object identities. This limits their power when used to analyze traces of general object oriented programs. In this work we present a novel specification mining approach that considers object identities, and, moreover, generalizes from specifications involving concrete objects to their symbolic class-level abstractions. Our approach uses data mining methods to extract significant scenario-based specifications in the form of Damm and Harel's live sequence charts (LSC), a formal and expressive extension of classic sequence diagrams. We guarantee that all mined symbolic LSCs are significant …


Mining Closed Discriminative Dyadic Sequential Patterns, David Lo, Hong Cheng, Lucia Nov 2011

David LO

A lot of data are in sequential formats. In this study, we are interested in sequential data that goes in pairs. There are many interesting datasets in this format coming from various domains including parallel textual corpora, duplicate bug reports, and other pairs of related sequences of events. Our goal is to mine a set of closed discriminative dyadic sequential patterns from a database of sequence pairs each belonging to one of the two classes +ve and -ve. These dyadic sequential patterns characterize the discriminating facets contrasting the two classes. They are potentially good features to be used for the …


Mining And Ranking Generators Of Sequential Pattern, David Lo, Siau-Cheng Khoo, Jinyan Li Nov 2011

David LO

Sequential pattern mining, first proposed by Agrawal and Srikant, has received intensive research due to its wide-ranging applicability in many real-life domains. Various improvements have been proposed which include mining a closed set of sequential patterns. Sequential patterns supported by the same sequences in the database can be considered as belonging to an equivalence class. Each equivalence class contains patterns partially-ordered by sub-sequence relationship and having the same support. Within an equivalence class, the set of maximal and minimal patterns are referred to as closed patterns and generators respectively. Generators used together with closed patterns can provide additional information …
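The equivalence-class view described above can be sketched in a few lines of Python. The example groups a hand-supplied list of candidate patterns by the set of database sequences that support them, then picks the minimal members (generators) and maximal members (closed patterns) of one class under the sub-sequence relation. This is an illustrative assumption-laden sketch, not the paper's mining algorithm: a real miner enumerates candidates itself, and all names and data here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subseq(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if `pattern` occurs in `seq` in order, gaps allowed."""
    it = iter(seq)
    # `e in it` advances the iterator, so matches must appear in order.
    return all(e in it for e in pattern)


def equivalence_classes(db: List[Sequence[str]],
                        patterns: List[Pattern]) -> Dict[FrozenSet[int], List[Pattern]]:
    """Group patterns by the exact set of sequence indices supporting them."""
    classes: Dict[FrozenSet[int], List[Pattern]] = defaultdict(list)
    for p in patterns:
        supporters = frozenset(i for i, s in enumerate(db) if is_subseq(p, s))
        if supporters:
            classes[supporters].append(p)
    return classes


def generators_and_closed(members: List[Pattern]) -> Tuple[List[Pattern], List[Pattern]]:
    """Minimal members are generators; maximal members are closed patterns."""
    generators = [p for p in members
                  if not any(q != p and is_subseq(q, p) for q in members)]
    closed = [p for p in members
              if not any(q != p and is_subseq(p, q) for q in members)]
    return generators, closed


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "b", "c"), ("a", "c")]
    candidates = [("a",), ("b",), ("c",), ("a", "b"),
                  ("a", "c"), ("b", "c"), ("a", "b", "c")]
    for supporters, members in equivalence_classes(db, candidates).items():
        print(sorted(supporters), generators_and_closed(members))
```

In this toy database, the patterns supported by only the first two sequences form one class, with a single short generator and a single long closed pattern bracketing everything in between.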


Mining Temporal Rules For Software Maintenance, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

David LO

Software evolution incurs difficulties in program comprehension and software verification, and hence it increases the cost of software maintenance. In this study, we propose a novel technique to mine from program execution traces a sound and complete set of statistically significant temporal rules of arbitrary lengths. The extracted temporal rules reveal invariants that the program observes, and will consequently guide developers to understand the program behaviors, and facilitate all downstream applications such as verification and debugging. Different from previous studies that were restricted to mining two-event rules (e.g., (lock) →(unlock)), our algorithm discovers rules of arbitrary lengths. In order to …


An Automated Approach For Finding Variable-Constant Pairing Bugs, Julia Lawall, David Lo Nov 2011

David LO

Named constants are used heavily in operating systems code, both as internal flags and in interactions with devices. Decision making within an operating system thus critically depends on the correct usage of these values. Nevertheless, compilers for the languages typically used in implementing operating systems provide little support for checking the usage of named constants. This affects correctness, when a constant is used in a context where its value is meaningless, and software maintenance, when a constant has the right value for its usage context but the wrong name. We propose a hybrid program-analysis and data-mining based approach to identify …


Mining Quantified Temporal Rules: Formalism, Algorithms, And Evaluation, David Lo, Ganesan Ramalingam, Venkatesh-Prasad Ranganath, Kapil Vaswani Nov 2011

David LO

Libraries usually impose constraints on how clients should use them. Often these constraints are not well-documented. In this paper, we address the problem of recovering such constraints automatically, a problem referred to as specification mining. Given some client programs that use a given library, we identify constraints on the library usage that are (almost) satisfied by the given set of clients. The class of rules we target for mining combines simple binary temporal operators with state predicates (involving equality constraints) and quantification. This is a simple yet expressive subclass of temporal properties that allows us to capture many common API usage …


Mining Modal Scenarios-Based Specifications From Execution Trace Of Reactive Systems, David Lo, Shahar Maoz, Siau-Cheng Khoo Nov 2011

David LO

Specification mining is a dynamic analysis process aimed at automatically inferring suggested specifications of a program from its execution traces. We describe a novel method, framework, and tool, for mining inter-object scenario-based specifications in the form of a UML2-compliant variant of Damm and Harel's Live Sequence Charts (LSC). LSC extends the classical partial order semantics of sequence diagrams with temporal liveness and symbolic class level lifelines, in order to generate compact and expressive specifications. The output of our algorithm is a sound and complete set of statistically significant LSCs (i.e., satisfying given thresholds of support and confidence), mined from an …


Mining Patterns And Rules For Software Specification Discovery, David Lo, Siau-Cheng Khoo Nov 2011

David LO

Software specifications are often lacking, incomplete and outdated in the industry. Lacking and incomplete specifications cause various software engineering problems. Studies have shown that program comprehension takes up to 45% of software development costs. One of the root causes of the high cost is the lack of documented specification. Also, outdated and incomplete specification might potentially cause bugs and compatibility issues. In this paper, we describe novel data mining techniques to mine or reverse engineer these specifications from the pool of software engineering data. A large amount of software data is available for analysis. One form of software data is program …


Mining Specifications In Diversified Formats From Execution Traces, David Lo Nov 2011

David LO

Software evolves; this phenomenon causes an increase in maintenance effort, problems in comprehending the ever-changing code base, and difficulty in verifying software correctness. As software changes, often the documented specification is not updated. Outdated specification adds challenge to the understanding of the code base during maintenance tasks. Also, software changes might induce bugs, anomalies and even security threats. To address the above issues, we propose an array of specification mining techniques to mine software specifications in diversified formats from program execution traces. Case studies on various systems show that the extracted specifications shed light on the behaviors of systems under analysis. …


Data Mining For Software Engineering, Tao Xie, Suresh Thummalapenta, David Lo, Chao Liu Nov 2011

David LO

To improve software productivity and quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks. However, mining SE data poses several challenges. The authors present various algorithms to effectively mine sequences, graphs, and text from such data.

