Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

Full-Text Articles in Physical Sciences and Mathematics

TagCombine: Recommending Tags To Contents In Software Information Sites, Xin Yu Wang, Xin Xia, David Lo Sep 2015

Research Collection School Of Computing and Information Systems

Nowadays, software engineers use a variety of online media to search for and learn about new and interesting technologies, and to learn from and help one another. We refer to these kinds of online media, which help software engineers improve their performance in software development, maintenance, and testing processes, as software information sites. In this paper, we propose TagCombine, an automatic tag recommendation method that analyzes objects in software information sites. TagCombine has three different components: 1) a multi-label ranking component, which treats tag recommendation as a multi-label learning problem; 2) a similarity-based ranking component, which recommends tags from similar objects; 3) …


Answering Why-Not Questions On Reverse Top-K Queries, Yunjun Gao, Qing Liu, Gang Chen, Baihua Zheng, Linlin Zhou Sep 2015

Research Collection School Of Computing and Information Systems

Why-not questions, which seek clarification on tuples missing from query results, have recently received considerable attention from the database community. In this paper, we systematically explore why-not questions on reverse top-k queries, owing to their importance in multi-criteria decision making. Given an initial reverse top-k query and a missing/why-not weighting vector set Wm that is absent from the query result, why-not questions on reverse top-k queries explain why Wm does not appear in the query result and provide suggestions on how to refine the initial query with minimum penalty to include Wm in the refined query result. …
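As a point of reference, the reverse top-k query that these why-not questions refine can be answered by brute force. The Python sketch below is a minimal illustration under stated assumptions (linear scoring, higher score is better, ties resolved in the query's favour); the function names are hypothetical, and it is not the authors' why-not algorithm. A why-not vector set Wm is simply the part of the expected answer that `reverse_top_k` fails to return.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def score(weights: Sequence[float], point: Sequence[float]) -> float:
    """Linear preference score: the weighted sum of a point's attributes."""
    return sum(w * x for w, x in zip(weights, point))


def in_top_k(weights: Sequence[float], query: Sequence[float],
             data: Sequence[Sequence[float]], k: int) -> bool:
    """True if `query` ranks among the k best points under `weights`.

    Assumption: higher scores are better, and a point that merely ties
    with the query does not push it down the ranking.
    """
    q = score(weights, query)
    strictly_better = sum(1 for p in data if score(weights, p) > q)
    return strictly_better < k


def reverse_top_k(query: Vector, data: List[Vector],
                  weight_set: List[Vector], k: int) -> List[Vector]:
    """Brute-force reverse top-k: every weighting vector whose top-k contains `query`."""
    return [w for w in weight_set if in_top_k(w, query, data, k)]


if __name__ == "__main__":
    products = [(4.0, 1.0), (1.0, 4.0), (3.0, 3.0)]
    users = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
    print(reverse_top_k((3.0, 3.0), products, users, k=1))  # [(0.5, 0.5)]
```

Here the two users who weight a single attribute form the why-not set for the product (3.0, 3.0); explaining and repairing such omissions with minimum penalty is what the paper addresses, and this sketch does not attempt it.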


Multi-Factor Duplicate Question Detection In Stack Overflow, Yun Zhang, David Lo, Xin Xia, Jian Ling Sun Sep 2015

Research Collection School Of Computing and Information Systems

Stack Overflow is a popular on-line question and answer site for software developers to share their experience and expertise. Among the numerous questions posted in Stack Overflow, two or more of them may express the same point and thus are duplicates of one another. Duplicate questions make Stack Overflow site maintenance harder, waste resources that could have been used to answer other questions, and cause developers to unnecessarily wait for answers that are already available. To reduce the problem of duplicate questions, Stack Overflow allows questions to be manually marked as duplicates of others. Since there are thousands of questions …


Using Content-Level Structures For Summarizing Microblog Repost Trees, Jing Li, Wei Gao, Zhongyu Wei, Baolin Peng, Kam-Fai Wong Sep 2015

Research Collection School Of Computing and Information Systems

A microblog repost tree provides strong clues on how an event described therein develops. To help social media users capture the main clues of events on microblogging sites, we propose a novel repost tree summarization framework that effectively differentiates two kinds of messages on repost trees, called leaders and followers, which are derived from content-level structure information, i.e., the contents of messages and the reposting relations. To this end, a Conditional Random Fields (CRF) model is used to detect leaders across repost tree paths. We then present a variant of a random-walk-based summarization model to rank and select salient messages based on the …


Maximum Rank Query, Kyriakos Mouratidis, Jilian Zhang, Hwee Hwa Pang Sep 2015

Research Collection School Of Computing and Information Systems

The top-k query is a common means to shortlist a number of options from a set of alternatives, based on the user's preferences. Typically, these preferences are expressed as a vector of query weights, defined over the options' attributes. The query vector implicitly associates each alternative with a numeric score, and thus imposes a ranking among them. The top-k result includes the k options with the highest scores. In this context, we define the maximum rank query (MaxRank). Given a focal option in a set of alternatives, the MaxRank problem is to compute the highest rank this option may achieve …
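The scoring and ranking described above can be made concrete for the two-attribute case. The sketch below is a minimal brute-force illustration, not the paper's algorithm: it assumes weight vectors of the form (t, 1 - t) with t in [0, 1], higher scores are better, and a focal option's rank is 1 plus the number of options scoring strictly higher. Because each score difference is linear in t, the rank can only change where the focal option's score line crosses another option's, so checking those crossing points, the midpoints between them, and the endpoints finds the best achievable rank exactly. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence

Option = Sequence[float]


def score(weights: Sequence[Fraction], option: Option) -> Fraction:
    """Weighted sum of attributes, computed exactly with Fractions."""
    return sum((Fraction(w) * Fraction(x) for w, x in zip(weights, option)), Fraction(0))


def top_k(weights: Sequence[Fraction], options: List[Option], k: int) -> List[Option]:
    """The k options with the highest scores under `weights`."""
    return sorted(options, key=lambda o: score(weights, o), reverse=True)[:k]


def rank_of(focal: Option, others: List[Option], weights: Sequence[Fraction]) -> int:
    """1 + number of other options scoring strictly higher than `focal`."""
    f = score(weights, focal)
    return 1 + sum(1 for o in others if score(weights, o) > f)


def max_rank_2d(focal: Option, others: List[Option]) -> int:
    """Best (smallest) rank `focal` can reach over all weights (t, 1 - t), 0 <= t <= 1."""
    critical = {Fraction(0), Fraction(1)}
    for o in others:
        # score(o) - score(focal) = b + t * (a - b), with a, b the per-attribute gaps.
        a = Fraction(o[0]) - Fraction(focal[0])
        b = Fraction(o[1]) - Fraction(focal[1])
        if a != b:
            t = -b / (a - b)
            if 0 < t < 1:
                critical.add(t)
    points = sorted(critical)
    # The rank is constant between consecutive critical points, so the midpoints
    # cover the open intervals and the critical points themselves cover the ties.
    candidates = points + [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]
    return min(rank_of(focal, others, (t, 1 - t)) for t in candidates)


if __name__ == "__main__":
    others = [(4, 1), (1, 4)]
    print(max_rank_2d((3, 3), others))  # 1: e.g. t = 1/2 puts (3, 3) on top
    print(top_k((Fraction(1), Fraction(0)), others + [(3, 3)], 2))  # [(4, 1), (3, 3)]
```

Exact rational arithmetic is used so that ties at the crossing points are detected reliably; with more than two attributes the weight space is no longer a line, and this enumeration does not generalize directly.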


Did You Expect Your Users To Say This?: Distilling Unexpected Micro-Reviews For Venue Owners, Wen-Haw Chong, Bingtian Dai, Ee-Peng Lim Sep 2015

Research Collection School Of Computing and Information Systems

With social media platforms such as Foursquare, users can now generate concise reviews, i.e., micro-reviews, about entities such as venues (or products). From the venue owner's perspective, analysing these micro-reviews can offer interesting insights, useful for event detection and customer relationship management. However, not all micro-reviews are equally important, especially since a venue owner should already be familiar with his venue's primary aspects. Instead, we envisage that a venue owner will be interested in micro-reviews that are unexpected to him. These can arise in many ways, such as users focusing on aspects easily overlooked by the venue owner, making comparisons …


Latent Factors Meet Homophily In Diffusion Modelling, Duc Minh Luu, Ee-Peng Lim Sep 2015

Research Collection School Of Computing and Information Systems

Diffusion is an important dynamic that helps spread information within an online social network. While there are already numerous models for single-item diffusion, few have studied the diffusion of multiple items, especially when items can interact with one another due to their inter-similarity. Moreover, the well-known homophily effect is rarely considered explicitly in existing diffusion models. This work therefore fills this gap by proposing a novel model called Topic level Interaction Homophily Aware Diffusion (TIHAD) to include both latent factor level interaction among items and the homophily factor in diffusion. The model determines item interaction based on latent factors and …


Towards Opinion Summarization From Online Forums, Ding Ying, Jing Jiang Sep 2015

Research Collection School Of Computing and Information Systems

Summarizing opinions expressed in online forums can potentially benefit many people. However, special characteristics of this problem may require changes to standard text summarization techniques. In this work, we present our initial attempt at extractive summarization of opinionated online forum threads. Given the nature of user generated content in online discussion forums, we hypothesize that besides relevance, text quality and subjectivity also play important roles in deciding which sentences are good summary sentences. We therefore construct an annotated corpus to facilitate our study of extractive summarization of online discussion forums. We define a set of features to capture relevance, text …


A Survey On Artificial Intelligence-Based Modeling Techniques For High Speed Milling Processes, Amin Jahromi Torabi, Meng Joo Er, Xiang Li, Beng Siong Lim, Lianyin Zhai, Richard Jayadi Oentaryo, Gan Oon Peen, Jacek M. Zurada Sep 2015

Research Collection School Of Computing and Information Systems

The process of high-speed milling is regarded as one of the most sophisticated and complicated manufacturing operations. In the past four decades, many investigations have been conducted on this process, aiming to better understand its nature, improve the surface quality of the products, and extend tool life. To achieve these goals, it is necessary to form a general descriptive reference model of the milling process using experimental data, thermomechanical analysis, or statistical or artificial intelligence (AI) models. Moreover, increasing demands for more efficient milling processes, qualified surface finishing, and modeling techniques have propelled the development of more …


Mining Revenue-Maximizing Bundling Configuration, Loc Do, Hady Wirawan Lauw, Ke Wang Sep 2015

Research Collection School Of Computing and Information Systems

With the growing prevalence of social media, there is an increasing amount of user-generated data revealing consumer preferences for various products and services. Businesses seek to harness this wealth of data to improve their marketing strategies. Bundling, or selling two or more items for one price, is a widely practiced marketing strategy. In this paper, we address the bundle configuration problem from a data-driven perspective. Given a set of items in a seller’s inventory, we seek to determine which items should belong to which bundle so as to maximize the total revenue, by mining consumer preference data. We show that this problem …
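To make the revenue objective concrete, the sketch below scores a bundle configuration from consumer willingness-to-pay data and searches all configurations of a small inventory exhaustively. It is a minimal illustration under simplifying assumptions that are ours, not the paper's: valuations are additive, each bundle is priced independently at its revenue-maximizing price, and a consumer buys a bundle whenever its total value to them is at least the price. Exhaustive search over set partitions is only practical for a handful of items, and all names are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Consumer = Dict[str, float]  # item -> willingness to pay (assumed additive)


def bundle_revenue(bundle: Sequence[str], consumers: List[Consumer]) -> Tuple[float, float]:
    """Best single price for `bundle` and the revenue it earns.

    Only the consumers' own bundle valuations need to be tried as prices:
    pricing at the n-th highest valuation sells to at least n consumers.
    """
    values = sorted((sum(c.get(item, 0.0) for item in bundle) for c in consumers),
                    reverse=True)
    best_price, best_revenue = 0.0, 0.0
    for n, value in enumerate(values, start=1):
        if value * n > best_revenue:
            best_price, best_revenue = value, value * n
    return best_price, best_revenue


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every way to split `items` into non-empty, disjoint bundles."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either add `first` to one of the existing bundles...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or sell it in a bundle of its own.
        yield [[first]] + partition


def best_configuration(items: List[str],
                       consumers: List[Consumer]) -> Tuple[List[List[str]], float]:
    """Exhaustively pick the configuration with the highest total revenue."""
    return max(((p, sum(bundle_revenue(b, consumers)[1] for b in p))
                for p in set_partitions(items)),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    consumers = [{"A": 8.0, "B": 2.0}, {"A": 2.0, "B": 8.0}]
    print(best_configuration(["A", "B"], consumers))  # ([['A', 'B']], 20.0)
```

With these two hypothetical consumers, whose tastes are negatively correlated, selling A and B separately earns at most 16, while the bundle earns 20, which is the classic intuition behind bundling.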


Real-Time Targeted Influence Maximization For Online Advertisements, Yuchen Li, Dongxiang Zhang, Kian-Lee Tan Sep 2015

Research Collection School Of Computing and Information Systems

Advertising in social networks has become a multi-billion dollar industry. A main challenge is to identify key influencers who can effectively contribute to the dissemination of information. Although the influence maximization problem, which finds a seed set of the k most influential users based on certain propagation models, has been well studied, it is not target-aware and cannot be directly applied to online advertising. In this paper, we propose a new problem, named Keyword-Based Targeted Influence Maximization (KB-TIM), to find a seed set that maximizes the expected influence over users who are relevant to a given advertisement. To solve the problem, …


Developing Java Programs On Android Mobile Phones Using Speech Recognition, Santhrushna Gande Sep 2015

Electronic Theses, Projects, and Dissertations

Nowadays, Android-based mobile phones and tablets are widely used and have millions of users around the world. The popularity of this operating system is due to its multi-tasking, ease of access, and diverse device options. “Java Programming Speech Recognition Application” is an Android application for individuals with disabilities who are unable to type on a keyboard or have difficulty doing so. This application allows the user to write a computer program (in the Java language) by dictating the words, without using a keyboard. The user needs to speak out the commands and symbols required for his/her program. The …


Storage And Analysis Of Big Data Tools For Sessionized Data, Robert Mcginley, Jason Etter Aug 2015

Mathematics and Computer Science Capstones

The Oracle database currently used to mine data at PEGGY is approaching end-of-life, and a new infrastructure overhaul is required. A critical business requirement has also been identified: the need to load and store very large historical data sets. These data sets contain raw electronic consumer events and interactions from a website, such as page views, clicks, downloads, return visits, length of time spent on pages, and how visitors arrived at the site.

This project focuses on finding a tool to analyze and measure sessionized data, which is a unit of measurement in …


From Sensors To Sense Making: Leveraging Open-Access Scientific Data To Assess Arctic Maritime Risks, Mark A. Stoddard, Melanie Fournier Ph.D, Laurent Etienne Ph.D, Leah Beveridge Ph.D Aug 2015

ShipArc 2015 Conference

No abstract provided.


Web-Based Fragment Library, Junjie Wang, Lyudmila Slipchenko Aug 2015

The Summer Undergraduate Research Fellowship (SURF) Symposium

BioEFP, a new polarized force field for modeling processes in biology, is far superior in accuracy to common classical force fields. One of the main shortcomings of BioEFP is that its parameters are not readily available, so they take a long time to calculate.

Developing an online repository of pre-computed fragment parameters, together with a similarity algorithm, will allow each fragment of a biological macromolecule to be assigned to a pre-defined fragment.

This study incorporates three parts to create the online repository. First, the visual design for the website using the Hypertext Markup Language and the Cascading Style Sheets …


Three Essays On Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing And Text Mining, Euisung Jung Aug 2015

Theses and Dissertations

Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of clinical trial protocols is important because it specifies the necessary conditions that participants have to …


Three Research Essays On The Effects Of Culture Across IT Diffusion Within Social Networks, Organizations, And Hospitals, Yu Zhao Aug 2015

Theses and Dissertations

This dissertation focuses on two research streams: IT diffusion and culture, and each can be examined in various contexts. Specifically, this study investigates IT diffusion through online social network use, knowledge sharing towards the general organizational information systems, and hospital information systems usage. In terms of culture, espoused national cultural values, IT occupational subculture, and organizational cultural variables are examined in the following essays.

Essay 1: Espoused National Cultural Values and Online Social Network Use: Towards an Extension of UTAUT

Prior research has developed a number of models for examining the acceptance and use of technology. This paper extends the unified …


Gibberish, Assistant, Or Master? Using Tweets Linking To News For Extractive Single-Document Summarization, Zhongyu Wei, Wei Gao Aug 2015

Research Collection School Of Computing and Information Systems

Single-document summarization is a challenging task. In this paper, we explore effective ways of using tweets that link to news articles to generate an extractive summary of each document. We reveal the basic value of tweets, which can be exploited by regarding every tweet as a vote for candidate sentences. Based on this finding, we resort to unsupervised summarization models that leverage the linking tweets to master the ranking of candidate extracts via a random walk on a heterogeneous graph. The advantage is that we can use the linking tweets to opportunistically "supervise" the summarization with no need for reference summaries. Furthermore, we …
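The random-walk ranking mentioned above belongs to the PageRank family. The sketch below is a generic damped random walk over a weighted graph, shown on a toy graph in which tweets link to the news sentences they echo; the graph construction, the damping factor of 0.85, and all names are illustrative assumptions, not the heterogeneous graph or model used in the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (source node, target node, weight)


def random_walk_rank(edges: List[Edge], damping: float = 0.85,
                     max_iters: int = 200, tol: float = 1e-12) -> Dict[str, float]:
    """Stationary scores of a damped random walk (PageRank-style power iteration)."""
    nodes = sorted({u for u, _, _ in edges} | {v for _, v, _ in edges})
    n = len(nodes)
    out_weight: Dict[str, float] = defaultdict(float)
    for u, _, w in edges:
        out_weight[u] += w
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(max_iters):
        # Teleport mass, plus mass from nodes with no outgoing edges spread uniformly.
        dangling = sum(scores[node] for node in nodes if out_weight[node] == 0.0)
        new = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for u, v, w in edges:
            new[v] += damping * scores[u] * w / out_weight[u]
        delta = sum(abs(new[node] - scores[node]) for node in nodes)
        scores = new
        if delta < tol:
            break
    return scores


def undirected(pairs: Iterable[Tuple[str, str]], weight: float = 1.0) -> List[Edge]:
    """Turn each undirected link into a pair of directed edges."""
    return [e for a, b in pairs for e in ((a, b, weight), (b, a, weight))]


if __name__ == "__main__":
    # Hypothetical toy graph: two tweets echo sentence s1, one tweet echoes s2,
    # and s1 and s2 are similar to each other.
    graph = undirected([("t1", "s1"), ("t2", "s1"), ("t3", "s2"), ("s1", "s2")])
    scores = random_walk_rank(graph)
    sentences = sorted((node for node in scores if node.startswith("s")),
                       key=scores.get, reverse=True)
    print(sentences)  # ['s1', 's2']
```

Treating each linking tweet as an edge into a sentence is what turns "a tweet is a vote" into ranking mass; the sentence backed by more tweets ends up with the higher stationary score.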


Fusing Heterogeneous Data For Alzheimer's Disease Classification, P. S. Pillai, Tze-Yun Leong Aug 2015

Research Collection School Of Computing and Information Systems

In multi-view learning, multimodal representations of a real-world object or situation are integrated to learn its overall picture. Feature sets from distinct data sources carry different yet complementary information, and analysing them together usually yields better insights and more accurate results. Neurodegenerative disorders such as dementia are characterized by changes in multiple biomarkers. This work combines features from neuroimaging and cerebrospinal fluid studies to distinguish Alzheimer's disease patients from healthy subjects. We apply statistical data fusion techniques to 101 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. We examine whether fusion of biomarkers helps to improve diagnostic …


FaitCrowd: Fine Grained Truth Discovery For Crowdsourced Data Aggregation, Fenglong Ma, Yaliang Li, Qi Li, Minghui Qiu, Jing Gao, Shi Zhi, Lu Su, Bo Zhao, Jiawei Han Aug 2015

Research Collection School Of Computing and Information Systems

In crowdsourced data aggregation tasks, the answers provided by large numbers of sources to the same set of questions often conflict. The most important challenge for this task is to estimate source reliability and select answers that are provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same degree of reliability on all questions, ignoring the fact that sources' reliability may vary significantly across topics. To capture various expertise levels on different topics, we …
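For contrast with FaitCrowd's topic-aware model, the kind of existing approach the abstract describes, which alternates between inferring truths and re-estimating a single reliability per source, can be sketched in a few lines. This is a generic weighted-voting baseline under our own assumptions (categorical answers, reliability measured as smoothed accuracy against the current truths); it is not FaitCrowd, and the names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Claims = Dict[Tuple[str, str], str]  # (source, question) -> answer


def truth_discovery(claims: Claims, iters: int = 20,
                    smoothing: float = 1.0) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Alternate reliability-weighted voting and reliability re-estimation.

    Each source gets ONE reliability score for all questions, which is exactly
    the assumption that topic-aware methods relax.
    """
    sources = sorted({s for s, _ in claims})
    questions = sorted({q for _, q in claims})
    reliability = {s: 1.0 for s in sources}  # first round = plain majority voting
    truths: Dict[str, str] = {}
    for _ in range(iters):
        # Step 1: each question's truth is the answer with the most reliable support.
        votes: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
        for (s, q), answer in claims.items():
            votes[q][answer] += reliability[s]
        truths = {q: max(sorted(votes[q]), key=lambda a: votes[q][a]) for q in questions}
        # Step 2: a source's reliability is its smoothed accuracy against those truths.
        for s in sources:
            answers = [(q, a) for (src, q), a in claims.items() if src == s]
            correct = sum(1 for q, a in answers if truths[q] == a)
            reliability[s] = (correct + smoothing) / (len(answers) + 2 * smoothing)
    return truths, reliability


if __name__ == "__main__":
    claims: Claims = {}
    for i in range(1, 7):
        for s in ("A", "B", "C"):  # A, B and C agree with each other on q1..q6
            claims[(s, f"q{i}")] = "yes"
        for s in ("D", "E", "F"):  # D, E and F each give a different, isolated answer
            claims[(s, f"q{i}")] = f"{s}-guess"
    claims[("A", "q7")] = "yes"
    for s in ("D", "E", "F"):
        claims[(s, "q7")] = "no"
    print(truth_discovery(claims, iters=1)[0]["q7"])  # no  (plain majority)
    print(truth_discovery(claims)[0]["q7"])           # yes (reliability-weighted)
```

On q7, three unreliable sources outvote one reliable source under plain majority voting; once reliabilities are learned from the other questions, the weighted vote flips to the reliable source's answer.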


Semi-Supervised Hashing With Semantic Confidence For Large Scale Visual Search, Yingwei Pan, Ting Yao, Houqiang Li, Chong-Wah Ngo, Tao Mei Aug 2015

Research Collection School Of Computing and Information Systems

Similarity search is one of the fundamental problems for large-scale multimedia applications. Hashing techniques, as one popular strategy, have been intensively investigated owing to their speed and memory efficiency. Recent research has shown that leveraging supervised information can lead to high-quality hashing. However, most existing supervised methods learn hashing functions by treating each training example equally, ignoring the fact that different examples relate to their labels to different semantic degrees, i.e., have different semantic confidence. In this paper, we propose a novel semi-supervised hashing framework that leverages semantic confidence. Specifically, a confidence factor is first assigned to each example by neighbor …


Neural Modeling Of Sequential Inferences And Learning Over Episodic Memory, Budhitama Subagdja, Ah-Hwee Tan Aug 2015

Research Collection School Of Computing and Information Systems

Episodic memory is a significant part of cognition for reasoning and decision making. Retrieval in episodic memory depends on the order relationships of memory items, which provide flexibility in reasoning and inference about sequential relations in the spatio-temporal domain. However, it is still unclear how these relationships are encoded and how they differ from representations in other types of memory, such as semantic or procedural memory. This paper presents a neural model of sequential representation and inference in episodic memory. It contrasts with the common views on sequential representation in neural networks that instead of maintaining transitions between events to represent sequences, they …


Facilitating Image Search With A Scalable And Compact Semantic Mapping, Meng Wang, Weisheng Li, Dong Liu, Bingbing Ni, Jialie Shen, Shuicheng Yan Aug 2015

Research Collection School Of Computing and Information Systems

This paper introduces a novel approach to facilitating image search based on a compact semantic embedding. A novel method is developed to explicitly map concepts and image contents into a unified latent semantic space for the representation of semantic concept prototypes. Then, a linear embedding matrix is learned that maps images into the semantic space, such that each image is closer to its relevant concept prototype than to other prototypes. In our approach, semantic concepts are equated with query keywords, and the images mapped into the vicinity of the corresponding prototype are retrieved. In addition, a computationally efficient method …


Memes As Building Blocks: A Case Study On Evolutionary Optimization + Transfer Learning For Routing Problems, Liang Feng, Yew-Soon Ong, Ah-Hwee Tan, Ivor W. Tsang Aug 2015

Research Collection School Of Computing and Information Systems

A significantly under-explored area of evolutionary optimization in the literature is the study of optimization methodologies that can evolve along with the problems solved. In particular, present evolutionary optimization approaches generally start their search from scratch, or the ground-zero state of knowledge, regardless of how similar the new problem of interest is to those optimized previously. There has thus been an apparent lack of automated knowledge transfer and reuse across problems. Taking this cue, this paper presents a Memetic Computational Paradigm based on Evolutionary Optimization + Transfer Learning for search, one that models how humans solve problems, and embarks on …


Fast Object Retrieval Using Direct Spatial Matching, Zhiyuan Zhong, Jianke Zhu, Steven C. H. Hoi Aug 2015

Research Collection School Of Computing and Information Systems

The conventional bag-of-visual-words (BoW) model is popular for large-scale object retrieval systems but suffers from the critical drawback of ignoring spatial information. RANSAC-based methods attempt to remedy this drawback, but often require traversing all the feature matches for each hypothesis, leading to a heavy computational cost that limits the number of gallery images that can be verified for each online query. We propose an efficient direct spatial matching (DSM) approach that directly estimates the scale variation using region sizes, in which all feature matches vote to estimate the geometric transformation. DSM is much faster than RANSAC-based methods and exhaustive enumeration approaches. …


On Mining Lifestyles From User Trip Data, Meng-Fen Chiang, Ee-Peng Lim Aug 2015

Research Collection School Of Computing and Information Systems

Large cities today are facing major challenges in planning and policy formulation to keep their growth sustainable. In this paper, we aim to gain useful insights about people living in a city by developing novel models to mine user lifestyles represented by the users' activity centers. Two models, namely ACMM and ACHMM, have been developed to learn the activity centers of each user using a large dataset of bus and subway train trips performed by passengers in Singapore. We show that ACHMM and ACMM yield similar accuracies in the location prediction task. We also propose methods to automatically predict "home", "work" …


Event Detection: Exploiting Socio-Physical Interactions In Physical Spaces, Kasthuri Jayarajah, Archan Misra, Xiao-Wen Ruan, Ee-Peng Lim Aug 2015

Research Collection School Of Computing and Information Systems

This paper investigates how digital traces of people's movements and activities in the physical world (e.g., on college campuses and during commutes) may be used to detect local, short-lived events in various urban spaces. Past work that uses occupancy-related features can only identify high-intensity events (those that cause large-scale disruption in visit patterns). In this paper, we first show how longitudinal traces of coordinated, group-based movement episodes obtained from individual-level movement data can be used to create a socio-physical network (with edges representing tie strengths among individuals based on their physical-world movement and collocation behavior). We then investigate …


Tweet Sentiment: From Classification To Quantification, Wei Gao, Fabrizio Sebastiani Aug 2015

Research Collection School Of Computing and Information Systems

Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. "prevalence") …
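The distinction between classifying individual tweets and estimating prevalence can be illustrated with the standard binary baseline from the quantification literature. "Classify and count" simply reports the fraction of tweets predicted positive; the "adjusted classify and count" correction inverts the expected relation cc = tpr * p + fpr * (1 - p), where tpr and fpr are the classifier's true- and false-positive rates, assumed here to be estimated on held-out labelled data. This is a sketch of that generic correction, not necessarily the method the paper proposes, and the function names are hypothetical.

```python
from typing import Sequence


def classify_and_count(predictions: Sequence[str], positive: str = "Positive") -> float:
    """Fraction of items the classifier labelled as `positive`."""
    if not predictions:
        raise ValueError("no predictions")
    return sum(1 for p in predictions if p == positive) / len(predictions)


def adjusted_classify_and_count(predictions: Sequence[str], tpr: float, fpr: float,
                                positive: str = "Positive") -> float:
    """Correct the raw count using the classifier's error rates.

    Solves cc = tpr * p + fpr * (1 - p) for the prevalence p, then clips the
    result to [0, 1] because tpr and fpr are only estimates.
    """
    if tpr <= fpr:
        raise ValueError("the correction needs tpr > fpr")
    cc = classify_and_count(predictions, positive)
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


if __name__ == "__main__":
    # Hypothetical batch: 100 tweets, 20 truly positive, classifier with tpr=0.8, fpr=0.1
    # -> 16 true positives + 8 false positives = 24 predicted positives.
    preds = ["Positive"] * 24 + ["Negative"] * 76
    print(classify_and_count(preds))                               # 0.24
    print(round(adjusted_classify_and_count(preds, 0.8, 0.1), 3))  # 0.2
```

On this batch the raw count overstates positive prevalence (0.24), while the adjusted estimate recovers the assumed true value of 0.2; the correction is only as good as the tpr and fpr estimates it relies on.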


Event Identification And Analysis On Twitter, Qiming Diao Aug 2015

Dissertations and Theses Collection (Open Access)

With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short and instant messages. Because of such wide adoption of Twitter, events such as breaking news and the release of popular videos can easily capture people’s attention and spread rapidly on Twitter. Therefore, the popularity and importance of an event can be approximately gauged by the volume of tweets covering the event. Moreover, the relevant tweets also reflect the public’s opinions and reactions to events. It is therefore very important to identify and analyze the events on Twitter. In this dissertation, …