Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

Articles 4921 - 4950 of 6716

Full-Text Articles in Physical Sciences and Mathematics

Faceted Topic Retrieval Of News Video Using Joint Topic Modeling Of Visual Features And Speech Transcripts, Kong-Wah Wan, Ah-Hwee Tan, Joo-Hwee Lim, Liang-Tien Chia Jul 2010

Research Collection School Of Computing and Information Systems

Because of the inherent ambiguity in user queries, an important task of modern retrieval systems is faceted topic retrieval (FTR), which relates to the goal of returning diverse or novel information elucidating the wide range of topics or facets of the query need. We introduce a generative model for hypothesizing facets in the (news) video domain by combining the complementary information in the visual keyframes and the speech transcripts. We evaluate the efficacy of our multimodal model on the standard TRECVID-2005 video corpus annotated with facets. We find that: (1) the joint modeling of the visual and text (speech transcripts) …


Effective Music Tagging Through Advanced Statistical Modeling, Jialie Shen, Meng Wang, Shuicheng Yan, Hwee Hwa Pang, Xian-Sheng Hua Jul 2010

Research Collection School Of Computing and Information Systems

Music information retrieval (MIR) holds great promise as a technology for managing large music archives. One of the key components of MIR that has been actively researched is music tagging. While significant progress has been achieved, most of the existing systems still adopt a simple classification approach, and apply machine learning classifiers directly on low-level acoustic features. Consequently, they suffer from the shortcomings of (1) poor accuracy, (2) lack of comprehensive evaluation results and the associated analysis based on large-scale datasets, and (3) incomplete content representation, arising from the lack of multimodal and temporal information integration. In this …


A Heuristic Algorithm For Trust-Oriented Service Provider Selection In Complex Social Networks, Guanfeng Liu, Yan Wang, Mehmet A. Orgun, Ee Peng Lim Jul 2010

Research Collection School Of Computing and Information Systems

In a service-oriented online social network consisting of service providers and consumers, a service consumer can search for trustworthy service providers via the social network. This requires the evaluation of the trustworthiness of a service provider along a certain social trust path from the service consumer to the service provider. However, there are usually many social trust paths between participants in social networks. Thus, a challenging problem is determining which social trust path is the optimal one that can yield the most trustworthy evaluation result. In this paper, we first present a novel complex social network structure and a new concept, Quality …


Generating Templates Of Entity Summaries With An Entity-Aspect Model And Pattern Mining, Peng Li, Jing Jiang, Yinglin Wang Jul 2010

Research Collection School Of Computing and Information Systems

In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. Such summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that represent the aspects well. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We …


Extracting Common Emotions From Blogs Based On Fine-Grained Sentiment Clustering, Shi Feng, Daling Wang, Ge Yu, Wei Gao, Kam-Fai Wong Jul 2010

Research Collection School Of Computing and Information Systems

Recently, blogs have emerged as the major platform for people to express their feelings and sentiments in the age of Web 2.0. The common emotions, which reflect people’s collective and overall sentiments, are becoming a major concern for governments, business companies and individual users. Unlike previous literature on sentiment classification and summarization, the major issue of common emotion extraction is to find out people’s collective sentiments and their corresponding distributions on the Web. Most existing blog clustering methods take into account keywords, stories or timelines but neglect the embedded sentiments, which are considered very important features of blogs. In …


Semantics-Preserving Bag-Of-Words Models And Applications, Lei Wu, Steven C. H. Hoi, Nenghai Yu Jul 2010

Research Collection School Of Computing and Information Systems

The Bag-of-Words (BoW) model is a promising image representation technique for image categorization and annotation tasks. One critical limitation of existing BoW models is that much semantic information is lost during the codebook generation process, an important step of BoW. This is because the codebook is often obtained simply by clustering visual features in Euclidean space. However, visual features related to the same semantics may not form clusters in Euclidean space, which is primarily due to the semantic gap between low-level features and high-level semantics. In this paper, we propose a …


Non-Parametric Kernel Ranking Approach For Social Image Retrieval, Jinfeng Zhuang, Steven C. H. Hoi Jul 2010

Research Collection School Of Computing and Information Systems

Social image retrieval has become an emerging research challenge in web rich media search. In this paper, we address the research problem of text-based social image retrieval, which aims to identify and return a set of relevant social images that are related to a text-based query from a corpus of social images. Regular approaches for social image retrieval simply adopt typical text-based image retrieval techniques to search for the relevant social images based on the associated tags, which may suffer from noisy tags. In this paper, we present a novel framework for social image re-ranking based on a non-parametric kernel …


Evaluation Of Protein Backbone Alphabets: Using Predicted Local Structure For Fold Recognition, Kyong Jin Shim Jul 2010

Research Collection School Of Computing and Information Systems

Optimally combining available information is one of the key challenges in knowledge-driven prediction techniques. In this study, we evaluate six Phi and Psi-based backbone alphabets. We show that the addition of predicted backbone conformations to SVM classifiers can improve fold recognition. Our experimental results show that the inclusion of predicted backbone conformations in our feature representation leads to higher overall accuracy compared to when using amino acid residues alone.


Show Me The Numbers: Visual Analytics For Insights, Tin Seong Kam Jul 2010

Research Collection School Of Computing and Information Systems

In this highly volatile and fast-paced financial market, traders and managers working in banking and financial organizations must struggle to cope with large and complex data from multiple sources that move throughout the market at increasingly high speed. The cost of making poor business and investment decisions is very high. This places great demands on data analysts, who are responsible for providing processed information to support the activities of traders and managers. Static reports and traditional business intelligence tools simply cannot keep up with a market that is changing on a second-to-second basis. By the time the traders and bankers have …


Can The Presence Of Online Word Of Mouth Increase Product Sales?, Alanah Mitchell, Deepak Khazanchi Jul 2010

Information Systems and Quantitative Analysis Faculty Publications

The power and potential impact of online word of mouth has increased substantially. Consumers have come to accept and rely upon online word of mouth, so it is important to understand how it works and what kind of impact it has on online product sales. This article provides an assessment of this question through an analysis of sales and online word of mouth data from a multi-product e-commerce retail firm.


How To Make Linked Data More Than Data, Prateek Jain, Amit P. Sheth, Kunal Verma, Pascal Hitzler, Peter Z. Yeh Jun 2010

Kno.e.sis Publications

The LOD cloud has a potential for applicability in many AI-related tasks, such as open domain question answering, knowledge discovery, and the Semantic Web. An important prerequisite before the LOD cloud can enable these goals is allowing its users (and applications) to effectively pose queries to and retrieve answers from it. However, this prerequisite is still an open problem for the LOD cloud and has restricted it to 'merely more data.' To transform the LOD cloud from 'merely more data' to 'semantically linked data' there are plenty of open issues which should be addressed. We believe this transformation of the …


Semantically Annotated Restful Services For Large-Scale Metabolomics Data Analysis, Ashwin Manjunatha, Paul E. Anderson, Satya S. Sahoo, Ajith H. Ranabahu, Michael L. Raymer, Amit P. Sheth Jun 2010

Kno.e.sis Publications

No abstract provided.


Active Collaboration Learning Environments: The Class Of Web 2.0, Dirk Hovorka, Michael J. Rees Jun 2010

Michael J Rees

The maturity and increased integration of online collaboration, networking, and research tools offer Information Systems faculty opportunities to provide unique learning environments at multiple levels. A growing ensemble of Web 2.0 technologies provide the background to introduce and explore fundamental aspects of information system development, design, application, and use, while simultaneously providing a functional suite of tools which will aid students in other aspects of their university learning. A selection of these technologies and case studies of their classroom usage is discussed. In addition, an agenda for research in both pedagogy and in information systems phenomena is outlined.


Measurement And Interpolation Of Sea Surface Temperature And Salinity In The Tropical Pacific: A 9,000 Nautical Mile Research Odyssey, Amber Brooks Jun 2010

Earth and Soil Sciences

The purpose of this project was to compare spline and inverse distance weighting interpolation tools on data collected in the tropical Pacific Ocean by ship and data from a global network of CTD floats, known as Argo floats (fig.1), to provide evidence that technological advancement and integration is aiding our understanding of the ocean-atmosphere system of planet Earth. Thirty-one sea surface temperature and salinity samples were manually taken across a 9,000 nautical mile trek of the Pacific Ocean for the months of April, May and June 2008. Argo ASCII globally gridded monthly averaged sea surface temperature and salinity data, from …
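One of the two interpolation tools named in this abstract, inverse distance weighting (IDW), can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's workflow: the function name `idw`, the `power` parameter, and the use of planar Euclidean distance are all hypothetical choices (real ocean data would more likely call for great-circle distance).

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, measured value)

def idw(samples: Sequence[Sample], query: Tuple[float, float], power: float = 2.0) -> float:
    """Estimate the value at `query` as a distance-weighted mean of the samples.

    Assumption: planar Euclidean distance; weights are 1 / distance**power.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            # The query coincides with a sample point: return the measurement itself.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

A query point midway between two samples receives equal weights, so the estimate is their plain average; larger values of `power` make the nearest samples dominate.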


Customer Communicator, Eddie Tavarez Jun 2010

Computer Science and Software Engineering

No abstract provided.


Employee Time Scheduling, Mark Peter Smith Jun 2010

Computer Science and Software Engineering

Small business managers face the common problem of employee time scheduling. There is a solution to this problem in the form of an application called Lemming Scheduler. Lemming Scheduler is a Java-based employee time scheduling program. Its features include a desktop-based application that stores employee and business information as well as a web interface for employees to view schedules and update availability. The desktop application uses employee and shift information to automatically generate schedules. The generated schedules are viewable by employees outside of work by way of the web interface. Lemming Scheduler provides a lightweight interface for …


Re-Solving Stochastic Programming Models For Airline Revenue Management, Lijian Chen, Tito Homem-De-Mello Jun 2010

MIS/OM/DS Faculty Publications

We study some mathematical programming formulations for the origin-destination model in airline revenue management. In particular, we focus on the traditional probabilistic model proposed in the literature. The approach we study consists of solving a sequence of two-stage stochastic programs with simple recourse, which can be viewed as an approximation to a multi-stage stochastic programming formulation to the seat allocation problem. Our theoretical results show that the proposed approximation is robust, in the sense that solving more successive two-stage programs can never worsen the expected revenue obtained with the corresponding allocation policy. Although intuitive, such a property is known not …


Janus: From Workflows To Semantic Provenance And Linked Open Data, Paolo Missier, Satya S. Sahoo, Jun Zhao, Carole Goble, Amit P. Sheth Jun 2010

Kno.e.sis Publications

Data provenance graphs are a form of metadata that can be used to establish a variety of properties of data products that undergo sequences of transformations, typically specified as workflows. Their usefulness for answering user provenance queries is limited, however, unless the graphs are enhanced with domain-specific annotations. In this paper we propose a model and architecture for semantic, domain-aware provenance, and demonstrate its usefulness in answering typical user queries. Furthermore, we discuss the additional benefits and the technical implications of publishing provenance graphs as a form of Linked Data. A prototype implementation of the model is available for data produced …


Provenance Management In Parasite Research, Vinh Nguyen, Priti Parikh, Satya S. Sahoo, Amit P. Sheth Jun 2010

Kno.e.sis Publications

The objective of this research is to create a semantic problem solving environment (PSE) for the human parasite Trypanosoma cruzi. As part of the PSE, we are trying to manage the provenance of the experiment data as it is generated. This requires capturing the provenance, which is often collected through web forms used by biologists to input information about the experiments they conduct. We have created the Parasite Experiment Ontology (PEO), which represents provenance information used in the project. We have modified the back end which processes the data gathered from biologists, generates RDF triples and serializes them into the triple …


Do Wikipedians Follow Domain Experts? A Domain-Specific Study On Wikipedia Contribution, Yi Zhang, Aixin Sun, Anwitaman Datta, Kuiyu Chang, Ee Peng Lim Jun 2010

Research Collection School Of Computing and Information Systems

Wikipedia is one of the most successful online knowledge bases, attracting millions of visits daily. Not surprisingly, its huge success has in turn led to immense research interest for a better understanding of the collaborative knowledge building process. In this paper, we performed a (terrorism) domain-specific case study, comparing and contrasting the knowledge evolution in Wikipedia with a knowledge base created by domain experts. Specifically, we used the Terrorism Knowledge Base (TKB) developed by experts at MIPT. We identified 409 Wikipedia articles matching TKB records, and went ahead to study them from three aspects: creation, revision, and link evolution. We …


Z-Sky: An Efficient Skyline Query Processing Framework Based On Z-Order, Ken C. K. Lee, Wang-Chien Lee, Baihua Zheng, Huajing Li, Yuan Tian Jun 2010

Research Collection School Of Computing and Information Systems

Given a set of data points in a multidimensional space, a skyline query retrieves those data points that are not dominated by any other point in the same dataset. Observing that the properties of Z-order space filling curves (or Z-order curves) perfectly match with the dominance relationships among data points in a geometrical data space, we, in this paper, develop and present a novel and efficient processing framework to evaluate skyline queries and their variants, and to support skyline result updates based on Z-order curves. This framework consists of ZBtree, i.e., an index structure to organize a source dataset and …
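The dominance relationship and its link to Z-order mentioned in this abstract can be illustrated with a naive sketch. It is not the ZBtree-based framework from the paper: the functions `dominates`, `skyline`, and `z_address` are hypothetical names, the skyline is computed by quadratic pairwise comparison, smaller values are assumed to be preferred in every dimension, and coordinates are assumed to be small non-negative integers.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]

def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """a dominates b: no worse in every dimension, strictly better in at least one.

    Assumption: smaller values are better.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def z_address(p: Sequence[int], bits: int = 8) -> int:
    """Interleave the low `bits` bits of each coordinate into one Z-order (Morton) address."""
    z = 0
    dims = len(p)
    for bit in range(bits):
        for d, value in enumerate(p):
            z |= ((value >> bit) & 1) << (bit * dims + d)
    return z
```

Because bit interleaving is monotone in each coordinate, a point that dominates another always has a smaller Z-order address, so scanning points in Z-order visits dominating points before the points they dominate.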


Visualizing And Exploring Evolving Information Networks In Wikipedia, Ee Peng Lim, Agus Trisnajaya Kwee, Nelman Lubis Ibrahim, Aixin Sun, Anwitaman Datta, Kuiyu Chang, Maureen Maureen Jun 2010

Research Collection School Of Computing and Information Systems

Information networks in Wikipedia evolve as users collaboratively edit articles that embed the networks. These information networks represent both the structure and content of the community’s knowledge, and the networks evolve as the knowledge gets updated. By observing the networks evolve and finding their evolving patterns, one can gain higher order knowledge about the networks and conduct longitudinal network analysis to detect events and summarize trends. In this paper, we present SSNetViz+, a visual analytic tool to support visualization and exploration of Wikipedia’s information networks. SSNetViz+ supports time-based network browsing, content browsing and search. Using a terrorism information network as an …


Stevent: Spatio-Temporal Event Model For Social Network Discovery, Hady W. Lauw, Ee Peng Lim, Hwee Hwa Pang, Teck-Tim Tan Jun 2010

Research Collection School Of Computing and Information Systems

Spatio-temporal data concerning the movement of individuals over space and time contains latent information on the associations among these individuals. Sources of spatio-temporal data include usage logs of mobile and Internet technologies. This article defines a spatio-temporal event by the co-occurrences among individuals that indicate potential associations among them. Each spatio-temporal event is assigned a weight based on the precision and uniqueness of the event. By aggregating the weights of events relating two individuals, we can determine the strength of association between them. We conduct extensive experimentation to investigate both the efficacy of the proposed model as well as the …


Efficient Processing Of Exact Top-K Queries Over Disk-Resident Sorted Lists, Hwee Hwa Pang, Xuhua Ding, Baihua Zheng Jun 2010

Research Collection School Of Computing and Information Systems

The top-k query is employed in a wide range of applications to generate a ranked list of data that have the highest aggregate scores over certain attributes. As the pool of attributes for selection by individual queries may be large, the data are indexed with per-attribute sorted lists, and a threshold algorithm (TA) is applied on the lists involved in each query. The TA executes in two phases--find a cut-off threshold for the top-k result scores, then evaluate all the records that could score above the threshold. In this paper, we focus on exact top-k queries that involve monotonic linear …
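The idea of running a threshold algorithm over per-attribute sorted lists can be sketched as follows. This is the classic in-memory form of the threshold algorithm, offered only as an illustration and not as the disk-resident method the paper proposes; the function name `threshold_topk`, the list layout, and the assumptions of non-negative scores and non-negative weights (so the linear aggregate is monotonic) are all hypothetical.

```python
import heapq
from typing import List, Tuple

def threshold_topk(
    lists: List[List[Tuple[str, float]]],
    weights: List[float],
    k: int,
) -> List[Tuple[float, str]]:
    """Return the k records with the highest weighted-sum score.

    lists[i] holds (record_id, score) pairs sorted by descending score on attribute i.
    Assumption: scores and weights are non-negative; a record missing from a list scores 0 there.
    """
    lookup = [dict(lst) for lst in lists]   # random access by record id
    seen = set()
    top: List[Tuple[float, str]] = []       # min-heap of the best k (score, id) pairs so far
    depth = max(len(lst) for lst in lists)
    for pos in range(depth):
        for lst in lists:
            if pos >= len(lst):
                continue
            record_id = lst[pos][0]
            if record_id in seen:
                continue
            seen.add(record_id)
            score = sum(w * lk.get(record_id, 0.0) for w, lk in zip(weights, lookup))
            if len(top) < k:
                heapq.heappush(top, (score, record_id))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, record_id))
        # Upper bound on the score of any record not yet seen in any list.
        threshold = sum(
            w * (lst[pos][1] if pos < len(lst) else 0.0)
            for w, lst in zip(weights, lists)
        )
        if len(top) == k and top[0][0] >= threshold:
            break  # no unseen record can enter the top k
    return sorted(top, reverse=True)
```

The scan stops as soon as the k-th best score found so far is at least the threshold, which is why the lower portions of the sorted lists often never need to be read.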


Information Hiding Using Stochastic Diffusion For The Covert Transmission Of Encrypted Images, Jonathan Blackledge Jun 2010

Conference papers

A principal weakness of all encryption systems is that the output data can be 'seen' to be encrypted. In other words, encrypted data provides a 'flag' on the potential value of the information that has been encrypted. In this paper, we provide a novel approach to 'hiding' encrypted data in a digital image. We consider an approach in which a plaintext image is encrypted with a cipher using the processes of 'stochastic diffusion' and the output quantized into a 1-bit array generating a binary image cipher-text. This output is then 'embedded' in a host image which is undertaken either in …


A Social Transitivity-Based Data Dissemination Scheme For Opportunistic Networks, Jaesung Ku, Yangwoo Ko, Jisun An, Dongman Lee Jun 2010

Research Collection School Of Computing and Information Systems

A social-based routing protocol for opportunistic networks considers the direct delivery as forwarding metrics. By ignoring the indirect delivery through intermediate nodes, it misses chances to find paths that are better in terms of delivery ratio and time. To overcome this limitation, we propose to incorporate transitivity, which considers the indirect delivery through intermediate nodes, as one of the forwarding metrics. We also found that some message forwards do not improve the delivery performance. To reduce the number of these useless forwards, the proposed scheme forwards messages to an encountered node when the increase of total utility value is greater …


Using Hadoop And Cassandra For Taxi Data Analytics: A Feasibility Study, Alvin Jun Yong Koh, Xuan Khoa Nguyen, C. Jason Woodard Jun 2010

Research Collection School Of Computing and Information Systems

This paper reports on a preliminary study to assess the feasibility of using the Open Cirrus Cloud Computing Research testbed to provide offline and online analytical support for taxi fleet operations. In the study, we benchmarked the performance gains from distributing the offline analysis of GPS location traces over multiple virtual machines using the Apache Hadoop implementation of the MapReduce paradigm. We also explored the use of the Apache Cassandra distributed database system for online retrieval of vehicle trace data. While configuring the testbed infrastructure was straightforward, we encountered severe I/O bottlenecks in running the benchmarks due to the lack …


Semantic Context Modeling With Maximal Margin Conditional Random Fields For Automatic Image Annotation, Yu Xiang, Xiangdong Zhou, Zuotao Liu, Tat-Seng Chua, Chong-Wah Ngo Jun 2010

Research Collection School Of Computing and Information Systems

Context modeling for Vision Recognition and Automatic Image Annotation (AIA) has attracted increasing attention in recent years. Among the various kinds of contextual information and resources, semantic context has been exploited in AIA and has brought promising results. However, previous works either cast the problem into structural classification or adopted multi-layer modeling, which suffer from the problems of scalability or model efficiency. In this paper, we propose a novel discriminative Conditional Random Field (CRF) model for semantic context modeling in AIA, which is built over semantic concepts and treats an image as a whole observation without segmentation. Our model captures the interactions between semantic …


Satrap: Data And Network Heterogeneity Aware P2p Data-Mining, Hock Kee Ang, Vivekanand Gopalkrishnan, Anwitaman Datta, Wee Keong Ng, Steven C. H. Hoi Jun 2010

Research Collection School Of Computing and Information Systems

Distributed classification aims to build an accurate classifier by learning from distributed data while reducing computation and communication cost. A P2P network where numerous users come together to share resources like data content, bandwidth, storage space and CPU resources is an excellent platform for distributed classification. However, two important aspects of the learning environment have often been overlooked by other works, viz., 1) location of the peers, which results in variable communication cost, and 2) heterogeneity of the peers' data, which can help reduce redundant communication. In this paper, we examine the properties of network and data heterogeneity and propose …


Prediction Of Protein Subcellular Localization: A Machine Learning Approach, Kyong Jin Shim Jun 2010

Research Collection School Of Computing and Information Systems

Subcellular localization is a key functional characteristic of proteins. Optimally combining available information is one of the key challenges in today's knowledge-based subcellular localization prediction approaches. This study explores machine learning approaches for the prediction of protein subcellular localization that use resources concerning Gene Ontology and secondary structures. Using the spectrum kernel for feature representation of amino acid sequences and secondary structures, we explore an SVM-based learning method that classifies six subcellular localization sites: endoplasmic reticulum, extracellular, Golgi, membrane, mitochondria, and nucleus.
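The spectrum kernel used here for feature representation can be illustrated with a minimal sketch: each sequence is mapped to counts of its length-k substrings, and the kernel value is the inner product of two such count vectors. The function names `spectrum` and `spectrum_kernel` and the default `k=3` are assumptions for illustration, not details taken from the study.

```python
from collections import Counter

def spectrum(sequence: str, k: int) -> Counter:
    """Count every contiguous length-k substring (k-mer) of the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    counts_a = spectrum(a, k)
    counts_b = spectrum(b, k)
    # A Counter returns 0 for k-mers it has not seen, so unshared k-mers contribute nothing.
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())
```

A matrix of such kernel values over all sequence pairs could then be passed to an SVM that accepts precomputed kernels; the same function applies unchanged to secondary-structure strings.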