Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

Articles 2491 - 2520 of 6720

Full-Text Articles in Physical Sciences and Mathematics

Design And Implementation Of A Stand-Alone Tool For Metabolic Simulations, Milad Ghiasi Rad Dec 2017

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

In this thesis, we present the design and implementation of a stand-alone tool for metabolic simulations. The system integrates custom-built SBML models with external user input and produces estimates of any reactants participating in the chain of reactions in the provided model (e.g., ATP, glucose, insulin) over a given duration, using numerical analysis and simulation. The tool accepts food-intake arguments so that the simulations can account for personalized metabolic characteristics. The tool has also been generalized to take temporal genomic information into consideration and to be flexible for simulation …


On Modeling Sense Relatedness In Multi-Prototype Word Embedding, Yixin Cao, Juanzi Li, Jiaxin Shi, Zhiyuan Liu, Chengjiang Li Dec 2017

Research Collection School Of Computing and Information Systems

To enhance the expression ability of distributional word representation learning model, many researchers tend to induce word senses through clustering, and learn multiple embedding vectors for each word, namely multi-prototype word embedding model. However, most related work ignores the relatedness among word senses which actually plays an important role. In this paper, we propose a novel approach to capture word sense relatedness in multi-prototype word embedding model. Particularly, we differentiate the original sense and extended senses of a word by introducing their global occurrence information and model their relatedness through the local textual context information. Based on the idea of …


Leveraging Auxiliary Tasks For Document-Level Cross-Domain Sentiment Classification, Jianfei Yu, Jing Jiang Dec 2017

Research Collection School Of Computing and Information Systems

In this paper, we study domain adaptation with a state-of-the-art hierarchical neural network for document-level sentiment classification. We first design a new auxiliary task based on sentiment scores of domain-independent words. We then propose two neural network architectures to respectively induce document embeddings and sentence embeddings that work well for different domains. When these document and sentence embeddings are used for sentiment classification, we find that with both pseudo and external sentiment lexicons, our proposed methods can perform similarly to or better than several highly competitive domain adaptation methods on a benchmark dataset of product reviews.


Using Teaching Cases For Achieving Bloom’s High-Order Cognitive Levels: An Application In Technically-Oriented Information Systems Course, Kar Way Tan Dec 2017

Research Collection School Of Computing and Information Systems

Case-teaching has been an attractive pedagogical method for bringing real-world examples into the classroom. However, it is challenging to introduce cases that address high-order cognitive skills, such as analyzing and creating new IT solutions, in a technically-oriented computing course. In this research, we present our experience in introducing three types of case studies (Story-Telling case, Design-and-Problem-Solving case, and Create-Design-Implement case) to a course in an undergraduate Information Systems programme. For each case study, we plan and map the learning objectives to address various cognitive levels in the revised Bloom’s Taxonomy. Using surveys conducted over two academic years, we show …


Disease Gene Classification With Metagraph Representations, Sezin Kircali Ata, Yuan Fang, Min Wu, Xiao-Li Li, Xiaokui Xiao Dec 2017

Research Collection School Of Computing and Information Systems

Protein-protein interaction (PPI) networks play an important role in studying the functional roles of proteins, including their association with diseases. However, protein interaction networks are not sufficient without the support of additional biological knowledge for proteins such as their molecular functions and biological processes. To complement and enrich PPI networks, we propose to exploit biological properties of individual proteins. More specifically, we integrate keywords describing protein properties into the PPI network, and construct a novel PPI-Keywords (PPIK) network consisting of both proteins and keywords as two different types of nodes. As disease proteins tend to have similar topological characteristics …


Inferring Social Media Users’ Demographics From Profile Pictures: A Face++ Analysis On Twitter Users, Soon-Gyo Jung, Jisun An, Haewoon Kwak, Joni Salminen, Bernard J. Jansen Dec 2017

Research Collection School Of Computing and Information Systems

In this research, we evaluate the applicability of using facial recognition of social media account profile pictures to infer the demographic attributes of gender, race, and age of the account owners leveraging a commercial and well-known image service, specifically Face++. Our goal is to determine the feasibility of this approach for actual system implementation. Using a dataset of approximately 10,000 Twitter profile pictures, we use Face++ to classify this set of images for gender, race, and age. We determine that about 30% of these profile pictures contain identifiable images of people using the current state-of-the-art automated means. We then employ …


Using Data Analytics For Discovering Library Resource Insights: Case From Singapore Management University, Ning Lu, Rui Song, Dina Li Gwek Heng, Swapna Gottipati, Aaron Tay Dec 2017

Research Collection School Of Computing and Information Systems

Library resources are critical in supporting teaching, research and learning processes. Several universities have employed online platforms and infrastructure to enable online services for students, faculty and staff. To provide efficient services by understanding and predicting user needs, libraries are looking into the area of data analytics. Library analytics at Singapore Management University is a project committed to providing an interface for data-intensive project collaboration, while supporting one of the library’s key pillars: its commitment to collaborate on initiatives with SMU Communities and external groups. In this paper, we study the transaction logs for user behavior analysis that …


Ethics And Bias In Machine Learning: A Technical Study Of What Makes Us “Good”, Ashley Nicole Shadowen Dec 2017

Student Theses

The topic of machine ethics is growing in recognition and energy, but bias in machine learning algorithms outpaces it to date. Bias is a complicated term with good and bad connotations in the field of algorithmic prediction making. Especially in circumstances with legal and ethical consequences, we must study the results of these machines to ensure fairness. This paper attempts to address ethics at the algorithmic level of autonomous machines. There is no one solution to machine bias; it depends on the context of the given system and the most reasonable way to avoid biased decisions while maintaining the …


Utilizing Consumer Health Posts For Pharmacovigilance: Identifying Underlying Factors Associated With Patients’ Attitudes Towards Antidepressants, Maryam Zolnoori Dec 2017

Theses and Dissertations

Non-adherence to antidepressants is a major obstacle to the therapeutic benefits of antidepressants, resulting in increased risk of relapse, emergency visits, and significant burden on individuals and the healthcare system. Several studies showed that non-adherence is weakly associated with personal and clinical variables, but strongly associated with patients’ beliefs and attitudes towards medications. The traditional methods for identifying the key dimensions of patients’ attitudes towards antidepressants are associated with some methodological limitations, such as concern about confidentiality of personal information. In this study, attempts have been made to address the limitations by utilizing patients’ self-reported experiences in online healthcare forums to …


A Novel Density Peak Clustering Algorithm Based On Squared Residual Error, Milan Parmar, Di Wang, Ah-Hwee Tan, Chunyan Miao, Jianhua Jiang, You Zhou Dec 2017

Research Collection School Of Computing and Information Systems

The density peak clustering (DPC) algorithm is designed to quickly identify intricate-shaped clusters with high dimensionality by finding high-density peaks in a non-iterative manner and using only one threshold parameter. However, DPC has certain limitations in processing low-density data points because it only takes the global data density distribution into account. As such, DPC may confine in forming low-density data clusters, or in other words, DPC may fail in detecting anomalies and borderline points. In this paper, we analyze the limitations of DPC and propose a novel density peak clustering algorithm to better handle low-density clustering tasks. Specifically, our algorithm …
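
For readers unfamiliar with the baseline, the two per-point quantities used by the basic DPC method can be sketched as follows. This is a minimal illustration of the standard definitions (local density within a cutoff distance, and distance to the nearest denser point), not the algorithm proposed in this paper; the function name `dpc_scores`, the parameter name `d_c`, and the sample points are assumptions made for the example.

```python
import math


def dpc_scores(points, d_c):
    """Return (rho, delta) lists using the basic DPC definitions.

    rho[i]   = number of other points closer to point i than the cutoff d_c
    delta[i] = distance from point i to the nearest point with higher rho
               (points with no denser neighbour get their largest distance)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical data: a tight group of three, a pair, and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
rho, delta = dpc_scores(points, d_c=0.5)
```

Candidate cluster centres are points where both rho and delta are large. In this sketch the isolated point has rho equal to 0, which illustrates why a purely density-driven score gives little information about low-density points.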


Secure Server-Aided Top-K Monitoring, Yujue Wang, Hwee Hwa Pang, Yanjiang Yang, Xuhua Ding Dec 2017

Research Collection School Of Computing and Information Systems

In a data streaming model, a data owner releases records or documents to a set of users with matching interests, in such a way that the match in interest can be calculated from the correlation between each pair of document and user query. For scalability and availability reasons, this calculation is delegated to third-party servers, which gives rise to the need to protect the integrity and privacy of the documents and user queries. In this paper, we propose a server-aided data stream monitoring scheme (DSM) to address the aforementioned integrity and privacy challenges, so that the users are able to …


BTCI: A New Framework For Identifying Congestion Cascades Using Bus Trajectory Data, Meng-Fen Chiang, Ee Peng Lim, Wang-Chien Lee, Agus Trisnajaya Kwee Dec 2017

Research Collection School Of Computing and Information Systems

Knowledge of traffic health status is essential to the general public and urban traffic management. To identify congestion cascades, an important phenomenon of traffic health, we propose a Bus Trajectory based Congestion Identification (BTCI) framework that explores the anomalous traffic health status and structure properties of congestion cascades using bus trajectory data. BTCI consists of two main steps: congested segment extraction and congestion cascades identification. The former constructs path speed models from historical vehicle transitions and designs a non-parametric Kernel Density Estimation (KDE) function to derive a measure of congestion score. The latter aggregates congested segments (i.e., those with …


Analyzing The E-Learning Video Environment Requirements Of Generation Z Students Using Echo360 Platform, Swapna Gottipati, Venky Shankararaman Dec 2017

Research Collection School Of Computing and Information Systems

As with any other generational cohort, Generation Z students have their own unique characteristics that influence their approach to the learning process. They are the future workforce, and several efforts are being undertaken by government and education institutes to consider the characteristics of Gen-Z in developing the curriculum and teaching environment suitable for these students. E-learning plays a key role in students’ learning process and has been widely adopted by many education institutions. In particular, videos play a major role in the learning process of Gen-Z students. The purpose of this paper is to focus on the requirements of Gen-Z students and to provide suggestions for how to create an e-learning video …


D-Watch: Embracing “Bad” Multipaths For Device-Free Localization With COTS RFID Devices, Ju Wang, Jie Xiong, Hongbo Jiang, Xiaojiang Chen, Dingyi Fang Dec 2017

Research Collection School Of Computing and Information Systems

Device-free localization, which does not require any device attached to the target, is playing a critical role in many applications, such as intrusion detection and elderly monitoring. This paper introduces D-Watch, a device-free system built on top of low-cost commodity-off-the-shelf RFID hardware. Unlike previous works which consider multipaths detrimental, D-Watch leverages the “bad” multipaths to provide a decimeter-level localization accuracy without offline training. D-Watch harnesses the angle-of-arrival information from the RFID tags' backscatter signals. The key intuition is that whenever a target blocks a signal's propagation path, the signal power experiences a drop which can be …


Leveraging The Trade-Off Between Accuracy And Interpretability In A Hybrid Intelligent System, Di Wang, Chai Quek, Ah-Hwee Tan, Chunyan Miao, Geok See Ng, You Zhou Dec 2017

Research Collection School Of Computing and Information Systems

Neural Fuzzy Inference System (NFIS) is a widely adopted paradigm to develop a data-driven learning system. This hybrid system has been widely adopted due to its accurate reasoning procedure and comprehensible inference rules. Although most NFISs primarily focus on accuracy, we have observed an ever increasing demand on improving the interpretability of NFISs and other types of machine learning systems. In this paper, we illustrate how we leverage the trade-off between accuracy and interpretability in an NFIS called Genetic Algorithm and Rough Set Incorporated Neural Fuzzy Inference System (GARSINFIS). In a nutshell, GARSINFIS self-organizes its network structure with a small …


Who Are Your Users? Comparing Media Professionals' Preconception Of Users To Data-Driven Personas, Lene Nielsen, Soon-Gyu Jung, Jisun An, Joni Salminen, Haewoon Kwak, Bernard J. Jansen Dec 2017

Research Collection School Of Computing and Information Systems

One of the reasons for using personas is to align user understandings across project teams and sites. As part of a larger persona study, at Al Jazeera English (AJE), we conducted 16 qualitative interviews with media producers, the end users of persona descriptions. We asked the participants about their understanding of a typical AJE media consumer, and the variety of answers shows that the understandings are not aligned and are built on a mix of own experiences, own self, assumptions, and data given by the company. The answers are sometimes aligned with the data-driven personas and sometimes not. The end …


The Graph Database: Jack Of All Trades Or Just Not SQL?, George F. Hurlburt, Maria R. Lee, George K. Thiruvathukal Nov 2017

Computer Science: Faculty Publications and Other Works

This special issue of IT Professional focuses on the graph database. The graph database, a relatively new phenomenon, is well suited to the burgeoning information era in which we are increasingly becoming immersed. Here, the guest editors briefly explain how a graph database works, its relation to the relational database management system (RDBMS), and its quantitative and qualitative pros and cons, including how graph databases can be harnessed in a hybrid environment. They also survey the excellent articles submitted for this special issue.


NBPMF: Novel Network-Based Inference Methods For Peptide Mass Fingerprinting, Zhewei Liang Nov 2017

Electronic Thesis and Dissertation Repository

Proteins are large, complex molecules that perform a vast array of functions in every living cell. A proteome is a set of proteins produced in an organism, and proteomics is the large-scale study of proteomes. Several high-throughput technologies have been developed in proteomics, where the most commonly applied are mass spectrometry (MS) based approaches. MS is an analytical technique for determining the composition of a sample. Recently it has become a primary tool for protein identification, quantification, and post translational modification (PTM) characterization in proteomics research. There are usually two different ways to identify proteins: top-down and bottom-up. Top-down approaches …


Multi-Step Tokenization Of Automated Clearing House Payment Transactions, Privin Alexander Nov 2017

USF Tampa Graduate Theses and Dissertations

Since its beginnings in 1974, the Automated Clearing House (ACH) network has grown into one of the largest, safest, and most efficient payment systems in the world. An ACH transaction is an electronic funds transfer between bank accounts using a batch processing system.

Currently, the ACH Network moves almost $43 trillion and 25 billion electronic financial transactions each year. With the increasing movement toward an electronic, interconnected and mobile infrastructure, it is critical that electronic payments work safely and efficiently for all users. ACH transactions carry sensitive data, such as a consumer's name, account number, tax identification number, account holder …


A Study On The Practical Use Of Operations Research And Vessels Big Data In Benefit Of Efficient Ports Utilization In Panama, Gabriel Fuentes Lezcano Nov 2017

World Maritime University Dissertations

No abstract provided.


Constructing A Clinical Research Data Management System, Michael C. Quintero Nov 2017

USF Tampa Graduate Theses and Dissertations

Clinical study data is usually collected without knowing what kind of data is going to be collected in advance. In addition, all of the possible data points that can apply to a patient in any given clinical study is almost always a superset of the data points that are actually recorded for a given patient. As a result of this, clinical data resembles a set of sparse data with an evolving data schema. To help researchers at the Moffitt Cancer Center better manage clinical data, a tool was developed called GURU that uses the Entity Attribute Value model to handle …
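
As a rough illustration of why the Entity Attribute Value model suits sparse data with an evolving schema, the sketch below stores each recorded data point as one (entity, attribute, value) row, so a new kind of data point needs no schema change. The table name, attribute names, and sample rows are hypothetical and are not taken from GURU.

```python
import sqlite3

# One narrow table holds every data point as an (entity, attribute, value) row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation (entity TEXT, attribute TEXT, value TEXT)"
)

# Hypothetical sample data: each patient records a different set of attributes.
rows = [
    ("patient-1", "age", "54"),
    ("patient-1", "tumor_stage", "II"),
    ("patient-2", "age", "61"),
    ("patient-2", "smoker", "yes"),
]
conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", rows)


def record_for(entity):
    """Collect the attributes actually recorded for one entity into a dict."""
    cur = conn.execute(
        "SELECT attribute, value FROM observation WHERE entity = ?", (entity,)
    )
    return dict(cur.fetchall())


patient_1 = record_for("patient-1")
```

Attributes that were never recorded for a patient simply have no row, which avoids the wide, mostly empty table that a fixed column-per-attribute layout would produce.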


Uncovering User-Triggered Privacy Leaks In Mobile Applications And Their Utility In Privacy Protection, Joo Keng Joseph Chan Nov 2017

Dissertations and Theses Collection

Mobile applications are increasingly popular, and help mobile users in many aspects of their lifestyle. Applications have access to a wealth of information about the user through powerful developer APIs. It is known that most applications, even popular and highly regarded ones, utilize and leak privacy data to the network. It is also common for applications to over-access privacy data that does not fit the functionality profile of the application. Although there are available privacy detection tools, they might not provide sufficient context to help users better understand the privacy behaviours of their applications. In this dissertation, I present the …


Indexable Bayesian Personalized Ranking For Efficient Top-K Recommendation, Dung D. Le, Hady W. Lauw Nov 2017

Research Collection School Of Computing and Information Systems

Top-k recommendation seeks to deliver a personalized recommendation list of k items to a user. The dual objectives are (1) accuracy in identifying the items a user is likely to prefer, and (2) efficiency in constructing the recommendation list in real time. One direction towards retrieval efficiency is to formulate retrieval as approximate k nearest neighbor (kNN) search aided by indexing schemes, such as locality-sensitive hashing, spatial trees, and inverted index. These schemes, applied on the output representations of recommendation algorithms, speed up the retrieval process by automatically discarding a large number of potentially irrelevant items when given a user …
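
To make the retrieval step concrete, the sketch below shows the exhaustive baseline that an indexing scheme is meant to speed up: score every item against the user vector and keep the k best. Scoring by inner product, the function name `top_k`, and the toy vectors are assumptions for illustration; this is not the indexable model proposed in the paper.

```python
import heapq


def top_k(user_vec, item_vecs, k):
    """Return the ids of the k items with the highest inner-product score."""
    def score(item_id):
        return sum(u * v for u, v in zip(user_vec, item_vecs[item_id]))

    # Exhaustive scan over all items; an index would prune most of them.
    return heapq.nlargest(k, item_vecs, key=score)


# Hypothetical two-dimensional item representations.
items = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
    "d": [-1.0, 0.0],
}
recommended = top_k([1.0, 0.2], items, k=2)
```

The scan costs time proportional to the number of items for every query, which is the cost that approximate nearest-neighbour indexes aim to reduce.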


Color-Sketch Simulator: A Guide For Color-Based Visual Known-Item Search, Jakub Lokoč, Anh Nguyen Phuong, Marta Vomlelová, Chong-Wah Ngo Nov 2017

Research Collection School Of Computing and Information Systems

In order to evaluate the effectiveness of a color-sketch retrieval system for a given multimedia database, tedious evaluations involving real users are required, as users are at the center of query sketch formulation. However, without any prior knowledge about the bottlenecks of the underlying sketch-based retrieval model, the evaluations may focus on wrong settings and thus miss the desired effect. Furthermore, users usually have no clues or recommendations for drawing color-sketches effectively. In this paper, we aim at a preliminary analysis to identify potential bottlenecks of a flexible color-sketch retrieval model. We present a formal framework based on position-color feature …


An Integrated Framework For Modeling And Predicting Spatiotemporal Phenomena In Urban Environments, Tuc Viet Le Nov 2017

Dissertations and Theses Collection (Open Access)

This thesis proposes a general solution framework that integrates methods in machine learning in creative ways to solve a diverse set of problems arising in urban environments. It particularly focuses on modeling spatiotemporal data for the purpose of predicting urban phenomena. Concretely, the framework is applied to solve three specific real-world problems: human mobility prediction, traffic speed prediction and incident prediction. For human mobility prediction, I use visitor trajectories collected at a large theme park in Singapore as a simplified microcosm of an urban area. A trajectory is an ordered sequence of attraction visits and corresponding timestamps produced by a visitor. …


Scalable Online Kernel Learning, Jing Lu Nov 2017

Dissertations and Theses Collection (Open Access)

One critical deficiency of traditional online kernel learning methods is their increasing and unbounded number of support vectors (SV’s), making them inefficient and non-scalable for large-scale applications. Recent studies on budget online learning have attempted to overcome this shortcoming by bounding the number of SV’s. Despite being extensively studied, budget algorithms usually suffer from several drawbacks.
First of all, although existing algorithms attempt to bound the number of SV’s at each iteration, most of them fail to bound the number of SV’s for the final averaged classifier, which is commonly used for online-to-batch conversion. To solve this problem, we propose …


Selective Value Coupling Learning For Detecting Outliers In High-Dimensional Categorical Data, Guansong Pang, Hongzuo Xu, Cao Longbing, Wentao Zhao Nov 2017

Research Collection School Of Computing and Information Systems

This paper introduces a novel framework, namely SelectVC and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on a full data space or feature subspaces that are identified independently from subsequent outlier scoring. As a result, they are significantly challenged by overwhelming irrelevant features in high-dimensional data due to the noise brought by the irrelevant features and its huge search space. In contrast, SelectVC works on a clean and condensed data space spanned by selective …


Unsupervised Topic Hypergraph Hashing For Efficient Mobile Image Retrieval, Lei Zhu, Jialie Shen, Liang Xie, Zhiyong Cheng Nov 2017

Research Collection School Of Computing and Information Systems

Hashing compresses high-dimensional features into compact binary codes. It is one of the promising techniques to support efficient mobile image retrieval, due to its low data transmission cost and fast retrieval response. However, most existing hashing strategies simply rely on low-level features. Thus, they may generate hashing codes with limited discriminative capability. Moreover, many of them fail to exploit complex and high-order semantic correlations that inherently exist among images. Motivated by these observations, we propose a novel unsupervised hashing scheme, called topic hypergraph hashing (THH), to address the limitations. THH effectively mitigates the semantic shortage of hashing codes by …
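
The general idea of compressing a feature vector into a binary code and comparing codes by Hamming distance can be sketched as below. The sketch uses random-hyperplane (sign of random projection) hashing, a generic technique swapped in purely for illustration; it is not THH, and the names `make_hasher` and `hamming` as well as the sample vector are assumptions.

```python
import random


def make_hasher(dim, n_bits, seed=0):
    """Build a function mapping a dim-dimensional vector to an n_bits code."""
    rng = random.Random(seed)
    # One random hyperplane per output bit.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def to_code(vec):
        code = 0
        for plane in planes:
            dot = sum(w * x for w, x in zip(plane, vec))
            # Each bit records which side of the hyperplane the vector lies on.
            code = (code << 1) | (1 if dot > 0 else 0)
        return code

    return to_code


def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


to_code = make_hasher(dim=4, n_bits=16)
x = [0.5, -1.2, 3.0, 0.7]
code_x = to_code(x)
```

A 16-bit code is far cheaper to transmit and compare than the original floating-point vector, which is the property the abstract refers to as low data transmission cost and fast retrieval response.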


Predicting Indoor Crowd Density Using Column-Structured Deep Neural Network, Akihito Sudo, Teck Hou (Deng Dehao) Teng, Hoong Chuin Lau, Yoshihide Sekimoto Nov 2017

Research Collection School Of Computing and Information Systems

This work proposes a deep neural network approach known as the column-structured deep neural network (COL-DNN-R) for predicting crowd density in an indoor environment using historical Wi-Fi traces of individual visitors. With a structure designed to minimize feature engineering, COL-DNN accepts raw features such as crowd density, opening and closing hours and peak visitor counts for extracting features. The extracted features are used by a regression model R for predicting the crowd densities. Standard regression models such as MLP, RF and SVM can be used as R. Experiments are performed to investigate the effect of feature representation and model structure …


SourceVote: Fusing Multi-Valued Data Via Inter-Source Agreements, Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Mahmoud Barhamgi, Lina Yao, Anne H.H. Ngu Nov 2017

Research Collection School Of Computing and Information Systems

Data fusion is a fundamental research problem of identifying true values of data items of interest from conflicting multi-sourced data. Although considerable research efforts have been conducted on this topic, existing approaches generally assume every data item has exactly one true value, which fails to reflect the real world where data items with multiple true values widely exist. In this paper, we propose a novel approach, SourceVote, to estimate value veracity for multi-valued data items. SourceVote models the endorsement relations among sources by quantifying their two-sided inter-source agreements. In particular, two graphs are constructed to model inter-source relations. Then two aspects of source reliability are derived from these graphs and …