Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

Articles 4141 - 4170 of 6727

Full-Text Articles in Physical Sciences and Mathematics

Hunts: A Trajectory Recommendation System For Effective And Efficient Hunting Of Taxi Passengers, Ye Ding, Siyuan Liu, Jiansu Pu, Lionel Ni Jun 2013

Research Collection School Of Computing and Information Systems

Nowadays, many taxis traverse the city searching for available passengers, but their hunts for passengers are not always efficient. Owing to the dynamics of traffic and biased passenger distributions, current offline recommendations based on places of interest may not work well. In this paper, we define a new problem, global-optimal trajectory retrieving (GOTR), as finding a connected trajectory of high profit and high probability to pick up a passenger within a given time period in real time. To tackle this challenging problem, we present a system, called HUNTS, based on the knowledge from both historical and online GPS data …


Based On Repeated Experience, System For Modification Of Expression And Negating Overload From Media And Optimizing Referential Efficiency, Peter R. Badovinatz, Veronika M. Megler Jun 2013

Computer Science Faculty Publications and Presentations

Content items are revealed to a user based on whether they have been previously reviewed by the user. A number of content items are thus received over time. The content items may be discrete content items, or may be portions of a content stream, and may be received over different media. For each content item, it is determined whether the content item was previously reviewed by a user. Where the content item was not previously reviewed, the item is revealed to the user, such as by being displayed or announced to the user. Where the content item was previously reviewed, …


A Latent Variable Model For Viewpoint Discovery From Threaded Forum Posts, Minghui Qiu, Jing Jiang Jun 2013

Research Collection School Of Computing and Information Systems

Threaded discussion forums provide an important social media platform. Their rich user-generated content has served as an important source of public feedback. Automatically discovering the viewpoints or stances on hot issues from forum threads is an important and useful task. In this paper, we propose a novel latent variable model for viewpoint discovery from threaded forum posts. Our model is a principled generative latent variable model which captures three important factors: viewpoint-specific topic preference, user identity, and user interactions. Evaluation results show that our model clearly outperforms a number of baseline models in terms of both clustering …


Mining User Relations From Online Discussions Using Sentiment Analysis And Probabilistic Matrix Factorization, Minghui Qiu, Liu Yang, Jing Jiang Jun 2013

Research Collection School Of Computing and Information Systems

Advances in sentiment analysis have enabled extraction of user relations implied in online textual exchanges such as forum posts. However, recent studies in this direction only consider direct relation extraction from text. As user interactions can be sparse in online discussions, we propose to apply collaborative filtering through probabilistic matrix factorization to generalize and improve the opinion matrices extracted from forum posts. Experiments with two tasks show that the learned latent factor representation can give good performance on a relation polarity prediction task and improve the performance of a subgroup detection task.
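
The idea of generalizing a sparse opinion matrix through probabilistic matrix factorization can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`dot`, `pmf`), the hyperparameters, and the four-user toy data are all assumptions made for illustration. It fits user latent factors by stochastic gradient descent on the observed cells only, with L2 regularization (the MAP form of PMF under Gaussian priors), then scores a pair of users whose relation was never observed.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def pmf(observed, n_users, n_factors=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit latent factors U, V so that dot(U[i], V[j]) approximates observed[(i, j)].

    observed: dict mapping (source_user, target_user) -> opinion score.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), score in cells:
            err = score - dot(U[i], V[j])
            for k in range(n_factors):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error plus L2 penalty.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


# Hypothetical toy data: users 0 and 1 form one camp, users 2 and 3 the other.
# +1 means a positive opinion, -1 a negative one; the pair (1, 2) is unobserved.
camp = [0, 0, 1, 1]
observed = {
    (i, j): (1.0 if camp[i] == camp[j] else -1.0)
    for i in range(4)
    for j in range(4)
    if i != j and (i, j) not in {(1, 2), (2, 1)}
}

U, V = pmf(observed, n_users=4)
# Score for the unobserved pair; users 1 and 2 sit in opposing camps,
# so a negative polarity would be the plausible prediction.
held_out = dot(U[1], V[2])
```

The sign of `held_out` plays the role of a relation polarity prediction for a pair with no direct exchange, and the rows of `U` could serve as input to a clustering step for subgroup detection.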


Anomaly Detection On Social Data, Hanbo Dai Jun 2013

Dissertations and Theses Collection (Open Access)

The advent of online social media including Facebook, Twitter, Flickr and YouTube has drawn massive attention in recent years. These online platforms generate massive data capturing the behavior of multiple types of human actors as they interact with one another and with resources such as pictures, books and videos. Unfortunately, the openness of these platforms often leaves them highly susceptible to abuse by suspicious entities such as spammers. It therefore becomes increasingly important to automatically identify these suspicious entities and eliminate their threats. We call these suspicious entities anomalies in social data, as they often hold different agendas compared to …


An Analysis Of Generational Caching Implemented In A Production Website, Marc E. Zych Jun 2013

Master's Theses

Website scaling has been an issue since the inception of the web. The demand for user generated content and personalized web pages requires the use of a database for a storage engine. Unfortunately, scaling the database to handle large amounts of traffic is still a problem many companies face. One such company is iFixit, a provider of free, publicly-editable, online repair manuals. Like many websites, iFixit uses Memcached to decrease database load and improve response time. However, the caching strategy used is a very ad hoc one and therefore can be greatly improved.

Most research regarding web application caching focuses …


A Direct Mining Approach To Efficient Constrained Graph Pattern Discovery, Feida Zhu, Zequn Zhang, Qiang Qu Jun 2013

Research Collection School Of Computing and Information Systems

Despite the wealth of research on frequent graph pattern mining, how to efficiently mine the complete set of those with constraints still poses a huge challenge to the existing algorithms mainly due to the inherent bottleneck in the mining paradigm. In essence, mining requests with explicitly-specified constraints cannot be handled in a way that is direct and precise. In this paper, we propose a direct mining framework to solve the problem and illustrate our ideas in the context of a particular type of constrained frequent patterns — the “skinny” patterns, which are graph patterns with a long backbone from which …


Real Time Event Detection In Twitter, Xun Wang, Feida Zhu, Jing Jiang, Sujian Li Jun 2013

Research Collection School Of Computing and Information Systems

Event detection has been an important task for a long time. When it comes to Twitter, new problems are presented. Twitter data is a huge temporal data flow with much noise and various kinds of topics. Traditional sophisticated methods with a high computational complexity aren’t designed to handle such data flow efficiently. In this paper, we propose a mixture Gaussian model for bursty word extraction in Twitter and then employ a novel time-dependent HDP model for new topic detection. Our model can grasp new events, the location and the time an event becomes bursty promptly and accurately. Experiments show the …


A Latent Variable Model For Viewpoint Discovery From Threaded Forum Posts, Minghui Qiu, Jing Jiang Jun 2013

Research Collection School Of Computing and Information Systems

No abstract provided.


Mining User Relations From Online Discussions Using Sentiment Analysis And Probabilistic Matrix Factorization, Minghui Qiu, Liu Yang, Jing Jiang Jun 2013

Research Collection School Of Computing and Information Systems

No abstract provided.


Automated Extraction Of Community Mobility Measures From GPS Stream Data Using Temporal DBSCAN, Sungsoon Hwang, Timothy Hanke, Christian Evans May 2013

Sungsoon Hwang

Inferring community mobility of patients from GPS data has received much attention in health research. Developing robust mobility (or physical activity) monitoring systems relies on an automated algorithm that accurately classifies GPS track points into events (such as stops where activities are conducted, and routes taken). This paper describes a method that automatically extracts community mobility measures from GPS track data. The method uses temporal DBSCAN to classify track points, and temporal filtering to remove noise (any misclassified track points). The results show that the proposed method classifies track points with 88% accuracy. The percent of misclassified track points decreased …


Concept Graphs: Applications To Biomedical Text Categorization And Concept Extraction, Said Bleik May 2013

Dissertations

As science advances, the underlying literature grows rapidly providing valuable knowledge mines for researchers and practitioners. The text content that makes up these knowledge collections is often unstructured and, thus, extracting relevant or novel information could be nontrivial and costly. In addition, human knowledge and expertise are being transformed into structured digital information in the form of vocabulary databases and ontologies. These knowledge bases hold substantial hierarchical and semantic relationships of common domain concepts. Consequently, automating learning tasks could be reinforced with those knowledge bases through constructing human-like representations of knowledge. This allows developing algorithms that simulate the human reasoning …


A Comparative Look At Entity Framework Code First, Casey Griffin May 2013

Mathematics and Computer Science Capstones

The motivation behind this project is to examine what “Entity Framework Code First” is bringing to the world of object relational mappers and data access and how it compares to the more traditional methods of the past. The problem is whether Entity Framework’s high level of abstraction from the database schema is useful to the developers in reducing code development or if traditional approaches with their robust, custom data layers provide developers with an overall better performance. To analyze Entity Framework, a real-world business web application was developed implementing an Entity Framework Code First approach to data access. Using this …


Main-Stream Media Behaviour Analysis On Twitter: A Case Study On UK General Election, Zhongyu Wei, Yulan He, Wei Gao, Binyang Li, Lanjun Zhou, Kam-Fai Wong May 2013

Research Collection School Of Computing and Information Systems

With the development of social media tools such as Facebook and Twitter, mainstream media organizations including newspapers and TV media have played an active role in engaging with their audience and strengthening their influence on the recently emerged platforms. In this paper, we analyze the behavior of mainstream media on Twitter and study how they exert their influence to shape public opinion during the UK's 2010 General Election. We first propose an empirical measure to quantify mainstream media bias based on sentiment analysis and show that it correlates better with the actual political bias in the UK media than the …


Global Technical Communication And Content Management: A Study Of Multilingual Quality, Tatiana Batova May 2013

Theses and Dissertations

The field of technical communication (TC) is facing a dilemma. Content management (CM) strategies and technologies that completely reshape writing and translation practices are adopted in an increasing number of TC work groups. One driving factor in CM adoption is the promise of improving quality of multilingual technical texts, all the while reducing time/cost of technical translation and localization. Yet, CM relies on automation and privileges consistency, an approach that is problematic in global TC with its focus on adapting texts based on the characteristics of end-users.

To better understand the interdisciplinary dilemma of multilingual quality in CM, during my dissertation …


A Comparison Of Leading Database Storage Engines In Support Of Online Analytical Processing In An Open Source Environment, Gabriel Tocci May 2013

Electronic Theses and Dissertations

Online Analytical Processing (OLAP) has become the de facto data analysis technology used in modern decision support systems. It has experienced tremendous growth, and is among the top priorities for enterprises. Open source systems have become an effective alternative to proprietary systems in terms of cost and function. The purpose of the study was to investigate the performance of two leading database storage engines in an open source OLAP environment. Despite recent upgrades in performance features for the InnoDB database engine, the MyISAM database engine is shown to outperform the InnoDB database engine under a standard benchmark. This result was …


Unified Entity Search In Social Media Community, Ting Yao, Yuan Liu, Chong-Wah Ngo, Tao Mei May 2013

Research Collection School Of Computing and Information Systems

The search for entities is the most common search behavior on the Web, especially in social media communities where entities (such as images, videos, people, locations, and tags) are highly heterogeneous and correlated. While previous research usually deals with these social media entities separately, we are investigating in this paper a unified, multilevel, and correlative entity graph to represent the unstructured social media data, through which various applications (e.g., friend suggestion, personalized image search, image tagging, etc.) can be realized more effectively in one single framework. We regard the social media objects equally as “entities” and all of these applications …


Why Individuals Seek Diverse Opinions (Or Why They Don't), Jisun An, Daniele Quercia, Jon Crowcroft May 2013

Research Collection School Of Computing and Information Systems

Fact checking has been hard enough to do in traditional settings, but, as news consumption moves onto the Internet and sources multiply, it is becoming almost unmanageable. To solve this problem, researchers have created applications that expose people to diverse opinions and, as a result, expose them to balanced information. The wisdom of this solution is, however, placed in doubt by this paper. Survey responses of 60 individuals in the UK and South Korea and in-depth structured interviews of 10 respondents suggest that exposure to diverse opinions would not always work. That is partly because not all individuals equally value …


R-Energy For Evaluating Robustness Of Dynamic Networks, Ming Gao, Ee Peng Lim, David Lo May 2013

Research Collection School Of Computing and Information Systems

The robustness of a network is determined by how well its vertices are connected to one another so as to keep the network strong and sustainable. As the network evolves, its robustness changes and may reveal events as well as periodic trend patterns that affect the interactions among users in the network. In this paper, we develop R-energy as a new measure of network robustness based on the spectral analysis of normalized Laplacian matrix. R-energy can cope with disconnected networks, and is efficient to compute with a time complexity of O(|V| + |E|) where V and E are …


Fragmented Social Media: A Look Into Selective Exposure To Political News, Jisun An, Daniele Quercia, Jon Crowcroft May 2013

Research Collection School Of Computing and Information Systems

The hypothesis of selective exposure assumes that people crave like-minded information and eschew information that conflicts with their beliefs, and that this has negative consequences for political life. Yet, despite decades of research, this hypothesis remains theoretically promising but empirically difficult to test. We look into news articles shared on Facebook and examine whether selective exposure exists in social media. We find concrete evidence that users predominantly share like-minded news articles and avoid conflicting ones, and that partisans are more likely to do so. Building tools to counter partisanship on social media would require the ability …


Enabling Generative, Emergent Artificial Culture, Jaroslaw Kochanowicz, Ah-Hwee Tan, Daniel Thalmann May 2013

Research Collection School Of Computing and Information Systems

Despite the demand for culturally placed agent models, an adequate simulation approach to the relationship between group-cultural and individual-psychological qualities, including culture emergence, is just appearing. It could be argued that we are at the beginning of a domain forming process, a dawn of generative, emergent artificial culture. In this context we discuss current limitations and argue e.g. that too far reaching agent simplicity within Agent Based Modeling limits the emergence of realistic cultural-conventional level and we advocate psychologically rich models of culture forming mechanisms. We propose an approach to cultural phenomena modeling based on the interaction of habitual, affective …


Vehicle Localization Along A Previously Driven Route Using An Image Database, Hideyuki Kume, Arne Suppe, Arne Suppe May 2013

Research Collection School Of Computing and Information Systems

In most autonomous driving applications, such as parking and commuting, a vehicle follows a previously taken route, or almost the same route. In this paper, we propose a method to localize a vehicle along a previously driven route using images. The proposed method consists of two stages: offline creation of a database, and online localization. In the offline stage, a database is created from images that are captured when the vehicle drives a route for the first time. The database consists of images, 3D positions of feature points estimated by structure-from-motion, and a topological graph. In the online stage, the …


Personal Informatics In Chronic Illness Management, Haley Macleod, Anthony Tang, Sheelagh Carpendale May 2013

Research Collection School Of Computing and Information Systems

Many people with chronic illness suffer from debilitating symptoms or episodes that inhibit normal day-to-day function. Pervasive tools offer the possibility to help manage these conditions, particularly by helping people understand their conditions. But, it is unclear how to design these tools, as prior designs have focused on effortful tracking and many see those tools as a burden to use. We report here on an interview study with 12 individuals with chronic illnesses who collect personal data. We learn that these people are motivated through self-discovery and curiosity. We explore how these concepts may support the design of tools that …


Impact Of Multimedia In Sina Weibo: Popularity And Life Span, Xun Zhao, Feida Zhu, Weining Qian, Aoying Zhou May 2013

Research Collection School Of Computing and Information Systems

Multimedia content such as images and videos is widely used on social network sites nowadays. Sina Weibo, a Chinese microblogging service, is one of the first microblog platforms to incorporate multimedia content sharing features. This work provides statistical analysis on how multimedia content is produced, consumed, and propagated in Sina Weibo. Based on 230 million tweets and 1.8 million user profiles in Sina Weibo, we study the impact of multimedia content on the popularity of both users and tweets as well as tweet life span. Our preliminary study shows that multimedia tweets dominate pure text ones in Sina Weibo. Multimedia …


Cost-Sensitive Double Updating Online Learning And Its Application To Online Anomaly Detection, Peilin Zhao, Steven C. H. Hoi May 2013

Research Collection School Of Computing and Information Systems

Although both cost-sensitive classification and online learning have been well studied separately in data mining and machine learning, there have been very few comprehensive studies of cost-sensitive online classification in the literature. In this paper, we formally investigate this problem by directly optimizing cost-sensitive measures for an online classification task. As the first comprehensive study, we propose the Cost-Sensitive Double Updating Online Learning (CSDUOL) algorithms, which explore a recent double updating technique to tackle the online optimization task of cost-sensitive classification by maximizing the weighted sum or minimizing the weighted misclassification cost. We theoretically analyze the cost-sensitive measure bounds of the proposed …


Retweeting: An Act Of Viral Users, Susceptible Users, Or Viral Topics?, Tuan-Anh Hoang, Ee Peng Lim May 2013

Research Collection School Of Computing and Information Systems

When a user retweets, there are three behavioral factors that cause the action: topic virality, user virality, and user susceptibility. Topic virality captures the degree to which a topic attracts retweets by users. For each topic, user virality and susceptibility refer to the likelihood that a user attracts retweets and performs retweeting respectively. To model a set of observed retweet data as a result of these three topic specific factors, we first represent the retweets as a three-dimensional tensor of the tweet authors, their followers, and the tweets themselves. We then propose the V2S model, a …


A Secure And Fair Resource Sharing Model For Community Clouds, Santhosh S. Anand May 2013

Graduate Theses and Dissertations

Cloud computing has gained a lot of importance and has been one of the most discussed segments of today's IT industry. As enterprises explore the idea of using clouds, concerns have emerged related to cloud security and standardization. This thesis explores whether the Community Cloud Deployment Model can provide solutions to some of the concerns associated with cloud computing. A secure framework based on trust negotiations for resource sharing within the community is developed as a means to provide standardization and security while building trust during resource sharing within the community. Additionally, a model for fair sharing of resources is …


Data Near Here: Bringing Relevant Data Closer To Scientists, Veronika M. Megler, David Maier May 2013

Computer Science Faculty Publications and Presentations

Large scientific repositories run the risk of losing value as their holdings expand, if it means increased effort for a scientist to locate particular datasets of interest. We discuss the challenges that scientists face in locating relevant data, and present our work in applying Information Retrieval techniques to dataset search, as embodied in the Data Near Here application.


Behind The Magical Numbers: Hierarchical Chunking And The Human Working Memory Capacity, Guoqi Li, Ning Ning, Kiruthika Ramanathan, Wei He, Li Pan, Luping Shi May 2013

Research Collection School Of Computing and Information Systems

To explore the influence of chunking on the capacity limits of working memory, a model for chunking in sequential working memory is proposed, using hierarchical bidirectional inhibition-connected neural networks with winnerless competition. With the assumption of the existence of an upper bound to the inhibitory weights in neurobiological networks, it is shown that chunking increases the number of memorized items in working memory from the "magical number 7" to 16 items. The optimal number of chunks and the number of the memorized items in each chunk are the "magical number 4".


A Hybrid Recommendation System Based On Association Rules, Ahmed Alsalama May 2013

Masters Theses & Specialist Projects

Recommendation systems are widely used in e-commerce applications. The engine of a current recommendation system recommends items to a particular user based on user preferences and previous high ratings. Various recommendation schemes such as collaborative filtering and content-based approaches are used to build a recommendation system. Most current recommendation systems were developed to fit a certain domain such as books, articles, or movies. We propose a hybrid framework recommendation system to be applied on two-dimensional spaces (User × Item) with a large number of users and a small number of items. Moreover, our proposed framework makes use of …