Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

Articles 1441 - 1470 of 1686

Full-Text Articles in Physical Sciences and Mathematics

Where Is The Goldmine? Finding Promising Business Locations Through Facebook Data Analytics, Jovian Lin, Richard Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus Kwee Jul 2016

Research Collection School Of Computing and Information Systems

If you were to open your own cafe, would you not want to effortlessly identify the most suitable location to set up your shop? Choosing an optimal physical location is a critical decision for numerous businesses, as many factors contribute to the final choice of the location. In this paper, we seek to address the issue by investigating the use of publicly available Facebook Pages data (which include user "check-ins", types of business, and business locations) to evaluate a user-selected physical location with respect to a type of business. Using a dataset of 20,877 food businesses in Singapore, we conduct analysis of …


Machine Learning Methods For Medical And Biological Image Computing, Rongjian Li Jul 2016

Computer Science Theses & Dissertations

Medical and biological imaging technologies provide valuable visualization information of structure and function for an organ from the level of individual molecules to the whole object. The brain is the most complex organ in the body, and it attracts increasingly intense research attention with the rapid development of medical and biological imaging technologies. The massive amount of high-dimensional brain imaging data being generated creates high demand for computational methods that can analyze those images efficiently. The current study of computational methods using hand-crafted features does not scale with the increasing number of brain images, hindering the pace of scientific discoveries …


Machine Learning Methods For Brain Image Analysis, Ahmed Fakhry Jul 2016

Computer Science Theses & Dissertations

Understanding how the brain functions and quantifying compound interactions between complex synaptic networks inside the brain remain some of the most challenging problems in neuroscience. Lack or abundance of data, shortage of manpower, and heterogeneity of data from various species all add complexity to an already perplexing problem. Vast amounts of brain data need to be processed automatically, yet with an accuracy close to manual human-level performance. These automated methods essentially need to generalize well to be able to accommodate data from different species. Also, novel approaches and techniques are becoming …


Determining The Effectiveness Of Soil Treatment On Plant Stress Using Smart-Phone Cameras, Anurag Panwar Jun 2016

USF Tampa Graduate Theses and Dissertations

Plants are vital to the health of our biosphere, and effectively sustaining their growth is fundamental to the existence of life on this planet. A critical aspect that decides the sustainability of plant growth is the quality of soil. All other things being fixed, the quality of soil greatly impacts plant stress, which in turn impacts overall health. Although plant stress manifests in many ways, one of the clearest indicators is the color of the leaves. In this thesis, we conducted an experimental study in a greenhouse for detecting plant stress caused by nutrient deficiencies in soil using smartphone cameras, …


Categorizing Blog Spam, Brandon Bevans Jun 2016

Master's Theses

The internet has matured into the focal point of our era. Its ecosystem is vast, complex, and in many regards unaccounted for. One of the most prevalent aspects of the internet is spam. Similar to the rest of the internet, spam has evolved from simply meaning ‘unwanted emails’ to a blanket term that encompasses any unsolicited or illegitimate content that appears in the wide range of media that exists on the internet.

Many forms of spam permeate the internet, and spam architects continue to develop tools and methods to avoid detection. On the other side, cyber security engineers continue to …


Machine Learning For Disease Prediction, Abraham Jacob Frandsen Jun 2016

Theses and Dissertations

Millions of people in the United States alone suffer from undiagnosed or late-diagnosed chronic diseases such as Chronic Kidney Disease and Type II Diabetes. Catching these diseases earlier facilitates preventive healthcare interventions, which in turn can lead to tremendous cost savings and improved health outcomes. We develop algorithms for predicting disease occurrence by drawing from ideas and techniques in the field of machine learning. We explore standard classification methods such as logistic regression and random forest, as well as more sophisticated sequence models, including recurrent neural networks. We focus especially on the use of medical code data for disease prediction, …


Exploring Data Mining Techniques For Tree Species Classification Using Co-Registered Lidar And Hyperspectral Data, Julia K. Marrs May 2016

Theses and Dissertations

NASA Goddard’s LiDAR, Hyperspectral, and Thermal imager provides co-registered remote sensing data on experimental forests. Data mining methods were used to achieve a final tree species classification accuracy of 68% using a combined LiDAR and hyperspectral dataset, and show promise for addressing deforestation and carbon sequestration on a species-specific level.


An Exercise And Sports Equipment Recognition System, Siddarth Kalra May 2016

Electronic Thesis and Dissertation Repository

Most mobile health management applications today require manual input or use sensors like the accelerometer or GPS to record user data. The onboard camera remains underused. We propose an Exercise and Sports Equipment Recognition System (ESRS) that can recognize physical activity equipment from raw image data. This system can be integrated with mobile phones to allow the camera to become a primary input device for recording physical activity. We employ a deep convolutional neural network to train models capable of recognizing 14 different equipment categories. Furthermore, we propose a preprocessing scheme that uses color normalization and denoising techniques to improve …


A General Framework Of Large-Scale Convex Optimization Using Jensen Surrogates And Acceleration Techniques, Soysal Degirmenci May 2016

McKelvey School of Engineering Theses & Dissertations

In a world where data rates are growing faster than computing power, algorithmic acceleration based on developments in mathematical optimization plays a crucial role in narrowing the gap between the two. As the scale of optimization problems in many fields is getting larger, we need faster optimization methods that not only work well in theory, but also work well in practice by exploiting underlying state-of-the-art computing technology.

In this document, we introduce a unified framework of large-scale convex optimization using Jensen surrogates, an iterative optimization method that has been used in different fields since the 1970s. After this general treatment, …


Revelation Of Yin-Yang Balance In Microbial Cell Factories By Data Mining, Flux Modeling, And Metabolic Engineering, Gang Wu May 2016

McKelvey School of Engineering Theses & Dissertations

The long-held assumption of never-ending rapid growth in biotechnology and especially in synthetic biology has been recently questioned, due to lack of substantial return on investment. One of the main reasons for failures in synthetic biology and metabolic engineering is the metabolic burden that results in resource losses. Metabolic burden is defined as the portion of a host cell's resources, either energy molecules (e.g., NADH, NADPH and ATP) or carbon building blocks (e.g., amino acids), that is used to maintain the engineered components (e.g., pathways). As a result, the effectiveness of synthetic biology tools depends heavily on cell capability to …


Data Driven Sample Generator Model With Application To Classification, Alvaro Emilio Ulloa Cerna May 2016

Mathematics & Statistics ETDs

Despite the rapidly growing interest, progress in the study of relations between physiological abnormalities and mental disorders is hampered by complexity of the human brain and high costs of data collection. The complexity can be captured by machine learning approaches, but they still may require significant amounts of data. In this thesis, we seek to mitigate the latter challenge by developing a data driven sample generator model for the generation of synthetic realistic training data. Our method greatly improves generalization in classification of schizophrenia patients and healthy controls from their structural magnetic resonance images. A feed forward neural network trained …


A Comparative Approach To Question Answering Systems, Josue Balandrano Coronel May 2016

Theses and Dissertations

In this paper I will analyze three different algorithms and approaches to implement Question Answering Systems (QA-Systems). I will analyze the efficiency, strengths, and weaknesses of multiple algorithms by explaining them in detail and comparing them with each other. The overarching aim of this thesis is to explore ideas that can be used to create a truly open context QA-System. Open context QA-Systems remain an open problem.

The various algorithms and approaches presented in this work will be focused on complex questions. Complex questions are usually verbose and the context of the question is equally important to answer the query …


Mobile Big Data Analytics Using Deep Learning And Apache Spark, Mohammad Abu Alsheikh, Dusit Niyato, Shaowei Lin, Hwee-Pink Tan, Zhu Han May 2016

Research Collection School Of Computing and Information Systems

The proliferation of mobile devices, such as smartphones and Internet of Things gadgets, has resulted in the recent mobile big data era. Collecting mobile big data is unprofitable unless suitable analytics and learning methods are utilized to extract meaningful information and hidden patterns from data. This article presents an overview and brief tutorial on deep learning in mobile big data analytics and discusses a scalable learning framework over Apache Spark. Specifically, distributed deep learning is executed as an iterative MapReduce computing on many Spark workers. Each Spark worker learns a partial deep model on a partition of the overall mobile, …


Comparison Of Machine Learning Algorithms In Suggesting Candidate Edges To Construct A Query On Heterogeneous Graphs, Rohit Ravi Kumar Bhoopalam May 2016

Computer Science and Engineering Theses

Querying graph data can be difficult as it requires the user to have knowledge of the underlying schema and the query language. Visual query builders allow users to formulate the intended query by drawing nodes and edges of the query graph, which can be translated into a database query. Visual query builders help users formulate the query without requiring the user to have knowledge of the query language and the underlying schema. To the best of our knowledge, none of the currently available visual query builders suggest to users which nodes/edges to include in their query graph. We provide suggestions to …


Sparse Feature Learning For Image Analysis In Segmentation, Classification, And Disease Diagnosis., Ehsan Hosseini-Asl May 2016

Electronic Theses and Dissertations

The success of machine learning algorithms generally depends on intermediate data representations, called features, that disentangle the hidden factors of variation in data. Moreover, machine learning models are required to generalize, in order to reduce the specificity or bias toward the training dataset. Unsupervised feature learning is useful in taking advantage of large amounts of unlabeled data, which are available to capture these variations. However, learned features are required to capture variational patterns in data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction with application to lung segmentation, interpretable deep …


Exploring Privacy Leakage From The Resource Usage Patterns Of Mobile Apps, Amin Rois Sinung Nugroho May 2016

Graduate Theses and Dissertations

Due to the popularity of smart phones and mobile apps, a potential privacy risk with the usage of mobile apps is that, from the usage information of mobile apps (e.g., how many hours a user plays mobile games in each day), private information about a user’s living habits and personal activities can be inferred. To assess this risk, this thesis answers the following research question: can the type of a mobile app (e.g., email, web browsing, mobile game, music streaming, etc.) used by a user be inferred from the resource (e.g., CPU, memory, network, etc.) usage patterns of the mobile …


Bridging Statistical Learning And Formal Reasoning For Cyber Attack Detection, Kexin Pei Apr 2016

Open Access Theses

Current cyber-infrastructures are facing increasingly stealthy attacks that implant malicious payloads under the cover of benign programs. Current attack detection approaches based on statistical learning methods may generate misleading decision boundaries when processing noisy data with such a mixture of benign and malicious behaviors. On the other hand, attack detection based on formal program analysis may lack completeness or adaptivity when modeling attack behaviors. In light of these limitations, we have developed LEAPS, an attack detection system based on supervised statistical learning to classify benign and malicious system events. Furthermore, we leverage control flow graphs inferred from the system event …


Predicting Changes To Source Code, Justin James Roll Apr 2016

Master's Theses

Organizations typically use issue tracking systems (ITS) such as Jira to plan software releases and assign requirements to developers. Organizations typically also use source control management (SCM) repositories such as Git to track historical changes to a code-base. These ITS and SCM repositories contain valuable data that remains largely untapped. As developers churn through an organization, it becomes expensive for developers to spend time determining which software artifact must be modified to implement a requirement. In this work we created, developed, tested and evaluated a tool called Class Change Predictor, otherwise known as CCP, for predicting which class will implement …


Cross-Subject Continuous Analytic Workload Profiling Using Stochastic Discrete Event Simulation, Joseph J. Giametta Mar 2016

Theses and Dissertations

Operator functional state (OFS) in remotely piloted aircraft (RPA) simulations is modeled using electroencephalograph (EEG) physiological data and continuous analytic workload profiles (CAWPs). A framework is proposed that provides solutions to the limitations that stem from lengthy training data collection and labeling techniques associated with generating CAWPs for multiple operators/trials. The framework focuses on the creation of scalable machine learning models using two generalization methods: 1) the stochastic generation of CAWPs and 2) the use of cross-subject physiological training data to calibrate machine learning models. Cross-subject workload models are used to infer OFS on new subjects, reducing the need to …


Novel Machine Learning Methods For Modeling Time-To-Event Data, Bhanukiran Vinzamuri Jan 2016

Wayne State University Dissertations

Predicting time-to-event from longitudinal data where different events occur at different time points is an extremely important problem in several domains such as healthcare, economics, social networks and seismology, to name a few. A unique challenge in this problem involves building predictive models from right censored data (also called survival data). Right censoring occurs when the event of interest has not yet been observed for an instance within a given observation time window. Effective models for predicting time-to-event labels from such right censored data with good accuracy can have a significant impact in these …
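
To make the notion of right censoring concrete, the following is a minimal sketch of how censored observations are commonly represented and summarized. It uses the classical Kaplan-Meier estimator, which is a standard textbook technique and is not claimed to be the method developed in the dissertation. The function name `kaplan_meier` and the toy durations are assumptions made purely for illustration.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate at each distinct event time.

    durations: time at which each instance left observation
    observed:  True if the event occurred at that time,
               False if the instance was right censored
    Returns a list of (time, estimated_survival_probability) pairs.
    """
    # Only times at which an event was actually observed change the estimate.
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Instances still under observation just before time t,
        # including those censored later.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data (not from the dissertation): five instances,
# two of which are right censored (observed=False).
toy_durations = [2, 3, 3, 5, 8]
toy_observed = [True, False, True, True, False]
toy_curve = kaplan_meier(toy_durations, toy_observed)
```

Censored instances never trigger a drop in the curve, but they still count in the at-risk set until they leave observation, which is why simply discarding them would bias the estimate.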


Privacy And Accountability In Black-Box Medicine, Roger Allan Ford, W. Nicholson Price II Jan 2016

Law Faculty Scholarship

Black-box medicine—the use of big data and sophisticated machine learning techniques for health-care applications—could be the future of personalized medicine. Black-box medicine promises to make it easier to diagnose rare diseases and conditions, identify the most promising treatments, and allocate scarce resources among different patients. But to succeed, it must overcome two separate, but related, problems: patient privacy and algorithmic accountability. Privacy is a problem because researchers need access to huge amounts of patient health information to generate useful medical predictions. And accountability is a problem because black-box algorithms must be verified by outsiders to ensure they are accurate and …


Harnessing The Power Of Text Mining For The Detection Of Abusive Content In Social Media, Hao Chen, Susan Mckeever, Sarah Jane Delany Jan 2016

Conference papers

The issues of cyberbullying and online harassment have gained considerable coverage in the last number of years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat, using datasets from Twitter, YouTube, MySpace, Kongregate, Formspring and Slashdot. Using supervised machine learning, we compare alternative text representations and dimension reduction approaches, including feature selection and …


EEG Interictal Spike Detection Using Artificial Neural Networks, Howard J. Carey III Jan 2016

Theses and Dissertations

Epilepsy is a neurological disease causing seizures in its victims and affects approximately 50 million people worldwide. Successful treatment is dependent upon correct identification of the origin of the seizures within the brain. To achieve this, electroencephalograms (EEGs) are used to measure a patient’s brainwaves. This EEG data must be manually analyzed to identify interictal spikes that emanate from the afflicted region of the brain. This process can take a neurologist more than a week and a half per patient. This thesis presents a method to extract and process the interictal spikes in a patient, and use them to reduce …


Mathematical Foundations Of Sentiment Classification : A Probabilistic Approach, Syed Shahzad Raza Jan 2016

Legacy Theses & Dissertations (2009 - 2024)

This thesis is an introduction to the mathematical formalization of sentiment classification. It presents two popular probabilistic machine learning models to classify tweets downloaded from Twitter during the US Election Period, 2016. The thesis analyses the accuracy of the two classification algorithms used, namely the Multinomial Naïve Bayes and Bernoulli Naïve Bayes algorithms. The supervised learning approaches implemented in this thesis use approximately 600 manually labeled tweets containing information regarding the US presidential candidates. It is shown with 80% accuracy that the majority of Twitter users spoke in favor of Donald Trump before and after the presidential election through their tweets. We also discuss …
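
The two models named in this abstract differ mainly in how they treat word occurrences: the multinomial variant scores every token occurrence, while the Bernoulli variant scores the presence or absence of each vocabulary word. The sketch below illustrates that difference with add-one (Laplace) smoothing. It is a hedged, from-scratch illustration and not the thesis's implementation; all function names and the four toy "tweets" are invented for this example.

```python
import math
from collections import Counter


def train(docs, labels, vocab):
    """Collect the per-class statistics needed by both model variants."""
    stats = {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Multinomial: how many times each word occurs in the class.
        term_counts = Counter(w for d in class_docs for w in d)
        # Bernoulli: in how many documents of the class each word appears.
        doc_counts = Counter(w for d in class_docs for w in set(d))
        stats[c] = {
            "prior": len(class_docs) / len(docs),
            "n_docs": len(class_docs),
            "n_tokens": sum(term_counts[w] for w in vocab),
            "term_counts": term_counts,
            "doc_counts": doc_counts,
        }
    return stats


def multinomial_log_score(doc, s, vocab):
    """log P(class) + sum over token occurrences of log P(word | class)."""
    score = math.log(s["prior"])
    for w in doc:
        if w in vocab:
            p = (s["term_counts"][w] + 1) / (s["n_tokens"] + len(vocab))
            score += math.log(p)
    return score


def bernoulli_log_score(doc, s, vocab):
    """log P(class) + sum over the vocabulary of presence/absence terms."""
    present = set(doc)
    score = math.log(s["prior"])
    for w in vocab:
        p = (s["doc_counts"][w] + 1) / (s["n_docs"] + 2)
        score += math.log(p if w in present else 1 - p)
    return score


def classify(doc, stats, vocab, score_fn):
    """Return the class with the highest log score."""
    return max(stats, key=lambda c: score_fn(doc, stats[c], vocab))


# Hypothetical toy corpus (invented tokens, not the thesis dataset).
toy_docs = [["great", "great", "rally"], ["great", "speech"],
            ["bad", "debate"], ["bad", "bad", "speech"]]
toy_labels = ["pos", "pos", "neg", "neg"]
toy_vocab = sorted({w for d in toy_docs for w in d})
toy_stats = train(toy_docs, toy_labels, toy_vocab)
```

Calling `classify(["great", "rally"], toy_stats, toy_vocab, multinomial_log_score)` or the same with `bernoulli_log_score` scores a new token list under either model. Note that the Bernoulli score also penalizes vocabulary words that are absent from the document, which the multinomial score ignores.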


Removal Of Impulse Noise In Digital Images With Naïve Bayes Classifier Method, Cafer Budak, Mustafa Türk, Abdullah Toprak Jan 2016

Turkish Journal of Electrical Engineering and Computer Sciences

No abstract provided.


Forecasting Customer Electricity Load Demand In The Power Trading Agent Competition Using Machine Learning, Saiful Abu Jan 2016

Open Access Theses & Dissertations

Accurate electricity load demand forecasting is an important problem in managing the power grid for both economic and environmental reasons. The Power TAC simulation provides a platform to do research on smart grid energy generation and distribution systems. Brokers are the focus of the design task posed to developers by the system. The brokers work as self-interested entities that try to maximize profits by trading electricity across multiple markets. To be successful, a broker has to forecast the electricity demand for customers as accurately as possible so it can use this information to operate efficiently. My proposed forecasting method uses …


Algorithmic Music Composition And Accompaniment Using Neural Networks, Daniel Wilton Risdon Jan 2016

Senior Projects Spring 2016

The goal of this project was to use neural networks as a tool for live music performance. Specifically, the intention was to adapt a preexisting neural network code library to work in Max, a visual programming language commonly used to create instruments and effects for electronic music and audio processing. This was done using ConvNetJS, a JavaScript library created by Andrej Karpathy.

Several neural network models were trained using a range of different training data, including music from various genres. The resulting neural network-based instruments were used to play brief pieces of music, which they used as input to create …


Feature Selection For Movie Recommendation, Zehra Çataltepe, Mahi̇ye Uluyağmur, Esengül Tayfur Jan 2016

Turkish Journal of Electrical Engineering and Computer Sciences

TV users have an abundance of different movies they could choose from, and with the quantity and quality of data available both on user behavior and content, better recommenders are possible. In this paper, we evaluate and combine different content-based and collaborative recommendation methods for a Turkish movie recommendation system. Our recommendation methods can make use of user behavior, different types of content features, and other users' behavior to predict movie ratings. We gather different types of data on movies, such as the description, actors, directors, year, and genre. We use natural language processing methods to convert the Turkish movie …


A MapReduce-Based Distributed SVM Algorithm For Binary Classification, Ferhat Özgür Çatak, Mehmet Erdal Balaban Jan 2016

Turkish Journal of Electrical Engineering and Computer Sciences

Although the support vector machine (SVM) algorithm has a high generalization property for classifying unseen examples after the training phase and a small loss value, the algorithm is not suitable for real-life classification and regression problems. SVMs cannot solve hundreds of thousands of examples in a training dataset. In previous studies on distributed machine-learning algorithms, the SVM was trained in a costly and preconfigured computer environment. In this research, we present a MapReduce-based distributed parallel SVM training algorithm for binary classification problems. This work shows how to distribute optimization problems over cloud computing systems with the MapReduce technique. In the second …


A Comparison Of Fundamental Network Formation Principles Between Offline And Online Friends On Twitter, Felicia Natali, Feida Zhu Jan 2016

Research Collection School Of Computing and Information Systems

We investigate the differences between how some of the fundamental principles of network formation apply among offline friends and how they apply among online friends on Twitter. We consider three fundamental principles of network formation proposed by Schaefer et al.: reciprocity, popularity, and triadic closure. Overall, we discover that these principles mainly apply to offline friends on Twitter. Based on how these principles apply to offline versus online friends, we formulate rules to predict offline friendship on Twitter. We compare our algorithm with popular machine learning algorithms and Xiewei’s random walk algorithm. Our algorithm beats the machine learning algorithms on …
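
Two of the principles named in this abstract, reciprocity and triadic closure, can be illustrated with simple descriptive measures on a follow graph. The sketch below shows one common way to quantify them; it is an assumption-laden illustration and not the measures or the prediction rules used in the paper. The function names and the toy `follows` edge list are hypothetical.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) also exists."""
    edge_set = set(edges)
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def closed_triad_fraction(edges):
    """Fraction of connected triples whose two end nodes are also linked.

    Edge direction is ignored here: a tie exists if either user follows
    the other. This is a simplifying assumption for the sketch.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    triples = closed = 0
    for nbrs in neighbors.values():
        ordered = sorted(nbrs)
        # Every pair of neighbors of a node forms a triple centered on it.
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                triples += 1
                if ordered[j] in neighbors[ordered[i]]:
                    closed += 1
    return closed / triples if triples else 0.0


# Hypothetical follow edges (follower, followed); not data from the paper.
follows = [("a", "b"), ("b", "a"), ("b", "c"), ("a", "c"), ("c", "d")]
toy_reciprocity = reciprocity(follows)
toy_closure = closed_triad_fraction(follows)
```

In this toy graph only the a-b pair follows each other, and the a-b-c triangle accounts for all of the closed triples, while the triples that pass through c to reach d remain open.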