Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

Articles 1381 - 1410 of 1686

Full-Text Articles in Physical Sciences and Mathematics

Image Spam Detection, Aneri Chavda May 2017

Master's Projects

Email is one of the most common forms of digital communication. Spam can be defined as unsolicited bulk email, while image spam includes spam text embedded inside images. Image spam is used by spammers so as to evade text-based spam filters and hence it poses a threat to email-based communication. In this research, we analyze image spam detection methods based on various combinations of image processing and machine learning techniques.


Bayesian Optimization For Refining Object Proposals, With An Application To Pedestrian Detection, Anthony D. Rhodes May 2017

Student Research Symposium

We devise an algorithm using a Bayesian optimization framework in conjunction with contextual visual data for the efficient localization of objects in still images. Recent research has demonstrated substantial progress in object localization and related tasks for computer vision. However, many current state-of-the-art object localization procedures still suffer from inaccuracy and inefficiency, in addition to failing to successfully leverage contextual data. We address these issues with the current research.

Our method encompasses an active search procedure that uses contextual data to generate initial bounding-box proposals for a target object. We train a convolutional neural network to approximate an offset distance …


Visual Knowledge Discovery And Machine Learning For Investment Strategy, Antoni Wilinski, Boris Kovalerchuk May 2017

All Faculty Scholarship for the College of the Sciences

Knowledge discovery is an important aspect of human cognition. The advantage of the visual approach is the opportunity to substitute easier perceptual tasks for some complex cognitive tasks. However, for cognitive tasks such as financial investment decision making, this opportunity faces the challenge that financial data are abstract, multidimensional, and multivariate, i.e., outside of traditional visual perception in a 2D or 3D world. This paper presents an approach to find an investment strategy based on pattern discovery in multidimensional space of specifically prepared time series. Visualization based on the lossless Collocated Paired Coordinates (CPC) plays an important role in this approach …


Aspect Discovery From Product Reviews, Ying Ding May 2017

Dissertations and Theses Collection

With the rapid development of online shopping sites and social media, product reviews are accumulating. These reviews contain information that is valuable to both businesses and customers. Businesses can easily obtain a large amount of feedback on their products, which is difficult to achieve through traditional customer surveys. Customers can learn more about the products they are interested in by reading reviews, which would be difficult without online reviews. However, this accumulation has made it impossible to read all reviews. It is necessary to develop automated techniques to efficiently process them. One of the most …


Using A Multi Variate Pattern Analysis (MVPA) Approach To Decode fMRI Responses To Fear And Anxiety, Sajjad Torabian Esfahani May 2017

Electronic Theses and Dissertations

This study analyzed fMRI responses to fear and anxiety using a Multi Variate Pattern Analysis (MVPA) approach. Compared to conventional univariate methods which only represent regions of activation, MVPA provides us with more detailed patterns of voxels. We successfully found different patterns for fear and anxiety through separate classification attempts in each subject’s representational space. Further, we transformed all the individual models into a standard space to do group analysis. Results showed that subjects share a more common fear response. Also, the amygdala and hippocampus areas are more important for differentiating fear than anxiety.


Machine Learning: Several Advances In Linear Discriminant Analysis, Multi-View Regression And Support Vector Machine, Shuai Zheng May 2017

Computer Science and Engineering Dissertations

Machine learning technology is now widely used in engineering, science, finance, healthcare, etc. In this dissertation, we make several advances in machine learning technologies for high dimensional data analysis, image data classification, recommender systems and classification algorithms. In this big data era, much data is high dimensional and therefore difficult to analyze. We propose two efficient Linear Discriminant Analysis (LDA) based methods to reduce data to low dimensions. Kernel alignment measures the degree of similarity between two kernels. We propose kernel alignment inspired LDA to find a subspace to maximize the alignment between subspace-transformed data kernel and class indicator …


Learning From Wizard-Of-Oz Using Dynamic User Modeling, Tasnim Inayat Makada May 2017

Computer Science and Engineering Theses

Socially assistive robotics (SAR) is a field of study that combines assistive robotics with socially interactive robotics where the goal of the robot is to provide assistance to human users through social interaction. The effectiveness of a SAR system basically depends on the user’s engagement in the interaction and the level of autonomy obtained by the system such that it requires no human intervention. The focus of this thesis is to build a SAR system that progressively learns to make autonomous decisions in an online manner, based on human input. An expert/therapist provides guidance to the system during the interaction …


Detecting Malicious Campaigns In Crowdsourcing Platforms, Hongkyu Choi May 2017

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Crowdsourcing sites such as Mechanical Turk and Crowdflower provide a marketplace where requesters create tasks and recruit workers, who may perform certain tasks in order to get financial compensation. Anyone in the world can be a requester and/or a worker as long as he/she has an Internet connection. Crowdsourcing creates a new way to solve various tasks by using “human computation power”. However, crowdsourcing has been misused by malicious requesters and unethical workers for account generation, search engine optimization, content and link generation, ad posting and spam mailing, and social network linking. It creates new threats to the Web system. …


Improving Long Term Stock Market Prediction With Text Analysis, Tanner A. Bohn Apr 2017

Electronic Thesis and Dissertation Repository

The task of forecasting stock performance is well studied with clear monetary motivations for those wishing to invest. A large amount of research in the area of stock performance prediction has already been done, and multiple existing results have shown that data derived from textual sources related to the stock market can be successfully used towards forecasting. These existing approaches have mostly focused on short term forecasting, used relatively simple sentiment analysis techniques, or had little data available. In this thesis, we prepare over ten years' worth of stock data and propose a solution which combines features from textual yearly …


Investigating Citation Linkage Between Research Articles, Kokou Hospice Houngbo Apr 2017

Electronic Thesis and Dissertation Repository

In recent years, there has been a dramatic increase in scientific publications across the globe. To help navigate this overabundance of information, methods have been devised to find papers with related content, but they are lacking in the ability to provide specific information that a researcher may need without having to read hundreds of linked papers. The search and browsing capabilities of online domain specific scientific repositories are limited to finding a paper citing other papers, but do not point to the specific text that is being cited. Providing this capability to the research community will be beneficial in terms …


Identification Of Prognostic Genes And Gene Sets For Early-Stage Non-Small Cell Lung Cancer Using Bi-Level Selection Methods, Suyan Tian, Chi Wang, Howard H. Chang, Jianguo Sun Apr 2017

Biostatistics Faculty Publications

In contrast to feature selection and gene set analysis, bi-level selection is a process of selecting not only important gene sets but also important genes within those gene sets. Depending on the order of selections, a bi-level selection method can be classified into three categories – forward selection, which first selects relevant gene sets followed by the selection of relevant individual genes; backward selection, which takes the reversed order; and simultaneous selection, which performs the two tasks simultaneously, usually with the aid of a penalized regression model. To test the existence of subtype-specific prognostic genes for non-small cell lung cancer …


Quantitative Criticism Of Literary Relationships, Joseph P. Dexter, Theodore Katz, Nilesh Tripuraneni, Tathagata Dasgupta, Ajay Kannan, James Brofos, Jorge A. Bonilla Lopez, Lea Schroeder Apr 2017

Dartmouth Scholarship

Authors often convey meaning by referring to or imitating prior works of literature, a process that creates complex networks of literary relationships (“intertextuality”) and contributes to cultural evolution. In this paper, we use techniques from stylometry and machine learning to address subjective literary critical questions about Latin literature, a corpus marked by an extraordinary concentration of intertextuality. Our work, which we term “quantitative criticism,” focuses on case studies involving two influential Roman authors, the playwright Seneca and the historian Livy. We find that four plays related to but distinct from Seneca’s main writings are differentiated from the rest of the …


Development And Evaluation Of Machine Learning Algorithms For Biomedical Applications, Turki Talal Turki Apr 2017

Dissertations

Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps a physician gain full understanding of the effective treatment on patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches.

This dissertation develops machine learning and data mining algorithms, and applies these algorithms to solve the two important biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques …


Viewability Prediction For Display Advertising, Chong Wang Apr 2017

Dissertations

As a massive industry, display advertising delivers advertisers’ marketing messages to attract customers through graphic banners on webpages. Display advertising is also the most essential revenue source of online publishers. Currently, advertisers are charged by user response or ad serving. However, recent studies show that users barely click or convert display ads. Moreover, about half of the ads are actually never seen by users. In this case, advertisers cannot enhance their brand awareness and increase return on investment. Publishers also lose much revenue. Therefore, the ad pricing standards are shifting to a new model: ad impressions are paid if they …


Using Machine Learning To Predict Chemotherapy Response In Cell Lines And Patients Based On Genetic Expression, Dimo Angelov Mar 2017

Electronic Thesis and Dissertation Repository

The goal of this thesis was to examine different machine learning techniques for predicting chemotherapy response in cell lines and patients based on genetic expression. After trying regression, multi-class classification techniques, and binary classification, it was concluded that binary classification was the best method for training models due to the limited size of available cell line data. We found support vector machine classifiers trained on cell line data were easier to use and produced better results compared to neural networks. Sequential backward feature selection was able to select genes for the models that produced good results; however, the greedy algorithm …


SOAL: Second-Order Online Active Learning, Shuji Hao, Peilin Zhao, Jing Lu, Steven C. H. Hoi, Chunyan Miao, Chi Zhang Feb 2017

Research Collection School Of Computing and Information Systems

This paper investigates the problem of online active learning for training classification models from sequentially arriving data. This is more challenging than conventional online learning tasks since the learner not only needs to figure out how to effectively update the classifier but also needs to decide when is the best time to query the label of an incoming instance given a limited label budget. The existing online active learning approaches are often based on first-order online learning methods, which generally suffer from slow convergence and suboptimal exploitation of available information when querying the labeled data. To overcome the limitations, …


Malware Detection Using The Index Of Coincidence, Bhavna Gurnani Jan 2017

Master's Projects

In this research, we apply the Index of Coincidence (IC) to problems in malware analysis. The IC, which is often used in cryptanalysis of classic ciphers, is a technique for measuring the repeat rate in a string of symbols. A score based on the IC is applied to a variety of challenging malware families. We find that this relatively simple IC score performs surprisingly well, with superior results in comparison to various machine learning based scores, at least in some cases.
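
As an illustration of the "repeat rate" mentioned in this abstract, the following is a minimal sketch of the textbook Index of Coincidence: the probability that two symbols drawn without replacement from a sequence are equal. It is not the author's scoring method. The abstract does not say which symbols are used or how the score is derived from the IC, so the function name `index_of_coincidence` and the choice of characters or bytes as symbols are assumptions made for this example.

```python
from collections import Counter


def index_of_coincidence(symbols):
    """Return the textbook IC of a sequence of hashable symbols.

    Hypothetical helper for illustration only: the IC is the probability
    that two symbols picked without replacement from the sequence match.
    """
    n = len(symbols)
    if n < 2:
        # Fewer than two symbols: no pair can be drawn, so report 0.0.
        return 0.0
    counts = Counter(symbols)
    # Ordered matching pairs divided by all ordered pairs.
    matching_pairs = sum(c * (c - 1) for c in counts.values())
    return matching_pairs / (n * (n - 1))


# Works on any sequence of symbols, e.g. text or raw bytes.
ic_text = index_of_coincidence("aabb")
ic_bytes = index_of_coincidence(b"\x00\x00\x01\x01")
```

A sequence made of a single repeated symbol yields an IC of 1, and a sequence with no repeated symbols yields 0; how such values would be turned into a malware score is left open here because the abstract does not specify it.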


An Incremental Reseeding Strategy For Clustering, Xavier Bresson, Huiyi Hu, Thomas Laurent, Arthur Szlam, James Von Brecht Jan 2017

Thomas Laurent

In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves state-of-the-art performance in terms of cluster purity on standard benchmark datasets. Moreover, the algorithm runs an order of magnitude faster than the other algorithms that achieve comparable results in terms of accuracy. We also describe a coarsen, cluster and refine approach similar to GRACLUS and …


AI Education: Machine Learning Resources, Todd W. Neller Jan 2017

Computer Science Faculty Publications

In this column, we focus on resources for learning and teaching three broad categories of machine learning (ML): supervised, unsupervised, and reinforcement learning. In our next column, we will focus specifically on deep neural network learning resources, so if you have any resource recommendations, please email them to the address above. [excerpt]


Towards A Relative-Pitch Neural Network System For Chorale Composition And Harmonization, Samuel P. Goree Jan 2017

Honors Papers

Computational creativity researchers interested in applying machine learning to computer composition often use the music of J.S. Bach to train their systems. Working with Bach, though, requires grappling with the conventions of tonal music, which can be difficult for computer systems to learn. In this paper, we propose and implement an alternate approach to composition and harmonization of chorales based on pitch-relative note encodings to avoid tonality altogether. We then evaluate our approach using a survey and expert analysis, and find that pitch-relative encodings do not significantly affect human-comparability, likability or creativity. However, an extension of this model that better …


Application Of Response Surface Methods To Determine Conditions For Optimal Genomic Prediction, Reka Howard, Alicia L. Carriquiry, William D. Beavis Jan 2017

Department of Statistics: Faculty Publications

An epistatic genetic architecture can have a significant impact on prediction accuracies of genomic prediction (GP) methods. Machine learning methods predict traits comprised of epistatic genetic architectures more accurately than statistical methods based on additive mixed linear models. The differences between these types of GP methods suggest a diagnostic for revealing genetic architectures underlying traits of interest. In addition to genetic architecture, the performance of GP methods may be influenced by the sample size of the training population, the number of QTL, and the proportion of phenotypic variability due to genotypic variability (heritability). Possible values for these factors and the …


Intelligent Feature Selection For Detecting HTTP/2 Denial Of Service Attacks, Erwin Adi, Zubair Baig Jan 2017

Australian Information Security Management Conference

Intrusion-detection systems employ machine learning techniques to classify traffic into attack and legitimate. Network flooding attacks can leverage the new web communications protocol (HTTP/2) to bypass intrusion-detection systems. This creates an urgent demand to understand HTTP/2 characteristics and to devise customised cyber-attack detection schemes. This paper proposes Step Sister, a technique to generate an optimum network traffic feature set for network intrusion detection. The proposed technique demonstrates that a consistent set of features is selected for a given HTTP/2 dataset. This allows intrusion-detection systems to classify previously unseen network traffic samples with fewer false alarms than when techniques used in …


A Novel Approach For Classifying Gene Expression Data Using Topic Modeling, Soon Jye Kho, Himi Yalamanchili, Michael L. Raymer, Amit Sheth Jan 2017

Kno.e.sis Publications

Understanding the role of differential gene expression in cancer etiology and cellular process is a complex problem that continues to pose a challenge due to the sheer number of genes and inter-related biological processes involved. In this paper, we employ an unsupervised topic model, Latent Dirichlet Allocation (LDA), to mitigate overfitting of high-dimensionality gene expression data and to facilitate understanding of the associated pathways. LDA has been recently applied for clustering and exploring genomic data but not for classification and prediction. Here, we propose to use LDA in clustering as well as in classification of cancer and healthy tissues using lung cancer …


Diagnosing Breast Cancer With A Neural Network, John Cullen Jan 2017

Undergraduate Journal of Mathematical Modeling: One + Two

Fine needle aspiration (FNA) is a minimally invasive biopsy technique that can be used to successfully diagnose types of cancer, including breast cancer. Immediately, it is difficult for a human to spot any trends in the cell level data gathered during a fine needle aspiration procedure. One way to predict the type of tumor a patient has is to use a computer to develop a mathematical model based on known data. This project utilizes the Diagnostic Wisconsin Breast Cancer Database (DWBCDB) to create an accurate mathematical model that predicts the type of a patient’s tumor (Malignant or Benign). A neural …


Temporal Feature Selection With Symbolic Regression, Christopher Winter Fusting Jan 2017

Graduate College Dissertations and Theses

Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a “Range Terminal” that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and a real-world data set in which we predict seasonal greenness using satellite …


Mouse Vs. Machine: The Game, Cafferty Aiko Frattarelli Jan 2017

Senior Projects Spring 2017

Many modern video games built by big name companies are coded by a group of people together using, and possibly modifying, an already designed game engine. These games usually have another group of people creating the artwork. In this project, I coded and designed a video game from scratch, as well as created all the artwork used in the game. The player controls a mouse character who fights a variety of monsters. In order to create the complexity of the game, I implement basic neural networks as the enemy artificial intelligence, i.e. the decision making process of the enemy. It …


Machine Learning With Personal Data: Is Data Protection Law Smart Enough To Meet The Challenge?, Fred H. Cate, Christopher Kuner, Dan Jerker B. Svantesson, Orla Lynskey, Christopher Millard Jan 2017

Articles by Maurer Faculty

No abstract provided.


Presenting A Labelled Dataset For Real-Time Detection Of Abusive User Posts, Hao Chen, Susan Mckeever, Sarah Jane Delany Jan 2017

Conference papers

Social media sites facilitate users in posting their own personal comments online. Most support free format user posting, with close to real-time publishing speeds. However, online posts generated by a public user audience carry the risk of containing inappropriate, potentially abusive content. To detect such content, the straightforward approach is to filter against blacklists of profane terms. However, this lexicon filtering approach is prone to problems around word variations and lack of context. Although recent methods inspired by machine learning have boosted detection accuracies, the lack of gold standard labelled datasets limits the development of this approach. In this work, …


Deep Learning Method Vs. Hand-Crafted Features For Lung Cancer Diagnosis And Breast Cancer Risk Analysis, Wenqing Sun Jan 2017

Open Access Theses & Dissertations

Breast cancer and lung cancer are two major leading causes of cancer deaths, and researchers have been developing computer aided diagnosis (CAD) systems to automatically diagnose them for decades. In recent studies, we found that the techniques used in CAD systems, such as feature design and machine learning, can also be used for breast cancer risk analysis. We also noticed that, with the development of deep learning methods, the performance of CAD systems can be improved by using automatically generated features. To explore these possibilities, we conducted a series of studies: the first two studies focused on transferring the original CAD …


Late Fusion Of Facial Dynamics For Automatic Expression Recognition, Alessandra Bandrabur, Laura Florea, Cornel Florea, Matei Mancas Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Installment of a facial expression is associated with contractions and extensions of specific facial muscles. Noting that expression is about changes, we present a model for expression classification based on facial landmark dynamics. Our model isolates the trajectory of facial fiducial points by wrapping them up in relevant features and discriminating among various alternatives with a machine learning classification system. The features used are geometric and temporal-based, and the classification system is represented by a late fusion framework that combines several neural networks with binary responses. The proposed method is robust, being able to handle complex expression classes.