Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

Articles 1351 - 1380 of 1687

Full-Text Articles in Physical Sciences and Mathematics

Evolution Of Bias In Human And Machine Learning Algorithm Interaction, Wenlong Sun, Olfa Nasraoui, Patrick Shafto Oct 2017

Commonwealth Computational Summit

Human algorithm interaction:

  • people are now affected by the output of all types of machine learning algorithms.
  • social media, blogs, social networks, and other services and applications.

Motivation:

  • ML algorithms relied on reliable labels from experts to build predictions.
  • However, ML algorithms have started to receive data from the more general population.
  • The interaction leads to biased results, caused by ingesting unchecked information from the general population, such as biased samples and biased labels.


A Comparative Study On Machine Learning Algorithms For Network Defense, Abdinur Ali, Yen-Hung Hu, Chung-Chu (George) Hsieh, Mushtaq Khan Oct 2017

Virginia Journal of Science

Network security specialists use machine learning algorithms to detect computer network attacks and prevent unauthorized access to their networks. Traditionally, signature and anomaly detection techniques have been used for network defense. However, detection techniques must adapt to keep pace with continuously changing security attacks. Because machine learning algorithms learn from experience, they are appropriate tools for this adaptation. In this paper, ten machine learning algorithms were trained with the labeled KDD99 dataset and then tested with a different, unlabeled dataset. The researchers investigate the speed and the efficiency of these machine learning algorithms in terms of several …


Feature Space Augmentation: Improving Prediction Accuracy Of Classical Problems In Cognitive Science And Computer Vison, Piyush Saxena Oct 2017

Dissertations (1934 -)

The prediction accuracy in many classical problems across multiple domains has seen a rise since computational tools such as multi-layer neural nets and complex machine learning algorithms have become widely accessible to the research community. In this research, we take a step back and examine the feature space in two problems from very different domains. We show that novel augmentation to the feature space yields higher performance. Emotion Recognition in Adults from a Control Group: The objective is to quantify the emotional state of an individual at any time using data collected by wearable sensors. We define emotional state as …


Vungle Inc. Improves Monetization Using Big-Data Analytics, Bert De Reyck, Ioannis Fragkos, Yael Gruksha-Cockayne, Casey Lichtendahl, Hammond Guerin, Andre Kritzer Oct 2017

Research Collection Lee Kong Chian School Of Business

The advent of big data has created opportunities for firms to customize their products and services to unprecedented levels of granularity. Using big data to personalize an offering in real time, however, remains a major challenge. In the mobile advertising industry, once a customer enters the network, an ad-serving decision must be made in a matter of milliseconds. In this work, we describe the design and implementation of an ad-serving algorithm that incorporates machine-learning methods to make personalized ad-serving decisions within milliseconds. We developed this algorithm for Vungle Inc., one of the largest global mobile ad networks. Our approach also …


Exploring The Internal Statistics: Single Image Super-Resolution, Completion And Captioning, Yang Xian Sep 2017

Dissertations, Theses, and Capstone Projects

Image enhancement has drawn increasing attention as a means of improving image quality or interpretability. It aims to modify images to achieve a better perception for the human visual system or a more suitable representation for further analysis in a variety of applications such as medical imaging, remote sensing, and video surveillance. Based on different attributes of the given input images, enhancement tasks vary, e.g., noise removal, deblurring, resolution enhancement, and prediction of missing pixels. The latter two are usually referred to as image super-resolution and image inpainting (or completion).

Image super-resolution and completion are numerically ill-posed problems. Multi-frame-based approaches make use of the …


Lidar Aboveground Vegetation Biomass Estimates In Shrublands: Prediction, Uncertainties And Application To Coarser Scales, Aihua Li, Shital Dhakal, Nancy F. Glenn, Lucas P. Spaete Sep 2017

Geosciences Faculty Publications and Presentations

Our study objectives were to model the aboveground biomass in a xeric shrub-steppe landscape with airborne light detection and ranging (Lidar) and explore the uncertainty associated with the models we created. We incorporated vegetation vertical structure information obtained from Lidar with ground-measured biomass data, allowing us to scale shrub biomass from small field sites (1 m subplots and 1 ha plots) to a larger landscape. A series of airborne Lidar-derived vegetation metrics were trained and linked with the field-measured biomass in Random Forests (RF) regression models. A Stepwise Multiple Regression (SMR) model was also explored as a comparison. Our results …


Inferring Spread Of Readers’ Emotion Affected By Online News, Agus Sulistya, Ferdian Thung, David Lo Sep 2017

Research Collection School Of Computing and Information Systems

Depending on the reader, a news article may be viewed from many different perspectives, thus triggering different (and possibly contradicting) emotions. In this paper, we formulate a problem of predicting readers’ emotion distribution affected by a news article. Our approach analyzes affective annotations provided by readers of news articles taken from a non-English online news site. We create a new corpus from the annotated articles, and build a domain-specific emotion lexicon and word embedding features. We finally construct a multi-target regression model from a set of features extracted from online news articles. Our experiments show that by combining lexicon and …


Sugarmate: Non-Intrusive Blood Glucose Monitoring With Smartphones, Weixi Gu, Yuxun Zhou, Zimu Zhou, Xi Liu, Han Zou, Pei Zhang, Costas J. Spanos, Lin Zhang Sep 2017

Research Collection School Of Computing and Information Systems

Inferring abnormal glucose events such as hyperglycemia and hypoglycemia is crucial for the health of both diabetic patients and non-diabetic people. However, regular blood glucose monitoring can be invasive and inconvenient in everyday life. We present SugarMate, a first smartphone-based blood glucose inference system as a temporary alternative to continuous blood glucose monitors (CGM) when they are uncomfortable or inconvenient to wear. In addition to the records of food, drug and insulin intake, it leverages smartphone sensors to measure physical activities and sleep quality automatically. Provided with the imbalanced and often limited measurements, a challenge of SugarMate is the inference …


Nonparametric Variable Importance Assessment Using Machine Learning Techniques, Brian D. Williamson, Peter B. Gilbert, Noah Simon, Marco Carone Aug 2017

UW Biostatistics Working Paper Series

In a regression setting, it is often of interest to quantify the importance of various features in predicting the response. Commonly, the variable importance measure used is determined by the regression technique employed. For this reason, practitioners often only resort to one of a few regression techniques for which a variable importance measure is naturally defined. Unfortunately, these regression techniques are often sub-optimal for predicting response. Additionally, because the variable importance measures native to different regression techniques generally have a different interpretation, comparisons across techniques can be difficult. In this work, we study a novel variable importance measure that can …


Improving Pure-Tone Audiometry Using Probabilistic Machine Learning Classification, Xinyu Song Aug 2017

McKelvey School of Engineering Theses & Dissertations

Hearing loss is a critical public health concern, affecting hundreds of millions of people worldwide and dramatically impacting quality of life for affected individuals. While treatment techniques have evolved in recent years, methods for assessing hearing ability have remained relatively unchanged for decades. The standard clinical procedure is the modified Hughson-Westlake procedure, an adaptive pure-tone detection task that is typically performed manually by audiologists, costing millions of collective hours annually among healthcare professionals. In addition to the high burden of labor, the technique provides limited detail about an individual’s hearing ability, estimating only detection thresholds at a handful of pre-defined pure-tone …


Effect Of Label Noise On The Machine-Learned Classification Of Earthquake Damage, Jared Frank, Umaa Rebbapragada, James Bialas, Thomas Oommen, Timothy C. Havens Aug 2017

Michigan Tech Publications

Automated classification of earthquake damage in remotely-sensed imagery using machine learning techniques depends on training data, or data examples that are labeled correctly by a human expert as containing damage or not. Mislabeled training data are a major source of classifier error due to the use of imprecise digital labeling tools and crowdsourced volunteers who are not adequately trained on or invested in the task. The spatial nature of remote sensing classification leads to the consistent mislabeling of classes that occur in close proximity to rubble, which is a major byproduct of earthquake damage in urban areas. In this study, …


Applying Machine Learning To Computational Chemistry: Can We Predict Molecular Properties Faster Without Compromising Accuracy?, Hanjing Xu, Pradeep Gurunathan, Lyudmila Slipchenko Aug 2017

The Summer Undergraduate Research Fellowship (SURF) Symposium

Non-covalent interactions are crucial in analyzing protein folding and structure, the function of DNA and RNA, the structures of molecular crystals and aggregates, and many other processes in the fields of biology and chemistry. However, calculating such interactions using quantum-mechanical formulations is time- and resource-consuming. Our group has proposed previously that the effective fragment potential (EFP) method could serve as an efficient alternative to solve this problem. However, one of the computational bottlenecks of the EFP method is obtaining parameters for each molecule/fragment in the system, before the actual EFP simulations can be carried out. Here we present a …


Machine Learning In XENON1T Analysis, Dillon A. Davis, Rafael F. Lang, Darryl P. Masson Aug 2017

The Summer Undergraduate Research Fellowship (SURF) Symposium

In the process of analyzing large amounts of quantitative data, it can be quite time consuming and challenging to uncover populations of interest contained amongst the background data. Therefore, the ability to partially automate the process while gaining additional insight into the interdependencies of key parameters via machine learning seems quite appealing. As of now, the primary means of reviewing the data is by manually plotting data in different parameter spaces to recognize key features, which is slow and error prone. In this experiment, many well-known machine learning algorithms were applied to a dataset to attempt to semi-automatically identify known populations, …


Predicting Locations Of Pollution Sources Using Convolutional Neural Networks, Yiheng Chi, Nickolas D. Winovich, Guang Lin Aug 2017

The Summer Undergraduate Research Fellowship (SURF) Symposium

Pollution is a severe problem today, and the main challenge in controlling and eliminating water and air pollution is detecting and locating pollution sources. This research project aims to predict the locations of pollution sources given diffusion information of pollution in the form of array or image data. These predictions are done using machine learning. The relations between time, location, and pollution concentration are first formulated as pollution diffusion equations, which are partial differential equations (PDEs), and then deep convolutional neural networks are built and trained to solve these PDEs. The convolutional neural networks consist of convolutional layers, ReLU layers …


Asymptotically Unbiased Estimation Of A Nonsymmetric Dependence Measure Applied To Sensor Data Analytics And Financial Time Series, Angel Cațaron, Razvan Andonie, Yvonne Chueh Aug 2017

All Faculty Scholarship for the College of the Sciences

A fundamental concept frequently applied to statistical machine learning is the detection of dependencies between unknown random variables found from data samples. In previous work, we have introduced a nonparametric unilateral dependence measure based on Onicescu’s information energy and a kNN method for estimating this measure from an available sample set of discrete or continuous variables. This paper provides the formal proofs which show that the estimator is asymptotically unbiased and has asymptotic zero variance when the sample size increases. It implies that the estimator has good statistical qualities. We investigate the performance of the estimator for data analysis applications …


Speech Processing Approach For Diagnosing Dementia In An Early Stage, Roozbeh Sadeghian, J. David Schaffer, Stephen A. Zahorian Aug 2017

Faculty Works

The clinical diagnosis of Alzheimer’s disease and other dementias is very challenging, especially in the early stages. Our hypothesis is that any disease that affects particular brain regions involved in speech production and processing will also leave detectable fingerprints in the speech. Computerized analysis of speech signals and computational linguistics have progressed to the point where an automatic speech analysis system is a promising approach for a low-cost non-invasive diagnostic tool for early detection of Alzheimer’s disease.

We present empirical evidence that strong discrimination between subjects with a diagnosis of probable Alzheimer’s versus matched normal controls can be achieved …


Analyzing The Relationship Between Human Behavior And Indoor Air Quality, Beiyu Lin, Yibo Huangfu, Nathan Lima, Bertram Jobson, Max Kirk, Patrick O’Keeffe, Shelley N. Pressley, Von Walden, Brian Lamb, Diane J. Cook Aug 2017

Computer Science Faculty Publications and Presentations

In the coming decades, as we experience global population growth and global aging issues, there will be corresponding concerns about the quality of the air we experience inside and outside buildings. Because we can anticipate that there will be behavioral changes that accompany population growth and aging, we examine the relationship between home occupant behavior and indoor air quality. To do this, we collect both sensor-based behavior data and chemical indoor air quality measurements in smart home environments. We introduce a novel machine learning-based approach to quantify the correlation between smart home features and chemical measurements of air quality, and …


Accurate And Justifiable : New Algorithms For Explainable Recommendations., Behnoush Abdollahi Aug 2017

Electronic Theses and Dissertations

Websites and online services thrive with large amounts of online information, products, and choices that are available but exceedingly difficult to find and discover. This has prompted two major paradigms to help sift through information: information retrieval and recommender systems. The broad family of information retrieval techniques has given rise to the modern search engines which return relevant results, following a user's explicit query. The broad family of recommender systems, on the other hand, works in a more subtle manner, and does not require an explicit query to provide relevant results. Collaborative Filtering (CF) recommender systems are based on algorithms …


Dynamic Adversarial Mining - Effectively Applying Machine Learning In Adversarial Non-Stationary Environments., Tegjyot Singh Sethi Aug 2017

Electronic Theses and Dissertations

While understanding of machine learning and data mining is still in its budding stages, their engineering applications have found immense acceptance and success. Cybersecurity applications such as intrusion detection systems, spam filtering, and CAPTCHA authentication have all begun adopting machine learning as a viable technique to deal with large scale adversarial activity. However, the naive usage of machine learning in an adversarial setting is prone to reverse engineering and evasion attacks, as most of these techniques were designed primarily for a static setting. The security domain is a dynamic landscape, with an ongoing, never-ending arms race …


Constructing Interactive Visual Classification, Clustering And Dimension Reduction Models For N-D Data, Boris Kovalerchuk, Dmytro Dovhalets Jul 2017

Computer Science Faculty Scholarship

The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, the loss of information, occlusion, and clutter continue to be a challenge. This paper proposes and explores a new interactive method for visual discovery of n-D relations for supervised learning. The method includes automatic, interactive, and combined algorithms for discovering linear relations, dimension reduction, and generalization for non-linear relations. This method is a special category of reversible General Line Coordinates (GLC). It produces graphs in 2-D that represent …


Classification With Large Sparse Datasets: Convergence Analysis And Scalable Algorithms, Xiang Li Jul 2017

Electronic Thesis and Dissertation Repository

Large and sparse datasets, such as user ratings over a large collection of items, are common in the big data era. Many applications need to classify the users or items based on the high-dimensional and sparse data vectors, e.g., to predict the profitability of a product or the age group of a user, etc. Linear classifiers are popular choices for classifying such datasets because of their efficiency. In order to classify the large sparse data more effectively, the following important questions need to be answered.

1. Sparse data and convergence behavior. How different properties of a dataset, such as …


Identifying Twitter Spam By Utilizing Random Forests, Humza S. Haider Jul 2017

Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal

The use of Twitter has rapidly grown since the first tweet in 2006. The number of spammers on Twitter shows a similar increase. Classifying users into spammers and non-spammers has been heavily researched, and new methods for spam detection are developing rapidly. One of these classification techniques is known as random forests. We examine three studies that employ random forests using user based features, geo-tagged features, and time dependent features. Each study showed high accuracy rates and F-measures with the exception of one model that had a test set with a more realistic proportion of spam relative to typical testing …


Deep Learning On Lie Groups For Skeleton-Based Action Recognition, Zhiwu Huang, C. Wan, T. Probst, L. Van Gool Jul 2017

Research Collection School Of Computing and Information Systems

In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones, which are aligned better in the temporal domain. To reduce the high feature …


Speech Based Machine Learning Models For Emotional State Recognition And PTSD Detection, Debrup Banerjee Jul 2017

Electrical & Computer Engineering Theses & Dissertations

Recognition of emotional state and diagnosis of trauma related illnesses such as posttraumatic stress disorder (PTSD) using speech signals have been active research topics over the past decade. A typical emotion recognition system consists of three components: speech segmentation, feature extraction and emotion identification. Various speech features have been developed for emotional state recognition which can be divided into three categories, namely, excitation, vocal tract and prosodic. However, the capabilities of different feature categories and advanced machine learning techniques have not been fully explored for emotion recognition and PTSD diagnosis. For PTSD assessment, clinical diagnosis through structured interviews is a …


Machine Learning With Scattering Transforms, Jacob Hansen, Gus Hart Jun 2017

Journal of Undergraduate Research

Our goal was to implement scattering transforms as a mathematical representation of materials. The intention of this project was to build intuition on this technique using model data in one and two dimensions. The tools created here will be used as templates in further projects on real materials data. The intuition built during this project is crucial to the machine learning framework for materials design that we hope to build in the near future.


Back To The Future: Logic And Machine Learning, Simon Dobnik, John D. Kelleher Jun 2017

Conference papers

In this paper we argue that since the beginning of natural language processing and computational linguistics there has been a strong connection between logic and machine learning. First of all, there is something logical about language or linguistic about logic. Secondly, we argue that rather than distinguishing between logic and machine learning, a more useful distinction is between top-down approaches and data-driven approaches. Examining some recent approaches in deep learning, we argue that they incorporate both properties and that this is the reason for their very successful adoption to solve several problems within language technology.


Solving Algorithmic Problems In Finitely Presented Groups Via Machine Learning, Jonathan Gryak Jun 2017

Dissertations, Theses, and Capstone Projects

Machine learning and pattern recognition techniques have been successfully applied to algorithmic problems in free groups. In this dissertation, we seek to extend these techniques to finitely presented non-free groups, in particular to polycyclic and metabelian groups that are of interest to non-commutative cryptography.

As a prototypical example, we utilize supervised learning methods to construct classifiers that can solve the conjugacy decision problem, i.e., determine whether or not a pair of elements from a specified group are conjugate. The accuracies of classifiers created using decision trees, random forests, and N-tuple neural network models are evaluated for several non-free groups. …


The OGCleaner: Detecting False-Positive Sequence Homology, Masaki Stanley Fujimoto Jun 2017

Theses and Dissertations

Within bioinformatics, phylogenetics is the study of the evolutionary relationships between different species and organisms. The genetic revolution has caused an explosion in the amount of raw genomic information that is available to scientists for study. While there has been an explosion in available data, analysis methods have lagged behind. A key task in phylogenetics is identifying homology clusters. Current methods rely on using heuristics based on pairwise sequence comparison to identify homology clusters. We propose the Orthology Group Cleaner (the OGCleaner) as a method to evaluate cluster level verification of putative homology clusters in order to create higher quality …


Employing Smartwatch For Enhanced Password Authentication, Bing Chang, Ximing Liu, Yingjiu Li, Pingjian Wang, Wen-Tao Zhu, Zhan Wang Jun 2017

Research Collection School Of Computing and Information Systems

This paper presents an enhanced password authentication scheme by systematically exploiting the motion sensors in a smartwatch. We extract unique features from the sensor data when a smartwatch bearer types his/her password (or PIN), and train certain machine learning classifiers using these features. We then implement smartwatch-aided password authentication using the classifiers. Our scheme is user-friendly since it does not require users to perform any additional actions when typing passwords or PINs other than wearing smartwatches. We conduct a user study involving 51 participants on the developed prototype so as to evaluate its feasibility and performance. Experimental results show that …


Tackling The Interleaving Problem In Activity Discovery, Eoin Rogers, Robert J. Ross, John D. Kelleher Jun 2017

Conference papers

Activity discovery (AD) is the unsupervised process of discovering activities in data produced from streaming sensor networks that are recording the actions of human subjects. One major challenge for AD systems is interleaving, the tendency for people to carry out multiple activities at a time in parallel. Following on from our previous work, we continue to investigate AD in interleaved datasets, with a view towards progressing the state-of-the-art for AD.