Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine Learning

Full-Text Articles in Physical Sciences and Mathematics

Using Torchattacks To Improve The Robustness Of Models With Adversarial Training, William S. Matos Díaz Jan 2021

Cybersecurity: Deep Learning Driven Cybersecurity Research in a Multidisciplinary Environment

Adversarial training has proven to be one of the most successful ways to defend models against adversarial examples. This process consists of training a model with adversarial examples to improve the robustness of the model. In this experiment, Torchattacks, a PyTorch library made for importing adversarial examples more easily, was used to determine which attack was the strongest. Later on, the strongest attack was used to train the model and make it more robust against adversarial examples. The datasets used to perform the experiments were MNIST and CIFAR-10. Both datasets were put to the test using PGD, FGSM, and …


Hamlton: Gross Box Office And Sentiment Analysis For Broadway Shows, Allyson K. Nace Jan 2021

Computer Science: Student Scholarship & Creative Works

The term ‘Broadway’ refers to the live theater performances, either plays or musicals, that take place in the 41 professional 500-seat-or-more theaters located in the Theater District and Lincoln Center in New York City, NY. The data utilized originated from Playbill.com, and it supplies detailed Broadway grosses broken down by week, theater, and individual shows dating back to 1985. To supplement this, data for two other datasets was collected. The first dataset consists of Tony Award wins broken down by show (close to 800 total plays and musicals) for each Broadway season (year) dating back to the year 1997. The …


Predicting The 2020 Us Presidential Election, Aidan Hanley Jan 2021

Summer Scholarship, Creative Arts and Research Projects (SCARP)

The 2020 US Presidential Election was unique in many ways, and held a number of surprising results. Although the 2020 presidential election is over, there is insight to be gained by analyzing the factors that may influence a voter’s choice and building a model that could predict future presidential elections. Using the Democracy Fund Voter Study Group’s Nationscape public opinion survey, we’ve constructed a model using multiple logistic regression with L2 regularization for predicting which candidate a given respondent will vote for, taking into account how various factors such as age, ethnicity, education level, and orientation influence voter decisions. Our …


K-Nearest Neighbour Classifiers - A Tutorial, Padraig Cunningham, Sarah Jane Delany Jan 2021

Conference papers

Perhaps the most straightforward classifier in the arsenal of Machine Learning techniques is the Nearest Neighbour Classifier – classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of …
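
The nearest-neighbour idea summarized in this abstract can be illustrated with a minimal sketch. It is not taken from the tutorial itself: the function name `knn_classify`, the use of Euclidean distance, majority voting, and the toy data are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Classify `query` by majority vote among its k nearest examples.

    `examples` is a list of (feature_vector, label) pairs. Euclidean
    distance is an assumed similarity measure; other metrics could be used.
    """
    # Rank all stored examples by distance to the query and keep the k closest.
    neighbours = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    # The most frequent label among the neighbours is the predicted class.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-class toy data.
training = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
```

This brute-force version compares the query against every stored example, which is the run-time cost the abstract alludes to.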


Clustering Data To Classify Hearthstone Decks, Tim Inzitari Jan 2021

Williams Honors College, Honors Research Projects

The esports game of "Hearthstone" is a collectible card game with a competitive format that has every team submit 4 decks of 30 cards each. Using K-Means clustering, an adaptable way to group data for classification can be made that works well in every update of the game. This system will take in a list of decks and cluster them to easily classify large amounts of information in a timely fashion. This system will be able to be used by the University's esports department for years to come to aid the preparation of "Hearthstone" matches. This model uses qualities about …
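
As a rough illustration of clustering decks with K-Means, the sketch below runs the standard assign-and-update loop on toy data. The encoding of a deck as a vector of card counts, the four-card vocabulary, the function name `kmeans`, and the choice of the first k points as initial centroids are all assumptions made for this example, not details from the project.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-Means: returns (cluster index per point, centroids)."""
    # Assumed deterministic initialisation: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical decks over a 4-card vocabulary (real decks hold 30 cards).
decks = [(2, 2, 0, 0), (0, 0, 2, 2), (2, 1, 1, 0), (0, 1, 1, 2)]
labels, centres = kmeans(decks, k=2)
```

With this toy input the first and third decks end up in one cluster and the second and fourth in the other; on real data the result depends on the initialisation and on how decks are encoded.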


Goes-R Supervised Machine Learning, Ronald Adomako Jan 2021

Dissertations and Theses

The GOES-R series is a product line of four satellites, with two currently on-orbit (GOES-16 “East” and GOES-17 “West”). GOES-17 is susceptible to a Loop-Heat-Pipe (LHP) phenomenon where, during Fall and Spring seasons, there are times of day when some of the infrared bands record inaccurate readings from the Advanced Baseline Imager (ABI). This occurs from joint astronomical behavior and position of the GOES-17. This calibration issue occurs when the LHP instrument fails to radiate the heat of the sun out of ABI. Predictive Calibration (pCal) is an algorithm developed by instrument vendors for the National Oceanic and Atmospheric Administration (NOAA) …


Statistical And Machine Learning Approaches To Depressive Disorders Among Adults In The United States: From Factor Discovery To Prediction Evaluation, Minhwa Lee Jan 2021

Senior Independent Study Theses

According to the National Institute of Mental Health (NIMH), depressive disorders (or major depression) are considered one of the most common and serious health risks in the United States. Our study focuses on extracting non-medical factors of depressive disorders diagnosis, such as overall health states, health risk behaviors, demography, and healthcare access, using the Behavioral Risk Factor Surveillance System (BRFSS) data set collected by the Centers for Disease Control and Prevention (CDC) in 2018.

We set the two objectives of our study about depressive disorders diagnosis in the United States as follows. First, we aim to utilize machine learning algorithms …


Forecasting The Daily Percentage Of Delayed Flights Based On The National Weather Data, Parto Mahmoudi Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Flight delays cost airlines and affect passenger satisfaction. In this research work, we predicted the daily percentage of delayed flights based on the national weather data using multiple linear regression and random forest models. We extracted the passenger flight on-time performance data from the Bureau of Transportation Statistics and the weather dataset from NOAA National Centers for Environmental Information for the years from 2015 to 2019. We used the flight dataset for Seattle airport as the origin. We predicted the daily percentage of delayed flights for the Seattle-originated flights based on the features such as weather conditions of …


Cascaded Deep Learning Network For Postearthquake Bridge Serviceability Assessment, Youjeong Jang Jan 2021

Electronic Theses and Dissertations

Damage assessment of bridges is important for deciding serviceability and deriving an immediate response after severe events. In particular, past earthquakes have proven the vulnerability of bridges with insufficient detailing. Due to the lack of a national, unified post-earthquake inspection procedure for bridges, conventional damage assessments are performed by sending professional personnel on site to visually detect and measure the damage state. Obtaining accurate and fast damage results for a bridge's condition is important to save not only lives but also costs.

There have been studies using image processing techniques to assess damage to bridge columns without sending individuals on site. Convolutional …


Log Analysis And Visualization Of Hpc Application Performance Data, Ryan David Lewis Jan 2021

Graduate Research Theses & Dissertations

High-performance computing (HPC) resources at facilities such as Argonne National Laboratory's Leadership Computing Facility (ALCF) enable a wide array of scientific experiments and research applications. In day-to-day operation, these platforms collect copious amounts of system, performance, and debugging logs, capturing data about how jobs, individual tasks, and the system as a whole, operate and perform. This thesis builds on previous efforts to examine how these logs can be used to better understand user and application behavior and system resource usage, in addition to demonstrating machine-learning-based (ML-based) techniques for characterizing applications and predicting job behavior using log data. Five datasets collected …


Feature Investigation For Stock Returns Prediction Using Xgboost And Deep Learning Sentiment Classification, Seungho (Samuel) Lee Jan 2021

CMC Senior Theses

This paper attempts to quantify predictive power of social media sentiment and financial data in stock prediction by utilizing a comprehensive set of stock-related fundamental and technical variables and social media sentiments. For conducting sentiment analysis, this study employs a pretrained finBERT model that provides three different sentiment classifications and respective softmax scores. Hence, the significance of these variables is evaluated with XGBoost regression and Shapley Additive exPlanations (SHAP) frameworks. Through investigating feature importance, this study finds that statistical properties of sentiment variables provide a stronger predictive power than a weighted sentiment score and that it is possible to quantify …


Automatic Fall Risk Detection Based On Imbalanced Data, Yen-Hung Liu, Patrick C. K. Hung, Farkhund Iqbal, Benjamin C. M. Fung Jan 2021

All Works

In recent years, the declining birthrate and aging population have gradually brought countries into an ageing society. Regarding accidents that occur amongst the elderly, falls are an essential problem that quickly causes indirect physical loss. In this paper, we propose a pose estimation-based fall detection algorithm to detect fall risks. We use body ratio, acceleration and deflection as key features instead of using the body keypoint coordinates. Since fall data is rare in real-world situations, we train and evaluate our approach in a highly imbalanced data setting. We assess not only different imbalanced data handling methods but also different machine …


Source Code Comment Classification Artificial Intelligence, Cole Sutyak Jan 2021

Williams Honors College, Honors Research Projects

Source code comment classification is an important problem for future machine learning solutions. This is particularly true of supervised machine learning solutions whose data labels are largely subjective and difficult to obtain. Machine learning problems are problems largely because of a lack of data. In machine learning solutions, it is better to have a large amount of mediocre data than it is to have a small amount of good data. While the mediocre data might not produce the best accuracy, it produces the best results because there is much more to learn from the problem.

In this project, data …


Weakly Supervised Learning For Multi-Image Synthesis, Muhammad Usman Rafique Jan 2021

Theses and Dissertations--Electrical and Computer Engineering

Machine learning-based approaches have been achieving state-of-the-art results on many computer vision tasks. While deep learning and convolutional networks have been incredibly popular, these approaches come at the expense of huge amounts of labeled data required for training. Manually annotating large amounts of data, often millions of images in a single dataset, is costly and time consuming. To deal with the problem of data annotation, the research community has been exploring approaches that require less labeled data.

The central problem that we consider in this research is image synthesis without any manual labeling. Image synthesis is a classic …


Exposure Assessment Of Emerging Contaminants: Rapid Screening And Modeling Of Plant Uptake, Majid Bagheri Jan 2021

Doctoral Dissertations

"With the advent of new chemicals and their increasing uses in every aspect of our life, considerable number of emerging contaminants are introduced to environment yearly. Emerging contaminants in forms of pharmaceuticals, detergents, biosolids, and reclaimed wastewater can cross plant roots and translocate to various parts of the plants. Long-term human exposure to emerging contaminants through food consumption is assumed to be a pathway of interest. Thus, uptake and translocation of emerging contaminants in plants are important for the assessment of health risks associated with human exposure to emerging contaminants. To have a better understanding over fate of emerging contaminants …


K-Nearest Neighbors Density-Based Clustering, Avory C. Bryant Jan 2021

Theses and Dissertations

Traditional density-based clustering approaches rely on a distance-based parameter to define data connectivity and density. However, an appropriate value of this parameter can be difficult to determine as it is highly dependent on the underlying distribution of the data. In particular, distribution parameters affect the scale of inter-group distances (e.g., variance); this dependence leads to a well-known inability to simultaneously detect clusters at varying levels of density. In this work, connectivity and density are defined according to the rank-order induced by the distance metric (i.e., invariant to the expected scale of the distances). Connectivity by k-nearest neighbors and density by …


Relevance-Tcav: Explaining Deep Neural Nets In Human Concepts, Henning Fischel Jan 2021

Senior Projects Spring 2021

Neural Networks, a form of machine learning, are used in increasingly important roles in the modern world. They are being used in self-driving cars and medical diagnoses. However, they are “Black Boxes”: they cannot be easily interpreted by humans. This project combines two methods of explaining a neural network’s decisions in an attempt to improve their accuracy. This new method, relevance-based testing with concept activation vectors (R-TCAV), yields promising results on two small experiments but is less precise than the previous TCAV method.


Improving The Data Quality In Gravitation-Wave Detectors By Mitigating Transient Noise Artifacts, Kentaro Mogushi Jan 2021

Doctoral Dissertations

“The existence of gravitational waves (GWs), small perturbations in spacetime produced by accelerating massive objects, was first predicted in 1916 as solutions of Einstein’s Theory of General Relativity (Einstein, 1916). Detecting and analyzing GWs produced by sources allows us to probe astrophysical phenomena.

The era of GW astronomy began with the first direct detection of the coalescence of a binary black hole in 2015 by the collaboration of the advanced Laser Interferometer Gravitational-wave Observatory (LIGO) (Aasi et al., 2015) and advanced Virgo (Abbott et al., 2016a). Since 2015, LIGO-Virgo detected about 50 confident transient events of GW signals (Abbott et …


Super-Resolution Imaging Of Remote Sensed Brightness Temperature Using A Convolutional Neural Network, Kellen A. Donahue Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Steady improvements to the instruments used in remote sensing have led to much higher resolution data, often contemporaneous with lower resolution instruments that continue to collect data. There is a clear opportunity to reconcile recent high resolution satellite data with the lower resolution data of the past. Super-resolution (SR) imaging is a technique that increases the spatial resolution of image data by training statistical methods on simultaneously occurring lower and higher resolution data sets. The special sensor microwave/imager (SSMI) and advanced microwave scanning radiometer (AMSR2) brightness temperature data products are well suited to super-resolution imaging, and SR can be used …


Visualization For Solving Non-Image Problems And Saliency Mapping, Divya Chandrika Kalla Jan 2021

All Master's Theses

High-dimensional data play an important role in knowledge discovery and data science. Integration of visualization, visual analytics, machine learning (ML), and data mining (DM) is a key aspect of data science research for high-dimensional data. This thesis explores the efficiency of a new algorithm that converts non-image data into raster images by visualizing the data as heatmaps in the collocated paired coordinates (CPC). These images are called the CPC-R images and the algorithm that produces them is called the CPC-R algorithm. Powerful deep learning methods open an opportunity to solve non-image ML/DM problems by transforming non-image ML problems into …


A Compact Wavelength Meter Using A Multimode Fiber, Ogbole Collins Inalegwu Jan 2021

Masters Theses

“Wavelength meters are very important for precision measurements of both pulsed and continuous-wave optical sources. Conventional wavelength meters employ gratings, prisms, interferometers, and other wavelength-sensitive materials in their design. Here, we report a simple and compact wavelength meter based on a section of multimode fiber and a camera. The concept is to correlate the multimodal interference pattern (i.e., speckle pattern) at the end-face of a multimode fiber with the wavelength of the input light source. Through a series of experiments, specklegrams from the end face of a multimode fiber as captured by a charge-coupled device (CCD) camera were recorded; the images …


Neural Network Supervised And Reinforcement Learning For Neurological, Diagnostic, And Modeling Problems, Donald Wunsch Iii Jan 2021

Masters Theses

“As the medical world becomes increasingly intertwined with the tech sphere, machine learning on medical datasets and mathematical models becomes an attractive application. This research looks at the predictive capabilities of neural networks and other machine learning algorithms, and assesses the validity of several feature selection strategies to reduce the negative effects of high dataset dimensionality. Our results indicate that several feature selection methods can maintain high validation and test accuracy on classification tasks, with neural networks performing best, for both single class and multi-class classification applications. This research also evaluates a proof-of-concept application of a deep-Q-learning network (DQN) to …


Association Of Incident Cancer To Low-Value Care And Healthcare Cost Burden Among Elderly Medicare Beneficiaries, Chibuzo Iloabuchi Jan 2021

Graduate Theses, Dissertations, and Problem Reports

In the United States (US), 25% of healthcare spending is considered wasteful because it is spent reimbursing low-value care. Low-value care is the utilization of healthcare services, medical tests, and procedures that have unclear or no clinical benefit to patients but still expose them to risk. World-wide, low-value care imposes a significant economic burden on patients, payers, governments, and society. Cancer care among older adults (> 65 years) is one of the biggest drivers of healthcare expenditure in the US and accounts for nearly 40% of all spending, and low-value care among cancer patients is prevalent and contributes to the financial …


Detecting Surface Interactions Via A Wearable Microphone To Improve Augmented Reality Text Entry, R. Habibi Jan 2021

Dissertations, Master's Theses and Master's Reports

This thesis investigates whether we can detect and distinguish between surface interaction events such as tapping or swiping using a wearable mic from a surface. Also, what are the advantages of new text entry methods such as tapping with two fingers simultaneously to enter capital letters and punctuation? For this purpose, we conducted a remote study to collect audio and video of three different ways people might interact with a surface. We also built a CNN classifier to detect taps. Our results show that we can detect and distinguish between surface interaction events such as tap or swipe via a …


Measuring Machine Learning Model Uncertainty With Applications To Aerial Segmentation, Kevin James Cotton Jan 2021

CGU Theses & Dissertations

Machine learning model performance on both validation data and new data can be better measured and understood by leveraging uncertainty metrics at the time of prediction. These metrics can improve the model training process by indicating which training data need to be corrected and what part of the domain needs further annotation. The methods described have yet to reach mainstream adoption, and show great potential. Here, we survey the field of uncertainty metrics and provide a robust framework for its application to aerial segmentation. Uncertainty is divided into two types: aleatoric and epistemic. Aleatoric uncertainty arises from variations in training …


Distributed Load Testing By Modeling And Simulating User Behavior, Chester Ira Parrott Dec 2020

LSU Doctoral Dissertations

Modern human-machine systems such as microservices rely upon agile engineering practices which require changes to be tested and released more frequently than classically engineered systems. A critical step in the testing of such systems is the generation of realistic workloads or load testing. Generated workload emulates the expected behaviors of users and machines within a system under test in order to find potentially unknown failure states. Typical testing tools rely on static testing artifacts to generate realistic workload conditions. Such artifacts can be cumbersome and costly to maintain; however, even model-based alternatives can prevent adaptation to changes in a system …


Data: The Good, The Bad And The Ethical, John D. Kelleher, Filipe Cabral Pinto, Luis M. Cortesao Dec 2020

Articles

It is often the case with new technologies that it is very hard to predict their long-term impacts and as a result, although new technology may be beneficial in the short term, it can still cause problems in the longer term. This is what happened with oil by-products in different areas: the use of plastic as a disposable material did not take into account the hundreds of years necessary for its decomposition and its related long-term environmental damage. Data is said to be the new oil. The message to be conveyed is associated with its intrinsic value. But as in …


Representational Learning Approach For Predicting Developer Expertise Using Eye Movements, Sumeet Maan Dec 2020

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The thesis analyzes an existing eye-tracking dataset collected while software developers were solving bug fixing tasks in an open-source system. The analysis is performed using a representational learning approach, namely a Multi-layer Perceptron (MLP). The novel aspect of the analysis is the introduction of a new feature engineering method based on the eye-tracking data. This is then used to predict developer expertise on the data. The dataset used in this thesis is inherently more complex because it is collected in a very dynamic environment, i.e., the Eclipse IDE using an eye-tracking plugin, iTrace. Previous work in this area only worked on …


Attentional Parsing Networks, Marcus Karr Dec 2020

Master's Theses

Convolutional neural networks (CNNs) have dominated the computer vision field since the early 2010s, when deep learning largely replaced previous approaches like hand-crafted feature engineering and hierarchical image parsing. Meanwhile transformer architectures have attained preeminence in natural language processing, and have even begun to supplant CNNs as the state of the art for some computer vision tasks.

This study proposes a novel transformer-based architecture, the attentional parsing network, that reconciles the deep learning and hierarchical image parsing approaches to computer vision. We recast unsupervised image representation as a sequence-to-sequence translation problem where image patches are mapped to successive layers …


Cross Dataset Evaluation For Iot Network Intrusion Detection, Anjum Farah Dec 2020

Theses and Dissertations

With the advent of Internet of Things (IoT) technology, the need to ensure the security of an IoT network has become important. There are several intrusion detection systems (IDS) that are available for analyzing and predicting network anomalies and threats. However, it is challenging to evaluate them to realistically estimate their performance when deployed. A lot of research has been conducted where the training and testing are done using the same simulated dataset. However, realistically, a network on which an intrusion detection model is deployed will be very different from the network on which it was trained. The aim of …