Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

Articles 1171 - 1200 of 1687

Full-Text Articles in Physical Sciences and Mathematics

Highly Accurate Fragment Library For Protein Fold Recognition, Wessam Elhefnawy Apr 2019

Computer Science Theses & Dissertations

Proteins play a crucial role in living organisms as they perform many vital tasks in every living cell. Knowledge of protein folding has a deep impact on understanding the heterogeneity and molecular functions of proteins. Such information leads to crucial advances in drug design and disease understanding. Fold recognition is a key step in the protein structure discovery process, especially when traditional computational methods fail to yield convincing structural homologies. In this work, we present a new protein fold recognition approach using machine learning and data mining methodologies.

First, we identify a protein structural fragment library (Frag-K) composed of a …


A Data-Driven Approach For Modeling Agents, Hamdi Kavak Apr 2019

Computational Modeling & Simulation Engineering Theses & Dissertations

Agents are commonly created on a set of simple rules driven by theories, hypotheses, and assumptions. Such a modeling premise makes limited use of real-world data and is challenged when modeling real-world systems due to the lack of empirical grounding. Simultaneously, the last decade has witnessed the production and availability of large-scale data from various sensors that carry behavioral signals. These data sources have the potential to change the way we create agent-based models, from driven by simple rules to driven by data. Despite this opportunity, the literature has neglected to offer a modeling approach to generate granular agent behaviors from data, creating …


Evaluating Machine Learning Techniques For Smart Home Device Classification, Angelito E. Aragon Jr. Mar 2019

Theses and Dissertations

Smart devices in the Internet of Things (IoT) have transformed the management of personal and industrial spaces. Leveraging inexpensive computing, smart devices enable remote sensing and automated control over a diverse range of processes. Even as IoT devices provide numerous benefits, it is vital that their emerging security implications are studied. IoT device design typically focuses on cost efficiency and time to market, leading to limited built-in encryption, questionable supply chains, and poor data security. In a 2017 report, the United States Government Accountability Office recommended that the Department of Defense investigate the risks IoT devices pose to operations security, …


Confidence Inference In Defensive Cyber Operator Decision Making, Graig S. Ganitano Mar 2019

Theses and Dissertations

Cyber defense analysts face the challenge of validating machine generated alerts regarding network-based security threats. Operations tempo and systematic manpower issues have increased the importance of these individual analyst decisions, since they typically are not reviewed or changed. Analysts may not always be confident in their decisions. If confidence can be accurately assessed, then analyst decisions made under low confidence can be independently reviewed and analysts can be offered decision assistance or additional training. This work investigates the utility of using neurophysiological and behavioral correlates of decision confidence to train machine learning models to infer confidence in analyst decisions. Electroencephalography …


Characterization Of Tropical Cyclone Intensity Using Microwave Imagery, Amanda M. Nelson Mar 2019

Theses and Dissertations

In the absence of wind speed data from aircraft reconnaissance of tropical cyclones (TCs), analysts rely on remote sensing tools to estimate TC intensity. For over 40 years, the Dvorak technique has been applied to estimate intensity using visible and infrared (IR) satellite imagery, but its accuracy is sometimes limited when the radiative effects of high clouds obscure the TC convective structure below. Microwave imagery highlights areas of precipitation and deep convection revealing different patterns than visible and IR imagery. This study explores application of machine learning algorithms to identify patterns in microwave imagery to infer storm intensity, particularly focusing …


A Performance Comparison Of Machine Learning Algorithms For Arced Labyrinth Spillways, Fernando Salazar, Brian M. Crookston Mar 2019

Publications

Labyrinth weirs provide an economic option for flow control structures in a variety of applications, including as spillways at dams. The cycles of labyrinth weirs are typically placed in a linear configuration. However, numerous projects place labyrinth cycles along an arc to take advantage of reservoir conditions and dam alignment, and to reduce construction costs such as narrowing the spillway chute. Practitioners must optimize more than 10 geometric variables when developing a head–discharge relationship. This is typically done using the following tools: empirical relationships, numerical modeling, and physical modeling. This study applied a new tool, machine learning, to the analysis …


Computational Regiospecific Analysis Of Brain Lipidomic Profiles, Austin Ahlstrom Mar 2019

Undergraduate Honors Theses

Mass spectrometry provides an extensive data set that can prove unwieldy for practical analytical purposes. Applying programming and machine learning methods to automate region analysis in DESI mass spectrometry of mouse brain tissue can help direct and refine such an otherwise unusable data set. The results carry promise of faster, more reliable analysis of this type, and yield interesting insights into molecular characteristics of regions of interest within these brain samples. These results have significant implications in continued investigation of molecular processes in the brain, along with other aspects of mass spectrometry, collective analysis of biological molecules (i.e. omics), and …


Interim Performance Report, Lg‐71‐16‐0152‐16, Extending Intelligent Computational Image Analysis For Archival Discovery, March 2019, Elizabeth Lorang, Leen-Kiat Soh, John O'Brien Mar 2019

CDRH Grant Reports

The primary goal of "Extending Intelligent Computational Image Analysis for Archival Discovery" is to investigate the use of image analysis as a methodology for content identification, description, and information retrieval in digital libraries and other digitized collections. Building on work started under a National Endowment for the Humanities' Office of Digital Humanities Start-up Grant, our IMLS project seeks to 1) analyze and verify our previously developed image analysis approach and extend it so that it is newspaper agnostic, type agnostic, and language agnostic; 2) scale and revise the intelligent image analysis approach and determine the ideal balance between precision and …


Confusion Prediction From Eye-Tracking Data: Experiments With Machine Learning, Joni Salminen, Mridul Nagpal, Haewoon Kwak, Jisun An, Soon-Gyo Jung, Bernard J. Jansen Mar 2019

Research Collection School Of Computing and Information Systems

Predicting user confusion can help improve information presentation on websites, mobile apps, and virtual reality interfaces. One promising information source for such prediction is eye-tracking data about gaze movements on the screen. Coupled with think-aloud records, we explore whether users' confusion is correlated primarily with fixation-level features. We find that random forest achieves an accuracy of more than 70% when predicting user confusion using only fixation features. In addition, adding user-level features (age and gender) improves the accuracy to more than 90%. We also find that balancing the classes before training improves performance. We test two balancing algorithms, Synthetic Minority …


Kaggle And Click-Through Rate Prediction, Todd W. Neller Feb 2019

Computer Science Faculty Publications

Neller presented a look at Kaggle.com, an online Data Science and Machine Learning learning community, as a place to seek rapid, experiential peer education for most any Data Science topic. Using the specific challenge of Click-Through Rate Prediction (CTRP), he focused on lessons learned from relevant Kaggle competitions on how to perform CTRP.


Stock Market Prediction Analysis By Incorporating Social And News Opinion And Sentiment, Zhaoxia Wang, Seng-Beng Ho, Zhiping Lin Feb 2019

Research Collection School Of Computing and Information Systems

A stock's price is an important indicator for a company, and many factors can affect its value. Different events may affect public sentiments and emotions differently, which may have an effect on the trend of stock market prices. Because of their dependency on various factors, stock prices are not static, but are instead dynamic, highly noisy and nonlinear time series data. Due to its great learning capability for solving nonlinear time series prediction problems, machine learning has been applied to this research area. Learning-based methods for stock price prediction are very popular and a lot of enhanced …


Spectral Clustering For Electrical Phase Identification Using Advanced Metering Infrastructure Voltage Time Series, Logan Blakely Jan 2019

Dissertations and Theses

The increasing demand for and prevalence of distributed energy resources (DER) such as solar power, electric vehicles, and energy storage, present a unique set of challenges for integration into a legacy power grid, and accurate models of the low-voltage distribution systems are critical for accurate simulations of DER. Accurate labeling of the phase connections for each customer in a utility model is one area of grid topology that is known to have errors and has implications for the safety, efficiency, and hosting capacity of a distribution system. This research presents a methodology for the phase identification of customers solely using …


Knowing Without Knowing: Real-Time Usage Identification Of Computer Systems, Leila Mohammed Hawana Jan 2019

Dissertations and Theses

Contemporary computers attempt to understand a user's actions and preferences in order to make decisions that better serve the user. In pursuit of this goal, computers can make observations that range from simple pattern recognition to listening in on conversations without the device being intentionally active. While these developments are incredibly useful for customization, the inherent security risks involving personal data are not always worth it. This thesis attempts to tackle one issue in this domain, computer usage identification, and presents a solution that identifies high-level usage of a system at any given moment without looking into any personal data. …


The Benefits Of Artificial Intelligence In Cybersecurity, Ricardo Calderon Jan 2019

Economic Crime Forensics Capstones

Cyberthreats have increased extensively during the last decade. Cybercriminals have become more sophisticated. Current security controls are not enough to defend networks from the number of highly skilled cybercriminals. Cybercriminals have learned how to evade the most sophisticated tools, such as Intrusion Detection and Prevention Systems (IDPS), and botnets are almost invisible to current tools. Fortunately, the application of Artificial Intelligence (AI) may increase the detection rate of IDPS systems, and Machine Learning (ML) techniques are able to mine data to detect botnets’ sources. However, the implementation of AI may bring other risks, and cybersecurity experts need to find a …


Dc-Rts Noise: Observation And Analysis, Benjamin William Hendrickson Jan 2019

Dissertations and Theses

Dark current random telegraph signal (DC-RTS) is a physical phenomenon that affects the performance of solid state image sensors. Identified by meta-stable stochastic switching between two or more dark current levels, DC-RTS is an emerging concern for device scientists and manufacturers as a limiting noise source. Observed and studied in both charge coupled devices (CCDs) and complementary metal-oxide-semiconductor (CMOS) image sensors, the metastable defects inside the device structure that give rise to this switching phenomenon are known to be derived from radiation damage. An examination of the relationship between high energy photon damage and these RTS defects is presented and …


Assessment Of Post-Wildfire Debris Flow Occurrence Using Classifier Tree, Priscilla Addison, Thomas Oommen, Qiuying Sha Jan 2019

Michigan Tech Publications

Besides the dangers of an actively burning wildfire, a plethora of other hazardous consequences can occur afterwards. Debris flows are among the most hazardous of these, being known to cause fatalities and extensive damage to infrastructure. Although debris flows are not exclusive to fire affected areas, a wildfire can increase a location’s susceptibility by stripping its protective covers like vegetation and introducing destabilizing factors such as ash filling soil pores to increase runoff potential. Due to the associated dangers, researchers are developing statistical models to isolate susceptible locations. Existing models predominantly employ the logistic regression algorithm; however, previous studies have …


Intelligent Diagnosis Of Aircraft Electrical Faults Based On Rmbp Neural Network, Lishan Jia, Zhe Liu, Sun Yi Jan 2019

Journal of System Simulation

Abstract: In view of the characteristics of aircraft electrical fault maintenance in civil aviation (multiple properties, difficulty of removal, and high cost in time and manpower), the construction of an intelligent aircraft electrical fault diagnosis system using an RMBP neural network is proposed. The RMBP algorithm is used to learn from sample data in the intelligent fault diagnosis system because it overcomes the long convergence time and the tendency to fall into local minima of the common BP algorithm, and is suitable for training large-scale neural networks. Experience data are collected, samples are made, and sample training and experiments are carried out. …


Research Of Nonlinear Time Series Prediction Method For Motion Capture, Tianyu Huang, Yunying Guo Jan 2019

Journal of System Simulation

Abstract: In this paper, we study a nonlinear time series prediction method for motion capture. A prediction method based on the capture data is studied and implemented by analyzing human motion data to solve the data loss and correction problem caused by sensor failure. To this end, the simulation experiment assumes that a sensor in the sequence of actions fails, then applies eight machine learning methods and evaluates them with six indexes. The prediction results of the different methods are compared and the predicted motions are visualized. Through the experiments, data prediction accuracy by random forest, decision …


Optimizing Control Of Total Heat Supply Based On Machine Learning, Li Qi, Xingqi Hu, Jianmin Zhao Jan 2019

Journal of System Simulation

Abstract: The central heating system has a complex structure and exhibits hysteresis, strong coupling, and nonlinearity. To address the difficulty of identifying and controlling the process through mechanism modeling, an optimal control method for total heat production at the heat source based on machine learning is proposed. The heat source model of the central heating system is established using a BP neural network and a long short-term memory neural network. Under the premise of meeting the demand of heating quality, with the total energy consumption as the optimization objective, the optimal control sequence of water supply temperature and water …


The New Legal Landscape For Text Mining And Machine Learning, Matthew Sag Jan 2019

Faculty Articles

Now that the dust has settled on the Authors Guild cases, this Article takes stock of the legal context for TDM research in the United States. This reappraisal begins in Part I with an assessment of exactly what the Authors Guild cases did and did not establish with respect to the fair use status of text mining. Those cases held unambiguously that reproducing copyrighted works as one step in the process of knowledge discovery through text data mining was transformative, and thus ultimately a fair use of those works. Part I explains why those rulings followed inexorably from copyright's most …


The Paradox Of Big Data, Gary N. Smith Jan 2019

Pomona Economics

Data mining is often used to discover patterns in Big Data. It is tempting to believe that because an unearthed pattern is unusual it must be meaningful, but patterns are inevitable in Big Data and usually meaningless. The paradox of Big Data is that data mining is most seductive when there are a large number of variables, but a large number of variables exacerbates the perils of data mining.


Law's Halo And The Moral Machine, Bert I. Huang Jan 2019

Faculty Scholarship

How will we assess the morality of decisions made by artificial intelligence – and will our judgments be swayed by what the law says? Focusing on a moral dilemma in which a driverless car chooses to sacrifice its passenger to save more people, this study offers evidence that our moral intuitions can be influenced by the presence of the law.


Transfer Learning For Detecting Unknown Network Attacks, Juan Zhao, Sachin Shetty, Jan Wei Pan, Charles Kamhoua, Kevin Kwiat Jan 2019

VMASC Publications

Network attacks are serious concerns in today’s increasingly interconnected society. Recent studies have applied conventional machine learning to network attack detection by learning the patterns of the network behaviors and training a classification model. These models usually require large labeled datasets; however, the rapid pace and unpredictability of cyber attacks make this labeling impossible in real time. To address these problems, we proposed utilizing transfer learning for detecting new and unseen attacks by transferring the knowledge of the known attacks. In our previous work, we have proposed a transfer learning-enabled framework and approach, called HeTL, which can find the common …


Automatically Extracting Meaning From Legal Texts: Opportunities And Challenges, Kevin D. Ashley Jan 2019

Articles

This paper examines impressive new applications of legal text analytics in automated contract review, litigation support, conceptual legal information retrieval, and legal question answering against the backdrop of some pressing technological constraints. First, artificial intelligence (AI) programs cannot read legal texts like lawyers can. Using statistical methods, AI can only extract some semantic information from legal texts. For example, it can use the extracted meanings to improve retrieval and ranking, but it cannot yet extract legal rules in logical form from statutory texts. Second, machine learning (ML) may yield answers, but it cannot explain its answers to legal questions or …


Credit Risk Analysis In Peer To Peer Lending Data Set: Lending Club, Mohammad Mubasil Bokhari Jan 2019

Senior Projects Spring 2019

This project studies the classification variable ‘default’ in the peer-to-peer lending dataset known as Lending Club. The project improved on existing work in terms of accuracy, F-1 measure, precision, recall, and root mean squared error. We explored balancing techniques such as oversampling the minority class, undersampling the majority class, and random forests with balanced bootstraps. We also analyzed and proposed new features that improve the learner's performance.


Hierarchical Cluster Analysis: A New Type Of Ranking Criteria Based On Arwu Ranking Data, Zhengshuo Li Jan 2019

Dissertations

The advent of big data leads to many applications of Machine Learning techniques. University rankings is one of the applicable domains, which is currently playing a crucial role in the assessment of the universities' performance. Currently, the rankings are usually carried out by some authoritative ranking institutions by means of weighting techniques and the results are conveyed in numerical rankings. Three of the most famous university ranking institutions have been introduced from a technical perspective. However, these institutions have been proven to be subjective in relation to their data selection and weighting method.


The D&D Sorting Hat: Predicting Dungeons And Dragons Characters From Textual Backstories, Joseph C. Macinnes Jan 2019

Senior Independent Study Theses

Dungeons and Dragons is a tabletop roleplaying game which focuses heavily on character interaction and creating narratives. The current state of the game's character creation process often bogs down new players in decisions related to game mechanics, not a character's identity and personality. This independent study investigates the use of machine learning and natural language processing to make these decisions for a player based on their character's backstory - the textual biography or description of a character. The study presents a collection of existing characters and uses these examples to create a family of models capable of predicting a character's …


Learning To Map The Visual And Auditory World, Tawfiq Salem Jan 2019

Theses and Dissertations--Computer Science

The appearance of the world varies dramatically not only from place to place but also from hour to hour and month to month. Billions of images that capture this complex relationship are uploaded to social-media websites every day and often are associated with precise time and location metadata. This rich source of data can be beneficial to improve our understanding of the globe. In this work, we propose a general framework that uses these publicly available images for constructing dense maps of different ground-level attributes from overhead imagery. In particular, we use well-defined probabilistic models and a weakly-supervised, multi-task training …


Work-In-Progress Reports Submitted To The Library Of Congress As Part Of Digital Libraries, Intelligent Data Analytics, And Augmented Description, Chulwoo Pack, Yi Liu, Leen-Kiat Soh, Elizabeth Lorang Jan 2019

CSE Technical Reports

This document includes work-in-progress reports submitted to the Library of Congress as part of the Aida digital libraries research team's work on Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project. These work-in-progress reports provide a snapshot glimpse, as well as underlying rationale and decision-making, at various points in the development of the project and its machine learning explorations. Reports cover explorations on historic newspapers, minimally-processed manuscript collections, materials digitized from physical originals and those digitized from microform surrogates, and investigate challenges related to image segmentation and document zoning, classification, document image quality analysis, metadata generation, and more.


The Use Of Deep Learning Distributed Representations In The Identification Of Abusive Text, Susan Mckeever, Hao Chen, Sarah Jane Delany Jan 2019

Conference papers

The selection of optimal feature representations is a critical step in the use of machine learning in text classification. Traditional features (e.g. bag of words and n-grams) have dominated for decades, but in the past five years, the use of learned distributed representations has become increasingly common. In this paper, we summarise and present a categorisation of the state-of-the-art distributed representation techniques, including word and sentence embedding models. We carry out an empirical analysis of the performance of the various feature representations using the scenario of detecting abusive comments. We compare classification accuracies across a range of off-the-shelf embedding models …