Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

Articles 1561 - 1590 of 1686

Full-Text Articles in Physical Sciences and Mathematics

Anticipating The Friction Coefficient Of Friction Materials Used In Automobiles By Means Of Machine Learning Without Using A Test Instrument, Mustafa Timur, Fatih Aydin Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

The most important factor for designs in which friction materials are used is the coefficient of friction. The coefficient of friction has been determined taking such variants as velocity, temperature, and pressure into account, which arise from various factors in friction materials, and by analyzing the effects of these variants on friction materials. Many test instruments have been produced in order to determine the coefficient of friction. In this article, a study about the use of machine learning algorithms instead of test instruments in order to determine the coefficient of friction is presented. Isotonic regression was selected as the machine …


Improved Cardiovascular Risk Prediction Using Nonparametric Regression And Electronic Health Record Data, Edward Kennedy, Wyndy Wiitala, Rodney Hayward, Jeremy Sussman Dec 2012

Edward H. Kennedy

Use of the electronic health record (EHR) is expected to increase rapidly in the near future, yet little research exists on whether analyzing internal EHR data using flexible, adaptive statistical methods could improve clinical risk prediction. Extensive implementation of EHR in the Veterans Health Administration provides an opportunity for exploration. Our objective was to compare the performance of various approaches for predicting risk of cerebrovascular and cardiovascular (CCV) death, using traditional risk predictors versus more comprehensive EHR data. Regression methods outperformed the Framingham risk score, even with the same predictors (AUC increased from 71% to 73% and calibration also improved). …


Computationally Efficient Confidence Intervals For Cross-Validated Area Under The ROC Curve Estimates, Erin Ledell, Maya L. Petersen, Mark J. Van Der Laan Dec 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

In binary classification problems, the area under the ROC curve (AUC) is an effective means of measuring the performance of your model. Most often, cross-validation is also used in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we must obtain an estimate for its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, calculating the cross-validated AUC on even a relatively small data set can still require a …


Identification Of TCP Protocols, Juan Shao Dec 2012

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Recently, many new TCP algorithms, such as BIC, CUBIC, and CTCP, have been deployed in the Internet. Investigating the deployment statistics of these TCP algorithms is meaningful for studying the performance and stability of the Internet. Currently, there is a tool named Congestion Avoidance Algorithm Identification (CAAI) for identifying the TCP algorithm of a web server and then for investigating the TCP deployment statistics. However, CAAI, which uses a simple k-NN algorithm, cannot achieve a high identification accuracy. In this thesis, we comprehensively study the identification accuracy of five popular machine learning models. We find that the random forest model …


Bayesian Text Analytics For Document Collections, Daniel David Walker Nov 2012

Theses and Dissertations

Modern document collections are too large to annotate and curate manually. As increasingly large amounts of data become available, historians, librarians, and other scholars increasingly need to rely on automated systems to efficiently and accurately analyze the contents of their collections and to find new and interesting patterns therein. Modern techniques in Bayesian text analytics are becoming widespread and have the potential to revolutionize the way that research is conducted. Much work has been done in the document modeling community towards this end, though most of it is focused on modern, relatively clean text data. We present research for improved …


Geocam: A Geovisual Analytics Workspace To Contextualize And Interpret Statements About Movement, Anuj Jaiswal, Scott Pezanowski, Prasenjit Mitra, Xiao Zhang, Sen Xu, Ian Turton, Alexander Klippel, Alan M. Maceachren Oct 2012

Journal of Spatial Information Science

This article focuses on integrating computational and visual methods in a system that supports analysts in identifying, extracting, mapping, and relating linguistic accounts of movement. We address two objectives: (1) build the conceptual, theoretical, and empirical framework needed to represent and interpret human-generated directions; and (2) design and implement a geovisual analytics workspace for direction document analysis. We have built a set of geo-enabled computational methods to identify documents containing movement statements and a visual analytics environment that uses natural language processing methods iteratively with geographic database support to extract, interpret, and map geographic movement references in context. Additionally, analysts …


Linguistic Spatial Classifications Of Event Domains In Narratives Of Crime, Blake Stephen Howald Oct 2012

Journal of Spatial Information Science

Structurally, formal definitions of the linguistic narrative minimally require two temporally linked past-time events. The role of space in this definition, based on spatial language indicating where events occur, is considered optional and non-structural. However, based on narratives with a high frequency of spatial language, recent research has questioned this perspective, suggesting that space is more critical than may be readily apparent. Through an analysis of spatially rich serial criminal narratives, it will be demonstrated that spatial information qualitatively varies relative to narrative events. In particular, statistical classifiers in a supervised machine learning task achieve a 90% accuracy in predicting …


A New Web Search Engine With Learning Hierarchy, Da Kuang Aug 2012

Electronic Thesis and Dissertation Repository

Most of the existing web search engines (such as Google and Bing) are in the form of keyword-based search. Typically, after the user issues a query with the keywords, the search engine will return a flat list of results. When the query issued by the user is related to a topic, keyword matching alone may not accurately retrieve the whole set of webpages in that topic. On the other hand, there exists another type of search system, particularly in e-Commerce websites, where the user can search in the categories of different faceted hierarchies (e.g., product types and price …


A Confidence-Prioritization Approach To Data Processing In Noisy Data Sets And Resulting Estimation Models For Predicting Streamflow Diel Signals In The Pacific Northwest, Nathaniel Lee Gustafson Aug 2012

Theses and Dissertations

Streams in small watersheds are often known to exhibit diel fluctuations, in which streamflow oscillates on a 24-hour cycle. Streamflow diel fluctuations, which we investigate in this study, are an informative indicator of environmental processes. However, in environmental data sets, as well as many others, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may include a range of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they …


On The K-Mer Frequency Spectra Of Organism Genome And Proteome Sequences With A Preliminary Machine Learning Assessment Of Prime Predictability, Nathan O. Schmidt Aug 2012

Boise State University Theses and Dissertations

A regular expression and region-specific filtering system for biological records at the National Center for Biotechnology database is integrated into an object oriented sequence counting application, and a statistical software suite is designed and deployed to interpret the resulting k-mer frequencies, with a priority focus on nullomers. The proteome k-mer frequency spectra of ten model organisms and the genome k-mer frequency spectra of two bacteria and virus strains for the coding and non-coding regions are comparatively scrutinized. We observe that the naturally-evolved (NCBI/organism) and the artificially-biased (randomly-generated) sequences exhibit a clear deviation from the artificially-unbiased (randomly-generated) histogram distributions. …


On The Automatic Recognition Of Human Activities Using Heterogeneous Wearable Sensors, Oscar David Lara Yejas Jun 2012

USF Tampa Graduate Theses and Dissertations

Delivering accurate and opportune information on people's activities and behaviors has become one of the most important tasks within pervasive computing. Its wide spectrum of potential applications in medical, entertainment, and tactical scenarios motivates further research and development of new strategies to improve accuracy, pervasiveness, and efficiency.

This dissertation addresses the recognition of human activities (HAR) with wearable sensors in three main regards: In the first place, physiological signals have been incorporated as a new source of information to improve the recognition accuracy achieved by conventional approaches, which rely on accelerometer signals solely. A new HAR system, Centinela, was born …


Bayesian And Related Methods: Techniques Based On Bayes' Theorem, Mehmet Vurkaç May 2012

Systems Science Friday Noon Seminar Series

Bayes' theorem is a simple algebraic consequence of conditional probability. Yet, its consequences are critical to philosophy, society, and technology. Starting from its simple derivation, we will show how its interpretation in terms of base rates (priors) and class-conditional likelihoods illuminates everyday problems in medicine and law, and provides signal processing, communications, machine learning, model selection, and other applications of statistics with powerful classification and estimation tools. Next, we will briefly examine some of the ways in which this theorem can be adopted to include multiple attributes, contexts, hypotheses, and levels of risk. Methods derived from or related to Bayes’ …


The Glass Is Half-Full: Overestimating The Quality Of A Novel Environment Is Advantageous, Oded Berger-Tal, Tal Avgar Apr 2012

Wildland Resources Faculty Publications

According to optimal foraging theory, foraging decisions are based on the forager's current estimate of the quality of its environment. However, in a novel environment, a forager does not possess information regarding the quality of the environment, and may make a decision based on a biased estimate. We show, using a simple simulation model, that when facing uncertainty in heterogeneous environments it is better to overestimate the quality of the environment (to be an “optimist”) than underestimate it, as optimistic animals learn the true value of the environment faster due to higher exploration rate. Moreover, we show that when the …


Ensemble Methods For Malware Diagnosis Based On One-Class SVMs, Xing An Jan 2012

LSU Master's Theses

Malware diagnosis is one of today’s most popular topics in machine learning. Instead of simply applying all the classical classification algorithms to the problem and claiming the highest accuracy as the result of prediction, which is the typical approach adopted by studies of this kind, we stick to the Support Vector Machine (SVM) classifier and, based on our observation of some principles of learning, characteristics of statistics, and the behavior of SVM, employ a number of potential preprocessing or ensemble methods, including rescaling, bagging, and clustering, that may enhance the performance of the classical algorithm. We implemented the …


A Study Of Localization And Latency Reduction For Action Recognition, Syed Zain Masood Jan 2012

Electronic Theses and Dissertations

The success of recognizing periodic actions in single-person-simple-background datasets, such as Weizmann and KTH, has created a need for more complex datasets to push the performance of action recognition systems. In this work, we create a new synthetic action dataset and use it to highlight weaknesses in current recognition systems. Experiments show that introducing background complexity to action video sequences causes a significant degradation in recognition performance. Moreover, this degradation cannot be fixed by fine-tuning system parameters or by selecting better feature points. Instead, we show that the problem lies in the spatio-temporal cuboid volume extracted from the interest point …


Software Process Evaluation: A Machine Learning Approach, Ning Chen, Steven C. H. Hoi, Xiaokui Xiao Nov 2011

Research Collection School Of Computing and Information Systems

Software process evaluation is essential to improve software development and the quality of software products in an organization. Conventional approaches based on manual qualitative evaluations (e.g., artifacts inspection) are deficient in the sense that (i) they are time-consuming, (ii) they suffer from the authority constraints, and (iii) they are often subjective. To overcome these limitations, this paper presents a novel semi-automated approach to software process evaluation using machine learning techniques. In particular, we formulate the problem as a sequence classification task, which is solved by applying machine learning algorithms. Based on the framework, we define a new quantitative indicator to …


Active Multiple Kernel Learning For Interactive 3D Object Retrieval Systems, Steven C. H. Hoi, Rong Jin Oct 2011

Research Collection School Of Computing and Information Systems

An effective relevance feedback solution plays a key role in interactive intelligent 3D object retrieval systems. In this work, we investigate the relevance feedback problem for interactive intelligent 3D object retrieval, with the focus on studying effective machine learning algorithms for improving the user's interaction in the retrieval task. One of the key challenges is to learn appropriate kernel similarity measure between 3D objects through the relevance feedback interaction with users. We address this challenge by presenting a novel framework of Active multiple kernel learning (AMKL), which exploits multiple kernel learning techniques for relevance feedback in interactive 3D object retrieval. …


Implementation Of A New Sigmoid Function In Backpropagation Neural Networks, Jeffrey A. Bonnell Aug 2011

Electronic Theses and Dissertations

This thesis presents the use of a new sigmoid activation function in backpropagation artificial neural networks (ANNs). ANNs using conventional activation functions may generalize poorly when trained on a set which includes quirky, mislabeled, unbalanced, or otherwise complicated data. This new activation function is an attempt to improve generalization and reduce overtraining on mislabeled or irrelevant data by restricting training when inputs to the hidden neurons are sufficiently small. This activation function includes a flattened, low-training region which grows or shrinks during back-propagation to ensure a desired proportion of inputs inside the low-training region. With a desired low-training proportion of …


Development Of Advanced Algorithms To Detect, Characterize And Forecast Solar Activities, Yuan Yuan May 2011

Dissertations

The study of solar activity is an important part of space weather research. It faces serious challenges because of large data volumes, which require the application of state-of-the-art machine learning and computer vision techniques. This dissertation targets two essential aspects of space weather research: automatic feature detection and forecasting of eruptive events.

Feature detection includes solar filament detection and solar fibril tracing. A solar filament consists of a mass of gas suspended over the chromosphere by magnetic fields and seen as a dark, ribbon-shaped feature on the bright solar disk in Hα (Hydrogen-alpha) full-disk solar images. In this dissertation, …


Hardware Acceleration Of Inference Computing: The Numenta HTM Algorithm, Dan Hammerstrom May 2011

Systems Science Friday Noon Seminar Series

In this presentation I will describe the latest version of the Numenta HTM Cortical Learning Algorithm and why it is interesting for doing research into radical new computer architectures. Then I will discuss the hardware acceleration research we are doing, and briefly look at some preliminary applications development.


Narrative Analysis And Computational Model To Predict Interestingness Of Narratives, Laxman Thapa May 2011

Theses and Dissertations - UTB/UTPA

In this research, I present results demonstrating the classification of specially generated narratives by a machine agent listening to human subjects describing the same sets of events. These classifications are based on human ratings of interestingness for many different recountings of the same stories. The classification is performed on various features selected after analyzing the different possible features that affect the interestingness of narratives. The features were extracted from the surface text as well as from annotations of how each narration relates to the content of the known story. I present the annotation process and resulting …


Empirical Methods For Predicting Student Retention- A Summary From The Literature, Matt Bogard May 2011

Economics Faculty Publications

The vast majority of the literature related to the empirical estimation of retention models includes a discussion of the theoretical retention framework established by Bean, Braxton, Tinto, Pascarella, Terenzini, and others (see Bean, 1980; Bean, 2000; Braxton, 2000; Braxton et al., 2004; Chapman and Pascarella, 1983; Pascarella and Terenzini, 1978; St. John and Cabrera, 2000; Tinto, 1975). This body of research provides a starting point for the consideration of which explanatory variables to include in any model specification, as well as identifying possible data sources. The literature separates itself into two major camps including research related to the hypothesis testing …


Empirical Methods-A Review: With An Introduction To Data Mining And Machine Learning, Matt Bogard May 2011

Economics Faculty Publications

This presentation was part of a staff workshop focused on empirical methods and applied research. This includes a basic overview of regression with matrix algebra, maximum likelihood, inference, and model assumptions. Distinctions are made between paradigms related to classical statistical methods and algorithmic approaches. The presentation concludes with a brief discussion of generalization error, data partitioning, decision trees, and neural networks.


Learning Local Features Using Boosted Trees For Face Recognition, Rajkiran Gottumukkal Apr 2011

Electrical & Computer Engineering Theses & Dissertations

Face recognition is fundamental to a number of significant applications that include, but are not limited to, video surveillance and content-based image retrieval. Some of the challenges which make this task difficult are variations in faces due to changes in pose, illumination, and deformation. This dissertation proposes a face recognition system to overcome these difficulties. We propose methods for different stages of face recognition which will make the system more robust to these variations. We propose a novel method to perform skin segmentation which is fast and able to perform well under different illumination conditions. We also propose a method …


On The Effect Of Criticality And Topology On Learning In Random Boolean Networks, Alireza Goudarzi Jan 2011

Systems Science Friday Noon Seminar Series

Random Boolean networks (RBN) are discrete dynamical systems composed of N automata with a binary state, each of which interacts with other automata in the network. RBNs were originally introduced as simplified models of gene regulation. In this presentation, I will present recent work done conjointly with Natali Gulbahce (UCSF), Thimo Rohlf (MPI, CNRS), and Christof Teuscher (PSU). We extend the study of learning in feedforward Boolean networks to random Boolean networks (RBNs) and systematically explore the relationship between the learning capability, the network topology, the system size N, the training sample T, and the complexity of the computational task. …


Algorithms For Training Large-Scale Linear Programming Support Vector Regression And Classification, Pablo Rivas Perea Jan 2011

Open Access Theses & Dissertations

The main contribution of this dissertation is the development of a method to train a Support Vector Regression (SVR) model for the large-scale case where the number of training samples supersedes the computational resources. The proposed scheme consists of posing the SVR problem entirely as a Linear Programming (LP) problem and on the development of a sequential optimization method based on variables decomposition, constraints decomposition, and the use of primal-dual interior point methods. Experimental results demonstrate that the proposed approach has comparable performance with other SV-based classifiers. Particularly, experiments demonstrate that as the problem size increases, the sparser the solution …


Data Mining Based Learning Algorithms For Semi-Supervised Object Identification And Tracking, Michael P. Dessauer Jan 2011

Doctoral Dissertations

Sensor exploitation (SE) is the crucial step in surveillance applications such as airport security and search and rescue operations. It allows localization and identification of movement in urban settings and can significantly boost knowledge gathering, interpretation and action. Data mining techniques offer the promise of precise and accurate knowledge acquisition techniques in high-dimensional data domains (and diminishing the “curse of dimensionality” prevalent in such datasets), coupled by algorithmic design in feature extraction, discriminative ranking, feature fusion and supervised learning (classification). Consequently, data mining techniques and algorithms can be used to refine and process captured data and to detect, recognize, classify, …


Effective Task Transfer Through Indirect Encoding, Phillip Verbancsics Jan 2011

Electronic Theses and Dissertations

An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Often approaches to task transfer focus on transforming the original representation to fit the new task. Such representational transformations are necessary because the target task often requires new state information that was not included in the original representation. In RoboCup Keepaway, changing from the 3 vs. 2 variant of the task to 4 vs. 3 adds state information for each of the new players. In contrast, this dissertation explores the idea …


An Exploration Of Multi-Agent Learning Within The Game Of Sheephead, Brady Brau Jan 2011

All Graduate Theses, Dissertations, and Other Capstone Projects

In this paper, we examine a machine learning technique presented by Ishii et al. used to allow for learning in a multi-agent environment and apply an adaptation of this learning technique to the card game Sheephead. We then evaluate the effectiveness of our adaptation by running simulations against rule-based opponents. Multi-agent learning presents several layers of complexity on top of a single-agent learning in a stationary environment. This added complexity and increased state space is just beginning to be addressed by researchers. We utilize techniques used by Ishii et al. to facilitate this multi-agent learning. We model the environment of …