Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine learning

Articles 1591 - 1620 of 1686

Full-Text Articles in Physical Sciences and Mathematics

Combining Natural Language Processing And Statistical Text Mining: A Study Of Specialized Versus Common Languages, Jay Jarman Jan 2011

USF Tampa Graduate Theses and Dissertations

This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms, such as association rule mining and decision tree induction, are used to discover classification rules for specific targets. This multi-stage pipeline approach is contrasted with traditional statistical text mining (STM) methods based on term counts and term-by-document frequencies. The aim is to create effective text analytic processes by adapting and combining individual …
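The abstract contrasts the NLP pipeline with statistical text mining based on term counts and term-by-document frequencies. As an illustration of that representation only, here is a minimal Python sketch that builds a term-by-document count matrix. The regex tokenizer and the example sentences are assumptions, not the dissertation's implementation.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())


def term_by_document(docs):
    """Return (vocab, matrix) where matrix[i][j] is the count of vocab[i] in docs[j]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


# Hypothetical example documents.
docs = ["Patient reports chest pain", "No chest pain reported", "Pain in left arm"]
vocab, matrix = term_by_document(docs)
```

Each row is a term and each column a document. STM methods typically reweight these raw counts (for example with inverse document frequency) before mining them.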


Assessing Data Quality In A Sensor Network For Environmental Monitoring, Gesuri Ramirez Jan 2011

Open Access Theses & Dissertations

Assessing the quality of sensor data in environmental monitoring applications is important, as erroneous readings produced by malfunctioning sensors, calibration drift, and problematic climatic conditions, such as icing or dust, are common. Traditional data quality checking and correction is a painstaking manual process, so the development of automatic systems for this task is highly desirable.

This study investigates machine learning methods to identify and clean incorrect data from a real-world environmental sensor network, the Jornada Experimental Range, located in Southern New Mexico. We evaluated several learning algorithms and data replacement schemes, and developed a method to identify the problematic sensor. The …


Prediction Of Brain Tumor Progression Using Multiple Histogram Matched MRI Scans, Debrup Banerjee, Loc Tran, Jiang Li, Yuzhong Shen, Frederic Mckenzie, Jihong Wang, Ronald M. Summers (Ed.), Bram Van Ginneken (Ed.) Jan 2011

Electrical & Computer Engineering Faculty Publications

In a recent study [1], we investigated the feasibility of predicting brain tumor progression based on multiple MRI series and we tested our methods on seven patients' MRI images scanned at three consecutive visits A, B and C. Experimental results showed that it is feasible to predict tumor progression from visit A to visit C using a model trained by the information from visit A to visit B. However, the trained model failed when we tried to predict tumor progression from visit B to visit C, though it is clinically more important. Upon a closer look at the MRI scans …
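The title refers to histogram-matched MRI scans, but the abstract does not describe the matching step. As a generic illustration only, the following Python sketch performs quantile-based histogram matching on flattened lists of pixel intensities: each source intensity is mapped to the reference intensity at the same empirical CDF value. The function name and the flat 1-D input layout are assumptions, and this is not the authors' pipeline.

```python
from bisect import bisect_right


def histogram_match(source, reference):
    """Map each source intensity to the reference intensity at the same empirical CDF value.

    source, reference: flat lists of pixel intensities (hypothetical 1-D layout).
    Returns a list with the same length and order as `source`.
    """
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for value in source:
        # Number of source values <= value, i.e. n_src times the empirical CDF at value.
        rank = bisect_right(src_sorted, value)
        # Integer ceil(rank * n_ref / n_src), shifted to a 0-based index into the reference.
        idx = -(-rank * n_ref // n_src) - 1
        matched.append(ref_sorted[idx])
    return matched
```

Only ranks are used, so the mapping preserves the intensity ordering of the source scan while giving it the reference scan's intensity distribution.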


Identification And Optimization Of Classifier Genes From Multi-Class Earthworm Microarray Dataset, Ying Li, Nan Wang, Chaoyang Zhang, Ping Gong Oct 2010

Faculty Publications

Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that …


Event-Driven Similarity And Classification Of Scanpaths, Thomas Grindinger Aug 2010

All Dissertations

Eye tracking experiments often involve recording the pattern of deployment of visual attention over the stimulus as viewers perform a given task (e.g., visual search). It is useful in training applications, for example, to make available an expert's sequence of eye movements, or scanpath, to novices for their inspection and subsequent learning. It may also be useful to assess how closely the novice's scanpath conforms to that of the expert. A computational tool is proposed that provides a framework for performing such classification, based on the use of a probabilistic machine learning algorithm. The approach was …


Malware Type Recognition And Cyber Situational Awareness, Thomas Dube, Richard A. Raines, Gilbert L. Peterson, Kenneth W. Bauer, Michael R. Grimaila, Steven K. Rogers Aug 2010

Faculty Publications

Current technologies for computer network and host defense do not provide suitable information to support strategic and tactical decision-making processes. Although pattern-based malware detection is an active research area, the additional context of the type of malware can improve cyber situational awareness. This additional context is an indicator of threat capability, thus allowing organizations to assess information losses and focus response actions appropriately. Malware Type Recognition (MaTR) is a research initiative extending detection technologies to provide the additional context of malware types using only static heuristics. Test results with MaTR demonstrate over a 99% accurate detection rate and 59% …


Practical Improvements In Applied Spectral Learning, Adam C. Drake Jun 2010

Theses and Dissertations

Spectral learning algorithms, which learn an unknown function by learning a spectral representation of the function, have been widely used in computational learning theory to prove many interesting learnability results. These algorithms have also been successfully used in real-world applications. However, previous work has left open many questions about how to best use these methods in real-world learning scenarios. This dissertation presents several significant advances in real-world spectral learning. It presents new algorithms for finding large spectral coefficients (a key sub-problem in spectral learning) that allow spectral learning methods to be applied to much larger problems and to a wider …
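To illustrate what "large spectral coefficients" are, assuming Boolean functions on {-1, 1}^n (the usual setting in learning theory), the following Python sketch computes every Fourier coefficient by brute force and keeps those whose magnitude reaches a threshold. Exhaustive enumeration is only practical for very small n. It is not one of the dissertation's algorithms, which target much larger problems, and the function names are hypothetical.

```python
from itertools import combinations, product


def fourier_coefficients(f, n):
    """Exact Fourier coefficients of f: {-1,1}^n -> {-1,1}, by enumerating all 2^n inputs.

    The coefficient for subset S is the average of f(x) * prod_{i in S} x_i.
    """
    inputs = list(product((-1, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = 0
            for x in inputs:
                chi = 1  # parity (character) function chi_S(x)
                for i in S:
                    chi *= x[i]
                total += f(x) * chi
            coeffs[S] = total / len(inputs)
    return coeffs


def large_coefficients(coeffs, theta):
    """Keep only the coefficients with magnitude at least theta."""
    return {S: c for S, c in coeffs.items() if abs(c) >= theta}
```

For example, the 3-input majority function has coefficient 1/2 on each single variable and -1/2 on the three-way product, so a threshold of 0.5 keeps exactly those four terms.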


Transformation Learning: Modeling Transferable Transformations In High-Dimensional Data, Christopher R. Wilson May 2010

Theses and Dissertations

The goal of learning transfer is to apply knowledge gained from one problem to a separate related problem. Transformation learning is a proposed approach to computational learning transfer that focuses on modeling high-level transformations that are well suited for transfer. By using a high-level representation of transferable data, transformation learning facilitates both shallow transfer (intra-domain) and deep transfer (inter-domain) scenarios. Transformations can be discovered in data using manifold learning to order data instances according to the transformations they represent. For high-dimensional data representable with coordinate systems, such as images and sounds, data instances can be decomposed into small sub-instances based …


A Comparative Study On Text Categorization, Aditya Chainulu Karamcheti May 2010

UNLV Theses, Dissertations, Professional Papers, and Capstones

Automated text categorization is a supervised learning task, defined as assigning category labels to new documents based on the likelihood suggested by a training set of labeled documents. Two examples of text categorization methods are Naive Bayes and K-Nearest Neighbor.

In this thesis, we implement two categorization engines, based on Naive Bayes and K-Nearest Neighbor respectively. We then compare the effectiveness of the two engines by calculating standard precision and recall for a collection of documents. We also report on the time efficiency of the two engines.
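The thesis compares a Naive Bayes engine with a K-Nearest Neighbor engine using precision and recall. The following Python sketch shows one common formulation of the Naive Bayes side (a multinomial model with add-one smoothing over whitespace tokens) together with the two metrics. The tokenization, the smoothing choice, and the function names are assumptions, and the K-Nearest Neighbor engine is omitted.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Multinomial Naive Bayes counts over lowercase whitespace tokens (assumed tokenizer)."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def predict_nb(model, doc):
    """Return the label with the highest log-posterior, using add-one (Laplace) smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for w in doc.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall(predicted, actual, positive):
    """Precision and recall for one category treated as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For instance, training on two hypothetical "spam" and two "ham" snippets and calling `predict_nb(model, "buy cheap pills")` scores each class by its log prior plus smoothed log word likelihoods, then returns the higher-scoring label.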


Financial Time Series Forecasting With Machine Learning Techniques: A Survey, Bjoern Krollner, Bruce Vanstone, Gavin Finnie Apr 2010

Gavin Finnie

Stock index forecasting is vital for making informed investment decisions. This paper surveys recent literature in the domain of machine learning techniques and artificial intelligence used to forecast stock market movements. The publications are categorised according to the machine learning technique used, the forecasting timeframe, the input variables used, and the evaluation techniques employed. It is found that there is a consensus between researchers stressing the importance of stock index forecasting. Artificial Neural Networks (ANNs) are identified to be the dominant machine learning technique in this area. We conclude with possible future research directions.


Financial Time Series Forecasting With Machine Learning Techniques: A Survey, Bjoern Krollner, Bruce Vanstone, Gavin Finnie Apr 2010

Bjoern Krollner

Stock index forecasting is vital for making informed investment decisions. This paper surveys recent literature in the domain of machine learning techniques and artificial intelligence used to forecast stock market movements. The publications are categorised according to their research motivation, the machine learning technique used, the surveyed stock market, the forecasting time-frame, the input variables used, and the evaluation techniques employed. It is found that there is a consensus between researchers stressing the importance of stock index forecasting and that the results are promising. Artificial Neural Networks (ANNs) are identified to be the dominant machine learning technique in this area. …


Financial Time Series Forecasting With Machine Learning Techniques: A Survey, Bjoern Krollner, Bruce Vanstone, Gavin Finnie Apr 2010

Bruce Vanstone

Stock index forecasting is vital for making informed investment decisions. This paper surveys recent literature in the domain of machine learning techniques and artificial intelligence used to forecast stock market movements. The publications are categorised according to the machine learning technique used, the forecasting timeframe, the input variables used, and the evaluation techniques employed. It is found that there is a consensus between researchers stressing the importance of stock index forecasting. Artificial Neural Networks (ANNs) are identified to be the dominant machine learning technique in this area. We conclude with possible future research directions.


Segmentation And Fracture Detection In X-Ray Images For Traumatic Pelvic Injury, Rebecca Smith Apr 2010

Theses and Dissertations

Due to the risk of complications such as hemorrhage, severe pelvic trauma is associated with a high mortality rate. Prompt medical treatment is therefore vital. However, the complexity of the injuries can make successful diagnosis and treatment challenging. By generating predictions and recommendations based on patient data, computer-aided decision support systems have the potential to assist physicians in improving outcomes. However, no current system considers features automatically extracted from medical images. This dissertation describes a system to extract diagnostic features from pelvic X-ray images that can be used as input to the prediction process; specifically, the presence of fracture and …


Developing Cyberspace Data Understanding Using CRISP-DM For Host-Based IDS Feature Mining, Joseph R. Erskine, Gilbert L. Peterson, Barry E. Mullins, Michael R. Grimaila Apr 2010

Faculty Publications

Current intrusion detection systems (IDS) generate a large number of specific alerts, but typically do not provide actionable information. Compounding this problem is the fact that many of these alerts are false positives. This paper applies the Cross Industry Standard Process for Data Mining (CRISP-DM) to develop an understanding of a host environment under attack. Data is generated by launching scans and exploits at a machine outfitted with a set of host-based forensic data collectors. Through knowledge discovery, features are selected to project human understanding of the attack process into the IDS model. By discovering relationships between the data collected and …


Extensions Of Nearest Shrunken Centroid Method For Classification, Tomohiko Funai Mar 2010

Theses and Dissertations

Stylometry assumes that the essence of an author's individual style can be captured using a number of quantitative criteria, such as the relative frequencies of noncontextual words (e.g., or, the, and, etc.). Several statistical methodologies have been developed for authorship analysis. Jockers et al. (2009) apply Nearest Shrunken Centroid (NSC) classification, a methodology that has shown promise in DNA microarray analysis, to authorship analysis of the Book of Mormon. Schaalje et al. (2010) develop an extended NSC classification to remedy the problem of a missing author. Dabney (2005) and Koppel et al. (2009) suggest other modifications of NSC. This paper …
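For readers unfamiliar with NSC, the following Python sketch follows the standard formulation introduced for microarray data by Tibshirani et al. (2002). Class centroids are standardized against the overall centroid and soft-thresholded toward it by an amount `delta`, and a new sample is assigned to the nearest shrunken centroid, with a correction for class priors. Using the median feature standard deviation as the offset `s0` is an assumption, as are the function names. This is not the extended NSC developed in the thesis, and it requires at least two classes with more samples than classes overall.

```python
import math
from statistics import median


def fit_nsc(X, y, delta):
    """Fit a nearest shrunken centroid classifier; X is a list of feature lists, y the labels."""
    classes = sorted(set(y))
    n, p, K = len(X), len(X[0]), len(classes)
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    members = {k: [row for row, label in zip(X, y) if label == k] for k in classes}
    centroids = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
                 for k, rows in members.items()}
    # Pooled within-class standard deviation of each feature.
    s = []
    for i in range(p):
        ss = sum((r[i] - centroids[k][i]) ** 2 for k, rows in members.items() for r in rows)
        s.append(math.sqrt(ss / (n - K)))
    s0 = median(s)  # offset that guards against features with tiny variance
    shrunk = {}
    for k, rows in members.items():
        m_k = math.sqrt(1 / len(rows) - 1 / n)
        shrunk[k] = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (centroids[k][i] - overall[i]) / scale  # standardized class difference
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)  # soft thresholding
            shrunk[k].append(overall[i] + scale * d_shrunk)
    priors = {k: len(rows) / n for k, rows in members.items()}
    return {"centroids": shrunk, "s": s, "s0": s0, "priors": priors}


def predict_nsc(model, x):
    """Assign x to the class with the smallest standardized distance minus 2 * log prior."""
    def score(k):
        dist = sum((xi - c) ** 2 / (si + model["s0"]) ** 2
                   for xi, c, si in zip(x, model["centroids"][k], model["s"]))
        return dist - 2 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)
```

A feature whose standardized class differences never exceed `delta` is shrunk to the overall mean for every class. It then contributes equally to every class score and drops out of the decision, which is how NSC performs implicit feature selection (for authorship, selecting the discriminating function words).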


A Bayesian Decision Theoretical Approach To Supervised Learning, Selective Sampling, And Empirical Function Optimization, James Lamond Carroll Mar 2010

Theses and Dissertations

Many have used the principles of statistics and Bayesian decision theory to model specific learning problems. It is less common to see models of the processes of learning in general. One exception is the model of the supervised learning process known as the "Extended Bayesian Formalism" or EBF. This model is descriptive, in that it can describe and compare learning algorithms. Thus the EBF is capable of modeling both effective and ineffective learning algorithms. We extend the EBF to model unsupervised learning, semi-supervised learning, supervised learning, and empirical function optimization. We also generalize the utility model of the EBF to …


Quantification Of Artistic Style Through Sparse Coding Analysis In The Drawings Of Pieter Bruegel The Elder, James M. Hughes, Daniel J. Graham, Daniel N. Rockmore Jan 2010

Dartmouth Scholarship

Recently, statistical techniques have been used to assist art historians in the analysis of works of art. We present a novel technique for the quantification of artistic style that utilizes a sparse coding model. Originally developed in vision research, sparse coding models can be trained to represent any image space by maximizing the kurtosis of a representation of an arbitrarily selected image from that space. We apply such an analysis to successfully distinguish a set of authentic drawings by Pieter Bruegel the Elder from another set of well-known Bruegel imitations. We show that our approach, which involves a direct comparison …


Vowel Recognition From Continuous Articulatory Movements For Speaker-Dependent Applications, Jun Wang, Jordan R. Green, Ashok Samal, Tom D. Carrell Jan 2010

Department of Special Education and Communication Disorders: Faculty Publications

A novel approach was developed to recognize vowels from continuous tongue and lip movements. Vowels were classified based on movement patterns (rather than on derived articulatory features, e.g., lip opening) using a machine learning approach. Recognition accuracy on a single-speaker dataset was 94.02% with a very short latency. Recognition accuracy was better for high vowels than for low vowels. This finding parallels previous empirical findings on tongue movements during vowels. The recognition algorithm was then used to drive an articulation-to-acoustics synthesizer. The synthesizer recognizes vowels from a continuous input stream of tongue and lip movements and plays the corresponding sound samples …


A Boosting Framework For Visuality-Preserving Distance Metric Learning And Its Application To Medical Image Retrieval, Yang Liu, Rong Jin, Lily Mummert, Rahul Sukthankar, Adam Goode, Bin Zheng, Steven C. H. Hoi, Mahadev Satyanarayanan Jan 2010

Research Collection School Of Computing and Information Systems

Similarity measurement is a critical component in content-based image retrieval systems, and learning a good distance metric can significantly improve retrieval performance. However, despite extensive study, there are several major shortcomings with the existing approaches for distance metric learning that can significantly affect their application to medical image retrieval. In particular, "similarity" can mean very different things in image retrieval: resemblance in visual appearance (e.g., two images that look like one another) or similarity in semantic annotation (e.g., two images of tumors that look quite different yet are both malignant). Current approaches for distance metric learning typically address only one …


Prediction Of Brain Tumor Progression Using A Machine Learning Technique, Yuzhong Shen, Debrup Banerjee, Jiang Li, Adam Chandler, Yufei Shen, Frederic D. Mckenzie, Jihong Wang, Nico Karssemeijer (Ed.), Ronald M. Summers (Ed.) Jan 2010

Electrical & Computer Engineering Faculty Publications

A machine learning technique is presented for assessing brain tumor progression by exploring six patients' complete MRI records scanned during their visits in the past two years. There are ten MRI series, including diffusion tensor image (DTI), for each visit. After registering all series to the corresponding DTI scan at the first visit, annotated normal and tumor regions were overlaid. The intensity value of each pixel inside the annotated regions was then extracted across all ten MRI series to compose a 10-dimensional vector. Each feature vector falls into one of three categories: normal, tumor, and normal but progressed to …
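The feature extraction step described above can be sketched in a few lines of Python, assuming the series are already co-registered and stored as equally sized 2-D lists, with a boolean mask marking the annotated pixels. The function name, data layout, and mask representation are hypothetical. With ten series, each vector has ten components, one intensity per series.

```python
def pixel_feature_vectors(series, mask):
    """Stack co-registered 2-D images into one feature vector per annotated pixel.

    series: list of equally sized 2-D images (lists of rows), one per MRI series.
    mask:   2-D list of booleans; True marks a pixel inside an annotated region.
    Returns vectors in row-major pixel order, each with len(series) intensities.
    """
    vectors = []
    for r, row in enumerate(mask):
        for c, keep in enumerate(row):
            if keep:
                vectors.append([image[r][c] for image in series])
    return vectors
```

Each resulting vector would then be paired with its label (normal, tumor, or normal but later progressed) to train a classifier.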


Predicting Flavonoid UGT Regioselectivity With Graphical Residue Models And Machine Learning, Arthur Rhydon Jackson Dec 2009

Electronic Theses and Dissertations

Machine learning is applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary protein sequence. Novel indices characterizing graphical models of protein residues are introduced. The indices are compared with existing amino acid indices and found to cluster residues appropriately. A variety of models employing the indices are then investigated by examining their performance when analyzed using nearest neighbor, support vector machine, and Bayesian neural network classifiers. Improvements over nearest neighbor classifications relying on standard alignment similarity scores are reported.


Noninvasive Estimation Of Pulmonary Artery Pressure Using Heart Sound Analysis, Aaron W. Dennis Dec 2009

Theses and Dissertations

Right-heart catheterization is the most accurate method for estimating pulmonary artery pressure (PAP). Because it is an invasive procedure, it is expensive, exposes patients to the risk of infection, and is not suited for long-term monitoring situations. Medical researchers have shown that PAP influences the characteristics of heart sounds. This suggests that heart sound analysis is a potential noninvasive solution to the PAP estimation problem. This thesis describes the development of a prototype system, called PAPEr, which estimates PAP noninvasively using heart sound analysis. PAPEr uses patient data with machine learning algorithms to build models of how PAP affects heart …


Integrating Information Theory Measures And A Novel Rule-Set-Reduction Technique To Improve Fuzzy Decision Tree Induction Algorithms, Nael Mohammed Abu-Halaweh Dec 2009

Computer Science Dissertations

Machine learning approaches have been successfully applied to many classification and prediction problems. Decision trees are one of the most popular machine learning approaches, and a main advantage of decision trees is the clarity of the decision model they produce. The ID3 algorithm proposed by Quinlan forms the basis for many decision tree applications. However, trees produced by ID3 are sensitive to small perturbations in the training data. To overcome this problem and to handle data uncertainties and spurious precision in data, fuzzy ID3 integrated fuzzy set theory and ideas from fuzzy logic with ID3. Several fuzzy decision tree algorithms and …
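ID3 chooses the attribute to split on by maximizing information gain, the reduction in class entropy after partitioning the examples by that attribute's values. A minimal Python sketch of this criterion is shown below, assuming categorical attributes stored in dictionaries and a non-empty label list. Fuzzy ID3 variants generally replace these crisp counts with fuzzy membership degrees, and this sketch does not implement that extension or the dissertation's rule-set reduction.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting `rows` (dicts) on the categorical `attribute`, as in ID3."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions induced by the attribute's values.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

At each node, ID3 would evaluate `information_gain` for every remaining attribute, split on the best one, and recurse on each partition.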


A Neural Network Approach To Border Gateway Protocol Peer Failure Detection And Prediction, Cory B. White Dec 2009

Master's Theses

The size and speed of computer networks continue to expand at a rapid pace, as do the corresponding errors, failures, and faults inherent within such extensive networks. This thesis introduces a novel approach to interface Border Gateway Protocol (BGP) computer networks with neural networks to learn the precursor connectivity patterns that emerge prior to a node failure. Details of the design and construction of a framework that utilizes neural networks to learn and monitor BGP connection states as a means of detecting and predicting BGP peer node failure are presented. Moreover, this framework is used to monitor a BGP network …


Dataset Threshold For The Performance Estimators In Supervised Machine Learning Experiments, Zanifa Omary, Fredrick Mtenzi Nov 2009

Conference papers

Establishing a dataset threshold is one of the first steps when comparing the performance of machine learning algorithms. It involves the use of different datasets with different sample sizes in relation to the number of attributes and the number of instances available in the dataset. Currently, no threshold has been set to help those unfamiliar with machine learning experiments categorise these datasets as either small or large based on these two factors. In this paper we perform experiments in order to establish a dataset threshold. The established dataset threshold will help unfamiliar supervised …


Automatic Red Tide Detection Using MODIS Satellite Images, Wijian Cheng Jun 2009

USF Tampa Graduate Theses and Dissertations

Red tides pose a significant economic and environmental threat in the Gulf of Mexico. Detecting red tide is important for understanding this phenomenon. In this thesis, machine learning approaches based on Random Forests, Support Vector Machines and K-Nearest Neighbors have been evaluated for red tide detection from MODIS satellite images. Detection results using machine learning algorithms were compared to ship-collected ground truth red tide data. This work has three major contributions. First, machine learning approaches outperformed two of the latest thresholding red tide detection algorithms based on bio-optical characterization by more than 10% in terms of F measure and …
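The comparison above is reported in terms of F measure. Assuming the balanced form (F1) is meant, which the abstract does not state, it is the harmonic mean of precision P and recall R, computed from true positives (TP), false positives (FP) and false negatives (FN):

```latex
F_1 = \frac{2PR}{P + R}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}
```

For example, P = 0.8 and R = 0.6 give F_1 = 0.96 / 1.4 ≈ 0.686. Because the harmonic mean is pulled toward the smaller value, a detector cannot score well by maximizing only one of the two.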


Intentional Learning Agent Architecture, Budhitama Subagdja, Liz Sonenberg, Iyad Rahwan Jun 2009

Research Collection School Of Computing and Information Systems

Dealing with changing situations is a major issue in building agent systems. When the time is limited, knowledge is unreliable, and resources are scarce, the issue becomes more challenging. The BDI (Belief-Desire-Intention) agent architecture provides a model for building agents that addresses that issue. The model can be used to build intentional agents that are able to reason based on explicit mental attitudes, while behaving reactively in changing circumstances. However, despite the reactive and deliberative features, a classical BDI agent is not capable of learning. Plans as recipes that guide the activities of the agent are assumed to be static. …


Predictive Decoding Of Neural Data, Yaroslav O. Halchenko May 2009

Dissertations

In the last five decades the number of techniques available for non-invasive functional imaging has increased dramatically. Researchers today can choose from a variety of imaging modalities that include EEG, MEG, PET, SPECT, MRI, and fMRI.

This doctoral dissertation offers a methodology for the reliable analysis of neural data at different levels of investigation. By using statistical learning algorithms the proposed approach allows single-trial analysis of various neural data by decoding them into variables of interest. Unbiased testing of the decoder on new samples of the data provides a generalization assessment of decoding performance reliability. Through consecutive analysis of the …
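The abstract stresses unbiased testing of the decoder on samples not used for training. One standard way to obtain such an estimate is k-fold cross-validation. The Python sketch below shows it with a deliberately simple nearest-class-mean decoder on one-dimensional features. Both the decoder and the unshuffled, contiguous folds are illustrative assumptions rather than the dissertation's procedure, and with real data the folds would typically be shuffled or stratified.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous, non-overlapping test folds (no shuffling, for clarity)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(fit, predict, X, y, k=5):
    """Train on k-1 folds, test on the held-out fold, and return the mean accuracy over folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def fit_nearest_mean(X, y):
    """Hypothetical decoder: store the mean feature value of each class."""
    means = {}
    for label in set(y):
        values = [x for x, t in zip(X, y) if t == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, x):
    """Decode x as the class whose mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))
```

The key property is that no test sample ever influences the model that scores it, so the averaged accuracy estimates how the decoder generalizes to new data.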


Concept Learning By Example Decomposition, Sameer Joshi Jan 2009

Electronic Theses and Dissertations

For efficient understanding and prediction in natural systems, even in artificially closed ones, we usually need to consider a number of factors that may combine in simple or complex ways. Additionally, many modern scientific disciplines face increasingly large datasets from which to extract knowledge (for example, genomics). Thus to learn all but the most trivial regularities in the natural world, we rely on different ways of simplifying the learning problem. One simplifying technique that is highly pervasive in nature is to break down a large learning problem into smaller ones; to learn the smaller, more manageable problems; and then to …


Machine Learned Melody Matching Using Strictly Relative Musical Abstractions, Michael Joseph Kolta Jan 2009

Legacy Theses & Dissertations (2009 - 2024)

We implement and evaluate a machine learning approach to improve systems for searching a database of music via melodic sample. We explore symbolic and aural input queries and test our prototypes with extensive user surveys. Our main contribution is to combine the following four elements. First is to create a unique musical abstraction that accounts for both pitch and rhythm in a relative manner. Second, our system allows for approximate matching of imperfect queries via the utilization of the Smith-Waterman algorithm that was originally designed for approximate matching of molecular subsequences, such as DNA samples. Third is to design our …
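The abstract pairs a relative musical abstraction with Smith-Waterman local alignment. As a simplified sketch, the Python below assumes melodies are given as MIDI pitch numbers and ignores rhythm. It converts each melody to successive pitch intervals, so a transposed query still matches, and scores query against target with a basic Smith-Waterman recurrence using a linear gap penalty. The scoring parameters and function names are hypothetical. The thesis's abstraction also accounts for rhythm, which this sketch omits.

```python
def intervals(pitches):
    """Relative pitch abstraction: semitone differences between successive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def smith_waterman(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences (Smith-Waterman, linear gaps)."""
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score of an alignment ending at (i, j)
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere, which makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

For instance, the query [65, 67, 69, 70] is the opening of the melody [60, 62, 64, 65, 67] transposed up five semitones. Both reduce to the same interval pattern [2, 2, 1], so the query earns the full match score of 2 per interval, while gaps and mismatches let imperfect queries still earn partial credit.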