Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Articles 601 - 630 of 13246

Full-Text Articles in Physical Sciences and Mathematics

Developing A Risk Assessment Instrument For Immigration Cases Under Federal Supervision, Mayra Eydie Pacheco May 2023

Open Access Theses & Dissertations

No abstract provided.


Outlier Detection In Multivariate And High-Dimensional Datasets, Yuanhong Wu May 2023

Open Access Theses & Dissertations

Accurate detection of outliers is crucial in the field of statistical analysis. Using classical statistical models without considering the presence of outliers in the data can lead to misleading outcomes. There exist a myriad of procedures to detect outliers in statistics. We concentrate on the statistical techniques that can robustly identify outliers in data sets. To this end, we pursue two aims. First, we give an extensive overview of robust statistical methods which are still popular in recent years for outlier detection. We provide the definitions, algorithms and also discuss some important properties of these methods. Second, two real examples are …


Spatially Adaptive Estimation Of Spectrum, Yi Xie May 2023

Open Access Theses & Dissertations

A time series may be analyzed either in the time or in the frequency domain. When working in the frequency domain, the main objective is to estimate the underlying spectrum. Various approaches have been proposed to this end, but most are based on smoothing the periodogram using a single smoothing parameter across all Fourier frequencies. Such a global smoothing parameter may result in a biased estimate. To improve the estimation, in this paper, we smooth the log periodogram by placing a dynamic shrinkage prior, such that varying degrees of smoothing may be applied to different regions of the Fourier frequencies, …


Performance Classification Of Ornstein-Uhlenbeck-Type Models Using Fractal Analysis Of Time Series Data., Peter Kwadwo Asante May 2023

Open Access Theses & Dissertations

This dissertation aims to assess the performance of Ornstein-Uhlenbeck-type models by examining the fractal characteristics of time series data from various sources, including finance, volcanic and earthquake events, US COVID-19 reported cases and deaths, and two simulated time series with differing properties. The time series data is categorized as either a Gaussian or a Lévy process (Lévy walk or Lévy flight) by using three scaling methods: Rescaled range analysis, Detrended fluctuation analysis, and Diffusion entropy analysis. The outcomes of this analysis indicate that the financial indices are classified as Lévy walks, while the volcanic, earthquake, and COVID-19 data are classified …


Machine Learning-Based Data And Model Driven Bayesian Uncertanity Quantification Of Inverse Problems For Suspended Non-Structural System, Zhiyuan Qin May 2023

All Dissertations

Inverse problems involve extracting the internal structure of a physical system from noisy measurement data. In many fields, the Bayesian inference is used to address the ill-conditioned nature of the inverse problem by incorporating prior information through an initial distribution. In the nonparametric Bayesian framework, surrogate models such as Gaussian Processes or Deep Neural Networks are used as flexible and effective probabilistic modeling tools to overcome the high-dimensional curse and reduce computational costs. In practical systems and computer models, uncertainties can be addressed through parameter calibration, sensitivity analysis, and uncertainty quantification, leading to improved reliability and robustness of decision and …


Generalized Additive Model Using Marginal Integration Estimation Techniques With Interactions, Tahiru Mahama May 2023

Open Access Theses & Dissertations

Marginal Integration (MI) is a statistical method that is extensively employed to estimate component functions of the nonparametric additive models. The shortcoming of the purely additive model is that interaction between predictor variables is often ignored, and it may produce poor performance in some real applications. As a result, this research considers the second-order interactions in the regression models. The primary objective is to use marginal integration techniques to estimate the nonparametric additive functions. We compare this model with other models/estimators such as the Generalized Additive Model (GAM), Generalized Additive Model with Selection (GAMSEL), Robust Marginal Integration (RMI), Ordinary Least Squares …


Examining The Effect Of Word Embeddings And Preprocessing Methods On Fake News Detection, Jessica Hauschild May 2023

Department of Statistics: Dissertations, Theses, and Student Work

The words people choose to use hold a lot of power, whether that be in spreading truth or deception. As listeners and readers, we do our best to understand how words are being used. There are many current methods in computer science literature attempting to embed words into numerical information for statistical analyses. Some of these embedding methods, such as Bag of Words, treat words as independent, while others, such as Word2Vec, attempt to gain information about the context of words. It is of interest to compare how well these various methods of translating text into numerical data work specifically …


Gentrification And Crime In The Twin Cities: Insights And Challenges Through A Statistical Lens, Erin G. Franke May 2023

Mathematics, Statistics, and Computer Science Honors Projects

Gentrification is a complex process of urban redevelopment that typically involves an in-migration of educated people to neighborhoods experiencing a period of disinvestment. While gentrification is widely regarded for its potential to displace long-time businesses and residents of the neighborhood, its impact on crime is highly controversial. There is no consensus on the relationship between gentrification and crime across criminological theory, and past statistical studies have also shown contradictory results. Measuring gentrification on the tract level with census data, we seek to understand gentrification’s relationship with violent crime and theft in the Twin Cities. Using a Poisson model with …


A Brascamp-Lieb–Rary Of Examples, Anina Peersen May 2023

Mathematics, Statistics, and Computer Science Honors Projects

This paper focuses on the Brascamp-Lieb inequality and its applications in analysis, fractal geometry, computer science, and more. It provides a beginner-level introduction to the Brascamp-Lieb inequality alongside related inequalities in analysis and explores specific cases of extremizable, simple, and equivalent Brascamp-Lieb data. Connections to computer science and geometric measure theory are introduced and explained. Finally, the Brascamp-Lieb constant is calculated for a chosen family of linear maps.


Do Firms Respond To Peer Disclosures? Evidence From Disclosures Of Clinical Trial Results, Vedran Capkun, Yun Lou, Clemens A. Otto, Yin Wang May 2023

Research Collection School Of Accountancy

Using data on the registration of clinical trials and the disclosure of trial results, we examine how firms respond to peer disclosures. We find that firms are less likely to disclose their own trial results if the results of a larger number of closely related trials are disclosed by their peers. This relation is stronger if the firms face higher competition (as measured by the number of competing trials). It is weaker if the firms are further along in their research than the peers (as measured by the trials’ phase) and if the peers’ disclosures convey more negative news (as …


The Last Drought Frontier: Building A Drought Index For The State Of Alaska, Olivia Campbell May 2023

School of Natural Resources: Dissertations, Theses, and Student Research

Drought is characterized by periods of below average precipitation. There are five major types of drought recognized in the literature: meteorological, hydrological, agricultural, socioeconomic, and ecological. A relatively new concept in the drought literature is “snow drought.” A key part of the definition of drought is that it is not always accompanied by extreme heat. This means drought can occur even in cold climates, cold seasons, and higher latitudes and altitudes, like Alaska. Drought is a natural part of climate variability, but Alaska’s climate is changing faster than any other state in the United States. Alaska is no stranger to …


Parameter Optimization For Excitable Cell Models, Amrit Parmar May 2023

Theses, Dissertations and Culminating Projects

The electrophysiology of nodose ganglia neurons is of great interest in the analysis of cell membrane currents and action potential behavior. This behavior was initially outlined in the Hodgkin-Huxley conductance model [1] using a system of nonlinear differential equations. Later, Schild et al. [2] developed an extension of the Hodgkin-Huxley model to provide a more exhaustive description of ion channels involved in nodose neuronal action potential activity. We consider a variety of methods to fit the parameters of both the Hodgkin-Huxley and Schild et al. models to an empirical stimulus response dataset. Our methods were validated using synthetic datasets, as …


An Analysis Of Changes In Seasonal Dynamics And Generational Differences In The Maine Lobster Fishery, Emily Fitting May 2023

Electronic Theses and Dissertations

The American lobster (Homarus americanus) supports the most valuable single species fishery in the US. Lobster landings have been increasing steadily for the last three decades, but before that landings were more variable. The high value of the lobster fishery combined with the decline of other commercially important species in this region has created increasing dependence on the resource, and previous research questions the resilience of the fishery in the face of social and environmental changes.

Important lobster life history processes, including migration patterns, growth rates, and reproduction, are driven by ocean bottom temperature, which creates a strong seasonal cycle …


Mixing Measures For Trees Of Fixed Diameter, Ari Holcombe Pomerance May 2023

Mathematics, Statistics, and Computer Science Honors Projects

A mixing measure is the expected length of a random walk in a graph given a set of starting and stopping conditions. We determine the tree structures of order n with diameter d that minimize and maximize for a few mixing measures. We show that the maximizing tree is usually a broom graph or a double broom graph and that the minimizing tree is usually a seesaw graph or a double seesaw graph.


UConn Baseball Batting Order Optimization, Gavin Rublewski May 2023

Honors Scholar Theses

Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified, or possibly refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of a Monte Carlo Simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …


Non-Destructive Imaging Of Phytosulfokine Trafficking In Plants Using Fiber-Optic Fluorescence Microscopy, Bernard Abakah May 2023

Electronic Theses and Dissertations

Plants secrete peptide ligands and use receptor signaling to respond to stress and control development. Understanding these phenomena is key to improving plant health and productivity for food, fiber, and energy applications. Phytosulfokine (PSK), a sulfated peptide hormone, regulates plant cell division, growth, and stress tolerance via specific phytosulfokine receptors (PSKRs). This study uses fiber-optic fluorescence microscopy to elucidate trafficking of PSK in live plants. The microscope features two-color optics and an objective lens connected to a 1-m coherent imaging fiber mounted on either a conventional upright microscope body or 5-axis positioning system (X–Y–Z plus pitch and yaw). PSK and …


Predicting High-Cap Tech Stock Polarity: A Combined Approach Using Support Vector Machines And Bidirectional Encoders From Transformers, Ian L. Grisham May 2023

Electronic Theses and Dissertations

The abundance, accessibility, and scale of data have engendered an era where machine learning can quickly and accurately solve complex problems, identify complicated patterns, and uncover intricate trends. One research area where many have applied these techniques is the stock market. Yet, financial domains are influenced by many factors and are notoriously difficult to predict due to their volatile and multivariate behavior. However, the literature indicates that public sentiment data may exhibit significant predictive qualities and improve a model’s ability to predict intricate trends. In this study, momentum SVM classification accuracy was compared between datasets that did and did not …


An Integer GARCH Model For A Poisson Process With Time-Varying Zero-Inflation, Isuru Panduka Ratnayake, V. A. Samaranayake May 2023

Mathematics and Statistics Faculty Research & Creative Works

A serially dependent Poisson process with time-varying zero-inflation is proposed. Such formulations have the potential to model count data time series arising from phenomena such as infectious diseases that ebb and flow over time. The model assumes that the intensity of the Poisson process evolves according to a generalized autoregressive conditional heteroscedastic (GARCH) formulation and allows the zero-inflation parameter to vary over time and be governed by a deterministic function or by an exogenous variable. Both the expectation maximization (EM) and the maximum likelihood estimation (MLE) approaches are presented as possible estimation methods. A simulation study shows that both parameter …


A Machine Learning Approach To Evaluate The Effect Of Sodium-Glucose Cotransporter-2 Inhibitors On Chronic Kidney Disease In Diabetes Patients, Solomon Eshun May 2023

Theses and Dissertations

Chronic kidney disease (CKD) is a significant complication that contributes to diabetes-related mortality in the United States, and there is growing evidence that sodium-glucose cotransporter 2 inhibitors (SGLT2i) can slow its progression. However, observational studies may suffer from confounding by indication, where patient characteristics and disease severity influence the decision to prescribe SGLT2i. This study utilized electronic health records of individuals with diabetes (from TriNetX) to investigate the effectiveness of SGLT2i on CKD progression. The database provided detailed information on patients’ CKD status, demographics, diagnosis, procedures, and medications, along with corresponding dates of diagnosis and prescription. The study comprised of …


Hispanic Human Capital And Financial Aid Application In The West Census Region, Benjamin Lundy-Paine May 2023

Capstone Projects and Master's Theses

As of 2021, very few Hispanic residents in the United States held a college degree in comparison to non-Hispanic residents. Research has shown that, particularly for Hispanic students, financial aid increases college persistence. Hispanic Free Application for Federal Student Aid (FAFSA) submission rates rank among the lowest, preventing many Hispanic students from receiving financial assistance. This issue is most prevalent in the West Census Region (WCR), where there is the highest concentration of Hispanic residents. To understand what barriers may be preventing Hispanic submission in the WCR, this Capstone used logistic regression models to analyze student-level data from the National Center for …


Distance Correlation Based Feature Selection In Random Forest, Jose Munoz-Lopez May 2023

Electronic Theses, Projects, and Dissertations

The Pearson correlation coefficient is a commonly used measure of correlation, but it has limitations as it only measures the linear relationship between two numerical variables. In 2007, Szekely et al. introduced the distance correlation, which measures all types of dependencies between random vectors X and Y in arbitrary dimensions, not just the linear ones. In this thesis, we propose a filter method that utilizes distance correlation as a criterion for feature selection in Random Forest regression. We conduct extensive simulation studies to evaluate its performance compared to existing methods under various data settings, in terms of the prediction mean …


Jackknife Empirical Likelihood Tests For Equality Of Generalized Lorenz Curves, Anton Butenko May 2023

Electronic Theses, Projects, and Dissertations

A Lorenz curve is a graphical representation of the distribution of income or wealth within a population. The generalized Lorenz curve can be created by scaling the values on the vertical axis of a Lorenz curve by the average output of the distribution. In this thesis, we propose two nonparametric methods for testing the equality of two generalized Lorenz curves. Both methods are based on empirical likelihood and utilize a U-statistic. We derive the limiting distribution of the likelihood ratio, which is shown to follow a chi-squared distribution with one degree of freedom. We conduct simulations to compare the …


A Generalized Family Of Exponentiated Composite Distributions With Applications To Insurance And Survival Data, Bowen Liu May 2023

UNLV Theses, Dissertations, Professional Papers, and Capstones

The concept of composite distributions was proposed in the early 2000s as a good parametric solution to model the data with heavy tails. Since the concept was proposed, it has been widely used in different areas, such as modeling insurance claim size data, predicting the risk measures in insurance data analysis, fitting survival time data, and modeling precipitation data. While a lot of the composite distributions demonstrated great performances in real applications, many commonly used composite distributions such as the inverse gamma-Pareto (IGP) or exponential-Pareto (EP), did not demonstrate great performances when fitting to several particular data sets. In order …


Multidimensional Investigation Of Tennessee’s Urban Forest, Jillian L. Gorrell May 2023

Doctoral Dissertations

Preserving existing trees in urban areas and properly cultivating urban forest conservation and management opportunities is valuable to the ever-growing urban environment and necessary for creating optimal experiences and educational tools to meet the needs of increasing urban populations. This dissertation contains studies investigating several facets of the urban forest, including environmental effects of deforestation and urbanization, tree equity, and urban forest facility management and accessibility. Community education and outreach at arboreta about the importance of the tree canopy can help promote environmental stewardship. A digital questionnaire was electronically distributed to representatives of arboreta certified through the Tennessee Division of …


A Monte Carlo Analysis Of Nonprobability Sampling & Post Hoc Corrections, Julia Hong May 2023

Masters Theses & Specialist Projects

Nonprobability samples are often used in place of probability samples because the former are less trouble and less expensive. Unfortunately, it is difficult to determine how well a sample represents population parameters when using nonprobability samples. Researchers attempt to mitigate the disadvantages of nonprobability sampling by performing post hoc corrections, but this adjustment may not successfully undo the effects of nonprobability sampling. To examine these effects, a Monte Carlo simulation was conducted to create a pseudo-population from which samples were drawn. Forty-one conditions were replicated 10,000 times each, with each sample consisting of 100 observations. A post-stratification adjustment was made …


Large Deviations For Self Intersection Local Times Of Ornstein-Uhlenbeck Processes, Apostolos Gournaris May 2023

Doctoral Dissertations

In the area of large deviations, one is concerned with the asymptotic computation of small probabilities on an exponential scale. The general form of large deviations can be roughly described as $P\{Y_n \in A\} \approx \exp\{-b_n I(A)\}$ as $n \to \infty$, for a random sequence $\{Y_n\}$, a positive sequence $b_n$ with $b_n \to \infty$, and a coefficient $I(A) \ge 0$. In applications, we are often concerned with the probability that the random variables take large values, that is, with $P\{Y_n \ge \lambda\}$, where $\lambda > 0$. Here, we consider the Ornstein-Uhlenbeck process, study the properties of the local times and self intersection …


Examining Political Discourse On Online 8kun And Reddit Forums, Braden Mindrum May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A recent example of political violence in the United States was that of the January 6, 2021, Capitol attack in connection with the certification of Joseph R. Biden’s victory over Donald J. Trump in the 2020 US presidential election. This thesis analyzes the events of January 6, 2021, through the lens of social media discourse. This thesis presents a workflow that acquired over 5 million 8kun and Reddit posts from various apolitical and political forums in the three months preceding and following the Capitol attack on January 6, 2021. Techniques from text analysis are then used to group forums according …


Examining Model Complexity's Effects When Predicting Continuous Measures From Ordinal Labels, Mckade S. Thomas May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Many real world problems require the prediction of ordinal variables where the values are a set of categories with an ordering to them. However, in many of these cases the categorical nature of the ordinal data is not a desirable outcome. As such, regression models treat ordinal variables as continuous and do not bind their predictions to discrete categories. Prior research has found that these models are capable of learning useful information between the discrete levels of the ordinal labels they are trained on, but complex models may learn ordinal labels too closely, missing the information between levels. In this …


A Machine Learning Approach To Obese-Inflammatory Phenotyping, Tania Mayleth Vargas May 2023

Theses and Dissertations

Obesity is the accumulation of an abnormal, or excessive, amount of fat in the body, which can have negative effects on overall health. This excess accumulation of macronutrients in adipose tissue can cause the release of inflammatory mediators, leading to a proinflammatory state. Inflammation is a known risk factor for various health conditions, including cardiovascular diseases, metabolic syndrome, and diabetes. This study sought to examine the use of data mining methods, particularly clustering algorithms, to identify inflammatory biomarker phenotypes and their association with obesity in a local adolescent population. The algorithms evaluated in this study included: k-means, Ward's hierarchical …


Effects Of Functional Network Model Definition On Biomarker Outcome Prediction, Xinyang Feng May 2023

Arts & Sciences Electronic Theses and Dissertations

Machine learning (ML) models are widely used to investigate the human connectome and to predict and understand behavior, emotion, and cognition. Prior research has organized pediatric connectome data using adult functional network models. However, this assumes that adult functional network models are appropriate and useful for predicting developmental outcomes from pediatric connectome data. We hypothesize that the application of adult brain network models could result in poor model fit, limiting the generalizability of results. Here, we test whether prediction of biological age is improved by concordant brain network models matching underlying functional connectome data. To quantify the difference in age …