Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Full-Text Articles in Physical Sciences and Mathematics

Effects Of Land Use On Soil Microbial Communities In Tropical Montane Forests Of Malaysian Borneo, Yang Kai Tang May 2023

Graduate Theses and Dissertations

Land use, such as logging and forest conversion to agriculture, can modify soil physicochemical and biological properties and affect soil health. To understand how land use change can impact soil properties and canopy structure, we used a land use gradient in Malaysian Borneo consisting of six sites, including old growth forests, mixed forests, and agricultural fields. Specifically, we aimed to answer the following questions: (1) How do soil physicochemical properties vary across land use types? (2) Do bacterial diversity and composition vary across different land use types? (3) Do fungal diversity and composition vary across different land use types? We …


A Machine Learning Approach For Predicting Clinical Trial Patient Enrollment In Drug Development Portfolio Demand Planning, Ahmed Shoieb May 2023

Masters Theses

One of the biggest challenges the clinical research industry currently faces is the accurate forecasting of patient enrollment (namely, if and when a clinical trial will achieve full enrollment), as the stochastic behavior of enrollment can significantly contribute to delays in the development of new drugs, increases in duration and costs of clinical trials, and the over- or under-estimation of clinical supply. This study proposes a Machine Learning model using a Fully Convolutional Network (FCN) that is trained on a dataset of 100,000 patient enrollment data points including patient age, patient gender, patient disease, investigational product, study phase, blinded …


Quantifying The Effect Of Socio-Economic Predictors And The Built Environment On Mental Health Events In Little Rock, AR, Alfieri Ek, Grant Drawve, Samantha Robinson, Jyotishka Datta May 2023

Sociology and Criminology Faculty Publications and Presentations

Law enforcement agencies increasingly use spatial analysis to help identify patterns of outcomes. Despite the critical nature of proper resource allocation for mental health incidents, there has been little progress in statistical modeling of the geo-spatial nature of mental health events in Little Rock, Arkansas. In this article, we provide insights into the spatial nature of mental health data from Little Rock, Arkansas between 2015 and 2018, under a supervised spatial modeling framework. We provide evidence of spatial clustering and identify the important features influencing such heterogeneity via a spatially informed hierarchy of generalized …


Dynamics Of Inertial And Non-Inertial Particles In Geophysical Flows, Nishanta Baral May 2023

Theses, Dissertations and Culminating Projects

We consider the dynamics of inertial and non-inertial particles in various flows. We investigate the underlying structures of the flow field by examining its Lagrangian coherent structures (LCS), which are found by computing finite-time Lyapunov exponents (FTLE). We compare the behavior of massless non-inertial particles, using the velocity fields from four models (the Duffing oscillator, the Bickley jet, the double-gyre flow, and a quasi-geostrophic geophysical flow model), with that of inertial particles. For inertial particles with finite size and mass, we use the Maxey-Riley equation to describe the particle’s motion. We explore the preferential aggregation of inertial particles and demonstrate …


Small But Mighty: Examining The Utility Of Microstatistics In Modeling Ice Hockey, Matt Palmer May 2023

Senior Honors Theses

As research into hockey analytics continues, an increasing number of metrics are being introduced into the knowledge base of the field, creating a need to determine whether various stats are useful or simply add noise to the discussion. This paper examines microstatistics – manually tracked metrics which go beyond the NHL’s publicly released stats – both through the lens of meta-analytics (which attempt to objectively assess how useful a metric is) and modeling game probabilities. Results show that while there is certainly room for improvement in understanding and use of microstats in modeling, the metrics overall represent an area of …


Quantification Of Various Types Of Biases In Large Language Models, Sudhashree Sayenju Apr 2023

Doctor of Data Science and Analytics Dissertations

Natural Language Processing (NLP) systems are found everywhere on the internet, from search engines and language translation to more advanced systems such as voice assistants and customer service. Since humans are always on the receiving end of NLP technologies, it is very important to analyze whether the Large Language Models (LLMs) in use are biased and therefore unfair. The majority of the research on NLP bias has focused on societal stereotype biases embedded in LLMs. However, our research focuses on all types of biases, namely model class level bias, stereotype bias, and domain bias present in LLMs. Model class …


The 2015 NCAA Cost-Of-Attendance Stipend And Its Effects On Institutional Financial Aid Packages, Sara Greene Apr 2023

Honors Theses

In 2015, the National Collegiate Athletic Association (NCAA) allowed “Cost of Attendance” (COA) stipends to be offered to athletic recruits for Division I schools. These stipends are intended to allow schools to grant aid to student-athletes beyond a full-ride scholarship to cover additional costs imposed on student-athletes. These stipends created an opportunity for the “Autonomy” Power 5 programs to utilize a competitive tactic to try to win over the top recruits. There is evidence that these COA stipends have caused an increase in the estimated cost of attendance reported by the university. This paper examines if the COA stipends have …


Time Series Analysis Of Longitudinally Collected Standard Autoperimetry Data In Glaucoma Patients, Carlyn Childress Apr 2023

Honors College Theses

Glaucoma is a group of eye diseases in which damage gradually occurs to the optic nerve, which often leads to partial or complete loss of vision. Glaucoma is the second leading cause of blindness, and there is no cure for it. Early detection and the tracking of its progression are key to managing the effects of glaucoma. Ordinary Least Squares Regression (OLSR), the most commonly used methodology for tracking glaucoma progression, is inappropriate because the longitudinally collected perimetry data from glaucoma patients appear to be temporally correlated. Time series models that account for temporal correlation are better methods to analyze Mean …


On Cox Proportional Hazards Model Performance Under Different Sampling Schemes, Hani Samawi, Lili Yu, Jingjing Yin Apr 2023

Department of Biostatistics, Epidemiology, and Environmental Health Sciences Faculty Publications

Cox’s proportional hazards (PH) model is an acceptable model for survival data analysis. This work investigates the performance of PH models under different efficient sampling schemes for analyzing time-to-event data (survival data). We compare modified Extreme and Double Extreme Ranked Set Sampling (ERSS and DERSS) schemes with a simple random sampling scheme. Observations are assumed to be selected based on an easy-to-evaluate, available baseline variable associated with the survival time. Through intensive simulations, we show that these modified approaches (ERSS and DERSS) provide more powerful testing procedures and more efficient estimates of the hazard ratio than those based on …


Employee Attrition: Analyzing Factors Influencing Job Satisfaction Of IBM Data Scientists, Graham Nash Apr 2023

Symposium of Student Scholars

Employee attrition is a relevant issue that every business employer must consider when gauging the effectiveness of their employees. Whether or not an employee chooses to leave their job can depend on a multitude of factors. As a result, employers need to develop methods for measuring attrition based on several characteristics of their employees. Factors such as age, years with the company, department, level of education, job role, and even marital status are all considered by employers to assist in predicting employee attrition. This project will be analyzing a …


Crime In Los Angeles, Cierra Hughley Apr 2023

Symposium of Student Scholars

This study will examine crimes committed in the city of Los Angeles dating back to 2020. The reported data was pulled from the Los Angeles Police Department’s open data. The purpose of this study is to show whether gender is related to the three primary crime categories: property crimes, violent crimes, or other crimes. Doing so will show which crimes were committed by each gender. Although this study focuses on gender and crimes committed, choosing that focus was difficult because there were many variables to choose from. However, exploring the relationship between crime and gender was …


Statistical Analysis Of The Relationship Between Protected Bird Species And National Parks, Katherine Harmon Apr 2023

Symposium of Student Scholars

The ecological diversity of Earth is seriously threatened by habitat loss caused by human intervention. The conservation status of all identified species is classified into nine categories of varying vulnerability as described by the International Union for Conservation of Nature’s Red List. By understanding the vulnerability of specific species, scientists can work to maintain a viable and healthy ecosystem globally by instituting rules and regulations for the observed habitats of threatened species. These habitats are identified by surveying potential locations for threatened species and determining the population size at each site. An example of one of these surveys …


Two Sample Statistical Test For Location Parameters, Narinder Kumar, Arun Kumar Apr 2023

Journal of Modern Applied Statistical Methods

A class of distribution-free tests for the homogeneity of location parameters is proposed and compared with different competitors in terms of Pitman asymptotic relative efficiency. A numerical example is provided and a simulation study is made to check the performance of the tests.


Reducing Restaurant Inventory Costs Through Sales Forecasting, Tyler Mason, Chris Schoen, Trevor Gilbert, Jonathan Enriquez Apr 2023

Senior Design Project For Engineers

Family Restaurant is a local restaurant in the greater Atlanta area that serves a variety of dishes that include an assortment of 19 different proteins. Currently, Family Restaurant places protein orders based on business intuition, and tends to over-stock and sometimes under-stock. To minimize inventory costs by reducing over-stocking and preventing under-stocking of proteins, we applied Facebook Prophet (FB Prophet), ARIMA, and XGBoost machine learning models to predict protein demand and then fed these results into a Fixed Time Period inventory model to make an overall order suggestion based on the specified time period. We trained our models on …


Using A Distributive Approach To Model Insurance Loss, Kayla Kippes Apr 2023

Student Research Submissions

Insurance loss is an unpredicted event that stands at the forefront of the insurance industry. Loss in insurance represents the costs or expenses incurred due to a claim. An insurance claim is a request for the insurance company to pay for damage caused to an individual’s property. Loss can be measured by how much money (the dollar amount) has been paid out by the insurance company to repair the damage or it can be measured by the number of claims (claim count) made to the insurance company. Insured events include property damage due to fire, theft, flood, a car accident, …


Defining Characteristics That Lead To Cost-Efficient Veteran NBA Free Agent Signings, David Mccain Apr 2023

Honors Projects in Mathematics

Throughout the history of the NBA, decisions regarding the signing of free agents have been riddled with complexity. Franchises are tasked with finding out what players will serve as optimal free agent signings prior to seeing them perform within the framework of their team. This study hypothesizes that the adequacy of an NBA free agent signing can be modeled and predicted through the implementation of a machine learning model. The model will learn the necessary information using training and testing data sets that include various player biometrics, game statistics, and financial information. The application of this machine learning model will …


Teaching About The Global Refugee Crisis, Melissa Kafer Apr 2023

Honors Projects

Around the world, there are more than 30 million refugees (UNHCR, 2023) facing language barriers, cultural differences, prejudice, racism, and xenophobia. The number of admitted refugees in 2022 has more than doubled since 2021 (Duffin, 2022), and yet, many Americans do not know or understand the global refugee crisis. There are misconceptions in America that cause lack of empathy, bias, and prejudice towards refugees. Through the creation of four lesson plans, this research project aims to discover Americans’ misunderstandings regarding refugees and teach them about the crisis to remedy the misconceptions. This study includes a literature review detailing appropriate teaching …


A Pharmacoepidemiological Study Of Myocarditis And Pericarditis Following The First Dose Of mRNA COVID-19 Vaccine In Europe, Joana Tome, Logan Cowan, Isaac Fung Apr 2023

Department of Biostatistics, Epidemiology, and Environmental Health Sciences Faculty Publications

This study assessed the myocarditis and pericarditis reporting rate of the first dose of mRNA COVID-19 vaccines in Europe. Myocarditis and pericarditis data pertinent to mRNA COVID-19 vaccines (1 January 2021–11 February 2022) from the EudraVigilance database were combined with the European Centre for Disease Prevention and Control (ECDC)’s vaccination tracker data. The reporting rate was expressed as events (occurring within 28 days of the first dose) per 1 million individuals vaccinated. An observed-to-expected (OE) analysis quantified excess risk for myocarditis or pericarditis following the first mRNA COVID-19 vaccination. The reporting rate of myocarditis per 1 million individuals vaccinated was 17.27 (95% …


Electric Vehicle Uptake: What Factors Are Motivating The Shift For College-Aged And Older Groups?, Jake Cardines Apr 2023

Honors Projects in Mathematics

Electric vehicles (EVs) are arguably the most quickly expanding form of transportation as the world races toward a greener future with advanced technology and reduced reliance on fossil fuels. This study analyzes various factors expected to motivate consumers in particular age groups to purchase EVs, including an examination of how the idea of EV ownership is currently perceived and tests of which factors influence it positively and negatively. Data collected from 113 survey respondents serves as the basis for determining the responsiveness of potential future EV owners to variables such as vehicle brand and charging availability, electric range, costs associated with purchase …


The Bellarmine Bee Bed: Organizing A Native Plant Garden Using Feedback From The Local Community, Kate Moran Apr 2023

Undergraduate Theses

Animal pollinators are the cornerstone of healthy ecosystems. Their survival is essential for the persistence of entire food chains: from the flowers they cross-pollinate directly, to the animals who depend on those plants for nutrition. The establishment of pollinator gardens—particularly ones that consist of native plants—is an effective way to enhance their biodiversity, abundance, and well-being.

The main goal of this thesis is to construct a pollinator garden that maximizes the benefits for animal pollinators using feedback from local gardeners. A survey was used to gather information about the popularity and preferences of 40 flowering plants, and after analyzing the …


From Big Farm To Big Pharma: A Differential Equations Model Of Antibiotic-Resistant Salmonella In Industrial Poultry Populations, Rilyn Mckallip Apr 2023

Honors Theses

Antibiotics are used in poultry production as prophylaxis, curative treatment, and growth promotion. The first use is as prophylaxis, or prevention of common bacterial diseases. The crowded conditions in concentrated animal feeding operations necessitate management of infectious disease to ensure overall animal health and the profitability of such operations. In these farms, between 20,000 and 125,000 birds are raised in shed-like enclosures [3], with an average of less than one square foot of space per chicken [34]. Antibiotics are currently used in chicken farms to manage and prevent common bacterial diseases such as respiratory and digestive tract infections, as well …


Open Data Indicates That Collegedale Could Be A Bluezone, Tristan Deschamps, Alva Johnson Apr 2023

Campus Research Day

A blue zone is an indicator of exceptional health in a community. Adventists have a blue zone community in Loma Linda, but there has been little research into other Adventist-populated areas that could be blue zones. Therefore, our goal is to show that open data suggests that a blue zone may exist near Southern Adventist University, specifically in Collegedale. This data has been gathered from different federal sources, including the CDC, the US Census Bureau, the Tennessee Department of Health, official state records, and federal documents that are available to the public.


GPU Utilization: Predictive SARIMAX Time Series Analysis, Dorothy Dorie Parry Apr 2023

Modeling, Simulation and Visualization Student Capstone Conference

This work explores collecting performance metrics and leveraging the output for prediction on a memory-intensive parallel image classification algorithm, Inception v3 (or "Inception3"). Experimental results were collected by nvidia-smi on a DGX-1 computational node equipped with eight Tesla V100 Graphics Processing Units (GPUs). Time series analysis was performed on the GPU utilization data collected over multiple runs of Inception3’s image classification algorithm (see Figure 1). The time series model applied was Seasonal Autoregressive Integrated Moving Average with Exogenous regressors (SARIMAX).


The Effectiveness Of Visualization Techniques For Supporting Decision-Making, Cansu Yalim, Holly A. H. Handley Apr 2023

Modeling, Simulation and Visualization Student Capstone Conference

Although visualization is beneficial for evaluating and communicating data, the efficiency of various visualization approaches for different data types is not always evident. This research aims to address this issue by investigating the usefulness of several visualization techniques for various data types, including continuous, categorical, and time-series data. The qualitative appraisal of each technique's strengths, weaknesses, and interpretation of the dataset is investigated. The research questions include: which visualization approaches perform best for different data types, and what factors impact their usefulness? The absence of clear directions for both researchers and practitioners on how to identify the most effective visualization …


Length Bias Estimation Of Small Businesses Lifetime, Simeng Li Apr 2023

Honors Theses

Small businesses, particularly restaurants, play a crucial role in the economy by generating employment opportunities, boosting tourism, and contributing to the local economy. However, accurately estimating their lifetimes can be challenging due to the presence of length bias, which occurs when the likelihood of sampling any particular restaurant's closure is influenced by its duration in operation. To address the issue, this study conducts goodness-of-fit tests on exponential/gamma family distributions and employs the Kaplan-Meier method to more accurately estimate the average lifetime of restaurants in Carytown. By providing insights into the challenges of estimating the lifetimes of small businesses, this study …


Statistical Approach To Quantifying Interceptability Of Interaction Scenarios For Testing Autonomous Surface Vessels, Benjamin E. Hargis, Yiannis E. Papelis Apr 2023

Modeling, Simulation and Visualization Student Capstone Conference

This paper presents a probabilistic approach to quantifying the interceptability of an interaction scenario designed to test collision avoidance of autonomous navigation algorithms. Interceptability is one of many measures used to determine the complexity or difficulty of an interaction scenario. This approach uses a combined probability model of capability and intent to create a predicted position probability map for the system under test. Then, interceptability is quantified by determining the overlap between the probability map of the system under test and the intruder’s capability model. The approach is general; however, a demonstration is provided using kinematic capability models and an odometry-based intent model.


National Residency Matching Program: Looking At The Data Through Linear Regressions, Jacklyn Tellez Apr 2023

Undergraduate Theses

The National Residency Matching Program (NRMP) oversees the process of medical school graduates being matched to a residency program. The NRMP determines both the hospital and residency program for medical students. Prior to matching, both hospital programs and students rank each other. The NRMP uses these lists to determine the matches. Four distinct models using data from hospitals and applicants were used to determine what characteristics lead to a chance of being matched. Each model went through multiple rounds of testing to determine the importance of the different independent variables. In each data set, the dependent variable is either the …


Bridging The Chasm Between Fundamental, Momentum, And Quantitative Investing, Allen Hoskins, Jeff Reed, Robert Slater Apr 2023

SMU Data Science Review

A chasm exists between the active public equity investment management industry's fundamental, momentum, and quantitative styles. In this study, the researchers explore ways to bridge this gap by leveraging domain knowledge, fundamental analysis, momentum, crowdsourcing, and data science methods. This research also seeks to test the developed tools and strategies during the volatile time period of 2020 and 2021.


Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, Robert Burigo, Scott Frazier, Eli Kravez, Nibhrat Lohia Apr 2023

SMU Data Science Review

Numerous studies have used the physicochemical properties of wine to predict quality. Given the nature of these properties, the data is inherently skewed. Previous works have focused on a handful of sampling techniques to balance the data. This research compares multiple sampling techniques in predicting the target with limited data. For this purpose, an ensemble model is used to evaluate the different techniques. This research found no evidence that specific oversampling methods improve a random forest classifier for a multi-class problem.


Extending The M3-Competition: Category And Interval-Specific Time Series Forecasting, Will Sherman, Kati Schuerger, Randy Kim, Bivin Sadler Apr 2023

SMU Data Science Review

The M3-Competition found that simple models outperform more complex ones for time series forecasting. As part of these competitions, several claims were made that statistical models exceeded machine learning (ML) techniques, such as recurrent neural networks (RNN), in prediction performance. These findings may over-generalize the capabilities of statistical models since the analysis measured the total forecasting accuracy across a wide range of industries and fields and with different interval lengths. This investigation aimed to assess how statistical and ML methods compared when individuating series by category and time interval. Utilizing the M3 data and building individual models using Facebook© Prophet …