Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics

Articles 61 - 90 of 685

Full-Text Articles in Physical Sciences and Mathematics

Forecasting Razorback Baseball Game Outcomes, Austin Raabe May 2022

Information Systems Undergraduate Honors Theses

Despite the disappointing end to the 2021 Arkansas Razorback baseball year, the team’s success provided hog fans something to look forward to next season. While they will be without the 2021 Golden Spikes Award winner, Kevin Kopps, and four All-SEC team selections, the 2022 roster has promising new and returning talent. With fifty percent of the players who played significant time last year coming back (minimum ten hits or ten innings pitched), the arrival of several impact transfers from major conferences, and a recruiting class ranked in the top five according to Perfect Game, there is reason to believe that …


An Examination Of The Statistics And Risk Management Concepts Behind The Patient Protection And Affordable Care Act (PPACA) Of 2010, Scott Sinclair May 2022

Undergraduate Honors Thesis Collection

The Patient Protection and Affordable Care Act (PPACA) is the overarching federal law that has impacted the intricacies of the health insurance market for more than a decade. Using multiple linear regression, a supervised learning method, this thesis analyzes the relationship between medical loss ratio rebates and predictor variables such as the state, the health insurance market, and the number of insurance companies owing rebates, as well as the relationship of the actuarial value of metal tiers and geographic rating area factors to the insurance premium for a standard family of four, defined as a forty-year-old couple with …


Causalmodels: An R Library For Estimating Causal Effects, Joshua Wolff Anderson May 2022

Computational and Data Sciences (MS) Theses

Free and open source software for statistical modeling and machine learning has significantly advanced productivity in data science. Packages such as SciPy in Python and caret in R provide fundamental tools for statistical modeling and machine learning in the two most popular programming languages used by data scientists. Unfortunately, comparably robust tools for causal inference are limited. The existing R tools lack consistent, standardized methodologies and inputs, and R has no comprehensive package that offers traditional causal inference methods such as standardization, IP weighting, G-estimation, outcome regression, and propensity matching in one place. …


Understanding And Improving The System: The Effects Of Weighting On The Accuracy Of Political Polling In Arkansas, Beck Williams May 2022

Political Science Undergraduate Honors Theses

In an effort to increase the accuracy of statewide political polling in Arkansas, we explore the statistical strategy of weighting with a focus on one yearly opinion poll: The Arkansas Poll. We conduct over 70 weighting experiments on the 2016 and 2020 Arkansas Polls using a variety of variables and opinion questions. From these experiments, we find that while some weighted variables tend to create larger changes, weighting typically results in a single-digit percentage change that does not substantially shift or “flip” the majorities. Due to a greater rate of change through weighting in the 2020 Poll compared to the …
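
The abstract does not say which weighting scheme was used. As a minimal sketch of one common scheme, post-stratification on a single variable, each respondent's weight is their group's population share divided by that group's share of the sample. The group labels, shares, and responses below are invented toy values (not Arkansas Poll data), the function names are hypothetical, and the toy sample is deliberately skewed so the arithmetic is easy to follow.

```python
from collections import Counter


def poststrat_weights(groups, population_shares):
    """Weight each respondent by (population share) / (sample share) of their group."""
    n = len(groups)
    sample_shares = {g: count / n for g, count in Counter(groups).items()}
    return [population_shares[g] / sample_shares[g] for g in groups]


def weighted_share(responses, weights, answer):
    """Weighted proportion of respondents giving `answer`."""
    total = sum(weights)
    return sum(w for r, w in zip(responses, weights) if r == answer) / total


# Invented toy sample: college graduates are over-represented (60% vs. 30%).
groups = ["college"] * 6 + ["no_college"] * 4
responses = ["yes"] * 4 + ["no"] * 2 + ["yes"] * 1 + ["no"] * 3
population = {"college": 0.3, "no_college": 0.7}

weights = poststrat_weights(groups, population)
unweighted = responses.count("yes") / len(responses)
weighted = weighted_share(responses, weights, "yes")
print(f"unweighted yes: {unweighted:.3f}, weighted yes: {weighted:.3f}")
# unweighted yes: 0.500, weighted yes: 0.375
```

Because the weights sum to the sample size, weighted and unweighted percentages are directly comparable. Weighting on several variables at once is usually done by raking (iterative proportional fitting), which this sketch does not cover.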


On Misuses Of The Kolmogorov–Smirnov Test For One-Sample Goodness-Of-Fit, Anthony Zeimbekakis Apr 2022

Honors Scholar Theses

The Kolmogorov–Smirnov (KS) test is one of the most popular goodness-of-fit tests for comparing a sample with a hypothesized parametric distribution. Nevertheless, it has often been misused. The standard one-sample KS test applies to independent, continuous data with a completely specified hypothesized distribution. It is not uncommon, however, to find it applied in the literature to dependent, discrete, or rounded data, or with hypothesized distributions containing estimated parameters. For example, it has been "discovered" multiple times that the test is too conservative when the parameters are estimated. We demonstrate misuses of the one-sample KS test in three …
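
The conservativeness mentioned above can be reproduced with a small simulation. The sketch below (standard-library Python 3.8+, not the thesis's code) draws samples from N(0, 1) and computes the one-sample KS statistic twice: against the fully specified N(0, 1), and against a normal whose mean and standard deviation were estimated from the same sample. It assumes the asymptotic 5% critical value 1.358/√n, which is only an approximation at n = 50; the function names are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """One-sample KS statistic D_n = sup_x |F_n(x) - F(x)| for a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The largest gap occurs just after or just before an order statistic.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def rejection_rates(n=50, reps=2000, seed=1):
    """Fraction of N(0, 1) samples rejected at roughly the 5% level."""
    rng = random.Random(seed)
    crit = 1.358 / n ** 0.5  # asymptotic 5% critical value (an approximation)
    true_dist = NormalDist(0.0, 1.0)
    reject_known = reject_estimated = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correct use: the null distribution is fully specified in advance.
        if ks_statistic(sample, true_dist.cdf) > crit:
            reject_known += 1
        # Misuse: mean and SD are estimated from the very sample being tested.
        fitted = NormalDist(statistics.fmean(sample), statistics.stdev(sample))
        if ks_statistic(sample, fitted.cdf) > crit:
            reject_estimated += 1
    return reject_known / reps, reject_estimated / reps


known, estimated = rejection_rates()
print(f"rejection rate, parameters known:     {known:.3f}")
print(f"rejection rate, parameters estimated: {estimated:.3f}")
```

With the parameters known, the rejection rate should land near the nominal 5%; with estimated parameters it should fall far below it, because fitting pulls the hypothesized CDF toward the data. For the normal case, Lilliefors' test supplies critical values that account for the estimation.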


Analytical Study To Determine Significant Causes Of Increased No-Hitters In The 2021 Major League Baseball Season, Joel Robison Apr 2022

Honors Projects

Why were there so many no-hitters in the 2021 MLB season? This project examines possible significant causes of the record-breaking number of no-hitters pitched in the 2021 Major League Baseball season. Specifically, it takes an analytical look at recent trends in launch angles and spin rates to determine whether they contributed significantly to the increase in no-hitters. The random nature and unpredictability of baseball make it almost impossible to reach any solid conclusions.


Einstein-Roscoe Regression For The Slag Viscosity Prediction Problem In Steelmaking, Hiroto Saigo, Dukka KC, Noritaka Saito Apr 2022

Michigan Tech Publications

In classical machine learning, regressors are trained without attempting to gain insight into the mechanism connecting inputs and outputs. The natural sciences, however, seek a robust, interpretable function for the target phenomenon that can return predictions even outside the training domain. This paper focuses on the viscosity prediction problem in steelmaking and proposes Einstein-Roscoe regression (ERR), which learns the coefficients of the Einstein-Roscoe equation and is able to extrapolate to unseen domains. Moreover, in the natural sciences some measurements are often unavailable or more expensive than others due to physical constraints. To this …


A Monte Carlo Analysis Of Seven Dichotomous Variable Confidence Interval Equations, Morgan Juanita Dubose Apr 2022

Masters Theses & Specialist Projects

Department of Psychological Sciences, Western Kentucky University

There are two options to estimate a range of likely values for the population mean of a continuous variable: one for when the population standard deviation is known and another for when it is unknown. Seven equations have been proposed to calculate the confidence interval for the population mean of a dichotomous variable (that is, a population proportion): the normal approximation interval, Wilson interval, Jeffreys interval, Clopper-Pearson interval, Agresti-Coull interval, arcsine transformation, and logit transformation. In this study, I compared the percent effectiveness of each equation using a Monte Carlo analysis and the interval range over a range …
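
A Monte Carlo comparison of this kind can be sketched for two of the seven intervals, the normal approximation (Wald) and Wilson intervals, by simulating binomial counts and recording how often each interval contains the true proportion. This is a standard-library illustration, not the thesis's code; treating "percent effectiveness" as coverage of the true proportion is an assumption, and the function names, p = 0.1, n = 20, and repetition count are arbitrary choices.

```python
import math
import random

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_interval(x, n, z=Z95):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(x, n, z=Z95):
    """Wilson score interval for a binomial proportion."""
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


def coverage(interval, p_true, n, reps=10_000, seed=0):
    """Monte Carlo estimate of how often the interval contains p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p_true for _ in range(n))  # one binomial draw
        low, high = interval(x, n)
        hits += low <= p_true <= high
    return hits / reps


results = {
    "Wald": coverage(wald_interval, 0.1, 20),
    "Wilson": coverage(wilson_interval, 0.1, 20),
}
for name, cov in results.items():
    print(f"{name:6s} coverage at p=0.1, n=20: {cov:.3f}")
```

For p = 0.1 and n = 20 the Wald interval undercovers badly, largely because x = 0 (probability 0.9^20 ≈ 0.12) produces the degenerate interval [0, 0]; the Wilson interval stays much closer to the nominal 95%.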


Split Classification Model For Complex Clustered Data, Katherine Gerot Mar 2022

Honors Theses

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often tend to reside in non-Euclidean metric space. This prevents Euclidean-based classification methodologies, such as regression, from reliably modeling the data. Many proposed models rely on computationally complex embeddings to convert the data to a more usable format. Others, notably the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" and arrive at a non-linear decision boundary. The methodology proposed in this paper seeks to classify complex data in a relatively computationally simple and explainable manner.


So Long My Friend, Bryan Mcnair Jan 2022

Journal of Humanistic Mathematics

No abstract provided.


Mathematical Formulations For Complex Resource Scheduling Problems, T. R. Lalita Dr. Jan 2022

Doctoral Theses

This thesis deals with the development of effective models for large-scale real-world resource scheduling problems. Efficient utilization of resources is crucial for any organization or industry, as resources are often scarce. Scheduling them optimally can not only mitigate scarcity but also bring economic benefits. Optimal utilization of resources reduces costs and thereby provides a competitive edge in the business world. Resources can be of different types, such as human (skilled and unskilled personnel), financial (budgets), materials, infrastructure (airports and seaports with designed facilities, windmills, warehouse area, hotel rooms, etc.) and equipment (microprocessors, cranes, machinery, aircraft simulators for …


Many-Objective Evolutionary Algorithms: Objective Reduction, Decomposition And Multi-Modality, Monalisa Pal Dr. Jan 2022

Doctoral Theses

Evolutionary Algorithms (EAs) for Many-Objective Optimization (MaOO) problems face several challenges: the need for a large population size, the difficulty of maintaining selection pressure toward the global optima, and the inability to accurately visualize the high-dimensional Pareto-optimal Set (in decision space) and Pareto-Front (in objective space). The quality of the estimated set of Pareto-optimal solutions produced by EAs for MaOO problems is assessed in terms of proximity to the true surface (convergence) and the uniformity and coverage of the estimated set over the true surface (diversity). As the number of objectives grows, these challenges become more profound. Thus, better strategies have …


Has Winter Weather In Southwest Ohio Been Affected By The El Niño Southern Oscillation, The North Atlantic Oscillation, The Pacific Decadal Oscillation, And The Atlantic Multidecadal Oscillation?, John A. Blue Jan 2022

Browse all Theses and Dissertations

Winter temperature and precipitation in Southwest Ohio over the last century were examined for anomalies attributable to teleconnections with large-scale atmospheric perturbations caused by the El Niño Southern Oscillation (ENSO), the North Atlantic Oscillation (NAO), the Pacific Decadal Oscillation (PDO), and the Atlantic Multidecadal Oscillation (AMO). The temperature record gives evidence of a teleconnection with the NAO, ENSO, and PDO, with the strongest link being for phases of the NAO. Most winters during positive NAO phases had mean monthly temperatures warmer than the century-long mean, and the majority of negative NAO phase winters were colder. The difference …


Finding The Best Predictors For Foot Traffic In US Seafood Restaurants, Isabel Paige Beaulieu Jan 2022

Honors Theses and Capstones

COVID-19 caused statewide and nationwide lockdowns, which altered human foot traffic, especially in restaurants. The seafood sector suffered particularly: illegal fishing increased, its goods are perishable and in some places seasonal, and imports and exports slowed. Foot traffic data helps business owners decide how much to order, how many employees to schedule, and so on. One issue is that the data is very expensive, hard to get, and not available until months after it is recorded. Our goal is to not only find covariates that …


Using Deep Neural Networks To Analyze Precision Agriculture Data, Stephanie Liebl Jan 2022

Electronic Theses and Dissertations

As the population of the Earth increases, there is a growing need for food to feed its inhabitants. Precision agriculture offers techniques and tools that can help accommodate the growing population. One such tool is remote sensing data, which can be used to image fields in an effort to better predict or understand the crops. In this thesis, deep neural networks are used to evaluate various spatial, spectral, and temporal resolutions of three different satellite images to determine which best predicts corn yield. The main metrics we used to evaluate the models were R-squared (R2), …


Mary Eleanor Spear's Importance To The History Of Statistical Visualization, Melanie Williams Jan 2022

CMC Senior Theses

This paper will demonstrate why Mary Eleanor Spear (1897-1986) is an important figure in the history of statistical visualization. She led an impressive career in the federal government as a data analyst before "data analyst" was a recognized job title. She wrote and illustrated two comprehensive textbooks that furthered the art of statistical visualization. Her textbooks cover extensive graphing knowledge still valuable to statisticians and viewers today. The most notable of her works is her development of the box plot. In addition to Spear's career and contributions, this paper will also address the lack of female representation in science, technology, engineering, and …


Mixed-Effects Regression Models For Analyzing Data With Excess Zeros, Guangyu Xu Jan 2022

Browse all Theses and Dissertations

Data with excess zeros are common in many applications. Failure to account for the extra zeros may lead to biased estimates and misleading inference when analyzing this data type. To investigate the association between a set of predictor variables and an outcome variable with excess zeros, two kinds of regression models, the hurdle model and the zero-inflated model, are commonly used. In these models, the traditional tests, such as the likelihood-ratio test and the Wald test, can only be used to detect the fixed effects in association analysis. Recently, several random-effects or mixed-effects tests have been proposed for association analysis, …
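
The hurdle and zero-inflated models differ in how zeros arise. The sketch below shows the Poisson versions of the two count distributions without covariates or random effects, so it is only the building block that each regression links covariates to, not a fitted mixed-effects model. The parameter names lam, pi, and p_zero are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson(lam) count."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) draw, which can also be zero."""
    count_part = (1 - pi) * poisson_pmf(k, lam)
    return pi + count_part if k == 0 else count_part


def hurdle_pmf(k, lam, p_zero):
    """Hurdle Poisson: every zero comes from the binary part (probability p_zero);
    positive counts follow a zero-truncated Poisson(lam)."""
    if k == 0:
        return p_zero
    return (1 - p_zero) * poisson_pmf(k, lam) / (1 - math.exp(-lam))


lam = 2.0
print("P(Y=0) Poisson:          ", round(poisson_pmf(0, lam), 4))
print("P(Y=0) ZIP, pi=0.3:      ", round(zip_pmf(0, lam, 0.3), 4))
print("P(Y=0) hurdle, p_zero=0.4:", round(hurdle_pmf(0, lam, 0.4), 4))
```

In the zero-inflated model, zeros come from two sources (structural and sampling zeros); in the hurdle model, a single binary process decides zero versus positive. In the regression versions, lam, pi, and p_zero are each linked to covariates (and, in mixed-effects versions, to random effects).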


Statistical Theory For Specialized Linear Regression Adjustment Methods Compared To Multiple Linear Regression In The Presence And Absence Of Interaction Effects, Leon Su Jan 2022

Theses and Dissertations--Statistics

When building models to investigate outcomes and variables of interest, researchers often want to adjust for other variables. There is a variety of ways that these adjustments are performed. In this work, we will consider four approaches to adjustment used by researchers in various fields. We will compare the efficacy of these methods to what we call the “true model method”: fitting a multiple linear regression model in which the adjustment variables are model covariates. Our goal is to show that these adjustment methods have inferior performance to the true model method by comparing model parameter estimates, power, type I error, …


Analyzing Marriage Statistics As Recorded In The Journal Of The American Statistical Association From 1889 To 2012, Annalee Soohoo Jan 2022

CMC Senior Theses

The United States has been tracking American marriage statistics since its founding. According to the United States Census Bureau, “marital status and marital history data help federal agencies understand marriage trends, forecast future needs of programs that have spousal benefits, and measure the effects of policies and programs that focus on the well-being of families, including tax policies and financial assistance programs.”[1] With such a wide scope of applications, it is understandable why marriage statistics are so highly studied and well-documented.

This thesis will analyze American marriage patterns over the past 100 years as documented in the Journal of …


A Monte Carlo Simulation Of Rat Choice Behavior With Interdependent Outcomes, Michelle A. Frankot Jan 2022

Graduate Theses, Dissertations, and Problem Reports

Preclinical behavioral neuroscience often uses choice paradigms to capture psychiatric symptoms. In particular, the subfield of operant research produces nested datasets with many discrete choices in a session. The standard analytic practice is to aggregate choice into a continuous variable and analyze using ANOVA or linear regression. However, choice data often have multiple interdependent outcomes of interest, violating an assumption of general linear models. The aim of the current study was to quantify the accuracy of linear mixed-effects regression (LMER) for analyzing data from a 4-choice operant task called the Rodent Gambling Task (RGT), which measures decision-making in the context …


A Brief Treatise On Bayesian Inverse Regression, Debashis Chatterjee Dr. Dec 2021

Doctoral Theses

Inverse problems, in which, broadly speaking, the task is to learn from a noisy response about some unknown function (usually represented as the argument of some known functional form), have received wide attention across the scientific disciplines. However, apart from the class of traditional inverse problems, there exists another class of inverse problems that qualifies as a more authentic class of inverse problems but unfortunately has not received as much attention. In a nutshell, this other class can be described as the problem of predicting the covariates corresponding to given responses and the rest of the data. …


Exploring Improvements To The Convergence Of Reconstructing Historical Destructive Earthquakes, Kameron Lightheart Nov 2021

Theses and Dissertations

Determining the risk to human populations from natural disasters has been a topic of interest in the STEM fields for centuries. Earthquakes and the tsunamis they cause are of particular interest due to their repetition cycles. These cycles can last hundreds of years, but modern measuring instruments have existed only for the last century or so, which makes analysis difficult. In this document, we explore ways to improve upon an existing method for reconstructing earthquakes from historical accounts of tsunamis. This method was designed and implemented by Jared P. Whitehead's research group over the last 5 years. The issue …


Some Nonparametric Hybrid Predictive Models: Asymptotic Properties And Applications, Tanujit Chakraborty Dr. Nov 2021

Doctoral Theses

Prediction problems such as classification, regression, and time series forecasting have long attracted statisticians and computer scientists worldwide to take up the challenges of data science and of implementing complicated models using modern computing facilities. However, most traditional statistical and machine learning models assume that the available data are well-behaved: that all essential features are present, that classes are of equal size, that data structures are stationary across all instances, and so on. Practical data sets from the domains of business analytics, process and quality control, software reliability, and macroeconomics, to name a few, suffer from various …


Trade Bait: Season 3, Ben Bagley Oct 2021

WWU Honors College Senior Projects

A 5-episode podcast series dissecting the use of statistics in the NFL and NFL Media


The Classification Of Basket Neural Cells In The Mammalian Neocortex, Sreya Pudi Oct 2021

Senior Theses

Basket neuronal cells of the mammalian neocortex have been classically categorized into two or more groups. Originally, the large and small types were thought to be the naturally occurring groups, arising from differences in neurobiological function and anatomical position. Later, a study based on anatomical and physiological features of these neurons introduced a third type, the net basket cell, which is intermediate in size between the large and small types. In this study, multivariate analysis was used to test the hypothesis that the large and small types are morphologically distinct groups. The results of …


An Introduction To Calling Bullshit: Learning To Think Outside The Black Box, Jevin D. West, Carl T. Bergstrom Aug 2021

Numeracy

Bergstrom, Carl T. and Jevin D. West. 2020. Calling Bullshit: The Art of Skepticism in a Data-Driven World. (New York: Random House) 336 pp. ISBN 978-0525509202.

While statistical methods receive greater attention, the art of critically evaluating information in everyday life more commonly depends on thinking outside the black box of the algorithm. In this piece we introduce readers to our book and associated online teaching materials—for readers who want to more capably call “bullshit” or to teach their students to do the same.


The Uncertainty Of Confidence, Michael J. Leach Jul 2021

Journal of Humanistic Mathematics

This is a free-verse poem about the estimation of population parameters in statistical models. The spacing of words is intended to reflect uncertainty.


Lab Exercises For Statistics Using Excel, Julia Nebia, Steven Cosares, Milena Cuellar Jul 2021

Open Educational Resources

This document contains the text associated with a series of computer-based lab exercises to help students apply the concepts usually included in a first course in Statistics. A compressed file has been included that contains a separate folder for each lab. Each folder holds an Excel spreadsheet file and an editable Word document providing the instructions for students to complete the exercise. The exercises are not numbered in the folders, so you can select any subset of these exercises to assign to your students. You are free to modify the instructions in any way you see fit, e.g., to …


A Review Of Logistic Regression And Its Application, Sultana Mubarika Rahman Chowdhury Jun 2021

FIU Electronic Theses and Dissertations

The purpose of this thesis is to provide an in-depth review of logistic regression and its application. Additionally, four methods of coefficient standardization were compared using the Heart Disease Dataset, based on testing accuracy, training accuracy, area under the curve, sensitivity, and specificity. Furthermore, logistic regression analysis was applied to the National Longitudinal Study of Adolescence Health Survey (Add Health) dataset to examine the relationship between anxiety or panic disorder and a history of childhood maltreatment, medical conditions such as ADHD and PTSD, some socio-economic conditions, and addiction. Results indicated that a history of abuse has a significant effect …
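
The evaluation criteria listed above (accuracy, area under the curve, sensitivity, and specificity) can all be computed from a model's predicted probabilities. A minimal standard-library sketch, assuming binary labels coded 0/1 and a 0.5 classification threshold (the thesis may use a different cutoff); the function name and the toy labels and scores are invented for illustration.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUC for 0/1 labels and scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # AUC = probability that a random positive outscores a random negative (ties count 1/2).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": wins / (len(pos) * len(neg)),
    }


y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
print(binary_metrics(y_true, y_score))
```

Training and testing accuracy are this same accuracy computed on the data used to fit the model and on held-out data, respectively; unlike the other three metrics, AUC does not depend on the chosen threshold.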


Data Analysis And Visualization To Dismantle Gender Discrimination In The Field Of Technology, Quinn Bolewicki Jun 2021

Dissertations, Theses, and Capstone Projects

In the United States, a significant population is facing an uphill battle trying to thrive in an industry that has seen exponential growth in recent years. Women, who account for approximately 50.8% of the U.S. population, are statistically underpaid and underrepresented in science, technology, engineering, and mathematics (STEM). Although women-led technology teams achieve a 21% greater return on investment than teams that are not women-led, and young women largely outperformed men in math according to a 2015 study, there are only three Fortune 500 companies led by women, and women make up only 10% of internet entrepreneurs. Research generates hundreds of articles, infographics, …