Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics

Articles 1 - 30 of 685

Full-Text Articles in Physical Sciences and Mathematics

The Impact Of “Multiple Looks” When Performing Survival Analysis, Quentin Eloise Aug 2024

Electronic Theses and Dissertations

Survival analysis is a critical statistical method in healthcare to assess patient treatment effects and disease progression. Another critical area of statistical methodology in health care is the practice of adaptive designs. Adaptive designs allow for interim analyses to take place during a study and various decisions and actions can take place more ethically. This is beneficial for studies that take multiple years to complete and allows administrators and healthcare providers to make sound decisions as early as possible. A challenging aspect of adaptive designs is that the number of interim analyses is known in advance which is applicable in …


Bayesian Variational Inference In Keyword Identification And Multiple Instance Classification, Yaofang Hu Aug 2024

Statistical Science Theses and Dissertations

This dissertation investigates (1) Variational Bayesian Semi-supervised Keyword Extraction and (2) Variational Bayesian Multimodal Multiple Instance Classification.

The expansion of textual data, stemming from various sources such as online product reviews and scholarly publications on scientific discoveries, has created a demand for the extraction of succinct yet comprehensive information. As a result, in recent years, efforts have been spent in developing novel methodologies for keyword extraction. Although many methods have been proposed to automatically extract keywords in the contexts of both unsupervised and fully supervised learning, how to effectively use partially observed keywords, such as author-specified keywords, remains an under-explored …


Exploring Healthcare Chatbot Information Presentation: Applying Hierarchical Bayesian Regression And Inductive Thematic Analysis In A Mixed Methods Study, Samuel Nelson Koscelny Aug 2024

All Theses

High blood pressure, also known as hypertension, significantly increases the risk of heart disease and stroke, which are leading causes of death in the United States. While contributing to over 691,000 deaths in 2021 alone in the United States (U.S.), it also imposes an immense economic burden on the healthcare system, costing approximately $131 billion annually. One way to address this issue is through increased self-care behaviors and medication adherence, both of which require sufficient health literacy. Despite the importance of health literacy, 90% of U.S. adults struggle with health-related subjects. Overcoming the issues associated with health literacy requires addressing the …


Oh Statistics!, Heather L. Cook Jul 2024

Journal of Humanistic Mathematics

This poem was written about statistics and the usefulness thereof.


Book Review: How To Expect The Unexpected: The Science Of Making Predictions -- And The Art Of Knowing When Not To By Kit Yates, Mark Huber Jul 2024

Journal of Humanistic Mathematics

Humans think about the future all the time. Prediction is a part of how we prepare for the coming of both good and bad events in our lives. Kit Yates' book, How to Expect the Unexpected, concentrates primarily on the question of why prediction is difficult, and what mental shortcuts people take in prediction that can lead to incorrect results. Unfortunately, a lack of concern for details and several omissions undermine the quality of the book.


Heavy Metals Implications To Sediment Microbiome And Coral Response To Arsenic Dosing, Dimitrios G. Giarikos, Amy Hirons, Jose V. Lopez, Abigail Renegar, Jason Gershman Jun 2024

SECLER Data

No abstract provided.


Learning Statistics With R: A Tutorial For Psychology Students And Other Beginners, Leslie Bain Jun 2024

ATU Faculty OER Book Reviews

Review of OER Statistics textbook by Danielle Navarro, available at https://open.umn.edu/opentextbooks/textbooks/learning-statistics-with-r-a-tutorial-for-psychology-students-and-other-beginners


Introduction To Statistical Thinking, Leslie Bain Jun 2024

ATU Faculty OER Book Reviews

Review of OER Statistics textbook by Benjamin Yakir, available at https://open.umn.edu/opentextbooks/textbooks/introduction-to-statistical-thinking


The Impact Of Video Assistant Referee (VAR) On The English Premier League, Jack Kenyon Brown Jun 2024

Master's Theses

The aim of this study is to examine how the introduction of the Video Assistant Referee (VAR) system influenced the English Premier League (EPL). Since its implementation in the English Premier League in 2019, VAR has been a constant source of debate and controversy. Many studies have been done on the immediate impact of VAR on other elite professional soccer leagues, but the scope of results is very limited and due to be updated. The data for the ensuing analysis consists of 3800 matches played in the English Premier League during the five seasons before (14/15, 15/16, 16/17, 17/18, and …


Recursive Matrix Game Analysis: Optimal, Simplified, And Human Strategies In Brave Rats, William A. Medwid Jun 2024

Master's Theses

Brave Rats is a short game with simple rules, yet establishing a comprehensive strategy is very challenging without extensive computation. After explaining the rules, this paper begins by calculating the optimal strategy by recursively solving each turn’s Minimax strategy. It then provides summary statistics about the complex, branching Minimax solution. Next, we examine six other strategy models and evaluate their performance against each other. These models’ flaws highlight the key elements that contribute to the effectiveness of the Minimax strategy and offer insight into simpler strategies that human players could mimic. Finally, we analyze 123 games of human data collected …


Unraveling The History Of Deforestation In The Amazon Rainforest With Statistical Modeling, Ryan Destefano Jun 2024

Master's Theses

The Amazon rainforest, a vital ecosystem of immense biodiversity and global climate significance, faces the ongoing threat of deforestation driven by agricultural expansion. This thesis employs remote sensing techniques, focusing on the Enhanced Vegetation Index (EVI) derived from Landsat satellite imagery, to track land cover dynamics within the Amazon. The study examines historical land cover changes in current plantations in Peru and Brazil, regions where the exact timing of deforestation is uncertain. By analyzing EVI measurements dating back to 1984, inflection points indicative of deforestation events preceding plantation establishment are identified. Statistical modeling techniques, including spline fitting to analyze time …


Descriptions Of Interglacial Mastodons From Snowmass, Colorado, Connor White May 2024

Electronic Theses and Dissertations

The Ziegler Reservoir fossil site (ZRFS) in Colorado contains over 4000 mastodon bones that date from 140,000 to 100,000 years ago. At an elevation of ~2705 meters above sea level, ZRFS represents an alpine ecosystem dated to Marine Isotope Stage (MIS) 5. Formal descriptions of cheek teeth, mandibles, crania, and femora were completed. Statistical analyses of the upper and lower third molars, including a novel measurement of interloph(id) distances, indicate significant differences between ZRFS mastodons and Mammut pacificus, while falling within the ranges for Mammut americanum. This study agrees with the taxonomic assignment of ZRFS mastodons to Mammut …


A Statistical Look Into How Common Soccer Metrics Influence Expected Goal Measures In The Professional Game, Tristan George Rumsey May 2024

Undergraduate Honors Thesis Collection

The advent of sports analytics has ignited a fervor across all sporting disciplines, particularly soccer, where clubs are sprinting to harness vast data reserves to elevate team performance, spearhead effective marketing endeavors, and bolster financial gains crucial for club expansion. Much like Billy Beane's transformative "Moneyball" approach, soccer clubs are in pursuit of innovative strategies to transcend financial limitations and achieve triumph. In soccer, where goals are scarce commodities, heightened offensive efficacy becomes imperative. Presently, one metric stands out as pivotal in gauging a team's goal-scoring success: expected goals (xG). This metric quantifies the likelihood of a given shot or …


Assessing Extant Methods For Generating G-Optimal Designs And A Novel Methodology To Compute The G-Score Of A Candidate Design, Hyrum John Hansen May 2024

All Graduate Theses and Dissertations, Fall 2023 to Present

Experimental designs are used by scientists to allocate treatments such that statistical inference is appropriate. Most traditional experimental designs have mathematical properties that make them desirable under certain conditions. Optimal experimental designs are those where the researcher can exercise total control over the treatment levels to maximize a chosen mathematical property. As is common in literature, the experimental design is represented as a matrix where each column represents a variable, and each row represents a trial. We define a function that takes as input the design matrix and outputs its score. We then algorithmically adjust each entry until a design …


A Survey Of The Murray State University CSIS Department Of Student And Instructor Attitudes In Relation To Earlier Introduction Of Version Control Systems, Gavin Johnson Apr 2024

Honors College Theses

Over the previous 20 years, the software development industry has overseen an evolution in the application of Version Control Systems (VCS) from a Centralized Version Control System (CVCS) format to a Decentralized Version Control System (DVCS) format. Examples of the former include Perforce and Subversion, while examples of the latter include GitHub and BitBucket. As DVCS models allow software contributors to maintain their respective local repositories of relevant code bases, developers are able to work offline and maintain their work with relative fault tolerance. This contrasts with CVCS models, which require software contributors to be connected online to a main server. …


"Who Wrote The Epistle, God Only Knows": A Statistical Authorial Analysis Of Hebrews In Comparison With Pauline And Lukan Literature, Benjamin J. Erickson Apr 2024

Senior Honors Theses

The authorship of Hebrews has been a point of contention for scholars for the past two millennia. While the epistle is traditionally attributed to Paul, many scholars assert that it carries thematic, structural, and stylistic differences from the remainder of his extant epistles; therefore, many other possible authors have been proposed. Of these, only Luke has other New Testament writings. Therefore, this project conducts a statistical comparison of Hebrews to the Pauline and Lukan corpora using stylometric authorial analysis methods. This analysis demonstrates that Hebrews is stylistically closer to Lukan literature than Pauline (but not to a significant degree), and …


Identifying Rural Health Clinics Within The Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files, Katherine Ahrens MPH, PhD, Zachariah Croll, Yvonne Jonk PhD, John Gale MS, Heidi O'Connor MS Mar 2024

Rural Health Clinics

Researchers at the Maine Rural Health Research Center describe a methodology for identifying Rural Health Clinic encounters within the Medicaid claims data using Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files.

Background: There is limited information on the extent to which Rural Health Clinics (RHC) provide pediatric and pregnancy-related services to individuals enrolled in state Medicaid/CHIP programs. In part this is because methods to identify RHC encounters within Medicaid claims data are outdated.

Methods: We used a 100% sample of the 2018 Medicaid Demographic and Eligibility and Other Services Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files for 20 states …


Hockey Card Statistics Are Stagnant And Stale, Egan J. Chernoff Jan 2024

Journal of Humanistic Mathematics

The purchase of a coffee at a Canadian institution, Tim Hortons, turned into an informal investigation into hockey card statistics. Turns out, hockey card statistics are stagnant and stale. This was disappointing to see because the game of hockey has changed, the statistics used to keep track of the game have changed. Even the cards have changed. Well, not the back of the cards, which do not well enough paint a statistical picture of the hockey player photographed on the front of the card.


The Limits Of Data Science, David E. Drew Jan 2024

Journal of Humanistic Mathematics

Data science can contribute valuable predictions in diverse fields. But I write to express some concerns and red flags. I suggest that data science is being oversold. This article contains three questions that I believe data science must address as this new discipline matures. Is data science significantly different from statistics? This is a question that has haunted the field since the term first was introduced. By creating algorithms based on current societal decision rules that may be biased, even bigoted, does data science lock in and exacerbate inequality? Scholars have identified a continuum from data to information to knowledge …


Defensive Impact Wins: Developing A New Method To Rate Individual Defense In NBA Games, Dylan J. Stiles Jan 2024

Honors Theses and Capstones

With the analytics revolution in sports in the past 20 years, it seems that everything that can be quantified is. In basketball though, trying to break the game down into a set of numbers comes with a unique problem. While we've come up with a good set of advanced numbers to measure offensive efficiency, defense is fundamentally harder to quantify. The game is played five on five, but it has often been popular or convenient to model defense as a set of five one on one games. As defenses became more complex into the 2010s, this methodology became more insignificant. …


Ensemble Classification: An Analysis Of The Random Forest Model, Jarod Korn Jan 2024

Williams Honors College, Honors Research Projects

The random forest model proposed by Dr. Leo Breiman in 2001 is an ensemble machine learning method for classification prediction and regression. In the following paper, we will conduct an analysis on the random forest model with a focus on how the model works, how it is applied in software, and how it performs on a set of data. To fully understand the model, we will introduce the concept of decision trees, give a summary of the CART model, explain in detail how the random forest model operates, discuss how the model is implemented in software, demonstrate the model by …


Statistically Principled Deep Learning For SAR Image Segmentation, Cassandra Goldberg Jan 2024

Honors Projects

This project explores novel approaches for Synthetic Aperture Radar (SAR) image segmentation that integrate established statistical properties of SAR into deep learning models. First, Perlin Noise and Generalized Gamma distribution sampling methods were utilized to generate a synthetic dataset that effectively captures the statistical attributes of SAR data. Subsequently, deep learning segmentation architectures were developed that utilize average pooling and 1x1 convolutions to perform statistical moment computations. Finally, supervised and unsupervised disparity-based losses were incorporated into model training. The experimental outcomes yielded promising results: the synthetic dataset effectively trained deep learning models for real SAR data segmentation, the statistically-informed architectures …


Radiofrequency Interference Detection Using LSTM And Statistical Analysis Discriminator, Luke Smith Jan 2024

Masters Theses

Wireless devices are becoming increasingly pervasive across all aspects of society. Examples of such devices include radios, routers, mobile phones, tablets, and more. As the number of radio frequency (RF) devices continues to rise, so does the amount of interference and noise. This is why an efficient approach to interference detection is explored. Most research within this area has been done strictly within the frequency domain, as viewing a signal within this domain provides many insights into what makes up the signal. This has, however, led to the time domain being underutilized for this area of research.

To explore the …


Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang Dec 2023

Statistical Science Theses and Dissertations

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.

The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


A Prompt Engineering Approach To Creating Automated Commentary For Microsoft Self-Help Documentation Metric Reports Using ChatGPT, Ryan Herrin, Luke Stodgel, Brian Raffety Dec 2023

SMU Data Science Review

Microsoft collects an immense amount of data from the users of its product self-help documentation. Employees use this data to identify these self-help articles' performance trends and measure their impact on business Key Performance Indicators (KPIs). Microsoft uses various tools like Power BI and Python to analyze this data. The problem is that their analysis and findings are summarized manually. Therefore, this research will improve upon their current analysis methods by applying the latest prompt engineering practices and the power of ChatGPT's large language models (LLMs). Using VBA code, Microsoft Excel, and the ChatGPT API as an Excel add-in, this research …


Gastropod Evolutionary Phylogeny, Priscilla Doran, Neal A. Doran Dec 2023

Proceedings of the International Conference on Creationism

This research seeks to investigate a correlation between the first appearance order date (FAD) and predicted evolutionary phylogeny of gastropods. Using a Spearman correlation, 17 data sets of gastropods were analyzed, with no significant correlation found between the first appearance date and predicted evolutionary date for the fossils.
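
For readers unfamiliar with the statistic named in this abstract, the following is a minimal sketch of a Spearman rank correlation, computed as the Pearson correlation of average ranks. It is not the authors' code; the function names and the toy numbers in the usage note are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

As a usage illustration with made-up values, `spearman([1, 2, 3, 4], [10, 20, 30, 40])` is close to 1 because the two orderings agree exactly, while reversing one list gives a value close to -1. A coefficient near 0 corresponds to the "no significant correlation" outcome the abstract reports, though assessing significance requires a separate test.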


Translation Speed Influence On Tropical Cyclone Storm Tide And Surge Generation Along The Gulf Of Mexico Coast, Samantha L. Camarda Nov 2023

LSU Master's Theses

This research examines tropical cyclone translation speed as a factor in storm tide and surge height upon landfall on the United States Gulf Coast. Understanding the effect of translation speed on peak storm tide/surge height is needed to better prepare for and predict future damage from tropical cyclone events. Tropical cyclone data are taken from hourly interpolated best-track HURDAT2 data from 1970–2021. This study uses the HURDAT2 hourly interpolated observation data points (24-hours) pre-landfall to landfall. Translation speed is calculated based on the distance traversed between hourly points. Peak storm tide and storm surge data are taken from SURGEDAT from …
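
The abstract states that translation speed is calculated from the distance traversed between hourly points. A minimal sketch of one way to do that is shown below, assuming great-circle (haversine) distance on a spherical Earth with a mean radius of 6371 km. The thesis does not say which distance formula it uses, so the formula, the function names, and the sample coordinates in the usage note are all assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def translation_speeds_kmh(track, hours_between_points=1.0):
    """Speed over each leg of a track given as a list of (lat, lon) tuples."""
    return [
        haversine_km(p[0], p[1], q[0], q[1]) / hours_between_points
        for p, q in zip(track, track[1:])
    ]
```

For example, a hypothetical three-point hourly track such as `[(27.0, -90.0), (27.2, -90.1), (27.4, -90.2)]` yields two leg speeds, one per hour of motion. Averaging the legs over the final 24 hours before landfall would give a single pre-landfall translation speed.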


Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang Oct 2023

Statistical Science Theses and Dissertations

Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool to investigate novel biological insights. As experimental technologies enhance both in capacity and efficiency, there arises a growing demand for the development of analytical methodologies.

One question in SRT data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …


Reu-Deim Classification Of Hispanic Voters In Hispanic Groups Using Name And Zip Code Data In Palm Beach, Florida, Kamila Soto-Ortiz Sep 2023

Beyond: Undergraduate Research Journal

When it comes to registering to vote, Hispanic voters can only register as “Hispanic” in the “Race/Ethnicity” category, causing difficulties when analyzing voting trends amongst the Hispanic community. Upon the recent idea that not all Hispanic Groups vote the same, the goal is to create a model that can possibly identify a voter’s Hispanic Group with the information provided on the public Florida voter file. This is accomplished using name and zip code data for all voters in Palm Beach, Florida. This paper will explore the model implemented, its findings and limitations. Palm Beach, Florida, is met with low confidence …