Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Statistics

Discipline
Institution
Publication Year
Publication
Publication Type
File Type

Articles 91 - 120 of 685

Full-Text Articles in Physical Sciences and Mathematics

Compare And Contrast Maximum Likelihood Method And Inverse Probability Weighting Method In Missing Data Analysis, Scott Sun May 2021

Compare And Contrast Maximum Likelihood Method And Inverse Probability Weighting Method In Missing Data Analysis, Scott Sun

Mathematical Sciences Technical Reports (MSTR)

Data can be lost for different reasons, but sometimes the missingness is a part of the data collection process. Unbiased and efficient estimation of the parameters governing the response mean model requires the missing data to be appropriately addressed. This paper compares and contrasts the Maximum Likelihood and Inverse Probability Weighting estimators in an Outcome-Dependendent Sampling design that deliberately generates incomplete observations. WE demonstrate the comparison through numerical simulations under varied conditions: different coefficient of determination, and whether or not the mean model is misspecified.


We’Re Here To Get You There: A Statistical Analysis Of Bridgewater State University’S Transit System, Abigail Adams May 2021

We’Re Here To Get You There: A Statistical Analysis Of Bridgewater State University’S Transit System, Abigail Adams

Honors Program Theses and Projects

Bridgewater State University first established its on-campus transportation service in January of 1984. While it began only running as an on-campus service for students throughout the day, the service grew to expand by offering an off-campus connection to the neighboring city of Brockton and absorbed the night service system from the campus safety team. As BSU Transit continues to grow, the organization is seeking ways to improve their overall service and better prepare their fleet and driver pool to accommodate this growth. The purpose of this research is to analyze trends among the data collected by BSU Transit and assist …


Guidelines For Regression Analysis In Sas And R: A Case Study, Sarah Milligan May 2021

Guidelines For Regression Analysis In Sas And R: A Case Study, Sarah Milligan

Honors Program Theses and Projects

When a player is a free agent, an individual who is able to sign to any team, one wonders what their best option is. Will signing with Team A or Team B provide them with the largest salary? What factors will affect their salary the most? Does last year’s statistics have a strong impact on next year’s salary? These questions can be answered by performing a regression analysis on previous years data. The primary focus of this project is to determine the most important variables related to an NBA salary. Likewise, the statistical programs SAS and R will be compared …


A Study On Differing Generational Values And Expectations In Corporate America, Abigail Grella May 2021

A Study On Differing Generational Values And Expectations In Corporate America, Abigail Grella

Honors Program Theses and Projects

This paper examines the most common factors that lead to voluntary employee turnover, and the implications employee turnover has on an organization. Additionally, this paper will consider the varying values and workplace expectations of different demographic groups such as Millennials, Generation X, Generation Y, and Baby Boomers and how such factors could influence voluntary turnover. A study is conducted from survey results gathered across a large span of generations that are currently employed. Using statistical analysis employing t-tests and a Mood’s Median test, the results show that different generations have differently weighing values for specific organizational offerings. The results show …


Statistical Analysis Of 2017-18 Premier League Match Statistics Using A Regression Analysis In R, Bergen Campbell May 2021

Statistical Analysis Of 2017-18 Premier League Match Statistics Using A Regression Analysis In R, Bergen Campbell

Undergraduate Theses and Capstone Projects

This thesis analyzes the correlation between a team’s statistics and the success of their performances, and develops a predictive model that can be used to forecast final season results for that team. Data from the 2017-2018 Premier League season is to be gathered and broken down within R to highlight what factors and variables are largely contributing to the success or downfall of a team. A multiple linear regression model and stepwise selection process is then used to include any factors that are significant in predicting in match results.

The predictions about the 17-18 season results based on the model …


Machine Learning With Topological Data Analysis, Ephraim Robert Love May 2021

Machine Learning With Topological Data Analysis, Ephraim Robert Love

Doctoral Dissertations

Topological Data Analysis (TDA) is a relatively new focus in the fields of statistics and machine learning. Methods of exploiting the geometry of data, such as clustering, have proven theoretically and empirically invaluable. TDA provides a general framework within which to study topological invariants (shapes) of data, which are more robust to noise and can recover information on higher dimensional features than immediately apparent in the data. A common tool for conducting TDA is persistence homology, which measures the significance of these invariants. Persistence homology has prominent realizations in methods of data visualization, statistics and machine learning. Extending ML with …


The Effect Of Initial Conditions On The Weather Research And Forecasting Model, Aaron D. Baker May 2021

The Effect Of Initial Conditions On The Weather Research And Forecasting Model, Aaron D. Baker

Electronic Theses and Dissertations

Modeling our atmosphere and determining forecasts using numerical methods has been a challenge since the early 20th Century. Most models use a complex dynamical system of equations that prove difficult to solve by hand as they are chaotic by nature. When computer systems became more widely adopted and available, approximating the solution of these equations, numerically, became easier as computational power increased. This advancement in computing has caused numerous weather models to be created and implemented across the world. However a challenge of approximating these solutions accurately still exists as each model have varying set of equations and variables to …


How Risk-Related Statistics, As Reported In News And Social Media, Are Linked To The Use Of The Public Transit System, Prashiddhi Pokhrel Apr 2021

How Risk-Related Statistics, As Reported In News And Social Media, Are Linked To The Use Of The Public Transit System, Prashiddhi Pokhrel

Thinking Matters Symposium

Due to the pandemic, people have started relying more on televisions, news, social media, and other news outlets for guidance. Moreover, with the increasing amount of news, data, and information there is also an increase in the amount of misleading statistics. People’s opinions and decisions significantly depend on the data, statistics, and information that they are exposed to, as well as their sources. For this project, we want to look at how information and its sources are affecting the decision made by the general public for the usage of the Portland Transit System. It is very important to know why …


On Tests Of Independence Among Multiplerandom Vectors Of Arbitrary Dimensions., Angshuman Roy Dr. Apr 2021

On Tests Of Independence Among Multiplerandom Vectors Of Arbitrary Dimensions., Angshuman Roy Dr.

Doctoral Theses

Measures of dependence among several random vectors and associated tests of independence play a major role in different statistical applications. Blind source separation or independent component analysis (see, e.g., Hyv¨arinen et al., 2001; Shen et al., 2009), feature selection and feature extraction (see, e.g., Li et al., 2012), detection of serial correlation in time series (see, e.g., Ghoudi et al., 2001) and finding the causal relationships among the variables (see, e.g., Chakraborty and Zhang, 2019) are some examples of their wide-spread applications. Tests of independence has vast applications in other areas of sciences as well. For instance, to characterize the …


Does Defense Actually Win Championships? Using Statistics To Examine One Of The Greatest Stereotypes In Sports, Thomas Burkett Apr 2021

Does Defense Actually Win Championships? Using Statistics To Examine One Of The Greatest Stereotypes In Sports, Thomas Burkett

Senior Theses

A common saying in sports is that “defense wins championships.” However, the past decade of play in the modern NBA has seen a rise and focus in offensive efficiency and 3-pointers. This thesis tests whether defense can truly predict a championship winning team in today’s NBA through two-sample hypothesis testing and multiple logistic regression models. The results found that both defensive and offensive statistics were significant predictors of championship teams, meaning that a balanced team, rather than one specialized in defense alone, is a more accurate predictor of championship success.


The Wargaming Commodity Course Of Action Automated Analysis Method, William T. Deberry Mar 2021

The Wargaming Commodity Course Of Action Automated Analysis Method, William T. Deberry

Theses and Dissertations

This research presents the Wargaming Commodity Course of Action Automated Analysis Method (WCCAAM), a novel approach to assist wargame commanders in developing and analyzing courses of action (COAs) through semi-automation of the Military Decision Making Process (MDMP). MDMP is a seven-step iterative method that commanders and mission partners follow to build an operational course of action to achieve strategic objectives. MDMP requires time, resources, and coordination – all competing items the commander weighs to make the optimal decision. WCCAAM receives the MDMP's Mission Analysis phase as input, converts the wargame into a directed graph, processes a multi-commodity flow algorithm on …


Clustering Web Users By Mouse Movement To Detect Bots And Botnet Attacks, Justin L. Morgan Mar 2021

Clustering Web Users By Mouse Movement To Detect Bots And Botnet Attacks, Justin L. Morgan

Master's Theses

The need for website administrators to efficiently and accurately detect the presence of web bots has shown to be a challenging problem. As the sophistication of modern web bots increases, specifically their ability to more closely mimic the behavior of humans, web bot detection schemes are more quickly becoming obsolete by failing to maintain effectiveness. Though machine learning-based detection schemes have been a successful approach to recent implementations, web bots are able to apply similar machine learning tactics to mimic human users, thus bypassing such detection schemes. This work seeks to address the issue of machine learning based bots bypassing …


Essays In Social Choice Theory., Dipjyoti Majumdar Dr. Feb 2021

Essays In Social Choice Theory., Dipjyoti Majumdar Dr.

Doctoral Theses

The purpose of this thesis is to explore some issues in social choice theory and decision theory. Social choice theory provides the theoretical foundations for the field of public choice and welfare economics. It tries to bring together normative aspects like perspective value judgements and positive aspects, like strategic con- siderations. The second feature which is our focus, is closely related to the problem of providing appropriate incentives to agents, an issue of prime importance in eco- nomics.Consider for example, a set of agents who must elect one among a set of can- didates. These candidates may be physical agents …


Adventures In The "Islands" - Enhancing Student Engagement In Teaching Statistics, Leszek Gawarecki Feb 2021

Adventures In The "Islands" - Enhancing Student Engagement In Teaching Statistics, Leszek Gawarecki

Mathematics Presentations And Conference Materials

The factors for enhancing student engagement frequently identified are active and problem-based learning as well as real-life experience relevant to students' interests. The importance of using real data in teaching statistics has been repeatedly emphasized and its importance is growing. However, data collection, as part of a student project, faces serious practical problems. It is time-consuming, may require access to equipment, or raise ethical issues.


Machine Learning Morphisms: A Framework For Designing And Analyzing Machine Learning Work Ows, Applied To Separability, Error Bounds, And 30-Day Hospital Readmissions, Eric Zenon Cawi Jan 2021

Machine Learning Morphisms: A Framework For Designing And Analyzing Machine Learning Work Ows, Applied To Separability, Error Bounds, And 30-Day Hospital Readmissions, Eric Zenon Cawi

McKelvey School of Engineering Theses & Dissertations

A machine learning workflow is the sequence of tasks necessary to implement a machine learning application, including data collection, preprocessing, feature engineering, exploratory analysis, and model training/selection. In this dissertation we propose the Machine Learning Morphism (MLM) as a mathematical framework to describe the tasks in a workflow. The MLM is a tuple consisting of: Input Space, Output Space, Learning Morphism, Parameter Prior, Empirical Risk Function. This contains the information necessary to learn the parameters of the learning morphism, which represents a workflow task. In chapter 1, we give a short review of typical tasks present in a workflow, as …


Genetics Of Pediatric Musculoskeletal Disorders, Lilian Antunes Jan 2021

Genetics Of Pediatric Musculoskeletal Disorders, Lilian Antunes

Arts & Sciences Electronic Theses and Dissertations

Pediatric musculoskeletal disorders are an extremely broad category of diseases that are often inherited. While individually rare, collectively these disorders are common, affecting around 3% of live births in the US. Despite the mounting clinical and molecular evidence for a genetic etiology, the cause for many patients with pediatric musculoskeletal disorders remain largely unknown. Major challenges in rare pediatric diseases include recruiting large numbers of patients and determining the significance and functional impacts of variants associated with disease within individuals or families. Whole exome sequencing (WES) is a powerful tool to identify coding variants that are associated with rare pediatric …


Review Of Social Workers Count: Numbers And Social Issues By Michael Anthony Lewis, Michael T. Catalano Jan 2021

Review Of Social Workers Count: Numbers And Social Issues By Michael Anthony Lewis, Michael T. Catalano

Numeracy

Lewis, Michael Anthony. 2017. Social Workers Count: Numbers and Social Issues. 2019. New York: Oxford University Press. 223 pp. ISBN 978-019046713-5

The numeracy movement, although largely birthed within the mathematics community, is an outside-the-box endeavor which has always sought to break down or at least transgress traditional disciplinary boundaries. Michael Anthony Lewis’s book is a testament that this effort is succeeding. Lewis is a social worker and sociologist with an impressive resume, author of Economics for Social Workers, co-editor of The Ethics and Economics of the Basic Income Guarantee, and member of the faculty at the Silberman School …


Fourth Down Decision Making: Challenging The Conservative Nature Of Nfl Coaches, Will Palmquist, Ryan Elmore, Benjamin Williams Jan 2021

Fourth Down Decision Making: Challenging The Conservative Nature Of Nfl Coaches, Will Palmquist, Ryan Elmore, Benjamin Williams

DU Undergraduate Research Journal Archive

This thesis analyzes the hypothesis that coaches in the National Football League are often too conservative in their decision making on fourth downs. I used R Studio and NFL play-by-play data to simulate actual football plays and drives according to different fourth down strategies. By measuring expected points per drive over thousands of simulated drives, we are able to evaluate the effectiveness of different fourth down strategies. This research points to a number of conclusions regarding the nature of NFL coaches on fourth downs as well as the complexity of modeling and simulating decision making in a complex sport such …


Indispensable Statistics For The Behavioral Sciences ~With Spss 26, Howard Reid Ph.D. Jan 2021

Indispensable Statistics For The Behavioral Sciences ~With Spss 26, Howard Reid Ph.D.

Open Educational Resources (OER)

While there are many fine introductory statistics books, undergraduate students often continue to view statistics courses negatively. And many fear they will be unable to master the basic level of understanding that is essential to progress in their majors. The present text is an attempt to rethink what students majoring in the behavioral sciences absolutely must learn in an introductory statistics course and how best to organize the presentation of this material so they can succeed in their chosen field of study.

Every book is written from some perspective. The perspective of this book is that a first course in …


Power And Statistical Significance In Securities Fraud Litigation, Jill E. Fisch, Jonah B. Gelbach Jan 2021

Power And Statistical Significance In Securities Fraud Litigation, Jill E. Fisch, Jonah B. Gelbach

All Faculty Scholarship

Event studies, a half-century-old approach to measuring the effect of events on stock prices, are now ubiquitous in securities fraud litigation. In determining whether the event study demonstrates a price effect, expert witnesses typically base their conclusion on whether the results are statistically significant at the 95% confidence level, a threshold that is drawn from the academic literature. As a positive matter, this represents a disconnect with legal standards of proof. As a normative matter, it may reduce enforcement of fraud claims because litigation event studies typically involve quite low statistical power even for large-scale frauds.

This paper, written for …


The Combined Impact Of Continuous And Ordinal Auxiliary Variables On Missing Data Imputation In Sem, Salina Wu Whitaker Jan 2021

The Combined Impact Of Continuous And Ordinal Auxiliary Variables On Missing Data Imputation In Sem, Salina Wu Whitaker

Electronic Theses and Dissertations

“Modern” methods of addressing missing data using full-information maximum-likelihood (FIML) have become mainstays in SEM analyses. FIML allows the inclusion of auxiliary variables which carry information that is related to missing values and can reduce bias in parameter estimates. Past research has illustrated the benefits of auxiliary variable inclusion under different missingness conditions (MCAR and MNAR; e.g., Enders, 2008), missingness proportions (e.g., Collins et al., 2001), and although limited, missingness patterns (e.g., Yoo, 2009) in FIML analyses. While past studies have focused on the effects of either continuous or ordinal auxiliary variables, no study has included both types in their …


Assessing And Forecasting Chlorophyll Abundances In Minnesota Lake Using Remote Sensing And Statistical Approaches, Ben Von Korff Jan 2021

Assessing And Forecasting Chlorophyll Abundances In Minnesota Lake Using Remote Sensing And Statistical Approaches, Ben Von Korff

All Graduate Theses, Dissertations, and Other Capstone Projects

Harmful algae blooms (HABs) can negatively impact water quality, lake aesthetics, and can harm human and animal health. However, monitoring for HABs is rare in Minnesota. Detecting blooms which can vary spatially and may only be present briefly is challenging, so expanding monitoring in Minnesota would require the use of new and cost efficient technologies. Unmanned aerial vehicles (UAVs) were used for bloom mapping using RGB and near-infrared imagery. Real time monitoring was conducted in Bass Lake, in Faribault County, MN using trail cameras. Time series forecasting was conducted with high frequency chlorophyll-a data from a water quality sonde. Normalized …


Computational Simulation And Analysis Of Neuroplasticity, Madison E. Yancey Jan 2021

Computational Simulation And Analysis Of Neuroplasticity, Madison E. Yancey

Browse all Theses and Dissertations

Homeostatic synaptic plasticity is the process by which neurons alter their activity in response to changes in network activity. Neuroscientists attempting to understand homeostatic synaptic plasticity have developed three different mathematical methods to analyze collections of event recordings from neurons acting as a proxy for neuronal activity. These collections of events are from control data and treatment data, referring to the treatment of neuron cultures with pharmacological agents that augment or inhibit network activity. If the distribution of control events can be functionally mapped to the distribution of treatment events, a better understanding of the biological processes underlying homeostatic synaptic …


Statistical Modeling Of Hpc Performance Variability And Communication, Jered B. Dominguez-Trujillo Jan 2021

Statistical Modeling Of Hpc Performance Variability And Communication, Jered B. Dominguez-Trujillo

Computer Science ETDs

Understanding the performance of parallel and distributed programs remains a focal point in determining how compute systems can be optimized to achieve exascale performance. Lightweight, statistical models allow developers to both characterize and predict performance trade-offs, especially as HPC systems become more heterogeneous with many-core CPUs and GPUs. This thesis presents a lightweight, statistical modeling approach of performance variation which leverages extreme value theory by focusing on the maximum length of distributed workload intervals. This approach was implemented in MPI and evaluated on several HPC systems and workloads. I then present a performance model of partitioned communication which also uses …


Bickel-Rosenblatt Test Based On Tilted Estimation For Autoregressive Models & Deep Merged Survival Analysis On Cancer Study Using Multiple Types Of Bioinformatic Data, Yan Su Jan 2021

Bickel-Rosenblatt Test Based On Tilted Estimation For Autoregressive Models & Deep Merged Survival Analysis On Cancer Study Using Multiple Types Of Bioinformatic Data, Yan Su

Browse all Theses and Dissertations

This dissertation includes two topics, Bickel-Rosenblatt test based on tilted density estimation for autoregressive models and deep merged survival analysis on cancer study using multiple types of bioinformatic data. In the first topic study, we consider the goodness of fit test the error density of linear and nonlinear autoregressive models using tilted kernel density estimation based on residuals. Bickel-Rosenblatt test statistic is based on the integrated square error of non-parametric error density estimation and a smoothed version of the parametric fit of the density. It is shown that the new type of Bickel-Rosenblatt test statistics behaves asymptotically the same as …


Evaluation Of The Effect Of The Clinical-Decision-Support Systems On Diabetes Management: A Multivariate Meta-Analysis Comparison With Univariate Meta-Analysis, Abdelfattah Elbarsha Jan 2021

Evaluation Of The Effect Of The Clinical-Decision-Support Systems On Diabetes Management: A Multivariate Meta-Analysis Comparison With Univariate Meta-Analysis, Abdelfattah Elbarsha

Electronic Theses and Dissertations

The advantage of using meta-analysis lies in its ability in providing a quantitative summary of the findings from multiple studies. The aim of this dissertation was first to conduct a simulation study in order to understand what factors (sample size, between-study correlation, and percent of missing data) have a significant effect on meta-analysis estimates and whether using univariate or multivariate meta-analysis would produce different estimates.

The second goal of this study was to evaluate the effect of clinical decision support systems CDSS on diabetes care management by conducting three separate univariate meta-analyses and one multivariate meta-analysis. CDSS are health information …


Improving The Data Quality In Gravitation-Wave Detectors By Mitigating Transient Noise Artifacts, Kentaro Mogushi Jan 2021

Improving The Data Quality In Gravitation-Wave Detectors By Mitigating Transient Noise Artifacts, Kentaro Mogushi

Doctoral Dissertations

“The existence of gravitational waves (GWs), small perturbations in spacetime produced by accelerating massive objects was first predicted in 1916 as solutions of Einstein’s Theory of General Relativity (Einstein, 1916). Detecting and analyzing GWs produced by sources allows us to probe astrophysical phenomena.

The era of GW astronomy began from the first direct detection of the coalescence of a binary black hole in 2015 by the collaboration of the advanced Laser Interferometer Gravitational-wave Observatory (LIGO) (Aasi et al., 2015) and advanced Virgo (Abbott et al., 2016a). Since 2015, LIGO-Virgo detected about 50 confident transient events of GW signals (Abbott et …


Ensemble Protein Inference Evaluation, Kyle Lee Lucke Jan 2021

Ensemble Protein Inference Evaluation, Kyle Lee Lucke

Graduate Student Theses, Dissertations, & Professional Papers

The Protein inference problem is becoming an increasingly important tool that aids in the characterization of complex proteomes and analysis of complex protein samples. In bottom-up shotgun proteomics experiments the metrics for evaluation (like AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case, they are typically a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in …


Energy And Greenhouse Gas Savings For Leed-Certified U.S. Office Buildings Using Weighted Regression, Tian Liang Jan 2021

Energy And Greenhouse Gas Savings For Leed-Certified U.S. Office Buildings Using Weighted Regression, Tian Liang

Honors Papers

In this study, we studied the energy consumption and greenhouse gas emission performance of LEED-certified office buildings. We obtained the 2016 energy consumption and greenhouse gas emission data for 4002 office buildings from nine major US cities, including 522 buildings that we identified as LEED-certified. We discovered that LEED buildings used significantly more electricity percentagewise as their energy source. We also discovered that the locations and ages of buildings have significant effect on their performance. We removed the effect of locations and building ages using weighted regression. Our result showed that LEED office buildings used 11% less site energy, 9% …


Examining Multiple Imputation For Measurement Error Correction In Count Data With Excess Zeros, Shalima Zalsha Dec 2020

Examining Multiple Imputation For Measurement Error Correction In Count Data With Excess Zeros, Shalima Zalsha

Statistical Science Theses and Dissertations

Measurement error and missing data are two common problems in wildlife population surveys. These data are collected from the environment and may be missing or measured with error when the observer’s ability to see the animal is obscured. Methods such as video transects for estimating red snapper abundance and aerial surveys for estimating moose population sizes are highly affected by these problems since total abundance will be underestimated if missing/mismeasured counts are ignored. We shall refer to this problem as visibility bias; it occurs when the true counts are observed when visibility is high, partially observed when visibility is low …