Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics

Articles 151 - 180 of 685

Full-Text Articles in Physical Sciences and Mathematics

Gradient Boosting For Survival Analysis With Applications In Oncology, Nam Phuong Nguyen Jan 2020

USF Tampa Graduate Theses and Dissertations

Cancer is one of the deadliest diseases, and the world has been fighting it for decades. An enormous amount of research has been conducted through a wide range of approaches, from genetic analysis to mathematical modeling. Survival analysis is a well-established methodology frequently used to estimate the survival probability of a patient. Although a large number of methods for survival analysis exist, efficient exploration of a high-dimensional feature space has been challenging due to its computational cost and complexity. This thesis adapts the component-wise gradient boosting algorithms for cancer survival analysis, and also proposes a new …


Power Analysis On A Pilot Study Of The Caloric Intake Of Children Helping Prepare Meals Versus Children Not, Danielle Clifford Jan 2020

Student Research Poster Presentations 2020

The purpose of this analysis is to determine the sample size needed for a study that will be used to discover if there is a difference in the caloric intake of children who help with meal preparation and children who do not help with meal preparation.
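As a rough illustration of how a sample-size calculation of this kind can work, the sketch below uses the standard normal-approximation formula for comparing two group means, n = 2(z_{1-α/2} + z_{1-β})² / d² per group, where d is the standardized effect size. The abstract does not state which test, effect size, significance level, or power the poster used, so the function name and the inputs d = 0.5, α = 0.05, and power = 0.80 are assumptions chosen only for the example.

```python
from math import ceil
from statistics import NormalDist


def per_group_sample_size(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means (hypothetical helper, not the poster's method).

    effect_size is the standardized difference d = (mu1 - mu2) / sigma.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return ceil(n)  # round up to a whole participant


# Assumed inputs: medium effect (d = 0.5), alpha = 0.05, power = 0.80.
n_per_group = per_group_sample_size(0.5)
print(n_per_group)
```

With these assumed inputs the approximation gives 63 children per group; a calculation based on the t distribution would give a slightly larger number.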


Predicting Diabetes Diagnoses, Sarah Netchert Jan 2020

Student Research Poster Presentations 2020

This study explored the traits and health state of African Americans in central Virginia in order to determine which traits put people at a higher probability of being diagnosed with diabetes. We also want to know which traits generate the highest probability that a person will be diagnosed with diabetes. Traits that were included and used in this study were cholesterol, stabilized glucose, high density lipoprotein levels, age (years), gender, height (inches), weight (pounds), systolic blood pressure, diastolic blood pressure, waist size (inches), and hip size (inches). There were 403 individuals included in the study since they were the only ones screened for diabetes out of 1,046 …


Playfair's Introduction Of Bar And Pie Charts To Represent Data, Diana White, River Bond, Joshua Eastes, Negar Janani Jan 2020

Statistics and Probability

No abstract provided.


Designing A Student Exchange Program: Facilitating Interdisciplinary, Mathematics-Focused Collaboration Among College Students, Bryan D. Poole, Linden Turner, Caroline Maher-Boulis Jan 2020

Journal of Mathematics and Science: Collaborative Explorations

Interdisciplinary collaboration is necessary for students’ professional preparation (Laird et al., 2014; Repko, 2014) and may promote effective learning transfer of course content. Such collaborations have resulted in enhanced problem-solving skills and conceptual understanding of statistics content (Dierker et al., 2012; Everett, 2016; Hammersley et al., 2019; Woodzicka et al., 2015). As a result of ongoing collaborations between faculty members in different disciplines and at different universities, we created a “Student Exchange Program” to encourage interdisciplinary collaboration between undergraduate students in mathematics and social sciences. In the current paper, we describe past research that informed the design of this program, …


Bayesian Approach To Finding The Most Likely Circuit Structure, Shannon Harms Jan 2020

Graduate Research Theses & Dissertations

Systems, and their reliabilities, depend on the reliabilities of the components that they are composed of, and in this paper we want to find the system structure that is the most likely given observed data. Bayesian methods were utilized in order to discover the posterior means, or observed reliabilities, of both the components and the systems. Assuming the serial and parallel system structures have independent components, we calculated system reliabilities based on observed component reliabilities by using the multiplication and addition probability rules. We are then able to expand upon the numerical comparison method through a maximum likelihood analysis that …
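The series and parallel calculations mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's code: the function names are hypothetical, the component values are made up, and the posterior mean assumes a uniform Beta(1, 1) prior on each component's reliability, which the abstract does not specify.

```python
from math import prod


def posterior_mean_reliability(successes, trials):
    """Posterior mean of a component reliability under an assumed
    uniform Beta(1, 1) prior and binomial pass/fail data."""
    return (successes + 1) / (trials + 2)


def series_reliability(reliabilities):
    """A series system works only if every independent component works."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """A parallel system fails only if every independent component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Made-up component reliabilities, for illustration only.
components = [0.9, 0.8, 0.95]
print(series_reliability(components))    # product of the three values
print(parallel_reliability(components))  # one minus the product of failure probabilities
print(posterior_mean_reliability(9, 10)) # 9 successes in 10 assumed trials
```

Comparing the series and parallel values against an observed system reliability is one simple way to see which structure the data favor.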


The Role Of Topography, Soil, And Remotely Sensed Vegetation Condition Towards Predicting Crop Yield, Trenton E. Franz, Sayli Pokal, Justin P. Gibson, Yuzhen Zhou, Hamed Gholizadeh, Fatima Amor Tenorio, Daran Rudnick, Derek M. Heeren, Matthew F. McCabe, Matteo Ziliani, Zhenong Jin, Kaiyu Guan, Ming Pan, John Gates, Brian Wardlow Jan 2020

School of Natural Resources: Faculty Publications

Foreknowledge of the spatiotemporal drivers of crop yield would provide a valuable source of information to optimize on-farm inputs and maximize profitability. In recent years, an abundance of spatial data providing information on soils, topography, and vegetation condition have become available from both proximal and remote sensing platforms. Given the wide range of data costs (between USD $0−50/ha), it is important to understand where often limited financial resources should be directed to optimize field production. Two key questions arise. First, will these data actually aid in better fine-resolution yield prediction to help optimize crop management and farm economics? Second, what …


An Examination Of Covid-19 Statistical Modeling, Shane Vaughan Jan 2020

Williams Honors College, Honors Research Projects

The 2019 novel coronavirus, also known as COVID-19, is an infectious disease which was first reported in late 2019 and soon spread to become a global pandemic, prompting major action from world governments. Soon after, many institutions began attempts to analyze and predict the spread and severity of the disease via statistical modeling. Some information is not available for public consumption; however, a number of institutions have published the results of their analyses and some have made public repositories of the code used to build the models. This research paper attempts to use these and other resources to examine the modeling …


Reliability Comparisons Of Mobile Network Operators: An Experimental Case Study From A Crowdsourced Dataset, Engin Zeydan, Ahmet Yildirim Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

It is of great interest for Mobile Network Operators (MNOs) to know how well their network infrastructure performance behaves in different geographical regions of their operating country compared to their horizontal competitors. However, traditional network monitoring and measurement methods of network infrastructure use limited numbers of measurement points that are insufficient for detailed analysis and expensive to scale using an internal workforce. On the other hand, the abundance of crowdsourced content can engender various unforeseen opportunities for MNOs to cope with this scaling problem. This paper investigates end-to-end reliability and packet loss (PL) performance comparisons of MNOs using a previously …


Representing And Interpreting Data From Playfair, Diana White, River Bond, Joshua Eastes, Negar Janani Jan 2020

Statistics and Probability

No abstract provided.


Should We Expect Each Year In The Next Decade (2019–28) To Be Ranked Among The Top 10 Warmest Years Globally?, Anthony Arguez, Shannan Hurley, Anand Inamdar, Laurel Mahoney, Ahira Sanchez-Lugo, Lilian Yang Jan 2020

Political Science & Geography Faculty Publications

Annual rankings of global temperature are widely cited by media and the general public, not only to place the most recent year in a historical perspective, but also as a first-order metric of recent climate change that is easily digestible by the general public. Moreover, all annual NOAAGlobalTemp anomalies from 1880 (the earliest reading available) through the mid-1970s are well below anomalies of the top 10 warmest years in Table 1, even when considering the uncertainty of the NOAAGlobalTemp time series values. While we expect the algorithm's performance to be largely independent of any changes made to the way that …


Outlier Profiles Of Atomic Structures Derived From X-Ray Crystallography And From Cryo-Electron Microscopy, Lin Chen, Jing He, Angelo Facchiano Jan 2020

Computer Science Faculty Publications

Background: As more protein atomic structures are determined from cryo-electron microscopy (cryo-EM) density maps, validation of such structures is an important task. Methods: We applied a histogram-based outlier score (HBOS) to six sets of cryo-EM atomic structures and five sets of X-ray atomic structures, including one derived from X-ray data with better than 1.5 Å resolution. Cryo-EM data sets contain structures released by December 2016 and those released between 2017 and 2019, derived from resolution ranges 0–4 Å and 4–6 Å respectively. Results: The distribution of HBOS values in five sets of X-ray structures shows that HBOS is sensitive distinguishing …


Deriving Statistical Inference From The Application Of Artificial Neural Networks To Clinical Metabolomics Data, Kevin M. Mendez Jan 2020

Theses: Doctorates and Masters

Metabolomics data are complex with a high degree of multicollinearity. As such, multivariate linear projection methods, such as partial least squares discriminant analysis (PLS-DA), have become standard. Non-linear projection methods, typified by Artificial Neural Networks (ANNs), may be more appropriate to model potential nonlinear latent covariance; however, they are not widely used due to difficulty in deriving statistical inference, and thus biological interpretation. Herein, we illustrate the utility of ANNs for clinical metabolomics using publicly available data sets and develop an open framework for deriving and visualising statistical inference from ANNs equivalent to standard PLS-DA methods.


Inference Of Heterogeneity In Meta-Analysis Of Rare Binary Events And Rss-Structured Cluster Randomized Studies, Chiyu Zhang Dec 2019

Statistical Science Theses and Dissertations

This dissertation contains two topics: (1) A Comparative Study of Statistical Methods for Quantifying and Testing Between-study Heterogeneity in Meta-analysis with Focus on Rare Binary Events; (2) Estimation of Variances in Cluster Randomized Designs Using Ranked Set Sampling.

Meta-analysis, the statistical procedure for combining results from multiple studies, has been widely used in medical research to evaluate intervention efficacy and safety. In many practical situations, the variation of treatment effects among the collected studies, often measured by the heterogeneity parameter, may exist and can greatly affect the inference about effect sizes. Comparative studies have been done for only one or …


Fractional Random Weighted Bootstrapping For Classification On Imbalanced Data With Ensemble Decision Tree Methods, Sean Charles Carter Nov 2019

USF Tampa Graduate Theses and Dissertations

Ensemble methods are commonly used for building predictive models for classification. Models that are unstable to perturbations in the training set, such as the decision tree, often see considerable reductions in error when grouped, using bootstrapped resamples of the training data to train many models. The non-parametric bootstrap, however, has limited efficacy when used on severely imbalanced data, especially when the number of observations of one or more classes is exceptionally small. We explore the fractional random weighted bootstrap, which randomly assigns fractional weights to observations, as an alternative resampling procedure in training machine learning ensembles, particularly decision tree …
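The idea of assigning random fractional weights instead of resampling can be sketched as below. This is an illustrative assumption rather than the thesis's implementation: it uses one common construction in which the weights follow a uniform Dirichlet distribution scaled to sum to n, generated here by normalizing exponential draws. The function name and the toy labels are hypothetical.

```python
import random


def fractional_random_weights(n, rng):
    """Draw n fractional weights that sum to n.

    Normalizing independent Exp(1) draws yields a uniform Dirichlet
    vector; multiplying by n puts the weights on the scale of counts.
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [n * d / total for d in draws]


# Hypothetical imbalanced labels: 3 minority cases among 20 observations.
labels = [1] * 3 + [0] * 17
rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = fractional_random_weights(len(labels), rng)

# Every observation receives a weight, so the minority class always
# contributes to the fit, unlike a resample that may leave it out entirely.
minority_share = sum(w for w, y in zip(weights, labels) if y == 1) / sum(weights)
print(round(sum(weights), 6), minority_share)
```

Each ensemble member would use a fresh weight vector, passed to a learner that accepts per-observation weights.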


9th Annual Postdoctoral Science Symposium, University Of Texas Md Anderson Cancer Center Postdoctoral Association Sep 2019

Annual Postdoctoral Science Symposium Abstracts

The mission of the Annual Postdoctoral Science Symposium (APSS) is to provide a platform for talented postdoctoral fellows throughout the Texas Medical Center to present their work to a wider audience. The MD Anderson Postdoctoral Association convened its inaugural Annual Postdoctoral Science Symposium (APSS) on August 4, 2011.

The APSS provides a professional venue for postdoctoral scientists to develop, clarify, and refine their research as a result of formal reviews and critiques of faculty and other postdoctoral scientists. Additionally, attendees discuss current research on a broad range of subjects while promoting academic interactions and enrichment and developing new collaborations.


Stability And Application Of The K-Core Dynamical Model To Biological Networks, Francesca Beatrice Arese Lucini Sep 2019

Dissertations, Theses, and Capstone Projects

The objective of the dissertation is to illustrate the importance of the k-core dynamical model. First, I present the stability analysis of the nonlinear k-core model and compare its solution to that of the most widely used linear model. Second, I show a real-world application of the k-core model to describe properties of neural networks, specifically, the transition from conscious to subliminal perception.


Sample Size Requirements And Considerations For Models To Assess Human-Machine System Performance, Jennifer S. G. Lopez Sep 2019

Theses and Dissertations

Hierarchical Linear Models (HLMs), also known as multi-level models, are an extension of multiple regression analysis and can aid in the understanding of human and machine workloads of a system. These models allow for prediction and testing in systems with hierarchies of two or more levels. The complex interrelated variability of these multi-level models exists in operational settings, such as the Air Force Distributed Common Ground System Full Motion Video (AF DCGS FMV) community which is composed of individuals (Level-1), groups (Level-2), units (Level-3), and organizations (Level-4). Through the development of sample size requirements and considerations for multi-level models, this …


Who Can Act? Critical Assumptions At The Foundations Of Statistical Analysis, Peter J. Taylor Aug 2019

Working Papers on Science in a Changing World

Thinking about a simple teaching example on the t-test for comparing the average (mean) for some measurement in a group versus the average in another led me to articulate a sequence of thoughts and questions about the foundations of statistical analysis. In particular, my inquiry explores contrasts between: the statistical emphasis on averages or types around which there is variation or noise; variation as a mixture of types; the dynamics (or heterogeneous mix of dynamics) that generated the data analyzed; and participatory restructuring of these dynamics in the future. Two key issues are: Who is assumed to be able to …


A Bayesian Approach To Deriving Ages Of Individual Field White Dwarfs, Erin M. O'Malley, Ted Von Hippel, David A. Van Dyk Aug 2019

Ted von Hippel

We apply a self-consistent and robust Bayesian statistical approach to determine the ages, distances, and zero-age main sequence (ZAMS) masses of 28 field DA white dwarfs (WDs) with ages of approximately 4-8 Gyr. Our technique requires only quality optical and near-infrared photometry to derive ages with <15% uncertainties, generally with little sensitivity to our choice of modern initial-final mass relation. We find that age, distance, and ZAMS mass are correlated in a manner that is too complex to be captured by traditional error propagation techniques. We further find that the posterior distributions of age are often asymmetric, indicating that the standard approach to deriving WD ages can yield misleading results.


Sample Size Calculation Of Clinical Trials With Correlated Outcomes, Dateng Li Aug 2019

Statistical Science Theses and Dissertations

In this thesis, we investigate sample size calculation for three kinds of clinical trials: (1) randomized controlled trials (RCTs) with longitudinal count outcomes; (2) cluster randomized trials (CRTs) with count outcomes; (3) CRTs with multiple binary co-primary endpoints.


Is Corequisite Developmental Math Effective At East Tennessee State University?, Christine Padden Aug 2019

Electronic Theses and Dissertations

This thesis looks at the corequisite developmental math program at East Tennessee State University (ETSU) and compares its effectiveness to that of the previous developmental math program by comparing student outcomes in MATH 1530. MATH 1530 is a non-calculus-based statistics and probability course that satisfies most majors’ general education math requirements. ETSU sees approximately 1,000 students a year pass through MATH 1530, which is around 6.7% of the total enrollment at ETSU [9]. We are interested in the last five years of the developmental math program before it was changed to corequisite developmental math and the first five years of corequisite …


Effect Of Cross-Validation On The Output Of Multiple Testing Procedures, Josh Dallas Price Aug 2019

Graduate Theses and Dissertations

High-dimensional data with sparsity is routinely observed in many scientific disciplines. Filtering out the signals embedded in noise is a canonical problem in such situations requiring multiple testing. The Benjamini–Hochberg procedure using False Discovery Rate control is the gold standard in large-scale multiple testing. In Majumder et al. (2009) an internally cross-validated form of the procedure is used to avoid a costly replicate study and the complications that arise from population selection in such studies (i.e. extraneous variables). I implement this procedure and run extensive simulation studies under increasing levels of dependence among parameters and different data generating …
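For readers unfamiliar with the baseline procedure named in this abstract, the sketch below implements the ordinary Benjamini–Hochberg step-up rule. It is not the internally cross-validated variant the thesis studies, and the function name and the p-values are assumptions made for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the standard Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k whose p-value is at most alpha * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Made-up p-values for ten hypotheses.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))
```

Because the rule is step-up, a hypothesis can be rejected even when a smaller p-value misses its own threshold, as long as some larger-ranked p-value meets its threshold.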


Spatio-Temporal Analysis Of Tree Ring Chronology And Precipitation, Ruizhe Yin Aug 2019

Graduate Theses and Dissertations

Tree ring chronology data is known to reflect regional climate due to the strong impact of rainfall and temperature. Therefore, tree ring data can be used to reconstruct historical climate in order to understand how climate changed in the past and make predictions about the future behavior of the climate. For simplicity, this research only considers the influence of precipitation on tree ring growth within the New England area. A total of 94 measurement sites are used to record tree ring width over 881 years and corresponding precipitation data are given at some locations for 121 years. We developed a …


Predictive Diagnostic Analysis Of Mammographic Breast Tissue Microenvironment, Dexter G. Canning Aug 2019

Honors College

Improving computer-aided early detection techniques for breast cancer is paramount because current technology has high false positive rates. Existing methods have led to a substantial number of false diagnostics, which lead to stress, unnecessary biopsies, and an added financial burden to the health care system. In order to augment early detection methodology, one must understand the breast microenvironment. The CompuMAINE Lab has researched computational metrics on mammograms based on an image analysis technique called the Wavelet Transform Modulus Maxima (WTMM) method to identify the fractal and roughness signature from mammograms. The WTMM method was used to color code the mammograms …


Mathematics Versus Statistics, Mindy B. Capaldi Jul 2019

Journal of Humanistic Mathematics

Mathematics and statistics are both important and useful subjects, but the former has maintained prominence in the American education system. On the other hand, statistics is more prevalent in daily life and is an increasingly marketable subject to know. This article gives a personal history of one mathematician’s bumpy road to learning and teaching statistics. Additionally, arguments for how and why to include statistics in the K-12 and college curricula are provided.


Estimation Of Association Between A Longitudinal Marker And Interval-Censored Progression Times, Naghmeh Daneshi Jul 2019

Dissertations and Theses

In longitudinal studies, we observe the subjects who are likely to progress to a new state during the study time. For example, in clinical trials the stage of a progressing disease is recorded at each follow-up visit. The primary goal is to estimate the relationship between the attributes and the subject's progression state. In such studies, some subjects complete all their follow-up visits and their progression states are observed without any missingness. However, others miss their follow-up visits and when they come back, they learn that they have progressed to a new state. In this case, not only are their …


Differentially Expressed Genes In Blood From Young Pigs Between Two Swine Lines Divergently Selected For Feed Efficiency: Potential Biomarkers For Improving Feed Efficiency, Haibo Liu, Yet T. Nguyen, Daniel S. Nettleton, Jack C. M. Dekkers, Christopher K. Tuggle Jun 2019

Dan Nettleton

The goal of this study was to find potential gene expression biomarkers in blood of piglets that can be used to predict pigs’ future feed efficiency. Using RNA-seq technology, we found 453 genes were differentially expressed (false discovery rate (FDR) ≤ 0.05) in the blood of two Yorkshire lines of pigs divergently selected for feed efficiency (FE) based on residual feed intake (RFI). Genes involved in several biosynthetic processes were overrepresented among genes more highly expressed in the low RFI line compared to the high RFI line. Weighted gene co-expression network analysis (WGCNA) also revealed genes involved in some of …


Stock Market Analysis: A Review And Taxonomy Of Prediction Techniques, Dev Shah, Haruna Isah, Farhana Zulkernine May 2019

Publications and Scholarship

Stock market prediction has always caught the attention of many analysts and researchers. Popular theories suggest that stock markets are essentially a random walk and it is a fool’s game to try and predict them. Predicting stock prices is a challenging problem in itself because of the number of variables which are involved. In the short term, the market behaves like a voting machine but in the longer term, it acts like a weighing machine and hence there is scope for predicting the market movements for a longer timeframe. Application of machine learning techniques and other algorithms for stock price …


Advances In Measurement Error Modeling, Linh Nghiem May 2019

Statistical Science Theses and Dissertations

Measurement error in observations is widely known to cause bias and a loss of power when fitting statistical models, particularly when studying distribution shape or the relationship between an outcome and a variable of interest. Most existing correction methods in the literature require strong assumptions about the distribution of the measurement error, or rely on ancillary data which is not always available. This limits the applicability of these methods in many situations. Furthermore, new correction approaches are also needed for high-dimensional settings, where the presence of measurement error in the covariates adds another level of complexity to the desirable structure …