Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems


Articles 31 - 60 of 6717

Full-Text Articles in Physical Sciences and Mathematics

Comparative Analysis Of Hate Speech Detection: Traditional Vs. Deep Learning Approaches, Haibo Pen, Nicole Anne Huiying Teo, Zhaoxia Wang Jul 2024

Research Collection School Of Computing and Information Systems

Detecting hate speech on social media poses a significant challenge, especially in distinguishing it from offensive language. Learning-based models often struggle with the nuanced differences between the two, which leads to frequent misclassification of hate speech instances, and most research focuses on refining hate speech detection methods. This paper therefore asks whether traditional learning-based methods should still be used, given the perceived advantages of deep learning in this domain. To answer this, it investigates advancements in hate speech detection, applies deep learning-based models to detailed hate speech detection tasks, and compares the results with …


Performance Analysis Of Llama 2 Among Other LLMs, Donghao Huang, Zhenda Hu, Zhaoxia Wang Jul 2024

Research Collection School Of Computing and Information Systems

Llama 2, an open-source large language model developed by Meta, offers a versatile and high-performance solution for natural language processing, boasting a broad scale, competitive dialogue capabilities, and open accessibility for research and development, thus driving innovation in AI applications. Despite these advancements, there remains a limited understanding of the underlying principles and performance of Llama 2 compared with other LLMs. To address this gap, this paper presents a comprehensive evaluation of Llama 2, focusing on its application in in-context learning — an AI design pattern that harnesses pre-trained LLMs for processing confidential and sensitive data. Through a rigorous comparative …


A Bottom-Up Multi-Disciplinary Approach For Sustainability Education: UN-SDG 13.3, Benjamin Gan, Thomas Menkhoff, Eng Lieh Ouh, Kevin Cheong Jul 2024

Research Collection School Of Computing and Information Systems

Teaching both information systems and business undergraduates to break the current inertia in sustainability action requires innovative teaching & learning approaches as well as inter-disciplinary knowledge inputs. This study presents a bottom-up T&L approach delivered by a group of educators from different disciplines aimed at addressing UN-SDG Goal 13 ‘Climate Action’ with a novel approach. Integrating a problem-centric community project assignment into existing courses, our students worked on different disciplinary elements such as persuasive technologies and awareness campaigns to help to address local sustainability initiatives by community partners. We collected data to measure how students’ motivation, engagement, teamwork, and community …


Exploring The Market Impact Of Web3 Identity Imitation In Ethereum Name Service, Ping Fan Ke, Yi Meng Lau Jul 2024

Research Collection School Of Computing and Information Systems

Digital identities are paramount in today’s digital landscape. However, in the Web3 ecosystem, the absence of a central governing body leaves digital identities, such as domain names, vulnerable to cybersquatting and identity imitation. This study examines the market impact of identity imitation in the Web3 ecosystem. By scrutinizing trading activities within Web3 domain names from Ethereum Name Service (ENS) and its imitator, "Ether Name Service," we find that the presence of a new imitating domain name increases the subsequent resale value of the authentic domain name. Additionally, we find a positive correlation between the resale value of the imitating domain …


Broadening The View: Demonstration-Augmented Prompt Learning For Conversational Recommendation, Quang Huy Dao, Yang Deng, Dung D. Le, Lizi Liao Jul 2024

Research Collection School Of Computing and Information Systems

Conversational Recommender Systems (CRSs) leverage natural language dialogues to provide tailored recommendations. Traditional methods in this field primarily focus on extracting user preferences from isolated dialogues. This often yields responses with a limited perspective, confined to the scope of individual conversations. Recognizing the potential in collective dialogue examples, our research proposes an expanded approach for CRS models, utilizing selective analogues from dialogue histories and responses to enrich both generation and recommendation processes. This introduces significant research challenges, including: (1) How to secure high-quality collections of recommendation dialogue exemplars? (2) How to effectively leverage these exemplars to enhance CRS models? To tackle …


Towards Human-Centered Proactive Conversational Agents, Yang Deng, Lizi Liao, Zhonghua Zheng, Grace Hui Yang, Tat-Seng Chua Jul 2024

Research Collection School Of Computing and Information Systems

Recent research on proactive conversational agents (PCAs) mainly focuses on improving the system's capabilities in anticipating and planning action sequences to accomplish tasks and achieve goals before users articulate their requests. This perspectives paper highlights the importance of moving towards building human-centered PCAs that emphasize human needs and expectations, and that consider the ethical and social implications of these agents, rather than solely focusing on technological capabilities. The distinction between a proactive and a reactive system lies in the proactive system's initiative-taking nature. Without thoughtful design, proactive systems risk being perceived as intrusive by human users. We address the issue by …


Sequential Decision Learning For Social Good And Fairness, Dexun Li Jul 2024

Dissertations and Theses Collection (Open Access)

Sequential decision learning is one of the key research areas in artificial intelligence. Typically, a sequence of events is observed through a transformation that introduces uncertainty into the observations, and based on these observations, the recognition process produces a hypothesis of the underlying events. This learning process is characterized by maximizing the sum of the reward signals. However, many real-life problems are inherently constrained by limited resources. Besides, when the learning algorithms are used to inform decisions involving human beings (e.g., security and justice, health intervention, etc.), they may inherit the potential, pre-existing bias in the dataset and exhibit similar …


Towards Automated Slide Augmentation To Discover Credible And Relevant Links, Dilan Dinushka Senarath Arachchige, Christopher M. Poskitt, Kwan Chin (Xu Guangjin) Koh, Heng Ngee Mok, Hady Wirawan Lauw Jul 2024

Research Collection School Of Computing and Information Systems

Learning from concise educational materials, such as lecture notes and presentation slides, often prompts students to seek additional resources. Newcomers to a subject may struggle to find the best keywords or lack confidence in the credibility of the supplementary materials they discover. To address these problems, we introduce Slide++, an automated tool that identifies keywords from lecture slides, and uses them to search for relevant links, videos, and Q&As. This interactive website integrates the original slides with recommended resources, and further allows instructors to 'pin' the most important ones. To evaluate the effectiveness of the tool, we trialled the system …


Jigsaw: Edge-Based Streaming Perception Over Spatially Overlapped Multi-Camera Deployments, Ila Gokarn, Yigong Hu, Tarek Abdelzaher, Archan Misra Jul 2024

Research Collection School Of Computing and Information Systems

We present JIGSAW, a novel system that performs edge-based streaming perception over multiple video streams, while additionally factoring in the redundancy offered by the spatial overlap often exhibited in urban, multi-camera deployments. To assure high streaming throughput, JIGSAW extracts and spatially multiplexes multiple regions-of-interest from different camera frames into a smaller canvas frame. Moreover, to ensure that perception stays abreast of evolving object kinematics, JIGSAW includes a utility-based weighted scheduler to preferentially prioritize and even skip object-specific tiles extracted from an incoming stream of camera frames. Using the CityflowV2 traffic surveillance dataset, we show that JIGSAW can simultaneously process 25 …


Generalization Analysis Of Deep Nonlinear Matrix Completion, Antoine Ledent, Rodrigo Alves Jul 2024

Research Collection School Of Computing and Information Systems

We provide generalization bounds for matrix completion with Schatten $p$ quasi-norm constraints, which is equivalent to deep matrix factorization with Frobenius constraints. In the uniform sampling regime, the sample complexity scales like $\widetilde{O}\left( rn\right)$ where $n$ is the size of the matrix and $r$ is a constraint of the same order as the ground truth rank in the isotropic case. In the distribution-free setting, the bounds scale as $\widetilde{O}\left(r^{1-\frac{p}{2}}n^{1+\frac{p}{2}}\right)$, which reduces to the familiar $\sqrt{r}n^{\frac{3}{2}}$ for $p=1$. Furthermore, we provide an analogue of the weighted trace norm for this setting which brings the sample complexity down to $\widetilde{O}(nr)$ in all …
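As a quick consistency check on the rates quoted in this abstract (a sanity check only, not a derivation from the paper), substituting $p=1$ into the distribution-free rate recovers the stated special case, and formally letting $p \to 0$ gives the same exponents as the uniform-sampling rate. Logarithmic factors hidden by $\widetilde{O}$ are ignored here.

```latex
\[
\left. r^{1-\frac{p}{2}}\, n^{1+\frac{p}{2}} \right|_{p=1}
  = r^{\frac{1}{2}}\, n^{\frac{3}{2}}
  = \sqrt{r}\, n^{\frac{3}{2}},
\qquad
\lim_{p \to 0} r^{1-\frac{p}{2}}\, n^{1+\frac{p}{2}} = rn .
\]
```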


A Deep Learning Method To Predict Bacterial ADP-Ribosyltransferase Toxins, Dandan Zheng, Siyu Zhou, Lihong Chen, Guansong Pang, Jian Yang Jul 2024

Research Collection School Of Computing and Information Systems

Motivation: ADP-ribosylation is a critical modification involved in regulating diverse cellular processes, including chromatin structure regulation, RNA transcription, and cell death. Bacterial ADP-ribosyltransferase toxins (bARTTs) serve as potent virulence factors that orchestrate the manipulation of host cell functions to facilitate bacterial pathogenesis. Despite their pivotal role, the bioinformatic identification of novel bARTTs poses a formidable challenge due to limited verified data and the inherent sequence diversity among bARTT members. Results: We proposed a deep learning-based model, ARTNet, specifically engineered to predict bARTTs from bacterial genomes. Initially, we introduced an effective data augmentation method to address the issue of data scarcity …


Large Language Model Powered Agents For Information Retrieval, An Zhang, Yang Deng, Yankai Lin, Xu Chen, Ji-Rong Wen, Tat-Seng Chua Jul 2024

Research Collection School Of Computing and Information Systems

The vital goal of information retrieval today extends beyond merely connecting users with relevant information they search for. It also aims to enrich the diversity, personalization, and interactivity of that connection, ensuring the information retrieval process is as seamless, beneficial, and supportive as possible in the global digital era. Current information retrieval systems often encounter challenges like a constrained understanding of queries, static and inflexible responses, limited personalization, and restricted interactivity. With the advent of large language models (LLMs), there's a transformative paradigm shift as we integrate LLM-powered agents into these systems. These agents bring forth crucial human capabilities like …


The Institutional Challenges Of A Quantified Self Study: An Attempt To Ascertain How Data Collected From A Mobile Device Can Be An Indicator Of Personal Mental Health Over Time, Julian Lazaras Jun 2024

University Honors Theses

The adoption of an application of new technology always comes with a bias; this is never more true than in the case of human behavioral analytics within higher education. While movements such as the quantified self movement make strides to reinterpret the realm of data analytics, psychology, and computer science, there are inevitably limitations to the adoption and application of such approaches within the standard realm of research. Herein is presented a case where an effort to evaluate the prospect of using mobile phone data as secondary indicators of personal mental health through the lens of data analysis was put …


Public Data Resources And Total Factor Productivity Of Enterprises: A Quasi-Natural Experiment Based On Local Government Data Opening, Wuping Wu, Qiheng Li, Liuyi Zhang, Yue Zhao Jun 2024

Research Collection School Of Accountancy

The opening of public data is a major strategic government move to release the value of the data factor. However, whether the public actually uses these data resources to release their value needs to be empirically tested. Therefore, from the perspective of firms’ high-quality development, this paper examines the relation between open public data and firms’ total factor productivity, so as to reflect the value of public data resources as a driving force for firms’ high-quality development. Taking A-share listed firms from 2010 to 2019 as samples, using a natural experiment based on the launch of the local …


The Efficacy Of Using Machine Learning Techniques For Identifying And Classifying “Fake News”, Muhammad Islam Jun 2024

Dissertations, Theses, and Capstone Projects

In today's digital world, detecting fake news has emerged as a critical challenge, one that has significant effects on democracy and public discourse at large, both regionally and globally. This research studies how the diversity of news sources in training datasets affects how well machine learning models can classify fake vs. true news. I used Linear Support Vector Classification (LinearSVC) to create and compare two classification models: one was trained on a dataset that only had real news from a single source, Reuters (Dataset 1), and the other was trained on a dataset that contained real news from Reuters, The …


Usability Versus Collectibility In NFT: The Case Of Web3 Domain Names, Ping Fan Ke, Yi Meng Lau Jun 2024

Research Collection School Of Computing and Information Systems

This study examines the market’s inclination towards usability and collectibility aspects of Non-Fungible Tokens (NFTs) within Web3 domain name marketplaces, drawing insights from resale records. Our findings reveal a prevailing preference for usability, as evidenced by consistently higher average resale prices observed for Ethereum Name Service (ENS) domains compared to Linagee Name Registrar (LNR) domains. However, domains with diminished usability, such as those containing non-ASCII characters, tend to attract investors due to their enhanced collectibility. Our analysis of the effect of previous resales suggests a potential aversion towards second-hand acquisitions among NFT investors when value derives primarily from usability, while …


Poster: Profiling Event Vision Processing On Edge Devices, Ila Nitin Gokarn, Archan Misra Jun 2024

Research Collection School Of Computing and Information Systems

As RGB camera resolutions and frame-rates improve, their increased energy requirements make it challenging to deploy fast, efficient, and low-power applications on edge devices. Newer classes of sensors, such as the biologically inspired neuromorphic event-based camera, capture only changes in light intensity per-pixel to achieve operational superiority in sensing latency (O(μs)), energy consumption (O(mW)), high dynamic range (140dB), and task accuracy such as in object tracking, over traditional RGB camera streams. However, highly dynamic scenes can yield an event rate of up to 12MEvents/second, the processing of which could overwhelm …


Efficient Cross-Modal Video Retrieval With Meta-Optimized Frames, Ning Han, Xun Yang, Ee-Peng Lim, Hao Chen, Qianru Sun Jun 2024

Research Collection School Of Computing and Information Systems

Cross-modal video retrieval aims to retrieve semantically relevant videos when given a textual query, and is one of the fundamental multimedia tasks. Most top-performing methods primarily leverage Vision Transformer (ViT) to extract video features [1]-[3]. However, they suffer from the high computational complexity of ViT, especially when encoding long videos. A common and simple solution is to uniformly sample a small number (e.g., 4 or 8) of frames from the target video (instead of using the whole video) as ViT inputs. The number of frames has a strong influence on the performance of ViT, e.g., using 8 frames yields better …
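The uniform frame sampling baseline mentioned in this abstract can be illustrated with a minimal sketch. The function name `uniform_frame_indices` and the choice of taking the middle frame of each segment are assumptions made for illustration; the abstract only states that a small number of frames is sampled uniformly, and this is not the paper's meta-optimized method.

```python
def uniform_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return indices of frames sampled uniformly from a video.

    The video is split into `num_samples` equal segments and the middle
    frame of each segment is selected. The midpoint rule is an assumption
    of this sketch, not something stated in the abstract.
    """
    if num_frames < 0 or num_samples <= 0:
        raise ValueError("num_frames must be >= 0 and num_samples must be > 0")
    if num_samples >= num_frames:
        # Short video: fall back to using every frame.
        return list(range(num_frames))
    # Integer arithmetic for floor((i + 0.5) * num_frames / num_samples).
    return [((2 * i + 1) * num_frames) // (2 * num_samples)
            for i in range(num_samples)]


print(uniform_frame_indices(100, 4))  # [12, 37, 62, 87]
```

With 4 or 8 samples, as in the abstract's example, only those frames would be passed to the encoder, which is what keeps the cost low and also why the choice of frames matters.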


Applicability And Challenges Of Indoor Localization Using One-Sided Round Trip Time Measurements, Quang Hai Truong, Xi Kai Justin Lam, Guru Anand Anish, Rajesh Krishna Balan Jun 2024

Research Collection School Of Computing and Information Systems

Radio Frequency fingerprinting, based on WiFi or cellular signals, has been a popular approach for localization. However, adoption in real-world applications has been confronted with challenges due to low accuracy, especially in crowded environments. The received signal strength (RSS) can easily be interfered with by a large number of other devices, or depend strictly on the physical surroundings, which may cause localization errors of a few meters. On the other hand, the fine time measurement (FTM) round-trip time (RTT) has shown compelling improvement in indoor localization with ~1-2 meter accuracy in both 2D and 3D environments [13]. This method relies on the …


Fully Automated Selfish Mining Analysis In Efficient Proof Systems Blockchains, Krishnendu Chatterjee, Amirali Ebrahimzadeh, Mehrdad Karrabi, Krzysztof Pietrzak, Michelle Yeo, Dorde Zikelic Jun 2024

Research Collection School Of Computing and Information Systems

We study selfish mining attacks in longest-chain blockchains like Bitcoin, but where the proof of work is replaced with efficient proof systems - like proofs of stake or proofs of space - and consider the problem of computing an optimal selfish mining attack which maximizes expected relative revenue of the adversary, thus minimizing the chain quality. To this end, we propose a novel selfish mining attack that aims to maximize this objective and formally model the attack as a Markov decision process (MDP). We then present a formal analysis procedure which computes an ϵ-tight lower bound on the optimal expected …


Semantic Structuring Of Digital Documents: Knowledge Graph Generation And Evaluation, Erik E. Luu Jun 2024

Master's Theses

In the era of total digitization of documents, navigating vast and heterogeneous data landscapes presents significant challenges for effective information retrieval, both for humans and digital agents. Traditional methods of knowledge organization often struggle to keep pace with evolving user demands, resulting in suboptimal outcomes such as information overload and disorganized data. This thesis presents a case study on a pipeline that leverages principles from cognitive science, graph theory, and semantic computing to generate semantically organized knowledge graphs. By evaluating a combination of different models, methodologies, and algorithms, the pipeline aims to enhance the organization and retrieval of digital documents. …


Learning Dynamic Multimodal Network Slot Concepts From The Web For Forecasting Environmental, Social And Governance Ratings, Meng Kiat Gary Ang, Ee-Peng Lim Jun 2024

Research Collection School Of Computing and Information Systems

Dynamic multimodal networks are networks with node attributes from different modalities where the attributes and network relationships evolve across time, i.e., both networks and multimodal attributes are dynamic; for example, dynamic relationship networks between companies that evolve across time due to changes in business strategies and alliances, which are associated with dynamic company attributes from multiple modalities such as textual online news, categorical events, and numerical financial-related data. Such information can be useful in predictive tasks involving companies. Environmental, social, and governance (ESG) ratings of companies are important for assessing the sustainability risks of companies. The process of …


GTS: GPU-Based Tree Index For Fast Similarity Search, Yifan Zhu, Ruiyao Ma, Baihua Zheng, Xiangyu Ke, Lu Chen, Yunjun Gao Jun 2024

Research Collection School Of Computing and Information Systems

Similarity search, the task of identifying objects most similar to a given query object under a specific metric, has gathered significant attention due to its practical applications. However, the absence of coordinate information to accelerate similarity search and the high computational cost of measuring object similarity hinder the efficiency of existing CPU-based methods. Additionally, these methods struggle to meet the demand for high throughput data management. To address these challenges, we propose GTS, a GPU-based tree index designed for the parallel processing of similarity search in general metric spaces, where only the distance metric for measuring object similarity is known. …


To Protect Or To Hide: An Investigation On Corporate Redacted Disclosure Motives Under New FAST Act Regulation, Yan Ma, Qian Mao, Nan Hu Jun 2024

Research Collection School Of Computing and Information Systems

China adopted amendments allowing companies to redact filings without prior approval in 2016. Leveraging this change as a quasi-natural experiment, we explore whether managers utilize redacted information to withhold bad information in the more lenient regulatory environment. Our investigation uncovers a significant shift in managerial behavior: since 2016, managers have been inclined to employ redactions to obscure negative news rather than to safeguard proprietary data. Furthermore, we find that poorer firm performance and a higher cost of equity are associated with redacted disclosures after 2016, suggesting that investors perceive an increase in firm-specific risk attributed to withholding bad news through redactions.


How Is Our Mobility Affected As We Age? Findings From A 934 Users Field Study Of Older Adults Conducted In An Urban Asian City, Yi Zhen Tan, Ngoc Doan Thu Tran, Sapphire Lin, Fang Zhao, Yee Sien Ng, Dong Ma, Jeonggil Ko, Rajesh Krishna Balan Jun 2024

Research Collection School Of Computing and Information Systems

In this paper, we analyze the results of a large study involving 934 older adults living in an urban Asian city that collected their mobility patterns, in the form of logged GPS data, along with a multitude of demographic and health data. We show that mobility, in terms of average distance travelled per day, is greatly affected by age and by employment status. In addition, other factors such as type of day, household size, physical and financial conditions and the onset of retirement also play a significant role in determining the mobility of an individual. These results will have high …


Closest Pairs Search Over Data Stream, Rui Zhu Zhu, Bin Wang, Xiaochun Yang, Baihua Zheng Jun 2024

Research Collection School Of Computing and Information Systems

K-closest pair (KCP for short) search is a fundamental problem in database research. Given a set of d-dimensional streaming data S, KCP search aims to retrieve K pairs with the shortest distances between them. While existing works have studied continuous 1-closest pair query (i.e., K = 1) over dynamic data environments, which allow for object insertions/deletions, they require high computational costs and cannot easily support KCP search with K > 1. This paper investigates the problem of KCP search over data stream, aiming to incrementally maintain as few pairs as possible to support KCP search with arbitrary K. To achieve this, we …
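To make the KCP problem statement concrete, here is a minimal brute-force sketch over a static snapshot of points. It is a baseline for illustration only, not the incremental streaming method the paper proposes; the function name `k_closest_pairs`, the use of Euclidean distance, and the sample `window` data are all assumptions.

```python
import heapq
import math
from itertools import combinations


def k_closest_pairs(points, k):
    """Return the k pairs with the smallest Euclidean distance.

    Brute force over all O(n^2) pairs; each result is (distance, p, q),
    ordered from the closest pair outwards.
    """
    if k <= 0:
        return []
    scored = ((math.dist(p, q), p, q) for p, q in combinations(points, 2))
    return heapq.nsmallest(k, scored, key=lambda item: item[0])


# Hypothetical snapshot of a stream window (2-dimensional points).
window = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 7.0), (10.0, 10.0)]
for dist, p, q in k_closest_pairs(window, 2):
    print(dist, p, q)
```

Recomputing all pairs on every insertion or deletion is what makes this baseline expensive on a stream, which is the cost the abstract says an incremental approach aims to avoid.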


Improving Interpretable Embeddings For Ad-Hoc Video Search With Generative Captions And Multi-Word Concept Bank, Jiaxin Wu, Chong-Wah Ngo, Wing-Kwong Chan Jun 2024

Research Collection School Of Computing and Information Systems

Aligning a user query with video clips in a cross-modal latent space, and aligning them via semantic concepts, are two mainstream approaches for ad-hoc video search (AVS). However, the effectiveness of existing approaches is bottlenecked by the small sizes of available video-text datasets and the low quality of concept banks, which results in failures on unseen queries and the out-of-vocabulary problem. This paper addresses these two problems by constructing a new dataset and developing a multi-word concept bank. Specifically, capitalizing on a generative model, we construct a new dataset consisting of 7 million generated text and video pairs for pre-training. To …


Try It Together - Qualitative Coding With Atlas.ti, Danping Dong, Bryan Leow May 2024

AI for Research Week

This hands-on session introduces Atlas.ti, a well-established qualitative data analysis tool for analyzing your transcripts and textual data. The session will cover coding data, extracting insights, creating visualizations, and exploring the tool's latest AI features.


Intelligent Solutions For Retroactive Anomaly Detection And Resolution With Log File Systems, Derek G. Rogers, Chanvo Nguyen, Abhay Sharma May 2024

SMU Data Science Review

This paper explores the intricate challenges log files pose from data science and machine learning perspectives. Drawing inspiration from existing methods, LAnoBERT, PULL, LLMs, and the breadth of recent research, this paper aims to push the boundaries of machine learning for log file systems. Our study comprehensively examines the unique challenges presented in our problem setup, delineates the limitations of existing methods, and introduces innovative solutions. These contributions are organized to offer valuable insights, predictions, and actionable recommendations tailored for Microsoft's engineers working on log data analysis.


Memories Of Recipes In Twentieth-Century Irish Cookbooks, Gary Thompson May 2024

Dublin Gastronomy Symposium

This paper analyses and categorises the ways in which authors and their publishers have chosen to include the author’s culinary, food and personal memories within the texts of twenty twentieth-century Irish cookbooks. Cookbooks are subjects of culinary nostalgia, with the reading of a recipe capable of triggering in the reader a memory of a meal enjoyed, a dish cooked in times past by a loved one, or recollections of the disgust felt for a food hated in childhood. Independent from the reader, the culinary memories of the author can be captured at the time of publication in the text …