Welcome to DSML 2024

5th International Conference on Data Science and Machine Learning (DSML 2024)

December 21 ~ 22, 2024, Sydney, Australia



Accepted Papers
Email Performance Predictions Without Campaign History

Sourabh Khot1, Venkata Duvvuri1, Heejae Roh1, and Anish Mangipudi2, 1College of Professional Studies, Northeastern University, 2Langley High School, McLean, Virginia

ABSTRACT

Email will remain a vital marketing tool in 2024. Email marketing involves sending commercial emails to a targeted audience, and it currently produces a significant ROI (return on investment) in the marketing sector [1]. This research paper presents a comprehensive study on predicting email open rates, focusing specifically on the influence of subject lines. The open-rate prediction algorithm SLk relies on the semantic features of subject lines, utilizing a seed dataset of 4500 anonymized subject lines from diverse business sectors. The algorithm integrates data preprocessing, tokenization, and a custom-built repository of power words and negative words to enhance prediction accuracy. In our experiments, the actual open-rate margin of error tracked close to what the input error allows, giving confidence that SLk can be used directionally to optimize subject-line performance without prior campaign history. The findings suggest that precise manipulation of subject-line features can significantly improve the efficacy of email campaigns.

Keywords

Email Marketing, Open Rate Prediction, Subject Line Analysis, Machine Learning, Natural Language Processing


Improved Productivity With AI Models for SQL Tasks: A Case Study

Thanh Vu, Sara Keretna, Richi Nayak and Thiru, Telstra Group Limited and Queensland University of Technology, Australia

ABSTRACT

This study investigates the practical deployment of AI-based Text-to-SQL (T2S) models on a real-world telecommunication dataset, aiming to enhance employee productivity. Our experiment addresses the unique challenges in telecommunication datasets that previous works using annotated datasets have not explored. Leveraging advanced retrieval-augmented generation (RAG) models like Vanna AI and LlamaIndex, we benchmark their performance on synthetic datasets such as SPIDER and BIRD with different LLM backbones and subsequently compare the best-performing model to human performance on our proprietary dataset. We propose the Productivity Gain Index (PGI) to quantify the dual aspects of productivity improvement, time efficiency and accuracy, by comparing AI performance with that of human analysts across various SQL tasks. Results indicate significant productivity gains, with AI-based tools demonstrating superior query processing time and accuracy. This prominent gap signals the potential of AI-based tools to improve productivity in real company settings.

Keywords

Text-to-SQL, Large Language Models, Productivity Gain Index, Retrieval-Augmented Generation, Artificial Intelligence Evaluation.


Federated Learning With Differential Privacy Based on Summary Statistics

Peng Zhang1 and Pingqing Liu2, 1Faculty of Science, Kunming University of Science and Technology, Kunming, China, 2School of Management and Economics, Kunming University of Science and Technology, Kunming, China

ABSTRACT

In data analytics, privacy preservation is receiving more and more attention, and privacy concerns have led to the formation of "data silos". Federated learning can accomplish integrated data analysis while protecting data privacy, and it is currently an effective way to break the "data silo" dilemma. In this paper, we build a federated learning framework based on differential privacy. First, for each local dataset, the summary statistics of the parameter estimates and the maximum L2 norm of the coefficient vector of the polynomial function used to approximate the individual log-likelihood function are computed and transmitted to the trust center. Second, at the trust center, Gaussian noise is added to the coefficients of the polynomial function that approximates the full log-likelihood function, the privacy-preserving parameter estimates are obtained from the resulting noisy objective function, and the estimator satisfies (ε, δ)-DP. In addition, theoretical guarantees are provided for both the privacy and the statistical utility of the proposed method. Finally, we verify the utility of the method using numerical simulations and apply it to a study of salary impact factors.

Keywords

Differential Privacy, Federated Learning, Gauss Function Mechanism, Summary Statistics.
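
The noise-addition step described in the abstract above can be illustrated with a minimal sketch of the classical Gaussian mechanism applied to a vector of polynomial coefficients. This is not the authors' implementation: the calibration rule sigma = sensitivity * sqrt(2 ln(1.25/δ)) / ε is the textbook one (valid for 0 < ε < 1), and the function names, the sensitivity value, and the example coefficients are all assumptions made for illustration.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (textbook rule, 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturb_coefficients(coeffs, sensitivity, epsilon, delta, rng=None):
    """Add independent Gaussian noise to each coefficient (hypothetical helper)."""
    rng = rng or random.Random()
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [c + rng.gauss(0.0, sigma) for c in coeffs]


if __name__ == "__main__":
    # Assumed example: three aggregated coefficients with L2 sensitivity 1.0.
    noisy = perturb_coefficients([0.8, -1.5, 2.1], sensitivity=1.0,
                                 epsilon=0.5, delta=1e-5, rng=random.Random(0))
    print(noisy)
```

In a setup like the one the abstract describes, the sensitivity would presumably be derived from the maximum L2 norm that each site transmits; here it is simply passed in as a parameter.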


Blockchain-based Demand-supply Matching System for IoT Device Data Distribution

Kenta Kawai, Wu Yuxiao, Yutaka Matubara, and Hiroaki Takada, Graduate School of Informatics, Nagoya University, Aichi 464-8601

ABSTRACT

The boom in IoT devices has attracted significant interest in data integration platforms that enable seamless utilization and control of sensor data across various applications. However, most existing platforms have a centralized structure, aggregating data on specific companies' servers. This centralization raises privacy concerns and imposes limitations on data sharing with third parties. To address these challenges, this paper proposes a decentralized demand-supply matching system for IoT device data distribution using blockchain technology. The paper details the requirements for the entire matching system, including both users and IoT devices, and introduces a system concept alongside a practical implementation. Evaluation experiments conducted on a prototype system demonstrate the feasibility and effectiveness of the proposed approach.

Keywords

Blockchain, Data Marketplace, Demand-Supply Matching, IoT Data.


Scalable Consensus for Blockchain Networks

Vivek Ramji, Stony Brook University, New York, USA

ABSTRACT

This paper presents a novel scalable consensus algorithm designed for blockchain networks, aimed at improving transaction throughput and reducing latency in distributed systems. The proposed algorithm leverages a hierarchical structure of nodes, where consensus is achieved through a multi-layered approach that balances workload across the network. By utilizing dynamic node selection and adaptive communication protocols, the algorithm ensures robustness against network partitions and Byzantine failures. Experimental results demonstrate significant improvements in scalability, with the algorithm achieving high transaction throughput even under varying network conditions. The proposed approach provides a viable solution for enhancing the efficiency of blockchain networks in real-world applications.

Keywords

Distributed System, Consensus Algorithm, Fault Tolerance, Blockchain Consensus.


AI-Driven Pest Detection, Prevention, and Management System

Ann Vincent1, Bhavesh Patil1, Aditya Murbade1, Hemalata Mote1, and Shruti Mehata2, 1Department of Electronics and Telecommunication Engineering, Don Bosco Institute of Technology, India, 2Founder, C.E.O., BharatGodam Pvt. Ltd. India

ABSTRACT

The project aims to develop a solution for monitoring warehouses and managing pest activity in large-scale grain storage facilities. Utilizing image processing techniques, artificial intelligence, and deep learning algorithms, this system is designed to detect, identify, and track pests, specifically rats, and to raise alerts on their movement in real time. Deep learning algorithms are used for real-time object detection and identification, enabling precise recognition of rats. Additionally, sensors complement visual detection by identifying movement and activity through non-visual means, ensuring comprehensive monitoring even in obstructed areas. The system consists of a network of strategically placed sensors and cameras to provide continuous, real-time monitoring of the warehouse environment. Upon detecting a rat, the system logs the event's exact time and location and triggers visual and audible alerts. Real-time notifications are sent to warehouse managers via SMS, email, or a mobile app, ensuring prompt response and intervention. By harnessing the power of AI, this project promises to revolutionize pest management in grain storage warehouses, enhancing food security and reducing economic losses.

Keywords

Image Captioning, Computer Vision, Natural Language Processing, Deep Learning, Artificial Intelligence, rat detection, Web application, Convolutional Neural Network (CNN).


Relational Representation Augmented Graph Attention Network for Knowledge Graph Completion

E. Aili1, 2, H. Yilahun1, 2, S. Imam1, 3, and A. Hamdulla1, 2, 1School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China, 2Xinjiang Key Laboratory of Multilingual Information Technology, Urumqi 830017, China, 3School of National Security Studies, Xinjiang University, Urumqi 830017, China

ABSTRACT

Knowledge Graph Completion (KGC) is a popular topic in knowledge graph construction and related applications, aiming to complete the structure of a knowledge graph by predicting missing entities or relations and mining unknown facts in the knowledge graph. In the KGC task, graph neural network (GNN)-based methods have achieved remarkable results due to their advantage of effectively capturing complex relations among entities and generating more accurate and rich entity representations by aggregating information from neighboring nodes. These methods mainly focus on the representation of entities, and the representation of relations is obtained using simple dimensional transformations or initial embeddings. This treatment ignores the diversity and complex semantics of relations, and restricts the efficiency of the model in utilizing relational information in the reasoning process. In this work, we propose the relational representation augmented graph attention network (RRA-GAT), which effectively identifies and weights neighboring relations that actually contribute to the target relation by filtering out irrelevant information through an attention function based on information and spatial domain. Furthermore, we capture complex patterns and features in the relational embedding by means of a feed-forward network consisting of a series of linear transformations and nonlinear activation functions. Experiments demonstrate the strong performance of RRA-GAT on the link prediction task on the standard datasets FB15k-237 and WN18RR (e.g., it improves the MRR metric on the WN18RR dataset by 7.8%).

Keywords

Knowledge Graph Completion, Knowledge Graph Embedding, Graph Neural Networks.
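
For readers unfamiliar with the MRR metric cited in the abstract above, the following minimal sketch shows how mean reciprocal rank and Hits@k are commonly computed for link prediction. The function names and the example ranks are illustrative assumptions; the sketch does not reproduce the paper's evaluation protocol (for example, filtered versus raw ranking).

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries; ranks are 1-based positions of the true entity."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test queries whose true entity is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


if __name__ == "__main__":
    example_ranks = [1, 2, 4]  # assumed ranks of the correct entity for three queries
    print(mean_reciprocal_rank(example_ranks))
    print(hits_at_k(example_ranks, 3))
```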


Chinese Military Named Entity Recognition Based on Adversarial Training and Deep Multi-granularity Dilated Convolutions

Qiuyan Ji1, 2, H. Yilahun1, 2, S. Imam1, 3, and A. Hamdulla1, 2, 1School of Computer Science and Technology, Xinjiang University, Urumqi 830017, China, 2Xinjiang Key Laboratory of Multilingual Information Technology, Urumqi 830017, China, 3School of National Security Studies, Xinjiang University, Urumqi 830017, China

ABSTRACT

Named entity recognition (NER) in the military domain is crucial for information extraction and knowledge graph construction. However, military NER faces challenges such as fuzzy entity boundaries and a lack of public corpora. These problems make existing NER methods ineffective when dealing with short texts and social media content. To address these challenges, we construct a military news dataset containing 11,892 Chinese military news sentences, with a total of 69,569 named entities annotated. We also propose a Robust Dilated-W squared NER (RDWS) model based on adversarial training and deep multi-granularity dilated convolution. The model first uses BERT-base-Chinese to extract character-level features, and then applies the fast gradient method (FGM) for adversarial training. Contextual features are captured by the BiLSTM layer, and these features are further processed using deep multi-granularity dilated convolution layers to better capture complex inter-lexical interactions. Experimental results show that the proposed method performs well on multiple datasets.

Keywords

named entity recognition, adversarial training, Chinese military news, convolution.
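
The fast gradient method (FGM) mentioned in the abstract above is usually described as adding a perturbation r = ε · g / ‖g‖₂ to the embedding, where g is the gradient of the loss with respect to that embedding. The sketch below shows only this perturbation step on plain Python lists; the function names, the value of ε, and the example vectors are assumptions, and the RDWS training loop itself is not reproduced.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return epsilon * grad / ||grad||_2, or a zero vector if the gradient is zero."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def apply_perturbation(embedding, grad, epsilon=1.0):
    """Adversarial embedding: the original embedding plus the FGM perturbation."""
    delta = fgm_perturbation(grad, epsilon)
    return [e + d for e, d in zip(embedding, delta)]


if __name__ == "__main__":
    # Assumed toy values: a 2-dimensional embedding and its loss gradient.
    print(apply_perturbation([0.1, -0.2], [3.0, 4.0], epsilon=1.0))
```

In typical adversarial training, the model is run a second time on the perturbed embedding, the two losses are combined, and the original embedding is restored before the optimizer step.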


Optimizing and Improving Question Answering (QA) System Performance Using Language Heuristics and Knowledge Distillation

Prasanth Yadla, Department of Computer Science, NC State University, USA

ABSTRACT

Question Answering (QA) systems have emerged as powerful platforms for automatically answering questions asked by humans in natural language, using either a pre-structured database or a collection of natural language documents. The task lies at the intersection of Natural Language Processing and Information Retrieval. In this project, we intend to build upon the existing BERT implementation of a QA system. We use Knowledge Distillation, a regularization technique that compresses the learned representations of deep learning (DL) models. We also incorporate Data Augmentation techniques to test our model against a varied dataset. Lastly, we post-process the inferences with linguistic knowledge to make the predictions more reliable and free of false positives.

Keywords

Natural Language Processing, Question Answering, Knowledge Distillation, Language Heuristics, Deep Learning.
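
As a rough illustration of the knowledge distillation idea named in the abstract above, the sketch below computes a temperature-scaled distillation loss: the KL divergence between the teacher's and the student's softened output distributions, multiplied by T². This is the commonly used formulation, not necessarily the one used in the paper; the function names, the temperature, and the example logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Assumed toy logits for a three-class output.
    print(distillation_loss([2.0, 1.0, 0.1], [3.0, 1.0, 0.2]))
```

In practice this term is usually mixed with the ordinary supervised loss on the ground-truth labels, with a weighting factor chosen on a validation set.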


An Integrated Deep Learning with Natural Language Processing Models for Sentiment Analysis and Classification using Arabic Tweets

Ebtesam Hussain Almansour, Department of Computer Science, Applied College, Najran University, Najran, Saudi Arabia

ABSTRACT

The growing acceptance of social media networks as a platform for sharing opinions on many topics has made opinion mining, or sentiment analysis (SA), an active research area. In recent times, SA has attracted significant attention owing to its various applications in different aspects of our lives. SA is a Natural Language Processing (NLP) task that aims to analyze and process data written in human languages. Even though Arabic is among the most widely used languages for content sharing on social media, SA on Arabic content is limited owing to numerous challenges with the language's morphological structure, the variability of its dialects, and the absence of proper corpora. In recent times, deep learning (DL) and machine learning (ML) have demonstrated extraordinary achievements in SA for Arabic tweet classification on social media platforms. In this manuscript, we design and develop an Integrated Deep Learning with Natural Language Processing Models for Sentiment Analysis and Classification (IDLNLPM-SAC) technique. The IDLNLPM-SAC model performs sentiment analysis and classification on Arabic tweets. It applies several levels of data preprocessing to transform the raw Arabic tweet data into a compatible format. For word embedding, the latent semantic analysis (LSA) technique is applied. A hybrid parallel temporal convolutional network–gated recurrent unit (PTCN-GRU) classifier is then used for the classification process. Finally, the parameters of the PTCN-GRU algorithm are selected using an improved marine predator algorithm (IMPA). The simulation evaluation of the IDLNLPM-SAC technique is carried out on an Arabic tweets database. The experimental results show that the IDLNLPM-SAC technique outperforms recent approaches.

Keywords

Sentiment Analysis; Deep Learning; Arabic Tweet; Latent Semantic Analysis; Marine Predator Algorithm.


A Survey Paper Exploring IT Outsourcing Models and Market Trends

Merita Bakiji, Faculty of Contemporary Sciences and Technologies, South East European University, Tetovo, North Macedonia

ABSTRACT

Amid the great boom in global business and rapid technological development, IT outsourcing emerged from organizations' attempts to reduce operational costs and increase efficiency through external expertise. This study explores the current models of IT outsourcing, detailing their sustainability and suitability in different market environments. It does so by drawing on a comprehensive summary of the existing literature, articles, and studies on IT outsourcing, industry reports, consultancy reports, and technological trends and their impacts on the market. The study also analyzes the IT outsourcing industry map in the Republic of North Macedonia, revealing the IT outsourcing market and its trends. By synthesizing existing research and data, this paper presents a valuable resource for decision makers in IT outsourcing, providing practical recommendations for organizations that are constantly trying to adapt to rapidly changing market conditions.

Keywords

IT Outsourcing, Artificial Intelligence, Market Trends, North Macedonia.


Wireless Computing: A Mathematical Approach

Arun Kumar Singh, School of MCS, PNG University of Technology, Lae, Papua New Guinea

ABSTRACT

The wireless interface is a cornerstone of new-generation communication systems and is widely applied in different domains, including IoT, mobile devices, and sensor devices. This research paper studies wireless computing from a mathematical perspective, covering signal propagation, wireless channel characterization, system capacity, and error control. We start from simple wireless communication and extend mathematical models, theories, and equations to analyze the nature of wireless systems and their uses in networking and optimization. Wireless computing has become one of the most important aspects of communication in the present world, where data transfer across different networks is possible without any physical connections. Wireless computing systems are modeled with the parameters of idealized systems and draw on signal processing, network optimization, and information theory. We discuss the mathematical models used for wireless communication channels: propagation models, path loss equations, and interference management. Further, the paper addresses some of the key issues using graph theory, with pointers to queuing theory, to arrive at realistic algorithms aimed at improving the organization of network resources as a means of scaling up wireless networks. The paper also details advanced topics such as error-correcting codes, modulation schemes, and the cryptographic methods needed for secure communication in wireless computing environments. Overall, this research seeks to provide a mathematical approach to the design, analysis, and optimization of wireless systems, which we hope will help in the development of next-generation wireless technologies, including 5G and IoT.

Keywords

Wireless Computing, Wireless Communication, Signal Propagation, Network Capacity, Error Correction, Mathematical Modeling.
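
Two of the standard formulas behind the topics listed in the abstract above, path loss and system capacity, can be sketched as follows: the free-space path loss FSPL(dB) = 20 log10(4πdf/c) and the Shannon capacity C = B log2(1 + SNR). These are textbook formulas chosen for illustration; the abstract does not state which specific models the paper uses, and the function names and example values are assumptions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def free_space_path_loss_db(distance_m, frequency_hz):
    """Free-space path loss in dB for a distance in metres and a carrier frequency in Hz."""
    if distance_m <= 0 or frequency_hz <= 0:
        raise ValueError("distance and frequency must be positive")
    return 20.0 * math.log10(4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT)


def shannon_capacity_bps(bandwidth_hz, snr_linear):
    """Shannon capacity in bits per second for a linear (not dB) signal-to-noise ratio."""
    if bandwidth_hz <= 0 or snr_linear < 0:
        raise ValueError("bandwidth must be positive and SNR non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Assumed example: a 2.4 GHz link over 1 km, and a 1 MHz channel at SNR = 15.
    print(free_space_path_loss_db(1000.0, 2.4e9))
    print(shannon_capacity_bps(1e6, 15.0))
```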


Opioid Crisis and Data Analytics: Preventing Overdoses Through Predictive Models

Vedamurthy Gejjegondanahalli Yogeshappa, Manager/Automation Architect, Leading Health Management Company, Dallas, United States

ABSTRACT

The opioid crisis continues to have widespread effects on the population and contributes to thousands of overdose deaths each year. Despite concerted efforts by governments and healthcare systems, opioid-related deaths remain a very difficult problem to solve. One promising solution could be the deployment of data analytics to prevent overdose incidents before they happen. This journal article introduces a new concept for healthcare and law enforcement: identifying high-risk people and areas. It also discusses how algorithms such as machine learning and natural language processing, among others, help in identifying patterns of abuse, prescription anomalies, and the socioeconomic risks associated with prescriptions. The article describes the expected advantages of real-time monitoring, of data aggregation from various sources, including EHRs, PDMPs, and social media, and of developing geography-specific and demographic-specific methods and models. The research also addresses the ethical aspects of using data, as well as privacy issues and the possibility of bias in a predictive model, insisting on reporting all the methods used and on frequent checks to avoid possible misapplications. Additionally, it assesses the roles of healthcare providers, data science, and policy in preventing the opioid crisis. In this paper, several advanced machine learning techniques, including decision trees and random forests as well as more complex deep learning algorithms, show how identifying effective early interventions, which are often hard to design, can help reduce overdoses and enhance patient outcomes [18]. As with any analytical approach to a particular problem, applying data analytics to the opioid crisis has strengths and weaknesses. Machine learning algorithms have been shown to be highly accurate at predicting who may become an opioid user; however, their implementation in practice entails embedding models into current healthcare frameworks, coordinating stakeholders, and addressing ethical issues. The conclusion calls for further research on predictive analytics for opioid overdose, as well as for the legal regulation of patient rights.

Keywords

Opioid crisis, Data Analytics, Predictive models, Machine learning, Healthcare data, Public health.


Competitive Study of Hybridized YOLO Pipeline Systems for Helmet Detection

Vaikunth, Dejey, Vishaal, and Balamurali, Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai, India

ABSTRACT

Helmet detection is crucial for improving safety in public road traffic. The problem translates to an object detection task. Therefore, this paper compares recent You Only Look Once (YOLO) models in the context of helmet detection in terms of reliability and computational load. Specifically, YOLOv8, YOLOv9, and the newly released YOLOv11 have been used. In addition, this manuscript proposes a modified architectural pipeline that remarkably improves the overall performance. This hybridized YOLO model (h-YOLO) has been pitted against the independent models for analysis. The models were tested using a range of standard object detection metrics such as recall, precision, and mAP (Mean Average Precision). Training and testing times were also recorded to show how well the models suit a real-time detection scenario.

Keywords

Object Detection, Traffic Safety, YOLO, Deep Learning, Hybrid Architecture, CNN.
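
The detection metrics named in the abstract above rest on two simple building blocks: the intersection over union (IoU) between a predicted box and a ground-truth box, and precision and recall computed from true-positive, false-positive, and false-negative counts. The sketch below illustrates both under assumed conventions (boxes as (x_min, y_min, x_max, y_max) tuples); the function names and example values are hypothetical, and a full mAP computation is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Assumed toy values: two overlapping boxes and some detection counts.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(precision_recall(8, 2, 4))
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold, commonly 0.5.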


The Power of Artificial Intelligence in Project Management: A Review and Evaluation Study

Heidrich Vicci, College of Business Florida International University, USA

ABSTRACT

Examining Artificial Intelligence (AI) models can provide clear guidance for project management practice, even in peripheral areas that practitioners may not have considered. AI affords virtuous circles, as symptom detection may yield novel datasets, diagnostic feedback for ML model building, and advocacy for the value and function of AI analysis of the diagnostic classifications. AI variables could also have direct predictive value, as they are proposed to have some mechanistic link with the outcome, and AI has the potential to detect novel mechanisms. Finally, AI might be used to detect how context changes the effects of other variables, and to use that to select custom actions within the nomothetic guidelines (Sarkar et al., 2022; Wang et al., 2023; Yathiraju, 2022).

Keywords

Artificial Intelligence (AI), AI models, Project Management (PM).


Application of Artificial Intelligence Information Technology in Body Shape Information Detection and Optimization Modeling Intervention of Li and Uyghur Students

Difeng Fan1, Zhiwei Li2, Qinqin Yang3, Wenhuan Huang1, Wenting Hao1, 1Hainan College of Economics and Business, Haikou, 571127, China, 2University of Sanya Institute of physical, Sanya, 572022, China, 3Xinjiang Police Academy, Urumqi, 830011, China

ABSTRACT

The bioinformatics information and physical health level of students of all nationalities have long been a concern, and they are of great significance for improving the overall quality of all nationalities. Different people have different living and eating habits, so their health and body shape differ considerably. To explore the differences in the body shape of teenagers under the influence of different altitudes, climates, diets, and living habits, this paper conducts data induction and intelligent analysis of the body shape of Uyghur students in Xinjiang and Li students in Hainan through artificial intelligence detection, information retrieval technology, investigation and interview, literature review, statistical analysis, data mining, and other methods. The main factors affecting the difference in body shape between the two groups of students are determined, and on this basis, effective artificial intelligence intervention planning strategies for the middle school physical education curriculum are put forward. These include strengthening intelligent control and intervention education for mental health, and adopting an intelligent system architecture approach to improve the nutritional structure of students.

Keywords

Artificial intelligence; Information detection; Uyghur nationality; The Li nationality, living in Yunnan; Body shape; Optimal modeling.


A Study on Physical Testing and Intervention of Tajik and Tibetan Boys From the Perspective of Sports Psychology

Wenting Hao1, Yi Jiang2, Wenhuan Huang1, Difeng Fang1, Yaling Zhang2, 1Hainan Economic and Trade Vocational College, 571127, 2Hainan Normal University, 571158

ABSTRACT

Located at 37°46′22.51″ N, 75°13′17.38″ E, Tashkurgan Tajik Autonomous County, Xinjiang, China, is 3094.65 meters above sea level. Among China's 56 ethnic groups, the Tajiks are the only white people. Lhasa, Tibet, is located at 29°38′53.80″ N, 91°08′21.56″ E, at an altitude of 3656.73 meters. High altitude and differing latitude and longitude constitute distinct geographical features, which to a certain extent cause differences in the physical health of residents in different areas. Taking the body shape of Tajik and Tibetan students as the research object, this paper conducts a comparative study and statistical analysis of the body shape of Tajik and Tibetan students aged 7-18 from the perspective of sports psychology, and identifies the body shape characteristics of students living in high-altitude areas at different longitudes and latitudes, with a view to adopting effective sports psychology interventions that improve and optimize the physique and body shape of Tajik and Tibetan boys and improve their overall physical health.

Keywords

Sports psychology, physique, Tajik, Tibetan


Professional Development for Teachers in ICT

Blondel Seumo, UAE

ABSTRACT

Professional development in Information and Communication Technology (ICT) is a cornerstone for equipping teachers with the skills and confidence necessary to navigate the demands of 21st-century education. This study examines the current state of ICT-focused professional development for teachers, analyzing its impact on teaching practices and student outcomes. Using a mixed-methods approach, the research collected data through surveys, interviews, classroom observations, and case studies from a diverse sample of educators. Findings reveal that while ICT training positively influences teaching effectiveness and student engagement, challenges such as limited access, lack of tailored training, and insufficient infrastructure persist, particularly in rural areas. Successful models from Singapore, Finland, and Kenya highlight the importance of systemic approaches and continuous support. The study underscores the need for context-specific training programs, ongoing mentorship, and strategic investment in ICT resources. It also emphasizes the transformative potential of ICT in fostering innovative, inclusive, and equitable education. By addressing barriers and leveraging global best practices, this research provides actionable recommendations for educators, policymakers, and stakeholders. These include integrating ICT into pre-service teacher education, adopting blended learning models for professional development, and ensuring equitable access to resources. The study concludes with a call to action to prioritize ICT professional development as a key driver of educational reform, bridging the digital divide and preparing students for the future.


AI-driven Software Testing: A Deep Learning Perspective

Chandra Shekhar Pareek, Independent Researcher, Berkeley Heights, New Jersey, USA

ABSTRACT

In the ever-evolving landscape of software development, traditional testing methodologies falter under the complexities of modern systems such as microservices and distributed architectures, coupled with accelerated release demands. This paper explores Deep Learning (DL) as a transformative force in software testing, proposing advanced frameworks leveraging neural networks, reinforcement learning, and generative models to automate test case generation, predict high-risk areas, facilitate fault localization, and optimize test prioritization—significantly reducing manual effort and resource usage. Key techniques include Convolutional Neural Networks (CNNs) for fault detection, Long Short-Term Memory (LSTM) networks for predictive test case creation, and autoencoders for anomaly detection, augmented by transfer learning and semi-supervised methods to overcome limited labeled datasets. Integrated seamlessly into CI/CD pipelines, these approaches deliver adaptive, self-optimizing testing strategies, prioritizing high-risk components dynamically. Despite challenges like computational overhead and data requirements, advancements in AI, edge computing, and XAI promise a paradigm shift, enhancing efficiency, accuracy, and agility in quality assurance.

Keywords

Deep Learning, Software Testing, Test Case Generation, Fault Detection, Predictive Analytics, Automation, Quality Assurance, Optimization.