Raja Babu

Data Scientist

GitHub

About

Data Scientist with a track record of designing, developing, and deploying AI/ML solutions that improve operational efficiency, decision-making, and customer experience. Expertise spans large language models (LLMs), natural language processing (NLP), recommendation systems, and anomaly detection, backed by measurable results in reducing time and cost and improving accuracy.

Work Experience

Data Scientist

Digitate - Tata Research Development and Design Centre

Jul 2022 - Jul 2024

Led the design, development, and deployment of AI/ML and data science solutions that improve enterprise operational efficiency and decision-making through analytical models and natural language processing.

  • Designed and implemented a multi-agent chatbot using LangGraph, enabling CIOs and business users to query SQL, NoSQL, and graph databases as well as CSV data in natural language. Automated dynamic query generation and integrated real-time visualizations, improving operational efficiency by 40%.
  • Applied prompt tuning techniques to enhance LLM response quality and maintain consistency across diverse enterprise queries.
  • Developed and implemented a knowledge graph-augmented RAG system to improve customer service response generation. Modeled inter-issue and intra-issue relationships from historical support tickets, achieving a 28.6% reduction in median issue resolution time and improving BLEU and MRR scores.
  • Fine-tuned and quantized the open-source LLaMA 7B model using HuggingFace Transformers and QLoRA for efficient long-context retrieval in domain-specific tasks. Configured for local GPU-based inference, enabling offline deployment for document understanding and contextual response generation.
  • Developed a context-aware recommendation system to address insight fatigue in IT operations. Reduced insight discovery time by 67% and increased adoption of actionable insights by 25%, significantly improving operational decision-making.
  • Applied graph-based community detection algorithms to cluster and summarize insights in natural language. Reduced manual synthesis time by 50% and improved insight-driven decision-making by 30%, enhancing clarity and efficiency for IT operations teams.
  • Designed an analytics-driven solution to extract actionable insights from unstructured ticket descriptions using clustering algorithms and domain-specific knowledge. Achieved 84.37% accuracy for system-generated tickets and 81.63% accuracy for user-generated tickets, reducing manual analysis efforts and enhancing IT issue prioritization.
  • Designed and implemented an LSTM-based anomaly detection system using PyTorch, reducing false alerts by 45% and achieving a detection accuracy of 97%. Processed thousands of logs per second with real-time detection, enhancing system reliability and operational efficiency. Deployed the solution for production use.

Education

Electronics and Telecommunications

Sinhgad College of Engineering

9.04/10 GPA

Aug 2018 - Aug 2022

Publications

Addressing Insight Fatigue with Insight Summarization

Saneet S., Raja Babu, Uday C. Bhookya, M. Natu (Accepted at COMSNETS)

Jan 2025

Research focusing on strategies to combat insight fatigue through effective summarization techniques, accepted for publication at COMSNETS 2025.

Addressing AIOps Insight Fatigue with Insight Chains

Raja Babu, Uday C. Bhookya, Saneet S., M. Natu (Submitted to ECML PKDD)

Jan 2025

Exploration of AIOps methodologies to manage insight fatigue using interconnected insight chains, submitted to ECML PKDD 2025.

Data-Driven Insight Generation and Creation of Contextually Consistent Chains Thereof

Raja Babu, Uday C. Bhookya, M. Natu

Jan 2024

Patent application (Application Number: 202421093804, Status: Pending) detailing a data-driven approach for generating insights and linking them into contextually consistent chains.

Skills

Programming Languages

  • Python
  • C++
  • JavaScript
  • SQL

Frameworks & Libraries

  • Django
  • FastAPI
  • Angular
  • LangChain
  • LangGraph
  • HuggingFace Transformers
  • PEFT

Machine Learning & Data

  • PyTorch
  • TensorFlow
  • Scikit-learn
  • Pandas
  • NumPy
  • SciPy
  • OpenCV
  • YOLO

DevOps & Tools

  • Docker
  • Git
  • CI/CD
  • Offline AI Deployment

Advanced Modelling Techniques

  • Deep Learning
  • Supervised & Unsupervised Learning
  • NLP
  • Time Series Analysis
  • LLMs
  • Generative AI
  • Recommender Systems
  • AI-powered Automation