Senior AI Engineer (MLOps / AI-Ops)

Nguyen Duy Nam

Building production AI systems end to end, from problem framing and data pipelines to deployment, monitoring, and continuous improvement

About Me

I have built Scalable AI Solutions

Nam Nguyen

Senior AI Engineer with 4+ years delivering production AI systems across computer vision, NLP, and generative AI. Strong at translating business requirements into robust technical solutions, designing scalable architectures, and operating AI services reliably in real-world conditions.

I currently work at Success Software Services as a Senior Machine Learning Engineer, where I lead system design and implementation for AI solutions including multi-turn conversational AI, smart retail/warehouse analytics, and large-scale multi-camera systems. Previously, at FPT Software, I delivered edge AI for smart driving, call center automation, and document OCR.

Career Path

Professional Experience

Jul 2025 - Present

Senior Machine Learning Engineer

Success Software Services
  • AI Smart Warehouse (Event-Driven AI Platform): Designed microservices and event-driven AI workflows with Kafka/RabbitMQ, GPU serving (Triton), and monitoring (Prometheus/Grafana), reducing manual product verification by 90% and enabling onboarding with 3 photos
  • Enterprise Smart Retail Analytics: Led end-to-end MLOps architecture for multi-store CV platform with automated data labeling, active learning, and A/B testing. Deployed across 10+ locations with 92% detection accuracy
  • AI Sales Assistant Chatbot: Built multi-turn conversational AI using LangGraph and Redis, increasing customer engagement by 40% and handling 1000+ daily inquiries
  • Live Streaming Platform: Created real-time engagement system with sub-second response, boosting viewer engagement by 45%
  • MCP Servers - Developer Productivity Tools: Partnered directly with CTO to architect and implement developer tooling infrastructure using Model Context Protocol (MCP). Led technical architecture design, requirements gathering and analysis, and hands-on development of core features enabling AI-powered developer workflows and IDE integrations
  • Strategic AI Architecture: Collaborated with Head of AI to design and propose enterprise-scale LLM integrations and distributed multi-camera surveillance infrastructure for large venues
  • Technologies: MCP (Model Context Protocol), LangGraph, Kubernetes, Docker, Kafka, RabbitMQ, Redis, NVIDIA Triton, Prometheus, Grafana, Next.js, FastAPI, Computer Vision, Active Learning, GitLab CI/CD
Jul 2021 - Jul 2025

AI Engineer

FPT Software, Quy Nhon City
  • AI for Smart Driving: Built a collision prediction system with 0.92 mAP (vehicles) and 0.88 mAP (pedestrians), and optimized edge inference to 80 ms on Jetson Orin. Improved annotation efficiency by 86% through auto-labeling and active learning
  • MLOps Platform (Lead Engineer): Designed centralized MLOps ecosystem serving 5+ AI teams. Reduced debugging time by 30% through integrated monitoring (Prometheus/Grafana) and deployment time by 60% via automated CI/CD
  • Call Center NLP (MLOps Lead): Built end-to-end ML pipeline with Kubeflow on AWS EKS. Achieved 90% F1-score in call classification, reducing processing time by 75%
  • Construction OCR (AI Engineer): Developed OCR system for Japanese engineering drawings with 94% character accuracy, cutting document processing time by 65%
  • Leadership Impact: Led teams of up to 8 engineers, established technical standards, and set up mentoring programs
  • Technologies: PyTorch, Kubeflow, MLflow, Kubernetes, AWS EKS, Docker, FastAPI, Terraform, RabbitMQ, Celery
Mar 2020 - Aug 2020

Data Analyst Intern

SNAP Research Labs, Ho Chi Minh City
  • Developed Python-based data crawling pipelines for multi-market stock analysis
  • Gathered and annotated time-series data with a focus on data security and anonymization
Featured Work

AI Projects

Collision Detection for Autonomous Vehicles

Real-time AI System with Auto-Labeling, 85% Accuracy, and Edge Deployment

PyTorch, YOLOv8, Jetson AGX Orin, MLOps

Automated Call Classification System

Speech-to-Text & NLP achieving 90% F1-Score with MLOps Pipeline

RoBERTa, Kubeflow, AWS EKS, Twilio

Enterprise MLOps Platform

Centralized Infrastructure serving 5+ AI Teams with Enterprise Security

Kubeflow, MLflow, Kubernetes, Prometheus

Conversational AI Sales Assistant

Multi-Turn Chatbot with Context Management handling 1000+ Daily Inquiries and +40% Engagement

LangGraph, Redis, Next.js, LLM

Enterprise Smart Retail Analytics

Real-time Customer Behavior Monitoring across 10+ Locations with 92% Detection Accuracy and Active Learning

Computer Vision, Kubernetes, Active Learning, A/B Testing

Engineering Drawing OCR System

Advanced OCR & Segmentation for Japanese Blueprints achieving 94% Accuracy

PaddleOCR, PaddleSeg, OCR, DeepLabV3+

Real-time Live Stream Engagement

AI-Powered Thank-You Messages with Sub-Second Response and Prompt Engineering

FastAPI, LLM, Real-time, WebSocket

Automated MLOps Pipeline with AutoML

End-to-End Workflow with Hyperparameter Optimization reducing Manual Steps by 40%

Kedro, MLflow, Optuna, PyCaret

Neural Machine Translation System

mT5-Based Multilingual Model for Japanese-English-Vietnamese with Real-time Translation

mT5, Hugging Face, NMT, Transformers

Cross-lingual Speech Recognition

Malay-English ASR System using wav2vec2-XLSR for Code-Switching

wav2vec2, XLSR-53, ASR, Multilingual

Multimodal Fraud Detection System

Data Fusion Pipeline combining Video, Audio, Biometrics achieving 85% Accuracy

Multimodal AI, Data Fusion, Fraud Detection, PyTorch
My Approach

MLOps-Driven AI Development

Problem Definition

Thoroughly understand the business requirements and define clear success metrics with stakeholders.

Data Engineering

Collect, clean, and prepare high-quality datasets. Implement robust data pipelines to ensure consistent flow.

Model Development

Design and experiment with various model architectures to find the optimal solution for the problem.

MLOps Implementation

Build robust CI/CD pipelines, monitoring systems, and deployment strategies for production-ready AI.

Continuous Improvement

Measure performance, gather feedback, and iteratively enhance the solution to deliver ongoing value.

Production-Ready MLOps Solutions

My methodology integrates MLOps best practices with business value. I specialize in architecting automated ML workflows using Kubeflow and MLflow, building robust CI/CD pipelines, and implementing monitoring systems that ensure reliable production deployments. This approach delivers not only high-performing models but also sustainable, scalable AI systems that align with organizational goals and accelerate time-to-market.

My Expertise

Technical Skills

Hands-on across production AI systems: computer vision, NLP, and generative AI. I build end-to-end solutions (data → model → deployment → monitoring) with strong depth in reliability, scalable serving, and real-time/edge constraints.

Core Competencies

ML Frameworks & Tools

  • PyTorch, TensorFlow
  • Scikit-learn, Pandas, NumPy
  • OpenCV, Scikit-image
  • Hugging Face Transformers
  • Model evaluation & benchmarking
  • Production debugging & profiling

Generative AI & LLMs

  • LLM Fine-tuning (LoRA, QLoRA)
  • LangGraph, LangChain
  • Prompt Engineering
  • BLIP-2, Llama, Mistral, Ollama
  • RAG Applications
  • OpenAI API, ChatGPT, DALL-E 3

Infrastructure & DevOps

  • Kubernetes (CKA, CKAD)
  • Docker, Helm, Kubeadm
  • Linux System Administration
  • Git, GitLab CI/CD
  • FastAPI, Django REST, Flask
  • Kafka, RabbitMQ, Redis, Celery

MLOps & Deployment

  • MLOps pipelines (training → evaluation → deployment)
  • Docker & Kubernetes
  • CI/CD (GitLab, GitHub Actions)
  • Training pipelines & retraining automation
  • Prometheus, Grafana, Loki
  • Terraform & Infrastructure-as-Code
  • AWS EKS, On-Prem K8s
  • NVIDIA Triton, TensorRT, ONNX Runtime

Leadership

  • Team Management
  • Agile Methodologies
  • Technical Mentorship
  • Project Planning
  • Stakeholder Communication
Recognition

Awards & Achievements

2023

MLOps Marathon

Ranked 5th in a nationwide competition focused on ML pipeline development, model training optimization, and resilient model-serving APIs.

2023

Kaggle: Age-Related Conditions Challenge

Silver Medal (Rank 199/6430) in the ICR - Identifying Age-Related Conditions competition, using machine learning to predict medical conditions.

2022

IT Race - Ho Chi Minh Open University

Ranked 1st in this competition organized by the Faculty of Information Technology, demonstrating skill in mathematics, computer science, and related domains.

Professional Development

Certifications

Certified Kubernetes Administrator (CKA)

Linux Foundation

Kubernetes, DevOps

Certified Kubernetes Application Developer (CKAD)

Linux Foundation

Kubernetes, Cloud Native

Machine Learning Engineer Nanodegree

Udacity

ML, Python

ML DevOps Engineer

Udacity

MLOps, CI/CD

Cloud DevOps Engineer

Udacity

AWS, DevOps

Data Engineering with AWS

Udacity

AWS, Data Engineering

Data Streaming

Udacity

Kafka, Real-time

Machine Learning

Coursera - Stanford University

ML Fundamentals
Get In Touch

Contact Me

Let's Connect

Feel free to reach out for collaboration, opportunities, or just to say hello. I'm always open to discussing new projects and ideas.