AI Systems Engineering Portfolio
Every query sent the full prompt plus the entire knowledge base to GPT, which was expensive and slow - especially since many queries weren't about meals at all.
Context Handling:
Patients needed to ask follow-up questions within a session, requiring
conversation history management while controlling token usage.
Solution Architecture
Designed 3-step pipeline separating decision logic into modular stages:
Step 1: Query Classification
Lightweight GPT call determines if query is about food/drink
If not meal: Return hard-coded rejection response (minimal tokens)
If meal: Continue to Step 2
Cost savings: Invalid queries rejected early with short prompts, without
processing the knowledge base
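
A minimal sketch of how such an early-rejection gate might look (the model name, prompt wording, and rejection text are illustrative, not the production values):

    from openai import OpenAI

    client = OpenAI()
    REJECTION_TEXT = "I can only help with questions about meals and drinks."

    def is_meal_query(query: str) -> bool:
        # Single cheap call: one-token answer, no knowledge base attached.
        resp = client.chat.completions.create(
            model="gpt-4-turbo",
            max_tokens=1,
            messages=[
                {"role": "system",
                 "content": "Answer Y if the user asks about food or drink, otherwise N."},
                {"role": "user", "content": query},
            ],
        )
        return resp.choices[0].message.content.strip().upper().startswith("Y")
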
Step 2: Meal Type Classification
Determine glucose impact category (Type A: high spike, B: moderate, C: lower, D: minimal)
If Type D (no spike): Return hard-coded response (no optimization needed)
If Types A-C: Pass meal type context to Step 3
Optimization: Reduces next step's complexity by providing
classification context
Step 3: Personalized Optimization
Full prompt with knowledge base chunks
Generate meal-specific tips: ingredient substitutions, portion control, timing strategies
Include behavioral interventions: apple cider vinegar, pre/post-meal exercise
Contextual advice based on meal type from Step 2
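
A hedged sketch of how the three steps chain together, reusing is_meal_query() from the sketch above; classify_meal_type(), retrieve_chunks(), and generate_tips() are hypothetical stand-ins for the production functions:

    NO_SPIKE_TEXT = "This meal has minimal glucose impact - no optimization needed."

    def answer_meal_query(query: str) -> str:
        if not is_meal_query(query):              # Step 1: early rejection
            return REJECTION_TEXT
        meal_type = classify_meal_type(query)     # Step 2: "A" | "B" | "C" | "D"
        if meal_type == "D":
            return NO_SPIKE_TEXT                  # hard-coded, no Step 3 tokens
        chunks = retrieve_chunks(query)           # knowledge base retrieval
        return generate_tips(query, meal_type, chunks)  # Step 3: full prompt
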
Follow-up Enhancement (Contract 2):
Token-based authentication for production security
Boolean flag indicating query limit applicability
Context-aware responses using last 5 message pairs
Maintained cost efficiency through selective history inclusion
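
The selective-history rule (last 5 message pairs) could be as simple as the sketch below, assuming OpenAI-style message dicts:

    def trim_history(messages: list[dict], max_pairs: int = 5) -> list[dict]:
        # Keep the system prompt plus only the most recent user/assistant
        # pairs, so follow-up context never grows the prompt unboundedly.
        system = [m for m in messages if m["role"] == "system"]
        turns = [m for m in messages if m["role"] != "system"]
        return system + turns[-2 * max_pairs:]
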
Production Deployment
The complete system was delivered production-ready:
FastAPI backend deployed on Fly.io with Docker containerization
Streamlit admin interface for knowledge base management and prompt configuration
iOS and Android apps built by client (Flutter frontend)
Token counting and limit management
Comprehensive logging and error handling
Complete API documentation
Business Impact
Resource Optimization:
Automated repetitive patient queries, freeing medical staff for complex
cases requiring human judgment.
Response Speed:
Answers in seconds instead of the hours or days patients previously waited for a doctor's reply.
Scalability:
Subscription model enabled through AI automation - hundreds of concurrent
patients supported without proportional staff increases.
Accuracy Improvement:
Modular pipeline minimized hallucinations by isolating decision steps.
Each stage focused on single responsibility with clear success criteria.
Cost Efficiency:
Early rejection of invalid queries reduced average token usage by 40-60%
across all interactions.
Client Testimonial
"He took a complex GPT-based health project with multiple logic layers
and delivered a clean, production-ready backend that exceeded expectations.
We worked on a custom 3-step GPT pipeline with user query classification,
personalized response generation, and rule-based output handling. Mohannad
not only implemented this pipeline with precision, but also introduced smart
architectural improvements—like breaking the flow into modular steps for
better reliability and cost control. Beyond code, he was responsive,
thoughtful with feedback, and always open to iteration. He provided complete
deployment (on Fly.io), prompt management, and even Streamlit UIs for admin
testing—all with documentation and clean handoff materials. Would absolutely
work with him again. Highly recommended for any GPT, backend, or API-heavy build."
Tech Stack
OpenAI (GPT-4 Turbo), FastAPI, FAISS (vector search), PyMuPDF (PDF processing),
Streamlit (admin UI), Docker, Fly.io deployment, Python logging, Pydantic validation
Payment Info AI Assistant
Client Industry: Finance / FinTech
Project Duration: 3-4 weeks | Status: Production | Client Rating: 4.9/5 stars
Business Challenge
A finance team processing 40,000+ daily transactions across 100+ Excel columns needed natural language access to their data. The existing process required manual Excel searching or SQL knowledge to query transaction records, find error patterns, calculate success rates, and cross-reference documentation for error codes. It was time-consuming, error-prone, and created a bottleneck for non-technical staff who needed transaction insights.
Technical Complexity
The 100+ Column Problem:
It was impossible to send all column names, descriptions, and possible values to the LLM without:
Exceeding token limits
Massive cost per query
Hallucination from information overload
Slow response times
The traditional "send everything to the LLM and let it figure it out" approach was impossible at this scale.
Structured + Unstructured Integration:
System needed to query Excel data (structured) AND PDF documentation (unstructured) in single coherent response. Error codes in transactions required looking up detailed explanations and resolution steps in separate PDF files.
Pandas Query Generation:
The LLM had to generate syntactically correct Pandas queries from natural language without ever seeing the actual data - only column metadata and user intent.
AWS Bedrock Integration:
Calling Claude via AWS Bedrock required IAM configuration, request formatting, and error handling that differ from standard API calls.
Solution Architecture
Column Embedding + Semantic Retrieval (The Breakthrough):
Instead of sending all 100+ columns to LLM, implemented intelligent column selection:
Embedded all column names + descriptions + possible values during preprocessing
Created FAISS index of column embeddings
For each query: Semantically search columns using query text
Retrieve only top 10-15 most relevant columns
Send only relevant subset to LLM with user query
Result: the LLM sees only the columns actually needed for that specific query, not the entire schema.
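
A sketch of this column-selection step; COLUMNS and the embedding model are illustrative, and the production system may differ:

    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # One text per column: name + description + example values.
    column_texts = [f"{c['name']}: {c['description']} (values: {c['values']})"
                    for c in COLUMNS]
    vecs = model.encode(column_texts, normalize_embeddings=True)

    index = faiss.IndexFlatIP(vecs.shape[1])  # inner product = cosine on normalized vectors
    index.add(np.asarray(vecs, dtype="float32"))

    def relevant_columns(query: str, k: int = 15) -> list[dict]:
        # Return only the k columns most relevant to this specific query.
        q = model.encode([query], normalize_embeddings=True)
        _, idx = index.search(np.asarray(q, dtype="float32"), k)
        return [COLUMNS[i] for i in idx[0]]
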
Two-Stage Pipeline: Structured → Unstructured Enrichment
Stage 1: Structured Data Query
User query + relevant columns sent to Claude
Claude generates Pandas query code
Execute query on DataFrame
Return structured results (transactions, aggregations, etc.)
Stage 2: Unstructured Context Enhancement
Take Stage 1 results (e.g., error codes found)
Search PDF documentation using error codes as queries
Retrieve relevant documentation chunks via FAISS
Send query + structured results + documentation to Claude
Claude generates insights combining both data sources
Example: "Show 5 failed transactions today" → Stage 1 returns error codes → Stage 2 looks up error code documentation → Final response includes both transaction data and error explanations with resolution steps.
Pandas Query Generation Approach:
Provided Claude with minimal schema (only relevant columns from semantic search)
Column datatypes and example values for proper query syntax
Clear constraints and validation rules
Error handling for malformed queries with retry logic
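
A hedged sketch of the generate-then-execute loop; the prompt wording, ask_claude(), and df are placeholders for the production pieces:

    import pandas as pd

    def run_generated_query(user_query: str, columns: list[dict],
                            df: pd.DataFrame, max_retries: int = 2):
        schema = "\n".join(f"{c['name']} ({c['dtype']}), e.g. {c['example']}"
                           for c in columns)
        prompt = (f"Columns:\n{schema}\n\nWrite one pandas expression over a "
                  f"DataFrame named df that answers: {user_query}")
        last_error = None
        for _ in range(max_retries + 1):
            feedback = "" if last_error is None else f"\nPrevious attempt failed: {last_error}"
            code = ask_claude(prompt + feedback)
            try:
                # Evaluate in a restricted namespace - only df and pd visible.
                return eval(code, {"__builtins__": {}}, {"df": df, "pd": pd})
            except Exception as exc:
                last_error = exc
        raise RuntimeError(f"Query generation failed after retries: {last_error}")
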
AWS Infrastructure:
Boto3 integration with Claude via Bedrock
IAM role-based access (no hardcoded credentials)
Proper request formatting for Bedrock API
Error handling for rate limits and service issues
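
A minimal boto3/Bedrock call along these lines, assuming IAM role credentials are already in place (model ID and parameters are illustrative):

    import json
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    def ask_claude(prompt: str) -> str:
        # Bedrock takes the Anthropic messages format inside a JSON body,
        # not the OpenAI-style request shape.
        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        })
        resp = bedrock.invoke_model(
            modelId="anthropic.claude-3-sonnet-20240229-v1:0", body=body)
        return json.loads(resp["body"].read())["content"][0]["text"]
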
Production Deployment
Complete system with admin capabilities:
FastAPI backend with comprehensive API
Deployed on AWS EC2
Streamlit admin interface for data exploration and system monitoring
FAISS indexes for both column metadata and documentation chunks
Synonym resolver for column name variations
Logging system tracking all queries and results
Statistics dashboard for usage monitoring
Health check endpoints
Business Impact
Democratized Data Access:
Non-technical staff could query complex transaction data without SQL knowledge or Excel expertise.
Time Savings:
Eliminated manual searching through 100+ column spreadsheets and cross-referencing PDF documentation.
AI-Enhanced Insights:
System didn't just return data - it provided context from documentation and suggested next steps based on error patterns.
Accuracy Improvement:
Semantic column search ensured LLM had exactly the information needed, reducing hallucinations from information overload.
Scalability:
Architecture supports adding new columns, new documentation, new data sources without redesigning core system.
Client Testimonial
"I loved working with Mohannad. Mohannad is very patient and knowledgeable. I look forward to working with him on similar projects on AI/LLM/ML projects."
Tech Stack
AWS Bedrock (Claude), AWS EC2, FAISS (vector search), FastAPI, Streamlit, Pandas, Boto3 (AWS SDK), scikit-learn (embeddings), RapidFuzz (synonym matching), NLTK (text processing), Docker
Additional Client Work
Other engagements and specialized implementations
Gulf Arabic Show Transcription System
Client Industry: Entertainment / Media Production
Client Need
Transcribe Arabic dialect audio from video content for a Gulf-region entertainment show.
Technical Approach
Implemented OpenAI Whisper model optimized for Arabic dialect recognition. Handled audio extraction from video files, chunking for API limits, and post-processing for dialect-specific corrections. Delivered timestamped transcripts with speaker diarization capabilities.
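
An illustrative version of the extract-and-transcribe flow, assuming the open-source whisper package and ffmpeg on PATH (file names are placeholders):

    import subprocess
    import whisper

    # Extract a 16 kHz mono WAV track from the source video.
    subprocess.run(["ffmpeg", "-i", "episode.mp4", "-ar", "16000", "-ac", "1",
                    "audio.wav"], check=True)

    model = whisper.load_model("large")
    result = model.transcribe("audio.wav", language="ar")
    for seg in result["segments"]:  # timestamped segments for the transcript
        print(f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text']}")
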
Challenges Solved
Arabic dialects vary significantly from Modern Standard Arabic. Fine-tuned preprocessing and validation to handle Gulf-specific pronunciation patterns and terminology. Implemented quality checks comparing multiple transcription passes for accuracy.
Production Status
Delivered transcription service used for content creation workflow.
Tech Stack
OpenAI Whisper, FFmpeg (audio processing), Python, FastAPI
GPT-4o Fine-tuning for Script Generation
Client Industry: Entertainment / Media Production
Client Need
Fine-tune GPT-4o on show transcripts to generate similar entertainment content scripts.
Technical Approach
Prepared the training dataset from transcribed content, formatted it to OpenAI fine-tuning API requirements, implemented evaluation metrics for script quality, and designed an iterative improvement process.
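
For reference, OpenAI's chat fine-tuning expects JSONL records like the sketch below; the system prompt and example pair are placeholders, not the show's actual data:

    import json

    examples = [("Opening segment about a cooking contest",
                 "<target script text from the transcripts>")]

    with open("train.jsonl", "w", encoding="utf-8") as f:
        for prompt, script in examples:
            f.write(json.dumps({"messages": [
                {"role": "system", "content": "You write Gulf Arabic show scripts."},
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": script},
            ]}, ensure_ascii=False) + "\n")
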
Next Steps
Audio generation using fine-tuned model outputs
Video generation combining audio with visual assets
Complete content generation pipeline from concept to final video
Challenges
Maintaining Gulf Arabic dialect authenticity in generated scripts
Preserving the show's tone and style through the fine-tuning process
Validating generated content against quality standards
Tech Stack
OpenAI Fine-tuning API, GPT-4o, Python, data preprocessing pipelines
Technical Demonstrations
Personal projects showcasing MLOps capabilities and system design
Meeting Notes Automator
Problem Solved
Turn messy meeting audio into structured summaries with actionable tasks, owners, and deadlines.
Technical Approach
Multi-stage NLP pipeline: Assembly AI transcription → text cleanup (filler removal, punctuation, spell check) → BART summarization → Mistral LLM task extraction. Streamlit UI for upload and FastAPI backend for extensibility.
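
A condensed sketch of the stage ordering; the AssemblyAI and transformers calls are real APIs, while extract_tasks() stands in for the Mistral step:

    import assemblyai as aai
    from transformers import pipeline

    aai.settings.api_key = "YOUR_ASSEMBLYAI_KEY"
    transcript = aai.Transcriber().transcribe("meeting.mp3").text

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    summary = summarizer(transcript, max_length=200, min_length=60)[0]["summary_text"]

    # Hypothetical Mistral call returning tasks, owners, and deadlines.
    tasks = extract_tasks(summary)
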
Tech Stack
Assembly AI, BART, Mistral, FastAPI, Streamlit, Docker
FAQ Chatbot with Semantic Search
Problem Solved
Multilingual FAQ chatbot with document upload - users provide CSV of Q&A pairs, system answers queries through semantic search + LLM.
Technical Approach
FAISS vector similarity search on embedded FAQ answers, retrieval feeding context to Mistral LLM for natural language responses. Includes Arabic/English translation, confidence scoring, and fallback handling.
Tech Stack
Sentence Transformers, FAISS, Mistral, FastAPI, React, Docker
Document-Based Question Answering System
Problem Solved
Extract precise answers from large unstructured documents (PDF, DOCX, TXT) using semantic search and fine-tuned models.
Technical Approach
Upload document → chunk and embed → FAISS indexing → semantic retrieval of relevant chunks → RoBERTa-based answer extraction fine-tuned on SQuAD2 dataset. Returns answer, confidence score, and source chunk.
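
A minimal sketch of the extraction step, assuming a SQuAD2-tuned RoBERTa checkpoint from the Hugging Face hub (the chunk and question are placeholders):

    from transformers import pipeline

    qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
    chunk = "Refund requests must be filed within 30 days of purchase."  # retrieved chunk
    result = qa(question="What is the refund deadline?", context=chunk)
    print(result["answer"], result["score"])  # answer text plus confidence
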
Tech Stack
RoBERTa (fine-tuned), Sentence-BERT, FAISS, FastAPI, Streamlit, Docker
Iris MLOps Pipeline
Problem Solved
Production-ready MLOps demonstration covering entire ML lifecycle from data ingestion to production deployment with monitoring.
Technical Approach
End-to-end pipeline with DVC data versioning, MLflow experiment tracking, BentoML model serving, Kubernetes deployment on AWS EKS, Prometheus/Grafana monitoring, GitOps workflow with GitHub Actions CI/CD, Terraform infrastructure as code.
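
A sketch of the experiment-tracking slice of that pipeline (run name and parameters are illustrative):

    import mlflow
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    with mlflow.start_run(run_name="iris-rf"):
        model = RandomForestClassifier(n_estimators=100).fit(X, y)
        mlflow.log_param("n_estimators", 100)
        mlflow.log_metric("train_accuracy", model.score(X, y))
        mlflow.sklearn.log_model(model, "model")
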
Production Features
Auto-scaling, load balancing, health checks, circuit breakers, CloudWatch logging, comprehensive observability.
Tech Stack
Scikit-learn, BentoML, MLflow, DVC, Kubernetes (EKS), Terraform, Prometheus, Grafana, AWS
RAG System with MLOps
Problem Solved
Production RAG platform for document upload and natural language querying with multi-user support and full MLOps infrastructure.
Technical Approach
PDF processing → chunking → AWS Titan embeddings → Pinecone vector storage → Claude 3 Sonnet Q&A. Complete microservices architecture with Redis caching, rate limiting, JWT auth, Nginx load balancing, Prometheus metrics, PostgreSQL metadata storage.
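
The embed-and-query path might look like the sketch below; the index name, IDs, and text are placeholders, using the current Pinecone SDK and Titan's request format:

    import json
    import boto3
    from pinecone import Pinecone

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    index = Pinecone(api_key="YOUR_PINECONE_KEY").Index("documents")

    def titan_embed(text: str) -> list[float]:
        resp = bedrock.invoke_model(modelId="amazon.titan-embed-text-v1",
                                    body=json.dumps({"inputText": text}))
        return json.loads(resp["body"].read())["embedding"]

    # Index one chunk, then retrieve candidates for a user question.
    index.upsert(vectors=[{"id": "doc1-chunk0",
                           "values": titan_embed("Uptime SLA is 99.9 percent."),
                           "metadata": {"text": "Uptime SLA is 99.9 percent."}}])
    hits = index.query(vector=titan_embed("What is the uptime SLA?"),
                       top_k=5, include_metadata=True)
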
MLOps Features
Docker containers, CI/CD pipeline, infrastructure automation, horizontal scaling, comprehensive monitoring, security hardening.
Tech Stack
AWS Bedrock (Claude, Titan), Pinecone, FastAPI, PostgreSQL, Redis, Nginx, Prometheus, Docker
Interested in Technical Implementation?
These projects demonstrate production-grade engineering with focus on scalability, monitoring, security, and maintainability.
For AI systems engineering, MLOps implementation, or technical consulting:
Note: All projects include actual client testimonials validating delivery quality and business impact. You can also find my Upwork profile in the Menu.

