AI Systems Engineering Portfolio
Every query sent the full prompt plus the entire knowledge base to GPT, which was expensive and slow - especially since many queries weren't about meals at all.
Context Handling:
Patients needed to ask follow-up questions within a session, requiring
conversation history management while controlling token usage.
Solution Architecture
Designed 3-step pipeline separating decision logic into modular stages:
Step 1: Query Classification
Lightweight GPT call determines if query is about food/drink
If not meal: Return hard-coded rejection response (minimal tokens)
If meal: Continue to Step 2
Cost savings: Invalid queries rejected early with short prompts, without
processing the knowledge base
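
A minimal sketch of how such an early-rejection gate might look (the model name, prompt wording, and rejection text are illustrative, not the production values):

    from openai import OpenAI

    client = OpenAI()
    REJECTION_TEXT = "I can only help with questions about meals and drinks."

    def is_meal_query(query: str) -> bool:
        # Single cheap call: one-token answer, no knowledge base attached.
        resp = client.chat.completions.create(
            model="gpt-4-turbo",
            max_tokens=1,
            messages=[
                {"role": "system",
                 "content": "Answer Y if the user asks about food or drink, otherwise N."},
                {"role": "user", "content": query},
            ],
        )
        return resp.choices[0].message.content.strip().upper().startswith("Y")
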
Step 2: Meal Type Classification
Determine glucose impact category (Type A: high spike, B: moderate, C: lower, D: minimal)
If Type D (no spike): Return hard-coded response (no optimization needed)
If Types A-C: Pass meal type context to Step 3
Optimization: Reduces next step's complexity by providing
classification context
Step 3: Personalized Optimization
Full prompt with knowledge base chunks
Generate meal-specific tips: ingredient substitutions, portion control, timing strategies
Include behavioral interventions: apple cider vinegar, pre/post-meal exercise
Contextual advice based on meal type from Step 2
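
A hedged sketch of how the three steps chain together, reusing is_meal_query() from the sketch above; classify_meal_type(), retrieve_chunks(), and generate_tips() are hypothetical stand-ins for the production functions:

    NO_SPIKE_TEXT = "This meal has minimal glucose impact - no optimization needed."

    def answer_meal_query(query: str) -> str:
        if not is_meal_query(query):              # Step 1: early rejection
            return REJECTION_TEXT
        meal_type = classify_meal_type(query)     # Step 2: "A" | "B" | "C" | "D"
        if meal_type == "D":
            return NO_SPIKE_TEXT                  # hard-coded, no Step 3 tokens
        chunks = retrieve_chunks(query)           # knowledge base retrieval
        return generate_tips(query, meal_type, chunks)  # Step 3: full prompt
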
Follow-up Enhancement (Contract 2):
Token-based authentication for production security
Boolean flag indicating query limit applicability
Context-aware responses using last 5 message pairs
Maintained cost efficiency through selective history inclusion
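
The selective-history rule (last 5 message pairs) could be as simple as the sketch below, assuming OpenAI-style message dicts:

    def trim_history(messages: list[dict], max_pairs: int = 5) -> list[dict]:
        # Keep the system prompt plus only the most recent user/assistant
        # pairs, so follow-up context never grows the prompt unboundedly.
        system = [m for m in messages if m["role"] == "system"]
        turns = [m for m in messages if m["role"] != "system"]
        return system + turns[-2 * max_pairs:]
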
Production Deployment
The complete system was delivered production-ready:
FastAPI backend deployed on Fly.io with Docker containerization
Streamlit admin interface for knowledge base management and prompt configuration
iOS and Android apps built by client (Flutter frontend)
Token counting and limit management
Comprehensive logging and error handling
Complete API documentation
Business Impact
Resource Optimization:
Automated repetitive patient queries, freeing medical staff for complex
cases requiring human judgment.
Response Speed:
Answers in seconds instead of the hours or days patients previously waited for a doctor's reply.
Scalability:
Subscription model enabled through AI automation - hundreds of concurrent
patients supported without proportional staff increases.
Accuracy Improvement:
Modular pipeline minimized hallucinations by isolating decision steps.
Each stage focused on single responsibility with clear success criteria.
Cost Efficiency:
Early rejection of invalid queries reduced average token usage by 40-60%
across all interactions.
Client Testimonial
"He took a complex GPT-based health project with multiple logic layers
and delivered a clean, production-ready backend that exceeded expectations.
We worked on a custom 3-step GPT pipeline with user query classification,
personalized response generation, and rule-based output handling. Mohannad
not only implemented this pipeline with precision, but also introduced smart
architectural improvements—like breaking the flow into modular steps for
better reliability and cost control. Beyond code, he was responsive,
thoughtful with feedback, and always open to iteration. He provided complete
deployment (on Fly.io), prompt management, and even Streamlit UIs for admin
testing—all with documentation and clean handoff materials. Would absolutely
work with him again. Highly recommended for any GPT, backend, or API-heavy build."
Tech Stack
OpenAI (GPT-4 Turbo), FastAPI, FAISS (vector search), PyMuPDF (PDF processing),
Streamlit (admin UI), Docker, Fly.io deployment, Python logging, Pydantic validation
Payment Info AI Assistant
Client Industry: Finance / FinTech
Project Duration: 3-4 weeks | Status: Production | Client Rating: 4.9/5 stars
Business Challenge
A finance team processing 40,000+ daily transactions across 100+ Excel columns needed natural language access to their data. The existing process required manual Excel searching or SQL knowledge to query transaction records, find error patterns, calculate success rates, and cross-reference documentation for error codes. It was time-consuming, error-prone, and created a bottleneck for non-technical staff who needed transaction insights.
Technical Complexity
The 100+ Column Problem:
It was impossible to send all column names, descriptions, and possible values to the LLM without:
Exceeding token limits
Massive cost per query
Hallucination from information overload
Slow response times
The traditional "send everything to the LLM and let it figure it out" approach was impossible at this scale.
Structured + Unstructured Integration:
System needed to query Excel data (structured) AND PDF documentation (unstructured) in single coherent response. Error codes in transactions required looking up detailed explanations and resolution steps in separate PDF files.
Pandas Query Generation:
The LLM had to generate syntactically correct Pandas queries from natural language without ever seeing the actual data - only column metadata and user intent.
AWS Bedrock Integration:
Calling Claude via AWS Bedrock required IAM configuration, request formatting, and error handling that differ from standard API calls.
Solution Architecture
Column Embedding + Semantic Retrieval (The Breakthrough):
Instead of sending all 100+ columns to LLM, implemented intelligent column selection:
Embedded all column names + descriptions + possible values during preprocessing
Created FAISS index of column embeddings
For each query: Semantically search columns using query text
Retrieve only top 10-15 most relevant columns
Send only relevant subset to LLM with user query
Result: the LLM sees only the columns actually needed for that specific query, not the entire schema.
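
A sketch of this column-selection step; COLUMNS and the embedding model are illustrative, and the production system may differ:

    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # One text per column: name + description + example values.
    column_texts = [f"{c['name']}: {c['description']} (values: {c['values']})"
                    for c in COLUMNS]
    vecs = model.encode(column_texts, normalize_embeddings=True)

    index = faiss.IndexFlatIP(vecs.shape[1])  # inner product = cosine on normalized vectors
    index.add(np.asarray(vecs, dtype="float32"))

    def relevant_columns(query: str, k: int = 15) -> list[dict]:
        # Return only the k columns most relevant to this specific query.
        q = model.encode([query], normalize_embeddings=True)
        _, idx = index.search(np.asarray(q, dtype="float32"), k)
        return [COLUMNS[i] for i in idx[0]]
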
Two-Stage Pipeline: Structured → Unstructured Enrichment
Stage 1: Structured Data Query
User query + relevant columns sent to Claude
Claude generates Pandas query code
Execute query on DataFrame
Return structured results (transactions, aggregations, etc.)
Stage 2: Unstructured Context Enhancement
Take Stage 1 results (e.g., error codes found)
Search PDF documentation using error codes as queries
Retrieve relevant documentation chunks via FAISS
Send query + structured results + documentation to Claude
Claude generates insights combining both data sources
Example: "Show 5 failed transactions today" → Stage 1 returns error codes → Stage 2 looks up error code documentation → Final response includes both transaction data and error explanations with resolution steps.
Pandas Query Generation Approach:
Provided Claude with minimal schema (only relevant columns from semantic search)
Column datatypes and example values for proper query syntax
Clear constraints and validation rules
Error handling for malformed queries with retry logic
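
A hedged sketch of the generate-then-execute loop; the prompt wording, ask_claude(), and df are placeholders for the production pieces:

    import pandas as pd

    def run_generated_query(user_query: str, columns: list[dict],
                            df: pd.DataFrame, max_retries: int = 2):
        schema = "\n".join(f"{c['name']} ({c['dtype']}), e.g. {c['example']}"
                           for c in columns)
        prompt = (f"Columns:\n{schema}\n\nWrite one pandas expression over a "
                  f"DataFrame named df that answers: {user_query}")
        last_error = None
        for _ in range(max_retries + 1):
            feedback = "" if last_error is None else f"\nPrevious attempt failed: {last_error}"
            code = ask_claude(prompt + feedback)
            try:
                # Evaluate in a restricted namespace - only df and pd visible.
                return eval(code, {"__builtins__": {}}, {"df": df, "pd": pd})
            except Exception as exc:
                last_error = exc
        raise RuntimeError(f"Query generation failed after retries: {last_error}")
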
AWS Infrastructure:
Boto3 integration with Claude via Bedrock
IAM role-based access (no hardcoded credentials)
Proper request formatting for Bedrock API
Error handling for rate limits and service issues
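
A minimal boto3/Bedrock call along these lines, assuming IAM role credentials are already in place (model ID and parameters are illustrative):

    import json
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    def ask_claude(prompt: str) -> str:
        # Bedrock takes the Anthropic messages format inside a JSON body,
        # not the OpenAI-style request shape.
        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        })
        resp = bedrock.invoke_model(
            modelId="anthropic.claude-3-sonnet-20240229-v1:0", body=body)
        return json.loads(resp["body"].read())["content"][0]["text"]
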
Production Deployment
Complete system with admin capabilities:
FastAPI backend with comprehensive API
Deployed on AWS EC2
Streamlit admin interface for data exploration and system monitoring
FAISS indexes for both column metadata and documentation chunks
Synonym resolver for column name variations
Logging system tracking all queries and results
Statistics dashboard for usage monitoring
Health check endpoints
Business Impact
Democratized Data Access:
Non-technical staff could query complex transaction data without SQL knowledge or Excel expertise.
Time Savings:
Eliminated manual searching through 100+ column spreadsheets and cross-referencing PDF documentation.
AI-Enhanced Insights:
System didn't just return data - it provided context from documentation and suggested next steps based on error patterns.
Accuracy Improvement:
Semantic column search ensured LLM had exactly the information needed, reducing hallucinations from information overload.
Scalability:
Architecture supports adding new columns, new documentation, new data sources without redesigning core system.
Client Testimonial
"I loved working with Mohannad. Mohannad is very patient and knowledgeable. I look forward to working with him on similar projects on AI/LLM/ML projects."
Tech Stack
AWS Bedrock (Claude), AWS EC2, FAISS (vector search), FastAPI, Streamlit, Pandas, Boto3 (AWS SDK), scikit-learn (embeddings), RapidFuzz (synonym matching), NLTK (text processing), Docker
Additional Client Work
Other engagements and specialized implementations
Gulf Arabic Show Transcription System
Client Industry: Entertainment / Media Production
Client Need
Transcribe Arabic dialect audio from video content for a Gulf-region entertainment show.
Technical Approach
Implemented OpenAI Whisper model optimized for Arabic dialect recognition. Handled audio extraction from video files, chunking for API limits, and post-processing for dialect-specific corrections. Delivered timestamped transcripts with speaker diarization capabilities.
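
An illustrative version of the extract-and-transcribe flow, assuming the open-source whisper package and ffmpeg on PATH (file names are placeholders):

    import subprocess
    import whisper

    # Extract a 16 kHz mono WAV track from the source video.
    subprocess.run(["ffmpeg", "-i", "episode.mp4", "-ar", "16000", "-ac", "1",
                    "audio.wav"], check=True)

    model = whisper.load_model("large")
    result = model.transcribe("audio.wav", language="ar")
    for seg in result["segments"]:  # timestamped segments for the transcript
        print(f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text']}")
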
Challenges Solved
Arabic dialects vary significantly from Modern Standard Arabic. Fine-tuned preprocessing and validation to handle Gulf-specific pronunciation patterns and terminology. Implemented quality checks comparing multiple transcription passes for accuracy.
Production Status
Delivered transcription service used for content creation workflow.
Tech Stack
OpenAI Whisper, FFmpeg (audio processing), Python, FastAPI
GPT-4o Fine-tuning for Script Generation
Client Industry: Entertainment / Media Production
Client Need
Fine-tune GPT-4o on show transcripts to generate similar entertainment content scripts.
Technical Approach
Prepared the training dataset from transcribed content, formatted it to OpenAI fine-tuning API requirements, implemented evaluation metrics for script quality, and designed an iterative improvement process.
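
For reference, OpenAI's chat fine-tuning expects JSONL records like the sketch below; the system prompt and example pair are placeholders, not the show's actual data:

    import json

    examples = [("Opening segment about a cooking contest",
                 "<target script text from the transcripts>")]

    with open("train.jsonl", "w", encoding="utf-8") as f:
        for prompt, script in examples:
            f.write(json.dumps({"messages": [
                {"role": "system", "content": "You write Gulf Arabic show scripts."},
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": script},
            ]}, ensure_ascii=False) + "\n")
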
Next Steps
Audio generation using fine-tuned model outputs
Video generation combining audio with visual assets
Complete content generation pipeline from concept to final video
Challenges
Maintaining Gulf Arabic dialect authenticity in generated scripts
Preserving the show's tone and style through the fine-tuning process
Validating generated content against quality standards
Tech Stack
OpenAI Fine-tuning API, GPT-4o, Python, data preprocessing pipelines
Technical Demonstrations
Personal projects showcasing MLOps capabilities and system design
Meeting Notes Automator
Problem Solved
Turn messy meeting audio into structured summaries with actionable tasks, owners, and deadlines.
Technical Approach
Multi-stage NLP pipeline: Assembly AI transcription → text cleanup (filler removal, punctuation, spell check) → BART summarization → Mistral LLM task extraction. Streamlit UI for upload and FastAPI backend for extensibility.
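
A condensed sketch of the stage ordering; the AssemblyAI and transformers calls are real APIs, while extract_tasks() stands in for the Mistral step:

    import assemblyai as aai
    from transformers import pipeline

    aai.settings.api_key = "YOUR_ASSEMBLYAI_KEY"
    transcript = aai.Transcriber().transcribe("meeting.mp3").text

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    summary = summarizer(transcript, max_length=200, min_length=60)[0]["summary_text"]

    # Hypothetical Mistral call returning tasks, owners, and deadlines.
    tasks = extract_tasks(summary)
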
Tech Stack
Assembly AI, BART, Mistral, FastAPI, Streamlit, Docker
FAQ Chatbot with Semantic Search
Problem Solved
Multilingual FAQ chatbot with document upload - users provide CSV of Q&A pairs, system answers queries through semantic search + LLM.
Technical Approach
FAISS vector similarity search on embedded FAQ answers, retrieval feeding context to Mistral LLM for natural language responses. Includes Arabic/English translation, confidence scoring, and fallback handling.
Tech Stack
Sentence Transformers, FAISS, Mistral, FastAPI, React, Docker
Document-Based Question Answering System
Problem Solved
Extract precise answers from large unstructured documents (PDF, DOCX, TXT) using semantic search and fine-tuned models.
Technical Approach
Upload document → chunk and embed → FAISS indexing → semantic retrieval of relevant chunks → RoBERTa-based answer extraction fine-tuned on SQuAD2 dataset. Returns answer, confidence score, and source chunk.
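
A minimal sketch of the extraction step, assuming a SQuAD2-tuned RoBERTa checkpoint from the Hugging Face hub (the chunk and question are placeholders):

    from transformers import pipeline

    qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
    chunk = "Refund requests must be filed within 30 days of purchase."  # retrieved chunk
    result = qa(question="What is the refund deadline?", context=chunk)
    print(result["answer"], result["score"])  # answer text plus confidence
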
Tech Stack
RoBERTa (fine-tuned), Sentence-BERT, FAISS, FastAPI, Streamlit, Docker
Iris MLOps Pipeline
Problem Solved
Production-ready MLOps demonstration covering entire ML lifecycle from data ingestion to production deployment with monitoring.
Technical Approach
End-to-end pipeline with DVC data versioning, MLflow experiment tracking, BentoML model serving, Kubernetes deployment on AWS EKS, Prometheus/Grafana monitoring, GitOps workflow with GitHub Actions CI/CD, Terraform infrastructure as code.
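
A sketch of the experiment-tracking slice of that pipeline (run name and parameters are illustrative):

    import mlflow
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    with mlflow.start_run(run_name="iris-rf"):
        model = RandomForestClassifier(n_estimators=100).fit(X, y)
        mlflow.log_param("n_estimators", 100)
        mlflow.log_metric("train_accuracy", model.score(X, y))
        mlflow.sklearn.log_model(model, "model")
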
Production Features
Auto-scaling, load balancing, health checks, circuit breakers, CloudWatch logging, comprehensive observability.
Tech Stack
Scikit-learn, BentoML, MLflow, DVC, Kubernetes (EKS), Terraform, Prometheus, Grafana, AWS
RAG System with MLOps
Problem Solved
Production RAG platform for document upload and natural language querying with multi-user support and full MLOps infrastructure.
Technical Approach
PDF processing → chunking → AWS Titan embeddings → Pinecone vector storage → Claude 3 Sonnet Q&A. Complete microservices architecture with Redis caching, rate limiting, JWT auth, Nginx load balancing, Prometheus metrics, PostgreSQL metadata storage.
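
The embed-and-query path might look like the sketch below; the index name, IDs, and text are placeholders, using the current Pinecone SDK and Titan's request format:

    import json
    import boto3
    from pinecone import Pinecone

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    index = Pinecone(api_key="YOUR_PINECONE_KEY").Index("documents")

    def titan_embed(text: str) -> list[float]:
        resp = bedrock.invoke_model(modelId="amazon.titan-embed-text-v1",
                                    body=json.dumps({"inputText": text}))
        return json.loads(resp["body"].read())["embedding"]

    # Index one chunk, then retrieve candidates for a user question.
    index.upsert(vectors=[{"id": "doc1-chunk0",
                           "values": titan_embed("Uptime SLA is 99.9 percent."),
                           "metadata": {"text": "Uptime SLA is 99.9 percent."}}])
    hits = index.query(vector=titan_embed("What is the uptime SLA?"),
                       top_k=5, include_metadata=True)
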
MLOps Features
Docker containers, CI/CD pipeline, infrastructure automation, horizontal scaling, comprehensive monitoring, security hardening.
Tech Stack
AWS Bedrock (Claude, Titan), Pinecone, FastAPI, PostgreSQL, Redis, Nginx, Prometheus, Docker
Interested in Technical Implementation?
These projects demonstrate production-grade engineering with focus on scalability, monitoring, security, and maintainability.
For AI systems engineering, MLOps implementation, or technical consulting:
Note: All projects include actual client testimonials validating delivery quality and business impact. You can also find my Upwork profile in the Menu.

