Hi, I'm Mohamad Zamini.

Self-driven and passionate about advancing machine learning, I am focused on pushing the boundaries of Large Language Models (LLMs) through innovative research and development. As a final-year PhD Candidate with hands-on experience in optimizing LLMs during my recent internship, I am committed to solving complex real-world problems with cutting-edge AI technology. My ambition is to contribute to the future of AI by developing scalable and efficient models that can transform industries and enhance human-computer interaction.

About

My dissertation research advances Visual Question Answering (VQA). I generated a new dataset and designed interpretable VQA models for the Semi-CLEVR dataset, used large language models (LLMs) to produce causal explanations that improve model interpretability, and applied diffusion models to the novel CLEVR dataset to decompose occluded shapes, exploiting their capacity to model complex data distributions and recover hidden details. The overall aim is to improve generalization in VQA by integrating causal reasoning with these advanced modeling techniques.

  • Programming Languages: Python, C++
  • Databases: MySQL, MongoDB, PostgreSQL
  • Libraries & Frameworks: PyTorch, TensorFlow, Hugging Face Transformers, Keras, NumPy, Pandas, OpenCV
  • Model Optimization & Deployment: ONNX, TensorRT, TorchServe, FastAPI
  • Tools & Platforms: Git, Docker, Kubernetes, AWS, GCP, Azure, JIRA, Weights & Biases (wandb)

Seeking a challenging position that leverages my expertise in Machine Learning and Software Engineering, offering opportunities for professional development, innovative experiences, and personal growth.

Experience

Machine Learning Engineer Intern
  • Fine-tuned LLMs, including Mistral, LLaMA, and GPT, leveraging techniques such as activation sparsity and attention sparsity to optimize performance.
  • Applied techniques such as KWTA, dynamic context pruning, and KV caching to enhance model efficiency.
  • Tools: Python, PyTorch, Accelerate, GPT, LLaMA
July 2024 - Sept 2024 | Redwood City, CA
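As a toy illustration of the activation-sparsity idea above, here is a k-winners-take-all (KWTA) function in plain Python. This is a minimal sketch, not the internship code; in practice the same operation is applied to MLP activations inside transformer layers.

```python
def kwta(values, k):
    """k-winners-take-all: keep the k largest activations, zero the rest.

    Applied to a layer's activations, this induces sparsity so that only
    a small fraction of neurons contributes to each forward pass.
    Note: ties at the threshold may keep slightly more than k values.
    """
    if k >= len(values):
        return list(values)
    # The threshold is the k-th largest activation.
    threshold = sorted(values, reverse=True)[k - 1]
    return [v if v >= threshold else 0.0 for v in values]

activations = [0.1, 2.3, -0.5, 1.7, 0.9, 3.2]
sparse = kwta(activations, k=2)  # only the two largest activations survive
```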
Digital Innovation Intern
  • Gained hands-on experience with machine learning and data compression at Petrolern.
  • Developed a semantic compression technique using a deep autoencoder to map data tuples into a lower-dimensional representation.
  • Built machine learning models for analyzing geothermal data and improved their performance through algorithmic optimization.
June 2022 - Aug 2022 | Atlanta, GA
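The semantic-compression idea can be sketched with a linear autoencoder in NumPy. Everything here is illustrative (synthetic data, made-up sizes and learning rate); the actual project used a deep autoencoder on geothermal data tuples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "data tuples": 200 samples lying near a 2-D subspace of R^8.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 8))

# One-hidden-layer linear autoencoder: R^8 -> R^2 -> R^8.
W_enc = rng.normal(scale=0.1, size=(8, 2))
W_dec = rng.normal(scale=0.1, size=(2, 8))

def mse(A, B):
    return float(np.mean((A - B) ** 2))

lr = 0.02
initial = mse(X, X @ W_enc @ W_dec)
for _ in range(800):
    H = X @ W_enc                    # compressed representation
    X_hat = H @ W_dec                # reconstruction
    G = 2.0 * (X_hat - X) / X.size   # dL/dX_hat for the MSE loss
    grad_dec = H.T @ G
    grad_enc = X.T @ (G @ W_dec.T)
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec
final = mse(X, X @ W_enc @ W_dec)
# The 2-D code captures the dominant subspace, so final error
# should end up well below the initial error.
```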
NLP Engineer
  • Fine-tuned models like BART for summarization on Persian text data.
  • Implemented Matrix Factorization for topic modeling.
  • Utilized BiLSTM-CRF models for sequential tagging.
  • Tools: Python, Scikit-learn, NLTK
June 2018 - Aug 2019 | Tehran, Iran
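The matrix-factorization approach to topic modeling can be sketched with non-negative matrix factorization via Lee–Seung multiplicative updates on a toy document–term matrix. All numbers are illustrative, not the Persian-text corpus used in the role.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy document-term count matrix: 6 documents x 8 terms.
V = rng.integers(0, 5, size=(6, 8)).astype(float)

# Factor V ~ W @ H with k topics:
#   W = document-topic weights, H = topic-term weights.
k = 2
W = rng.random((6, k)) + 0.1
H = rng.random((k, 8)) + 0.1

eps = 1e-9
for _ in range(200):
    # Multiplicative updates for the Frobenius objective; they keep
    # both factors non-negative by construction.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(V - W @ H)
# "Topics" are read off as the highest-weight terms per row of H.
top_terms = np.argsort(H, axis=1)[:, ::-1][:, :3]
```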

Projects

Explainability analysis

SHAP vs. LIME vs. ELI5

Accomplishments
  • Tools: Python, PyTorch
  • To explain the model's predictions, the project uses interpretability tools such as SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), and ELI5. These tools provide insight into how the model makes decisions and highlight the importance of different features in predicting strokes.
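As a sketch of what SHAP computes, here are exact Shapley values for a toy linear model via brute-force enumeration over feature subsets. The weights and instance are made up (not the stroke model); "absent" features are imputed with a background mean, in the spirit of KernelSHAP.

```python
from itertools import combinations
from math import factorial

# Toy linear model: f(x) = w . x + b.
w = [2.0, -1.0, 0.5]
b = 0.1
background = [1.0, 0.0, 4.0]   # E[x_i] over a reference dataset
x = [3.0, 2.0, 4.0]            # instance to explain

def f_with_subset(present):
    """Model output when only features in `present` take their instance
    values; the rest are replaced by their background means."""
    z = [x[i] if i in present else background[i] for i in range(len(x))]
    return sum(wi * zi for wi, zi in zip(w, z)) + b

def shapley(i):
    """Exact Shapley value of feature i: weighted average of its
    marginal contribution over all subsets of the other features."""
    n = len(x)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):
        for S in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            total += weight * (f_with_subset(set(S) | {i}) - f_with_subset(set(S)))
    return total

phis = [shapley(i) for i in range(len(x))]
# For a linear model this reduces to phi_i = w_i * (x_i - background_i),
# i.e. [4.0, -2.0, 0.0] here.
```

By the efficiency property, the values sum to f(x) minus the baseline prediction on the background.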
Causal Inference

Causal Inference with Bayesian Networks

Accomplishments
  • Tools: Python, PyTorch
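The core computation can be sketched in plain Python: on a toy binary network Z → X → Y with confounder Z, the backdoor adjustment recovers P(Y | do(X)), which differs from the observational conditional. All probability tables below are illustrative numbers, not real data.

```python
# Toy Bayesian network: Z -> X, Z -> Y, X -> Y (Z confounds X and Y).
P_Z = {0: 0.6, 1: 0.4}
P_X_given_Z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # P_X_given_Z[z][x]
P_Y_given_XZ = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},
                (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}  # keyed (x, z)

def p_y_do_x(y, x):
    """Backdoor adjustment: P(Y=y | do(X=x)) = sum_z P(Y=y | x, z) P(z)."""
    return sum(P_Y_given_XZ[(x, z)][y] * P_Z[z] for z in P_Z)

def p_y_given_x(y, x):
    """Observational P(Y=y | X=x), which absorbs the confounding via Z."""
    num = sum(P_Y_given_XZ[(x, z)][y] * P_X_given_Z[z][x] * P_Z[z] for z in P_Z)
    den = sum(P_X_given_Z[z][x] * P_Z[z] for z in P_Z)
    return num / den

causal = p_y_do_x(1, 1) - p_y_do_x(1, 0)        # average treatment effect: 0.46
naive = p_y_given_x(1, 1) - p_y_given_x(1, 0)   # confounded difference: 0.58
```

With these tables the naive conditional difference overstates the causal effect, which is exactly the gap the adjustment corrects.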
Bidirectional Autoregressive Transformer from Scratch

A simple bidirectional autoregressive transformer implemented from scratch.

Accomplishments
  • Tools: Python, PyTorch
  • Implemented the tokenizer and the model from scratch, and tuned them end to end.
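The tokenizer piece can be sketched as a character-level vocabulary with encode/decode round-tripping. This is a minimal stand-in, not the project's actual tokenizer.

```python
class CharTokenizer:
    """Minimal character-level tokenizer of the kind built before training
    a small transformer: derives a vocabulary from a corpus and maps
    text <-> integer ids."""

    def __init__(self, corpus):
        self.chars = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for i, ch in enumerate(self.chars)}

    @property
    def vocab_size(self):
        return len(self.chars)

    def encode(self, text):
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("hello world")
ids = tok.encode("hello")
text = tok.decode(ids)  # round-trips back to "hello"
```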
Image captioning

ViT + GPT-2 image captioning

Accomplishments
  • Incorporated Convolutional Neural Networks (CNNs) for extracting image features and Long Short-Term Memory (LSTM) networks for extracting question embeddings.
Attention-based Graph Neural Network

Multi-Label Text Classification using Attention-based Graph Neural Network.

Accomplishments
  • Implemented multi-label text classification using an attention-based graph neural network.
GPT2 for writing Python code

Explored how to fine-tune GPT-2 to build a Python question-answering model similar to ChatGPT.

Accomplishments
  • Developed a simple chatbot.

Skills

Languages and Databases

Python
HTML5
CSS3
MySQL
PostgreSQL
Shell Scripting

Libraries

NumPy
Pandas
OpenCV
scikit-learn
matplotlib
NLTK

Frameworks

Django
Flask
Bootstrap
Keras
TensorFlow
PyTorch

Other

Git
AWS
Docker

Education

University of Wyoming

Laramie, WY

Degree: PhD in Computer Science
Area of Study: Causal Reasoning for Improving Generalization in Visual Question Answering

    Relevant Coursework:

    • Intro to Artificial Intelligence
    • Machine Learning
    • High Performance Computing & Paradigms
    • Advanced Image Processing
    • Neural and Fuzzy Systems

Tarbiat Modares University

Tehran, Iran

Degree: Master's in Information Technology
CGPA: 3.68/4.0

    Relevant Coursework:

    • Artificial Neural Networks
    • Neural and Fuzzy Systems

Contact