Associate ML Ops Specialist (Remote)
Key Responsibilities
- Manage the end-to-end machine learning model lifecycle, from data ingestion and training to production deployment.
- Implement and refine CI/CD pipelines for machine learning to ensure rapid, reliable model releases.
- Containerize ML applications with Docker and manage them in production using orchestration tools like Kubernetes.
- Monitor model performance, system health, and data integrity with tools like Prometheus and Grafana, and resolve issues before they affect production.
- Collaborate with data scientists to provide the necessary infrastructure and tooling for model development.
- Automate operational tasks and infrastructure provisioning via Python scripting and Infrastructure as Code (IaC) tools like Terraform.
- Document workflows, system architectures, and operational procedures so the team can understand and maintain the systems.
Core Qualifications
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a related technical field.
- Foundational programming skills in Python for scripting and automation.
- Knowledge of core MLOps concepts, including the ML lifecycle, model deployment, and performance monitoring.
- Academic or project-based exposure to containerization (Docker) and orchestration (Kubernetes).
- Proficiency in Linux/Unix environments and the command-line interface.
- Strong communication skills to articulate technical concepts clearly to stakeholders.
- Experience with at least one major cloud platform (AWS, GCP, or Azure) is a significant plus.
