Our client is looking for a Senior Cloud Native OPS Engineer
Description
As a Cloud (ML)Ops Engineer, you’ll work at the intersection of cloud infrastructure, DevOps, and machine learning operations. Together with your team, you’ll help build a reliable, scalable, and secure platform that supports data scientists and analysts throughout their entire workflow.
This includes:
- hosting a multi-user Jupyter environment and a cloud IDE;
- providing frameworks for training, storing, serving, and monitoring custom models, primarily for high-throughput batch processing;
- exposing models via APIs for low-latency request-response use cases;
- enabling Generative AI initiatives.
Your responsibilities include:
- Designing and building cloud-native platform services for AI models and data pipelines.
- Collaborating with colleagues and stakeholders across countries to develop technical solutions.
- Managing infrastructure using tools like Terraform, Docker, and Kubernetes on AWS.
- Automating workflows for data processing and model lifecycle management using Airflow, Spark, and Python.
- Ensuring platform reliability, performance, and cost-efficiency.
- Supporting colleagues in using the platform, including onboarding and troubleshooting.
- Contributing to the evolution of our MLOps practices.
Profile
As a Senior Cloud Native OPS Engineer, you have over 5 years of technical systems expertise in delivering cloud engineering services:
- You configure AWS services and work with Terraform scripting (infrastructure as code), AWS networking/gateways, AWS Landing Zone setup, Lambda, and container services;
- You evaluate and translate requirements into design;
- You evaluate design benefits and trade-offs;
- You validate design compliance and support deployment of the design to ensure the requirements are met;
- You use development tools to efficiently solve technical or business challenges, incl. technology evolution, capacity management, and performance optimization;
- You innovate to present new ideas which improve an existing system/process/service;
- You maintain knowledge of existing technology documents via technical writing;
- You perform (complex) incident resolution and root cause analyses;
- On-call duty for the systems you are responsible for may be required.
What do we expect from you?
You have a strong interest in cloud, data, and AI, and are eager to learn about new developments in the field.
- Education or experience: Master’s degree in ICT, Engineering Sciences or Business Engineering with a focus on Informatics, or equivalent experience.
Technical skills
- Proficient in Python and the broader data science ecosystem.
- Experience with cloud infrastructure (preferably AWS).
- Familiar with Docker and Kubernetes.
- Skilled in infrastructure as code (Terraform).
- Experience with CI/CD tools like Jenkins or GitHub Actions.
- Knowledge of big data tools such as Spark.
In addition to proven experience in system software and cloud infrastructure, you have the following core competences: Adaptive, Analytical thinking, Collaborating, Flexible, IT Infrastructure, Result driven, Software development.