Senior Platform & DevOps Engineer

01/07/2026

Our client is looking for a Platform & DevOps Engineer

Service Description

The Platform Engineer in the Consumer Centricity Platform Operations team is responsible for the reliable, secure, and stable operation of the organization’s high-availability cloud platform, built on Kubernetes and composed of multiple in-house platform components.

The role focuses on platform lifecycle management, day-2 operations, incident response, and operational excellence, ensuring that customer-facing Web UIs and APIs remain available, performant, and secure 24/7.

The Platform Engineer acts as a technical custodian of the platform, providing a stable foundation on which service teams can safely deploy and operate their workloads.

Primary Objectives

  • Maintain platform availability and reliability in accordance with SLOs/SLAs
  • Ensure operational readiness of all environments (DEV / TEST / ACC / PROD)
  • Provide 24/7 operational coverage for critical platform services (via on-call)
  • Ensure the platform is observable, secure, well-controlled and documented
  • Execute platform changes, upgrades, and maintenance in a predictable and low-risk manner

Key Responsibilities

Kubernetes & Runtime Operations

  • Operate Kubernetes primitives and platform add-ons:
    • Ingress controllers
    • Service discovery
    • Workload identity
  • Troubleshoot Kubernetes-related failures:
    • Pod lifecycle issues
    • Networking problems
    • Resource starvation
  • Perform controlled rollouts with rollback plans

Reliability & 24/7 Incident Response

  • Participate in the 24/7 on-call rotation for critical services (incident responder)
  • Lead or contribute to:
    • Incident triage and mitigation
    • Root Cause Analysis (RCA)
    • Post-incident action tracking and follow-up
  • Maintain and improve runbooks and operational procedures

Observability & Monitoring

  • Operate (and use) the open-source observability platform
  • Ensure effective observability across the platform:
    • Metrics, logs, and distributed traces
    • Actionable alerts
    • Reduced false positives
  • Support incident analysis through correlation and telemetry inspection

Change, Release & Maintenance Management

  • Plan and execute platform changes
  • Follow structured change management practices
  • Communicate planned and ongoing changes to stakeholders
  • Ensure platform changes are documented and auditable

Security & Compliance (Operational Focus)

  • Operate platform security controls:
    • RBAC
    • Network boundaries
    • Secret management
  • Apply security updates and patches to platform components
  • Support vulnerability remediation efforts
  • Provide operational evidence for audits and security reviews

Automation & Operational Improvement

  • Automate repetitive operational tasks where appropriate
  • Reduce operational risk through standardization and documented procedures
  • Apply a Platform-as-Code approach (GitOps)

Requirements

Technical Skills

Kubernetes (Deep Production Expertise)

  • Multi-cluster architecture & lifecycle management
  • RBAC & least-privilege design
  • Network policies & traffic segmentation
  • Stateful workloads & storage strategy (CSI, PV/PVC)
  • Autoscaling (HPA/VPA) & resource tuning
  • Pod Security Standards
  • Admission controllers
  • Performance & reliability troubleshooting
  • Cluster-level debugging (networking, DNS, scheduling, OOM, crash loops)

GitOps & Continuous Delivery

  • ArgoCD (advanced usage)
  • App-of-Apps pattern
  • Sync waves & hooks
  • Drift detection & reconciliation
  • Multi-environment promotion workflows
  • Git-based deployment strategy with version management
  • Declarative platform design with PR-driven changes
  • YAML-based CI/CD pipelines with Harness.io
  • Secure secret handling in CI/CD (with HashiCorp Vault)

Packaging & Configuration

  • Helm (advanced chart authoring)
  • Reusable library charts
  • OCI-based registries
  • Values layering strategy
  • Kustomize overlays for multi-environment isolation and strategic patches

Container & Artifact Management

  • Docker (secure multi-stage builds, optimization)
  • Harbor (RBAC, replication, vulnerability scanning)
  • JFrog Artifactory (Docker & Helm registry management)
  • Artifact versioning & promotion strategy

Secrets & Security

  • HashiCorp Vault for dynamic secrets with CSI integration
  • Image vulnerability scanning integration
  • Supply chain security awareness
  • TLS & certificate lifecycle management
  • RBAC governance

Observability & Reliability

  • OpenTelemetry (metrics, logs, traces)
  • Prometheus or VictoriaMetrics (recording rules, HA setup)
  • Loki (log aggregation & LogQL)
  • Tempo (distributed tracing)
  • Grafana (advanced dashboards & alerting)
  • SLI/SLO design & error budget thinking
  • Alert noise reduction strategy

Networking (Advanced)

  • TCP/IP & DNS fundamentals
  • TLS & mTLS concepts
  • Kubernetes Services, Ingress & Reverse Proxy concepts
  • East-west vs north-south traffic
  • API routing & traffic management
  • Network Policies implementation

Automation

  • Advanced Bash scripting
  • Infrastructure automation mindset

Nice to Have

  • Kong API Gateway (API routing, plugins, authentication, rate limiting)
  • Redis (operational knowledge: deployment, persistence, clustering, backups)
  • PostgreSQL (migrations, backups, HA basics, Kubernetes deployment patterns)
  • MongoDB (replica sets, backups, Kubernetes deployment patterns)
  • Kargo on top of ArgoCD for release orchestration

Operational Skills

  • Proven experience in production operations or platform support roles
  • Ability to work calmly and methodically under pressure
  • Strong troubleshooting skills across distributed systems
  • Clear written and verbal communication during incidents and changes
  • Flexibility to balance daily operations with long-term changes

Ways of Working

  • Structured, risk-aware, and detail-oriented
  • Comfortable with operational responsibility and accountability
  • Strong collaboration with Development, Security, and Product teams
  • Documentation-first mindset for operational knowledge

Positioning vs Other Roles

  • Not a pure SRE role: the focus is on stability and operations, not reliability engineering
  • Not a pure DevOps engineer role embedded in product teams
  • The role is the operational owner of the platform in all environments, ensuring it runs safely and predictably

Additional Information

  • Location: Brussels
  • On-site presence: by default, 2 days per week on site are required
  • Work regime: Full-time

Job specifications

ID: 13564

Duration: 03-08-2026 - 31-12-2027

Location: Brussels

Type: Freelance

Viktor Feyt

IT Recruitment Consultant