01/07/2026

Our client is looking for a Platform & DevOps Engineer

Service Description

The Platform Engineer in the Consumer Centricity Platform Operations team is responsible for the reliable, secure, and stable operation of the organization’s high-availability cloud platform, built on Kubernetes and composed of multiple in-house platform components.

The role focuses on platform lifecycle management, day-2 operations, incident response, and operational excellence, ensuring that customer-facing Web UIs and APIs remain available, performant, and secure 24/7.

The Platform Engineer acts as a technical custodian of the platform, providing a stable foundation on which service teams can safely deploy and operate their workloads.

Primary Objectives

Maintain platform availability and reliability in accordance with SLOs/SLAs
Ensure operational readiness of all environments (DEV / TEST / ACC / PROD)
Provide 24/7 operational coverage for critical platform services (via on-call)
Ensure the platform is observable, secure, well-controlled and documented
Execute platform changes, upgrades, and maintenance in a predictable and low-risk manner

Key Responsibilities

Kubernetes & Runtime Operations

Operate Kubernetes primitives and platform add-ons:
- Ingress controllers
- Service discovery
- Workload identity
Troubleshoot Kubernetes-related failures:
- Pod lifecycle issues
- Networking problems
- Resource starvation
Perform controlled rollouts with rollback plans

Reliability & 24/7 Incident Response

Participate in the 24/7 on-call rotation for critical services (incident responder)
Lead or contribute to:
- Incident triage and mitigation
- Root Cause Analysis (RCA)
- Post-incident action tracking and follow-up
Maintain and improve runbooks and operational procedures

Observability & Monitoring

Operate and use the open-source observability platform
Ensure effective observability across the platform:
- Metrics, logs, and distributed traces
- Actionable alerts
- Reduced false positives
Support incident analysis through correlation and telemetry inspection

Change, Release & Maintenance Management

Plan and execute platform changes
Follow structured change management practices
Communicate with stakeholders about planned changes
Ensure platform changes are documented and auditable

Security & Compliance (Operational Focus)

Operate platform security controls:
- RBAC
- Network boundaries
- Secret management
Apply security updates and patches to platform components
Support vulnerability remediation efforts
Provide operational evidence for audits and security reviews

Automation & Operational Improvement

Automate repetitive operational tasks where appropriate
Reduce operational risk through standardization and documented procedures
Apply a Platform-as-Code approach (GitOps)

Requirements

Technical Skills

Kubernetes (Deep Production Expertise)

Multi-cluster architecture & lifecycle management
RBAC & least-privilege design
Network policies & traffic segmentation
Stateful workloads & storage strategy (CSI, PV/PVC)
Autoscaling (HPA/VPA) & resource tuning
Pod Security Standards
Admission controllers
Performance & reliability troubleshooting
Cluster-level debugging (networking, DNS, scheduling, OOM, crash loops)

GitOps & Continuous Delivery

ArgoCD (advanced usage)
App-of-Apps pattern
Sync waves & hooks
Drift detection & reconciliation
Multi-environment promotion workflows
Git-based deployment strategy with version management
Declarative platform design with PR-driven changes
YAML-based CI/CD pipelines with Harness.io
Secure secret handling in CI/CD (with HashiCorp Vault)

Packaging & Configuration

Helm (advanced chart authoring)
Reusable library charts
OCI-based registries
Values layering strategy
Kustomize overlays for multi-environment isolation and strategic patches

Container & Artifact Management

Docker (secure multi-stage builds, optimization)
Harbor (RBAC, replication, vulnerability scanning)
JFrog Artifactory (Docker & Helm registry management)
Artifact versioning & promotion strategy

Secrets & Security

HashiCorp Vault for dynamic secrets with CSI integration
Image vulnerability scanning integration
Supply chain security awareness
TLS & certificate lifecycle management
RBAC governance

Observability & Reliability

OpenTelemetry (metrics, logs, traces)
Prometheus or VictoriaMetrics (recording rules, HA setup)
Loki (log aggregation & LogQL)
Tempo (distributed tracing)
Grafana (advanced dashboards & alerting)
SLI/SLO design & error budget thinking
Alert noise reduction strategy

Networking (Advanced)

TCP/IP & DNS fundamentals
TLS & mTLS concepts
Kubernetes Services, Ingress & Reverse Proxy concepts
East-west vs north-south traffic
API routing & traffic management
Network policy implementation

Automation

Advanced Bash scripting
Infrastructure automation mindset

Nice to Have

Kong API Gateway (API routing, plugins, authentication, rate limiting)
Redis (operational knowledge: deployment, persistence, clustering, backups)
PostgreSQL (migrations, backups, HA basics, Kubernetes deployment patterns)
MongoDB (replica sets, backups, Kubernetes deployment patterns)
Kargo on top of ArgoCD for release orchestration

Operational Skills

Proven experience in production operations or platform support roles
Ability to work calmly and methodically under pressure
Strong troubleshooting skills across distributed systems
Clear written and verbal communication during incidents and changes
Flexibility to balance daily operations with long-term changes

Ways of Working

Structured, risk-aware, and detail-oriented
Comfortable with operational responsibility and accountability
Strong collaboration with development, security, and product teams
Documentation-first mindset for operational knowledge

Positioning vs Other Roles

Not a pure SRE role: the focus is on stability and operations rather than reliability engineering
Not a pure DevOps engineer embedded in product teams
The role is the operational owner of the platform across all environments, ensuring it runs safely and predictably

Additional Information

Location: Brussels
Onsite presence: by default, physical presence on site is required 2 days per week
Work regime: Full-time


Job specifications

ID: 13564

Duration: 03-08-2026 - 31-12-2027

Location: Brussels

Type: Freelance

Viktor Feyt

IT Recruitment Consultant

vfe.ext@brainbridge.be


Platform & DevOps Engineer Senior