Inktech

Senior SRE

Remote & Relocation · Full-time · Senior · World · Development

Key Responsibilities

  • Develop and maintain the VictoriaMetrics + Grafana + Alertmanager stack;
  • Configure metrics, dashboards, and alerts for microservices and databases;
  • Maintain centralized logging (Fluent Bit, Fluentd, Elasticsearch, Kibana);
  • Define SLIs/SLOs and participate in incident analysis (postmortems);
  • Help operate and support EKS clusters (dev and prod);
  • Work with autoscaling (Karpenter or Cluster Autoscaler);
  • Configure storage (EBS/EFS), load balancers, and network policies;
  • Troubleshoot pod, node, and networking issues;
  • Maintain Argo CD and the Helm charts for services;
  • Ensure correct deployments and healthy environments;
  • Automate routine operations with GitLab CI/CD;
  • Monitor PostgreSQL (CloudNativePG) and Redis;
  • Set up database monitoring and alerts (replication, replication lag, failovers);
  • Participate in testing backup and restore procedures;
  • Work with Vault (secrets, tokens, access management);
  • Configure cert-manager for automated certificate renewal;
  • Help set up OAuth2 Proxy / Zitadel for services;
  • Maintain Terraform/OpenTofu modules for infrastructure (EKS, S3, IAM);
  • Work with multi-environment configurations and remote state;
  • Write and review simple infrastructure changes via Git.

Requirements

  • Hands-on experience with Kubernetes and an understanding of its core objects (Pods, Deployments, Services, Ingress);
  • Ability to read and write Helm values and to work with Argo CD;
  • Understanding of monitoring and metrics concepts (Prometheus/VictoriaMetrics, Grafana, Alertmanager);
  • Ability to work with logs and perform root cause analysis of incidents.

Desirable

  • Experience with PostgreSQL and Redis (backups, replication, monitoring);
  • Knowledge of Terraform or OpenTofu;
  • Experience with Kafka, ClickHouse, or Vault;
  • Understanding of SLIs/SLOs and SRE practices.

What We Offer

  • High salary, plus performance bonuses and regular salary reviews;
  • Work schedule: Mon-Fri, 9 hours including a 1-hour lunch break, with a flexible start between 8:00 and 10:00;
  • 24 days of holiday leave;
  • Challenging work that lets you grow to your full potential;
  • A strong team of like-minded professionals who will support you on ambitious projects, encourage your professional development, and share their experience.

Who You Are

We are looking for a candidate who is passionate about site reliability engineering, has a strong technical background, and is eager to tackle complex challenges in a collaborative environment.

Tech Stack

  • Kubernetes
  • VictoriaMetrics
  • Grafana
  • Alertmanager
  • Fluent Bit
  • Fluentd
  • Elasticsearch
  • Kibana
  • GitLab CI/CD
  • PostgreSQL
  • Redis
  • Vault
  • Terraform/OpenTofu

Team Description

You will be part of a strong team of like-minded professionals dedicated to delivering ambitious projects and fostering professional development.
