EPAM Systems • Romania
Role & seniority: Lead ELK Observability Platform Expert / Tester
Stack/tools: ELK stack (Elasticsearch, Logstash, Kibana, Beats); ingestion pipelines; dashboards in Kibana; CI/CD integrations; cloud platforms (AWS, Azure, GCP); Docker/Kubernetes; Linux, Bash, Python; security (TLS, RBAC, encryption); observability tools (Prometheus, Grafana, OpenTelemetry)
Responsibilities:
Design, maintain, and optimize ELK components for large-scale, high-availability environments; implement data retention, indexing strategies, and performance tuning
Build and optimize ingestion pipelines (logs, metrics, traces); create real-time dashboards and visualizations; ensure observability across systems
Integrate observability with CI/CD and cloud platforms; implement security and compliance controls, backup/restore strategies, and disaster recovery plans; troubleshoot ingestion, indexing, and visualization issues
Requirements:
5–10 years of ELK stack experience (Elasticsearch, Logstash, Kibana, Beats)
Strong knowledge of observability concepts (logs, metrics, traces) and related tools (Prometheus, Grafana, OpenTelemetry)
Linux administration and scripting (Bash/Python); cloud experience (AWS/Azure/GCP); Docker/Kubernetes
Indexing strategies, cluster scaling, and performance tuning for Elasticsearch; security practices (TLS, RBAC, encryption)
Analytical, troubleshooting, communication, and collaboration skills; advanced English (C1)

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.

We are seeking a Lead ELK Observability Platform Expert / Tester to design, maintain and optimize ELK stack solutions for large-scale, high-availability environments, ensuring robust observability, data management and performance monitoring across diverse systems.

Responsibilities
Design and maintain ELK stack components (Elasticsearch, Logstash, Kibana, Beats) for large-scale, high-availability environments
Develop and optimize ingestion pipelines for logs, metrics and traces from diverse sources
Implement data retention, indexing strategies and performance tuning for Elasticsearch clusters
Create dashboards and visualizations in Kibana to provide real-time insights into system health and application performance
Integrate observability tools with CI/CD pipelines and cloud platforms (AWS, Azure, GCP)
Ensure security and compliance for data stored and processed within the observability stack
Troubleshoot and resolve issues related to data ingestion, indexing and visualization
Optimize cluster architecture (roles, shards, ILM, snapshots, CCR)
Implement backup/restore strategies and disaster recovery plans

Requirements
Expertise in ELK stack with 5-10 years of experience (Elasticsearch, Logstash, Kibana, Beats)
Knowledge of observability concepts (logs, metrics, traces) and related tools such as Prometheus, Grafana or OpenTelemetry
Proficiency in Linux administration and scripting with Bash or Python
Experience with cloud platforms (AWS, Azure or GCP) and container orchestration using Docker or Kubernetes
Knowledge of indexing strategies, cluster scaling and performance tuning for Elasticsearch
Familiarity with security practices including TLS, RBAC and data encryption
Strong analytical, troubleshooting and problem-solving skills with the ability to work in a fast-paced environment
Excellent communication and collaboration skills
Advanced proficiency in English (C1)

Nice to have
Experience with machine learning features in Elasticsearch
Familiarity with Infrastructure as Code using Terraform or Ansible
Experience with monitoring and observability using Prometheus, Grafana or OpenTelemetry

We offer
Full access to cutting-edge tools and technologies
Competitive compensation depending on experience and skills
All-around social package: professional & soft skills training, medical & family care programs, sports
Free English classes
Unlimited access to LinkedIn learning solutions
Continuous experience exchange with experts and professionals worldwide
Friendly team and comfortable working environment
Engineering, corporate, and social events within and outside the Company
Flexible working schedule
Opportunities for self-realization