Observability Engineer

Stella Contracting, Inc

Chicago, IL, USA

Published: 6/14/2022

Technology

Full Time

Job Description

We are seeking a skilled, detail-oriented Observability Engineer to join our remote team. In this role, you will design, implement, and maintain observability solutions that keep our systems highly available, performant, and reliable. Your work will give teams real-time insight through metrics, logs, and traces, helping them respond to incidents faster and understand their systems better.

Key Responsibilities:

Develop and maintain observability platforms (e.g., Prometheus, Grafana, OpenTelemetry, ELK, Datadog, New Relic)

Design and implement monitoring strategies across distributed systems

Collaborate with DevOps, SRE, and engineering teams to define SLIs, SLOs, and dashboards

Create automated alerts and integrations to improve incident detection and resolution

Analyze performance data and logs to identify trends, bottlenecks, and areas for optimization

Ensure observability tools are reliable, secure, and scalable

Provide documentation and training to empower teams to use observability tools effectively

Qualifications:

2+ years of experience in observability, monitoring, or site reliability engineering

Strong knowledge of monitoring, logging, and tracing tools and best practices

Experience with cloud infrastructure (AWS, GCP, or Azure)

Proficiency in scripting or automation (e.g., Python, Bash, Terraform)

Excellent problem-solving skills and the ability to work independently in a remote setting