Published on April 24, 2025. Modified on May 22, 2025.
We are seeking a highly skilled and experienced Datadog Administrator to manage and optimize our Datadog platform.
In this role, you will be responsible for configuring, deploying, and maintaining Datadog to effectively monitor our enterprise IT landscape.
The ideal candidate will have expertise in various Datadog features, including Infrastructure Monitoring, Application Performance Monitoring (APM), Log Management, and Digital Experience Monitoring to ensure optimal system performance, availability, and security.
Responsibilities:
- Administer the Datadog platform, leveraging features like Infrastructure Monitoring, APM, Logs, and Digital Experience Monitoring (RUM, Synthetic Monitoring) to monitor the enterprise IT landscape.
- Deploy and configure Datadog agents across diverse environments (Cloud, On-Premises, Hybrid).
- Set up and manage Datadog integrations with various services and tools (AWS, Azure, Kubernetes, Docker, ITSM, AIOps, etc.).
- Configure Datadog dashboards, monitors, and alerts for comprehensive system visibility and health monitoring.
- Set up anomaly detection and alerting mechanisms to proactively identify and resolve issues.
- Develop solutions to monitor new applications and services effectively.
- Integrate Datadog with ITSM tools, AIOps platforms, and other enterprise solutions.
- Leverage Datadog for monitoring SaaS-based applications.
- Optimize log management, metric collection, and Application Performance Monitoring (APM) for effective troubleshooting and system health analysis.
- Review error logs and correlate them to various architectural components and services, including log format correction and normalization.
- Utilize Datadog for reporting, dashboard creation, and analytics.
- Make backend API calls with Datadog for custom metric ingestion and business observability.
- Demonstrate strong troubleshooting and analytical skills.
- Communicate effectively with stakeholders and cross-functional teams to coordinate and resolve issues.
- Document processes, requirements, and operating procedures for internal knowledge sharing.
- Apply ITIL best practices in daily operations, including Incident Management, Problem Management, and Event Management.
- Maintain user access controls and enforce security best practices within the Datadog platform.
- Ensure compliance with security policies, governance standards, and industry regulations.
Required Skills & Experience:
- 5+ years of experience in Datadog administration, observability, and monitoring solutions.
- Hands-on experience in configuring monitoring, dashboards, alerts, and integrations within Datadog.
- Strong understanding of log management, metric collection, APM, and distributed tracing.
- Experience working with cloud platforms (AWS, Azure, Google Cloud) and containerized environments (Docker, Kubernetes).
- Proficiency in Infrastructure as Code (IaC) tools like Terraform for monitoring automation.
- Strong scripting skills (Python, Shell, or PowerShell) for automation and custom integrations.
- Knowledge of ITSM tools, AIOps platforms, and cloud-native monitoring solutions.
- Familiarity with CI/CD pipelines and DevOps methodologies.
- Working knowledge of ITIL frameworks and best practices in incident and problem management.
- Strong security and compliance mindset to maintain governance standards.
- Excellent communication, collaboration, and documentation skills.
- Certifications in Datadog, AWS, or Azure Monitoring Services.
- Experience in observability and performance tuning for large-scale enterprise environments.
- Exposure to APM tools like New Relic, AppDynamics, or Splunk.
- Familiarity with machine learning-based anomaly detection in observability platforms.