
ODA129 - SYSTEM ADMINISTRATOR - INCIDENT & MONITORING

IT
Junior
Remote
Full-time
Responsibilities
As a System Administrator - Incident & Monitoring, you play a central role in ensuring the stability, performance, and reliability of our infrastructure and databases.
You are responsible for handling critical incidents, monitoring systems in real-time, and driving investigations, while also mentoring team members and improving operational practices.
Your role sits at the intersection of operations, investigation, and leadership, where your ability to act quickly, think analytically, and support others makes a real impact.
Event Management & Proactive Intervention:
Actively monitor infrastructure, databases, and customer environments.
Analyze alerts, prioritize critical events, and take immediate corrective actions.
Detect anomalies and intervene proactively before services are impacted.
Incident Management, Investigation & Escalation:
Lead the resolution of critical and time-sensitive incidents (performance issues, replication failures, outages, security alerts).
Conduct in-depth investigations and root cause analysis (RCA).
Act as a key escalation point during your shift to ensure proper coordination across teams.
Maintain clear communication and ownership during incidents, especially in high-pressure situations.
Cross-Shift Ownership & Continuity:
Ensure proper handover between shifts, maintaining high-quality communication and documentation.
Follow up on ongoing incidents to ensure resolution and accountability across shifts.
Maintain full visibility on critical issues and action plans.
Problem Management & Continuous Improvement:
Analyze recurring incidents and define long-term corrective actions.
Participate in post-mortems and contribute to improving processes, tooling, and system resilience.
System Performance Analysis:
Monitor and analyze performance, identifying bottlenecks and inefficiencies.
Optimize queries, configurations, and system resources.
Support capacity planning and scalability initiatives.
Service Continuity & Reliability:
Ensure backup integrity, failover readiness, and disaster recovery mechanisms.
Act quickly on failures and coordinate recovery actions.
Change, Patch & Release Management:
Contribute to patching and release cycles by assessing risks and monitoring impact.
Validate system stability post-deployment within operational constraints.
Automation & Process Optimization:
Automate monitoring, alerting, and operational tasks.
Improve efficiency and reduce manual intervention.
Mentoring, Coaching & Team Support:
Provide real-time guidance to team members during shifts.
Support junior profiles in developing troubleshooting and incident management skills.
Act as a point of reference and a source of support during complex situations.
Training & Knowledge Management:
Contribute to training materials and incident response procedures.
Maintain and improve the knowledge base.
Reporting & Data-Driven Insights:
Analyze trends, recurring issues, and operational data.
Provide actionable insights to improve system performance and reliability.
Requirements
2+ years of experience in system administration.
Strong experience in incident management, monitoring, and troubleshooting.
Experience handling critical incidents autonomously is a strong plus.
Key Skills & Competencies:
Strong analytical and problem-solving mindset.
Ability to perform under pressure in a real-time operational environment.
High sense of ownership and accountability.
Strong communication skills, especially for shift handovers and incident coordination.
Mentoring mindset and team-oriented approach.
Adaptability to shifting priorities and operational demands.
Preferred Technologies:
Operating Systems: Linux
Databases: MySQL, MariaDB, PostgreSQL, Percona
Monitoring & Observability: Prometheus, Grafana, ELK Stack
Web Servers: Nginx, Apache
Containers & Virtualization: Docker, Kubernetes
Benefits
Working location: Remote full-time (9am – 5pm VNT)
Salary range: Up to USD 1,400 gross
Information
Offered Salary
USD 1,000 - 1,400

