Job Description
Title: Site Reliability Engineer
Location: Falls Church, VA
Salary: $110,000 - $130,000 / Year
Job Type: Full-Time | Exempt
No sponsorship available
BENEFITS
• Health, Dental, Vision Insurance
• 401(k) with immediate vesting
• Tuition Assistance
• Public Service Loan Forgiveness (PSLF) eligibility
• Generous Paid Time Off
• Dog-friendly office
• Onsite gym
• Health Savings Account (HSA) / Flexible Spending Account (FSA)
• Employee Assistance Program (EAP)
• Life and Disability Insurance
• Pet Insurance
• Trade Publication / Subscription Reimbursement
• Paid Holidays, Vacation, and Sick Leave
• Parental Leave
JOB DESCRIPTION
We are seeking a Site Reliability Engineer (SRE) to help establish and shape a reliability engineering practice from the ground up. This is a unique opportunity to join a mission-driven environment and play a key role in ensuring the reliability, scalability, and performance of AWS-hosted business applications.
As part of a cross-functional engineering team, you will work to improve observability, automate operational processes, and lead incident response and continuous improvement efforts. This role is ideal for a mid-level engineer with cloud and software engineering experience who is eager to deepen their expertise in site reliability engineering, learn from senior staff, and help build a culture of reliability.
ESSENTIAL DUTIES AND RESPONSIBILITIES
• Define and implement service-level indicators (SLIs) and service-level objectives (SLOs) for cloud-based applications.
• Build, configure, and maintain monitoring, alerting, and dashboarding solutions using AWS CloudWatch, X-Ray, and third-party tools such as DataDome.
• Leverage advanced AWS observability tools (e.g., CloudWatch Synthetics, Contributor Insights) to proactively monitor system health.
• Contribute to the development and implementation of a structured on-call support process.
• Implement, monitor, and maintain site protection and bot mitigation solutions to defend against automated attacks and ensure application availability.
• Investigate incidents, security events, and operational anomalies, perform root cause analysis, and lead postmortem processes.
• Identify operational inefficiencies (“toil”) and automate workflows using AWS Lambda and CloudFormation.
• Assist in maintaining and enhancing CI/CD pipelines and deployment processes.
• Collaborate with development, QA, cloud, and DevOps teams to ensure reliability, scalability, and security are embedded into system designs.
• Document systems, processes, incident findings, compliance activities, and reliability best practices.
• Stay current with AWS, SRE, and observability trends and recommend improvements.
• Evaluate and support the rollout of new AWS services and features.
• Perform other related duties as assigned.
KNOWLEDGE & SKILLS
• Strong analytical, troubleshooting, and problem-solving abilities.
• Hands-on experience with AWS CloudWatch (metrics, logs, dashboards, alarms).
• Familiarity with AWS X-Ray for distributed tracing.
• Experience with CloudWatch Synthetics and Contributor Insights for proactive testing and analysis.
• Knowledge of AWS CloudTrail for auditing and investigations.
• Experience using AWS Athena for log analysis.
• Proficiency with AWS CloudFormation.
• Experience automating workflows with AWS Lambda or similar tools.
• Understanding of AWS services such as API Gateway, CloudFront, and Elastic Load Balancing (ELB).
• Experience with site protection or bot mitigation tools (e.g., DataDome, Cloudflare).
• Scripting or programming experience in Python, Bash, or Node.js.
• Excellent communication and documentation skills.
• Growth-oriented and eager to adopt emerging tools and practices.
REQUIREMENTS
• Bachelor’s degree in computer science, engineering, or related field (or equivalent experience).
• 3+ years of experience in cloud engineering, DevOps, infrastructure, or observability (AWS required).
• Experience applying SRE principles (prior SRE experience preferred).
• Background in monitoring, incident response, or reliability in production environments.
• Experience working in Agile, cross-functional teams.
• Passion for building and improving reliability practices.