Professional Summary

IT professional with 8+ years of experience architecting enterprise-scale cloud infrastructure and automation solutions for Fortune 500 financial services and high-growth technology companies. Proven expertise in Kubernetes orchestration, Infrastructure as Code, and mission-critical incident management across hybrid cloud environments. Led infrastructure optimization initiatives saving $120K+ annually while maintaining 99.9% uptime SLAs.

Professional Experience

Senior Site Reliability Engineer

Collabera Inc | Bank of America January 2025 - August 2025
  • Oversaw enterprise incident lifecycle in a regulated financial environment, sustaining 99.9% SLA compliance across 100+ OpenShift clusters and 2,000+ UNIX systems.
  • Conducted vulnerability assessments and coordinated remediation, closing 1,200+ findings in six months and reducing security exposure by 40%.
  • Architected automated incident response workflows with ServiceNow ITSM, decreasing MTTR from 4 hours to 90 minutes and saving ~1,000 engineering hours annually.
  • Supported mission-critical trading and banking applications processing $50M+ in daily transactions, with zero missed cutoffs during tenure.
  • Collaborated with Security, Network, and Development teams, contributing to 3 successful SOX audits with no major findings.

Senior Site Reliability Engineer

TEKsystems | Startup AI Infrastructure Solutions December 2023 - December 2024
  • Led Site Reliability Engineering for the Observability platform, managing Mimir and OpenTelemetry API endpoints to ensure scalable metrics ingestion and reliable time-series storage supporting millions of active series.
  • Automated multi-region AWS deployments with Harness.io and Terraform, cutting provisioning times from 4 hours to 30 minutes.
  • Deployed GPU compute automation with Ansible for CUDA drivers and Slurm scheduling, reducing GPU job queue times by 60%.
  • Implemented SAST/DAST in GitLab pipelines, detecting 300+ vulnerable dependencies and reducing remediation time from 3 weeks to 5 days.
  • Optimized Kubernetes workloads, lowering AWS spend by 25% (~$750K annually) while maintaining SLAs.
  • Facilitated knowledge sharing and cross-functional collaboration through documentation, training sessions, and internal communities of practice.

Senior Systems Engineer

TEKsystems | Disney Parks & Resorts July 2022 - December 2023
  • Engineered AWS infrastructure with ECS Fargate, API Gateway, and Lambda to serve 40,000+ daily park visitors with 99.99% uptime.
  • Standardized Terraform modules for multi-region workloads, cutting provisioning time by 50% and enabling 60+ applications to migrate to IaC.
  • Managed SNS/SQS and RabbitMQ systems processing 50M+ messages daily with 99.99% delivery reliability.
  • Coordinated cost optimization initiatives, reducing cloud spend by 20% (~$2.5M annually) while maintaining PCI compliance.
  • Directed incident management across global teams, reducing average resolution time by 35%, and mentored 8 engineers in incident postmortem practices.

DevOps/SRE Engineer

Evercast LLC April 2021 - July 2022
  • Designed AWS infrastructure with Terraform to support 24/7 real-time streaming sessions for 10K+ users.
  • Optimized EKS deployments, cutting production rollout times from 30 to 12 minutes and increasing release frequency by 3x.
  • Implemented Grafana/Prometheus stack with automated alerting, reducing mean time to detection (MTTD) by 70%.
  • Applied Cloud Custodian policies, achieving $120K annual cost savings through automated resource lifecycle management.
  • Restructured IAM policies, eliminating 95% of overly permissive roles and improving audit compliance.

AWS Cloud Engineer

Ukietech Inc January 2017 - March 2021
  • Designed and documented multi-tier managed services in AWS, supporting 50+ business applications and 100K+ monthly users.
  • Deployed Terraform-based infrastructure patterns, reducing environment setup from 2 weeks to 2 days.
  • Implemented SCA tooling in Jenkins pipelines, flagging 250+ open-source vulnerabilities and reducing remediation from 2 months to 10 days.
  • Supported enterprise documentation in Confluence and Jira, creating 100+ knowledge base articles for cross-team use.

Education

Bachelor of Technology and Management in Public Administration

University of Technology of Moldova 2014

AWS Solutions Architect Associate 9YX1DXDDCEFEQMC9

AWS Cloud Engineering 2020

AWS Cloud Engineering

Ziyotek Institute of Technology 2020

Notable Projects

Monitoring & Observability Solution

Architected comprehensive monitoring stack using Prometheus, Grafana, and Mimir. Established alerting mechanisms and dashboards that reduced mean time to resolution (MTTR) by 60% and improved system reliability across 50+ microservices.

Technologies: Prometheus, Grafana, Mimir, Loki, AlertManager, Python, Kubernetes, Rancher, Terraform

Enterprise Infrastructure Modernization

Led cross-functional team in migrating legacy on-premises infrastructure to AWS cloud, resulting in 40% cost reduction and 99.9% uptime. Implemented Infrastructure as Code using Terraform and Ansible, enabling rapid deployment and disaster recovery capabilities.

Technologies: AWS, Terraform, Ansible, Kubernetes, CI/CD

Cloud Cost Optimization

Led initiative to optimize cloud spending, implementing auto-scaling, resource tagging, and cost monitoring. Achieved 35% reduction in monthly cloud costs while maintaining performance.

Technologies: AWS Cost Management, Auto-scaling, Monitoring, Terraform, Cloud Custodian

Personal Website

Developed a personal website using HTML, CSS, and JavaScript to showcase my skills and projects. The website is hosted on AWS S3 and CloudFront and integrated with K3s running on Raspberry Pi. https://www.gricascloud.com

Technologies: HTML, CSS, JavaScript, GitHub Pages, K3s, Raspberry Pi