• E-ISSN:

    2454-9584

    P-ISSN

    2454-8111

    Impact Factor 2024

    6.713

    Impact Factor 2023

    6.464

INTERNATIONAL JOURNAL OF INVENTIONS IN ENGINEERING & SCIENCE TECHNOLOGY

International Peer Reviewed (Refereed), Open Access Research Journal
(By Aryavart International University, India)

Paper Details

Orchestrating Adaptive Resilience and Continuity Restoration in Cloud-Native Environments

Amar Gurajapu

Principal Member of Technical Staff, Network Systems, AT&T, Middletown, New Jersey, United States

Anurag Agarwal

Senior Software Engineer, Network Systems, AT&T, Middletown, New Jersey, United States

pp. 1-6, Vol. 12, Issue 1, Jan-Dec, 2026
Receiving Date: 2025-12-02;    Acceptance Date: 2026-01-04;    Publication Date: 2026-01-10
http://doi.org/10.37648/ijiest.v12i01.001

Abstract

Cloud-native services must tolerate node failures, network partitions, and entire-region outages without violating SLAs. We survey Adaptive Resilience Mechanisms (ARMs), including pod-level checkpointing, self-healing circuits, and dynamic redundancy, and Continuity Restoration Strategies (CRSs), such as geo-replication with automated DNS switchover. We then present an AI-driven framework that fuses real-time telemetry, anomaly detection via LSTM autoencoders, failure classification, and Infrastructure-as-Code orchestration. A two-region Kubernetes prototype achieves a Restoration Time Objective (RTO) under 3 minutes and a Continuity Point Objective (CPO) under 5 seconds, improving data continuity by 40% and availability by 10%.

Keywords: Infrastructure as Code; Terraform; CI/CD Resource Allocation; Kubernetes Checkpoint/Restore; Geo-Replication; DNS Switchover; Restoration Time Objective (RTO); Continuity Point Objective (CPO)
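
As an illustration of the two objectives named in the abstract, the following minimal Python sketch checks one failover drill against the stated targets (RTO under 3 minutes, CPO under 5 seconds). It is not the authors' implementation. The `FailoverTimeline` class, its field names, the `measure` function, and the sample timestamps are all hypothetical. The sketch also assumes that the CPO is measured as the gap between the last write replicated to the standby region and the start of the failure, which the abstract does not state.

```python
from dataclasses import dataclass

# Targets quoted in the abstract: RTO under 3 minutes, CPO under 5 seconds.
RTO_TARGET_S = 180.0
CPO_TARGET_S = 5.0


@dataclass
class FailoverTimeline:
    """Hypothetical timestamps (in seconds) recorded during one failover drill."""
    last_replicated_write: float  # last write confirmed in the standby region
    failure_start: float          # moment the primary region stopped serving
    service_restored: float       # moment the standby region began serving


def measure(t: FailoverTimeline) -> dict:
    """Return the measured restoration time and continuity gap with pass/fail flags."""
    restoration_s = t.service_restored - t.failure_start
    # Assumed interpretation: the continuity gap is the window of unreplicated writes.
    continuity_gap_s = max(0.0, t.failure_start - t.last_replicated_write)
    return {
        "restoration_s": restoration_s,
        "continuity_gap_s": continuity_gap_s,
        "meets_rto": restoration_s < RTO_TARGET_S,
        "meets_cpo": continuity_gap_s < CPO_TARGET_S,
    }


# Made-up sample values, used only to exercise the function.
drill = FailoverTimeline(
    last_replicated_write=997.5,
    failure_start=1000.0,
    service_restored=1150.0,
)
result = measure(drill)
```

With these made-up timestamps the drill restores service in 150 seconds with a 2.5-second continuity gap, so both flags come out true. Real measurements would come from the telemetry pipeline the abstract describes.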

    References

  1. Eskandani, N., Koziolek, H., Hark, R., & Linsbauer, S. (2024). The state of container checkpointing with CRIU: A multi-case experience report. In 2024 IEEE 21st International Conference on Software Architecture Companion (ICSA-C) (pp. 54–59). IEEE. https://doi.org/10.1109/ICSA-C63560.2024.00015
  2. Gur, A. (2025, October 28). Best practices for monitoring Kubernetes clusters: Reliability and minimise operational overhead. ResearchGate. https://www.researchgate.net/publication/399121579
  3. Lee, Y., Park, C., Kim, N., Ahn, J., & Jeong, J. (2024). LSTM-autoencoder based anomaly detection using vibration data of wind turbines. Sensors, 24(9), 2833. https://doi.org/10.3390/s24092833
  4. NVIDIA. (n.d.). What is a random forest? NVIDIA Data Science Glossary. https://www.nvidia.com/en-us/glossary/random-forest/
  5. Serverless Inc. (2025). Serverless container framework documentation. https://www.serverless.com/containers/docs
  6. Sun, Z. (2025, June 4). Autoencoders for time series anomaly detection: A visual and practical guide. Medium. https://medium.com/@injure21/autoencoder-for-time-series-anomaly-detection-021d4b9c7909
  7. Ternary & Ternary Team. (2025, March 11). Anomaly detection comparison in AWS vs. Azure vs. Google Cloud. Ternary. https://ternary.app/blog/anomaly-detection-comparison-aws-vs-azure-vs-gcp/