
A Holistic Approach to Network Observability: Beyond the “Five Steps”

In a recent article on BetaNews, Song Pang outlines five steps to achieve network observability: Network Discovery and Data Accuracy, Network Visualizations, Network Design and Assurance, Automation, and Observability. While these steps provide a solid foundation, we believe there are alternative approaches that can be more effective, especially in today’s rapidly evolving network environments. Here, we propose a different set of steps and actions for achieving network observability and explain, with concrete examples, why this approach can work better.

The BetaNews approach focuses on accurate data from logs, traces, traffic paths, and SNMP. We suggest taking a wider, system-level view: instead of focusing only on traditional data sources, integrate data from a broader array of sources, including cloud services, IoT devices, and user behavior analytics. This holistic view ensures that no part of the network is overlooked.

Advanced Automated Network Monitoring. Image copyright (C) 2024 PacketAI and DALL-E.

For example, back in 2016, a major retail company faced a significant data breach because their network monitoring only covered traditional data sources. By integrating data from IoT devices and user behavior analytics, they could have detected the anomaly earlier.
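
To make this concrete, here is a minimal Python sketch (not any vendor's API) of what integrating a wider array of sources can look like: events from hypothetical SNMP, cloud audit, IoT gateway, and user-behavior feeds are normalized into one common schema and merged into a single time-ordered stream that downstream analytics can consume. All source names, field names, and payloads are illustrative placeholders.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List

def normalize(source: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map a source-specific payload onto one common event schema."""
    return {
        "timestamp": payload.get("ts") or datetime.now(timezone.utc).isoformat(),
        "source": source,
        "entity": payload.get("device") or payload.get("user") or payload.get("resource"),
        "metric": payload.get("metric", "event"),
        "value": payload.get("value"),
    }

def merge_streams(streams: Dict[str, Iterable[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Flatten every feed into normalized events, ordered by time."""
    events = [normalize(name, p) for name, batch in streams.items() for p in batch]
    return sorted(events, key=lambda e: e["timestamp"])

# Hypothetical batches from four very different sources.
unified = merge_streams({
    "snmp": [{"ts": "2024-05-01T10:00:00Z", "device": "core-sw-1",
              "metric": "if_in_octets", "value": 1.2e9}],
    "cloud_audit": [{"ts": "2024-05-01T10:00:03Z", "resource": "s3://payments-bucket",
                     "metric": "public_acl_added", "value": 1}],
    "iot_gateway": [{"ts": "2024-05-01T10:00:05Z", "device": "hvac-sensor-17",
                     "metric": "fw_version_changed", "value": 1}],
    "ueba": [{"ts": "2024-05-01T10:00:07Z", "user": "svc-backup",
              "metric": "login_geo_anomaly", "value": 0.93}],
})
for event in unified:
    print(event)
```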

Real-Time Anomaly Detection with AI

The BetaNews approach emphasizes network visualizations and manual baselines. This is a good start, but you should also consider implementing AI-driven real-time anomaly detection. AI can learn normal network behavior and detect deviations instantly, reducing the time needed to identify and resolve issues.
In 2020, a financial institution implemented AI-driven anomaly detection, which reduced their mean time to resolution (MTTR) by 40% compared to their previous manual baseline approach.
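
To illustrate the idea (this is a sketch, not any particular vendor's product), the snippet below trains scikit-learn's IsolationForest on a day of synthetic "normal" traffic features and then scores new samples; the feature set, contamination rate, and thresholds are assumptions you would tune to your own network.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-minute feature vectors: [bytes_in, bytes_out, active_flows]
rng = np.random.default_rng(42)
baseline = rng.normal(loc=[5e6, 4e6, 300], scale=[5e5, 4e5, 30], size=(1440, 3))

# Learn what "normal" looks like from one day of traffic.
model = IsolationForest(contamination=0.01, random_state=42).fit(baseline)

new_samples = np.array([
    [5.2e6, 3.9e6, 310],   # in line with the baseline
    [9.0e7, 1.0e5, 4500],  # exfiltration-like spike
])
for sample, label in zip(new_samples, model.predict(new_samples)):
    status = "anomaly" if label == -1 else "ok"
    print(status, sample)
```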

Proactive Incident Response

BetaNews does not suggest this, but you should stay ahead of network issues. Develop a proactive incident response strategy that includes automated responses to common issues. This reduces downtime and ensures quicker recovery from incidents. A tech company in 2018 implemented automated incident response for its network; this proactive approach reduced downtime by 30% during network outages.
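
As a minimal sketch of what automated responses to common issues can look like, the snippet below maps well-understood alerts to automated runbooks and falls back to a human-reviewed ticket for everything else. The alert names and remediation functions are hypothetical; in practice they would call your orchestration, configuration management, or ticketing APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Alert:
    name: str
    device: str
    severity: str

def restart_bgp_session(alert: Alert) -> str:
    return f"restarted BGP session on {alert.device}"

def drain_and_failover(alert: Alert) -> str:
    return f"drained traffic from {alert.device} and failed over to standby"

def open_ticket(alert: Alert) -> str:
    return f"opened a ticket for {alert.name} on {alert.device}"

# Only well-understood alerts get automated remediation; the rest go to a human.
RUNBOOKS: Dict[str, Callable[[Alert], str]] = {
    "bgp_session_down": restart_bgp_session,
    "link_saturation": drain_and_failover,
}

def respond(alert: Alert) -> str:
    action = RUNBOOKS.get(alert.name, open_ticket)
    return action(alert)

print(respond(Alert("bgp_session_down", "edge-router-3", "critical")))
print(respond(Alert("dns_latency_high", "resolver-1", "warning")))
```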

Continuous Improvement and Feedback Loops

Establish continuous improvement and feedback loops. Regularly review and update network policies and configurations based on the latest data and trends.
In 2019, a healthcare provider adopted continuous improvement practices for their network observability. This led to a 25% improvement in network performance over a year.

User-Centric Observability

While the BetaNews approach ends with achieving observability, you can go further and focus on user-centric observability. Ensure that the network observability strategy aligns with user experience and business goals, so that the network not only functions well but also supports the overall objectives of the organization.
A global e-commerce company in 2021 shifted their focus to user-centric observability. This alignment with business goals led to a 20% increase in customer satisfaction and a 15% boost in sales.

Common Mistakes in Network Monitoring

While striving for network observability, it’s crucial to be aware of common mistakes that can undermine your efforts:
  • Many teams adopt a reactive stance, addressing threats only after they occur. This can leave networks vulnerable to evolving threats. A proactive approach, constantly updating antivirus and cybersecurity practices, is essential.
  • Focusing solely on devices and neglecting applications leads to incomplete visibility; monitoring both devices and applications ensures a comprehensive view of network performance and potential vulnerabilities.
  • Failing to monitor network logs can result in missed signs of breaches or performance issues. Regular log analysis is crucial for early detection of anomalies (see the sketch after this list).
  • Not anticipating network expansion can lead to scalability issues. Planning for growth ensures that the network can handle increased traffic and new devices.
  • Using outdated tools can leave networks exposed to new types of threats. Regularly updating and upgrading monitoring tools is vital to maintaining robust security.
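
As a small illustration of the log-analysis point above, this sketch counts failed SSH logins per source IP in a syslog-style auth log and flags bursts. The log path, regular expression, and threshold are assumptions to adapt to your own log format and tooling.

```python
import re
from collections import Counter

FAILED_LOGIN = re.compile(r"Failed password for .* from (?P<ip>\d+\.\d+\.\d+\.\d+)")
THRESHOLD = 10  # failures per review window

def failed_logins(log_path: str) -> Counter:
    """Count failed-login lines per source IP."""
    counts: Counter = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = FAILED_LOGIN.search(line)
            if match:
                counts[match.group("ip")] += 1
    return counts

for ip, failures in failed_logins("/var/log/auth.log").items():
    if failures >= THRESHOLD:
        print(f"possible brute force: {ip} had {failures} failed logins")
```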

Conclusion

While the five steps outlined by BetaNews provide a structured approach to network observability, the alternative steps proposed here offer a more comprehensive, proactive, and user-centric strategy. By integrating diverse data sources, leveraging AI, implementing proactive incident response, establishing continuous improvement practices, and focusing on user experience, organizations can achieve a higher level of network observability that not only ensures network performance but also supports business objectives.


OpenTelemetry and eBPF: A Comparative Analysis in Modern Observability

In the realm of observability and application performance monitoring, two technologies have emerged as significant players: OpenTelemetry and eBPF (extended Berkeley Packet Filter). Both offer unique approaches to monitoring, but they operate at different layers of the stack and come with their own sets of strengths and weaknesses.

What is OpenTelemetry?

OpenTelemetry is an open-source observability framework that provides a standardized way to collect telemetry data from applications. It includes a collection of APIs, SDKs, and tools designed to capture traces, logs, and metrics from distributed systems. The primary goal of OpenTelemetry is to offer a vendor-neutral solution for observability, making it easier for organizations to monitor their applications using a consistent approach.
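
To show what this looks like in practice, here is a minimal Python sketch using the OpenTelemetry SDK: it registers a tracer provider, exports spans to the console (an OTLP exporter would be used against a real backend), and wraps a unit of work in nested spans. The service and span names are illustrative.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Register a tracer provider; swap ConsoleSpanExporter for an OTLP exporter
# when sending to a real observability backend.
provider = TracerProvider(resource=Resource.create({"service.name": "checkout"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

def handle_order(order_id: str) -> None:
    # Each span records timing and attributes for one unit of work.
    with tracer.start_as_current_span("handle_order") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("charge_card"):
            pass  # payment logic would go here

handle_order("A-1001")
```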

Strengths of OpenTelemetry:
  • Standardization: Provides a unified, vendor-neutral way to gather observability data.
  • Comprehensive: Covers logs, metrics, and traces, offering a broad scope of monitoring capabilities.
  • Integration: Supports a wide range of integrations with existing tools and platforms.
Weaknesses of OpenTelemetry:
  • Performance Overhead: Can introduce significant performance overhead, especially in high-traffic environments.
  • Complexity: The broad scope and numerous features can make it complex and challenging to implement effectively.
  • Feature Creep: The addition of features to accommodate various enterprise needs has led to bloat and inefficiency.

What is eBPF?

eBPF is a technology that allows programs to run in the Linux kernel without modifying the kernel source code. It is used for a variety of purposes, including observability, security, and networking. eBPF programs can collect data directly from the operating system, providing real-time, low-overhead insights into system behavior.
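
For a flavor of eBPF-based observability, the sketch below uses the BCC toolkit's Python bindings to attach a kprobe to tcp_v4_connect and count TCP connection attempts per process inside the kernel, reading the results from user space afterwards. It assumes a Linux host with BCC installed and root privileges; exact helper names can vary between BCC and kernel versions.

```python
from time import sleep
from bcc import BPF

# The eBPF program runs in the kernel: it counts connect() calls per PID
# in a hash map, so only aggregated data crosses into user space.
program = r"""
BPF_HASH(connect_count, u32, u64);

int trace_connect(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    connect_count.increment(pid);
    return 0;
}
"""

b = BPF(text=program)
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")

print("Counting TCP connection attempts per PID for 10 seconds...")
sleep(10)
for pid, count in b["connect_count"].items():
    print(f"pid={pid.value} connects={count.value}")
```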

Strengths of eBPF:
  • Low Overhead: Runs in the kernel, resulting in minimal performance impact.
  • Real-Time Monitoring: Provides real-time insights into system behavior and performance.
  • Security: eBPF programs are sandboxed and must pass validation checks, enhancing security.
Weaknesses of eBPF:
  • Complexity: Requires deep knowledge of the Linux kernel and eBPF programming.
  • Limited Adoption: Still relatively niche compared to more established observability tools.
  • Kernel Dependency: Only works on Linux-based systems, limiting its applicability in heterogeneous environments.

When to Use OpenTelemetry vs. eBPF

Use OpenTelemetry When:
  • You need a standardized, vendor-neutral way to collect observability data across a wide range of applications and services.
  • You require comprehensive monitoring that includes logs, metrics, and traces.
  • You are looking for a solution that integrates well with existing observability tools and platforms.
Use eBPF When:
  • You need real-time, low-overhead monitoring directly from the operating system.
  • You are focused on performance and security, and can leverage the advanced capabilities of eBPF.
  • Your environment is primarily Linux-based, and you have the expertise to implement and manage eBPF programs.

Conclusion

Both OpenTelemetry and eBPF offer valuable capabilities for modern observability, but they serve different purposes and operate at different layers of the stack. OpenTelemetry provides a comprehensive, standardized approach to collecting observability data, while eBPF offers real-time, low-overhead insights directly from the kernel. Understanding the strengths and weaknesses of each can help organizations choose the right tool for their specific needs.


The Impact of AWS’s Native Kubernetes Network Policies on K8s-Based Operations, DevOps, and Developers

AWS has announced the introduction of native Kubernetes Network Policies for Amazon Elastic Kubernetes Service (EKS), a significant enhancement that promises to streamline network security management for Kubernetes clusters. This new feature is poised to have a profound impact on typical Kubernetes (K8s)-based operations, DevOps practices, and developers. Let’s explore how this development will shape the landscape.

Enhanced Security and Compliance

One of the most immediate benefits of AWS’s native Kubernetes Network Policies is the enhanced security it brings to Kubernetes clusters. Network policies allow administrators to define rules that control the traffic flow between pods, ensuring that only authorized communication is permitted. This granular control is crucial for maintaining a secure environment, especially in multi-tenant clusters where different applications and services coexist.

For DevOps teams, this means a significant reduction in the complexity of managing network security. Previously, implementing network policies often required third-party solutions or custom configurations, which could be cumbersome and error-prone. With native support from AWS, teams can now leverage built-in tools to enforce security policies consistently across their clusters.

Simplified Operations

The introduction of native network policies simplifies the operational aspects of managing Kubernetes clusters. By integrating network policy enforcement directly into the AWS ecosystem, administrators can now manage security settings through familiar AWS interfaces and tools. This integration reduces the learning curve and operational overhead associated with third-party network policy solutions.

For typical K8s-based operations, this means more streamlined workflows and fewer dependencies on external tools. Operations teams can focus on optimizing cluster performance and reliability, knowing that network security is robustly managed by AWS’s native capabilities.

Improved Developer Productivity

Developers stand to benefit significantly from the introduction of native Kubernetes Network Policies. With security policies managed at the infrastructure level, developers can concentrate on building and deploying applications without worrying about the intricacies of network security. This separation of concerns allows for faster development cycles and more efficient use of resources.

Moreover, the ability to define and enforce network policies programmatically aligns well with modern DevOps practices. Developers can include network policy definitions as part of their infrastructure-as-code (IaC) scripts, ensuring that security configurations are version-controlled and consistently applied across different environments.
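
As a sketch of that infrastructure-as-code angle, the snippet below uses the official kubernetes Python client to declare a policy that allows only frontend pods to reach payment pods on a single port. The namespace, labels, and port are made up for illustration, and it presumes that network policy enforcement has been enabled for the EKS cluster (AWS implements it in the VPC CNI); the client call itself is standard Kubernetes, not an AWS-specific API.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

# Allow ingress to pods labeled app=payments only from pods labeled app=frontend.
policy = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="allow-frontend-to-payments"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(match_labels={"app": "payments"}),
        policy_types=["Ingress"],
        ingress=[
            client.V1NetworkPolicyIngressRule(
                _from=[client.V1NetworkPolicyPeer(
                    pod_selector=client.V1LabelSelector(match_labels={"app": "frontend"})
                )],
                ports=[client.V1NetworkPolicyPort(protocol="TCP", port=8443)],
            )
        ],
    ),
)

client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="payments", body=policy
)
print("NetworkPolicy applied: only frontend pods may reach payments on 8443/TCP")
```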

Key Impacts on DevOps Practices

1. Automated Security Enforcement: DevOps teams can automate the enforcement of network policies using AWS tools and services, ensuring that security configurations are applied consistently across all stages of the CI/CD pipeline.
2. Enhanced Monitoring and Auditing: With native support, AWS provides integrated monitoring and auditing capabilities, allowing teams to track policy compliance and detect potential security breaches in real-time.
3. Seamless Integration with AWS Services: The native network policies are designed to work seamlessly with other AWS services, such as AWS Identity and Access Management (IAM) and AWS CloudTrail, providing a comprehensive security framework for Kubernetes clusters.

Challenges and Considerations

While the introduction of native Kubernetes Network Policies offers numerous benefits, it also presents certain challenges. Teams must ensure that they are familiar with the new features and best practices for implementing network policies effectively. Additionally, there may be a need for initial investment in training and updating existing infrastructure to leverage the new capabilities fully.

Conclusion

AWS’s introduction of native Kubernetes Network Policies marks a significant advancement in the management of Kubernetes clusters. By enhancing security, simplifying operations, and improving developer productivity, this new feature is set to transform typical K8s-based operations and DevOps practices. As organizations adopt these native capabilities, they can expect to see more streamlined workflows, robust security enforcement, and accelerated development cycles.

What are your thoughts on this new feature? How do you think it will impact your current Kubernetes operations?


Comparing New Relic’s New AI-Driven Digital Experience Monitoring Solution with Datadog

In the ever-evolving landscape of digital experience monitoring, two prominent players have emerged with innovative solutions: New Relic and Datadog. Both companies aim to enhance user experiences and optimize digital interactions, but they approach the challenge with different strategies and technologies. Let’s dive into what sets them apart.

New Relic’s AI-Driven Digital Experience Monitoring Solution

New Relic recently launched its fully-integrated, AI-driven Digital Experience Monitoring (DEM) solution, which promises to revolutionize how businesses monitor and improve their digital interactions. Here are some key features:

1. AI Integration: New Relic’s solution leverages artificial intelligence to provide real-time insights into user interactions across all applications, including AI applications. This helps identify incorrect AI responses and user friction points, ensuring a seamless user experience.
2. Comprehensive Monitoring: The platform offers end-to-end visibility, allowing businesses to monitor real user interactions and proactively resolve issues before they impact the end user.
3. User Behavior Analytics: By combining website performance monitoring, user behavior analytics, real user monitoring (RUM), session replay, and synthetic monitoring, New Relic provides a holistic view of the digital experience.
4. Proactive Issue Resolution: Real-time data on application performance and user interactions enable proactive identification and resolution of issues, moving from a reactive to a proactive approach.

Datadog’s Offerings

Datadog focuses on providing comprehensive monitoring solutions for infrastructure, applications, logs, and more. Here are some highlights:

1. Unified Monitoring: Datadog offers a unified platform that aggregates metrics and events across the entire DevOps stack, providing visibility into servers, clouds, applications, and more.
2. End-to-End User Experience Monitoring: Datadog provides tools for monitoring critical user journeys, capturing user interactions, and detecting performance issues with AI-powered, self-maintaining tests.
3. Scalability and Performance: Datadog’s solutions are designed to handle large-scale applications with high performance and low latency, ensuring that backend systems can support seamless digital experiences.
4. Security and Compliance: With enterprise-grade security features and compliance with industry standards, Datadog ensures that data is protected and managed securely.

Key Differences

While both New Relic and Datadog aim to enhance digital experiences, their approaches and focus areas differ significantly:

• Focus Area: New Relic is primarily focused on monitoring and improving the front-end user experience, while Datadog provides comprehensive monitoring across the entire stack, including infrastructure and applications.

• Technology: New Relic leverages AI to provide real-time insights and proactive issue resolution, whereas Datadog focuses on providing scalable and secure monitoring solutions.

• Integration: New Relic’s solution integrates various monitoring tools to provide a comprehensive view of the digital experience, while Datadog offers a unified platform that aggregates metrics and events across the full DevOps stack.

Conclusion

Both New Relic and Datadog offer valuable solutions for enhancing digital experiences, but they cater to different aspects of the digital ecosystem. New Relic’s AI-driven DEM solution is ideal for businesses looking to proactively monitor and improve user interactions, while Datadog’s robust monitoring offerings provide comprehensive visibility across infrastructure and applications. By leveraging the strengths of both platforms, businesses can ensure a seamless and optimized digital presence.

What do you think about these new offerings? Do you have a preference for one over the other?


Network Monitoring for Cloud-Connected IoT Devices

One of the emerging trends in network monitoring is the integration of cloud computing and Internet of Things (IoT) devices. Cloud computing refers to the delivery of computing services over the internet, such as storage, processing, and software. IoT devices are physical objects that are connected to the internet and can communicate with other devices or systems. Examples of IoT devices include smart thermostats, wearable devices, and industrial sensors.

Cloud-connected IoT devices pose new challenges and opportunities for network monitoring. On one hand, cloud computing enables IoT devices to access scalable and flexible resources and services, such as data analytics and artificial intelligence. On the other hand, cloud computing introduces additional complexity and risk to the network, such as latency, bandwidth consumption, and security threats.

Therefore, network monitoring for cloud-connected IoT devices requires a comprehensive and proactive approach that can address the following aspects:

  • Visibility: Network monitoring should provide a clear and complete view of the network topology, status, and performance of all the devices and services involved in the cloud-IoT ecosystem. This includes not only the physical devices and connections, but also the virtual machines, containers, and microservices that run on the cloud platform. Network monitoring should also be able to detect and identify any anomalies or issues that may affect network functionality or quality (a minimal device-polling sketch follows this list).
  • Scalability: Network monitoring should be able to handle the large volume and variety of data generated by cloud-connected IoT devices. This requires a scalable and distributed architecture that can collect, store, process, and analyze data from different sources and locations. Network monitoring should also leverage cloud-based technologies, such as big data analytics and machine learning, to extract meaningful insights and patterns from the data.
  • Security: Network monitoring should ensure the security and privacy of the network and its data. This involves implementing appropriate encryption, authentication, authorization, and auditing mechanisms to protect the data in transit and at rest. Network monitoring should also monitor and alert on any potential or actual security breaches or attacks that may compromise the network or its data.
  • Automation: Network monitoring should automate as much as possible the tasks and processes involved in network management. This includes using automation tools and scripts to configure, deploy, update, and troubleshoot network devices and services. Network monitoring should also use automation techniques, such as artificial intelligence and machine learning, to perform predictive analysis, anomaly detection, root cause analysis, and remediation actions.
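
As a small visibility example, the sketch below uses the classic pysnmp high-level API to poll a hypothetical IoT gateway for its uptime over SNMP v2c; an unreachable device or an SNMP error would be fed into alerting. The target address and community string are placeholders, and newer pysnmp releases also expose an asyncio variant of this API.

```python
from pysnmp.hlapi import (
    SnmpEngine, CommunityData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, getCmd,
)

# Poll sysUpTime from a (hypothetical) IoT gateway over SNMP v2c.
error_indication, error_status, error_index, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),
        UdpTransportTarget(("192.0.2.10", 161), timeout=2, retries=1),
        ContextData(),
        ObjectType(ObjectIdentity("SNMPv2-MIB", "sysUpTime", 0)),
    )
)

if error_indication:
    print(f"device unreachable: {error_indication}")  # raise an availability alert
elif error_status:
    print(f"SNMP error: {error_status.prettyPrint()}")
else:
    for name, value in var_binds:
        print(f"{name.prettyPrint()} = {value.prettyPrint()}")
```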

Solutions for Network Monitoring for Cloud-Connected IoT Devices

There are many solutions available for network monitoring of cloud-connected IoT devices. Some are native to cloud platforms or specific IoT platforms, while others are third-party or open-source. Some are specialized for certain aspects or layers of network monitoring, while others are comprehensive, integrated solutions. Notable examples include:

  • Domotz: Domotz is a cloud-based network and endpoint monitoring platform that also provides system management functions. This service is capable of monitoring security cameras as well as network devices and endpoints. Domotz can monitor cloud-connected IoT devices using SNMP or TCP protocols. It can also integrate with various cloud platforms such as AWS, Azure, and GCP.
  • Splunk Industrial for IoT: Splunk Industrial for IoT is a solution that provides end-to-end visibility into industrial IoT systems.  Splunk Industrial for IoT can collect and analyze data from various sources such as sensors, gateways, and cloud services. Splunk Industrial for IoT can also provide dashboards, alerts, and insights into the performance, health, and security of cloud-connected IoT devices.
  • Datadog IoT Monitoring: Datadog IoT Monitoring is a solution that provides comprehensive observability for cloud-connected IoT devices. Datadog IoT Monitoring can collect and correlate metrics, logs, traces, and events from various sources such as sensors, gateways, cloud services. Datadog IoT Monitoring can also provide dashboards, alerts, and insights into the performance, health, and security of cloud-connected IoT devices.
  • Senseye PdM: Senseye PdM is a solution that provides predictive maintenance for industrial IoT systems. Senseye PdM can collect and analyze data from various sources such as sensors, gateways, and cloud services. Senseye PdM can also provide  dashboards, alerts, and insights into the condition, performance, and reliability of cloud-connected IoT devices.
  • SkySpark: SkySpark is a solution that provides analytics and automation for smart systems. SkySpark can collect and analyze data from various sources such as sensors, gateways, and cloud services. SkySpark can also provide dashboards, alerts, and insights into the performance, efficiency, and optimization of cloud-connected IoT devices.

Network monitoring for cloud-connected IoT devices is a vital and challenging task that requires a holistic and adaptive approach. Network monitoring can help to optimize the performance, reliability, and security of the network and its components. Network monitoring can also enable new capabilities and benefits for cloud-IoT applications, such as enhanced user experience, improved operational efficiency, and reduced costs.


Cloud Native Security: Cloud Native Application Protection Platforms

Back in 2022, 77% of interviewed CIOs stated that their IT environment is constantly changing. We can only guess that, were the respondents asked today, this number would be 90% or higher. Detecting flaws and security vulnerabilities becomes more and more challenging in 2023, as the complexity of a typical software deployment keeps increasing sharply year over year. The relatively new trend of Cloud Native Application Protection Platforms (CNAPP) is now supported by the majority of cybersecurity companies, which offer CNAPP solutions for cloud and on-prem deployments.

The rapid growth of CNAPP is driven by cybersecurity threats, with misconfiguration among the most commonly reported causes of security breaches and data loss. As workloads and data move to the cloud, the required skill sets of IT and DevOps teams must also become much more specialized. The likelihood of an unintentional misconfiguration increases because most seasoned IT workers still have more expertise and training on-prem than in the cloud. In contrast, a young “cloud-native” DevOps professional often has very little knowledge of “traditional” security, such as network segmentation or firewall configuration, which also tends to result in configuration errors.

Some CNAPP vendors are proud to be “agentless,” eliminating the need to install and manage agents, which can cause various issues, from machine overload to agent vulnerabilities caused by security flaws and, guess what, by agent misconfiguration. Agentless monitoring has its benefits, but it is not free of risk. Any monitored device must be “open” to such monitoring, which typically comes from a remote server. If an adversary is able to spoof a monitoring attempt, they can gain access to all the monitored devices and compromise the entire network. So an “agentless CNAPP” is not automatically a better solution than a competing security platform. Easier for IT staff to maintain? Yes, it is. More secure? Probably not.


Predictive Networks: is it the Future?

Post-chatGPT Update as of May 26th, 2023:
Cisco and their EVP Liz Centoni have probably never been so wrong before in their useless predictions!

“Predictive Network” is a cool term, but it comes down to things that Cisco EVP Liz Centoni no longer considers cool or trending: Artificial Intelligence (AI) and Machine Learning (ML), which collect and analyze millions of network events and deliver problem-solving solutions. AI-based Predictive Networks, which, by the way, are one of Liz’s 2023 “trend” predictions, contradict her statement that

The cloud and AI are no longer frontiers

Obviously, Cisco’s EVP and Chief Strategy Officer Centoni refers to Cisco’s own Predictive Network product, which, quoting Cisco itself,

 rely on a predictive engine in charge of computing (statistical, machine learning) models of the network using several telemetry sources

So how exactly is AI “no longer the frontier,” Liz, if machine learning powers the Predictive Networks that you predict will become a 2023 trend?


Full Stack IT Observability Will Drive Business Performance in 2023

Cisco predicts that 2023 will be shaped by a few exciting trends in technology, including network observability with business correlation. Cisco’s EVP & Chief Strategy Officer Liz Centoni is sure that

To survive and thrive, companies need to be able to tie data insights derived from normal IT operations directly to business outcomes or risk being overtaken by more innovative competitors

and we cannot agree more.

Proper intelligent monitoring of digital assets, along with distributed tracing, should be tightly connected to the business context of the enterprise. That way, any organization can benefit from actionable business insights while improving the online and digital user experience for customers, employees, and contractors. Additionally, fast IT response based on AI-driven analysis of monitored and collected network and asset events can prevent, or at least quickly remediate, the most common security threat in nearly any modern digital organization: misconfiguration. 79% of firms have already experienced a data breach in the past two years, and 67% of them pointed to security misconfiguration as the main reason.

Misconfiguration of most software products can be detected and fixed in a timely manner when network events and configuration files are collected and analyzed with machine learning by network observability and network monitoring tools. An enterprise should require its IT department to reach full stack observability and connect the results with the business context. This is particularly important since we know that 99% of cloud security failures are the customer’s mistakes (source: Gartner). Business context should be widely adopted as part of the results delivered by intelligent observability and cybersecurity solutions.


Observability and Protection for Cloud Native Applications

Banks and other financial institutions are moving to the cloud. It is a slow process, but the trend is here. Cloud computing business models give financial organizations the flexibility to deploy pay-as-you-go cloud services. Furthermore, the cloud comes with built-in scalability, so businesses can react to market changes quickly. Pay-as-you-go infrastructure drastically reduces costs for banks and financial services institutions (BFSI), but then other questions arise. The first of these is “Is it secure to move my data and services to the cloud?” This is where network observability and AI-based network monitoring can help, particularly because financial institutions need to comply with regulations such as PIPEDA.

A MarketsandMarkets report predicts that the market for cloud-native protection platforms will reach $19.3 billion by 2027, more than double the $7.8 billion the firm estimated for 2022. BFSI and other enterprises are moving to the cloud. This requires intelligent network observability and security solutions based on artificial intelligence and machine learning, so rapid market growth at a 19.9% CAGR over 2022-2027 seems a very reasonable assumption. Today, AI-based observability and security solutions analyze hundreds of thousands of events a day. We should expect the next generation of these solutions to create and analyze a few orders of magnitude more events daily, scaling up to tens or hundreds of millions of events a day for an average cloud-based BFSI organization. The report names a few market leaders, among them Check Point (Israel), Trend Micro (Japan), Palo Alto Networks (US), CrowdStrike (US), Fortinet (US), Forcepoint (US), Proofpoint (US), Radware (Israel), and Zscaler (US).


Cloud Monitoring Market Size Estimations

According to a marketing study, the global IT infrastructure monitoring market is expected to grow at a 13.6% CAGR, reaching USD 64.5 billion by 2031. Modern IT infrastructure is becoming increasingly complex and requires new skills from IT personnel, often blurring the borders between IT staff, DevOps, and development teams. With the continued move from on-prem deployments to the enterprise cloud, IT infrastructure is moving to the cloud as well, and IT teams have to learn basic cloud-DevOps skills, such as scripting, cloud-based scaling, event creation, and monitoring. Furthermore, no company today offers a complete monitoring solution that can monitor every network device and software component.

Thus, IT teams have to build their monitoring solutions piece by piece, using various, mostly unconnected systems developed by different, often competing vendors. For some organizations, it also comes down to compliance, such as GDPR or ISO requirements, and to SLAs that obligate the IT department to detect, report, and fix any issue with their systems in a timely manner. In this challenging multi-system and multi-device environment, network observability becomes the key to enterprise success. IT organizations keep increasing their budgets, seeking to reach comprehensive cloud and on-prem monitoring for their systems and devices, and require employees to run network and device monitoring software on their personal devices, such as mobile phones and laptops. This trend also increases IT spend on cybersecurity solutions such as SDR and network security analysis with various SIEM tools.
