AWS vs Azure: Serverless Observability and Monitoring

Serverless computing is a cloud service model that allows developers to run code without provisioning or managing servers. Serverless applications are composed of functions that are triggered by events and run on demand. Serverless computing offers many benefits, such as scalability, performance, cost-efficiency, and agility.

However, serverless computing also introduces new challenges for observability and monitoring. Observability is the ability to measure and understand the internal state of a system based on the external outputs. Monitoring is the process of collecting, analyzing, and alerting on the metrics and logs that indicate the health and performance of a system.

Observability and monitoring are essential for serverless applications because they help developers troubleshoot issues, optimize performance, ensure reliability, and improve user experience. However, serverless applications are more complex and dynamic than traditional applications, making them harder to observe and monitor.

Some of the challenges of serverless observability and monitoring are:

  • Lack of visibility: Serverless functions are ephemeral and stateless, meaning they are created and destroyed on demand, and do not store any data or context. This makes it difficult to track the execution flow and dependencies of serverless functions across multiple services and platforms.
  • High cardinality: Serverless functions can have many variations based on input parameters, environment variables, configuration settings, and runtime versions. This creates a high cardinality of metrics and logs that need to be collected and analyzed.
  • Distributed tracing: Serverless functions can be triggered by various sources, such as HTTP requests, messages, events, timers, or other functions. This creates a distributed tracing problem, where developers need to correlate the traces of serverless functions across different sources and services.
  • Cold starts: Serverless functions can experience cold starts, which are delays in the execution time caused by the initialization of the function code and dependencies. Cold starts can affect the performance and availability of serverless applications, especially for latency-sensitive scenarios.
  • Cost optimization: Serverless functions are billed based on the number of invocations and the execution time. Therefore, developers need to monitor the usage and cost of serverless functions to optimize their resource allocation and avoid overspending. A worked cost estimate follows this list.
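
To make the cost point concrete, here is a minimal back-of-the-envelope sketch in Python. The pricing constants are assumptions based on AWS Lambda's published x86 rates at the time of writing (about $0.20 per million requests and $0.0000166667 per GB-second); actual rates vary by region, architecture, and platform.

```python
# Rough Lambda cost estimate: request charge + compute (GB-second) charge.
# Pricing constants are assumptions (us-east-1 x86 rates circa 2023); check current pricing.
PRICE_PER_MILLION_REQUESTS = 0.20      # USD
PRICE_PER_GB_SECOND = 0.0000166667     # USD

def estimate_lambda_cost(invocations, avg_duration_ms, memory_mb):
    """Estimate monthly cost for a single function (ignores the free tier)."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = (invocations / 1_000_000) * PRICE_PER_MILLION_REQUESTS
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + compute_cost

# Example: 10M invocations/month, 200 ms average duration, 512 MB memory
# -> 1,000,000 GB-seconds ≈ $16.67 compute + $2.00 requests ≈ $18.67/month
print(f"${estimate_lambda_cost(10_000_000, 200, 512):.2f}")
```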

AWS and Azure are two of the leading cloud providers that offer serverless computing services. AWS Lambda is the serverless platform of AWS, while Azure Functions is the serverless platform of Azure. Both platforms provide observability and monitoring features for serverless applications, but they also have some differences and limitations.

In this article, we will compare AWS Lambda and Azure Functions in terms of their observability and monitoring capabilities, including their native features and third-party software reviews and recommendations.

Native Features

Both AWS Lambda and Azure Functions provide native features for observability and monitoring serverless applications. These features include:

  • Metrics: Both platforms collect and display metrics such as invocations, errors, duration, memory usage, concurrency, and throughput for serverless functions. These metrics can be viewed on dashboards or queried using APIs or CLI tools. Metrics can also be used to create alarms or alerts based on predefined thresholds or anomalies. A minimal query sketch follows this list.
  • Logs: Both platforms capture and store logs for serverless functions. These logs include information such as start and end time, request ID, status code, error messages, custom print statements, etc. Logs can be viewed on consoles or queried using APIs or CLI tools. Logs can also be streamed or exported to external services for further analysis or retention.
  • Tracing: Both platforms support distributed tracing for serverless functions. Distributed tracing allows developers to track the execution flow and latency of serverless functions across different sources and services. Tracing can help identify bottlenecks, errors, failures, or performance issues in serverless applications.
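
As an illustration of the metrics capability described above, the sketch below uses Python and boto3 to pull invocation and error counts for a Lambda function from CloudWatch; the function name and time window are placeholders.

```python
import boto3
from datetime import datetime, timedelta, timezone

# Query CloudWatch for a Lambda function's invocation and error counts
# over the last hour, at 1-minute granularity. "my-function" is a placeholder.
cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

for metric in ("Invocations", "Errors"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric,
        Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=60,                 # seconds
        Statistics=["Sum"],
    )
    total = sum(dp["Sum"] for dp in stats["Datapoints"])
    print(f"{metric}: {total:.0f} in the last hour")
```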

Both platforms use open standards such as OpenTelemetry or W3C Trace Context for tracing. However, there are also some differences between AWS Lambda and Azure Functions in terms of their native features for observability and monitoring.

Some of these differences are:

  • Metrics granularity: AWS Lambda provides metrics at a 1-minute granularity by default, while Azure Functions provides metrics at a 5-minute granularity by default. However, both platforms allow users to change the granularity to a finer or coarser resolution depending on their needs.
  • Metrics aggregation: AWS Lambda aggregates metrics by function name, function version or alias (if specified), region (if specified), or globally (across all regions). Azure Functions aggregates metrics by function name (or function app name), region (if specified), or globally (across all regions).
  • Logs format: AWS Lambda logs are formatted as plain text with a timestamp prefix. Azure Functions logs are formatted as JSON objects with various fields such as timestamp, level, message, category, functionName, invocationId, etc.
  • Logs retention: AWS Lambda logs are stored in the Amazon CloudWatch Logs service and are retained indefinitely by default (unless users configure a shorter retention period). Azure Functions logs are stored in the Azure Monitor service for 30 days by default (or longer if specified by users).
  • Tracing integration: AWS Lambda integrates with the AWS X-Ray service for tracing, while Azure Functions integrates with the Azure Application Insights service. Both provide a web console and an API for viewing traces and analyzing the performance of serverless applications on their respective platforms. A minimal sketch of enabling active X-Ray tracing follows this list.
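
On the AWS side, the sketch below uses boto3 to turn on active X-Ray tracing for an existing function; the function name is a placeholder, and the Azure equivalent is configured through the Application Insights settings of the function app rather than through code like this.

```python
import boto3

# Enable active X-Ray tracing on an existing Lambda function.
# "my-function" is a placeholder; the function's execution role also needs
# permission to publish trace segments (e.g. the AWSXRayDaemonWriteAccess policy).
lambda_client = boto3.client("lambda")

response = lambda_client.update_function_configuration(
    FunctionName="my-function",
    TracingConfig={"Mode": "Active"},   # "PassThrough" is the default
)
print(response["TracingConfig"])
```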


Cloud Native Security: Cloud Native Application Protection Platforms

Back in 2022, 77% of interviewed CIOs stated that their IT environment is constantly changing. We can only guess that, were the respondents asked today, this number would be as high as 90% or more. Detecting flaws and security vulnerabilities becomes more and more challenging in 2023, since the complexity of a typical software deployment keeps increasing exponentially year over year. The relatively new category of Cloud Native Application Protection Platforms (CNAPP) is now supported by the majority of cybersecurity companies, which offer CNAPP solutions for cloud and on-prem deployments.

The rapid growth of CNAPP is driven by cybersecurity threats, with misconfiguration among the most frequently reported causes of security breaches and data loss. As workloads and data move to the cloud, the required skill sets of IT and DevOps teams must also become much more specialized. The likelihood of an unintentional misconfiguration is increased because the majority of seasoned IT workers still have more expertise and training on-prem than in the cloud. In contrast, a young “cloud-native” DevOps professional often has very little knowledge of “traditional” security like network segmentation or firewall configuration, which typically results in configuration errors.

Some CNAPPs are proud to be “agentless”, eliminating the need to install and manage agents that can cause various issues, from machine overload to agent vulnerabilities due to security flaws and, guess what, due to the agent’s own misconfiguration. Agentless monitoring has its benefits, but it is not free of risks. Any monitored device must be “open” to such monitoring, which typically comes from a remote server. If an adversary is able to fake a monitoring attempt, they can easily gain access to all the monitored devices and compromise the entire network. So an “agentless CNAPP” does not automatically mean a better solution than a competing security platform. Easier for maintenance by IT staff? Yes, it is. Is it more secure? Probably not.


Machine Learning for Network Security, Detection and Response

Cybersecurity is the defense mechanism used to prevent malicious attacks on computers and electronic devices. As technology becomes more advanced, detecting malicious activities and flaws in computer networks requires increasingly sophisticated skills. This is where machine learning can help.

Machine learning is a subset of artificial intelligence that uses algorithms and statistical analysis to draw inferences about a computer’s behavior. It can help organizations address new security challenges, such as scaling up security solutions, detecting unknown and advanced attacks, and identifying trends and anomalies. Machine learning can also help defenders more accurately detect and triage potential attacks, but it may bring new attack surfaces of its own.

Machine learning can be used to detect malware in encrypted traffic, find insider threats, predict “bad neighborhoods” online, and protect data in the cloud by uncovering suspicious user behavior. However, machine learning is not a silver bullet for cybersecurity. It depends on the quality and quantity of the data used to train the models, as well as the robustness and adaptability of the algorithms.

A common challenge faced by machine learning in cybersecurity is dealing with false positives, which are benign events that are mistakenly flagged as malicious. False positives can overwhelm analysts and reduce their trust in the system. To overcome this challenge, machine learning models need to be constantly updated and validated with new data and feedback.

Another challenge is detecting unknown or zero-day attacks, which are exploits that take advantage of vulnerabilities that have not been discovered or patched yet. Traditional security solutions based on signatures or rules may not be able to detect these attacks, as they rely on prior knowledge of the threat. Machine learning can help to discover new attack patterns or adversary behaviors by using techniques such as anomaly detection, clustering, or reinforcement learning.

Anomaly detection is the process of identifying events or observations that deviate from the normal or expected behavior of the system. For example, machine learning can detect unusual network traffic, login attempts, or file modifications that may indicate a breach.
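
As a minimal sketch of the anomaly-detection idea, assuming scikit-learn is available, the snippet below fits an Isolation Forest on simple per-session features (bytes sent, duration, failed logins are invented placeholders) and flags outliers; a production detector would use far richer features and careful validation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy feature matrix: one row per network session,
# columns = [bytes_sent_kb, duration_s, failed_logins] (placeholder features).
rng = np.random.default_rng(42)
normal = rng.normal(loc=[50, 30, 0], scale=[10, 5, 0.3], size=(500, 3))
suspicious = np.array([[900, 2, 12], [850, 1, 9]])   # bursts with many failed logins
sessions = np.vstack([normal, suspicious])

# Fit an unsupervised outlier detector on the observed traffic.
detector = IsolationForest(contamination=0.01, random_state=0)
labels = detector.fit_predict(sessions)              # -1 = anomaly, 1 = normal

print("Flagged sessions:", np.where(labels == -1)[0])
```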

Clustering is the process of grouping data points based on their similarity or proximity. For example, machine learning can cluster malicious domains or IP addresses based on their features or activities, and flag them as “bad neighborhoods” online.
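
A minimal clustering sketch, again with scikit-learn and invented features (queries per day, distinct resolvers, domain age): DBSCAN groups domains whose behavior looks alike, and small, dense clusters of look-alike domains can then be reviewed as potential “bad neighborhoods”.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Placeholder features per domain: [queries_per_day, distinct_resolvers, domain_age_days]
domains = np.array([
    [12000, 800, 2100],   # popular, long-established domain
    [11500, 760, 1900],
    [300, 5, 3],          # young, rarely resolved domains (possible DGA cluster)
    [280, 4, 2],
    [310, 6, 4],
])

# Scale the features, then group domains by behavioral similarity.
scaled = StandardScaler().fit_transform(domains)
clusters = DBSCAN(eps=0.5, min_samples=2).fit_predict(scaled)

print("Cluster labels per domain:", clusters)   # -1 means noise / unclustered
```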

Reinforcement learning is the process of learning by trial and error, aiming to maximize a cumulative reward. For example, machine learning can learn to optimize the defense strategy of a system by observing the outcomes of different actions and adjusting accordingly.
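
For the reinforcement-learning angle, here is a deliberately tiny, one-step, bandit-style simplification of Q-learning with a made-up two-state environment (normal vs. under attack) and three defense actions; a real system would need a far richer state space and a carefully designed reward.

```python
import numpy as np

# Toy defense environment: states, actions, and rewards are invented for illustration.
states = ["normal", "under_attack"]
actions = ["monitor", "rate_limit", "block_ip"]

# Reward table (state, action): blocking during an attack pays off,
# blocking benign traffic is penalized. Values are arbitrary.
rewards = np.array([[ 1.0, -0.5, -2.0],
                    [-2.0,  0.5,  3.0]])

q = np.zeros((len(states), len(actions)))
rng = np.random.default_rng(0)
alpha, epsilon, episodes = 0.1, 0.2, 5000

for _ in range(episodes):
    s = rng.integers(len(states))
    # Epsilon-greedy action selection, then a one-step Q update toward the observed reward.
    a = rng.integers(len(actions)) if rng.random() < epsilon else int(np.argmax(q[s]))
    q[s, a] += alpha * (rewards[s, a] - q[s, a])

for s, name in enumerate(states):
    print(f"{name}: best learned action = {actions[int(np.argmax(q[s]))]}")
```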

Machine learning can also leverage statistics, time, and correlation-based detections to enhance its performance. These indicators can help to reduce false positives, identify causal relationships, and provide context for the events. For example, machine learning can use statistical methods to calculate the probability of an event being malicious based on its frequency or distribution. It can also use temporal methods to analyze the sequence or duration of events and detect anomalies or patterns. Furthermore, it can use correlation methods to link events across different sources or domains and reveal hidden connections or dependencies.
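
A minimal statistical sketch of the frequency idea: with plain numpy, compute a z-score for today’s event count against the historical distribution for that event type, and flag it when the deviation is large; the counts are invented.

```python
import numpy as np

# Daily counts of failed-login events over the past weeks (invented data).
history = np.array([31, 28, 35, 30, 29, 33, 27, 32, 30, 31, 28, 34])
today = 96

# Flag today's count if it deviates strongly from the historical distribution.
mean, std = history.mean(), history.std(ddof=1)
z_score = (today - mean) / std

print(f"z-score = {z_score:.1f}")
if z_score > 3:
    print("Unusual spike in failed logins: raise an alert and correlate with other sources")
```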

Machine learning is a powerful tool for cybersecurity, but it also requires careful design, implementation, and evaluation. It is not a one-size-fits-all solution, but rather a complementary approach that can augment human intelligence and expertise. Machine learning can help to properly navigate the digital ocean of incoming security events, particularly where 90% of them are false positives. The need for real-time security stream processing is now bigger than ever.


Gartner: “it is the user, not the cloud provider” who causes data breaches

Gartner’s recommendations on cloud computing strategy open a rightful discussion on the roles and responsibilities of the different actors involved in cloud security. How many security and data breaches happen due to Cloud Service Provider (CSP) flaws, and how many of them are caused by CSPs’ customers and the human beings dealing with the cloud on a daily basis? Gartner predicts that through 2025, 99% of cloud security failures will be the customer’s fault. Such a prediction can only be based on current numbers, which evidently show that the vast majority of breaches stem from CSP clients’ own issues.

Among the reported reasons, first place is taken by data breaches stemming from misconfiguration of the cloud environment and from security flaws in software that were missed by the DevOps and IT teams working in the cloud.

While the workloads and data keep moving to the cloud, DevOps and IT teams often lack the required skill sets to properly configure and maintain cloud-based software. The likelihood of an unintentional misconfiguration is increased because the majority of seasoned IT workers have significantly more expertise and training with on-premises security than they do with the cloud. While younger, less experienced workers may be more acclimated to publishing data to the cloud, they may not be as familiar with dealing with security, which might result in configuration errors.

Some team members have never heard of the Role-Based Access Control (RBAC) principle and will have real trouble working in a cloud like AWS, where they are required to properly set up IAM users and IAM roles for each software component and service. These DevOps and IT engineers need to take intensive training to close the cloud security gap. Until that is done, the enterprise will keep struggling with improper configuration, production failures, and periodic security breaches.
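
To make the IAM point concrete, here is a minimal, hedged boto3 sketch that creates a narrowly scoped role a single Lambda function could assume; the role name and the attached policy are placeholders, and a real least-privilege setup would define per-component policies rather than reuse broad ones.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy: only the Lambda service may assume this role (placeholder names).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

role = iam.create_role(
    RoleName="billing-report-lambda-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Least-privilege role for a single function",
)

# Attach only the managed policy needed for CloudWatch logging; everything else
# should be granted per component, never via broad administrator policies.
iam.attach_role_policy(
    RoleName="billing-report-lambda-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
)
print(role["Role"]["Arn"])
```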

Simple solutions like a firewall can add an additional degree of security for data and workloads, whether for on-prem, hybrid, or pure cloud deployments. And yet, even simple things like that add another dimension of IT complexity and risk, due to possible misconfiguration caused by a human mistake or by a vulnerable legacy software package.


Predictive Networks: is it the Future?

Post-ChatGPT update as of May 26th, 2023:
Cisco and their EVP Liz Centoni have probably never been so wrong before in their useless predictions!

“Predictive Network” is a cool term, but it boils down to some things that Cisco EVP Liz Centoni no longer considers cool or trending: Artificial Intelligence (AI) and Machine Learning (ML), which collect and analyze millions of network events to deliver problem-solving insights. AI-based Predictive Networks, which, by the way, are one of Liz’s 2023 “trends” predictions, contradict her statement that

The cloud and AI are no longer frontiers

Obviously, Cisco’s EVP and Chief Strategy Officer Centoni refers to Cisco’s own Predictive Network product which, quoting Cisco now

 rely on a predictive engine in charge of computing (statistical, machine learning) models of the network using several telemetry sources

So how exactly is AI “no longer the frontier”, Liz, if machine learning powers the Predictive Networks that you predict will become a 2023 trend?


Full Stack IT Observability Will Drive Business Performance in 2023

Cisco predicts that 2023 will be shaped by a few exciting trends in technology, including network observability with business correlation. Cisco’s EVP & Chief Strategy Officer Liz Centoni is sure that

To survive and thrive, companies need to be able to tie data insights derived from normal IT operations directly to business outcomes or risk being overtaken by more innovative competitors

and we cannot agree more.

Proper intelligent monitoring of digital assets, along with distributed tracing, should be tightly connected to the business context of the enterprise. Thus, any organization can benefit from actionable business insights while improving the online and digital user experience for customers, employees, and contractors. Additionally, fast IT response based on artificial intelligence analysis of monitored and collected network and asset events can prevent, or at least provide fast remediation for, the most common security threat that exists in nearly any modern digital organization: misconfiguration. 79% of firms have already experienced a data breach in the past 2 years, while 67% of them pointed to security misconfiguration as the main reason.

Misconfiguration of most software products can be detected and fixed in a timely manner when network events and configuration files are collected and analyzed with machine learning by network observability and monitoring tools. An enterprise should require its IT departments to reach full stack observability and to connect the results with the business context. This is particularly important since we know that 99% of cloud security failures are customers’ mistakes (source: Gartner). Business context should be widely adopted as a part of the results delivered by intelligent observability and cybersecurity solutions.


Black Hats Matter

It is 2023, and we are now a few years on from the start of the “black lives matter” movement.

black lives matter

Databases in a cluster are not called “master” and “slave” anymore, and the word “blacklist” for a list of blocked items is also vanishing from the lexicon of all reasonable people. The biggest corporations in the US and worldwide are working on fixing the misuse of improper terms in their technology and documentation.

Yet, this Dark Reading article by Jonathan Care decided to bring the word “Black” back to mark the evil in its title on December 30, 2022. Yep, that was just a few weeks ago, and that’s why it is triple weird, my dear Dark Reading editors!

Do not be confused: the word “Black” is used in that article in a very negative context, representing the vulnerability of various APIs due to security issues.

DarkReading should be ashamed

The above screenshot represents the original Dark Reading article by Jonathan Care as of December 2022.

Dark Reading should do a much better job of selecting their authors and proofreading texts. This title smells bad! And IMHO this particular article should be removed from the public domain. Shame on you, Dark Reading, and Mr. Jonathan Care, the writer.

Jonathan Care

Please meet Jonathan Care, a Contributing Writer at Dark Reading whose article goes against basic modern community principles like diversity and inclusion. He should have thought twice about his article’s title, but he did not. Probably, he is not exactly a “thinker” type, is he?


Observability and Protection for Cloud Native Applications

Banks and other financial institutions are moving to the cloud. It is a slow process, but the trend is here. Cloud computing business models give financial organizations the flexibility to deploy pay-as-you-go cloud services. Furthermore, the cloud comes with built-in scalability, so businesses can react to market changes quickly. Pay-as-you-go infrastructure drastically reduces costs for banks and financial services institutions (BFSI), but then other questions arise. The first of these questions would be “is it secure to move my data and services to the cloud?”. Here, network observability and AI-based network monitoring come to help, particularly because financial institutions need to comply with regulations such as PIPEDA.

A MarketsandMarkets report predicts that the market for cloud-native protection platforms will reach $19.3 billion by 2027. This is more than double the $7.8 billion estimated by the marketing firm for 2022. BFSI and other enterprises are moving to the cloud, which requires intelligent network observability and security solutions based on artificial intelligence and machine learning, so such rapid market growth at a 19.9% CAGR over 2022-2027 seems to be a very reasonable assumption. Today, AI-based observability and security solutions analyze hundreds of thousands of events a day. We should expect the next generation of these software solutions to create and analyze a few orders of magnitude more events daily, scaling up to tens or hundreds of millions of events a day for an average cloud-based BFSI organization. The report names a few market leaders, among them Check Point (Israel), Trend Micro (Japan), Palo Alto Networks (US), CrowdStrike (US), Fortinet (US), Forcepoint (US), Proofpoint (US), Radware (Israel), and Zscaler (US).


Cloud Monitoring Market Size Estimations

According to a marketing study, the global IT infrastructure monitoring market is expected to grow at a 13.6% CAGR, reaching USD 64.5 billion by 2031. Modern IT infrastructure is becoming increasingly complex and requires new skills from IT personnel, often blurring the borders between IT staff, DevOps, and development teams. With the continued move from on-prem deployments to the enterprise cloud, IT infrastructure goes to the cloud as well, and thus IT teams have to learn basic cloud-DevOps skills, such as scripting, cloud-based scaling, event creation, and monitoring. Furthermore, no company today offers a complete monitoring solution that can monitor any network device and software component.

Thus, IT teams have to build their monitoring solutions piece by piece, using various, mostly unconnected systems developed by different, often competing, vendors. For some organizations, it also comes down to compliance, such as GDPR or ISO requirements, and to SLAs that obligate the IT department to detect, report, and fix any issue with their systems in a timely manner. In this challenging multi-system and multi-device environment, network observability becomes the key to enterprise success. IT organizations keep increasing their budgets, seeking to reach comprehensive cloud and on-prem monitoring for their systems and devices, and even force employees to run network and device monitoring software on their personal devices, such as mobile phones and laptops. This trend also increases IT spend on cybersecurity solutions such as SDR and network security analysis with various SIEM tools.


Strategies to Combat Emerging Gaps in Cloud Security

As cloud clients enter 2023 with a hybrid presence in multiple clouds, they are working to prioritize techniques to fight the rising gaps in cloud security.

Most big enterprises are consuming cloud services in several public clouds, while keeping enterprise systems and private clouds in their companies’ data centers.

One of the ways of closing these security gaps could be adopting deep observability. We have already reviewed a few deep observability providers, such as Gigamon. While Gigamon can probably be considered the current market leader in this relatively new and small market, with under $2B in annual market size, they should still watch out for newcomers who arrive with shiny new products and great technologies under the hood.

CtrlStack is one of these startups, and it recently received a second round of funding from Lightspeed VC, led by Kearny Jackson and Webb Investment Network.

The delivery of features and applications by today’s digital-first companies and developers is accelerating. To achieve this, teams from information technology operations and software development must collaborate closely, forming a practice known as DevOps. When incidents occur, they may involve any number of systems in the digital environment, including operations, infrastructure, and code, or any combination of modifications made to any of them.

The CtrlStack platform connects cause and effect to make troubleshooting easier and incident root cause analysis faster by tracking relationships between the components in a customer’s systems. By giving DevOps teams the tools they need, it lets developers and engineers solve problems quickly.

By forming a knowledge graph of all the infrastructure, interconnected services, and their impact, CtrlStack can supply the full picture while capturing changes and relationships throughout the whole stack. Using the CtrlStack product, DevOps teams can view dependencies, measure the impact of changes, and examine events in real time.
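
CtrlStack’s actual implementation is proprietary; purely as a generic illustration of the dependency-graph idea (not CtrlStack’s API), the sketch below builds a small graph with networkx and asks which services are potentially affected by a change to one component.

```python
import networkx as nx

# Directed graph: an edge A -> B means "B depends on A" (invented topology).
deps = nx.DiGraph()
deps.add_edges_from([
    ("config:payment.yaml", "service:payment"),
    ("service:payment", "service:checkout"),
    ("service:checkout", "ui:webshop"),
    ("deploy:payment-v42", "service:payment"),
])

# Root cause analysis direction: given a changed component, everything
# reachable downstream is potentially impacted.
changed = "config:payment.yaml"
impacted = nx.descendants(deps, changed)

print(f"Change to {changed} may impact: {sorted(impacted)}")
```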

Key capabilities of the platform include an event timeline that lets teams browse and filter change events without having to sift through log files or survey users, and a visual representation that offers insights into operational data. Both of these capabilities also drive dashboards for developers and DevOps teams.

Developers can also access dashboards that give visibility into any changes to code commits, configuration files, or feature flags, all in one click. DevOps teams get a dashboard for root cause analysis that lets them capture all the context at the moment an incident occurred, with a searchable timeline of dependencies showing the full impacted topology and the affected metrics.
