
My perspective: Unpacking SD-WAN

Lily Lyu, 27 August 2024

After several years of working with SD-WAN, and still curious about what is coming next in this segment, I would like to share some thoughts for broader discussion. Whether you are an MSP, an integrator or an enterprise, if you are thinking of building an SD-WAN network, this article may give you some useful facts about SD-WAN.

 

The Promised Solution

Back in 2016/2017, SD-WAN was considered a practical use case of the then-fashionable concept of SDN (Software-Defined Networking), and it looked very promising because it introduced centralized control and automation.

The market was promised the following benefits by adopting SD-WAN:

  • Centralized control, management and auto-provisioning - software-defined !

  • Diverse hardware options incl. uCPE and off-the-shelf hardware - cost-saving !

  • Decoupled underlay and overlay with whatever broadband (internet, MPLS, mobile etc.) - cost-saving !

  • Encrypted overlay VPN

  • Improved performance - application-based traffic steering over multiple underlays

  • Direct internet breakout to SaaS/Cloud

  • Easier cloud integration

  • Native zero-touch provisioning (ZTP) solution

Business-driven, application-based policies were the focus and were highlighted by every vendor. Compared to traditional QoS, which uses special bits in the Layer-2 frame or Layer-3 packet, SD-WAN enables Layer-7 application awareness. QoS can now be applied at application or application-group level, which helps ensure that mission-critical applications get the resources they need for their operation.
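To make the idea of application-level QoS concrete, here is a minimal sketch that maps an application name to a QoS class through application groups, rather than through bits in the frame or packet header. The application names, group names and class names are hypothetical and chosen only for illustration; real products identify applications with their own engines and use their own policy syntax.

```python
# Hypothetical application groups and QoS classes, for illustration only.
APP_GROUPS = {
    "voice": {"sip", "rtp"},
    "business": {"erp", "crm"},
}
GROUP_TO_CLASS = {"voice": "realtime", "business": "priority"}
DEFAULT_CLASS = "best-effort"


def qos_class_for(app_name: str) -> str:
    """Return the QoS class for an already-identified application."""
    name = app_name.lower()
    for group, apps in APP_GROUPS.items():
        if name in apps:
            return GROUP_TO_CLASS[group]
    # Applications that belong to no group fall back to the default class.
    return DEFAULT_CLASS


for app in ("rtp", "crm", "video-streaming"):
    print(app, "->", qos_class_for(app))
```

The point of the sketch is that the policy is written per application group, so adding a new application to a group does not require touching the QoS class definitions.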

Security and WAN optimization were also popular features in discussions.

Different CPE options were proposed. uCPE was often discussed back then as a way to onboard various VNFs (Virtualized Network Functions) into the SD-WAN fabric and service chain.

Unlike today, when many players are already using SD-WAN and trying to add more balls into the game, the focus back then was on sharing real-life deployment cases and experiences, to prove to the market that SD-WAN was real and not just something in marketing presentations.

Security was not at the center of the game back then, and the term SASE was only coined by Gartner in 2019.

 

What You Should Consider When Choosing SD-WAN

Not "one size fits all"

The promised benefits have proved valid over the past few years. So if you are seeking any of the above benefits and an alternative way of building your network, SD-WAN is a smart option.

However, it is also important to understand that there are cases where MPLS/IPVPN may still be required.

For example, for organizations that need end-to-end Quality of Service (QoS) for sensitive applications, SD-WAN may not be suitable if the underlay is internet-based (best effort). An MPLS underlay will be required in this case, so the cost saving from using internet may not apply. However, SD-WAN can still be considered as the overlay solution on top of MPLS, while giving you all the flexibility SD-WAN comes with.

In practice, SD-WAN is suitable for:

  • Sites that need multiple underlays and intelligent path selection, for example, one or two fixed connections plus mobile backup

  • Sites that need fast delivery

  • Sites that are not permanent

  • Sites that span continents, where MPLS is not cost-efficient and WAN optimization is crucial

  • Organizations that wish for an automated service delivery stack and API integration

What Do ‘You’ Actually Need

Which segment your company falls into is also important for SD-WAN solution selection:

  • MSP

  • Integrators

  • Enterprises

There are factors which may be crucial for MSPs and integrators when evaluating the SD-WAN fabric, for example:

  • Licensing model - Pay-As-You-Grow or pre-ordered licenses

  • CPE options and lead time - vendor-manufactured CPE or 3rd party CPEs

  • Virtualization options of the SD-WAN fabric - both CPE and the control&management plane

  • Complexity of setting up the control&management plane

  • Multi-tenancy of the control&management plane

  • High availability/Disaster Recovery of the control&management plane - how long can you run the network if something happens to the control&management plane

  • Scalability of the control&management plane - how many tenants and devices can be technically supported on the platform, and what is the recommended number in reality to secure a stable operation

  • Hardware performance and management overhead - SD-WAN VPNs are typically established on top of encrypted tunnels, which means a certain share of the underlay bandwidth will be consumed by overhead.

  • Native integration for accessing public cloud and what are the options

  • Templating and variables to simplify large scale deployment

  • Role-based access control

  • API integration to OSS/BSS systems

For small and medium enterprises, some of the above factors may not be very relevant, for example, multi-tenancy or scalability.
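The tunnel overhead factor in the list above can be estimated with simple arithmetic. The sketch below is only a rough model: the per-packet overhead of 70 bytes is an assumed placeholder, not a figure for any specific vendor or cipher, and the packet sizes are arbitrary examples. You would replace them with values measured in your own Proof of Concept.

```python
def overhead_share(inner_packet_bytes: int, overhead_bytes: int) -> float:
    """Fraction of on-the-wire bytes that is tunnel overhead, not user data."""
    return overhead_bytes / (inner_packet_bytes + overhead_bytes)


def usable_mbps(link_mbps: float, inner_packet_bytes: int, overhead_bytes: int) -> float:
    """Underlay bandwidth left for user data after tunnel overhead."""
    return link_mbps * (1 - overhead_share(inner_packet_bytes, overhead_bytes))


ASSUMED_OVERHEAD_BYTES = 70  # placeholder; depends on encapsulation and cipher

for size in (100, 500, 1400):
    share = overhead_share(size, ASSUMED_OVERHEAD_BYTES)
    left = usable_mbps(100, size, ASSUMED_OVERHEAD_BYTES)
    print(f"{size}-byte packets: {share:.1%} overhead, {left:.1f} Mbps usable of 100")
```

The model shows why small-packet traffic such as voice pays proportionally more for the tunnel than bulk transfers do.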

Some classic networking features may need to be considered to support diverse use cases:

  • High availability

  • Dynamic routing protocols

  • Multi-VPN

  • Flexible topology per VPN

  • IPv6 support

  • Layer-2

  • Wireless LAN

There are also cloud-delivered SD-WAN options, which may be suitable if you have a limited number of sites and less advanced requirements. In this case, there is no need to establish an on-prem control&management plane. You will just need the SD-WAN CPEs as well as a tenant in the vendor cloud.

 

 


The ‘Internet’ Factor

SD-WAN allows you to decouple underlay and overlay, which gives you the opportunity to use internet (fixed or mobile) as underlay and to take an agile approach to traffic steering.

Since internet comes with a native ALLOW-ANY behavior, you may need to consider how you would like to secure your underlay and overlay.

Traditionally, private networks reach the internet via a centralized egress gateway with NAT, and a firewall is typically placed at the central location. In SD-WAN, Direct Internet Access (DIA) is a common breakout method: users access the internet from each distributed edge location instead of traversing all the way to a central location. This also reduces the bandwidth and load on hub locations as more and more business services and applications are hosted in the cloud.

Therefore, each edge location will need protection against threats from the internet. Stateful firewall and NAT may not be enough. A Next Generation Firewall (NGFW) at the edge should be considered.

The overlay is typically an encrypted VPN, so user data is protected. However, the underlay internet connection itself may still need additional security against threats from the internet, for example, DDoS.

 

The "Zero" Factor

How "Zero" the ZTP process is

Every vendor promises ZTP as the preferred SD-WAN onboarding method, and it is also a big benefit of using SD-WAN. When evaluating a ZTP solution, look at the details, and make the ZTP mechanism part of the Proof of Concept, taking into account how you would use it in real deployments.

The following questions should be asked when you evaluate the ZTP process:

  • Is the ZTP solution native to the vendor solution and available globally, or does the organization need to set up local infrastructure for device onboarding? Typically there is no need to set up any management elements on-prem. A DHCP-based internet connection and the device’s factory-default configuration should be enough to onboard a brand-new device to the tenant’s management plane. If it is a global infrastructure, consider the registration point, latency, response time and status visibility.

  • What is the authentication method between the device and the management plane? Typical methods are via device serial number and device certificates. Consider how you will enroll the serial numbers and how smooth and automatic this process can be.

  • When the management plane is not reachable during an ongoing ZTP task, how will the device proceed? Will it retry at a certain interval, or will it stop after a certain period so that you need to reboot the device to initiate the process again? Besides, how do you tell if a device on site is onboarding as expected?

  • How much bandwidth do you have for the underlay connection? A software upgrade is typically part of the onboarding process if you choose automatic upgrade, so the size of the software package relative to the underlay bandwidth should be considered, as onboarding time per site can be an important factor.

  • How easy is it to reset the device and reinitiate the ZTP process if the ZTP process fails?
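The bandwidth question in the list above comes down to a quick calculation. The sketch below estimates how long the software download alone takes during onboarding. The package size, link speeds and the 0.8 efficiency factor (to account for protocol and tunnel overhead) are assumed example values, not vendor figures.

```python
def download_minutes(package_mb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Minutes needed to fetch a package of package_mb megabytes
    over a link of link_mbps megabits per second."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    megabits = package_mb * 8  # megabytes to megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 60


# Example: an assumed 600 MB software image over different underlays.
for mbps in (2, 10, 100):
    print(f"{mbps} Mbps: {download_minutes(600, mbps):.1f} minutes")
```

Multiplying the result by the number of sites per rollout window gives a first feeling for whether automatic upgrade during onboarding is realistic on slow or metered underlays.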

What else?

  • Evaluate ZTP process when underlay is MPLS instead of internet

  • NAT traversal should be a built-in feature for locations with NAT in the path

 

When things don’t go as expected …

It is wise to evaluate the ZTP process in great detail and establish your own ZTP manual covering ordering, logistics, onboarding and service validation. The manual should cover remediation methods for when things do not go right.

A robust ZTP solution should have a clean and straightforward ZTP process with a low error rate and user-friendly restoration methods. It is costly if the CPE turns into a black box without giving any clear indication of error, while the engineer stands on site wondering what is happening with the onboarding after 30 minutes of waiting.
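As a way to reason about the retry and visibility questions raised earlier, here is a minimal sketch of a retry loop with capped exponential backoff that reports its status on every attempt. It is not how any particular vendor implements ZTP. The function names, the delays and the attempt limit are all assumptions made for illustration, and `try_register` stands in for whatever registration call a real device performs.

```python
import time
from typing import Callable


def onboard_with_retry(
    try_register: Callable[[], bool],
    max_attempts: int = 5,
    base_delay_s: float = 10.0,
    max_delay_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    report: Callable[[str], None] = print,
) -> bool:
    """Try to register until success or max_attempts, doubling the wait each time."""
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        if try_register():
            report(f"attempt {attempt}: registered with management plane")
            return True
        if attempt == max_attempts:
            break
        report(f"attempt {attempt}: management plane unreachable, retrying in {delay:.0f}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # cap the backoff
    report(f"giving up after {max_attempts} attempts, manual reset required")
    return False


# Demo with a fake registration call that succeeds on the third attempt
# and a no-op sleep, so the example finishes immediately.
outcomes = iter([False, False, True])
onboard_with_retry(lambda: next(outcomes), sleep=lambda s: None)
```

What matters for the engineer on site is the `report` hook: every attempt and the final state are visible, whether through LEDs, a local status page or the management portal.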

 

The ‘Auto’ Factor

In an SD-WAN solution, site-to-site VPNs are typically established automatically after onboarding, following the predefined templates in the management plane. This can be a big advantage compared to the manual deployment of traditional VPN solutions.

 

Configuration building blocks

SD-WAN configuration management typically provides options like templates and groups. These can greatly simplify the SD-WAN setup if used properly. Questions to consider:

  • What do you perceive as unified configuration?

  • How do you group the locations sharing unified configuration?

  • How do you add site-specific configurations?

  • How much flexibility do you have in terms of site-specific variables?

With properly designed templates and groups, automation can be built on top, for example, via scripts.
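The relationship between a shared template, group defaults and site-specific variables can be sketched with Python's standard `string.Template`. The configuration syntax, variable names and site data below are hypothetical and do not follow any vendor's template format. They only show the layering.

```python
from string import Template

# Hypothetical shared template for a group of branch sites.
BRANCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "lan-subnet $lan_subnet\n"
    "wan1 $wan1_type\n"
    "wan2 $wan2_type\n"
)

# Defaults shared by every site in the group.
GROUP_DEFAULTS = {"wan1_type": "internet", "wan2_type": "mobile"}

# Site-specific variables; the second site overrides a group default.
SITES = [
    {"hostname": "branch-01", "lan_subnet": "10.1.1.0/24"},
    {"hostname": "branch-02", "lan_subnet": "10.1.2.0/24", "wan2_type": "mpls"},
]


def render_site(site: dict) -> str:
    """Merge group defaults with site variables and fill in the template."""
    variables = {**GROUP_DEFAULTS, **site}  # site values win over defaults
    # substitute() raises KeyError if a template variable is missing,
    # which catches incomplete site data before deployment.
    return BRANCH_TEMPLATE.substitute(variables)


for site in SITES:
    print(render_site(site))
```

A script like this would normally hand the rendered variables to the management plane through its API rather than print them, but the grouping logic stays the same.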

 

Further look at the ‘Auto’ factor

An SD-WAN solution typically uses site-to-site monitors for underlay and application quality. The monitoring traffic is normally low volume, sent at predefined intervals, and should not be a critical factor for fixed underlay connections.

However, if the underlay is mobile, you may want to look at the data volume over time.

How the site-to-site VPNs and site-to-site monitors are established over a mobile connection should be evaluated, and the candidate SD-WAN solution should provide enough flexibility to fine-tune the data usage.
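To see why monitor traffic matters on a metered mobile link, you can estimate the monthly volume from the number of peers, the probe size and the probe interval. The sketch below counts sent probes only, and all numbers are assumed example values. Actual probe sizes and intervals differ per solution and are often tunable.

```python
def monthly_probe_mb(
    peers: int,
    probe_bytes: int,
    interval_s: float,
    paths_per_peer: int = 1,
    days: int = 30,
) -> float:
    """Approximate megabytes of monitor probes sent per month over one link."""
    probes_per_day = 86_400 / interval_s
    total_bytes = peers * paths_per_peer * probe_bytes * probes_per_day * days
    return total_bytes / 1_000_000


# Example: 10 peers, assumed 100-byte probes, at different intervals.
for interval in (1, 10, 60):
    print(f"every {interval}s: {monthly_probe_mb(10, 100, interval):.1f} MB per month")
```

Lengthening the interval or limiting the mobile link to a hub-only topology reduces the volume proportionally, which is the kind of flexibility worth checking in the candidate solution.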

 

The ‘Box’

In an SD-WAN solution, the Customer Premises Equipment (CPE) is the demarcation point between the LAN and the WAN/ISP infrastructure. Functionality-wise, it typically provides switching, routing, cellular access as well as security features in one box.

 

What type of CPE would you choose

There are different options in the market and these options are typically related to where the SD-WAN vendors come from.

You can consider the classic way - deploy SD-WAN using routers from your trusted networking vendors and run SD-WAN on top. This is a valid option as you may already have years of experience with the hardware models. With this option, you need to evaluate whether the SD-WAN capability and management cover your business needs.

If you choose an SD-WAN vendor with a native focus on SD-WAN software, you may have the option to use both the vendor-manufactured CPE and 3rd party CPEs running the vendor’s SD-WAN software. In this case, in contrast to white box, a grey-box solution can also be a valid option. In this approach, the 3rd party CPEs are certified by the SD-WAN vendor and used just like the vendor-manufactured CPEs. In short, you may have more flexibility in terms of hardware cost, lead time and vendor local presence.

Please also note that the hardware cost factor should be considered together with the licensing model, as different vendors have different approaches here.

Another option is uCPE (Universal Customer Premises Equipment). Several factors may need to be considered:

  • Resources required to run various VNFs on top of the CPE

  • Service chain performance with added VNFs

  • Licensing of the VNFs - how do you purchase these VNFs and licenses

  • Management of the VNFs - will it be the unified management or via 3rd party portal

 

Yes, Things Can Go Wrong

It’s normally not a crisis when things go wrong, as it is part of life as a network engineer.

However, the operation of an SD-WAN network is not necessarily as straightforward as you may think.

Some solutions give you automatic IPsec VPN with basic routing and security options.

Some solutions, however, introduce a centralized controller function into the SD-WAN fabric and a bunch of add-on features. These features are of great use for customized deployment scenarios, but they may also look like ‘overkill’ for organizations that are new to SD-WAN.

Where do you start

Here I’d like to touch the ‘auto’ factor from a different angle.

In SD-WAN, automation is typically adopted, for example, in ZTP, overlay VPN and SD-WAN control&management. The application-based traffic steering also automatically follows the business policy when the path quality meets the defined criteria.
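To understand what is automated in traffic steering, it helps to picture the decision as a simple function: for each application policy, walk the preferred paths in order and pick the first one whose measured quality meets the criteria. The sketch below is a simplified model under that assumption. The thresholds, path names and measurements are invented for illustration, and real products add hysteresis, load balancing and vendor-specific logic on top.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PathStats:
    latency_ms: float
    jitter_ms: float
    loss_pct: float


@dataclass
class AppPolicy:
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float
    preferred_paths: List[str]  # most preferred first


def meets_criteria(stats: PathStats, policy: AppPolicy) -> bool:
    """True if the measured path quality is within all policy thresholds."""
    return (
        stats.latency_ms <= policy.max_latency_ms
        and stats.jitter_ms <= policy.max_jitter_ms
        and stats.loss_pct <= policy.max_loss_pct
    )


def select_path(measured: Dict[str, PathStats], policy: AppPolicy) -> Optional[str]:
    """Return the first preferred path that meets the criteria, or None."""
    for name in policy.preferred_paths:
        stats = measured.get(name)
        if stats is not None and meets_criteria(stats, policy):
            return name
    return None  # no compliant path; the caller decides on a fallback


# Hypothetical voice policy and measurements.
VOICE = AppPolicy(150, 30, 1.0, ["mpls", "internet", "mobile"])
measured = {
    "mpls": PathStats(40, 5, 2.5),      # too much loss for voice
    "internet": PathStats(60, 10, 0.2),
    "mobile": PathStats(90, 25, 0.5),
}
print(select_path(measured, VOICE))  # internet
```

When troubleshooting, the useful questions follow directly from this model: what were the measured values, what were the thresholds, and which path did the policy prefer at that moment.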

To do proper troubleshooting, you will need to take a deeper look at the SD-WAN fabric and understand what is automated and how it is automated. The SD-WAN architecture of candidate solutions, and how the control&management plane interacts with the managed devices, should be reviewed thoroughly. This will enable you to approach troubleshooting and integration efficiently and from the right angle.

How do you proceed

CLI is the classic and natural method all network engineers like to start with in troubleshooting - to view the interface status, the optical module status, the recent alarms, the routing table and so on. However, SD-WAN now presents us with a GUI-based approach and tries to persuade us that GUI-based tools can be used instead.

This is often true. In SD-WAN you do need to use the management GUI instead of the CLI for service provisioning, so all your configuration lives in the GUI. You also need to get familiar with the analytics and troubleshooting tools, which give you a lot of useful information, for example, link/application quality statistics, logs and alarms.

However, over time I have experienced that CLI is still an important method for in-depth troubleshooting. Though the GUI provides a lot of information, you sometimes still need to interact with the OS directly via CLI to locate the root cause. And again, you will need an in-depth understanding of the SD-WAN fabric to interpret the CLI output.

 

As a user, do you have the right tools

SD-WAN solutions are typically vendor-developed, covering hardware, software and a centralized control&management plane. You do have APIs for integration and options to buy 3rd party CPEs. However, the software for the control&management plane is typically not that open, meaning that it typically works only between the vendor-certified CPEs and the vendor’s control&management plane. On the other hand, it is often the vendors that have the deepest knowledge of how the solution works at the very bottom.

Thus, there are different aspects that should be considered:

  • Easy access to documentation and knowledge base about how the SD-WAN fabric works

  • Built-in troubleshooting tools

  • Tech support and response time

  • A clear roadmap and communication method

 

Organization Transition

What SD-WAN does not change

SD-WAN provides an alternative way of building the network and how the network can be delivered. It provides built-in automation in terms of device onboarding and VPN enabling. It does introduce some vendor proprietary mechanisms or protocols especially between managed devices and the centralized control&management plane.

However, it does not change the very essence of networking:

  • Switching is still switching

  • Routing is still routing

  • VPN is still VPN

  • Security is still security

All the classic networking knowledge and skills are still very valuable in SD-WAN.

You will need, however, to adapt to the SD-WAN way of doing things and follow the provisioning logic.

What SD-WAN requires

  • SD-WAN knowledge enabling

The SD-WAN architecture itself needs proper knowledge enabling, especially for the network and integration teams. A transition is required in terms of user habit, delivery and operation.

  • Application-based policies

As mentioned earlier, SD-WAN enables Layer-7 application awareness. Policies like QoS are now applied at application or application-group level, which helps ensure that mission-critical applications get the resources they need for their operation. This difference should be considered in contrast to traditional QoS.

  • Security, not just networking

SD-WAN typically brings networking and security together in the solution package, especially when internet is the underlay type. Knowledge and skills should be prepared for this shift. In routing and switching, traffic is allowed by default, as long as there is a valid route, unless specifically denied by an ACL. In the firewall world, however, traffic is typically denied implicitly unless specifically allowed, even when there is a valid route. You will need to look beyond Layer 4 and build your services up to the application layer when security is introduced.

  • Cloud

Organizations nowadays typically have a hybrid infrastructure. An SD-WAN solution provides an agile approach to connect your on-prem infrastructure to the cloud. This is typically done via a VPN or vCPE approach, but the integration and validation may still require a certain level of knowledge and skills.
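The shift in mindset described under ‘Security, not just networking’ can be shown with a deliberately simplified model in which the only difference between the two worlds is the default action when no rule matches. The rule format and application names are hypothetical, and real ACL and firewall implementations have more nuances (for example, many ACL implementations end with an implicit deny once an ACL is applied to an interface).

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (action, application), action is "allow" or "deny"


def evaluate(rules: List[Rule], app: str, default_action: str) -> str:
    """Return the action of the first matching rule, else the default."""
    for action, rule_app in rules:
        if rule_app == app:
            return action
    return default_action


def routing_view(rules: List[Rule], app: str) -> str:
    """Routing mindset: forwarded unless explicitly denied."""
    return evaluate(rules, app, default_action="allow")


def firewall_view(rules: List[Rule], app: str) -> str:
    """Firewall mindset: dropped unless explicitly allowed."""
    return evaluate(rules, app, default_action="deny")


print(routing_view([("deny", "telnet")], "http"))   # allow
print(firewall_view([("allow", "http")], "ssh"))    # deny
```

In practice this means that a service which ‘just worked’ once routing was in place now needs an explicit policy entry, ideally expressed per application.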

 

Where is the boundary

Management

When it comes to underlay, as mentioned, enterprises can choose whatever underlay is available to build SD-WAN. This raises some management demarcation questions:

  • Do you wish to operate the SD-WAN solution yourself, or purchase a managed SD-WAN solution?

  • Do you prefer a locally managed SD-WAN solution, or a cloud-delivered SD-WAN solution?

Different options may put different demands on the organization. As described above, the organization should evaluate its own business needs and prepare the necessary knowledge and skills.

 

What to put in the ‘SD-WAN’ basket

Another question is the boundary of SD-WAN solutions. Over the years, more and more integration and consolidation have been observed:

  • SD-WAN only

  • SD-WAN and LAN

  • SD-WAN and Security

  • SD-WAN and SSE - Full SASE solution

The industry is evolving fast in a dynamic landscape. There is not necessarily a one-size-fits-all approach. So it is important to identify your own business needs and evaluate the candidate solutions following your own criteria.

 

What NOW

The SD-WAN market has been very interesting and dynamic over the years. We have seen various SD-WAN startups being acquired by big telco vendors, and new players and solutions joining the journey along the way. A lot of new names have entered the market that networking people had not heard of before.

We have been very familiar with the big-name networking vendors, but in SD-WAN the game is more dynamic and diverse. Some vendors come from WAN optimization, some come from IT and software, some come from white box manufacturing and some come from network security or cloud security. Some, as a very interesting approach, provide SD-WAN on top of open source.

I remember being curious every year about what the new keywords at SD-WAN events would be, as SD-WAN became a more and more mature solution adopted by many organizations. Over the years, we have been introduced to different keywords like uCPE, cloud, AI/ML and 5G. Nowadays, SASE is probably the most popular keyword in the SD-WAN domain.

And yes, it is happening NOW.

 

SASE

Gartner introduced this term in 2019, and hot discussions have been ongoing since then, in an attempt to bring SD-WAN and SSE together and provide consistent data protection and security policies.

The security vendors seem to play a key role here as the wind blows towards security.

You may consider a single-vendor SASE covering both SD-WAN and SSE, but the list may not be long. Alternatively, you could consider a mixed vendor selection with integration between SD-WAN and SSE. Many vendors have established partnerships to be better positioned in the SASE landscape.

This trend will indeed bring additional requirements to the organizations that are interested in this solution. You will need to establish your organization’s security framework and prepare for the potential SASE shift. An important question is how you define ‘Trust’, since in SASE ‘zero-trust’ is the fundamental principle, regardless of whether the users or devices are internal or external. Only the least privilege is granted, and user activities are monitored.

SSE brings in a different model of security delivery than the traditional on-prem security by introducing the following modules:

  • ZTNA

  • CASB

  • SWG

  • FWaaS

  • Enterprise DLP

Additionally, you may want to bring in Endpoint Security to complete your security framework, as it provides device security postures which could be valuable input to your SASE solution fabric. Since SASE normally brings in a client agent as well, it is a natural move to consider integration.