The Problem with Perimeter Security
For decades, network security followed a castle-and-moat architecture: a hardened perimeter (firewalls, DMZs, VPN gateways) protecting a trusted interior. If you were inside the perimeter, you had broad access. If you were outside, you did not. This model breaks down for three reasons that are now obvious but were once controversial:
First, cloud workloads have no meaningful perimeter. An Azure Storage account is accessible from any network by default. A SQL Database exposes a public endpoint. The interior/exterior distinction vanishes when your infrastructure spans dozens of regions and hundreds of subscriptions.
Second, lateral movement is the primary attack vector in modern breaches. Once an attacker compromises a single workload (phishing, supply chain, misconfiguration), they move laterally through the flat internal network. A perimeter-only model offers no resistance to this movement.
Third, the blast radius of a compromised identity is unbounded in a perimeter model. A single set of leaked credentials can access every resource the network can reach.
Zero Trust replaces implicit trust with explicit verification at every layer: identity, device, network, application, and data. In network security specifically, this means every flow must be explicitly authorized, every resource must be isolated by default, and every access decision must be logged and auditable.
Layer 1: Network Security Perimeter (NSP)
Network Security Perimeter is Azure's newest network security primitive. It creates a logical boundary around PaaS resources that are deployed outside your virtual networks. The core insight behind NSP: Private Link secures the data plane (how you connect to resources), but NSP secures the control plane (who can access resources and from where).
How NSP Works
When you create a Network Security Perimeter and associate PaaS resources with it in enforced mode, all public network access is denied by default. Resources within the same perimeter can communicate freely with each other (intra-perimeter traffic), but any traffic crossing the perimeter boundary requires an explicit access rule.
This is the critical difference from traditional NSG-based security: NSP operates at the PaaS resource level, not the virtual network level. You can secure an Azure Storage account, Key Vault, or Azure AI Search instance without deploying it into a VNet or configuring Private Endpoints for every consumer.
NSP prevents a common attack pattern: a compromised workload writing sensitive data to an attacker-controlled storage account. Because the attacker's storage account is outside the perimeter, the write fails. Traditional NSGs cannot prevent this because they operate at the IP/port level, not the resource identity level.
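The boundary behaviour described above can be modelled in a few lines. This is a toy sketch, not the NSP API: the `Perimeter` class, the `is_allowed` method, and all resource names are hypothetical, and outbound access rules are simplified to a set of permitted destinations.

```python
from dataclasses import dataclass, field


@dataclass
class Perimeter:
    """Toy model of a Network Security Perimeter in enforced mode (hypothetical)."""
    members: set = field(default_factory=set)         # associated PaaS resources
    outbound_rules: set = field(default_factory=set)  # explicitly allowed destinations

    def is_allowed(self, source: str, destination: str) -> bool:
        # This sketch only models traffic that originates inside the perimeter.
        if source not in self.members:
            return False
        # Intra-perimeter traffic needs no rule.
        if destination in self.members:
            return True
        # Crossing the boundary requires an explicit access rule.
        return destination in self.outbound_rules


perimeter = Perimeter(
    members={"app-search", "corp-storage", "corp-keyvault"},
    outbound_rules={"partner-api.example.com"},
)

intra = perimeter.is_allowed("app-search", "corp-storage")
approved = perimeter.is_allowed("app-search", "partner-api.example.com")
exfil = perimeter.is_allowed("app-search", "attacker-storage")
```

In this model the write to `attacker-storage` is rejected because the destination is neither a perimeter member nor covered by an outbound rule, which is the exfiltration case an IP/port filter cannot express.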
NSP Components
| Component | Purpose | Scope |
|---|---|---|
| Perimeter | Top-level logical boundary defining the trusted set of resources | Subscription / Resource Group |
| Profile | Collection of access rules applied to associated resources | Per-perimeter |
| Access Rule | Inbound/outbound rules allowing traffic across the boundary | Per-profile |
| Resource Association | Binding a PaaS resource to a perimeter (with access mode) | Per-resource |
| Diagnostic Settings | Access logs and metrics for audit and compliance | Per-perimeter |
Access Modes: Transition vs. Enforced
NSP supports two access modes. In Transition mode (formerly called Learning mode), the perimeter logs all traffic that would be denied in enforced mode without actually blocking it. This gives you visibility into existing access patterns before cutting off traffic. In Enforced mode, all traffic except intra-perimeter and explicitly allowed flows is denied.
The recommended deployment pattern: associate resources in Transition mode, analyze access logs for 2-4 weeks, create access rules for legitimate traffic, then switch to Enforced mode. This avoids the outage risk of blocking traffic you did not know existed.
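The log-analysis step of that pattern might look like the sketch below. The record shape (`resource`, `direction`, `source`, `would_deny`) is an assumption made for illustration and does not reflect the real NSP access log schema; `candidate_rules` and `min_hits` are hypothetical names.

```python
from collections import Counter

# Hypothetical, simplified records collected while in Transition mode.
transition_logs = [
    {"resource": "corp-storage", "direction": "Inbound", "source": "192.0.2.10", "would_deny": True},
    {"resource": "corp-storage", "direction": "Inbound", "source": "192.0.2.10", "would_deny": True},
    {"resource": "corp-storage", "direction": "Inbound", "source": "192.0.2.10", "would_deny": True},
    {"resource": "corp-keyvault", "direction": "Inbound", "source": "198.51.100.7", "would_deny": True},
    {"resource": "corp-storage", "direction": "Inbound", "source": "app-search", "would_deny": False},
]


def candidate_rules(logs, min_hits=2):
    """Return flows that enforced mode would block and that recur at least min_hits times."""
    hits = Counter(
        (rec["resource"], rec["direction"], rec["source"])
        for rec in logs
        if rec["would_deny"]
    )
    return sorted(flow for flow, count in hits.items() if count >= min_hits)


recurring = candidate_rules(transition_logs)
```

Recurring flows are candidates for an access rule once someone confirms they are legitimate; one-off flows deserve investigation rather than an automatic allow.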
NSP is now generally available in all Azure public cloud regions and Azure Government regions. Private Endpoint traffic is allowed without explicit access rules when both the endpoint and the resource are within the same perimeter.
Layer 2: Private Endpoints and Private Link
If NSP secures the control plane, Private Link secures the data plane. A Private Endpoint creates a network interface inside your VNet with a private IP address that maps to a specific PaaS resource. Traffic between your workload and the resource traverses the Microsoft backbone network; it never touches the public internet.
Why Private Endpoints Matter for Zero Trust
Consider a standard Azure Storage account. Without Private Endpoints, your application connects to mystorageaccount.blob.core.windows.net which resolves to a public IP. Even with firewall rules restricting access to your VNet, the traffic still transits a public endpoint. DNS poisoning, BGP hijacking, or a misconfigured firewall rule could expose this path.
With a Private Endpoint, the same FQDN resolves to 10.0.1.5 (a private IP in your VNet), and the connection stays entirely on private address space. The public endpoint still exists until you disable public network access on the storage account; once you do, requests arriving over the public endpoint are rejected and the resource is reachable only through the Private Endpoint.
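One way to verify the DNS side of this from inside the VNet is to check that the FQDN resolves only to private addresses. The sketch below is a minimal check under that assumption; `resolves_privately` and `fake_resolver` are hypothetical helper names, and the fake resolver exists only so the logic can be exercised offline.

```python
import ipaddress
import socket


def resolves_privately(fqdn, resolver=socket.getaddrinfo):
    """Return True only if every address the FQDN resolves to is private."""
    infos = resolver(fqdn, 443, proto=socket.IPPROTO_TCP)
    # sockaddr is the fifth element; its first item is the IP string.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )


def fake_resolver(answers):
    """Build a stand-in for socket.getaddrinfo that returns fixed answers."""
    def resolve(host, port, proto=0):
        return [
            (socket.AF_INET, socket.SOCK_STREAM, proto, "", (ip, port))
            for ip in answers
        ]
    return resolve


private_ok = resolves_privately(
    "mystorageaccount.blob.core.windows.net", fake_resolver(["10.0.1.5"])
)
# 8.8.8.8 is used here purely as a stand-in for "some public address".
public_leak = resolves_privately(
    "mystorageaccount.blob.core.windows.net", fake_resolver(["8.8.8.8"])
)
```

Called without the `resolver` argument, the function uses the machine's real DNS configuration, so the result depends on where it runs: the same FQDN is expected to resolve publicly from outside the VNet.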
Private Endpoint vs. Service Endpoint
| Capability | Private Endpoint | Service Endpoint |
|---|---|---|
| Traffic path | Private IP in your VNet, over backbone | Still uses public IP of the service |
| DNS resolution | Private IP (requires DNS zone) | Public IP (unchanged) |
| On-premises access | Works over ExpressRoute / VPN | VNet traffic only |
| Data exfiltration protection | Maps to a specific resource instance | Allows access to any instance of the service type |
| Cost | Per-endpoint hourly + data processing | Free |
| Recommendation | Use for production workloads | Legacy; use for cost-sensitive dev/test |
Deploy Private Endpoints for all PaaS resources in production. Combine them with Azure Policy definitions that audit or deny PaaS resources lacking a Private Endpoint connection. Disable public access on the resource only after Private Endpoint connectivity is confirmed.
Layer 3: Network Security Groups and Service Tags
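Network Security Groups (NSGs) are the foundational traffic filter in Azure. They operate at Layer 4 (TCP/UDP), allowing you to define allow/deny rules based on source, destination, port, and protocol. In a Zero Trust model, NSGs enforce the principle of least privilege at the network layer: deny all traffic by default, then explicitly allow only the flows your application requires. The built-in default rules are more permissive than that: they allow all traffic within the virtual network and all outbound traffic to the Internet, so a true default-deny posture requires explicit deny rules with a priority that is evaluated before those defaults.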
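NSG rules are evaluated in priority order (lowest number first) and the first matching rule decides the outcome. The sketch below models that behaviour in a simplified form; the `Rule` class, the `evaluate` function, and the address ranges are illustrative assumptions, and real NSG rules match on more fields than shown here.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int         # lower number is evaluated first
    source: str           # source CIDR prefix
    port: Optional[int]   # destination port; None matches any port
    access: str           # "Allow" or "Deny"


def evaluate(rules, source_ip, port):
    """Return the access of the first matching rule in priority order."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        in_prefix = addr in ipaddress.ip_network(rule.source)
        port_ok = rule.port is None or rule.port == port
        if in_prefix and port_ok:
            return rule.access
    return "Deny"  # nothing matched in this model


inbound_rules = [
    Rule(100, "10.0.1.0/24", 443, "Allow"),  # web tier to app tier over HTTPS
    Rule(4096, "0.0.0.0/0", None, "Deny"),   # explicit catch-all deny
]

https_from_web = evaluate(inbound_rules, "10.0.1.20", 443)
ssh_from_web = evaluate(inbound_rules, "10.0.1.20", 22)
https_from_other = evaluate(inbound_rules, "10.0.2.9", 443)
```

The explicit catch-all deny at priority 4096 is what turns "allow only what is listed" into the effective policy in this model.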
Service Tags: Managed IP Prefix Groups
Service Tags replace hardcoded IP addresses in NSG rules with dynamically managed IP prefix groups. Microsoft maintains these groups and updates them automatically as Azure service IP ranges change. This eliminates the operational burden of tracking and rotating IP addresses manually.
Consider a common scenario: your application needs to call Azure Key Vault. Without Service Tags, you would need to look up the current IP ranges for Key Vault in your region, create NSG rules with those IPs, and update them whenever Microsoft changes the ranges. With Service Tags, you write one rule: allow outbound to AzureKeyVault.WestUS2. Done.
Critical Service Tags for Zero Trust
| Service Tag | Scope | Use Case |
|---|---|---|
| VirtualNetwork | VNet address space + peered VNets | Intra-VNet communication |
| AzureLoadBalancer | Azure health probes | Required for LB health checks |
| Internet | Public IP space outside the virtual network, including Azure-owned public IPs | Deny inbound from Internet (default deny) |
| Storage.<Region> | Azure Storage IPs in a region | Allow access to regional storage |
| Sql.<Region> | Azure SQL IPs in a region | Database connectivity |
| AzureMonitor | Log Analytics, App Insights endpoints | Telemetry egress |
Service Tags simplify IP-based ACLs but are not a complete security solution. A Service Tag for Storage includes all Azure Storage accounts, including attacker-controlled ones. Combine Service Tags with Private Endpoints (which pin to a specific resource instance) and NSP (which prevents cross-boundary access) for proper data exfiltration protection.
IPAM and Custom Service Tags
Azure IP Address Manager (IPAM) provides centralized IP address planning, allocation, and tracking across your Azure environment. For organizations operating at scale, IPAM solves the problem of IP address sprawl: overlapping address spaces, exhausted subnets, and inconsistent allocation across teams.
In the context of Zero Trust, IPAM enables you to create custom service tags based on your own IP address pools. This allows NSG rules that reference your internal service boundaries rather than relying solely on Microsoft-managed tags. For example, you can create a service tag for your "payment processing" subnet range and reference it across all NSGs in your environment, ensuring that only explicitly designated networks can reach payment infrastructure.
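# Register a custom IP prefix for the payment range.
# Caveat: `az network custom-ip prefix` is the bring-your-own-IP command for
# public ranges, so the private CIDR below is illustrative only.
az network custom-ip prefix create \
  --name PaymentServices \
  --resource-group rg-networking \
  --cidr 10.50.0.0/16 \
  --zone 1
# Allow the web tier to reach the payment range over HTTPS.
# This rule references the CIDR directly.
az network nsg rule create \
  --resource-group rg-networking \
  --nsg-name nsg-web-tier \
  --name AllowToPayment \
  --priority 200 \
  --direction Outbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes VirtualNetwork \
  --destination-address-prefixes 10.50.0.0/16 \
  --destination-port-ranges 443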
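The overlapping-address-space problem that centralized IP planning is meant to prevent can be detected with the standard library alone. The sketch below is a minimal illustration; `find_overlaps` and the allocation names other than the 10.50.0.0/16 payment range are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(allocations):
    """Return pairs of allocation names whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in allocations.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


allocations = {
    "payment-services": "10.50.0.0/16",
    "web-tier": "10.10.0.0/16",
    "team-b-sandbox": "10.50.128.0/20",  # carved out without checking the plan
}

overlaps = find_overlaps(allocations)
```

An overlap like this matters for security as well as routing: a rule that allows traffic to the payment range would also cover the sandbox addresses.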
Layer 4: Azure Virtual Network Manager (AVNM)
Individual NSG rules work at the subnet or NIC level. They are managed by the team that owns the resource. This creates a fundamental governance problem: a team can modify or delete their NSG rules, bypassing security policies set by the platform team. Azure Virtual Network Manager solves this by introducing Security Admin Rules that operate at a higher precedence than NSG rules and cannot be overridden by resource owners.
Security Admin Rules vs. NSG Rules
| Property | Security Admin Rules (AVNM) | NSG Rules |
|---|---|---|
| Evaluation order | Evaluated first (higher priority) | Evaluated after admin rules |
| Override capability | Cannot be overridden by resource owners | Can be modified by anyone with NSG write permissions |
| Scope | Management group, subscription, or network group | Subnet or NIC |
| Actions | Allow, Deny, Always Allow | Allow, Deny |
| Use case | Platform-level guardrails | Application-level access control |
Common AVNM Patterns
Block high-risk ports globally: Create an admin rule denying inbound SSH (22) and RDP (3389) from the Internet across all network groups. NSG rules that allow these ports then have no effect, even if the team has full NSG write permissions, because the admin Deny is evaluated first and stops the traffic.
Enforce network segmentation: Block traffic between production and development network groups. This prevents accidental cross-environment communication even if VNet peering is misconfigured.
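Always Allow for exceptions: The "Always Allow" action permits traffic regardless of lower-priority admin rules and of any NSG rule that would deny it. A plain "Allow", by contrast, only passes the traffic on to the NSG, which can still deny it. Use "Always Allow" sparingly for infrastructure services that must remain reachable (Azure Monitor, Key Vault) even when other deny rules are in place.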
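The interaction between the three admin actions and NSG rules can be summarized as a small decision function. This is a simplified model that assumes at most one admin rule matches the flow; `effective_access` is a hypothetical name.

```python
def effective_access(admin_action, nsg_action):
    """Combine a matching security admin rule with the NSG verdict.

    admin_action: "Deny", "AlwaysAllow", "Allow", or None if no admin rule matched.
    nsg_action:   "Allow" or "Deny", as decided by the NSG.
    """
    if admin_action == "Deny":
        return "Deny"    # traffic stops here; the NSG is not consulted
    if admin_action == "AlwaysAllow":
        return "Allow"   # delivered even if the NSG would deny it
    return nsg_action    # "Allow" or no match: the NSG decides


blocked_despite_nsg = effective_access("Deny", "Allow")
exception_path = effective_access("AlwaysAllow", "Deny")
passed_to_nsg = effective_access("Allow", "Deny")
```

The third case is the one teams most often misread: an admin "Allow" does not guarantee delivery.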
Layer 5: Azure Policy for Continuous Enforcement
The previous layers define the security controls. Azure Policy ensures those controls remain in place. Without policy enforcement, security configurations degrade over time: someone disables a firewall rule for debugging, a new resource deploys without a Private Endpoint, an NSG gets deleted during a migration.
Essential Zero Trust Policies
| Policy | Effect | What It Prevents |
|---|---|---|
| Storage accounts should disable public access | Deny | Creating storage with public endpoints |
| SQL servers should use Private Link | Audit / Deny | Databases accessible from the internet |
| Subnets should have an NSG | DeployIfNotExists | Subnets without traffic filtering |
| Network interfaces should not have public IPs | Deny | VMs with direct internet exposure |
| VNet peering should only connect approved VNets | Deny | Unauthorized cross-boundary connectivity |
| Key Vault should disable public access | Deny | Secrets accessible from untrusted networks |
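The DeployIfNotExists effect is particularly powerful for Zero Trust: rather than just blocking non-compliant resources, it remediates them. If a subnet is created without an NSG, the policy can deploy and attach a default-deny NSG once the subnet exists. The policy assignment needs a managed identity with permission to make that change, and resources that predate the assignment are only fixed when you run a remediation task.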
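To make the relationship between a policy, its effect, and a non-compliant resource concrete, here is a toy evaluator. It does not use the Azure Policy rule language: the tuple layout, the `hasNsg` property, and the simplified resource dictionaries are assumptions made for illustration.

```python
POLICIES = [
    # (policy name, resource type, property, required value, effect)
    ("Storage accounts should disable public access", "storage", "publicNetworkAccess", "Disabled", "Deny"),
    ("Key Vault should disable public access", "keyvault", "publicNetworkAccess", "Disabled", "Deny"),
    ("Subnets should have an NSG", "subnet", "hasNsg", True, "DeployIfNotExists"),
]


def evaluate(resource):
    """Return (policy name, effect) for every policy the resource violates."""
    return [
        (name, effect)
        for name, rtype, prop, required, effect in POLICIES
        if resource["type"] == rtype and resource.get(prop) != required
    ]


public_storage = evaluate({"type": "storage", "publicNetworkAccess": "Enabled"})
bare_subnet = evaluate({"type": "subnet", "hasNsg": False})
locked_vault = evaluate({"type": "keyvault", "publicNetworkAccess": "Disabled"})
```

The effect attached to each violation is what differs in practice: a Deny result would block the deployment, while a DeployIfNotExists result would trigger a follow-up deployment.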
Layer 6: External Scanning and Continuous Validation
A Zero Trust posture is only as strong as your ability to verify it. External scanning validates that your controls work as intended by testing your infrastructure from an attacker's perspective. Internal configuration audits can miss gaps that external probing reveals.
What to Scan
- Public endpoints: Identify any resources with public IPs or enabled public access that should be private.
- DNS exposure: Enumerate public DNS records for your domains and verify they resolve to Private Endpoints where expected.
- Certificate validity: Expired or misconfigured TLS certificates can force fallback to insecure connections.
- Open ports: Scan public-facing IPs for open ports that should not be exposed (SSH, RDP, database ports).
Integrate external scanning into your CI/CD pipeline. Before a deployment promotes to production, validate that no new public endpoints were introduced. After deployment, run a post-deployment scan to confirm the deployed state matches the expected state.
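A post-deployment open-port check can be as small as the sketch below, which attempts a TCP connection to each port and reports the ones that accept. It is a minimal illustration rather than a replacement for a real scanner; `open_ports` is a hypothetical helper, and it should only be pointed at hosts you are authorized to test.

```python
import socket


def open_ports(host, ports, timeout=1.0):
    """Return the subset of ports on host that accept a TCP connection."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            pass  # refused, filtered, or timed out: treat as not open
    return found
```

A pipeline step could call `open_ports(public_ip, [22, 3389, 1433])` for each public IP in the deployment and fail the run if the returned list is not empty.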
Layer 7: Shift Left and Infrastructure as Code
Security controls deployed reactively are always behind the threat. Shift Left means security validation happens at the earliest possible stage: during code review, in CI/CD pipelines, and before infrastructure changes reach production.
IaC Security Validation Pipeline
# Example: GitHub Actions step for Terraform security scanning
- name: Run tfsec
  uses: aquasecurity/tfsec-action@v1.0.0
  with:
    soft_fail: false
# Azure Policy compliance check before deployment
- name: Check Policy Compliance
  run: |
    az policy state trigger-scan --resource-group "$RG"
    az policy state list --resource-group "$RG" \
      --filter "complianceState eq 'NonCompliant'" \
      --query "[].{policy:policyDefinitionName, resource:resourceId}"
Security gates in the pipeline should check for:
- Resources deployed without Private Endpoints
- NSGs with overly permissive rules (0.0.0.0/0 source on sensitive ports)
- Subnets without attached NSGs
- Storage accounts or databases with public access enabled
- Secrets or connection strings hardcoded in templates
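One of the checks listed above, overly permissive NSG rules, can be sketched as a simple filter over parsed rule definitions. The dictionary shape, the port set, and the `permissive_rules` name are assumptions; a real gate would parse the actual template format and handle port ranges.

```python
SENSITIVE_PORTS = {22, 3389, 1433, 5432}   # SSH, RDP, and two common database ports
ANY_SOURCE = {"0.0.0.0/0", "*", "Internet"}


def permissive_rules(nsg_rules):
    """Return names of inbound allow rules exposing sensitive ports to any source."""
    return [
        rule["name"]
        for rule in nsg_rules
        if rule["direction"] == "Inbound"
        and rule["access"] == "Allow"
        and rule["source"] in ANY_SOURCE
        and rule["port"] in SENSITIVE_PORTS
    ]


template_rules = [
    {"name": "AllowSshAnywhere", "direction": "Inbound", "access": "Allow", "source": "0.0.0.0/0", "port": 22},
    {"name": "AllowHttps", "direction": "Inbound", "access": "Allow", "source": "0.0.0.0/0", "port": 443},
    {"name": "AllowRdpFromJumpbox", "direction": "Inbound", "access": "Allow", "source": "10.0.9.0/24", "port": 3389},
]

violations = permissive_rules(template_rules)
```

In a pipeline, a non-empty `violations` list would cause the step to exit with a non-zero status so the change cannot be merged or deployed.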
Layer 8: Attestation and Ownership Reviews
Technical controls drift over time. Teams change, projects end, resources become orphaned, and permissions accumulate beyond what is needed. Regular attestation reviews verify that the security posture you built in layers 1-7 still reflects current reality.
What to Attest
- Resource ownership: Every resource has a current owner who is accountable for its security posture.
- Access permissions: Every identity (user, service principal, managed identity) with network access has a current justification.
- NSG rules: Every allow rule in every NSG has a documented purpose and an owner who can explain why it exists.
- Exception rules: Every NSP access rule and AVNM "Always Allow" rule is reviewed quarterly and removed when no longer needed.
Automate as much of this as possible. Azure Resource Graph queries can identify orphaned resources, unused permissions, and stale configurations. Build dashboards that surface compliance drift before it becomes a security incident.
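As a sketch of what an automated attestation check might look like, the function below flags exception rules that have no owner or have not been reviewed within the quarterly interval. The record fields, the 90-day approximation of "quarterly", and the `overdue_exceptions` name are assumptions; in practice the inventory would come from a query rather than a hard-coded list.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # "quarterly", approximated as 90 days


def overdue_exceptions(rules, today):
    """Return names of exception rules with no owner or an overdue review."""
    return [
        rule["name"]
        for rule in rules
        if not rule.get("owner") or today - rule["last_reviewed"] > REVIEW_INTERVAL
    ]


exception_rules = [
    {"name": "nsp-allow-partner-api", "owner": "team-payments", "last_reviewed": date(2025, 1, 10)},
    {"name": "avnm-always-allow-monitor", "owner": "platform", "last_reviewed": date(2025, 5, 20)},
    {"name": "nsp-allow-legacy-batch", "owner": None, "last_reviewed": date(2025, 6, 1)},
]

overdue = overdue_exceptions(exception_rules, today=date(2025, 6, 15))
```

Passing `today` explicitly keeps the check deterministic, which makes it easy to test and to run against historical snapshots.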
Putting It All Together
Zero Trust network security in Azure is not a single product or a one-time configuration. It is a layered architecture where each layer addresses a specific threat vector, and the layers reinforce each other.
No single layer is sufficient. An attacker who bypasses your NSG (layer 3) still faces Private Endpoint isolation (layer 2) and NSP boundaries (layer 1). A misconfigured AVNM rule (layer 4) is caught by Azure Policy (layer 5) and validated by external scanning (layer 6). The layers create redundancy through diversity.
Implementation Roadmap
The following staircase shows how to sequence your Zero Trust adoption. Each phase builds on the previous one, and the layers within each phase can be deployed in parallel.
Operational Recommendations
- Start with visibility. Deploy NSP in Transition mode and Azure Policy in Audit mode before enforcing. Understand your current traffic patterns before restricting them.
- Automate everything. Manual security processes do not scale and they drift. Every control should be expressed as code, deployed through pipelines, and validated continuously.
- Measure compliance, not just configuration. A deployed NSG is not security. A validated, tested NSG that blocks the traffic it should block is security. Test your controls from the attacker's perspective.
- Plan for failure. Every security control will eventually be misconfigured, bypassed, or degraded. The layered approach ensures no single point of failure compromises your entire posture.
- Iterate continuously. Zero Trust is not a destination. New services, new attack vectors, and new Azure features require ongoing adaptation. Build review cycles into your operational rhythm.