Understanding Cloud Architect Technology Solutions
Design and Connectivity Patterns
- https://docs.microsoft.com/Azure/architecture/patterns/
- Partitioning workloads
- Modularize the application into functional units.
- Each module
- Handles a portion of the application’s overall functionality
- Represents a set of related concerns.
- Why?
- Easier to design both current & future iterations of your application.
- Modules can also be tested, distributed, and otherwise verified in isolation.
- Load balancing
- Application traffic or load is distributed among various endpoints by using algorithms.
- Allows
- Multiple instances of the website can be created
- They can behave in a predictable manner
- Flexibility to grow or shrink the number of instances in the application without changing the expected behaviour
- Load balancing strategy considerations
- Physical vs Virtual Load balancers
- Use virtual load balancers (hosted in VMs) if the company requires a very specific configuration.
- Load balancing algorithm
- round robin => Selects the next instance for each request based on a predetermined order that includes all of the instances (see the sketch after this list).
- random choice => Selects an instance at random for each request.
- Configurations
- Affinity/stickiness: Whether subsequent requests from the same client machine should be routed to the same service instance.
- Required when the application keeps per-session state.
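A minimal sketch of the two selection algorithms above, using hypothetical endpoint addresses. A managed load balancer (e.g. Azure Load Balancer or Application Gateway) does this for you at the network layer; this only illustrates the selection logic.

```python
import random
from itertools import cycle

# Hypothetical backend instances; in Azure these would be members of a backend pool.
ENDPOINTS = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]
_rotation = cycle(ENDPOINTS)

def pick_round_robin() -> str:
    """Round robin: next instance in a predetermined, repeating order."""
    return next(_rotation)

def pick_random() -> str:
    """Random choice: any instance, chosen uniformly at random."""
    return random.choice(ENDPOINTS)

if __name__ == "__main__":
    print([pick_round_robin() for _ in range(6)])  # cycles through all three twice
    print([pick_random() for _ in range(6)])       # order is not predictable
```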
- Transient fault handling
- Leads to more resilient applications.
- Implemented in .NET libraries (Entity Framework, Azure SDK, etc.)
- Transient errors=> occur due to temporary interruptions in the service or to excess latency.
- Many are self-healing and can be resolved with a retry policy
- Retry policy
- Retry when a temporary failure occurs.
- A break in the circuit (circuit breaker) => stop retrying if the failure looks serious or long-lasting.
- Queues
- Provides a degree of consistency regardless of the behaviour of the modules.
- Direct method invocation
- The connection is severed on transient errors.
- Use a third-party queue to persist the requests beyond a temporary failure (see the sketch after this list).
- Allows you to audit failing requests independently.
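A sketch of the queue-based approach, assuming the azure-storage-queue Python package; the connection string and queue name are placeholders.

```python
import json
from azure.storage.queue import QueueClient

# Persist incoming requests in a durable queue instead of invoking the downstream
# service directly; a separate worker dequeues and processes them, so a transient
# failure doesn't lose the request, and failed requests can be audited later.
queue = QueueClient.from_connection_string(
    conn_str="<storage-connection-string>",  # placeholder
    queue_name="orders",                     # hypothetical queue name
)

def submit_order(order: dict) -> None:
    queue.send_message(json.dumps(order))    # survives a temporary worker outage
```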
- Retry pattern
- Cloud applications must be sensitive to transient faults.
- E.g. loss of network connectivity, the temporary unavailability of a service, timeouts that arise when a service is busy.
- They’re typically self-correcting; if the action that triggered the fault is repeated after a suitable delay, it’s likely to succeed.
- A DB with too many concurrent requests can throttle (requests fail until the workload eases); it fixes itself after some delay.
- Solution: Retry on temporary failures.
- Remote service => retry after a short wait.
- Fails again => keep retrying, but limit the number of attempts (stop once the maximum number of tries is reached) instead of hammering the service.
- Add randomization (jitter) to the retry delays to spread requests from multiple instances of the application as evenly as possible (sketch below).
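A minimal retry sketch following these rules; `TransientError` and the `operation` callable are placeholders, and in practice the Azure SDKs and libraries such as Entity Framework provide equivalent retry policies built in.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a timeout or throttling response from a remote service."""

def call_with_retry(operation, max_attempts: int = 4, base_delay: float = 0.5):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()                      # the remote call being protected
        except TransientError:
            if attempt == max_attempts:
                raise                               # give up once the maximum is reached
            delay = base_delay * 2 ** (attempt - 1)  # exponential backoff
            delay += random.uniform(0, delay)        # jitter spreads retries across instances
            time.sleep(delay)
```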
- Competing consumers pattern
- A sudden, large number of requests may cause an unpredictable workload.
- Single consumer => risk of being flooded, or of the messaging system being overloaded.
- Solution: asynchronous messaging with variable quantities of message producers and consumers
- Business logic in the application is not blocked while the requests are being processed.
- Handle fluctuating workloads => system can run multiple instances of the consumer service.
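A small in-process sketch of competing consumers, using Python threads and queue.Queue as a stand-in for a real message broker such as an Azure Service Bus or Storage queue.

```python
import queue
import threading
import time

work = queue.Queue()   # stands in for the message broker

def consumer(worker_id: int) -> None:
    while True:
        msg = work.get()          # each message goes to exactly one consumer
        if msg is None:           # sentinel => shut this worker down
            work.task_done()
            break
        time.sleep(0.05)          # simulate processing
        print(f"worker {worker_id} handled {msg}")
        work.task_done()

# Producers enqueue without waiting for processing, so business logic isn't blocked.
for i in range(20):
    work.put(f"request-{i}")

# Scale out by running more consumer instances to absorb load spikes.
workers = [threading.Thread(target=consumer, args=(n,)) for n in range(3)]
for w in workers:
    w.start()
for _ in workers:
    work.put(None)
work.join()
```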
- Cache-aside pattern
- Problem: Cached data consistency
- A strategy is needed to ensure that the cached data is up to date & to handle situations where the data in the cache has become stale.
- Solution: read-through and write-through caching
- Cache-aside => Effectively loads data into the cache on demand if it’s not already available in the cache.
- Not in cache? Fetch from the data store & add it to the cache; on updates, write the modification to the data store and invalidate the cached entry (see the sketch below).
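A cache-aside sketch with an in-process dict as the cache; the toy `_db` dictionary and customer functions are hypothetical stand-ins for a real data store.

```python
_db = {"42": {"name": "Contoso"}}   # toy "data store"
_cache: dict = {}                   # the cache

def get_customer(customer_id: str) -> dict:
    item = _cache.get(customer_id)
    if item is None:                       # cache miss
        item = _db[customer_id]            # fetch from the data store...
        _cache[customer_id] = item         # ...and load it into the cache on demand
    return item

def update_customer(customer_id: str, data: dict) -> None:
    _db[customer_id] = data                # write the modification to the data store
    _cache.pop(customer_id, None)          # invalidate the stale cached copy
```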
- Sharding pattern
- Problem: hosting large volumes of data in a traditional single-instance store
- Some limitations
- Storage space: Upgrading disks is not easy.
- Computing resources: It’s not always possible to add more.
- Network bandwidth: Network traffic might exceed the available bandwidth.
- Geography: Data may need to be close to users in different regions to reduce access latency.
- Scaling up can postpone the effects, but it’s only a temporary solution.
- Solution: partitioning data horizontally across many nodes
- Divide data store into horizontal partitions, or shards.
- Shard: Same schema but distinct subset of data.
- Sharding can be implemented in the data access code, or by a storage system that supports transparent sharding.
- Abstracting the physical location => high level of control over which shard contains which data.
- Easier to migrate data between shards without touching application logic.
- Tradeoff => Additional data access overhead to determine the location of each data item as it’s retrieved
- For optimal performance & scalability
- Split data in a way that’s appropriate for the types of queries the application performs.
- It’s unlikely that the sharding scheme will exactly match the requirements of every query.
- E.g.:
- In a multitenant system => data is retrieved per tenant (e.g. looked up by tenant ID or tenant name), so the tenant identifier is a natural sharding key (sketch below).
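A sketch of resolving the shard from the sharding key; the shard host names are hypothetical, and real systems often keep the key-to-shard mapping in a lookup store instead of hashing.

```python
import hashlib

SHARDS = [                                  # same schema on each shard, distinct subset of data
    "tenants-shard-0.database.windows.net",
    "tenants-shard-1.database.windows.net",
    "tenants-shard-2.database.windows.net",
]

def shard_for_tenant(tenant_name: str) -> str:
    """Hash the sharding key (tenant name) to pick the shard that holds its data."""
    digest = hashlib.sha256(tenant_name.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

# Data access code resolves the shard before querying, e.g.:
#   server = shard_for_tenant("contoso")  ->  connect(server)  ->  query that tenant's rows
```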
Hybrid Networking
- Site-to-site connectivity (Site-to-site VPN)
- Between your on-premises site <=> VNet in Azure via IPsec tunnel.
- Resources on local network can communicate with resources on Azure VNet
- No need for separate connection for each client computer in local network.
- Requires VPN device.
- E.g.:
- IT pros and developers in the office have their own gateway and connect to Azure.
- The offshore QA team has its own gateway and connects to Azure.
- Point-to-site connectivity (Point-to-site VPN)
- Configured on each client computer that you want to connect to the VNet in Azure.
- No need for VPN device
- Instead you use VPN client you install on each client computer.
- Requires manually starting the connection from the client; automatic reconnection can be configured.
- Combining site-to-site and point-to-site connectivity
- The offshore QA team connects via a VPN gateway (site-to-site VPN)
- Developers & IT pros at the office connect via a VPN gateway (site-to-site VPN)
- Developers working from home connect via direct VPN (point-to-site VPN)
- Combining ExpressRoute and site-to-site connectivity
- Reasons
- Multiple branch offices => it’s costly to purchase ExpressRoute peering for every location.
- Multiple networks within the enterprise
- Connect one to Azure using ExpressRoute for higher-risk traffic.
- For lower-risk traffic, use site-to-site VPN
- Use site-to-site VPN as a failover link if ExpressRoute connection fails.
- Virtual network to virtual network connectivity (VNET to VNET)
- Utilizes Azure VPN gateways to connect VNets in Azure over IPSec/IKE tunnels.
- E.g.: you have the following topology (topology = which nodes connect to which networks via links)
- IT pros/developers in the office have a site-to-site VPN to Azure East Asia
- The offshore QA team has a site-to-site VPN to Azure West US
- You set VNet-to-VNet between Azure East Asia and Azure West US
- Then both teams can access Azure East Asia and Azure West US
- Connecting across cloud providers
- For failover, backup or migration between providers.
- Amazon Web Services (AWS) =>
- Create an EC2 VM running Openswan (VPN software)
- Create a gateway on the Azure VNet side using static routing.
- Use the gateway IP from Azure to configure Openswan for the tunnel connection
Storing data in the cloud
Durability of data
- A transaction is a set of operations.
- Transactions seek to achieve some or all of the ACID properties.
- Atomic
- A transaction is executed only once; all work completes or none does.
- Why?
- Operations in a transaction often share a common intent or depend on each other.
- Performing only a subset => the overall intent can be missed.
- Consistent
- A transaction preserves the consistency of data.
- It’s performed on a consistent state and leads to a consistent state.
- Typically, developers are responsible for maintaining consistency.
- Isolated
- Concurrent transactions behave as if each were the only transaction running in the system.
- Some applications reduce isolation level for better throughput
- High isolation => limits the number of concurrent transactions
- Durable
- A transaction must be recoverable.
- Its effects must be persisted even if, e.g., the computer crashes.
- Transaction logging (e.g. write-ahead logging) solves this.
- In relational database systems (RDBMS) it’s a single unit of work.
- All-or-none => if it fails, the DB is rolled back and all modifications are erased (sketch below).
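A sketch of the all-or-none behaviour using sqlite3 from the Python standard library as a stand-in for any RDBMS; the account transfer is an illustrative example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
        # raise RuntimeError("crash")  # uncommenting erases BOTH updates (rollback)
except RuntimeError:
    pass

print(conn.execute("SELECT * FROM accounts").fetchall())  # both rows changed, or neither
```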
Caching
- Caching aims to improve performance & scalability of a system.
- It’s done by temporarily copying frequently accessed data to fast storage located close to the application.
- Most effective when
- Same data is repeatedly read.
- Original data store =>
- Relatively static
- Slow compared to cache’s speed
- Subject to a significant level of contention
- Contention in DB systems =>
- multiple processes or instances competing for access to the same index or data block at the same time
- It’s far away & network latency causes access to be slow.
- Distributed applications typically implement either or both when caching data:
- Private cache : Locally held on the computer that’s running the application.
- In-memory store: Accessed by a single process.
- Quick & effective; size is typically constrained by the host machine.
- Local file system
- Slower than in-memory, but faster than retrieving across network.
- Each application holds its own copy of the data.
- Problem:
- A snapshot of the original data at some point in the past.
- Different application instances can hold different versions.
- Shared cache : Common source which multiple processes/machines can access.
- All instances see the same view of the cached data, unlike private in-memory caches.
- It’s highly scalable
- Cache services use a cluster of servers and software that transparently distributes the data across them.
- Easy to scale by adding servers to / removing servers from the cluster.
- Disadvantages:
- Slower to access => the data is no longer held locally to each application instance.
- Implementing a separate cache service => increases complexity.
- Caching considerations
- When?
- The more data you have and the larger the number of users that need to access it, the greater the benefit of caching => it minimizes the load on the original data store.
- If the original data store is temporarily unavailable, the cache can still serve data.
- How to cache data effectively?
- Determine the most appropriate data to cache
- Cache it at the appropriate time.
- Add data to the cache on demand when it’s retrieved first time.
- Populate in advance
- Seeding: when the application starts.
- Not good for a large cache, as it can cause a sudden high load.
- Manage data expiration.
- Cached data becomes stale after a while.
- Expire cached items so they’re removed and re-retrieved from the data store on the next read.
- Set a default expiration policy; many cache services also let you set the period for individual objects when storing them programmatically (see the Redis sketch below).
- Redis Cache
- Recommended by Azure; replaces the deprecated Azure Managed Cache Service.
- NoSQL key-value database.
- Unique: values can be complex data structures (lists, sets, hashes), not just simple strings.
- SKUs: Basic (single node), Standard (two nodes + SLA), Premium (adds persistence, clustering, and virtual network support)
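A sketch of cache-aside with per-object expiration against Azure Cache for Redis, assuming the redis-py package; the host name, access key, product helper, and 300-second TTL are placeholders/assumptions.

```python
import json
import redis

r = redis.Redis(
    host="<cache-name>.redis.cache.windows.net",  # placeholder host
    port=6380,                                    # Azure Redis SSL port
    password="<access-key>",                      # placeholder key
    ssl=True,
)

def load_product_from_db(product_id: str) -> dict:
    return {"id": product_id, "name": "sample"}   # stand-in for the real data store

def get_product(product_id: str) -> dict:
    cached = r.get(f"product:{product_id}")
    if cached is not None:
        return json.loads(cached)
    product = load_product_from_db(product_id)
    # setex stores the value with a per-object expiration (300 s here), so stale
    # entries age out and are re-read from the data store on the next request.
    r.setex(f"product:{product_id}", 300, json.dumps(product))
    return product
```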
Measuring throughput
- Normalized units
- Relative performance guarantees by cloud vendors.
- If your application uses 20 units, 40 units will give you approximately double the performance.
- DTUs – Database throughput units (Azure SQL Database)
- Based on compute, storage and IO.
- DTUs for single databases, eDTUs for elastic pools.
- Fixed per pricing tier, e.g.: Basic = 5 DTUs, Standard S2 = 50 DTUs
- RUs – Request unit processing per second (Azure Cosmos DB)
- Each operation incurs a request charge, which is expressed in RUs.
- A single request unit (normalized) => one read of a 1 KB document.
- Create, replace, and delete operations consume more processing => more request units (see the estimate below).
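A rough capacity estimate built on the 1 RU ≈ one 1 KB read rule above; the write cost and request rates are illustrative assumptions, not official Cosmos DB figures (actual charges depend on document size, indexing, and consistency level).

```python
READ_RU_PER_1KB = 1    # from the notes: one read of a 1 KB document ~ 1 RU
WRITE_RU_PER_1KB = 5   # assumption: writes cost several times a read

reads_per_second = 500
writes_per_second = 100

required_rus = reads_per_second * READ_RU_PER_1KB + writes_per_second * WRITE_RU_PER_1KB
print(f"Provision at least {required_rus} RU/s")   # 1000 RU/s for this workload
```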
Structure of data
- Polyglot persistence => solutions that use a mix of data store technologies.
- Structured data stores
- Most vendors use SQL.
- Have an RDBMS (relational database management system)
- Conforms to the ACID properties.
- Supports schema-on-write
- You define data structure, all read+write use same schema.
- Hard to scale out.
- E.g. Azure SQL Database, Azure Database for MySQL, Azure Database for PostgreSQL
- Unstructured / semi-structured data stores
- Doesn’t use tabular schema of rows & columns.
- Can store as key/value pairs, JSON documents, or as a graph (edges + vertices)
- Have no relational model.
- Graph databases => Cosmos DB, Gremlin API
- Optimized for exploring weighted relationships between entities.
- Stores nodes/vertices (entities) and edges (relationships between nodes).
- Document databases => Azure Cosmos DB
- NoSQL => most systems support SQL-compatible queries, but they are non-relational databases.
- Column family: HBase in HDInsights
- Key-value pairs, where each key is mapped to a value that’s a set of columns.
- Massively parallel & distributed solutions for ingesting, storing, and analyzing data
- SQL Data Warehouse
- Azure Data Lake
- Time series data stores => Time Series Insights
- Optimized for queries over time-based sequences of data, indexed by datetime.
- Others: Object storage => Blob storage, Shared files => File storage
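The same (made-up) customer record expressed in three of the shapes above, to contrast schema-on-write with self-describing formats.

```python
# Structured (relational) — schema defined up front, every row follows it:
#   CREATE TABLE customers (id INT PRIMARY KEY, name TEXT, city TEXT);
#   INSERT INTO customers VALUES (42, 'Contoso', 'Seattle');

# Key-value — the key maps to an opaque value; the store doesn't inspect it:
key_value = {"customer:42": '{"name": "Contoso", "city": "Seattle"}'}

# Document — self-describing JSON; fields and nesting can vary per document:
document = {
    "id": "42",
    "name": "Contoso",
    "city": "Seattle",
    "orders": [{"sku": "A-1", "qty": 3}],   # nested data without a join table
}
```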