Understanding Cloud Architect Technology Solutions
Design and Connectivity Patterns
- https://docs.microsoft.com/Azure/architecture/patterns/
- Partitioning workloads
- Modularize the application into functional units.
- Each module
- Handles a portion of the application’s overall functionality
- Represents a set of related concerns.
- Why?
- Easier to design both current & future iterations of your application.
- Modules can also be tested, distributed, and otherwise verified in isolation.
- Load balancing
- Application traffic or load is distributed among various endpoints by using algorithms.
- Allows
- Multiple instances of the website can be created
- They can behave in a predictable manner
- Flexibility to grow or shrink the number of instances in the application without changing the expected behaviour
- Load balancing strategy considerations
- Physical vs Virtual Load balancers
- Use virtual load balancers (hosted in VMs) if the company requires a very specific configuration.
- Load balancing algorithm
- round robin => Selects the next instance for each request based on a predetermined order that includes all of the instances (see the sketch after this list).
- random choice => Selects an instance at random for each request.
- Configurations
- Affinity/stickiness: Whether subsequent requests from the same client machine should be routed to the same service instance.
- Required when the application keeps per-session state.
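A minimal sketch of the two selection algorithms above, using hypothetical endpoint addresses. A managed load balancer (e.g. Azure Load Balancer or Application Gateway) does this for you at the network layer; this only illustrates the selection logic.

```python
import random
from itertools import cycle

# Hypothetical backend instances; in Azure these would be members of a backend pool.
ENDPOINTS = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]
_rotation = cycle(ENDPOINTS)

def pick_round_robin() -> str:
    """Round robin: next instance in a predetermined, repeating order."""
    return next(_rotation)

def pick_random() -> str:
    """Random choice: any instance, chosen uniformly at random."""
    return random.choice(ENDPOINTS)

if __name__ == "__main__":
    print([pick_round_robin() for _ in range(6)])  # cycles through all three twice
    print([pick_random() for _ in range(6)])       # order is not predictable
```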
- Transient fault handling
- Leads to more resilient applications.
- Implemented in .NET libraries (Entity Framework, Azure SDK, etc.)
- Transient errors=> occur due to temporary interruptions in the service or to excess latency.
- Many are self-healing and can be resolved with a retry policy
- Retry policy
- Retry when a temporary failure occurs.
- A break in the circuit (circuit breaker) => stop retrying if the failure looks serious or long-lasting.
- Queues
- Provides a degree of consistency regardless of the behaviour of the modules.
- Direct method invocation
- The connection is severed on transient errors.
- Use a third-party queue to persist the requests beyond a temporary failure (see the sketch after this list).
- Allows you to audit failing requests independently.
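A sketch of the queue-based approach, assuming the azure-storage-queue Python package; the connection string and queue name are placeholders.

```python
import json
from azure.storage.queue import QueueClient

# Persist incoming requests in a durable queue instead of invoking the downstream
# service directly; a separate worker dequeues and processes them, so a transient
# failure doesn't lose the request, and failed requests can be audited later.
queue = QueueClient.from_connection_string(
    conn_str="<storage-connection-string>",  # placeholder
    queue_name="orders",                     # hypothetical queue name
)

def submit_order(order: dict) -> None:
    queue.send_message(json.dumps(order))    # survives a temporary worker outage
```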
- Retry pattern
- Cloud applications must be sensitive to transient faults.
- E.g. loss of network connectivity, the temporary unavailability of a service, timeouts that arise when a service is busy.
- They’re typically self-correcting; if the action that triggered the fault is repeated after a suitable delay, it’s likely to succeed.
- A DB with too many concurrent requests can throttle (requests fail until the workload eases); it fixes itself after some delay.
- Solution: Retry on temporary failures.
- Remote service => retry after a short wait.
- Fails again => keep retrying, but limit the number of attempts (stop once the maximum number of tries is reached) instead of hammering the service.
- Add randomization (jitter) to the retry delays to spread requests from multiple instances of the application as evenly as possible (sketch below).
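A minimal retry sketch following these rules; `TransientError` and the `operation` callable are placeholders, and in practice the Azure SDKs and libraries such as Entity Framework provide equivalent retry policies built in.

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a timeout or throttling response from a remote service."""

def call_with_retry(operation, max_attempts: int = 4, base_delay: float = 0.5):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()                      # the remote call being protected
        except TransientError:
            if attempt == max_attempts:
                raise                               # give up once the maximum is reached
            delay = base_delay * 2 ** (attempt - 1)  # exponential backoff
            delay += random.uniform(0, delay)        # jitter spreads retries across instances
            time.sleep(delay)
```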
- Competing consumers pattern
- A sudden, large number of requests may cause an unpredictable workload.
- Single consumer => risk of being flooded, or of the messaging system being overloaded.
- Solution: asynchronous messaging with variable quantities of message producers and consumers
- Business logic in the application is not blocked while the requests are being processed.
- Handle fluctuating workloads => system can run multiple instances of the consumer service.
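A small in-process sketch of competing consumers, using Python threads and queue.Queue as a stand-in for a real message broker such as an Azure Service Bus or Storage queue.

```python
import queue
import threading
import time

work = queue.Queue()   # stands in for the message broker

def consumer(worker_id: int) -> None:
    while True:
        msg = work.get()          # each message goes to exactly one consumer
        if msg is None:           # sentinel => shut this worker down
            work.task_done()
            break
        time.sleep(0.05)          # simulate processing
        print(f"worker {worker_id} handled {msg}")
        work.task_done()

# Producers enqueue without waiting for processing, so business logic isn't blocked.
for i in range(20):
    work.put(f"request-{i}")

# Scale out by running more consumer instances to absorb load spikes.
workers = [threading.Thread(target=consumer, args=(n,)) for n in range(3)]
for w in workers:
    w.start()
for _ in workers:
    work.put(None)
work.join()
```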
- Cache-aside pattern
- Problem: Cached data consistency
- A strategy is needed to ensure that the cached data is up to date & to handle situations where the data in the cache has become stale.
- Solution: read-through and write-through caching
- Cache-aside => Effectively loads data into the cache on demand if it’s not already available in the cache.
- Not in cache? Fetch from the data store & add it to the cache; on updates, write the modification to the data store and invalidate the cached entry (see the sketch below).
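A cache-aside sketch with an in-process dict as the cache; the toy `_db` dictionary and customer functions are hypothetical stand-ins for a real data store.

```python
_db = {"42": {"name": "Contoso"}}   # toy "data store"
_cache: dict = {}                   # the cache

def get_customer(customer_id: str) -> dict:
    item = _cache.get(customer_id)
    if item is None:                       # cache miss
        item = _db[customer_id]            # fetch from the data store...
        _cache[customer_id] = item         # ...and load it into the cache on demand
    return item

def update_customer(customer_id: str, data: dict) -> None:
    _db[customer_id] = data                # write the modification to the data store
    _cache.pop(customer_id, None)          # invalidate the stale cached copy
```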
- Sharding pattern
- Problem: hosting large volumes of data in a traditional single-instance store
- Some limitations
- Storage space: Upgrading disks is not easy.
- Computing resources: It’s not always possible to add more.
- Network bandwidth: Network traffic might exceed the available bandwidth.
- Geography: Data may need to be close to users in different regions to reduce access latency.
- Scaling up can postpone the effects, but it’s only a temporary solution.
- Solution: partitioning data horizontally across many nodes
- Divide data store into horizontal partitions, or shards.
- Shard: Same schema but distinct subset of data.
- Sharding can be implemented in the data access code, or by a storage system that supports transparent sharding.
- Abstracting the physical location => high level of control over which shard contains which data.
- Easier to migrate data between shards without touching application logic.
- Tradeoff => Additional data access overhead to determine the location of each data item as it’s retrieved
- For optimal performance & scalability
- Split data in a way that’s appropriate for the types of queries the application performs.
- It’s unlikely that the sharding scheme will exactly match the requirements of every query.
- E.g.:
- In a multitenant system => data is retrieved per tenant (e.g. looked up by tenant ID or tenant name), so the tenant identifier is a natural sharding key (sketch below).
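A sketch of resolving the shard from the sharding key; the shard host names are hypothetical, and real systems often keep the key-to-shard mapping in a lookup store instead of hashing.

```python
import hashlib

SHARDS = [                                  # same schema on each shard, distinct subset of data
    "tenants-shard-0.database.windows.net",
    "tenants-shard-1.database.windows.net",
    "tenants-shard-2.database.windows.net",
]

def shard_for_tenant(tenant_name: str) -> str:
    """Hash the sharding key (tenant name) to pick the shard that holds its data."""
    digest = hashlib.sha256(tenant_name.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

# Data access code resolves the shard before querying, e.g.:
#   server = shard_for_tenant("contoso")  ->  connect(server)  ->  query that tenant's rows
```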
Hybrid Networking
- Site-to-site connectivity (Site-to-site VPN)
- Between your on-premises site <=> VNet in Azure via IPsec tunnel.
- Resources on local network can communicate with resources on Azure VNet
- No need for separate connection for each client computer in local network.
- Requires VPN device.
- E.g.:
- IT pros and developers in the office have their own gateway and connect to Azure.
- The offshore QA team has its own gateway and connects to Azure.
- Point-to-site connectivity (Point-to-site VPN)
- Configured on each client computer that you want to connect to the VNet in Azure.
- No need for VPN device
- Instead you use VPN client you install on each client computer.
- Requires manually starting the connection from the client; automatic reconnection can be configured.
- Combining site-to-site and point-to-site connectivity
- The offshore QA team connects via a VPN gateway (site-to-site VPN)
- Developers & IT pros at the office connect via a VPN gateway (site-to-site VPN)
- Developers working from home connect via direct VPN (point-to-site VPN)
- Combining ExpressRoute and site-to-site connectivity
- Reasons
- Multiple branch offices => it’s costly to purchase ExpressRoute peering for every location.
- Multiple networks within the enterprise
- Connect one to Azure using ExpressRoute for higher-risk traffic.
- For lower-risk traffic, use site-to-site VPN
- Use site-to-site VPN as a failover link if ExpressRoute connection fails.
- Virtual network to virtual network connectivity (VNET to VNET)
- Utilizes Azure VPN gateways to connect VNets in Azure over IPSec/IKE tunnels.
- E.g.: you have the following topology (topology = which nodes connect to which networks via links)
- IT pros/developers in the office have a site-to-site VPN to Azure East Asia
- The offshore QA team has a site-to-site VPN to Azure West US
- You set VNet-to-VNet between Azure East Asia and Azure West US
- Then both teams can access Azure East Asia and Azure West US
- Connecting across cloud providers
- For failover, backup or migration between providers.
- Amazon Web Services (AWS) =>
- Create an EC2 VM running Openswan (VPN software)
- Create a gateway on the Azure VNet side using static routing.
- Use the gateway IP from Azure to configure Openswan for the tunnel connection
Storing data in the cloud
Durability of data
- A transaction is a set of operations.
- Transactions seek to achieve some or all of the ACID properties.
- Atomic
- A transaction is executed only once; all work completes or none does.
- Why?
- Operations in a transaction often share a common intent or depend on each other.
- Performing only a subset => the overall intent can be missed.
- Consistent
- A transaction preserves the consistency of data.
- It’s performed on a consistent state and leads to a consistent state.
- Typically, developers are responsible for maintaining consistency.
- Isolated
- Concurrent transactions behave as if each were the only transaction running in the system.
- Some applications reduce isolation level for better throughput
- High isolation => limits the number of concurrent transactions
- Durable
- A transaction must be recoverable.
- Its effects must be persisted even if, e.g., the computer crashes.
- Transaction logging (e.g. write-ahead logging) solves this.
- In relational database systems (RDBMS) it’s a single unit of work.
- All-or-none => if it fails, the DB is rolled back and all modifications are erased (sketch below).
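A sketch of the all-or-none behaviour using sqlite3 from the Python standard library as a stand-in for any RDBMS; the account transfer is an illustrative example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
        # raise RuntimeError("crash")  # uncommenting erases BOTH updates (rollback)
except RuntimeError:
    pass

print(conn.execute("SELECT * FROM accounts").fetchall())  # both rows changed, or neither
```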
Caching
- Caching aims to improve performance & scalability of a system.
- It’s done by temporarily copying frequently accessed data to fast storage located close to the application.
- Most effective when
- Same data is repeatedly read.
- Original data store =>
- Relatively static
- Slow compared to cache’s speed
- Subject to a significant level of contention
- Contention in DB systems =>
- multiple processes or instances competing for access to the same index or data block at the same time
- It’s far away & network latency causes access to be slow.
- Distributed applications typically implement either or both when caching data:
- Private cache : Locally held on the computer that’s running the application.
- In-memory store: Accessed by a single process.
- Quick & effective; size is typically constrained by the host machine.
- Local file system
- Slower than in-memory, but faster than retrieving across network.
- Each application holds its own copy of the data.
- Problem:
- A snapshot of the original data at some point in the past.
- Different application instances can hold different versions.
- Shared cache : Common source which multiple processes/machines can access.
- All instances see the same view of the cached data, unlike private in-memory caches.
- It’s highly scalable
- Cache services use a cluster of servers and software that transparently distributes the data across them.
- Easy to scale by adding servers to / removing servers from the cluster.
- Disadvantages:
- Slower to access => the data is no longer held locally to each application instance.
- Implementing a separate cache service => increases complexity.
- Caching considerations
- When?
- The more data you have and the larger the number of users that need to access it, the greater the benefit of caching => it minimizes the load on the original data store.
- If the original data store is temporarily unavailable, the cache can still serve data.
- How to cache data effectively?
- Determine the most appropriate data to cache
- Cache it at the appropriate time.
- Add data to the cache on demand when it’s retrieved first time.
- Populate in advance
- Seeding: when the application starts.
- Not good for a large cache, as it can cause a sudden high load.
- Manage data expiration.
- Cached data becomes stale after a while.
- Expire cached items so they’re removed and re-retrieved from the data store on the next read.
- Set a default expiration policy; many cache services also let you set the period for individual objects when storing them programmatically (see the Redis sketch below).
- Redis Cache
- Recommended by Azure; replaces the deprecated Azure Managed Cache Service.
- NoSQL key-value database.
- Unique: values can be complex data structures (lists, sets, hashes), not just simple strings.
- SKUs: Basic (single node), Standard (two nodes + SLA), Premium (adds persistence, clustering, and virtual network support)
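A sketch of cache-aside with per-object expiration against Azure Cache for Redis, assuming the redis-py package; the host name, access key, product helper, and 300-second TTL are placeholders/assumptions.

```python
import json
import redis

r = redis.Redis(
    host="<cache-name>.redis.cache.windows.net",  # placeholder host
    port=6380,                                    # Azure Redis SSL port
    password="<access-key>",                      # placeholder key
    ssl=True,
)

def load_product_from_db(product_id: str) -> dict:
    return {"id": product_id, "name": "sample"}   # stand-in for the real data store

def get_product(product_id: str) -> dict:
    cached = r.get(f"product:{product_id}")
    if cached is not None:
        return json.loads(cached)
    product = load_product_from_db(product_id)
    # setex stores the value with a per-object expiration (300 s here), so stale
    # entries age out and are re-read from the data store on the next request.
    r.setex(f"product:{product_id}", 300, json.dumps(product))
    return product
```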
Measuring throughput
- Normalized units
- Relative performance guarantees by cloud vendors.
- If your application uses 20 units, 40 units will give you approximately double the performance.
- DTUs – Database throughput units (Azure SQL Database)
- Based on compute, storage and IO.
- DTUs for single databases, eDTUs for elastic pools.
- Fixed per pricing tier, e.g.: Basic = 5 DTUs, Standard S2 = 50 DTUs
- RUs – Request unit processing per second (Azure Cosmos DB)
- Each operation incurs a request charge, which is expressed in RUs.
- A single request unit (normalized) => one read of a 1 KB document.
- Create, replace, and delete operations consume more processing => more request units (see the estimate below).
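A rough capacity estimate built on the 1 RU ≈ one 1 KB read rule above; the write cost and request rates are illustrative assumptions, not official Cosmos DB figures (actual charges depend on document size, indexing, and consistency level).

```python
READ_RU_PER_1KB = 1    # from the notes: one read of a 1 KB document ~ 1 RU
WRITE_RU_PER_1KB = 5   # assumption: writes cost several times a read

reads_per_second = 500
writes_per_second = 100

required_rus = reads_per_second * READ_RU_PER_1KB + writes_per_second * WRITE_RU_PER_1KB
print(f"Provision at least {required_rus} RU/s")   # 1000 RU/s for this workload
```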
Structure of data
- Polyglot persistence => solutions that use a mix of data store technologies.
- Structured data stores
- Most vendors use SQL.
- Have an RDBMS (relational database management system)
- Conforms to the ACID properties.
- Supports schema-on-write
- You define data structure, all read+write use same schema.
- Hard to scale out.
- E.g. Azure SQL Database, Azure Database for MySQL, Azure Database for PostgreSQL
- Unstructured / semi-structured data stores
- Doesn’t use tabular schema of rows & columns.
- Can store as key/value pairs, JSON documents, or as a graph (edges + vertices)
- Have no relational model.
- Graph databases => Cosmos DB, Gremlin API
- Optimized for exploring weighted relationships between entities.
- Stores nodes/vertices (entities) and edges (relationships between nodes).
- Document databases => Azure Cosmos DB
- NoSQL => most systems support SQL-compatible queries, but they are non-relational databases.
- Column family: HBase in HDInsights
- Key-value pairs, where each key is mapped to a value that’s a set of columns.
- Massively parallel & distributed solutions for ingesting, storing, and analyzing data
- SQL Data Warehouse
- Azure Data Lake
- Time series data stores => Time Series Insights
- Optimized for queries over time-based sequences of data, indexed by datetime.
- Others: Object storage => Blob storage, Shared files => File storage
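The same (made-up) customer record expressed in three of the shapes above, to contrast schema-on-write with self-describing formats.

```python
# Structured (relational) — schema defined up front, every row follows it:
#   CREATE TABLE customers (id INT PRIMARY KEY, name TEXT, city TEXT);
#   INSERT INTO customers VALUES (42, 'Contoso', 'Seattle');

# Key-value — the key maps to an opaque value; the store doesn't inspect it:
key_value = {"customer:42": '{"name": "Contoso", "city": "Seattle"}'}

# Document — self-describing JSON; fields and nesting can vary per document:
document = {
    "id": "42",
    "name": "Contoso",
    "city": "Seattle",
    "orders": [{"sku": "A-1", "qty": 3}],   # nested data without a join table
}
```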