Storage
Storage services
- Storage account is the top-level container for the following services:
- Blob Storage
- File Storage
- Table Storage
- Queue Storage
Blob Storage
- Object and disk storage
- Blob storage tiers
- Azure Search integration
- Blob Lease for exclusive write access
- Pass in lease id to API to modify
- E.g. IaaS VMs lease Page Blob disks to ensure each disk is managed by a single VM
- Snapshots can be created and viewed at the blob level.
Azure Data Lake Storage
- Uses blob storage to store data
- Big data analytics
- Analytics interface and APIs
- Blob storage APIs
- Hadoop compatible access to data
- ❗ GPv2 Storage accounts only
Blob Types
- Block Blob
- Composed of blocks of up to 100 MB each
- Optimized for efficient upload
- Insert, replace, or delete individual blocks
- ❗ Up to 4.77 TB max file size
- ❗ 50,000 max blocks per blob
- Append blob
- Can only append blocks
- Ideal for log and audit files
- ❗ 195GB max file size
- Page Blob
- Optimized for frequent read/write operations
- Good for VM disks and databases
- Foundation for IaaS disks
- Stores VHD files.
- Underlying storage for Azure SQL
- ❗ Standard (HDD) / Premium (SSD) storage
- ❗ 8 TB max file size
- ❗ Only offered in General Purpose account types
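The block blob limits above multiply out to the ~4.77 TB maximum; a quick sanity check:

```python
# Block blob capacity: up to 50,000 blocks of up to 100 MiB each.
MAX_BLOCKS = 50_000
BLOCK_SIZE_BYTES = 100 * 2**20  # 100 MiB

max_blob_bytes = MAX_BLOCKS * BLOCK_SIZE_BYTES
print(f"{max_blob_bytes / 2**40:.2f} TiB")  # → 4.77 TiB
```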
Blob Storage Access Tiers
- Set on blob level.
- Three tiers:
- Hot Tier: Frequent reads
- Lower data access costs
- Higher data storage costs
- Cool Tier: Accessed less frequently
- Higher data access costs
- Lower data storage costs
- Optimized for data stored for at least 30 days
- Archive Tier: Retrieval takes hours (data is kept offline)
- Highest data access cost
- Lowest data storage cost
- Optimized for data stored for at least 180 days
- ❗ Only supported for Block Blobs
- Changing storage tiers incurs charges
- ❗ Can’t change the Storage Tier of a Blob that has snapshots
- Azure Blob Storage Lifecycle Management Policies
- E.g. configure a policy to move a blob directly to the archive storage tier X days after it’s uploaded
- Storage Account -> Blob Service -> Lifecycle Management
- Executed daily
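A lifecycle management policy is a JSON rule set; a sketch of the tier-then-delete pattern above (the rule name, prefix, and day counts are made up, the key names follow the Azure lifecycle policy JSON schema):

```python
import json

# Hypothetical rule: cool after 30 days, archive after 90, delete after 365.
policy = {
    "rules": [{
        "name": "age-out-logs",
        "enabled": True,
        "type": "Lifecycle",
        "definition": {
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["logs/"]},
            "actions": {
                "baseBlob": {
                    "tierToCool": {"daysAfterModificationGreaterThan": 30},
                    "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                    "delete": {"daysAfterModificationGreaterThan": 365},
                }
            },
        },
    }]
}
print(json.dumps(policy, indent=2))
```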
WORM: Write Once Read Many
- Data cannot be erased or modified for a certain period of time.
- Set on container level
- Enable: Access Policy -> Add Policy -> Time-based retention (set retention period) / Legal hold (attach tags) -> Lock policy
Soft Delete
- Retains deleted blobs for a specified period of time
- Storage Account -> Blob Services -> Soft Delete
Static Website Hosting
- When activated it creates $web container.
- You need a default document and an error page.
- You can integrate Azure CDN
- Azure Content Delivery Network (CDN)
- Distributed network of cache servers
- Provide data to user from closest source
- Offload traffic from origin servers to CDN
- Pricing tiers are Microsoft, Akamai, Verizon (Microsoft partners)
- Supports HTTPS, large file download optimization, file compression, geo-filtering
- Azure CDN Core Analytics is collected and can be exported to blob storage, event hubs, Log Analytics.
- Azure Storage blob becomes origin server.
- Azure CDN servers become edge servers
- CDN can authenticate to Blob Storage with SAS tokens to be able to read private data.
- Caching rules: You can set CDN caching rules on blobs, e.g. `CacheControl: max-age=86400` in blob properties.
- Set up: Two alternatives
- Create CDN and configure against blob service.
- Storage account -> Blob service -> CDN
- You can have custom domain
- You can have CORS policies
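The `CacheControl: max-age=86400` property above tells edge servers how long they may keep a blob cached; a minimal parser for that directive:

```python
def max_age_seconds(cache_control):
    """Extract max-age (in seconds) from a Cache-Control header value."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None  # directive absent

print(max_age_seconds("public, max-age=86400"))  # → 86400 (24 hours)
```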
Azure Search
- Blob storage integrates with Azure Search
- You can provide metadata in blobs, they’ll be used as fields in search index which helps categorize documents and aid with features like faceted search.
- You can choose index content, content+metadata or just metadata.
- Searchable blobs can be PDF, doc/docx, xls/xlsx, ppt/pptx, msg, HTML, XML, ZIP, EML, RTF, TXT, CSV, JSON.
- Azure Search
- Structure: Index, Fields, Documents
- Data Load: Push data in yourself, pull data from Azure sources (SQL, Cosmos DB or blob storage)
- Data Access: REST API, simple query syntax, Lucene query syntax, .NET SDK
- Features:
- Fuzzy search handles misspelled words.
- Suggestions from partial input.
- Facets for categories.
- Hit highlighting in results.
- Tune and rank search results
- Paging
- Geo-spatial search: if index data has latitude and longitude, users can get results based on proximity
- Synonyms
- Lexical analysis done by Analyzers
- You can combine the following cognitive skills in pipelines: OCR, language detection, key phrase extraction, NER, sentiment analysis, merge/split/image analysis/shaper.
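Queries run against the index over REST; a sketch of building such a request URL (the service and index names are made up; `queryType=full` enables the Lucene syntax, so the trailing `~` gives fuzzy matching, and `facet`/`highlight` map to the features listed above):

```python
from urllib.parse import urlencode

# Hypothetical service and index names.
service, index = "myservice", "docs-index"
params = {
    "api-version": "2020-06-30",
    "search": "recieve~",      # fuzzy match for a misspelled word
    "queryType": "full",       # Lucene query syntax
    "facet": "category",       # faceted navigation
    "highlight": "content",    # hit highlighting
}
url = f"https://{service}.search.windows.net/indexes/{index}/docs?{urlencode(params)}"
print(url)
```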
File Storage
- SMB File Shares
- Attach to Virtual Machines as file shares
- Integrates with Azure File Sync
- On-prem to Azure sync with caching strategy
Table Storage
- NoSQL Data Store
- Schema-less design
- Azure Cosmos DB offers a premium Table API with the same data model
Queue Storage
- Message based
- For building asynchronous applications
- URL format: e.g. `http://storageaccount.queue.core.windows.net`
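Each service under a storage account gets its own default endpoint following the same pattern (the account name below is hypothetical):

```python
def service_endpoint(account, service):
    """Default endpoint for a storage service (blob, file, table, or queue)."""
    assert service in {"blob", "file", "table", "queue"}
    return f"https://{account}.{service}.core.windows.net"

print(service_endpoint("mystorageaccount", "queue"))
# → https://mystorageaccount.queue.core.windows.net
```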
Account Types
Blob Storage Account
- Supported services: Blob storage
- Supported blob types: Block blobs, append blobs
- Supports blob storage access tiers (hot, cool, archive)
General Purpose V1
- Supported services: Blob, File, Queue, and Table storage
- ❗ Does not support blob storage access tiers (hot, cool, archive)
- ❗ Supports both classic and ARM deployment models
- ❗ Does not support ZRS (Zone Redundant Storage) replication
- Slightly cheaper storage transaction costs, can be converted to V2.
General Purpose V2
- Supports all latest features.
- Including anything in General Purpose V1 and blob storage access tiers.
- 💡 Recommended choice when creating storage account.
- Lower storage costs than V1
- ❗ Has a changing soft limit (as of now 500 TB)
- You can contact Azure support and request higher limits (as of now 5 PB). Same for ingress/egress limits too.
Account Replication
- Impacts SLA
- Locally Redundant Storage (LRS)
- Three copies of data in single data center.
- Data is spread across multiple hardware racks.
- Zone Redundant Storage (ZRS)
- Three copies of data in different availability zones in same region.
- ❗ Only available for GPv2 storage accounts
- Geo-redundant Storage (GRS)
- Three copies of data in two different data centers in two different regions.
- ❗ You don’t get to choose second region, they’re paired regions decided by Microsoft.
- ❗ Replication involves a delay.
- RPO (recovery point objective) is typically lower than 15 minutes.
- Read-access Geo-redundant Storage (RA-GRS)
- Same as GRS, but you get read-only access to data in secondary region.
Azure Storage Explorer
- Cross-platform client application to administer/view storage and Cosmos DB accounts.
- Can be downloaded with Storage Account -> Open in Explorer in Portal.
- Available in Azure portal as well (preview & simpler)
- Can manage accounts across multiple subscriptions
- Allows you to
- Run storage emulator in local environment.
- Manage SAS, CORS, access levels, meta data, files in File Share, stored procedures in Cosmos DB
- Manage soft delete:
- Enables recycle bin (retention period) for deleted items.
- Connecting and authentication
- Admin access with account log-in
- Limited access with account level SAS
Pricing
- Data storage cost (capacity)
- Data operations
- Outbound data transfer (bandwidth)
- Geo-replication data transfer
Import and export data to Azure
- You can use portal, PowerShell, REST API, Azure CLI, or .NET Storage SDKs.
- You can upload files/folders using Azure Storage Explorer.
- You can use AzCopy commandline utility tool.
- No limit to # of files in batch
- Pattern filters to select files
- Can continue batch after connection interruption
- Uses internal journal file to handle it
- Copy newer/older source files.
- Throttle # of concurrent connections
- Modify file name and metadata during upload.
- Generate log file
- Authenticate with storage account key or SAS.
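The AzCopy features above map onto command-line flags (azcopy v10 syntax); a sketch with a placeholder account, container, and SAS token:

```python
import shlex

# Hypothetical source folder, storage account, and SAS token placeholder.
source = "./logs"
dest = "https://mystorageaccount.blob.core.windows.net/logs?<SAS-token>"

cmd = [
    "azcopy", "copy", source, dest,
    "--recursive",                 # include subfolders
    "--include-pattern", "*.log",  # pattern filter to select files
    "--cap-mbps", "100",           # throttle bandwidth
    "--log-level", "INFO",         # generate a log file
]
print(shlex.join(cmd))
```

Interrupted transfers can be resumed with `azcopy jobs resume`, which uses the tool's internal journal.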
- You can use physical drives
- ❗ 64-bit operating systems only: Windows 8+ and Windows Server 2008+
- Preparing the drive
- ❗ NTFS only.
- ❗ Drives must be encrypted using BitLocker
- WAImportExportTool (the Azure Import/Export tool)
- Tool V1 supports Blob storage and export jobs; V2 supports GPv1 & GPv2 accounts
- Allows you to copy from on-prem.
- Importing data
- Create import job
- Create storage account
- Prepare the drives
- Connect disk drives to the Windows system via SATA connectors
- Create a single NTFS volume on each drive
- Prepare data using WAImportExportTool
- Modify `dataset.csv` to include files/folders
- Modify `driveset.csv` to include disks & encryption settings
- Copy access key from storage account
- In Azure -> Create import/export job -> Import into Azure -> Select container RG -> Upload JRN (journal) file created from WAImportExportTool -> Choose import destination to the storage account -> Fill return shipping info
- Ship the drives to the Azure datacenter & update status with tracking number
- Costs:
- Charged: fixed price per device, return shipping costs
- Free: for the data transfer in Azure
- ❗ No SLAs on shipping
- Estimated: 7-10 days after arrival
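The two CSV files above drive WAImportExportTool; a sketch of their shape (the paths, container, and values are placeholders; the column names follow the V1 tool's documented format):

```
# dataset.csv — what to copy and where it lands
BasePath,DstBlobPathOrPrefix,BlobType,Disposition,MetadataFile,PropertiesFile
"F:\data\","mycontainer/data/",BlockBlob,rename,"None",None

# driveset.csv — which drives to use and how to encrypt them
DriveLetter,FormatOption,SilentOrPromptOnFormat,Encryption,ExistingBitLockerKey
X,AlreadyFormatted,SilentMode,Encrypt,
```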
- Exporting data
- Azure -> Create Import/Export Job -> Choose Export from Azure
- Select storage account and optionally containers
- Type shipping info
- Ship blank drives to Azure
- Azure encrypts & copies files
- Provides recovery key for encrypted drive.
- Azure Databox
- Microsoft ships Data Box storage device
- Each storage device has a maximum usable storage capacity of 80 TB.
- It lets you send terabytes of data into Azure in a quick, inexpensive, and reliable way