Windows Azure and Java: Working with Blob Storage

Windows Azure Blobs are part of the Windows Azure Storage service, along with Queues and Tables. Windows Azure Blob Storage can store large amounts of data such as videos, audio, and images. Data stored in Blob Storage can be exposed publicly or privately and can be accessed from anywhere via HTTP or HTTPS. A single blob can store up to 200 GB (block blob) or 1 TB (page blob), depending on its type. A storage account can hold up to 100 TB of blobs. Data stored in Windows Azure Storage is durable: it is triple-replicated within the datacenter, which provides resiliency against hardware failures. In addition, blobs are replicated by default to another sub-region, which provides a high degree of disaster recovery.

Blob Storage can be accessed using Windows Azure SDK for Java, which is a wrapper over the REST API and provides a way to work with containers and blobs.

Here we will demonstrate the use of the Windows Azure Blob Storage service from a Java application. Blob Storage can be accessed from a Java application running locally or within Windows Azure worker and web role instances. We recently published CloudNinja for Java to GitHub, a reference application illustrating how to build multi-tenant Java-based applications for Windows Azure. CloudNinja for Java uses Windows Azure Blob Storage for storing Tomcat access logs and tenant logo files.

Here we will discuss the following operations on Blob Storage:

  • Create and delete a blob container
  • Create and delete blobs inside a container
  • Verify the integrity of the blob content
  • Lease blobs
  • Create and delete blob snapshots
  • Set Access Control Levels (ACLs) on blobs and containers
  • List blobs in a container
  • Create a directory structure of blobs and containers
  • Use Shared Access Signatures on containers

Prerequisites

The prerequisites for using Windows Azure Blob Storage service from a Java application are:

  • Windows Azure Libraries for Java
  • Windows Azure SDK
  • Java Development Kit (JDK)

Creating a Java Application to Access Blob Storage

We add the following import statements to the Java classes that we use to access Blob Storage.

 

// Import the following to use the blob APIs

import java.io.*;

import java.net.*;

import java.util.*;

import com.microsoft.windowsazure.services.blob.*;

import com.microsoft.windowsazure.services.core.*;

import java.security.InvalidKeyException;

import java.security.MessageDigest;

Retrieving a Storage Account

A storage account is required to create a blob client, which is used to perform various operations on Blob Storage. To retrieve a storage account, initialize an object of the CloudStorageAccount class. The initialized object represents the storage account. We can initialize CloudStorageAccount using a Windows Azure Storage account or an emulated storage account (Storage Emulator account).

Retrieving a Windows Azure Storage Account

We first need to retrieve the cloud storage account using the CloudStorageAccount class. The cloud storage account can be retrieved by parsing the connection string using the CloudStorageAccount.parse method. The connection string consists of the default endpoint protocol, storage account name, and storage account key.

Here is sample code for retrieving the cloud storage account.

// Define the connection string with your values

public static final String storageConnectionString =

    "DefaultEndpointsProtocol=http;" +

    "AccountName=your_storage_account;" +

    "AccountKey=your_storage_account_key";

 

// Retrieve storage account from connection-string

CloudStorageAccount storageAccount =      

    CloudStorageAccount.parse(storageConnectionString);

In this code, the storage account is specified as AccountName and the primary access key of the storage account is specified as AccountKey. The primary access key is listed in the Windows Azure management portal.
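Because the connection string is a semicolon-separated list of key=value settings, it can be checked before it is handed to CloudStorageAccount.parse. The following is a minimal sketch using only the Java standard library; the class name ConnectionStringCheck and its methods are hypothetical helpers for illustration and are not part of the Windows Azure SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConnectionStringCheck {

    // Splits "Key1=Value1;Key2=Value2" into an ordered map.
    // Only the first '=' separates key from value, because
    // Base64 account keys usually end with '=' padding.
    static Map<String, String> parse(String connectionString) {
        Map<String, String> settings = new LinkedHashMap<String, String>();
        for (String pair : connectionString.split(";")) {
            if (pair.trim().isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed setting: " + pair);
            }
            settings.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return settings;
    }

    // Returns true when the three settings named in the text are present.
    static boolean hasRequiredSettings(Map<String, String> settings) {
        return settings.containsKey("DefaultEndpointsProtocol")
                && settings.containsKey("AccountName")
                && settings.containsKey("AccountKey");
    }

    public static void main(String[] args) {
        Map<String, String> settings = parse(
                "DefaultEndpointsProtocol=http;"
                + "AccountName=your_storage_account;"
                + "AccountKey=your_storage_account_key");
        System.out.println(settings.get("AccountName"));
        System.out.println(hasRequiredSettings(settings));
    }
}
```

A check like this can fail fast with a readable message when a setting is missing, before any call reaches the storage service.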

Working with Storage Account Local Emulator

Windows Azure SDK provides a Storage Emulator that emulates Windows Azure Storage, and is backed by a local SQL Server instance (SQL Express, by default). While the storage emulator is fine for development, it differs from Windows Azure Storage. Please see this MSDN article for details about specific differences.

The code below retrieves the emulated storage account. Before running the following code, ensure that Storage Emulator is up and running.

CloudStorageAccount storageAccount =

    CloudStorageAccount.getDevelopmentStorageAccount();

 

While developing an application, the CloudStorageAccount.getDevelopmentStorageAccount method can be used to access the emulated storage account. This is particularly useful if the developer does not have access to a Windows Azure Storage account. However, you should not use this method in code that you deploy to Windows Azure, because the development storage account is not available in Windows Azure.

 

An alternative approach to accessing the local emulator storage account is to access it just like you would access a real storage account, with a storage account name and key in your configuration file. The emulator account has a special account name and key:

  • Account name: devstoreaccount1
  • Account key: Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==

You can place these in the local configuration file, and place your real credentials in the cloud configuration file, allowing you to easily run code against either account without changing any code.

 

Development storage account details are documented in this MSDN article.

Performing Operations on Blob Storage from a Java Application

To access the Blob Storage service, a blob client is required. We use the CloudBlobClient class to get the reference to blobs and containers. We initialize blobClient, the object of CloudBlobClient, using the CloudStorageAccount class. Here is the sample code to create a blob client. 

 

 
CloudBlobClient blobClient = storageAccount.createCloudBlobClient();
 

blobClient is used to perform various operations on Blob Storage.

How to Create a Blob Container

A blob container is necessary to create and store a blob, and it facilitates organization of blobs. The container has its own metadata and properties. To create a blob container, we initialize an object of the CloudBlobContainer class by getting the reference of a container with the help of blobClient.

 

CloudBlobContainer blobContainer = blobClient.getContainerReference("container-name");

 

blobContainer.createIfNotExist();

 

The createIfNotExist method checks whether a container with the same name already exists and creates the blob container only if it does not. Otherwise, no operation is performed.

It is better to use the createIfNotExist method than the create method, because create throws StorageException if a container with the specified name already exists.

Blob and Blob Container Naming Conventions

Blob container names must be lowercase and may contain only letters, numbers, and hyphens, while blob names are case-sensitive. For complete naming rules, please view this article.
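Checking a container name locally avoids a round trip that would end in a StorageException. The sketch below encodes the container naming rules as commonly documented for the Blob service (3 to 63 characters; lowercase letters, digits, and hyphens; starting and ending with a letter or digit; no consecutive hyphens). These details go beyond what this article states, so treat them as an assumption and verify them against the naming article. The class name ContainerNames is hypothetical.

```java
import java.util.regex.Pattern;

final class ContainerNames {

    // A letter or digit, followed by letters or digits that may each be
    // preceded by a single hyphen. This forbids leading, trailing, and
    // consecutive hyphens.
    private static final Pattern SHAPE = Pattern.compile("[a-z0-9](-?[a-z0-9])*");

    static boolean isValid(String name) {
        return name != null
                && name.length() >= 3
                && name.length() <= 63
                && SHAPE.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("tenant-logos"));
        System.out.println(isValid("TenantLogos"));
        System.out.println(isValid("logs--2012"));
    }
}
```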

How to Delete a Blob Container

Using the same approach that we specified for creating a blob container, a blob container can be deleted as well. The delete method throws StorageException if the container does not exist, while deleteIfExists deletes the container only if it exists.

// Either delete the container unconditionally...

blobContainer.delete();

// ...or delete it only if it exists.

blobContainer.deleteIfExists();

How to Check the Existence of a Container

If a container exists, the following code returns true.

// Check if the container exists or not.

boolean containerExists = blobContainer.exists();

How to Create a Blob

Blobs are managed inside a blob container. By using CloudBlobContainer, we get the reference of the blobs available in the specific container.

 

Blobs are of two types:

  • Block blobs: Block blobs are optimized for streaming and can be up to 200 GB in size. Block blobs consist of blocks, each identified by a block ID. Blocks can differ in size, but no block can exceed 4 MB. Block blobs are used where random access to data is not required, for example streaming video or a JPEG file.
  • Page blobs: Page blobs are a collection of 512-byte pages that are optimized for random read and write operations. These blobs provide the ability to write to a specific range of bytes and can be up to 1 TB in size. Usage scenarios for page blobs include virtual hard drives (VHDs) and files with range-based updates (updating just the parts of the blob that have changed using ranged writes).

For more information on block blobs and page blobs, you can visit Understanding Block Blobs and Page Blobs.
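The size limits above translate into simple arithmetic. The following sketch, which assumes Java 8 or later for java.util.Base64, computes how many 4 MB blocks a block blob of a given size needs and rounds a page blob length up to a 512-byte boundary. The class BlobLayout and the "block-%06d" ID scheme are hypothetical. The sketch also assumes that block IDs are Base64-encoded strings of equal length within a blob, which comes from the Blob service REST documentation rather than from this article.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

final class BlobLayout {

    static final long MAX_BLOCK_BYTES = 4L * 1024 * 1024; // 4 MB block limit from the text
    static final long PAGE_BYTES = 512;                   // page size from the text

    // Number of blocks needed when every block except the last is full.
    static long blockCount(long blobBytes) {
        return (blobBytes + MAX_BLOCK_BYTES - 1) / MAX_BLOCK_BYTES;
    }

    // One fixed-width, Base64-encoded ID per block (hypothetical naming scheme).
    static List<String> blockIds(long blobBytes) {
        List<String> ids = new ArrayList<String>();
        long count = blockCount(blobBytes);
        for (long i = 0; i < count; i++) {
            String raw = String.format("block-%06d", i);
            ids.add(Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8)));
        }
        return ids;
    }

    // Rounds a length up to the next 512-byte page boundary.
    static long alignToPage(long bytes) {
        return ((bytes + PAGE_BYTES - 1) / PAGE_BYTES) * PAGE_BYTES;
    }

    public static void main(String[] args) {
        long tenMegabytes = 10L * 1024 * 1024;
        System.out.println(blockCount(tenMegabytes));
        System.out.println(blockIds(tenMegabytes));
        System.out.println(alignToPage(1000));
    }
}
```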

We create a block blob for an image file using the following code. If a blob with the same name already exists, the code overwrites the existing blob.

// Getting reference to a block blob.

CloudBlob blockBlob = blobContainer.getBlockBlobReference("image.gif");

blockBlob.getProperties().setContentType("image/gif");

// Uploading stream to the blob. File.length() gives the exact size,

// whereas InputStream.available() is only an estimate.

File imageFile = new File("C:\\image.gif");

InputStream stream = new FileInputStream(imageFile);

blockBlob.upload(stream, imageFile.length());

stream.close();

How to Delete a Blob

Using the same approach that we specified for creating a blob, a blob can be deleted as well.

 

blob.delete();

 

blob.deleteIfExists();

 

How to Verify Integrity of the Blob Content

Data transferred over a network can be corrupted. While uploading or downloading data to or from the cloud, the data may get corrupted due to network behavior or other intermittent issues.

To reduce the risk of corrupt data being processed, Windows Azure Blob Storage supports MD5 hashing. This hashing helps ensure end-to-end data integrity.

While uploading a blob, we calculate the Base64-encoded MD5 hash of the blob content. This encoded hash is also uploaded in the request header and is then used to verify the integrity of the data being uploaded. If the content of the blob and its MD5 hash don't match, the upload operation fails: the blob is not uploaded, and the operation throws StorageException.

 

String blobContent = "This is the blob content.";

                       

byte [] blobContentBytes = blobContent.getBytes();

                       

//Generating MD5 of the blob content.

MessageDigest md = MessageDigest.getInstance("MD5");

md.reset();

md.update(blobContentBytes);

                       

// Encode the md5 content using Base64 encoding

String base64EncodedMD5content = Base64.encode(md.digest());

                       

// initialize blob properties and assign md5 content generated.

BlobProperties blobProperties = blob.getProperties();

blobProperties.setContentMD5(base64EncodedMD5content);

                       

// Upload the blob content in the blob along with request options.

// This will also upload the ContentMD5 property.

// The server verifies the uploaded content against ContentMD5; if they do not match, the upload fails with an exception.

InputStream stream = new ByteArrayInputStream(blobContentBytes);

 

try {

      // If the integrity check fails then it throws StorageException

      blob.upload(stream, stream.available());

} catch (StorageException storageException) {

      storageException.printStackTrace();

}
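The hashing step can be tried without a storage account. The sketch below, assuming Java 8 or later, uses java.util.Base64 in place of the SDK's Base64 helper and recomputes the hash locally to show how a mismatch is detected. The class ContentMd5 and its methods are hypothetical, and the comparison only mirrors the idea of the server-side check; it is not the service's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class ContentMd5 {

    // Returns the Base64-encoded MD5 hash of the given bytes.
    static String of(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return Base64.getEncoder().encodeToString(md.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Recomputes the hash and compares it with the expected value.
    static boolean matches(byte[] content, String expectedBase64Md5) {
        return of(content).equals(expectedBase64Md5);
    }

    public static void main(String[] args) {
        byte[] original = "This is the blob content.".getBytes(StandardCharsets.UTF_8);
        byte[] corrupted = "This is the blob c0ntent.".getBytes(StandardCharsets.UTF_8);
        String md5 = of(original);
        System.out.println(matches(original, md5));
        System.out.println(matches(corrupted, md5));
    }
}
```

Encoding the string with an explicit charset, as done here, keeps the hash stable across machines with different default encodings.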

Leasing Blobs

Leasing means acquiring a lock on a blob. No other lease can be acquired for that blob until the lease is released. This is particularly useful in a multi-threaded or multi-role-instance scenario. Suppose that only one instance should run a process such as a scheduler. A lease can enforce this: the instance that succeeds in acquiring the lease runs the scheduler, and the other instances do not, because they fail to acquire a lease. Thus leases can be used to manage concurrency.

Another use of a lease is to maintain consistency of blob contents. For example, suppose a blob contains a counter that should be updated by only one instance in a multi-role-instance scenario. A lease achieves this because it provides exclusive write access to a blob: only the instance that succeeds in acquiring the lease updates the counter.

How to Lease Blobs

The Lease Blob operation establishes and manages a lock on a blob to get exclusive write access to the blob and its metadata or properties. A lease can be acquired for 15 to 60 seconds or for an infinite time period.

The Lease Blob operation can be called in one of four modes:

  • Acquire
  • Renew
  • Release
  • Break

Acquire Lease

Acquiring a lease requests a new lease. The blob.acquireLease() method returns a lease ID. Using this ID, we can renew and release the lease.

 

// Lease Blob and acquire lock. It returns the lease ID

String leaseId = blob.acquireLease();

 

Renew Lease

Renewing a lease extends an existing lease for an additional 60 seconds, continuing the protected write access to the blob and its data.

               

// Setting the lease ID into Access Conditions

AccessCondition accessCondition = new AccessCondition();

accessCondition.setLeaseID(leaseId);

                 

// Renew lease

blob.renewLease(accessCondition);

 

Release Lease

Releasing a lease frees it when it is no longer needed, so that another client may immediately acquire a lease on the blob. It requires the lease ID.

                       

blob.releaseLease(accessCondition);

 

Break Lease

Breaking a lease ends it, but the service ensures that another client cannot acquire a new lease on the blob until the current lease period has expired. Breaking a lease leaves the blob in an unlocked state for the remaining duration of the lease period.

 

// Break Lease

blob.breakLease();
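The acquire, renew, and release rules above can be explored with a small in-memory model. This is a simplified sketch and not the storage service or the SDK: the class LeaseModel is hypothetical, time is passed in explicitly so the behavior is deterministic, the lease length is fixed at 60 seconds, and break is left out. It only illustrates why a second instance fails to acquire a lease while the first one holds it.

```java
import java.util.UUID;

final class LeaseModel {

    static final long LEASE_MILLIS = 60000L; // 60-second lease, as in the text

    private String currentLeaseId; // null when no lease has been taken or it was released
    private long expiresAtMillis;

    // Returns a new lease ID, or null if another lease is still active.
    synchronized String acquire(long nowMillis) {
        if (isActive(nowMillis)) {
            return null;
        }
        currentLeaseId = UUID.randomUUID().toString();
        expiresAtMillis = nowMillis + LEASE_MILLIS;
        return currentLeaseId;
    }

    // Extends the lease by another period; only the current holder may renew.
    // Simplification: an expired lease cannot be renewed in this model.
    synchronized boolean renew(String leaseId, long nowMillis) {
        if (!isActive(nowMillis) || !currentLeaseId.equals(leaseId)) {
            return false;
        }
        expiresAtMillis = nowMillis + LEASE_MILLIS;
        return true;
    }

    // Frees the lease so that another client can acquire one immediately.
    synchronized boolean release(String leaseId) {
        if (currentLeaseId == null || !currentLeaseId.equals(leaseId)) {
            return false;
        }
        currentLeaseId = null;
        return true;
    }

    private boolean isActive(long nowMillis) {
        return currentLeaseId != null && nowMillis < expiresAtMillis;
    }

    public static void main(String[] args) {
        LeaseModel schedulerBlob = new LeaseModel();
        String instanceA = schedulerBlob.acquire(0L);
        String instanceB = schedulerBlob.acquire(1000L);
        System.out.println("Instance A runs the scheduler: " + (instanceA != null));
        System.out.println("Instance B runs the scheduler: " + (instanceB != null));
    }
}
```

In the scheduler scenario, each role instance would attempt the acquire step at startup, and only the instance that receives a lease ID would go on to run the scheduler and renew the lease periodically.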

 

How to Create a Blob Snapshot

Sometimes an application may corrupt the blob content while processing it, for example when a runtime exception occurs before the operations complete. In this situation, we may need to restore the blob to its original content. This can be done by creating a blob snapshot beforehand.

Another use of blob snapshots is for creating backups of blobs. The name of a snapshot consists of the base blob name followed by a DateTime value as a suffix. The DateTime value indicates the time at which the snapshot was taken. Blob snapshots are read-only: they can be read, copied, and deleted, but never modified.

Upon creation, snapshots have no associated cost. However, as committed blocks (or pages) are replaced in the base blob, storage costs begin to accrue as the base blob diverges from the snapshot.

More details about snapshots may be found here. Snapshot billing details are here.

The createSnapshot method creates a read-only snapshot of a blob.  

// Create a snapshot

CloudBlob snapshotBlob = blob.createSnapshot();

A unique ID is associated with each snapshot blob. Generally, the timestamp of the snapshot blob is the ID. It appears as a string. For example, 2012-03-26T14:16:18.0174890Z.

// Get the snapshot ID

String snapshotID = snapshotBlob.getSnapshotID();
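Because the snapshot ID is a UTC timestamp, it can be parsed and compared against a time window, which is useful when picking a snapshot to restore from. The sketch below assumes Java 8 or later for java.time, and it assumes that a snapshot's URI carries the timestamp in a snapshot query parameter, as in the Blob service URL convention. The class SnapshotTimes and the example host and container names are hypothetical.

```java
import java.net.URI;
import java.time.Instant;

final class SnapshotTimes {

    // Extracts the value of the "snapshot" query parameter, or null if absent.
    static String snapshotIdOf(URI blobUri) {
        String query = blobUri.getQuery();
        if (query == null) {
            return null;
        }
        for (String param : query.split("&")) {
            if (param.startsWith("snapshot=")) {
                return param.substring("snapshot=".length());
            }
        }
        return null;
    }

    // True when the snapshot was taken at or after 'from' and before 'to'.
    static boolean takenBetween(String snapshotId, Instant from, Instant to) {
        Instant taken = Instant.parse(snapshotId);
        return !taken.isBefore(from) && taken.isBefore(to);
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://myaccount.blob.core.windows.net/mycontainer/image.gif"
                + "?snapshot=2012-03-26T14:16:18.0174890Z");
        String id = snapshotIdOf(uri);
        System.out.println(id);
        System.out.println(takenBetween(id,
                Instant.parse("2012-03-26T14:00:00Z"),
                Instant.parse("2012-03-26T15:00:00Z")));
    }
}
```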

How to Delete a Blob Snapshot

When we delete a snapshot blob, only the snapshot of the original blob is deleted. The original blob is not deleted. If we have the reference of the snapshot blob, we delete the snapshot using the following code.

// Deleting the snapshot

boolean snapshotDeleted = snapshotBlob.deleteIfExists();

Listing the Blob Snapshots

There can be scenarios where you need to restore the contents of a blob from a snapshot that was taken within a DateTime range, for example between 4 pm and 5 pm on 12 July 2012. In such a case you need to iterate through the snapshot list and, for each snapshot, retrieve its URL and parse the DateTime value from it. Once the right snapshot is found, restore from it by copying the snapshot to the base blob.

The following code lists the existing blob snapshots along with the original blobs in the container.

for (ListBlobItem item : blobContainer.listBlobs("", true, EnumSet.of(BlobListingDetails.SNAPSHOTS), null, null)) {

      URI snapshotURI = item.getUri();

}

How to Set Access Control Levels on Blob Containers and Blobs

Setting Access Control Levels (ACLs) for a blob container is setting permissions for the container. The permissions define who can access the blob container.

 

The access level can be public or private. Anyone can access a public container using the URLs of the blobs that are available in the container, while a private container can only be accessed using the account credentials (or with a Shared Access Signature, which will be covered shortly).

 

The BlobContainerPermissions class is used to set the permissions that are uploaded to a blob container.

 

BlobContainerPermissions permissions = new BlobContainerPermissions();

 

Setting Private Access Level for a Blob Container

By default, a blob container has the private access level. In some scenarios it may be necessary to change the access level from public back to private. Setting private access for a blob container restricts access to the container to the account holder: only someone who holds the account credentials can work with the blobs inside a private container.

 

// Setting Private access to Container

permissions.setPublicAccess(BlobContainerPublicAccessType.OFF);

// Uploading the permissions

blobContainer.uploadPermissions(permissions);

 

Setting Public Access Level for a Blob Container

The container with the public access is available to everyone who knows the container URL. All the blobs under a public container are by default public and accessible to all.

Clients can read the container metadata and the blob content, and can also list the blobs within the container.

 

// Setting Public access to Container

permissions.setPublicAccess(BlobContainerPublicAccessType.CONTAINER);

// Uploading the permissions

blobContainer.uploadPermissions(permissions);

 

Setting Public Access Level for Blobs

The following code makes the blobs in a container public while keeping the container itself private. Users can read the content and metadata of blobs within this container, but they cannot read the container metadata or list the blobs within the container.

// Setting public access to blobs only

permissions.setPublicAccess(BlobContainerPublicAccessType.BLOB);

// Uploading the permissions

blobContainer.uploadPermissions(permissions);
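The three access levels differ only in what an anonymous client may do. The sketch below restates the descriptions above as a lookup table. The enum is a local stand-in whose constant names mirror BlobContainerPublicAccessType; it is a hypothetical illustration, not the SDK type.

```java
final class AccessLevels {

    // Local stand-in for the three levels described in the text.
    enum Level {
        OFF(false, false),       // private: account credentials or a SAS are required
        BLOB(true, false),       // blobs are readable, the container cannot be listed
        CONTAINER(true, true);   // blobs are readable and the container can be listed

        final boolean anonymousBlobRead;
        final boolean anonymousListing;

        Level(boolean anonymousBlobRead, boolean anonymousListing) {
            this.anonymousBlobRead = anonymousBlobRead;
            this.anonymousListing = anonymousListing;
        }
    }

    static String describe(Level level) {
        return level + ": read blobs=" + level.anonymousBlobRead
                + ", list container=" + level.anonymousListing;
    }

    public static void main(String[] args) {
        for (Level level : Level.values()) {
            System.out.println(describe(level));
        }
    }
}
```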

 

How to Use Shared Access Signature on Blob Containers

A Shared Access Signature is used to make a private blob container or its blobs accessible to others for a specific period of time. It lets us grant access rights to containers and blobs at a more granular level than simply making the container public.

 

The following sample code grants the Shared Access READ permission to a container for an hour.

 

//Set ACL to private

BlobContainerPermissions permissions = new BlobContainerPermissions();

// Setting Private access to Container

permissions.setPublicAccess(BlobContainerPublicAccessType.OFF);

                       

Calendar cal = Calendar.getInstance();

cal.setTimeZone(TimeZone.getTimeZone("UTC"));

// Define the start and end time to grant permissions.

Date sharedAccessStartTime = cal.getTime();

cal.add(Calendar.HOUR, 1);

Date sharedAccessExpiryTime = cal.getTime();

                                               

// Define shared access policy

SharedAccessPolicy policy = new SharedAccessPolicy ();

 

// In the Sample the Shared Access Permissions are set to READ permission.

EnumSet<SharedAccessPermissions> perEnumSet = EnumSet.of(SharedAccessPermissions.READ);

 

policy.setPermissions(perEnumSet);

policy.setSharedAccessExpiryTime(sharedAccessExpiryTime);

policy.setSharedAccessStartTime(sharedAccessStartTime);

                                   

// Define blob container permissions, reusing the private

// permissions object created above.

HashMap<String, SharedAccessPolicy> map = new HashMap<String, SharedAccessPolicy>();

map.put("policy", policy);

permissions.setSharedAccessPolicies(map);

 

// Uploading the permissions

blobContainer.uploadPermissions(permissions);

 

 

The Signature is generated using CloudBlobContainer as shown in the following code.

 

SharedAccessPolicy policy = blobContainer.downloadPermissions().getSharedAccessPolicies().get("policy");

String signature = blobContainer.generateSharedAccessSignature(policy);

 

After generating the Signature, the format of the URL to access the blobs in the container is:

http://<storage-account-name>.blob.core.windows.net/<container-name>/<blob-name>?<signature>
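Putting the URL together is plain string concatenation. The sketch below assumes that the generated signature is a query string without a leading question mark, and it tolerates one if present. The class SasUrls, the account and container names, and the placeholder signature value are hypothetical.

```java
final class SasUrls {

    // Builds the blob URL in the format shown above.
    static String blobUrl(String account, String container, String blobName, String signature) {
        String query = signature.startsWith("?") ? signature.substring(1) : signature;
        return "http://" + account + ".blob.core.windows.net/"
                + container + "/" + blobName + "?" + query;
    }

    public static void main(String[] args) {
        // "sig=PLACEHOLDER" stands in for a real generated signature.
        System.out.println(blobUrl("myaccount", "mycontainer", "image.gif", "sig=PLACEHOLDER"));
    }
}
```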

How to Use Shared Access Signature on Blobs

A Shared Access Signature (SAS) on a blob allows operations on it for a specific duration, as specified in the code. The following code generates a SAS intended for a 30-minute window, padded by 5 minutes on each side to allow for clock skew.

// Generate shared access signature on blob

Calendar cal = Calendar.getInstance();

cal.setTimeZone(TimeZone.getTimeZone("UTC"));

// Define the start and end time to grant permissions.

// The intended window runs from 1 hr 5 min to 1 hr 35 min from now.

// To handle clock skew, the start time is set 5 min earlier and the

// expiry time 5 min later, so the SAS is valid from 1 hr to 1 hr 40 min from now.

cal.add(Calendar.HOUR, 1);

Date sharedAccessStartTime = cal.getTime();

cal.add(Calendar.MINUTE, 40);

Date sharedAccessExpiryTime = cal.getTime();

                                               

// Define shared access policy

SharedAccessPolicy policy = new SharedAccessPolicy();

EnumSet<SharedAccessPermissions> perEnumSet = EnumSet.of(SharedAccessPermissions.READ);

policy.setPermissions(perEnumSet);

policy.setSharedAccessExpiryTime(sharedAccessExpiryTime);

policy.setSharedAccessStartTime(sharedAccessStartTime);

                                               

//Generating Shared Access Signature

String sharedUri = blob.generateSharedAccessSignature(policy);

 

 

As a result of clock skew, a SAS token might start or expire earlier or later than expected. To handle clock skew, specify a start time a few minutes earlier than required and an expiry time a few minutes later than required.
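The padding rule can be written as a small calculation. The sketch below assumes Java 8 or later for java.time. The class SasWindow is hypothetical, and the 5-minute skew allowance is just the value used in this article's example, not a recommendation from the service.

```java
import java.time.Duration;
import java.time.Instant;

final class SasWindow {

    final Instant start;
    final Instant expiry;

    private SasWindow(Instant start, Instant expiry) {
        this.start = start;
        this.expiry = expiry;
    }

    // Widens the wanted window by 'skew' on both sides.
    static SasWindow padded(Instant wantedStart, Duration wantedLength, Duration skew) {
        return new SasWindow(wantedStart.minus(skew),
                wantedStart.plus(wantedLength).plus(skew));
    }

    public static void main(String[] args) {
        Instant wantedStart = Instant.now().plus(Duration.ofMinutes(65));
        SasWindow window = padded(wantedStart, Duration.ofMinutes(30), Duration.ofMinutes(5));
        System.out.println(window.start + " to " + window.expiry);
    }
}
```

The resulting instants can be converted with java.util.Date.from(...) before being passed to an API that expects Date values.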

How to List Blobs in a Container

The CloudBlobContainer.listBlobs method returns an iterable collection of ListBlobItem objects from the container.

// Listing blobs in a container and getting the URI of each

for (ListBlobItem blobItem : blobContainer.listBlobs()) {

      // Getting URI

      URI blobUri = blobItem.getUri();

      // Get the blob using the URI and

      // perform operations on blob.

}

How to Create a Directory Structure

Blob Storage does not have nested containers. To simulate a subdirectory structure in Blob Storage, we can use forward slashes (/) in blob names to represent the directory levels in the respective URI. For example, MainDir/SubDir/sampleTextDoc.txt.

CloudBlob blob = blobContainer.getBlockBlobReference("MainDir/SubDir/sampleTextDoc.txt");

This will help in organizing blobs in a container.

To search through a structure of subdirectories, use the listBlobs method of the blob container and specify the prefix parameter. The prefix value can be the path to the subdirectory whose contents are to be listed.
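Since the directory structure exists only in blob names, prefix listing amounts to string matching. The sketch below simulates it on an in-memory list of names: it filters by prefix and derives the immediate children of a virtual directory. The class VirtualDirectories and the sample names other than MainDir/SubDir/sampleTextDoc.txt are hypothetical, and the code does not call the storage service.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class VirtualDirectories {

    // Keeps only the blob names that start with the given prefix.
    static List<String> withPrefix(List<String> blobNames, String prefix) {
        List<String> matches = new ArrayList<String>();
        for (String name : blobNames) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    // Immediate children of the prefix: blobs directly under it, plus
    // sub-directories (returned with a trailing slash).
    static SortedSet<String> children(List<String> blobNames, String prefix) {
        SortedSet<String> result = new TreeSet<String>();
        for (String name : withPrefix(blobNames, prefix)) {
            String rest = name.substring(prefix.length());
            int slash = rest.indexOf('/');
            result.add(slash < 0 ? name : prefix + rest.substring(0, slash + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "MainDir/SubDir/sampleTextDoc.txt",
                "MainDir/readme.txt",
                "Other/file.bin");
        System.out.println(withPrefix(names, "MainDir/SubDir/"));
        System.out.println(children(names, "MainDir/"));
    }
}
```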

Summary

In this article we discussed using the Blob Storage service to perform various operations on blobs. We also discussed how acquiring leases on blobs can be used to manage concurrency, and we demonstrated generating a shared access signature to provide access rights to blobs for specific time periods.
