When a user completes a survey, the application must
save the user’s answers to the survey questions to storage where the
survey creator can access and analyze the results.
1. Goals and Requirements
The format that the
application uses to save the summary response data must enable the
Surveys application to meet the following three requirements:
The owner of the survey must be able to browse the results.
The application must be able to calculate summary statistics from the answers.
The owner of the survey must be able to export the answers in a format that enables detailed analysis of the results.
Tailspin expects to see a very
large number of users completing surveys; therefore, the process that
initially saves the data should be as efficient as possible. The
application can handle any processing of the data after it has been
saved by using an asynchronous worker process.
Transaction
costs will be significant because calculating summary statistical data
and exporting survey results will require the application to read survey
responses from storage.
The focus here is on the way the
Surveys application stores the survey answers. Whatever type of storage
the Surveys application uses, it must be able to support the three
requirements listed earlier. Storage costs are also a significant factor
in the choice of storage type because survey answers account for the
majority of the application’s storage requirements, both in terms of
space used and the number of storage transactions.
2. The Solution
To meet the requirements,
the developers at Tailspin analyzed two possible storage solutions: a
delayed write pattern using queues and table storage, and a solution
that saves directly to BLOB storage. In both cases, the application
first saves the survey responses to storage, and then it uses an
asynchronous task in a worker role to calculate and save the summary
statistics.
Note:
The Surveys application saves each survey response as a BLOB.
2.1. Solution 1: The Delayed Write Pattern
Figure 2 shows the delayed
write pattern that the Surveys application could use to save the
results of a completed survey to Windows Azure table storage.
In this scenario, a user
browses to a survey, fills it out, and then submits his or her answers
back to the Surveys website. The Surveys website puts the survey answers
into a message on a queue and returns a “Thank you” message to the user
as quickly as possible, minimizing the value of Tp in Figure 2.
A task in a worker role is then responsible for reading the survey
answers from the queue and saving them to table storage. This operation
must be idempotent, to avoid any possibility of double counting and
skewing the results.
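One way to make the save idempotent is to key each stored row on a value that stays the same when the queue delivers a message more than once, so that a repeated delivery is skipped rather than counted again. The following is a minimal sketch that uses an in-memory dictionary in place of table storage. The names SurveyResponseMessage, ResponseId, InMemoryResponseTable, and SaveOnce are hypothetical and are not part of the Surveys code.

```csharp
using System.Collections.Generic;

// Hypothetical message shape: ResponseId is assigned once by the web role,
// so a redelivered message carries the same ID.
public class SurveyResponseMessage
{
    public string ResponseId { get; set; }
    public string Answers { get; set; }
}

// In-memory stand-in for table storage, keyed by response ID (an assumption
// made for illustration only).
public class InMemoryResponseTable
{
    private readonly Dictionary<string, string> rows =
        new Dictionary<string, string>();

    public int Count
    {
        get { return this.rows.Count; }
    }

    // Returns true if the row was added, and false if a row with the same
    // ID already exists (for example, after a redelivery).
    public bool SaveOnce(SurveyResponseMessage message)
    {
        if (this.rows.ContainsKey(message.ResponseId))
        {
            return false;
        }

        this.rows.Add(message.ResponseId, message.Answers);
        return true;
    }
}
```

Calling SaveOnce twice with the same message leaves a single row in the table, so any statistics derived from the table count the response once.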
Note:
If you need to scale the application, you
could use separate worker roles: one to calculate and save the summary
statistics, and one to save the survey results to table storage.
Surveys is a “geo-aware” application. For example, the Surveys website and queue could be hosted in a data center in the U.S., and the worker role and table storage could be hosted in a data center in Europe.
There is an 8-kilobyte (KB)
maximum size for a message on a Windows Azure queue, so this approach
works only if the size of each survey response is less than that maximum. Figure 3 shows how you could modify this solution to handle survey results that are greater than 8 KB in size.
Figure 3
includes an optimization, whereby the application places messages that
are smaller than 8 KB directly onto a queue, as in the previous example.
For messages that are larger than 8 KB in size, the application saves
them to Windows Azure BLOB storage and places a message on the “Big
Surveys” queue to notify the worker role. The worker role now contains
two tasks: Task 1 retrieves and processes small surveys from the “Small
Surveys” queue; Task 2 polls the “Big Surveys” queue for notifications
of large surveys that it retrieves and processes from BLOB storage.
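The routing decision in this optimization can be sketched as a size check on the encoded message. The sketch below assumes that the serialized answers are UTF-8 text that is Base64-encoded before being queued, and it ignores any other overhead that a queue client might add. The names SurveyRoute, SurveyRouter, and ChooseRoute are hypothetical.

```csharp
using System.Text;

public enum SurveyRoute
{
    SmallSurveysQueue,
    BlobPlusBigSurveysQueue
}

public static class SurveyRouter
{
    // The 8 KB queue message limit described in the text.
    public const int MaxQueueMessageBytes = 8 * 1024;

    // Base64 turns every 3 input bytes into 4 output characters,
    // rounding up to a whole group of 4.
    public static int Base64Length(int rawByteCount)
    {
        return 4 * ((rawByteCount + 2) / 3);
    }

    // Messages that fit within the limit after encoding go straight to the
    // queue. Anything larger is saved as a BLOB and announced on the
    // "Big Surveys" queue.
    public static SurveyRoute ChooseRoute(string serializedAnswers)
    {
        int rawBytes = Encoding.UTF8.GetByteCount(serializedAnswers);
        return Base64Length(rawBytes) <= MaxQueueMessageBytes
            ? SurveyRoute.SmallSurveysQueue
            : SurveyRoute.BlobPlusBigSurveysQueue;
    }
}
```

Under these assumptions, a response of 6,144 bytes encodes to exactly 8,192 characters and still fits on the queue, while a response of 6,145 bytes does not. The raw payload therefore has to be roughly a quarter smaller than the queue limit.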
2.2. Solution 2: Writing Directly to BLOB Storage
As you saw in the previous section, the delayed
write pattern becomes more complex if the size of a survey answer can
be greater than 8 KB. In this case, it is necessary to save the response
as a BLOB and notify the worker role of the new response data by using a message on a queue. The developers at Tailspin also analyzed a simpler approach to saving and processing survey responses using only BLOB storage. Figure 4 illustrates this alternative approach.
When you
calculate the size of messages, you must consider the effect of any
encoding, such as Base64, that you use to encode the data before you
place it in a message.
As you can see from the sequence diagram in Figure 4, the first stages of the process for saving
survey responses are the same as for the delayed write pattern. In this
approach, there is no queue and no table storage, and the application
stores the survey results directly in BLOB storage. The worker role now generates the summary statistical data directly from the survey responses in BLOB storage.
Figure 5
illustrates a variation on this scenario where the subscriber has
chosen to host a survey in a different data center from his or her
account.
In this scenario, there is an additional worker role. This worker role is responsible for moving the survey
response data from the data center where the subscriber chose to host
the survey to the data center hosting the subscriber’s account. This
way, the application transfers the survey data between data centers only
once, instead of every time the application needs to read it; this
minimizes the costs associated with this scenario.
2.3. Comparing the Solutions
The second solution is much
simpler than the first. However, you also need to check whether keeping
the survey responses in BLOBs instead of tables adds complexity to any
of the processes that use the survey results. In the Surveys
application, using BLOBs does not add significantly to the complexity of
generating summary statistics, enabling the survey owner to browse the
responses, or exporting the data to SQL Azure.
The
application reads survey response data when it calculates the
statistics, when a user browses through the responses, and when it
exports the data to SQL Azure.
Although the second solution
does not limit the functionality that the Surveys application requires,
this design may be limiting in other applications. Using the delayed
write pattern means that you can easily perform operations on the data
before it’s saved to a table, so in scenarios where the raw data
requires some processing to make it usable, the first solution may be
more appropriate. Secondly, storing data in tables makes it much easier to access the data with dynamically constructed queries.
Note:
The delayed write pattern enables you to transform the data before saving it without affecting the performance of the web role.
The third difference between the solutions
is the storage costs. The following table summarizes this difference,
showing the number of storage transactions that the application must
perform in order to save a single survey response.
| Solution 1: The delayed write pattern | Solution 2: Writing directly to BLOB storage |
| 1 save to BLOB | 1 save to BLOB |
| 1 add message to queue | |
| 1 get message from queue | |
| 1 read BLOB | |
| 1 save to table | |
| Total: 5 storage transactions | Total: 1 storage transaction |
You should also
verify that the second solution does not add to the number of storage
transactions that your application needs to perform when it processes or
uses the saved data.
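To see how the per-response difference scales, the counts from the comparison can be multiplied by the expected number of responses. The following sketch covers only the transactions needed to save responses, not the later reads for statistics, browsing, or export. The class and method names are hypothetical.

```csharp
public static class StorageTransactionEstimate
{
    // Per-response counts taken from the comparison of the two solutions.
    public const int DelayedWritePerResponse = 5;
    public const int DirectBlobPerResponse = 1;

    // Storage transactions needed to save the given number of responses
    // with the delayed write pattern.
    public static long DelayedWrite(long responses)
    {
        return responses * DelayedWritePerResponse;
    }

    // Storage transactions needed to save the given number of responses
    // by writing directly to BLOB storage.
    public static long DirectBlob(long responses)
    {
        return responses * DirectBlobPerResponse;
    }
}
```

For example, saving one million responses would take 5,000,000 storage transactions with the delayed write pattern and 1,000,000 when writing directly to BLOB storage.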
3. Inside the Implementation
Now is a good time to walk
through the code that saves the survey responses in more detail. As you
go through this section, you may want to download the Visual Studio
solution for the Tailspin Surveys application from http://wag.codeplex.com/.
3.1. Saving the Survey Response Data to a Temporary Blob
The following code from the SurveysController class in the TailSpin.Web.Survey.Public project shows how the application initiates saving the survey response asynchronously.
[HttpPost]
[ValidateAntiForgeryToken]
public ActionResult Display(string tenant, string surveySlug,
    SurveyAnswer contentModel)
{
    var surveyAnswer = CallGetSurveyAndCreateSurveyAnswer(
        this.surveyStore, tenant, surveySlug);
    ...
    // Copy the posted answers into the survey answer object.
    for (int i = 0; i < surveyAnswer.QuestionAnswers.Count; i++)
    {
        surveyAnswer.QuestionAnswers[i].Answer =
            contentModel.QuestionAnswers[i].Answer;
    }

    // If validation failed, redisplay the survey.
    if (!this.ModelState.IsValid)
    {
        var model =
            new TenantPageViewData<SurveyAnswer>(surveyAnswer);
        model.Title = surveyAnswer.Title;
        return this.View(model);
    }

    // Save the response, then redirect to the "Thank you" page.
    this.surveyAnswerStore.SaveSurveyAnswer(surveyAnswer);
    return this.RedirectToAction("ThankYou");
}
The surveyAnswerStore variable holds a reference to an instance of the SurveyAnswerStore type. The application uses Unity to initialize this instance with the correct IAzureBlob and IAzureQueue instances. The BLOB container stores the answers to the survey
questions, and the queue maintains a list of new survey answers that
haven’t been included in the summary statistics or the list of survey
answers.
Make sure
that the storage connection strings in your deployment point to storage
in the deployment’s geographical location. The application should use
local queues and BLOB storage to minimize latency.
The SaveSurveyAnswer method writes the survey response data to BLOB storage and puts a message onto the queue. The action method then immediately returns a “Thank you” message.
The following code example shows the SaveSurveyAnswer method in the SurveyAnswerStore class.
public void SaveSurveyAnswer(SurveyAnswer surveyAnswer)
{
    // Get the BLOB container for this survey, creating it if necessary.
    var surveyBlobContainer = this.surveyAnswerContainerFactory
        .Create(surveyAnswer.Tenant, surveyAnswer.SlugName);
    surveyBlobContainer.EnsureExist();

    // Use the current UTC tick count as the BLOB ID.
    DateTime now = DateTime.UtcNow;
    surveyAnswer.CreatedOn = now;
    var blobId = now.GetFormatedTicks();
    surveyBlobContainer.Save(blobId, surveyAnswer);

    // Notify the worker role that a new response is available.
    this.surveyAnswerStoredQueue.AddMessage(
        new SurveyAnswerStoredMessage
        {
            SurveyAnswerBlobId = blobId,
            Tenant = surveyAnswer.Tenant,
            SurveySlugName = surveyAnswer.SlugName
        });
}
This method
first checks that the BLOB container exists and creates it if
necessary. It then creates a unique BLOB ID by using a tick count and
saves the BLOB to the survey container. Finally, it adds a message to
the queue. The application uses the queue to track new survey responses
that must be included in the summary statistics and list of responses
for paging through answers.
Note:
It
is possible, but very unlikely, that the application could try to save
two BLOBs with the same ID if two users completed a survey at exactly
the same time. The code should check for this possibility and, if
necessary, retry the save with a new tick count value.
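The check-and-retry behavior that this note recommends could look something like the following sketch. It uses an in-memory stand-in for the BLOB container, and it simulates a new tick count by incrementing the starting value, where real code would read the clock again. InMemoryBlobContainer, TrySaveNew, UniqueBlobSaver, and the zero-padded "D19" ID format are assumptions made for illustration and are not taken from the Surveys code. Against real BLOB storage, the existence check and the write would also need to happen as a single conditional operation to be safe across role instances.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a BLOB container (hypothetical).
public class InMemoryBlobContainer
{
    private readonly Dictionary<string, string> blobs =
        new Dictionary<string, string>();

    // Returns false instead of overwriting when the ID is already taken.
    public bool TrySaveNew(string blobId, string content)
    {
        if (this.blobs.ContainsKey(blobId))
        {
            return false;
        }

        this.blobs.Add(blobId, content);
        return true;
    }
}

public static class UniqueBlobSaver
{
    // Tries successive tick values until one is unused, up to maxAttempts,
    // and returns the ID under which the content was saved.
    public static string SaveWithRetry(
        InMemoryBlobContainer container, string content,
        long startTicks, int maxAttempts)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            // Zero-padded so that IDs sort in chronological order.
            string blobId = (startTicks + attempt).ToString("D19");
            if (container.TrySaveNew(blobId, content))
            {
                return blobId;
            }
        }

        throw new InvalidOperationException(
            "Could not find an unused BLOB ID.");
    }
}
```

If two responses arrive with the same starting tick count, the second save fails its first attempt and succeeds with the next value, so neither response overwrites the other.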
It’s possible
that the role could fail after it adds the survey data to BLOB storage
but before it adds the message to the queue. In this case, the response
data would not be included in the summary statistics or the list of
responses used for paging. However, the response would be included if
the user exported the survey to SQL Azure.