Cosmos DB scenario-based labs - Retail hands-on lab step-by-step

Abstract and learning objectives

In this demo, you will show your audience how to use Azure services to host a movie retail store with custom AI models and Cosmos DB. Several other PaaS-based technologies are also used to show how Azure can help migrate legacy applications to the cloud.

Overview

Contoso Movies, Ltd. has redesigned its website to use Azure PaaS services, including Cosmos DB, Azure Functions, Event Hubs, Stream Analytics, Power BI, and Logic Apps. As part of this redesign, the company has also implemented a new recommendation system based on custom AI models. These models are computed offline, and their results are stored in Cosmos DB for reference while users browse the site. User events implicitly rate the items users click on, and each user's recommendations are then adjusted based on these events.

Solution architecture (High-level)

Requirements

Microsoft Azure subscription must be pay-as-you-go or MSDN.
- Trial subscriptions will not work.
Visual Studio 2019
Azure CLI - version 2.0.68 or later

NOTE: You can run the following PowerShell commands to install the latest version of the Azure CLI:

Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi;
Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'

.NET Framework 4.7.2
.NET Core 2.2

Before the demo

Refer to the Before the hands-on lab setup guide before continuing to the lab exercises.

Be sure to set the script mode to demo so that the solution code is deployed to the web app and function apps.

Exercise 1: Deployment and Setup

Duration: 60 minutes

Synopsis: In this exercise, you will complete the setup steps that could not be automated in the deployment scripts.

Exercise 1: Configure Databricks and generate event data

Duration: 30 minutes

Synopsis: We have pre-generated a set of events that includes buy and details events. Based on this data, a Top Items recommendation is made to users who are new to the site (also known as a cold start recommendation). You will implement this top items code in the web application and function applications, then deploy the applications to test the functionality.

The algorithms for creating the offline calculations are written in Python and are executed via Azure Databricks.

Task 1: Configure Azure Databricks

Open the Azure portal (https://portal.azure.com) and search for your assigned lab resource group. If you were not assigned a resource group, your generated resource group will be named using the following pattern: YOURINIT-s2-retail.
Select your resource group, and then select your Azure Databricks instance (it should be named s2_databricks...).
Select Launch Workspace. If prompted, log in with the account you used to create your environment.
In the side navigation, select Clusters.
Select Create Cluster.
On the create cluster form, provide the following:
- Cluster Name: small
- Cluster Type: Standard
- Databricks Runtime Version: Runtime: 5.5 (Scala 2.11, Spark 2.4.3) (Note: the runtime version may have LTS after the version. This is also a valid selection.)
- Python Version: 3
- Enable Autoscaling: Uncheck this option.
- Auto Termination: Check the box and enter 120 (minutes)
- Worker Type: Standard_DS3_v2
- Driver Type: Same as worker
- Workers: 1
Select Create Cluster.
Before continuing to the next step, verify that your new cluster is running. Wait for the state to change from Pending to Running.
Select the small cluster, then select Libraries.
Select Install New.
In the Install Library dialog, select Maven for the Library Source.

In the Coordinates field type:

com.microsoft.azure:azure-cosmosdb-spark_2.4.0_2.11:1.4.1

Select Install.
Wait until the library's status shows as Installed before continuing.

Task 2: Populate event data

Within Azure Databricks, select Workspace on the menu, then Users, select your user, then select the down arrow at the top of your user workspace. Select Import.
Within the Import Notebooks dialog, select Import from: file, then drag-and-drop the file or browse to upload it ({un-zipped repo folder}/Retail/Notebooks/02 Retail.dbc).
Select Import.
After importing, select the new 02 Retail folder, then navigate to the Includes folder.
Select the Shared-Configuration notebook
Update the configuration settings and set the following using the values from your lab setup script output:
- Endpoint = Cosmos DB endpoint url
- Masterkey = Cosmos DB master key
- Database = Database ID of the Cosmos DB database ('movies')
If you do not have your setup script output values available for reference, you may find the Endpoint and Masterkey values by navigating to your Cosmos DB account in the Azure portal, then selecting Keys in the left-hand menu. Copy the URI value for Endpoint, and Primary Key for the Masterkey value.
Attach your cluster to the notebook using the dropdown. You will need to do this for each notebook you open. In the drop down, select the small cluster.
Next, navigate back up to 02 Retail and select the 01 Event Generator notebook

This notebook simulates the browsing and purchasing activity of six users with different personality-based preferences and saves the results to the events container in Cosmos DB.

The movies have been pre-selected and sorted into the genres of comedy, drama, and action. While the actual movie selection and activity are random, they are weighted to respect the user's preferences in each genre, producing a distribution that mirrors that user's taste.

For example, user 400001 has the preference of 20 for comedy, 30 for drama, 50 for action. This will result in the user logging more activity with action movies.
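The genre weighting described above can be illustrated as weighted random sampling. This is a minimal sketch, not the notebook's code: the PREFERENCES dictionary layout and the helper names pick_genre and simulate are hypothetical, and only the 20/30/50 preferences for user 400001 come from the text.

```python
import random
from collections import Counter

# Genre preferences for user 400001, taken from the example above.
# The dictionary layout itself is a hypothetical representation.
PREFERENCES = {"comedy": 20, "drama": 30, "action": 50}


def pick_genre(preferences, rng):
    """Pick one genre at random, weighted by the user's preference scores."""
    genres = list(preferences)
    weights = [preferences[g] for g in genres]
    # random.choices accepts relative weights, so they need not sum to 100.
    return rng.choices(genres, weights=weights, k=1)[0]


def simulate(preferences, n_events, seed=0):
    """Simulate n_events genre picks and count how often each genre occurs."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return Counter(pick_genre(preferences, rng) for _ in range(n_events))


counts = simulate(PREFERENCES, 10_000)
print(counts)  # roughly 20% comedy, 30% drama, 50% action
```

Because the picks are random, two runs with different seeds produce different events, which is why your generated data may not match other participants'.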

NOTE: Your results (that is, the events generated) may differ from those of your fellow lab participants.
Attach your cluster to the notebook using the dropdown. In the drop down, select the small cluster.
Select Run All.

Task 3: Run the aggregation and import utility

Browse to the {un-zipped repo folder}/Retail/Solution/Contoso Movies folder and open the Contoso.Apps.Movies.sln solution.

If Visual Studio prompts you to sign in when it first launches, use the account provided to you for this lab (if applicable), or an existing Microsoft account.
Within the Solution Explorer, expand the /Utilities/MovieDataImport project and open the Program.cs file. Take a few moments to browse the code. You will see that it:
- Aggregates all the event data generated from the Databricks notebook
- Creates the user personalities
- Creates the movie categories/genres
- Creates the movies
Right-click the project and select Set as Startup Project.
Press F5 to run the project.

You may see the message Input string was not in a correct format. written several times to the console window after the genres are saved and before the movies are added. You can safely ignore it; it occurs because some of the movies retrieved from the API are poorly formatted.

NOTE: Wait for the Event Generator Databricks notebook to complete before running this utility; otherwise, it will not run correctly and the later steps in the lab will not match.

Task 4: Perform and deploy association rules calculation for offline algorithms

Synopsis: Based on the pre-calculated events in Cosmos DB for the predefined personality types (Comedy fan, Drama fan, etc.), you will implement and deploy an algorithm that generates item associations and stores them in Cosmos DB, where the web and function applications use these offline results.

Switch back to your Databricks workspace and open the 02 Association Rules notebook.
Attach your cluster to the notebook using the dropdown. In the drop down, select the small cluster.
Run each cell of the 02 Association Rules notebook by selecting within the cell, then pressing Ctrl+Enter on your keyboard. Pay close attention to the instructions within the notebook so you understand each step of the data preparation process.

The goal of this algorithm is to compute two metrics that indicate the strength of a relationship between a source item and a target item based on event history, and then save that matrix to the associations container in Cosmos DB.

The algorithm begins by grouping events with a buy action into transactions by sessionId. This provides the set of items bought together.

For example, a transaction with two items would look like '404973': ['5512872', '4172430'], where 404973 is the sessionId used as the transactionId, and the array contains the IDs of the items bought ('5512872' and '4172430').
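The grouping step, and the kind of pairwise metrics an association-rules job typically derives from it, can be sketched in plain Python. The text does not name the two metrics the notebook saves, so confidence and lift are used here as common choices; the event field names (sessionId, itemId, event) and the helper names are assumptions.

```python
from collections import defaultdict
from itertools import permutations


def build_transactions(events):
    """Group 'buy' events by sessionId; each session becomes one transaction."""
    transactions = defaultdict(set)
    for e in events:
        if e["event"] == "buy":
            transactions[e["sessionId"]].add(e["itemId"])
    return dict(transactions)


def association_metrics(transactions):
    """Compute confidence and lift for every ordered (source, target) item pair.

    confidence(s -> t) = count(s and t) / count(s)
    lift(s -> t)       = confidence(s -> t) / P(t), with P(t) = count(t) / n
    """
    n = len(transactions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in transactions.values():
        for item in items:
            item_count[item] += 1
        for source, target in permutations(items, 2):
            pair_count[(source, target)] += 1

    rules = {}
    for (source, target), both in pair_count.items():
        confidence = both / item_count[source]
        lift = confidence / (item_count[target] / n)
        rules[(source, target)] = {"confidence": confidence, "lift": lift}
    return rules


events = [
    {"sessionId": "404973", "itemId": "5512872", "event": "buy"},
    {"sessionId": "404973", "itemId": "4172430", "event": "buy"},
    {"sessionId": "404974", "itemId": "5512872", "event": "buy"},
    {"sessionId": "404974", "itemId": "1111111", "event": "details"},
]
transactions = build_transactions(events)
rules = association_metrics(transactions)
print(rules[("5512872", "4172430")])  # {'confidence': 0.5, 'lift': 1.0}
```

Only buy events form transactions, so the details event for item 1111111 is ignored. The rules are directional: buying 4172430 implies 5512872 with confidence 1.0, but not the other way around.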

Task 5: Perform and deploy collaborative filtering rules calculation

Synopsis: In this task, you will execute the implicit ratings notebook in Azure Databricks to generate the implicit rating for each user that has event data. You will execute it only once during this lab; however, this notebook would need to run on a set schedule to keep the users' rating data up to date.

Within Azure Databricks, open 03 Ratings.
Attach your cluster to the notebook using the dropdown. In the drop down, select the small cluster.
Run each cell of the 03 Ratings notebook by selecting within the cell, then pressing Ctrl+Enter on your keyboard. Pay close attention to the instructions within the notebook so you understand each step of the data preparation process.

This notebook uses the implicit events captured in the events container in Cosmos DB to calculate what a user would rate a given item, based on their actions. In other words, it converts a user's buy, addToCart, and details actions into a numeric score for the item. The resulting user-to-item ratings matrix is saved to the ratings container in Cosmos DB.
Switch back to the Azure portal.
In your resource group, navigate to your Cosmos DB instance.
Open the ratings container, review the items in the container.

NOTE: These ratings are generated as part of this notebook as an 'offline' operation. If you collect a significant amount of user data, you would need to re-evaluate the events using this notebook and populate the ratings container again for the online calculations to utilize.
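The conversion of implicit actions into a rating can be sketched as a weighted sum per user and item. The weights in ACTION_WEIGHTS are hypothetical placeholders (the actual weightings are defined in the 03 Ratings notebook), and the event field names are assumptions.

```python
from collections import defaultdict

# Hypothetical weights: stronger signals (a purchase) count more than
# weaker ones (viewing details). The notebook defines its own values.
ACTION_WEIGHTS = {"details": 1.0, "addToCart": 2.0, "buy": 3.0}


def implicit_ratings(events, weights=ACTION_WEIGHTS):
    """Return {(userId, itemId): rating}, summing action weights per pair.

    Actions that are not listed in weights are ignored.
    """
    ratings = defaultdict(float)
    for e in events:
        weight = weights.get(e["event"])
        if weight is not None:
            ratings[(e["userId"], e["itemId"])] += weight
    return dict(ratings)


events = [
    {"userId": 400001, "itemId": "5512872", "event": "details"},
    {"userId": 400001, "itemId": "5512872", "event": "addToCart"},
    {"userId": 400001, "itemId": "5512872", "event": "buy"},
    {"userId": 400001, "itemId": "4172430", "event": "details"},
]
print(implicit_ratings(events))
# {(400001, '5512872'): 6.0, (400001, '4172430'): 1.0}
```

A real implementation may also cap or rescale the summed score so that heavy browsing of one item does not dominate the ratings matrix.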

Task 6: Generate the Collaborative Rules

Within Azure Databricks, open 04 Similarity.
Attach your cluster to the notebook using the dropdown. In the drop down, select the small cluster.
Run each cell of the 04 Similarity notebook by selecting within the cell, then pressing Ctrl+Enter on your keyboard. Pay close attention to the instructions within the notebook so you understand each step of the data preparation process.

The notebook logic uses the previously created user-to-item ratings to calculate a score indicating the similarity between a source item and a target item. The process begins by loading the ratings matrix and, for each user-to-item rating, calculating a normalized rating (to adjust for the user's bias).

An overlap matrix is calculated that identifies, for any pair of items, how many users rated both items. First, the normalized ratings matrix is converted to a Boolean matrix: if a user has rated an item (regardless of the rating's value), the cell is 1; otherwise, it is 0. Then the dot product of this Boolean matrix with its transpose is calculated. This yields an item-by-item matrix in which each cell contains the number of users who rated both items. Cells for item pairs with no overlap have a value of zero.
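The normalization and overlap steps can be sketched with NumPy. This assumes a dense users-by-items matrix in which 0 means "not rated", and assumes normalization subtracts each user's mean rating (a common way to remove user bias); the notebook's exact normalization may differ. The Boolean matrix is built from the raw ratings here because a mean-centered rating can legitimately be 0.

```python
import numpy as np

# Rows are users, columns are items; 0 means the user has not rated the item.
ratings = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 2.0],
    [0.0, 1.0, 4.0],
])

rated = ratings > 0  # Boolean matrix: True wherever a rating exists

# Normalize: subtract each user's mean over the items they actually rated.
user_means = ratings.sum(axis=1) / rated.sum(axis=1)
normalized = np.where(rated, ratings - user_means[:, None], 0.0)

# Overlap: (items x users) @ (users x items) -> items x items.
# Cell [i, j] counts the users who rated both item i and item j;
# the diagonal is the number of users who rated each item.
as_int = rated.astype(int)
overlap = as_int.T @ as_int
print(overlap)
```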

Separately, the cosine similarity of the normalized ratings matrix is computed. It's easiest to understand the cosine similarity calculation as being done between an item i and another item j. The cosine similarity is a ratio:
- The numerator is the sum, over all users who have provided ratings, of the normalized rating of item i multiplied by the normalized rating of item j. The denominator is the square root of the sum of the squares of the normalized ratings of item i, multiplied by the square root of the sum of the squares of the normalized ratings of item j. In Python, the logic uses the cosine_similarity function from scikit-learn to compute the similarity between items, passing it our normalized user-to-items ratings matrix.
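Written as a formula, with \hat r_{u,i} denoting user u's normalized rating of item i and U the set of users (an unrated item contributes 0 to every sum):

```latex
\operatorname{sim}(i, j) =
  \frac{\sum_{u \in U} \hat r_{u,i}\,\hat r_{u,j}}
       {\sqrt{\sum_{u \in U} \hat r_{u,i}^{2}}\;\sqrt{\sum_{u \in U} \hat r_{u,j}^{2}}}
```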
The result is then filtered to remove item pairs whose similarity score is lower than a configured minimum, or whose overlap (the number of users who rated both items, taken from the overlap matrix) is lower than a configured minimum. Just before saving, any remaining similarities with scores below the configured minimum similarity are removed, so that weak similarities are not recommended.
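The similarity and filtering steps might look like the following sketch. It computes cosine similarity directly with NumPy; scikit-learn's cosine_similarity compares the rows of its input, so it gives the same item-by-item result when passed a matrix with one row per item (for example, cosine_similarity(normalized.T)). The thresholds MIN_SIMILARITY and MIN_OVERLAP, their values, and the helper names are hypothetical.

```python
import numpy as np


def item_cosine_similarity(normalized):
    """Cosine similarity between the item columns of a users x items matrix."""
    dots = normalized.T @ normalized           # numerator for every item pair
    norms = np.sqrt(np.diag(dots))             # sqrt of sum of squares per item
    denom = np.outer(norms, norms)
    # Items with no non-zero ratings get similarity 0 instead of dividing by 0.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)


def filter_similarities(similarity, overlap, min_similarity, min_overlap):
    """Keep (i, j, score) for distinct item pairs that pass both thresholds."""
    kept = []
    n_items = similarity.shape[0]
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                continue
            if similarity[i, j] >= min_similarity and overlap[i, j] >= min_overlap:
                kept.append((i, j, float(similarity[i, j])))
    return kept


# Hypothetical thresholds.
MIN_SIMILARITY = 0.5
MIN_OVERLAP = 2

# Mean-centered ratings (users x items); 0 where the user did not rate the item.
normalized = np.array([
    [1.0, 1.0, -2.0],
    [0.5, 0.5, -1.0],
    [-1.0, 0.0, 1.0],
])
rated = np.array([[1, 1, 1], [1, 1, 1], [1, 0, 1]])
overlap = rated.T @ rated

similarity = item_cosine_similarity(normalized)
pairs = filter_similarities(similarity, overlap, MIN_SIMILARITY, MIN_OVERLAP)
print(pairs)  # items 0 and 1 in both directions, score about 0.745
```

Raising MIN_OVERLAP to 3 removes that pair as well, because only two users rated both items; requiring overlap guards against high similarities that rest on very few ratings.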

Task 7: Set up Stream Analytics

Open the Azure portal and navigate to the Stream Analytics job that the setup script created for you
Select Inputs
Select +Add stream input, then select Event Hub
For the alias, type s2events
Select your subscription
Select the s2ns.. Event Hub namespace
For the event hub, select store
For the policy name, select RootManageSharedAccessKey
Select Save
Select Outputs
Select +Add, then select Power BI
For the output alias, type eventOrdersLastHour
For the dataset, type eventOrdersLastHour
For the table name, type eventOrdersLastHour
Select Authorize, then log in to your Power BI account
Select Save
Repeat the previous six steps (adding a Power BI output), replacing eventOrdersLastHour in the output alias, dataset, and table name with each of the following:

eventSummary
failureCount
eventData

Select Query
Update the query to the following:

SELECT COUNT(*) AS FailureCount
 INTO failureCount
 FROM s2events
 WHERE Event = 'paymentFailure'
 GROUP BY TumblingWindow(second, 10)

SELECT COUNT(DISTINCT UserId) AS UserCount, System.Timestamp() AS Time, COUNT(*) AS EventCount
 INTO eventData
 FROM s2events
 GROUP BY TumblingWindow(second, 10)

SELECT System.Timestamp() AS Time, Event, COUNT(*) AS Count
 INTO eventSummary
 FROM s2events
 GROUP BY Event, TumblingWindow(second, 10)

SELECT DATEADD(second, -10, System.Timestamp()) AS WinStartTime, System.Timestamp() AS WinEndTime, 0 AS Min, COUNT(*) AS Count, 10 AS Target
 INTO eventOrdersLastHour
 FROM s2events
 WHERE Event = 'buy'
 GROUP BY SlidingWindow(second, 10)

The Query window should look similar to this:

Select Save query
Select Overview, then select Start in the menu to start your Stream Analytics job

In the dialog, ensure that Now is selected, then select Start

NOTE: If your job fails for any reason, you can use the Activity Log to see what the error(s) were.
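To illustrate what GROUP BY TumblingWindow(second, 10) does in the queries above: tumbling windows are fixed-size, non-overlapping, and contiguous, so each event is counted in exactly one 10-second window, and Stream Analytics labels each window's result with the window's end time (System.Timestamp()). The Python sketch below only models that grouping over in-memory (timestamp, event type) tuples; it is not Stream Analytics code, the sample events are made up, and it assumes windows aligned to multiples of the window size with simplified boundary handling. The eventOrdersLastHour query uses SlidingWindow instead, whose windows overlap; that case is not modeled here.

```python
from collections import Counter


def tumbling_counts(events, size_seconds=10):
    """Count events per (window_end, event_type) in fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, event_type) tuples.
    Windows are assumed to start at multiples of size_seconds.
    """
    counts = Counter()
    for timestamp, event_type in events:
        window_start = (timestamp // size_seconds) * size_seconds
        window_end = window_start + size_seconds
        counts[(window_end, event_type)] += 1
    return counts


events = [(1, "buy"), (4, "details"), (9, "buy"), (12, "buy"), (19, "paymentFailure")]
print(tumbling_counts(events))
```

This mirrors the eventSummary query, which groups by Event as well as by the window.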

Task 8: Generate user events for Power BI

Browse to the {un-zipped repo folder}/Retail/Solution/Contoso Movies folder and open the Contoso.Apps.Movies.sln solution
Right-click the DataGenerator project, select Set as startup project
Press F5 to run the project
Notice that events are generated based on a set of users and their preferred movie types

Buy events, along with random payment failures, are generated for the first 30 seconds. After 30 seconds, you will notice that the orders per hour fall below the target of 10. This would signify that something is wrong with the front-end website or order processing.
After about 1 minute, close the DataGenerator console program

Task 9: Set up Power BI Dashboard

Open a new browser window and navigate to Power BI (https://app.powerbi.com)
Click Sign in, then sign in using the same credentials you used to authorize your Stream Analytics outputs above.
Select My workspace
Select +Create, then select Dashboard

For the name, type Contoso Movies, select Create
Select the ... ellipsis, then select +Add tile
Select Custom Streaming Data, select Next
Select the eventData data set, then select Next

For the visualization type, select Card
For the Fields, select EventCount
Select Next
For the title, type Event Count, then select Apply
Select +Add tile (you may need to select the ... ellipsis first)
Select Custom Streaming Data, select Next. Use the following table to create the needed tiles:


Dataset	Type	Fields	Title
eventData	Card	UserCount	User Count
failureCount	Card	FailureCount	Payment Failures
eventSummary	Line chart	Axis = Time, Legend = Event, Values = Count	Count By Event
eventOrdersLastHour	Gauge	Value = Count, Minimum = Min, Target = Target	Orders Per Hour

Your dashboard should look similar to the following:

Task 10: Generate user events for real time analytics

Switch back to Visual Studio, press F5 to run the data generator project
Switch to your Power BI dashboard. After a few minutes, you should see it update with the event data:

Exercise 2: Email alerts using Logic Apps

Duration: 30 minutes

In this exercise, you will configure your change feed function to call an HTTP Logic App endpoint that then sends an email when an order event occurs. The function uses Polly to retry the call in case the Logic App endpoint is temporarily unavailable.
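Polly is a .NET library, and the function's actual retry policy lives in its C# code. To show the idea behind it, here is a hypothetical Python sketch of retry with exponential backoff around an HTTP-style call; call_with_retry, TransientError, max_attempts, and base_delay are illustrative names, not Polly APIs, and the sleep function is injectable so the example runs instantly.

```python
import time


class TransientError(Exception):
    """Raised by the wrapped call for failures worth retrying (e.g. HTTP 5xx)."""


def call_with_retry(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Invoke call(); on TransientError wait base_delay * 2**attempt and retry.

    Re-raises the last TransientError once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...


# Usage: a fake endpoint that fails twice, then succeeds.
attempts = []
delays = []


def flaky_logic_app():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("503 Service Unavailable")
    return 202


status = call_with_retry(flaky_logic_app, sleep=delays.append)
```

Backing off exponentially gives a briefly unavailable endpoint time to recover instead of hammering it with immediate retries.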

Task 1: Set up Logic App

Open your resource group in the Azure portal and select the Logic App (it should be named s2logicapp...)
Click Edit

Click +New step

Search for send an email, then select the Office 365 Outlook connector

Click Sign in, then log in using your Azure AD credentials

Set To to your email address
Set the Subject as Thank you for your order
Set the Body as Your order is being processed
Click Save

Select the When a HTTP request is received trigger, then copy the logic app's HTTP POST URL and save it for the next task

Task 2: Configure the function app settings

Open your resource group in the Azure portal and select the Function App (it should be named s2func...)
Click Configuration
Add or update the LogicAppUrl application setting, setting its value to the Logic App HTTP POST URL you recorded in the previous task
Click Save

Task 3: Explore the Databricks notebooks

Switch back to the Azure Portal
Select your Databricks instance, then click Launch Workspace
Browse to each of the notebooks that were deployed as part of your deployment script and review the contents with your audience. Note the following:

01 Event Generator - this notebook generates a random set of events for each target user based on their personality. These events are then used to generate the 'ratings'. Most of the generation code is in Cmd 9, so you can focus your conversation on that cell.
02 Association Rules - Review the comments in Cmd 7, which describe what happens in the rest of the notebook
03 Ratings - Review Cmd 9; point out the weighting for each action and where the implicit rating is created.
04 Similarity - Review the comments in Cmd 7, which describe what happens in the rest of the notebook

Task 4: Explore the Function App Recommendation Code

Switch to Visual Studio and open the Contoso.Apps.FunctionApp project, then open the RecommendationHelper.cs file
Navigate to the Get method with the signature public static List<Item> Get(string algo, int userId, int contentId, int take). Point out that this is the entry point where a recommendation starts, based on the algorithm requested.
Review the following methods and their code:

TopRecommendation - this is the basic method for randomly selecting a set of top purchased items.
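AssociationRecommendationByUser - returns recommendations for a user based on the association rules that the 02 Association Rules notebook saved to the associations container.
CollaborativeBasedRecommendation - returns recommendations based on the item similarities (collaborative rules) computed by the 04 Similarity notebook.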
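The idea behind TopRecommendation (randomly selecting from the most-purchased items, which also serves as the cold start recommendation) can be sketched in Python. This is a hypothetical illustration: the real method is C# and reads its aggregates from Cosmos DB, and the pool_size parameter and the shape of purchase_counts are assumptions.

```python
import random


def top_recommendation(purchase_counts, take, pool_size=10, rng=random):
    """Randomly pick `take` items from the `pool_size` most-purchased items.

    purchase_counts: mapping of itemId -> number of buy events.
    """
    ranked = sorted(purchase_counts, key=purchase_counts.get, reverse=True)
    pool = ranked[:pool_size]
    # Random selection keeps the page from showing the identical list every time.
    return rng.sample(pool, min(take, len(pool)))


counts = {"a": 50, "b": 40, "c": 30, "d": 5, "e": 1}
picks = top_recommendation(counts, take=2, pool_size=3, rng=random.Random(7))
print(picks)  # two distinct items drawn from "a", "b", "c"
```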

Task 5: Explore the Function App ChangeFeed Code

Switch to Visual Studio and open the Contoso.Apps.FunctionApp project, then open the FuncChangeFeed.cs file
Review the Dependency Injection for the IHttpClientFactory and the CosmosClient objects:

// Use Dependency Injection to inject the IHttpClientFactory and CosmosClient services that were configured in Startup.cs.
public FuncChangeFeed(IHttpClientFactory httpClientFactory, CosmosClient cosmosClient)
{
    _httpClientFactory = httpClientFactory;
    _cosmosClient = cosmosClient;
}

Review the following methods and their code:

DoAggregateCalculations - This method updates the item aggregations for the buy events to keep track of the top items purchased. This will continually update and drive the top suggestions. You will see this when you execute the Data Generator tool.
AddEventToEventHub - This method forwards the change feed item to the event hub, where Stream Analytics then processes the data.
CallLogicApp - This method forwards the change feed item to the Logic App's HTTP endpoint, which generates an email.
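The flow through these three methods can be summarized with a hypothetical Python sketch of a change feed handler. The callables forward_to_event_hub and call_logic_app stand in for the real Event Hubs client and HTTP call, the field names are assumptions, and forwarding every document to the event hub while updating aggregates and sending email only for buy events is an assumption based on the descriptions above.

```python
from collections import Counter


def handle_changes(documents, top_items, forward_to_event_hub, call_logic_app):
    """Process one batch of change feed documents.

    - every document is forwarded to Event Hubs for Stream Analytics
    - buy events increment the purchase count used for top items
    - buy events also trigger the order email via the Logic App
    """
    for doc in documents:
        forward_to_event_hub(doc)
        if doc.get("event") == "buy":
            top_items[doc["itemId"]] += 1
            call_logic_app(doc)


hub, emails = [], []
top_items = Counter()
batch = [
    {"event": "details", "itemId": "5512872"},
    {"event": "buy", "itemId": "5512872"},
    {"event": "buy", "itemId": "4172430"},
]
handle_changes(batch, top_items, hub.append, emails.append)
```

Passing the outbound calls in as parameters keeps the handler testable, which is the same motivation behind injecting IHttpClientFactory and CosmosClient in the C# constructor.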

Task 6: Test order email delivery

Switch to Visual Studio, right-click the DataGenerator project, select Set as startup project
Press F5 to run the project
For each buy event, you should receive an email

NOTE: You could receive quite a few emails.

Exercise 3: Explore Contoso Movie Store

Duration: 15 minutes

Synopsis: You will show your attendees the Contoso Movies store. It is an e-commerce site that uses Cosmos DB as its data store. In addition, Azure Functions monitor the Cosmos DB change feed to execute reporting and notification activities. A second function provides recommendations for the logged-in user, combining application logic with AI models that were pre-calculated offline from user behavior.

Task 1: Explore the Contoso Movie Store

Open the deployed Contoso Movies website

NOTE: This should have opened as part of the demo mode setup script.

Mention that you are not logged in as any user, and that the results being displayed are based on the top purchased items in the Cosmos DB database.
In the top navigation, select the Login link
Mention that there are several pre-populated personalities. Select the COMEDY@CONTOSOMOVIES.COM personality
Mention that you now have targeted movies based on two different algorithms (Association and Collaborative)
In the top navigation, select the COMEDY@CONTOSOMOVIES.COM link, then select SWITCH
Change the user to the DRAMA@CONTOSOMOVIES.COM user. Note how the recommendations are different from the comedy user.

Task 2: Create a new personality

In the top navigation, select the DRAMA@CONTOSOMOVIES.COM link, then select SWITCH
Select New User. This will create a session as a new user that has no implicit ratings (no actions have been generated).
Point out that you have no Association or Collaborative recommendations.
Click a few movies in the store, then select Add to Cart for a random set of them. These actions will generate events for the new user.
Click Home, you should now see recommendations displayed.

NOTE: Some movies may not have corresponding similarities or associations, depending on the randomness of the Databricks notebook execution. You may need to click on a few movies before you see any recommendations.

After the hands-on lab

Duration: 10 minutes

In this exercise, attendees will deprovision any Azure resources that were created in support of the lab.

Task 1: Delete resource group

Using the Azure portal, navigate to the Resource group you used throughout this hands-on lab by selecting Resource groups in the menu.
Search for the name of your resource group, and select it from the list.
Select Delete in the command bar, and confirm the deletion by re-typing the Resource group name and selecting Delete.

You should follow all steps provided after attending the Hands-on lab.
