title | description | services | documentationcenter | author | manager | editor | tags | ms.assetid | ms.service | ms.workload | ms.tgt_pltfrm | ms.devlang | ms.topic | ms.date | ms.author | ROBOTS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Use Hive with Hadoop for website log analysis - Azure HDInsight \| Microsoft Docs | Learn how to use Hive with HDInsight to analyze website logs. You'll use a log file as input into an HDInsight table, and use HiveQL to query the data. | hdinsight | | nitinme | jhubbard | cgronlun | azure-portal | 6fb7b5c2-8df4-40b1-a9e2-6815080004f9 | hdinsight | big-data | na | na | article | 05/17/2016 | nitinme | NOINDEX |
Learn how to use HiveQL with HDInsight to analyze logs from a website. You can use website log analysis to segment your audience based on similar activities, categorize site visitors by demographics, and find out what content they view, which websites they come from, and so on.
Important
The steps in this document only work with Windows-based HDInsight clusters. HDInsight is only available on Windows for versions lower than HDInsight 3.4. Linux is the only operating system used on HDInsight version 3.4 or greater. For more information, see HDInsight retirement on Windows.
In this sample, you will use an HDInsight cluster to analyze website log files to get insight into the frequency of visits to the website from external websites in a day. You'll also generate a summary of website errors that the users experience. You will learn how to:
- Connect to Azure Blob storage, which contains the website log files.
- Create Hive tables to query those logs.
- Create Hive queries to analyze the data.
- Use Microsoft Excel to connect to HDInsight by using Open Database Connectivity (ODBC) and retrieve the analyzed data.
Before you begin, make sure that you meet the following prerequisites:
- You must have provisioned a Hadoop cluster on Azure HDInsight. For instructions, see Provision HDInsight Clusters.
- You must have Microsoft Excel 2013 or Excel 2010 installed.
- You must have the Microsoft Hive ODBC Driver installed to import data from Hive into Excel.
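The kind of analysis this sample performs (daily visit counts by external referrer, plus a summary of errors) can be tried out locally before you run it on a cluster. The following is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for Hive. It is not the sample's actual HiveQL. The table name weblogs, the column names, and the sample rows are all assumptions made up for this illustration.

```python
import sqlite3

# Hypothetical sample rows: (date, requested page, referrer, HTTP status).
# These values are invented for illustration; they are not taken from the
# log file that ships with the HDInsight sample.
SAMPLE_ROWS = [
    ("2014-01-01", "/index.html", "http://www.bing.com", 200),
    ("2014-01-01", "/products.html", "http://www.bing.com", 200),
    ("2014-01-01", "/index.html", "http://www.example.org", 200),
    ("2014-01-01", "/missing.html", "-", 404),
    ("2014-01-02", "/index.html", "http://www.bing.com", 200),
    ("2014-01-02", "/cart.html", "http://www.example.org", 500),
    ("2014-01-02", "/missing.html", "-", 404),
]


def load_logs(rows):
    """Create an in-memory table (assumed schema) and load the log rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weblogs "
        "(log_date TEXT, uri TEXT, referrer TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO weblogs VALUES (?, ?, ?, ?)", rows)
    return conn


def visits_by_referrer(conn):
    """Count hits per day from each referrer; '-' marks a missing referrer."""
    return conn.execute(
        """
        SELECT log_date, referrer, COUNT(*) AS hits
        FROM weblogs
        WHERE referrer <> '-'
        GROUP BY log_date, referrer
        ORDER BY log_date, hits DESC, referrer
        """
    ).fetchall()


def error_summary(conn):
    """Count responses with HTTP status 400 or higher, per status and page."""
    return conn.execute(
        """
        SELECT status, uri, COUNT(*) AS errors
        FROM weblogs
        WHERE status >= 400
        GROUP BY status, uri
        ORDER BY errors DESC, status, uri
        """
    ).fetchall()


if __name__ == "__main__":
    conn = load_logs(SAMPLE_ROWS)
    for row in visits_by_referrer(conn):
        print(row)
    for row in error_summary(conn):
        print(row)
```

The two queries use only GROUP BY, COUNT, and WHERE, so similar statements can likely be written in HiveQL against a table defined over the log files. The exact table definition depends on the log format, which this sketch does not cover.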
- From the Azure Portal, from the Startboard (if you pinned the cluster there), click the cluster tile on which you want to run the sample.
- From the cluster blade, under Quick Links, click Cluster Dashboard, and then from the Cluster Dashboard blade, click HDInsight Cluster Dashboard. Alternatively, you can directly open the dashboard by using the following URL:
  https://<clustername>.azurehdinsight.net
  When prompted, authenticate by using the administrator user name and password you used when provisioning the cluster.
- From the web page that opens, click the Getting Started Gallery tab, and then under the Solutions with Sample Data category, click the Website Log Analysis sample.
- Follow the instructions provided on the web page to finish the sample.
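If you open the dashboard from a script or keep bookmarks for several clusters, the URL pattern given in the steps above can be filled in programmatically. The following is a small sketch in Python. The function name dashboard_url and the cluster name mycluster are hypothetical, and the function checks only that the name is not empty; it does not validate the name against HDInsight naming rules.

```python
def dashboard_url(cluster_name: str) -> str:
    """Build the dashboard URL from the pattern https://<clustername>.azurehdinsight.net."""
    name = cluster_name.strip()
    if not name:
        raise ValueError("cluster name must not be empty")
    return f"https://{name}.azurehdinsight.net"


if __name__ == "__main__":
    # "mycluster" is a placeholder; substitute the name of your own cluster.
    print(dashboard_url("mycluster"))
```

You are still prompted for the administrator user name and password when the page opens.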
Try the following sample: Analyzing sensor data using Hive with HDInsight.