hdinsight-hive-analyze-website-log.md

title: Use Hive with Hadoop for website log analysis - Azure HDInsight | Microsoft Docs
description: Learn how to use Hive with HDInsight to analyze website logs. You'll use a log file as input into an HDInsight table, and use HiveQL to query the data.
services: hdinsight
documentationcenter:
author: nitinme
manager: jhubbard
editor: cgronlun
tags: azure-portal
ms.assetid: 6fb7b5c2-8df4-40b1-a9e2-6815080004f9
ms.service: hdinsight
ms.workload: big-data
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: article
ms.date: 05/17/2016
ms.author: nitinme
ROBOTS: NOINDEX

Use Hive with Windows-based HDInsight to analyze logs from websites

Learn how to use HiveQL with HDInsight to analyze logs from a website. You can use website log analysis to segment your audience based on similar activities, categorize site visitors by demographics, and find out what content they view, which websites they come from, and so on.

Important

The steps in this document only work with Windows-based HDInsight clusters. HDInsight is only available on Windows for versions earlier than HDInsight 3.4. Linux is the only operating system used on HDInsight version 3.4 or later. For more information, see HDInsight retirement on Windows.

In this sample, you use an HDInsight cluster to analyze website log files and find out how often visitors arrive from external websites on a given day. You also generate a summary of the website errors that users experience. You will learn how to:

  • Connect to Azure Blob storage, which contains the website log files.
  • Create Hive tables to query those logs.
  • Create Hive queries to analyze the data.
  • Use Microsoft Excel to connect to HDInsight (by using Open Database Connectivity (ODBC)) to retrieve the analyzed data.
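
To make the two analyses described above more concrete, here is a minimal Python sketch of the same aggregations that the Hive queries would perform: counting visits per referring website per day, and summarizing error responses. It is not the sample's actual HiveQL. The log layout (date, referrer, HTTP status separated by spaces), the sample records, and all function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical, simplified log records: "<date> <referrer> <http-status>".
# The schema of the sample's real log files is not shown in this article.
SAMPLE_LOG_LINES = [
    "2014-01-01 http://www.bing.com 200",
    "2014-01-01 http://www.bing.com 404",
    "2014-01-01 http://www.example.com 200",
    "2014-01-02 http://www.bing.com 500",
]


def parse_line(line):
    """Split one whitespace-delimited line into (date, referrer, status)."""
    date, referrer, status = line.split()
    return date, referrer, int(status)


def referrer_hits_per_day(lines):
    """Count visits per (date, referrer), similar to GROUP BY date, referrer."""
    return Counter(parse_line(line)[:2] for line in lines)


def error_summary(lines):
    """Count responses with an HTTP status of 400 or higher, per status code."""
    statuses = (parse_line(line)[2] for line in lines)
    return Counter(status for status in statuses if status >= 400)


if __name__ == "__main__":
    for (date, referrer), hits in sorted(referrer_hits_per_day(SAMPLE_LOG_LINES).items()):
        print(date, referrer, hits)
    for status, count in sorted(error_summary(SAMPLE_LOG_LINES).items()):
        print(status, count)
```

On a cluster, the same grouping and counting would be expressed in HiveQL over a table built on the log files in Blob storage. The sketch only shows the shape of the result you can expect to retrieve.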


Prerequisites

To run the sample

  1. In the Azure portal, on the Startboard (if you pinned the cluster there), click the tile for the cluster on which you want to run the sample.

  2. From the cluster blade, under Quick Links, click Cluster Dashboard, and then from the Cluster Dashboard blade, click HDInsight Cluster Dashboard. Alternatively, you can open the dashboard directly by using the following URL:

      https://<clustername>.azurehdinsight.net
    

    When prompted, authenticate by using the administrator user name and password you used when provisioning the cluster.

  3. From the web page that opens, click the Getting Started Gallery tab, and then under the Solutions with Sample Data category, click the Website Log Analysis sample.

  4. Follow the instructions provided on the web page to finish the sample.
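
If you open the dashboard for several clusters, the URL pattern from step 2 can be generated rather than typed. The following is a small sketch under that assumption. The function name `dashboard_url` and the cluster name `mycluster` are hypothetical, and the function does not check that the cluster exists.

```python
def dashboard_url(cluster_name: str) -> str:
    """Build the dashboard URL from the pattern shown in step 2."""
    name = cluster_name.strip()
    if not name:
        raise ValueError("cluster name must not be empty")
    return f"https://{name}.azurehdinsight.net"


if __name__ == "__main__":
    # "mycluster" is a placeholder; substitute your own cluster name.
    print(dashboard_url("mycluster"))
```

You still authenticate with the administrator user name and password when the page opens.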

Next steps

Try the following sample: Analyzing sensor data using Hive with HDInsight.