title: Install Published Application - StreamSets Data Collector on Azure HDInsight | Microsoft Docs
description: Learn how to install third-party Hadoop applications on Azure HDInsight.
services: hdinsight
tags: azure-portal
ms.service: hdinsight
ms.custom: hdinsightactive
ms.devlang: na
ms.topic: article
ms.tgt_pltfrm: na
ms.workload: big-data

Install published application - StreamSets Data Collector on Azure HDInsight

In this article, you learn how to install StreamSets Data Collector for HDInsight, a published Hadoop application, on Azure HDInsight. For a list of available Independent Software Vendor (ISV) applications and an overview of the HDInsight application platform, see Install third-party Hadoop applications. For instructions on installing your own application, see Install custom HDInsight applications.

About StreamSets Data Collector

StreamSets Data Collector deploys on top of an Azure HDInsight cluster as an HDInsight application. It provides a full-featured integrated development environment (IDE) that lets you design, test, deploy, and manage any-to-any ingest pipelines that combine streaming and batch data and apply a variety of in-stream transformations, all without writing custom code.

StreamSets Data Collector lets you build data flows that connect many big data components, such as HDFS, Kafka, Solr, Hive, HBase, and Kudu. Once StreamSets Data Collector is running at the edge or in your Hadoop cluster, you get real-time monitoring of both data anomalies and data flow operations, including threshold-based alerting, anomaly detection, and automatic remediation of error records.

Because each stage in a pipeline is logically isolated, you can meet new business requirements by dropping in new processors and connectors, without writing code and with minimal downtime.

Resource links

Installing the StreamSets Data Collector published application

For step-by-step instructions on installing this and other available ISV applications, see Install third-party Hadoop applications.

Prerequisites

Whether you install this app on a new HDInsight cluster or on an existing one, the cluster must meet the following requirements:

  • Cluster tier(s): Standard or Premium
  • Cluster version(s): 3.5 and above
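The version requirement is a numeric comparison, not a text comparison: as plain strings, a hypothetical "3.10" would sort before "3.5" even though it is the later version. The following minimal Python sketch checks a tier and version string, for example as read from the cluster's properties in the Azure portal, against the prerequisites above. The helper names `parse_version` and `meets_prerequisites` are hypothetical and are not part of HDInsight or StreamSets.

```python
from __future__ import annotations

# Hypothetical helper (not an HDInsight or StreamSets API): checks a cluster's
# tier and version against this app's prerequisites.
SUPPORTED_TIERS = {"standard", "premium"}
MIN_VERSION = (3, 5)  # "3.5 and above"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a version string such as "3.6" into a tuple of ints, e.g. (3, 6)."""
    return tuple(int(part) for part in version.split("."))


def meets_prerequisites(tier: str, version: str) -> bool:
    """Return True if the tier is Standard or Premium and the version is 3.5 or later."""
    # Tuples compare element by element, so (3, 10) >= (3, 5) is True.
    return tier.lower() in SUPPORTED_TIERS and parse_version(version) >= MIN_VERSION


if __name__ == "__main__":
    for tier, version in [("Standard", "3.6"), ("Premium", "3.5"), ("Standard", "3.4")]:
        print(tier, version, meets_prerequisites(tier, version))
```

Comparing tuples of integers rather than raw strings keeps the check correct for multi-digit version components.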

Launching StreamSets Data Collector for the first time

After installation, you can launch StreamSets from your cluster in the Azure portal: go to the Settings blade, and then select Applications under the General category. The Installed Apps blade lists all the installed applications.

Installed StreamSets app

When you select StreamSets Data Collector, you'll see a link to the web page, as well as the SSH endpoint path. Select the WEBPAGE link.

In the Login dialog box, sign in with the user name admin and the password admin.

  • On the Get Started page, click Create New Pipeline.

    Create new pipeline

  • In the New Pipeline window, enter a name for the pipeline ("Hello World"), optionally enter a description, and click Save.

  • The Data Collector console appears, and the Properties panel displays the pipeline properties.

    Data Collector console

  • You're now ready to follow the official StreamSets tutorial, which gives detailed step-by-step directions for creating your first pipeline.

Next steps