ML notebook
SANSA-ML is the Machine Learning (ML) library of the SANSA stack. The algorithms in this repository perform machine learning tasks directly on RDF/OWL input data. Whereas most machine learning algorithms operate on simple feature vectors, the algorithms in SANSA-ML exploit the graph structure and the semantics of background knowledge expressed in the RDF and OWL standards, which in many cases yields more accurate or more human-understandable results. Unlike most other algorithms that support background knowledge, they scale horizontally using Apache Spark and Apache Flink.
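Both examples below operate on RDF input. For reference, RDF files can be loaded into a Spark RDD of triples via the SANSA-RDF layer (a minimal sketch, assuming the net.sansa_stack.rdf.spark.io implicits of the SANSA release in use; the reader API may differ between versions):

import org.apache.jena.riot.Lang
import net.sansa_stack.rdf.spark.io._

// Read an N-Triples file into an RDD of Jena Triples
// (spark is the SparkSession the notebook provides)
val triples = spark.rdf(Lang.NTRIPLES)("hdfs://namenode:8020/data/Clustering_sampledata.nt")
triples.take(5).foreach(println)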
Clustering RDF by modularity: the snippet below partitions the resources of an RDF graph into clusters by iteratively maximizing graph modularity (sc is the SparkContext the notebook provides).

import net.sansa_stack.ml.spark.clustering.RDFByModularityClustering

// Input RDF graph in N-Triples format and the file the cluster assignments are written to
val graphFile = "hdfs://namenode:8020/data/Clustering_sampledata.nt"
val outputFile = "hdfs://namenode:8020/data/clustering.out"

// Number of iterations of the modularity-based clustering
val numIterations = 10

RDFByModularityClustering(sc, numIterations, graphFile, outputFile)
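To sanity-check the clustering run, the output file can be read back as plain text (a minimal sketch; the exact serialization of the cluster assignments depends on the SANSA version at hand):

// Peek at the cluster assignments written by RDFByModularityClustering
val clusters = sc.textFile("hdfs://namenode:8020/data/clustering.out")
clusters.take(10).foreach(println)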
Rule mining: the next snippet loads a knowledge base from a tab-separated triples file and mines Horn rules from it with the AMIE-based miner.

import net.sansa_stack.ml.spark.mining.amieSpark.KBObject.KB
import net.sansa_stack.ml.spark.mining.amieSpark.{ RDFGraphLoader, DfLoader }
import net.sansa_stack.ml.spark.mining.amieSpark.MineRules.Algorithm

// Input knowledge base (one triple per line, tab-separated) and the HDFS output directory
val input = "hdfs://namenode:8020/data/MineRules_sampledata.tsv"
val outputPath = "hdfs://namenode:8020/output"
val hdfsPath = outputPath + "/"

// Register the source file, then load it both as an RDF graph and as a DataFrame
// (the final argument of the two loaders controls the number of partitions)
val know = new KB()
know.sethdfsPath(hdfsPath)
know.setKbSrc(input)
know.setKbGraph(RDFGraphLoader.loadFromFile(know.getKbSrc(), spark.sparkContext, 2))
know.setDFTable(DfLoader.loadFromFileDF(know.getKbSrc, spark.sparkContext, spark.sqlContext, 2))

// Mine the rules; the thresholds (0.01, 3, 0.1) follow AMIE's conventions,
// presumably minimum head coverage, maximum rule length and minimum confidence
val algo = new Algorithm(know, 0.01, 3, 0.1, hdfsPath)
val output = algo.ruleMining(spark.sparkContext, spark.sqlContext)
// Pretty-print every mined rule as "head <= bodyAtom1 ∧ bodyAtom2 ∧ ..."
val outString = output.map { rule =>
  val atoms = rule.getRule()
  atoms.head + " <= " + atoms.tail.mkString(" \u2227 ")
}.toSeq

// Collect the formatted rules into a single-partition RDD and write them to HDFS
val rddOut = spark.sparkContext.parallelize(outString).repartition(1)
rddOut.saveAsTextFile(outputPath + "/testOut")
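Once the job finishes, the formatted rules sit as plain text under outputPath + "/testOut" and can be read back for a quick check (a minimal sketch using only Spark core; the part-* file naming is Spark's usual saveAsTextFile layout):

// Load the mined rules back from HDFS and print a few of them
val rules = spark.sparkContext.textFile(outputPath + "/testOut")
println(s"mined ${rules.count()} rules")
rules.take(5).foreach(println)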