DataSource — Pluggable Data Source

DataSource is…FIXME

DataSource is created when…FIXME

Tip	Read DataSource — Pluggable Data Sources (for Spark SQL’s batch structured queries).

Table 1. DataSource’s Internal Properties (e.g. Registries, Counters and Flags)

Name	Description
`providingClass`	java.lang.Class that corresponds to the className (that can be a fully-qualified class name or an alias of the data source)
`sourceInfo`	`SourceInfo` with the name, the schema, and optional partitioning columns of a source. Used when `DataSource` creates a FileStreamSource (which requires the schema and the optional partitioning columns) and when `StreamingRelation` is created (for a `DataSource`).

sourceSchema(): SourceInfo

sourceSchema…FIXME

Note	`sourceSchema` is used exclusively when `DataSource` is requested for the SourceInfo.

DataSource takes the following when created:

createSource(metadataPath: String): Source

createSource…FIXME

Note	`createSource` is used exclusively when `MicroBatchExecution` is requested to initialize the analyzed logical plan.

createSink(outputMode: OutputMode): Sink

createSink creates a streaming sink for StreamSinkProvider or FileFormat data sources.

Tip	Read up on FileFormat Data Source in The Internals of Spark SQL book.

Internally, createSink creates a new instance of the providingClass and branches off per type:

For a StreamSinkProvider, createSink simply delegates the call and requests it to create a streaming sink.
For a FileFormat, createSink creates a FileStreamSink when the path option is specified and the output mode is Append.

createSink throws an IllegalArgumentException when the path option is not specified for a FileFormat data source:

'path' is not specified

createSink throws an AnalysisException when the given OutputMode is different from Append for a FileFormat data source:

Data source [className] does not support [outputMode] output mode

createSink throws an UnsupportedOperationException for unsupported data source formats:

Data source [className] does not support streamed writing
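
The branching and the three error cases above can be sketched in plain Scala. This is a minimal, self-contained model and not Spark's actual source code: all types below (`OutputMode`, `Sink`, `StreamSinkProvider`, `FileFormat`, `FileStreamSink`, `AnalysisException`) are simplified stand-ins with the same names as their Spark counterparts, `CustomSink` and the object name `CreateSinkSketch` are hypothetical, and the parameter list of `createSink` is reduced to what the description needs.

```scala
object CreateSinkSketch {
  // Stand-in types (assumptions for illustration, not Spark's real classes).
  sealed trait OutputMode
  case object Append extends OutputMode
  case object Complete extends OutputMode
  case object Update extends OutputMode

  trait Sink
  final case class FileStreamSink(path: String) extends Sink
  final case class CustomSink(outputMode: OutputMode) extends Sink // hypothetical

  trait StreamSinkProvider {
    def createSink(options: Map[String, String], outputMode: OutputMode): Sink
  }
  trait FileFormat

  // Stand-in for Spark's AnalysisException.
  final class AnalysisException(message: String) extends Exception(message)

  // `provider` stands in for the new instance of providingClass.
  def createSink(
      provider: Any,
      className: String,
      options: Map[String, String],
      outputMode: OutputMode): Sink = provider match {
    // StreamSinkProvider: delegate the call.
    case s: StreamSinkProvider =>
      s.createSink(options, outputMode)

    // FileFormat: requires the path option and the Append output mode.
    case _: FileFormat =>
      val path = options.getOrElse(
        "path",
        throw new IllegalArgumentException("'path' is not specified"))
      if (outputMode != Append) {
        throw new AnalysisException(
          s"Data source $className does not support $outputMode output mode")
      }
      FileStreamSink(path)

    // Anything else cannot be written to as a stream.
    case _ =>
      throw new UnsupportedOperationException(
        s"Data source $className does not support streamed writing")
  }
}
```

With these stand-ins, wrapping a call in `scala.util.Try` exposes each failure as a value, e.g. `Try(CreateSinkSketch.createSink(new CreateSinkSketch.FileFormat {}, "parquet", Map.empty, CreateSinkSketch.Append))` gives a `Failure` holding the `IllegalArgumentException`. Note that the path check comes before the output mode check in this sketch, so a missing path is reported first.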

Note	`createSink` is used exclusively when `DataStreamWriter` is requested to create and start a streaming query.
