DataSource is…FIXME

DataSource is created when…FIXME
Tip: Read DataSource — Pluggable Data Sources (for Spark SQL's batch structured queries).
Name | Description
---|---
providingClass | java.lang.Class that corresponds to the className (which can be a fully-qualified class name or an alias of the data source). Used when:…FIXME
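As an illustration of how a className can resolve to the providingClass, here is a simplified sketch (a hypothetical helper, not Spark's actual lookup — DataSource.lookupDataSource additionally resolves short aliases such as "parquet" or "kafka" via DataSourceRegister service loading):

```scala
// Hypothetical sketch: resolve a fully-qualified class name to a Class.
// Spark's real lookup also handles data source aliases and backward-compatibility names.
def lookupProvidingClass(className: String): Class[_] =
  Class.forName(className)
```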
sourceSchema(): SourceInfo

sourceSchema…FIXME
Note: sourceSchema is used exclusively when DataSource is requested for the SourceInfo.
DataSource takes the following when created:

DataSource initializes the internal registries and counters.
createSource(metadataPath: String): Source

createSource…FIXME
Note: createSource is used exclusively when MicroBatchExecution is requested to initialize the analyzed logical plan.
createSink(outputMode: OutputMode): Sink

createSink creates a streaming sink for StreamSinkProvider or FileFormat data sources.
Tip: Read up on FileFormat Data Source in The Internals of Spark SQL book.
Internally, createSink creates a new instance of the providingClass and branches off per type:

- For a StreamSinkProvider, createSink simply delegates the call and requests it to create a streaming sink
- For a FileFormat, createSink creates a FileStreamSink when the path option is specified and the output mode is Append
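The branching above can be sketched as follows (a simplified outline that mirrors the error cases below, assuming Spark's StreamSinkProvider, FileFormat, FileStreamSink, and OutputMode types are in scope; not the exact implementation):

```scala
// Simplified sketch of createSink's dispatch on the providingClass instance.
providingClass.getConstructor().newInstance() match {
  case s: StreamSinkProvider =>
    // Delegate directly to the data source.
    s.createSink(sqlContext, options, partitionColumns, outputMode)

  case fileFormat: FileFormat =>
    // FileFormat sinks require a path option and Append output mode.
    val path = options.getOrElse("path",
      throw new IllegalArgumentException("'path' is not specified"))
    if (outputMode != OutputMode.Append) {
      throw new AnalysisException(
        s"Data source $className does not support $outputMode output mode")
    }
    new FileStreamSink(sparkSession, path, fileFormat, partitionColumns, options)

  case _ =>
    throw new UnsupportedOperationException(
      s"Data source $className does not support streamed writing")
}
```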
createSink throws an IllegalArgumentException when the path option is not specified for a FileFormat data source:

'path' is not specified
createSink throws an AnalysisException when the given OutputMode is different from Append for a FileFormat data source:

Data source [className] does not support [outputMode] output mode
createSink throws an UnsupportedOperationException for unsupported data source formats:

Data source [className] does not support streamed writing
Note: createSink is used exclusively when DataStreamWriter is requested to create and start a streaming query.
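As a usage illustration (assuming a local SparkSession named spark and the built-in rate streaming source; paths are placeholders), starting a file-based streaming query in Append mode with a path option is what makes createSink produce a FileStreamSink:

```scala
val query = spark.readStream
  .format("rate")                         // built-in streaming source
  .load()
  .writeStream
  .format("parquet")                      // a FileFormat data source
  .option("path", "/tmp/out")             // required, otherwise: 'path' is not specified
  .option("checkpointLocation", "/tmp/checkpoint")
  .outputMode("append")                   // FileFormat sinks support Append only
  .start()                                // DataStreamWriter requests DataSource to createSink
```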