Add documentation on how to write a RecordBuilder #739
Open. diosmosis wants to merge 7 commits into live from record-builder-doc
Changes from 6 commits
Commits (7 commits, all by diosmosis):
148591c add document on how to write a RecordBuilder
49a3f79 Update writing-a-record-builder.md
1e5ad87 Update docs/5.x/writing-a-record-builder.md
a540e1f Update docs/5.x/writing-a-record-builder.md
255d8ad add a quick section on overriding non-day period aggregation
397eb4a apply review feedback
2b9ccbf Update writing-a-record-builder.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,250 @@ | ||
--- | ||
category: Develop | ||
--- | ||
|
||
<div markdown="1" class="alert alert-warning"> | ||
**This API is unstable.** | ||
|
||
The RecordBuilder API will eventually be public and the only way to define archiving logic, but for now it is unstable | ||
and subject to change. Be aware that it could change between minor version releases. | ||
</div> | ||
|
||
# Writing a RecordBuilder | ||
|
||
RecordBuilders encapsulate the smallest units of aggregation logic required to generate records for a plugin. | ||
|
||
They define two methods: `aggregate()`, which builds the actual `DataTable` and numeric records to insert into archive tables, | ||
and `getRecordMetadata()`, which returns information about which records the `RecordBuilder` builds. | ||
|
||
`aggregate()` will generally aggregate data from log tables to create records, but it does not have to. An example of a use case | ||
without aggregation would be importing analytics data from another service. | ||
|
||
`getRecordMetadata()` is used when aggregating records for non-day periods. In this case, Matomo will find the record values | ||
for the subperiods of the non-day period and aggregate them together. | ||
|
||
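As a rough illustration of what "aggregate them together" means in the default case, the sketch below sums per-day records into one record for a longer period. It uses plain PHP arrays as a stand-in for Matomo's records; the function name `sumSubperiodRecords()` and the array layout are assumptions made for this example, not part of the Matomo API.

```php
<?php
// Illustrative stand-in only: plain arrays instead of Matomo's DataTable/archive classes.

/**
 * Combine per-day records into one record for a longer period by summing
 * each metric of rows that share a label.
 */
function sumSubperiodRecords(array $dayRecords): array
{
    $periodRecord = [];
    foreach ($dayRecords as $dayRecord) {
        foreach ($dayRecord as $label => $metrics) {
            foreach ($metrics as $name => $value) {
                // start from 0 the first time a label/metric pair is seen
                $periodRecord[$label][$name] = ($periodRecord[$label][$name] ?? 0) + $value;
            }
        }
    }
    return $periodRecord;
}

$monday  = ['Firefox' => ['nb_visits' => 10], 'Chrome' => ['nb_visits' => 4]];
$tuesday = ['Firefox' => ['nb_visits' => 3]];

$week = sumSubperiodRecords([$monday, $tuesday]);
```

Here `$week` holds one row per label, with the metric values of both days added together.
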
If your plugin needs to insert data into the archive tables during archiving, then you'll want to create your own `RecordBuilder` classes. | ||
This guide describes how to do that. | ||
|
||
## How to create one | ||
|
||
### Step one: identify the list of records and log aggregation queries you want to bundle together | ||
|
||
Log aggregation queries are expensive (especially with segmentation) and Matomo wants to be able to run as few of them | ||
as possible at a time. A `RecordBuilder` is meant to encapsulate the smallest amount of archiving logic possible, allowing Matomo | ||
to run just what it needs to. | ||
|
||
Many times this will either be running a single log aggregation query to generate a single `DataTable` or running a single | ||
log aggregation query to generate multiple numeric metrics. Sometimes it will mean running multiple log aggregation queries | ||
to generate a single `DataTable` or running multiple log aggregation queries to generate multiple `DataTable`s and multiple metrics. | ||
|
||
It is up to you as a developer to find the balance between efficiency (executing the fewest log aggregation queries overall) | ||
and modularity (having `RecordBuilder`s that individually do as little as possible). | ||
|
||
Once you've identified the `RecordBuilder`s you'll need, create empty classes for them in a `RecordBuilders` subfolder of your plugin. For example, | ||
`/path/to/matomo/plugins/MyPlugin/RecordBuilders/MyRecordBuilder`. | ||
|
||
**A note about Parameterized RecordBuilders** | ||
|
||
`RecordBuilder`s that can be created without specifying constructor arguments (that is, are default constructible) | ||
are found and created automatically by Matomo. But it is also possible to create `RecordBuilder`s that require | ||
parameters. These `RecordBuilder`s are added via the `Archiver.addRecordBuilders` event. | ||
|
||
The ability to create parameterized `RecordBuilder`s may not be necessary in most cases, but if your plugin | ||
manages entities and provides reports about those entities, it can be used to avoid having to run a query for | ||
every entity in the database within a single `RecordBuilder`. | ||
|
||
Examples of plugins that use this feature are the Custom Reports premium feature and the A/B Testing premium feature. | ||
Each of these plugins uses a `RecordBuilder` that takes an ID. For Custom Reports this is the ID of the specific custom | ||
report and for A/B Testing this is the ID of the experiment. | ||
|
||
### Step two: implement `getRecordMetadata()` | ||
|
||
Once you know what queries the `RecordBuilder`s you are going to create will execute, you can start coding. | ||
The first thing to do is implement the `getRecordMetadata()` method. | ||
|
||
All this method does is return a list of `Record` entries describing the records the builder will create: | ||
|
||
``` | ||
use Piwik\ArchiveProcessor; | ||
use Piwik\ArchiveProcessor\Record; | ||
|
||
public function getRecordMetadata(ArchiveProcessor $archiveProcessor): array | ||
{ | ||
    return [ | ||
        Record::make(Record::TYPE_BLOB, 'MyPlugin_myRecord'), | ||
        Record::make(Record::TYPE_NUMERIC, 'MyPlugin_myMetric'), | ||
        // ... | ||
    ]; | ||
} | ||
``` | ||
|
||
The above is a typical example of how this method would be implemented, but it doesn't have to just be a hard-coded array. | ||
You can use the `ArchiveProcessor` to get the current site/period/segment or fetch system settings or measurable | ||
settings and vary the result based on that information. The only requirement is that every `Record` returned matches | ||
what can be returned by the `aggregate()` method, which we'll look at next. | ||
|
||
### Step three: implement `aggregate()` | ||
|
||
The next step is to implement your actual log aggregation logic in the `aggregate()` method. This method accepts | ||
an `ArchiveProcessor` and returns an array mapping record names to the record values to insert. Record values are | ||
either numeric metric values or `DataTable` instances, which get serialized and inserted as blobs. | ||
|
||
As for how these values are created, there is no single prescribed way to do log aggregation. | ||
|
||
The current pattern in Matomo is to use the core `LogAggregator` class to query log data and loop through the result. | ||
If your plugin provides its own additional log tables, then the pattern is to define your own `Aggregator` classes | ||
to build and execute log aggregation SQL queries, and use those classes in your `RecordBuilder`s. | ||
|
||
An example of this might look like: | ||
|
||
``` | ||
use Piwik\ArchiveProcessor; | ||
use Piwik\DataTable; | ||
use Piwik\Metrics; | ||
|
||
public function aggregate(ArchiveProcessor $archiveProcessor): array | ||
{ | ||
    $logAggregator = $archiveProcessor->getLogAggregator(); | ||
|
||
    $report = new DataTable(); | ||
|
||
    // one result row per browser name, with the standard visit metrics as columns | ||
    $query = $logAggregator->queryVisitsByDimension(['label' => 'config_browser_name']); | ||
    while ($row = $query->fetch()) { | ||
        $columns = [ | ||
            Metrics::INDEX_NB_UNIQ_VISITORS => $row[Metrics::INDEX_NB_UNIQ_VISITORS], | ||
            Metrics::INDEX_NB_VISITS => $row[Metrics::INDEX_NB_VISITS], | ||
            Metrics::INDEX_NB_ACTIONS => $row[Metrics::INDEX_NB_ACTIONS], | ||
            Metrics::INDEX_NB_USERS => $row[Metrics::INDEX_NB_USERS], | ||
            Metrics::INDEX_MAX_ACTIONS => $row[Metrics::INDEX_MAX_ACTIONS], | ||
            Metrics::INDEX_SUM_VISIT_LENGTH => $row[Metrics::INDEX_SUM_VISIT_LENGTH], | ||
            Metrics::INDEX_BOUNCE_COUNT => $row[Metrics::INDEX_BOUNCE_COUNT], | ||
            Metrics::INDEX_NB_VISITS_CONVERTED => $row[Metrics::INDEX_NB_VISITS_CONVERTED], | ||
        ]; | ||
|
||
        // add the metrics to the DataTable row with this label | ||
        $report->sumRowWithLabel($row['label'] ?? '', $columns); | ||
    } | ||
|
||
    return [ | ||
        'MyPlugin_myRecord' => $report, | ||
    ]; | ||
} | ||
``` | ||
|
||
This example queries the `log_visit` table, grouping by the `config_browser_name` column and aggregating visit metrics. | ||
Then, for each row of that query, it adds the metrics to a `DataTable` which is eventually returned. | ||
|
||
Most `aggregate()` methods will be more complicated than this, but hopefully it provides you with a general understanding | ||
of how they should work. We recommend looking at existing `RecordBuilder`s in Matomo as well to see what is possible. | ||
|
||
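The loop in the example above can also feed more than one record from a single pass over the query result, which is the bundling described in step one. The sketch below shows the idea with plain PHP only: the input array stands in for the query cursor, a nested array stands in for the `DataTable`, and `aggregateRows()` is a hypothetical helper, not a Matomo function. The record names are the ones used in the `getRecordMetadata()` example.

```php
<?php
// Hypothetical stand-in: $rows plays the role of the query cursor and a plain
// array plays the role of the DataTable. Not Matomo code.

function aggregateRows(iterable $rows): array
{
    $report = [];
    $totalVisits = 0;

    foreach ($rows as $row) {
        // rows without a label are grouped under the empty label
        $label = $row['label'] ?? '';

        $report[$label]['nb_visits'] = ($report[$label]['nb_visits'] ?? 0) + $row['nb_visits'];
        $totalVisits += $row['nb_visits'];
    }

    // one table-like record and one numeric record, keyed by record name
    return [
        'MyPlugin_myRecord' => $report,
        'MyPlugin_myMetric' => $totalVisits,
    ];
}

$records = aggregateRows([
    ['label' => 'Firefox', 'nb_visits' => 10],
    ['label' => 'Chrome', 'nb_visits' => 4],
    ['nb_visits' => 2], // no label: falls back to ''
]);
```

Both records come out of the same loop, so the expensive query only has to run once.
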
### Step four: decide whether you need to set custom row limits or aggregation operations | ||
|
||
At this point, the hard parts are over. The last two steps are just finishing touches. | ||
|
||
By default, Matomo does not limit the data that is inserted into archive tables. For reports with a fixed or bounded number | ||
of rows, like `VisitorInterest.getVisitsByVisitCount` and `UserCountry.getCountry`, this is acceptable. But for reports | ||
whose number of rows grows with the tracked data, it's good practice to make sure the number of rows is capped. | ||
|
||
To set a limit, set the `maxRowsInTable` and `maxRowsInSubtable` properties in the constructor of your `RecordBuilder`. | ||
This can be hard-coded or it can come from configuration: | ||
|
||
``` | ||
use Piwik\ArchiveProcessor\RecordBuilder; | ||
use Piwik\Config; | ||
use Piwik\Metrics; | ||
|
||
class MyRecordBuilder extends RecordBuilder | ||
{ | ||
    public function __construct() | ||
    { | ||
        parent::__construct(); | ||
        $this->maxRowsInTable = (int)Config::getInstance()->MyPlugin['datatable_archiving_maximum_rows']; | ||
        $this->maxRowsInSubtable = (int)Config::getInstance()->MyPlugin['datatable_archiving_maximum_rows_subtable']; | ||
|
||
        // we want to sort by the most important metric in our reports before we cut off rows | ||
        $this->columnToSortByBeforeTruncation = Metrics::INDEX_NB_VISITS; | ||
    } | ||
} | ||
``` | ||
|
||
If you don't know what to use, you can set both values to `Config::getInstance()->General['datatable_archiving_maximum_rows_standard']`. | ||
|
||
Also note we set `columnToSortByBeforeTruncation` to make sure the rows with the fewest visits are the ones that get removed. | ||
|
||
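To show why the sort column matters, here is a minimal sketch of sort-then-truncate on a plain PHP array. It is not Matomo's truncation code. Two things in it are assumptions made for the example: the removed rows are folded into a single summary row labelled `Others`, and that summary row counts toward the limit. `truncateRows()` is a hypothetical helper.

```php
<?php
// Illustration only: plain arrays, not Matomo's truncation filter.
// Assumption: removed rows are folded into one summary row (labelled 'Others' here).

function truncateRows(array $rows, int $maxRows, string $sortColumn): array
{
    if (count($rows) <= $maxRows) {
        return $rows;
    }

    // Sort descending by the chosen column so the least important rows end up last.
    uasort($rows, fn(array $a, array $b): int => $b[$sortColumn] <=> $a[$sortColumn]);

    // Keep the top rows and leave one slot for the summary row.
    $kept = array_slice($rows, 0, $maxRows - 1, true);
    $removed = array_slice($rows, $maxRows - 1, null, true);

    $others = [];
    foreach ($removed as $metrics) {
        foreach ($metrics as $name => $value) {
            $others[$name] = ($others[$name] ?? 0) + $value;
        }
    }
    $kept['Others'] = $others;

    return $kept;
}

$rows = [
    'A' => ['nb_visits' => 50],
    'B' => ['nb_visits' => 5],
    'C' => ['nb_visits' => 20],
    'D' => ['nb_visits' => 1],
];

$truncated = truncateRows($rows, 3, 'nb_visits');
```

Without the sort, which rows survive would depend on the order in which they were inserted rather than on their importance.
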
Additionally, if your plugin provides metrics that should be aggregated together with an operation other than `sum`, | ||
you will need to set the `$columnAggregationOps` property: | ||
|
||
``` | ||
class MyRecordBuilder extends RecordBuilder | ||
{ | ||
    public function __construct() | ||
    { | ||
        parent::__construct(); | ||
|
||
        // ... | ||
|
||
        $this->columnAggregationOps = [ | ||
            'my_max_metric' => 'max', | ||
            'my_min_metric' => 'min', | ||
            'my_other_metric' => function ($thisValue, $otherValue, $thisRow, $otherRow) { | ||
                // custom aggregation logic here | ||
            }, | ||
        ]; | ||
    } | ||
} | ||
``` | ||
|
||
Note that each of these settings can also be overridden for specific records by setting the relevant property | ||
on `Record` instances in your `getRecordMetadata()` method. | ||
|
||
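The following sketch models what per-column aggregation operations do when two rows with the same label are merged. It is a plain-array illustration, not Matomo's row-merging code: `mergeRows()` is a hypothetical helper, rows are arrays instead of row objects, and the weighted-average closure is only one example of custom logic. The closure parameters follow the signature shown in the `columnAggregationOps` example.

```php
<?php
// Illustration only: a plain-array model of merging two rows, not Matomo's Row class.

function mergeRows(array $thisRow, array $otherRow, array $ops): array
{
    $merged = $thisRow;
    foreach ($otherRow as $column => $otherValue) {
        if (!array_key_exists($column, $thisRow)) {
            $merged[$column] = $otherValue;
            continue;
        }

        $thisValue = $thisRow[$column];
        $op = $ops[$column] ?? 'sum'; // columns without an entry are summed

        if ($op instanceof Closure) {
            $merged[$column] = $op($thisValue, $otherValue, $thisRow, $otherRow);
        } elseif ($op === 'max') {
            $merged[$column] = max($thisValue, $otherValue);
        } elseif ($op === 'min') {
            $merged[$column] = min($thisValue, $otherValue);
        } else {
            $merged[$column] = $thisValue + $otherValue;
        }
    }
    return $merged;
}

$ops = [
    'my_max_metric' => 'max',
    'my_min_metric' => 'min',
    // hypothetical custom operation: average weighted by each row's visit count
    'my_other_metric' => function ($thisValue, $otherValue, $thisRow, $otherRow) {
        $visits = $thisRow['nb_visits'] + $otherRow['nb_visits'];
        return ($thisValue * $thisRow['nb_visits'] + $otherValue * $otherRow['nb_visits']) / $visits;
    },
];

$dayOne = ['nb_visits' => 10, 'my_max_metric' => 7, 'my_min_metric' => 2, 'my_other_metric' => 4.0];
$dayTwo = ['nb_visits' => 30, 'my_max_metric' => 5, 'my_min_metric' => 1, 'my_other_metric' => 8.0];

$merged = mergeRows($dayOne, $dayTwo, $ops);
```

Summing `my_other_metric` here would give 12.0, which is meaningless for an average. That is the kind of metric that needs its own operation.
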
### Step five: if your RecordBuilder is parameterized, implement the relevant event | ||
|
||
If your `RecordBuilder` is not parameterized then there's nothing else to do. You're done and Matomo will detect and use it. | ||
|
||
If it is parameterized, then there's still one thing left to do. Matomo will not be able to automatically create a `RecordBuilder` | ||
that takes parameters, so it must be added manually in the `Archiver.addRecordBuilders` event like so: | ||
|
||
``` | ||
use Piwik\Container\StaticContainer; | ||
|
||
class MyPlugin extends \Piwik\Plugin | ||
{ | ||
    public function registerEvents() | ||
    { | ||
        $hooks = [ | ||
            'Archiver.addRecordBuilders' => 'addRecordBuilders', | ||
        ]; | ||
        return $hooks; | ||
    } | ||
|
||
    public function addRecordBuilders(array &$recordBuilders): void | ||
    { | ||
        $idSite = \Piwik\Request::fromRequest()->getIntegerParameter('idSite', 0); | ||
        if (!$idSite) { | ||
            return; | ||
        } | ||
|
||
        // one RecordBuilder per entity of the current site | ||
        $entities = StaticContainer::get(MyEntityDao::class)->getAllEntitiesForSite($idSite); | ||
        foreach ($entities as $entity) { | ||
            $recordBuilders[] = new MyRecordBuilder($entity); | ||
        } | ||
    } | ||
} | ||
``` | ||
|
||
Here we create a `RecordBuilder` instance for every entity our plugin manages. | ||
|
||
--- | ||
|
||
And that's it, your `RecordBuilder` is done. | ||
|
||
## Advanced | ||
|
||
### Overriding non-day period aggregation | ||
|
||
Archiving for non-day periods is handled by the `buildForNonDayPeriod()` method, which | ||
will use record metadata to query and aggregate records for the requested period's subperiods. | ||
|
||
Normally, when creating a `RecordBuilder`, you will not need to interact with it. But, in | ||
some rare cases, the default behavior of aggregating subperiods will not be enough. | ||
|
||
In this case, it is perfectly acceptable to override the `buildForNonDayPeriod()` method | ||
and implement your own logic. | ||
|
||
If doing so, keep the following in mind: | ||
|
||
* when querying for records of subperiods, do not fetch all of them into memory at once. | ||
Record data can take up a significant amount of memory, and loading all of it at once | ||
can cause out-of-memory errors in the archiving process. Instead, use a method like | ||
`Archive::querySingleBlob()`, which uses a cursor. | ||
|
||
* insert blob records via the `RecordBuilder::insertBlobRecord()` method. For numeric records, | ||
use `ArchiveProcessor::insertNumericRecords()`. |
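
The memory advice above comes down to processing one subperiod record at a time and keeping only the running result. The sketch below shows that pattern with a PHP generator. It uses no Matomo classes: `fetchBlobsOneByOne()` is a hypothetical stand-in for a cursor-style query, and JSON strings stand in for serialized blob records.

```php
<?php
// Illustration only: generic PHP, no Matomo classes. fetchBlobsOneByOne() is a
// hypothetical stand-in for a cursor-style query over subperiod records.

function fetchBlobsOneByOne(array $subperiodBlobs): Generator
{
    // A real cursor would read each blob from the database on demand;
    // here the source array only simulates that.
    foreach ($subperiodBlobs as $period => $blob) {
        yield $period => $blob;
    }
}

function aggregateIncrementally(iterable $blobs): array
{
    $total = [];
    foreach ($blobs as $blob) {
        // only one decoded record is alive at a time
        $rows = json_decode($blob, true);
        foreach ($rows as $label => $value) {
            $total[$label] = ($total[$label] ?? 0) + $value;
        }
    }
    return $total;
}

$stored = [
    '2024-01-01' => '{"Firefox":10,"Chrome":4}',
    '2024-01-02' => '{"Firefox":3}',
];

$total = aggregateIncrementally(fetchBlobsOneByOne($stored));
```

Memory use then depends on the size of the largest single record plus the running result, not on the number of subperiods.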
Is this still unstable or is this now the default and we can remove this warning?
that is up to the core team. feel free to change it yourself when your team decides the answer to that.