fix readme
slavaleleka authored Aug 5, 2024
2 parents 2173dd0 + d74496d, commit 52ae1ce
Showing 1 changed file (README.md) with 81 additions and 48 deletions.

This is a companies DB that we use in AdGuard Home and AdGuard DNS.
It is basically the [Whotracks.me](https://github.com/whotracksme/whotracks.me)
database converted to a simple JSON format with some additions from us.

There is also a file with company metadata that we use in AdGuard VPN.

- [Workflow](#workflow)
- [Naming of branches and commits](#naming-of-branches-and-commits)
- [Assignment of files](#assignment-of-files)
- [How to add new or rewrite whotracks.me data](#how-to-add-new-or-rewrite-whotracksme-data)
  - [How to add a new company or overwrite whotracks.me data](#how-to-add-a-new-company-or-overwrite-whotracksme-data)
  - [How to add a new tracker or overwrite whotracks.me data](#how-to-add-a-new-tracker-or-overwrite-whotracksme-data)
- [Tracker categories](#tracker-categories)
- [How to build trackers data](#how-to-build-trackers-data)
- [Company icons](#company-icons)
- [Policy](#policy)
- [Acknowledgements](#acknowledgements)

## Workflow

- Create a fork of the repository on GitHub.
- Create a branch from the latest main branch.
- Add a tracker.
- Create a Pull Request.

## Naming of branches and commits

- Branch name format: `fix/issueNumber_domain`

  ```text
  fix/34_example.info
  ```

- Commit message format: `Fix #issueNumber domain`

  ```text
  Fix #34 example.info
  ```
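
Both formats can be produced and checked with a few lines of Python. This is only an illustrative sketch: the helper names (`branch_name`, `commit_message`, `is_valid_branch`, `is_valid_commit`) are hypothetical and are not part of this repository's tooling.

```python
import re

# Patterns for the formats described above:
#   branch: fix/<issueNumber>_<domain>
#   commit: Fix #<issueNumber> <domain>
BRANCH_RE = re.compile(r"fix/(\d+)_(\S+)")
COMMIT_RE = re.compile(r"Fix #(\d+) (\S+)")


def branch_name(issue: int, domain: str) -> str:
    """Build a branch name such as fix/34_example.info."""
    return f"fix/{issue}_{domain}"


def commit_message(issue: int, domain: str) -> str:
    """Build a commit message such as 'Fix #34 example.info'."""
    return f"Fix #{issue} {domain}"


def is_valid_branch(name: str) -> bool:
    # fullmatch() requires the whole string to follow the pattern.
    return BRANCH_RE.fullmatch(name) is not None


def is_valid_commit(message: str) -> bool:
    return COMMIT_RE.fullmatch(message) is not None


print(branch_name(34, "example.info"))     # fix/34_example.info
print(commit_message(34, "example.info"))  # Fix #34 example.info
```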

## Assignment of files

The lists of trackers and companies are generated from the [whotracks.me] database.

**Trackers**:

- [dist/whotracksme.json] is a copy of [source/whotracksme.json].
- [dist/trackers.json] contains information about trackers, obtained by merging [source/trackers.json] with [dist/whotracksme.json].
- [source/whotracksme.json] contains information about trackers, fetched from whotracks.me.
- [source/trackers.json] contains information about trackers that overwrites or supplements [dist/whotracksme.json].

**Companies**:

- [dist/companies.json] contains information about companies,
obtained by merging the [source/whotracksme_companies.json] with [source/companies.json].
- [source/companies.json] contains information about companies,
which overwrites or supplements information in [source/whotracksme_companies.json].
- [source/whotracksme_companies.json] contains information about companies, fetched from whotracks.me.
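
The merge that produces [dist/companies.json] can be pictured as an overlay of one map on another. The sketch below assumes that both inputs are already loaded as maps keyed by companyId and that AdGuard fields replace whotracks.me fields one by one; the real build script may merge differently, and `merge_companies` and the sample values are hypothetical.

```python
import json


def merge_companies(whotracksme: dict, adguard: dict) -> dict:
    """Overlay source/companies.json entries on whotracks.me entries.

    Both arguments map companyId -> company object. Assumption: a field set
    in the AdGuard entry replaces the same field from whotracks.me, and
    fields it does not mention are kept.
    """
    # Copy every entry so that the inputs are not modified.
    merged = {cid: dict(entry) for cid, entry in whotracksme.items()}
    for cid, entry in adguard.items():
        merged[cid] = {**merged.get(cid, {}), **entry}
    return merged


whotracksme = {"companyincID": {"name": "Company inc.", "websiteUrl": None}}
adguard = {"companyincID": {"websiteUrl": "https://company.org/"}}
print(json.dumps(merge_companies(whotracksme, adguard), indent=4))
```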

**VPN Services**:

- [source/vpn_services.json] contains a list of "Services" that can be added to exclusions in AdGuard VPN apps.
This file is composed manually and not built from other sources.
New services should be added in alphabetical order.
- [dist/vpn_services.json] is a copy of [source/vpn_services.json] with an update time added automatically
  to each service that has been added or modified.
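
The two rules for the VPN services files (alphabetical order in the source file, an update time in the dist copy for added or modified services) could be sketched as follows. The file layout is an assumption: a JSON array of service objects with hypothetical `service_id` and `name` fields, plus a hypothetical `modified_time` field for the update time. Case-insensitive ordering is also an assumption.

```python
def out_of_order_names(services: list) -> list:
    """Return names that sort before the name right above them.

    Assumption: "alphabetical" means case-insensitive comparison.
    """
    return [
        cur["name"]
        for prev, cur in zip(services, services[1:])
        if cur["name"].casefold() < prev["name"].casefold()
    ]


def stamp_updates(source: list, previous_dist: list, now: str) -> list:
    """Copy source services, setting modified_time on new or changed ones."""
    previous_by_id = {s["service_id"]: s for s in previous_dist}
    result = []
    for service in source:
        entry = dict(service)
        old = previous_by_id.get(service["service_id"])
        if old is None:
            entry["modified_time"] = now  # newly added service
        else:
            # Compare without the timestamp, which only exists in dist.
            old_body = {k: v for k, v in old.items() if k != "modified_time"}
            if old_body != service:
                entry["modified_time"] = now  # modified service
            elif "modified_time" in old:
                entry["modified_time"] = old["modified_time"]  # unchanged
        result.append(entry)
    return result
```

In practice, `now` could be a value such as `datetime.now(timezone.utc).isoformat()`.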

## How to add new or rewrite whotracks.me data

If you need to add new data or to rewrite [whotracks.me] data:

- **company**: add it to [source/companies.json]
- **tracker**: add it to [source/trackers.json]

> **Warning**
>
> Add companies and tracker names in alphabetical order. Add tracker domains alphabetically **by value.**

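
The ordering rules in the warning can be checked mechanically. This Python sketch assumes that "alphabetical" means plain string (code point) order of the JSON keys, that companies are passed as a map keyed by companyId, and that the trackers file has the **trackers** and **trackerDomains** sections described below; `out_of_order` and `check_source_order` are hypothetical helpers.

```python
def out_of_order(values: list) -> list:
    """Return every value that sorts before the value right above it."""
    return [cur for prev, cur in zip(values, values[1:]) if cur < prev]


def check_source_order(companies: dict, trackers_file: dict) -> dict:
    return {
        # Company and tracker entries are ordered by their keys.
        "companies": out_of_order(list(companies)),
        "trackers": out_of_order(list(trackers_file["trackers"])),
        # Tracker domains are ordered by value (the tracker key).
        "trackerDomains": out_of_order(list(trackers_file["trackerDomains"].values())),
    }
```

Dictionaries returned by `json.load` keep the key order of the file (Python 3.7+), so the result reflects the file as written.
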
### How to add a new company or overwrite whotracks.me data

Company data is added to the [source/companies.json] file under a JSON key; that key is the **companyId**, which is referenced when adding trackers:

- **name**: the official name of the company, displayed in the filter log.
- **websiteUrl**: the address of the company website, also used to determine the company icon.
- **description**: a description of the company; it is not displayed anywhere.
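
A company entry is expected to carry the three fields listed above. The Python sketch below renders such an entry and reports missing fields; treating all three fields as required (with **null** allowed for unknown values) is an assumption, and the helper names and sample values are hypothetical.

```python
import json

# Field names documented above for entries in source/companies.json.
COMPANY_FIELDS = ("name", "websiteUrl", "description")


def missing_company_fields(entry: dict) -> list:
    """Return documented fields absent from entry (null values count as present)."""
    return [field for field in COMPANY_FIELDS if field not in entry]


def company_fragment(company_id: str, name: str, website_url, description) -> str:
    """Render one '"companyId": {...}' fragment; None is written as null."""
    body = {"name": name, "websiteUrl": website_url, "description": description}
    return f"{json.dumps(company_id)}: {json.dumps(body, indent=4)}"


print(company_fragment("companyincID", "Company inc.", "https://company.org/", None))
```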

```json
"companyincID": {
    …
}
```

### How to add a new tracker or overwrite whotracks.me data

Tracker data is added to the [source/trackers.json] file as a nested object inside the **trackers** section. The object's key is the **tracker name**, which is later referenced from the **trackerDomains** section:

- **name**: the name of the tracker.
- **categoryId**: the tracker category (see [Tracker categories](#tracker-categories)).
- **url**: the address of the company tracker.
- **companyId**: the company ID, taken from [dist/companies.json] or [source/companies.json].

```json
{
    "trackers": {
        "company_tracker_name": {
            "name": "Company inc. Analytics",
            "categoryId": 6,
            "url": "https://analytics.company.org/",
            "companyId": "companyincID"
        }
    }
}
```
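
Because **companyId** must point to an existing company, a quick consistency check can help before opening a Pull Request. The sketch assumes the companies are available as a map keyed by companyId (for example, the company entries from [dist/companies.json]) and that a tracker has the four fields listed above; `tracker_problems` is a hypothetical helper, not part of the build.

```python
# Field names documented above for entries in the "trackers" section.
TRACKER_FIELDS = ("name", "categoryId", "url", "companyId")


def tracker_problems(trackers: dict, companies: dict) -> list:
    """List missing fields and unknown companyId references."""
    problems = []
    for key, tracker in trackers.items():
        for field in TRACKER_FIELDS:
            if field not in tracker:
                problems.append(f"{key}: missing {field}")
        company_id = tracker.get("companyId")
        # A null companyId is allowed; any other value must be a known company.
        if company_id is not None and company_id not in companies:
            problems.append(f"{key}: unknown companyId {company_id}")
    return problems
```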

Add tracker domains to the **trackerDomains** section:

- **key**: the tracker domain.
- **value**: the **tracker name** (the tracker's key in the **trackers** section).

```json
{
    "trackerDomains": {
        "collect.company.org": "company_tracker_name"
    }
}
```
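
Every value in **trackerDomains** has to be a key of the **trackers** section. The first helper below checks that; the second shows one possible way to resolve a hostname through its parent domains. That suffix-matching rule is an assumption made for illustration, since this README does not describe how AdGuard products match domains, and both helper names are hypothetical.

```python
def dangling_domains(tracker_domains: dict, trackers: dict) -> list:
    """Return domains whose value is not a key of the trackers section."""
    return [domain for domain, key in tracker_domains.items() if key not in trackers]


def tracker_for_host(host: str, tracker_domains: dict):
    """Return the tracker key for host or its closest listed parent domain."""
    labels = host.lower().rstrip(".").split(".")
    # Try eu.collect.company.org, then collect.company.org, then company.org, ...
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in tracker_domains:
            return tracker_domains[candidate]
    return None
```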

> **Warning**
>
> If **a value does not exist**, enter **null**:
>
> ```json
> "url": null
> ```

## Tracker categories

| Id | Name | Purpose |
| --- | -------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| 0 | audio_video_player | Enables websites to publish, distribute, and optimize video and audio content |
| 1 | comments | Enables comments sections for articles and product reviews |
| …   | …                    | …                                                                                                                      |
| 13 | email | Includes webmail and email clients |
| 14 | consent | - |
| 15 | telemetry | - |
| 101 | mobile_analytics | Collects and analyzes data related to mobile app usage and performance |
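
A tracker's **categoryId** refers to the first column of this table. The sketch below maps IDs to names using only the rows visible in this copy of the table (the rows between IDs 1 and 13 are collapsed here and deliberately left out), so unlisted IDs fall back to `None`; `KNOWN_CATEGORIES` and `category_name` are hypothetical names.

```python
# Category IDs and names from the visible rows of the table above.
# Rows between IDs 1 and 13 are not reproduced here.
KNOWN_CATEGORIES = {
    0: "audio_video_player",
    1: "comments",
    13: "email",
    14: "consent",
    15: "telemetry",
    101: "mobile_analytics",  # mobile_analytics uses ID 101
}


def category_name(category_id: int):
    """Return the category name for an ID, or None if it is not listed here."""
    return KNOWN_CATEGORIES.get(category_id)


print(category_name(101))  # mobile_analytics
```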

## How to build trackers data

…

`yarn convert`

The result is:

- **dist/companies.json**: companies data JSON file.
  This file contains the companies list from whotracks.me merged with AdGuard companies from **source/companies.json**.
- **dist/trackers.json**: trackers data JSON file, combining data from two files:

  - **source/trackers.json**
  - **dist/whotracksme.json**

  An additional key, **"source": "AdGuard"**, is added to the information from AdGuard files.

- **dist/trackers.csv**: trackers data CSV file. This file is used by the ETL process of AdGuard DNS,
  so be very careful when changing its structure.

- **dist/whotracksme.json**: current **whotracks.me** trackers data JSON file, compiled from **trackerdb.sql**.

During the build process, warnings and errors are displayed; they should be fixed.
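
The merge step that produces **dist/trackers.json** can be sketched in Python as follows. It assumes both inputs use the **trackers** and **trackerDomains** sections described above, that AdGuard fields override whotracks.me fields for the same tracker key, and that the **"source": "AdGuard"** key is added to every tracker taken from **source/trackers.json**. `build_trackers` is a hypothetical name, and the actual build script may work differently.

```python
def build_trackers(whotracksme: dict, adguard: dict) -> dict:
    """Combine dist/whotracksme.json with source/trackers.json (sketch).

    Assumptions: both files have "trackers" and "trackerDomains" objects,
    and AdGuard fields override whotracks.me fields for the same key.
    """
    # Copy whotracks.me entries so that the input stays unchanged.
    trackers = {key: dict(t) for key, t in whotracksme.get("trackers", {}).items()}
    for key, tracker in adguard.get("trackers", {}).items():
        # Tag every tracker that comes from AdGuard's own file.
        trackers[key] = {**trackers.get(key, {}), **tracker, "source": "AdGuard"}
    domains = {
        **whotracksme.get("trackerDomains", {}),
        **adguard.get("trackerDomains", {}),
    }
    return {"trackers": trackers, "trackerDomains": domains}
```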

…

Currently, we are avoiding adding personal websites/blogs or services that do no…

## Acknowledgements

We would like to thank the team at **whotracks.me** for their work.
Initially, our database was built on top of the **whotracks.me** database, using their extensive data collection.
However, we would like to emphasize that our current database is now independent
and updated separately from **whotracks.me**.

[dist/companies.json]: https://raw.githubusercontent.com/AdguardTeam/companiesdb/main/dist/companies.json
[dist/trackers.json]: https://raw.githubusercontent.com/AdguardTeam/companiesdb/main/dist/trackers.json
[dist/vpn_services.json]: https://raw.githubusercontent.com/AdguardTeam/companiesdb/main/dist/vpn_services.json
[dist/whotracksme.json]: https://raw.githubusercontent.com/AdguardTeam/companiesdb/main/dist/whotracksme.json
[source/companies.json]: https://raw.githubusercontent.com/AdguardTeam/companiesdb/main/source/companies.json
[source/trackers.json]: https://raw.githubusercontent.com/AdguardTeam/companiesdb/main/source/trackers.json
[source/vpn_services.json]: https://raw.githubusercontent.com/AdguardTeam/companiesdb/main/source/vpn_services.json
[source/whotracksme.json]: https://raw.githubusercontent.com/AdguardTeam/companiesdb/main/source/whotracksme.json
[source/whotracksme_companies.json]: https://raw.githubusercontent.com/AdguardTeam/companiesdb/main/source/whotracksme_companies.json
[whotracks.me]: http://whotracks.me
