UrlId
This class identifies external URLs. It is a Uid whose type is UidTypes.URL. Instead of the timestamp and fine time, it stores a domain ID (2 bytes) and a file ID within that domain (4 bytes). We can therefore store up to 2^16 ≈ 66 k domains and, for each domain, up to 2^32 ≈ 4.3 G files.
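To make the field sizes concrete, here is a minimal sketch of packing a 2-byte domain ID and a 4-byte file ID into a long. The bit positions are an assumption for illustration only: the text does not say where the Uid keeps its type or how the remaining bits are used, so `UrlIdLayout`, `pack`, `domainId`, and `fileId` are hypothetical names, not the actual class.

```java
final class UrlIdLayout {
    // Assumed layout (not confirmed by the source):
    //   bits 32..47 = domain ID, bits 0..31 = file ID.
    // The top 16 bits are left untouched here; the real Uid presumably
    // uses part of them for the type (UidTypes.URL).
    static long pack(short domainId, int fileId) {
        // Mask before widening so negative shorts/ints do not sign-extend.
        return ((domainId & 0xFFFFL) << 32) | (fileId & 0xFFFFFFFFL);
    }

    // Domain ID as an unsigned value in 0..65535.
    static int domainId(long urlId) {
        return (int) ((urlId >>> 32) & 0xFFFF);
    }

    // File ID as an unsigned value in 0..4294967295.
    static long fileId(long urlId) {
        return urlId & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long id = pack((short) 0xFFFF, 0xFFFFFFFF); // largest domain and file ID
        System.out.println(domainId(id)); // 65535
        System.out.println(fileId(id));   // 4294967295
    }
}
```

Because Java has no unsigned short or int, the accessors return wider types so that the full 2^16 and 2^32 ranges stay non-negative.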
Consider the following URL: http://sub.example.com/some/path?and=variables#
- The domain is "example.com"
- The file is "some/path?and=variables#"
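The split shown above can be sketched as follows. This is a simplification under stated assumptions: the input is an absolute URL with a scheme and no port or user info, and the domain is taken to be the last two labels of the host, which is wrong for suffixes such as co.uk. `UrlSplitSketch` and `split` are hypothetical names.

```java
final class UrlSplitSketch {
    /** Returns {domain, file} for an absolute URL such as http://host/path. */
    static String[] split(String url) {
        // Assumes the URL contains "://"; everything up to the next '/' is the host.
        int hostStart = url.indexOf("://") + 3;
        int slash = url.indexOf('/', hostStart);
        String host = slash < 0 ? url.substring(hostStart) : url.substring(hostStart, slash);
        // The file keeps the query string and fragment, as in the example above.
        String file = slash < 0 ? "" : url.substring(slash + 1);

        // Naive assumption: the domain is the last two dot-separated labels.
        String[] labels = host.split("\\.");
        String domain = labels.length <= 2
                ? host
                : labels[labels.length - 2] + "." + labels[labels.length - 1];
        return new String[] {domain, file};
    }

    public static void main(String[] args) {
        String[] parts = split("http://sub.example.com/some/path?and=variables#");
        System.out.println(parts[0]); // example.com
        System.out.println(parts[1]); // some/path?and=variables#
    }
}
```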
The mapping between UrlIds (long) and the actual URLs (String) is stored in the UrlStore. Additionally, the largest file ID for each domain is stored; it is used to generate UrlIds for already known domains.
Key | Value |
---|---|
UrlId (long) | URL(string) |
domain ID (short) | Max file ID (int) |
"MaxDomainID" | Max domain ID (short) |
To allow quick traversal over all UrlIds of the same domain, we map all subdomains to the same domain ID. To avoid conflicts, the following algorithm is applied when creating a new ID for an already known domain:
1. Generate the domain ID, ignoring the subdomain.
2. Generate the file ID.
3. Check whether a UrlId with the calculated domain ID and file ID is already stored.
   - If not, store the UrlId and the complete URL in the storage.
   - If so, compare the stored URL with the new one (they differ if the new one has another subdomain).
     1. If the URLs are the same, we are done (the URL is already stored).
     2. If the URLs differ, increment the file ID by 3, then go back to step 3. Because 3 is odd, it shares no factor with 2^32, so stepping by 3 (modulo 2^32) reaches every file ID before repeating; a free slot is always found if one exists.
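The collision handling above can be sketched with an in-memory map standing in for the UrlStore. This is only an illustration: how the domain ID and the initial file ID are generated is not shown in the text, so they are passed in as parameters, and `UrlStoreSketch`, `key`, and `store` are hypothetical names. The key layout (domain ID above the 32-bit file ID) is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class UrlStoreSketch {
    // Stand-in for the UrlId -> URL part of the UrlStore.
    private final Map<Long, String> urlById = new HashMap<>();

    // Assumed key layout: domain ID in the upper bits, file ID in the low 32 bits.
    static long key(int domainId, int fileId) {
        return ((long) domainId << 32) | (fileId & 0xFFFFFFFFL);
    }

    /** Stores the URL (or finds it) and returns the key it lives under. */
    long store(int domainId, int startFileId, String url) {
        int fileId = startFileId;
        // Note: this loop does not terminate if all 2^32 slots of the
        // domain hold other URLs; a real implementation would need a bound.
        while (true) {
            long id = key(domainId, fileId);
            String existing = urlById.get(id);
            if (existing == null) {          // free slot: store and finish
                urlById.put(id, url);
                return id;
            }
            if (existing.equals(url)) {      // same URL: already stored
                return id;
            }
            fileId += 3;                     // int overflow wraps modulo 2^32
        }
    }
}
```

Two URLs that differ only in their subdomain and start from the same file ID end up three slots apart, while storing the same URL twice returns the same key. Relying on Java's int overflow gives the modulo 2^32 wrap-around without any extra arithmetic.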