
Community Review of Filedrive Allocator #200

Open · Filedrive-ops opened this issue on Oct 24, 2024 · 8 comments

Labels: Diligence Audit in Process, Refresh

@Filedrive-ops

filecoin-project/notary-governance#1030
https://github.com/filedrive-team/Filecoin-Plus-Pathway/issues

After finding that the DDO model isn't the primary reason for the low retrieval rate, we have started encouraging and supporting our clients to roll back from the DDO to a non-DDO ordering model.

The potential demand from clients exceeds 20 PiB.

Our company's major business is data services. One of our free tools, go-graphsplit, which splits large datasets into graph slices suitable for making deals on the Filecoin network, is very popular in the market. Based on feedback from our clients, we plan to change how piece information is stored, moving from separate files to a database. This will help our FIL+ clients manage pieces more easily and efficiently.
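
As a rough illustration only (not the actual design), a piece-information table might look like the Go/SQLite sketch below; the column set is an assumption based on the manifest fields go-graphsplit produces (graph slice name, payload CID, piece CID, inner file structure).

```go
// Illustrative sketch only: one possible schema for moving go-graphsplit
// piece information from per-dataset manifest files into a database.
// Column names are assumptions based on the manifest fields described
// in this thread, not the final design.
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // driver choice assumed for the sketch
)

func main() {
	db, err := sql.Open("sqlite3", "pieces.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// One row per file inside a graph slice produced by go-graphsplit.
	_, err = db.Exec(`CREATE TABLE IF NOT EXISTS pieces (
		slice_name  TEXT NOT NULL,
		payload_cid TEXT NOT NULL,
		piece_cid   TEXT NOT NULL,
		inner_path  TEXT NOT NULL,
		PRIMARY KEY (piece_cid, inner_path)
	)`)
	if err != nil {
		log.Fatal(err)
	}

	// Example insert; in practice rows would be imported from manifest.csv.
	_, err = db.Exec(
		`INSERT OR IGNORE INTO pieces (slice_name, payload_cid, piece_cid, inner_path)
		 VALUES (?, ?, ?, ?)`,
		"slice-0001.car", "bafy...payload", "baga...piece", "dataset/dir/file.bin",
	)
	if err != nil {
		log.Fatal(err)
	}
}
```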

@Kevin-FF-USA self-assigned this on Oct 24, 2024
@Kevin-FF-USA added the "Refresh" and "Awaiting Governance/Watchdog Comment" labels on Oct 24, 2024
@filecoin-watchdog (Collaborator)

@Filedrive-ops
Allocator Application
Compliance Report

The allocator did not ask many additional questions about how the data was prepared. It would be worth streamlining the process by adding such questions; this proposal might be a good reference.

4.95 PiB granted to clients:

| Client Name | DC |
| --- | --- |
| AsiaTelevision Limited | 3.95 PiB |
| WuhanNalai Film and Television Culture Media Co., Ltd. | 1 PiB |

Example 1 - AsiaTelevision Limited #5
This client initially requested 2 PiB. According to the allocator's rules, the client should have stored 3 replicas. During the talks with the client, the allocator agreed and granted another 2 PiB (totalling 4 PiB); however, according to the report, there are still only two replicas.
I am unsure about the duplicates created. The allocator and the client encountered some problems with DDO and the retrieval rate, but please provide details about where the duplicates in the report came from.
Overall, the allocator took a thorough approach, asked many questions, checked reports, and declared clear rules that were adhered to.
The client had problems with SPs, but the allocator tried to help them.
KYC/KYB was performed outside the system. Information about this can be found in the comments.
The allocator performs SP checks, and the client remembers to send the evidence to the allocator.
Discussions were kept in a thread so they could be reviewed easily.
Retrieval for 5 of 7 SPs is unavailable.

Example 2 - WuhanNalai Film and Television Culture Media Co., Ltd. #3
This client declared 4 replicas; however, according to the report, there are still only two replicas, even though all of the DC has been granted. The first replica has almost 900 TiB of data. How did that happen?

Overall, the allocator took a thorough approach, asked many questions, checked reports, and declared clear rules that were adhered to.

KYC/KYB was performed outside the system. Information about this can be found in the comments.
The allocator performs SP checks, and the client remembers to send the evidence to the allocator.
Discussions were kept in a thread so they could be reviewed easily.
All 5 SPs have retrieval at 0% or unavailable.

This dataset was stored multiple times already:
ipfsforcezuofu/ipfsforce-allocator#8
ipfsforcezuofu/ipfsforce-allocator#2
VenusOfficial/Pathway-VFDA#40
VenusOfficial/Pathway-VFDA#22
https://github.com/search?q=repo%3Afilecoin-project%2Ffilecoin-plus-large-datasets+wuhan+nalai&type=issues

I realise this is a private dataset; however, it would be good to clarify with the client why so many copies were essential to store.

@filecoin-watchdog (Collaborator)

@Filedrive-ops Hi there. A friendly reminder that the review above has been posted and is awaiting your response.

@Kevin-FF-USA added the "Awaiting Response from Allocator" label and removed the "Awaiting Governance/Watchdog Comment" label on Oct 29, 2024
@Filedrive-ops (Author)

@filecoin-watchdog

Thank you for your positive feedback on our practices, such as KYB/KYC and frequent communication with our clients. Here are some explanations for your questions:
Q1:
The allocator did not ask many additional questions about how the data was prepared. It would be worth streamlining the process by adding such questions; this proposal might be a good reference.
A1:
After consulting with clients, the common process for preparing the data is as follows:
1. The SP downloads the data from an IP address specified by the client.
2. The SP uses go-graphsplit to split the data into pieces. The tool creates a manifest.csv file that records the mapping between the graph slice name, payload CID, piece CID, and the inner file structure.
3. Based on the manifest.csv file shared by the SP, the client can validate or retrieve their stored files.
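
For illustration, here is a minimal Go sketch of step 3: indexing a manifest.csv by piece CID so the client can check which slices and files a given piece should contain. The header names used (slice_name, payload_cid, piece_cid, inner_path) are assumptions based on the description above, not go-graphsplit's documented column names.

```go
// Minimal sketch: index a manifest.csv by piece CID for client-side checks.
// Header names are assumed from the description above, not the tool's
// documented format.
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
)

func main() {
	f, err := os.Open("manifest.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	rows, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	if len(rows) < 2 {
		log.Fatal("manifest has no data rows")
	}

	// Map header name -> column index so the sketch tolerates reordered columns.
	col := map[string]int{}
	for i, name := range rows[0] {
		col[name] = i
	}
	for _, required := range []string{"slice_name", "payload_cid", "piece_cid", "inner_path"} {
		if _, ok := col[required]; !ok {
			log.Fatalf("missing expected column %q (assumed name)", required)
		}
	}

	// piece CID -> "slice: inner path" entries that piece should contain.
	byPiece := map[string][]string{}
	for _, r := range rows[1:] {
		piece := r[col["piece_cid"]]
		byPiece[piece] = append(byPiece[piece],
			fmt.Sprintf("%s: %s", r[col["slice_name"]], r[col["inner_path"]]))
	}

	for piece, entries := range byPiece {
		fmt.Printf("%s (%d files)\n", piece, len(entries))
		for _, e := range entries {
			fmt.Println("  ", e)
		}
	}
}
```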

Q2:
This client initially requested 2 PiB. According to the allocator's rules, the client should have stored 3 replicas. During the talks with the client, the allocator agreed and granted another 2 PiB (totalling 4 PiB); however, according to the report, there are still only two replicas.
A2:
After discussions with the client, we identified two major reasons for this situation. The first is that the client uses Filecoin as a backup for their archive files, and two backups appear to be sufficient to meet their SLA requirements. The second is that the client understands our intention to serve more clients with a limited DC quota.

Q3:
I am unsure about the duplicates created. The allocator and the client encountered some problems with DDO and the retrieval rate, but please provide details about where the duplicates in the report came from.
A3:
I'm not sure whether you're asking how DDO can fail to support retrieval while still generating duplicates. From my understanding, the retrieval issue is a common problem with the DDO ordering model, and Spark is working on a fix. However, this issue does not affect the ordering tool's ability to generate duplicates. For instance, Droplet can generate multiple copies of the same pieces when different SPs are passed as input parameters.
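
To illustrate the point about duplicates (with hypothetical types, not Droplet's actual API), the sketch below shows how an ordering tool that accepts several SPs for the same piece list naturally proposes one deal per SP per piece, so each piece CID then appears once per provider in the compliance report.

```go
// Hypothetical sketch, not Droplet's API: if the same piece list is proposed
// to several SPs, each SP seals its own copy, so the report shows each piece
// CID repeated once per provider.
package main

import "fmt"

type Piece struct {
	PieceCID string
	SizeGiB  int64
}

type Deal struct {
	Provider string
	Piece    Piece
}

// proposeToProviders fans the same pieces out to every provider in the list.
func proposeToProviders(pieces []Piece, providers []string) []Deal {
	var deals []Deal
	for _, sp := range providers {
		for _, p := range pieces {
			deals = append(deals, Deal{Provider: sp, Piece: p})
		}
	}
	return deals
}

func main() {
	pieces := []Piece{
		{PieceCID: "baga...01", SizeGiB: 32}, // placeholder piece CIDs
		{PieceCID: "baga...02", SizeGiB: 32},
	}
	providers := []string{"f03196399", "f03196401"} // SP IDs taken from this thread

	for _, d := range proposeToProviders(pieces, providers) {
		fmt.Printf("propose %s (%d GiB) to %s\n", d.Piece.PieceCID, d.Piece.SizeGiB, d.Provider)
	}
	// Each piece appears once per provider: intended replicas, but they show up
	// as "duplicates" per piece CID in the compliance report.
}
```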

Q4:
Retrieval for 5 of 7 SPs is unavailable.
A4:
We're collaborating with our clients to enhance the retrieval rate. Improved performance can be expected in the upcoming round of DC assignments. Below are examples of our communications with clients.

[screenshot: communications with clients]

Q5:
This client declared 4 replicas; however, according to the report, there are still only two replicas, even though all of the DC has been granted. The first replica has almost 900 TiB of data. How did that happen?
A5:
Similar to A2.

Q6:
All 5 SPs have retrieval at 0% or unavailable.
A6:
Similar to A4.

Q7:
This dataset was stored multiple times already:
ipfsforcezuofu/ipfsforce-allocator#8
ipfsforcezuofu/ipfsforce-allocator#2
VenusOfficial/Pathway-VFDA#40
VenusOfficial/Pathway-VFDA#22
https://github.com/search?q=repo%3Afilecoin-project%2Ffilecoin-plus-large-datasets+wuhan+nalai&type=issues
I realise this is a private dataset; however, it would be good to clarify with the client why so many copies were essential to store.

A7:
The client mentioned that the duplicate copies within the same allocator are due to technical reasons. Additionally, the DC accumulated from multiple allocators supports their requirement to handle large datasets within a short period.

@filecoin-watchdog (Collaborator)

@Filedrive-ops
A3:

I'm not sure whether you're asking how DDO can fail to support retrieval while still generating duplicates.

I'm referring to this report: https://check.allocator.tech/report/filedrive-team/Filecoin-Plus-Pathway/issues/5/1729858990260.md
The previous report was fine; no duplication occurred. The latest one did not have a good outcome: as you can see, 4 SPs have duplicated data, and 2 of them are around 30-40% duplicated. That's a lot.

@Filedrive-ops (Author)

@filecoin-watchdog
Thank you for your further explanation. After checking with the client, we learned that only 80T was assigned to each of the two SPs. However, the report indicates that a total of around 130T was sealed. Could you clarify the relationship between these two amounts? The client claims that their data should not be duplicated and has provided the sealing file for reference. Could you advise on how we should investigate this case further? Thank you.
The following is a screenshot of the communication with the client:
[screenshot: communication with client]

@Filedrive-ops (Author)

@filecoin-watchdog

The client has confirmed again that the 80T of data has not been sealed previously. We will rely on this assurance and enforce compliance through more frequent checks and monitoring in the upcoming DC assignment.

@filecoin-watchdog (Collaborator)

@Filedrive-ops, the newest report confirmed all of this and explains the duplication issue. However, yet another issue was revealed: 2 SPs have each sealed more than 20% of the total datacap.

⚠️ f03196399 has sealed 29.85% of total datacap.
⚠️ f03196401 has sealed 29.86% of total datacap.
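
For reference, the concentration check behind warnings like these can be approximated by the sketch below: compute each SP's share of the client's total sealed datacap and flag anything above the 20% mentioned above. The per-SP numbers are illustrative, chosen only to roughly reproduce the percentages in the warnings; the report's actual rule and data may differ.

```go
// Rough sketch of the per-SP concentration check implied by the warnings
// above. The threshold follows the "more than 20%" comment; the per-SP
// figures are illustrative, not taken from the actual report.
package main

import "fmt"

func main() {
	// Sealed datacap per SP, in TiB (illustrative figures).
	sealed := map[string]float64{
		"f03196399": 1479,
		"f03196401": 1479,
		"f0spA":     700, // hypothetical additional providers
		"f0spB":     700,
		"f0spC":     597,
	}

	var total float64
	for _, v := range sealed {
		total += v
	}

	const threshold = 0.20 // 20% of the client's total sealed datacap
	for sp, v := range sealed {
		if share := v / total; share > threshold {
			fmt.Printf("⚠️ %s has sealed %.2f%% of total datacap\n", sp, share*100)
		}
	}
}
```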

@Kevin-FF-USA added the "Diligence Audit in Process" label and removed the "Awaiting Response from Allocator" label on Oct 31, 2024
@Filedrive-ops (Author)

@filecoin-watchdog
Thank you for bringing this issue to our attention. To ensure compliance, we'll require the client to submit a detailed sealing plan before any future assignments. We can later use the compliance report to verify adherence to the plan.
