Can't see shared data from hypercored #90

Closed
at88mph opened this issue Oct 16, 2017 · 9 comments

@at88mph commented Oct 16, 2017

I'm using dat 13.9.0 with hypercored on Debian 9.2 and OS X.

If I run dat create in my /data directory, followed by dat share, I can see the data on my local network. However, if instead of dat share I run hypercored, the same dat clone ... command that worked with dat share no longer works.

I have questions about how it's supposed to work:

  • Am I supposed to clone the same ID, or use the Archiver key as printed out after starting hypercored?
  • What relevance does the Swarm port have? Am I supposed to connect to it somehow?
  • For dats to work, do we need bi-directional access? Meaning, if I have a share on a public machine, can I only clone it from another public machine?

Thanks!
Dustin

@joehand (Collaborator) commented Oct 17, 2017

hypercored functions mostly as a backup or server mirroring tool; it can't create dats or read existing dats. You always have to create a dat first and then copy it to hypercored. So the flow would be (a rough shell sketch follows the list):

  • dat share /data -> prints a key, let's call it dat://my-dat-key
  • add dat://my-dat-key to hypercored ./feeds file (while dat share is still running)
  • hypercored will duplicate data to the ./archiver directory
  • stop running dat share once uploading has finished (not very easy for this use case)
  • hypercored will make dat://my-dat-key available over the network, without needing to run dat share again
  • to update your "backup", run dat share again in /data until files are backed up by hypercored
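
A minimal shell sketch of that flow, assuming hypercored is started from a directory containing the feeds file and that feeds takes one key per line:

# on the machine with the source data
cd /data
dat share                       # prints a key, e.g. dat://my-dat-key

# on the hypercored host (can be the same machine), while dat share is still running
echo "dat://my-dat-key" >> feeds
hypercored                      # copies the data into ./archiver and keeps serving it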

Hopefully I am understanding your question correctly! Using Dat and hypercored on the same machine is not great right now; we'd like to build this into the main CLI, so it is great to see this use case. We mostly use hypercored on servers where we want full backups and dats that are always available, whereas dat share only makes a dat available while the command is running.

On your other questions:

  • Am I supposed to clone the same ID, or use the Archiver key as printed out after starting hypercored?

You should use the same ID; the archiver key is a hypercore feed that can only be used by other hypercored instances or related tools.

  • What relevance does the Swarm port have? Am I supposed to connect to it somehow?

The ports shouldn't matter; Dat should be able to connect to the peers, since they advertise whatever port they are available on.

  • For dats to work, do we need bi-directional access? Meaning, in order to have a share on a public machine, I can only clone to another public machine?

Not necessarily. We have some tools to get around firewalls, using hole-punching. But Dat works best if at least one of the machines is public. Bi-directional hole punching is more hit or miss depending on the network.

You can run dat doctor to see if you can connect to our public peer and then use the key printed to see if your two computers can connect to each other.
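
For reference, a sketch of that two-machine check (the exact output may differ; the key is a placeholder printed by dat doctor itself):

# machine A: test the public peer, then note the key it prints
dat doctor

# machine B: pass machine A's key to test a direct connection between the two
dat doctor <key-printed-on-machine-A>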

@at88mph (Author) commented Oct 17, 2017

That clarifies it, thank you for the detailed reply @joehand.

I have a Dat share (not hypercored, just dat share) on an OpenStack VM with a public IP, which runs happily. My desktop cannot see it with dat doctor or dat clone, but other public-IP machines can, which leads me to believe that the peers need to communicate back and forth and both need to be able to see each other.

My full use case will look like this:

  • Create a Docker image for a Dat share (a rough Dockerfile sketch follows after this list). Since Docker container processes run in the foreground, a simple dat share should suffice. For clarification, is the dat create only necessary when you want the dat.json created for its metadata?
  • Run a Docker container for each share.
  • Enable replication amongst the shares.
  • Have the ability to download files from the closest site, as we will be delivering data from multiple sites globally. I like that the BitTorrent libraries are in use here. Is the system smart enough to detect which file I'm looking for and direct me to the closest one?
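
For the first item, a hypothetical Dockerfile along these lines should be enough (the base image and dat version are assumptions, not something from this thread):

# hypothetical Dockerfile for a share-only container
FROM node:8
RUN npm install -g dat@13
VOLUME /data
WORKDIR /data
# dat share stays in the foreground, which is what Docker expects of the main process
CMD ["dat", "share"]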

Many thanks for the reply.

@joehand (Collaborator) commented Oct 17, 2017

My desktop cannot see it with dat doctor or dat clone, but other public-IP machines can, which leads me to believe that the peers need to communicate back and forth and both need to be able to see each other.

Can you connect to the public peer in the first part of the test (from the desktop)? It should look like this:

❯ dat doctor
[info] Testing connection to public peer
[info] UTP ONLY - success!
[info] TCP ONLY - success!

If you are running inside Docker on your desktop, that is likely the issue. We haven't been able to figure out how to get around Docker's NAT without switching to host networking mode.
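
If host networking is an option, a run command along these lines sidesteps the NAT issue (the image name and volume path are placeholders):

# host networking shares the host's network stack, so Docker's NAT is not in the way
docker run --net=host -v /data:/data my-dat-share-image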

For clarification, is the dat create only necessary when you want the dat.json created for its metadata?

Yep, exactly.
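
For reference, the dat.json that dat create writes is just descriptive metadata, roughly along these lines (field values are illustrative):

{
  "title": "My dataset",
  "description": "Filled in from the dat create prompts"
}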

Have the ability to download files from the closest site, as we will be delivering data from multiple sites globally. I like that the BitTorrent libraries are in use here. Is the system smart enough to detect which file I'm looking for and direct me to the closest one?

There is no network prioritization yet, but it should download more data from whichever peer is fastest, just by the nature of how the requests work.

@at88mph (Author) commented Oct 18, 2017

Thanks, @joehand. I've gone down the dat doctor route, and I'm convinced that it's our lousy network here. Thank you for explaining how hypercored works, too.

I've been running Docker with --net host without issue on hosts outside of our network. Running with --net host isn't a big issue for us.

Also, because we have a requirement to serve files from the fastest peer, how is that set up? If there are multiple dats registered in the same place and sharing the same path, will the download be distributed amongst them?

Thank you again! You've been extremely helpful.

@joehand (Collaborator) commented Oct 18, 2017

I've gone down the dat doctor route, and I'm convinced that it's our lousy network here.

Not ideal, but glad you figured it out. Feel free to run p2p-test; it may give us some more data on your network:

npm install -g p2p-test
p2p-test [optional-short-message-describing-the-network]

Also, because we have a requirement to serve files from the fastest peer, how is that set up? If there are multiple dats registered in the same place and sharing the same path, will the download be distributed amongst them?

Not quite sure I understand this question pre-coffee, but I'll give it a shot.

Dat networks are only created around a specific key, so peers never connect to other peers unless they are sharing/downloading the same dat. If multiple dats share a path, then the downloads will be completely unaware of each other (it may cause some weird problems too, but it should work eventually; at some point we'll have something that locks a given path for writes while Dat is running). They will write to the same metadata in the .dat folder, which records what has been downloaded, but it won't be coordinated.

Prioritizing the fastest peer for a single key is definitely a good feature; feel free to open an issue in datproject/dat. But prioritization across keys will need to be something more custom.

@okdistribute (Contributor) commented Oct 19, 2017

About fastest peers: if you have two sources online, the downloader will connect to both sources and begin downloading from them at the same time. Since each individual block is quite small, the downloader will end up getting most of its blocks from the faster source.

@at88mph (Author) commented Oct 19, 2017

Thanks @Karissa, that's exactly my use case. We will have a West Coast site and an East Coast site in Canada, with data replicated across them. If a user requests a file, how does Dat know to pull from those multiple sources?

Thank you!

@joehand (Collaborator) commented Oct 19, 2017

If a user requests a file, how does Dat know to pull from those multiple sources?

Dat automatically connects to all the sources sharing a specific key (similar to BitTorrent). So if both your west coast and east coast servers are sharing dat://xyz and you dat clone dat://xyz, it'll connect to both sources.
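
A minimal sketch of that setup, with dat://xyz standing in for the real key:

# west coast server: create and share the dat
cd /data && dat share             # prints dat://xyz

# east coast server: mirror the same dat and keep it seeded
dat clone dat://xyz /data
cd /data && dat sync

# any user: one clone pulls blocks from both servers in parallel
dat clone dat://xyz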

@at88mph (Author) commented Oct 19, 2017

@joehand Excellent! Thank you both for helping. I thought one had to run multiple dats, but in reality one is just a clone of another, which gives me access to both sources. Thank you!

Closing this issue.

at88mph closed this as completed Oct 19, 2017