Unable to download the files #5

Open
salawdeh opened this issue Sep 30, 2016 · 15 comments

Comments

@salawdeh

Hello,

Thank you for sharing these data sets.
I have been trying, without success, to download these data sets using my AWS credentials. I always receive "An error occurred (403) when calling the HeadObject operation: Forbidden".
I am not sure whether something is wrong with my credentials or the bucket has some access restrictions.

Could you please help me with that?

Thank you very much,

@tanumitra
Contributor

You can use AWS CLI or the easier s3cmd to download the data. First configure the tool to use your AWS account credentials and then use the appropriate command to download the s3 links listed below.

To download a data file (e.g., stream_tweets_byTimestamp.data) to the current folder, use the following command:

aws s3api get-object --request-payer requester --bucket credbank --key stream_tweets_byTimestamp.data stream_tweets_byTimestamp.data

OR

s3cmd get --requester-pays s3://credbank/stream_tweets_byTimestamp.data

Note that you will be charged a small data transfer fee by AWS, which can be offset by their free tier.
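
For anyone scripting the download, the `aws s3api get-object` call above can be wrapped in a few lines of Python. This is only a minimal sketch: the helper names `build_get_object_cmd` and `download` are made up for illustration, and it assumes the AWS CLI is installed and already configured with your credentials.

```python
import shlex
import subprocess


def build_get_object_cmd(bucket, key, dest=None):
    """Return the argv list for a requester-pays `aws s3api get-object` call."""
    return [
        "aws", "s3api", "get-object",
        "--request-payer", "requester",  # the downloader pays the transfer fee
        "--bucket", bucket,
        "--key", key,
        dest or key,  # local output file; defaults to the key name
    ]


def download(bucket, key, dest=None):
    """Run the command; needs a configured AWS CLI on PATH."""
    subprocess.run(build_get_object_cmd(bucket, key, dest), check=True)


if __name__ == "__main__":
    # Only prints the command; call download(...) to actually fetch the file.
    cmd = build_get_object_cmd("credbank", "stream_tweets_byTimestamp.data")
    print(shlex.join(cmd))
```

Run as a script, this prints the same command line as shown above, which makes it easy to check before spending any transfer budget.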

@salawdeh
Author

Thanks @TanuM,

It works now! :)

Regards,

@marcodegra

marcodegra commented Jan 4, 2018

Hi @salawdeh, did you manage to download the data in the end?
If so, could you please tell me how large the dataset is, so I can work out whether it would exceed my AWS free tier limits?

Best Regards

@salawdeh
Author

salawdeh commented Jan 8, 2018

Hi @marcodegra, I think it was something around 21GB. Regards,

@HamdaSlimi

HamdaSlimi commented Jan 16, 2018

[Image: credbank2]
I could not download the data. I created an AWS account and linked it to my credit card. When I run the command given by tanumitra, I receive the error shown in the picture above after about 20 minutes. If anyone could help, I would be grateful; I need this dataset for my thesis.

@tanumitra
Contributor

Hey @HamdaSlimi,
There seems to be a connectivity issue between your location and AWS's us-west region. As per issue aws/aws-cli#401, you might want to retry a couple of times or call the aws command with the --cli-read-timeout 0 option. Let us know if this solves the issue.
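
The two suggestions above (retrying, and passing `--cli-read-timeout 0`) can be combined in a small wrapper. This is a rough sketch rather than a tested recipe: `run_with_retries` is a hypothetical helper, the attempt count and delay are arbitrary, and the `runner` and `sleep` parameters exist only so the logic can be exercised without touching the network.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=5.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0 or the attempts run out.

    Returns the 1-based number of the attempt that succeeded and
    raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(delay * attempt)  # wait a little longer after each failure
    raise RuntimeError(f"command failed after {attempts} attempts")


# The command from earlier in the thread, plus the timeout option.
CMD = [
    "aws", "s3api", "get-object",
    "--cli-read-timeout", "0",  # 0 makes socket reads block without timing out
    "--request-payer", "requester",
    "--bucket", "credbank",
    "--key", "stream_tweets_byTimestamp.data",
    "stream_tweets_byTimestamp.data",
]
```

Calling `run_with_retries(CMD)` would start the download and try again after a failure. Note that each attempt starts the transfer from the beginning, so with a requester-pays bucket repeated failures may add to the transfer charges.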

@HamdaSlimi

Hey @tanumitra,
I'm sorry I took so long to answer you. I tried to resolve the issue using the --cli-read-timeout 0 parameter. However, I encountered the following error: [Errno 10054] An existing connection was forcibly closed by the remote host. When I researched it, I found that passing --cli-read-timeout 0 results in this response from the server: the two sides of the connection disagree over whether the connection timed out during a keepalive. (The client tries to reuse the connection just as the server is closing it because it has been idle for too long.)
As for the number of attempts, since my last email I have tried to retrieve the dataset nearly 10 times, and each time it takes about 30 minutes before error 10054 shows up once more. What I would like to know is whether AWS may have restrictions concerning “region=us-east-2”.

@HamdaSlimi

I was able to download the dataset, FINALLY.

@the-lost-explorer

the-lost-explorer commented Sep 22, 2018

> I was able to download the dataset, FINALLY.

How long does it take to download on a decent connection? I read in the comments above that it is about 21GB.
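
As a rough guide to the question above, the transfer time can be estimated from the size and the link speed. The sketch below assumes the roughly 21GB figure recalled earlier in the thread and a few hypothetical connection speeds; it ignores protocol overhead, so real downloads are likely to take longer.

```python
def download_minutes(size_gb, mbit_per_s):
    """Estimated transfer time in minutes, ignoring protocol overhead."""
    size_megabits = size_gb * 1000 * 8  # decimal units: 1 GB = 8000 megabits
    return size_megabits / mbit_per_s / 60


# The speeds are illustrative assumptions, not measurements.
for speed in (10, 50, 100):
    print(f"{speed} Mbit/s: about {download_minutes(21, speed):.0f} minutes")
```

Under these assumptions, 21GB takes about 280 minutes at 10 Mbit/s, 56 minutes at 50 Mbit/s, and 28 minutes at 100 Mbit/s.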

@HamdaSlimi

HamdaSlimi commented Sep 22, 2018 via email

@sahar99

sahar99 commented Nov 3, 2018

@HamdaSlimi Could you please tell me how you solved the problem, or point me to any sources that could help? I have a similar problem: the download runs for several hours before stopping with the message "Read timeout on endpoint URL: "None"".

@HamdaSlimi

HamdaSlimi commented Nov 3, 2018 via email

@jaeseok-huh

I switched the region to us-east-2 and it worked well.
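
The comment above does not say how the region was switched. One possible way, sketched here as an assumption, is to set the `AWS_DEFAULT_REGION` environment variable for the process that runs the AWS CLI; `env_for_region` is a hypothetical helper name.

```python
import os


def env_for_region(region, base=None):
    """Return a copy of the environment with AWS_DEFAULT_REGION set."""
    env = dict(os.environ if base is None else base)
    env["AWS_DEFAULT_REGION"] = region  # read by the AWS CLI at startup
    return env
```

The result can be passed as `env=env_for_region("us-east-2")` to `subprocess.run` when launching the download command. Adding `--region us-east-2` to the command line, or running `aws configure set region us-east-2` once, are alternative ways to achieve the same effect.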

@Marjan-Hosseini

Is this dataset free? Do we need to pay for AWS credentials?

@HamdaSlimi

HamdaSlimi commented Feb 20, 2021 via email
