refactor examples repo #6

Open
wants to merge 7 commits into
base: main
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,6 +3,8 @@ __pycache__/
*.py[cod]
*$py.class

_datasets/*

# C extensions
*.so

107 changes: 2 additions & 105 deletions README.md
@@ -3,10 +3,8 @@
</br>
<h1 align="center">Examples for <a href="https://github.com/activeloopai/Hub">Hub</a> - Dataset Format for AI
</h1>


A repository showcasing examples of using [Hub](https://github.com/pytorch/pytorch)
- [Uploading Dataset Places365](datasets/places365)

Note: This repository only contains examples for using [hub](https://github.com/activeloopai/Hub) (no hub source code).


### Colab Tutorials
@@ -20,106 +18,5 @@ A repository showcasing examples of using [Hub](https://github.com/pytor
| Training an Image Classification Model in PyTorch | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/activeloopai/examples/blob/main/colabs/Training_an_Image_Classification_Model_in_PyTorch.ipynb) |



## Getting Started with Hub 🚀


### Installation
Hub is written entirely in Python and can be installed quickly with pip.
```sh
pip3 install hub
```


### Creating Datasets

A Hub dataset can be created in several locations (storage providers). The path for each one looks like this:

| Storage provider | Example path |
| ---------------- | ----------------------------- |
| Hub cloud | hub://user_name/dataset_name |
| AWS S3 | s3://bucket_name/dataset_name |
| GCP              | gcp://bucket_name/dataset_name |
| Local storage | path to local directory |
| In-memory | mem://dataset_name |
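
The table above amounts to a prefix convention on the dataset path. As a rough illustration only, the mapping can be sketched in plain Python. The helper `storage_provider` and the `PREFIXES` dictionary are hypothetical names introduced here; they are not part of Hub, which resolves paths internally.

```python
# Hypothetical helper for illustration; not part of the Hub API.
PREFIXES = {
    "hub://": "Hub cloud",
    "s3://": "AWS S3",
    "gcp://": "GCP",
    "mem://": "In-memory",
}


def storage_provider(path: str) -> str:
    """Return the storage provider implied by a dataset path."""
    for prefix, provider in PREFIXES.items():
        if path.startswith(prefix):
            return provider
    # A path without a recognized prefix is treated as a local directory.
    return "Local storage"


print(storage_provider("s3://bucket_name/dataset_name"))
```

A path with no prefix, such as `./my_dataset`, falls through to local storage in this sketch.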



Let's create a dataset in the Hub cloud. If you haven't already, create a Hub account from the terminal with `activeloop register`. You will be asked for a username, email address, and password. The username you enter here is used in the dataset path.

```sh
$ activeloop register
Enter your details. Your password must be atleast 6 characters long.
Username:
Email:
Password:
```

Initialize an empty dataset in the hub cloud:

```python
import hub

ds = hub.empty("hub://<USERNAME>/test-dataset")
```

Next, create a tensor to hold images in the dataset we just initialized:

```python
images = ds.create_tensor("images", htype="image", sample_compression="jpg")
```

Assuming you have a list of image file paths, let's upload them to the dataset:

```python
image_paths = ...
with ds:
    for image_path in image_paths:
        # hub.read loads the image file as a sample ready to append
        image = hub.read(image_path)
        ds.images.append(image)
```
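
The snippet above leaves `image_paths` unspecified. One possible way to build such a list, assuming the images sit under a single local directory, is sketched below. The function name `list_image_paths` and the extension set are illustrative assumptions, not part of Hub.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_paths(root, extensions=IMAGE_EXTENSIONS):
    """Recursively collect image file paths under `root`, sorted for reproducibility."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
```

With this helper, `image_paths = list_image_paths("path/to/images")` would yield a list of strings, where `path/to/images` is a placeholder directory.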

Alternatively, you can upload NumPy arrays. Because the `images` tensor was created with `sample_compression="jpg"`, each array is JPEG-compressed when it is appended.


```python
import numpy as np

with ds:
    for _ in range(1000):  # 1000 random images
        # 100x100 image with 3 channels; uint8 matches 8-bit pixel values
        random_image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        ds.images.append(random_image)
```



### Loading Datasets


You can load the dataset you just created with a single line of code:

```python
import hub

ds = hub.load("hub://<USERNAME>/test-dataset")
```

You can also access other publicly available hub datasets, not just the ones you created. Here is how you would load the [Objectron Bikes Dataset](https://github.com/google-research-datasets/Objectron):

```python
import hub

ds = hub.load('hub://activeloop/objectron_bike_train')
```

To get the first image in the Objectron Bikes dataset in numpy format:


```python
image_arr = ds.image[0].numpy()
```



## Documentation
Getting started guides, examples, tutorials, API reference, and other usage information can be found on our [documentation page](http://docs.activeloop.ai/?utm_source=github&utm_medium=repo&utm_campaign=readme).