DOC: Add storage backend, fix broken links, update API reference (#816)
luweizheng authored Sep 30, 2024
1 parent 050da89 commit 92fdfc5
Showing 32 changed files with 206 additions and 137 deletions.
4 changes: 1 addition & 3 deletions .github/ISSUE_TEMPLATE/other.md
@@ -7,6 +7,4 @@ assignees: ''

---

- Note that the issue tracker is NOT the place for general support. For
- discussions about development, questions about usage, or any general questions,
- contact us on https://discuss.xorbits.io/.
+ Note that the issue tracker is NOT the place for general support.
10 changes: 6 additions & 4 deletions .github/workflows/python.yaml
@@ -90,8 +90,8 @@ jobs:
- { os: ubuntu-latest, module: learn, python-version: 3.9 }
- { os: ubuntu-latest, module: mars-core, python-version: 3.9 }
- { os: ubuntu-20.04, module: hadoop, python-version: 3.9 }
- - { os: ubuntu-latest, module: vineyard, python-version: 3.9 }
- - { os: ubuntu-latest, module: external-storage, python-version: 3.9 }
+ - { os: ubuntu-latest, module: vineyard, python-version: 3.11 }
+ - { os: ubuntu-latest, module: external-storage, python-version: 3.11 }
# always test compatibility with the latest version
# - { os: ubuntu-latest, module: compatibility, python-version: 3.9 }
- { os: ubuntu-latest, module: doc-build, python-version: 3.9 }
@@ -131,7 +131,9 @@ jobs:
- name: Install ucx dependencies
if: ${{ (matrix.module != 'gpu') && (matrix.os == 'ubuntu-latest') && (matrix.module != 'doc-build') }}
run: |
- conda install -c conda-forge -c rapidsai ucx-proc=*=cpu ucx ucx-py
+ # ucx-py move to ucxx and ucxx-cu12 can be run on CPU
+ # conda install -c conda-forge -c rapidsai ucx-proc=*=cpu ucx ucx-py
+ pip install ucxx-cu12
- name: Install libomp (macOS)
if: ${{ matrix.os == 'macos-latest' || matrix.os == 'macos-13' }}
run: brew install libomp
@@ -275,7 +277,7 @@ jobs:
run: |
source activate ${{ env.CONDA_ENV }}
pip install --extra-index-url=https://pypi.nvidia.com cudf-cu12==24.8.*
- pip install ucx-py-cu12 cython "numpy>=1.14.0,<2.0.0" cloudpickle scikit-learn \
+ pip install ucxx-cu12 cython "numpy>=1.14.0,<2.0.0" cloudpickle scikit-learn \
pyyaml psutil tornado sqlalchemy defusedxml tqdm uvloop coverage \
pytest pytest-cov pytest-timeout pytest-forked pytest-asyncio pytest-mock
pip install -U xoscar
45 changes: 22 additions & 23 deletions README.md
@@ -1,12 +1,12 @@
<div align="center">
- <img width="77%" alt="" src="https://doc.xorbits.io/en/latest/_static/xorbits.svg"><br>
+ <img width="77%" alt="" src="https://xorbits.readthedocs.io/en/latest/_static/xorbits.svg"><br>
</div>

[![PyPI Latest Release](https://img.shields.io/pypi/v/xorbits.svg?style=for-the-badge)](https://pypi.org/project/xorbits/)
[![License](https://img.shields.io/pypi/l/xorbits.svg?style=for-the-badge)](https://github.com/xorbitsai/xorbits/blob/main/LICENSE)
[![Coverage](https://img.shields.io/codecov/c/github/xorbitsai/xorbits?style=for-the-badge)](https://codecov.io/gh/xorbitsai/xorbits)
[![Build Status](https://img.shields.io/github/actions/workflow/status/xorbitsai/xorbits/python.yaml?branch=main&style=for-the-badge&label=GITHUB%20ACTIONS&logo=github)](https://actions-badge.atrox.dev/xorbitsai/xorbits/goto?ref=main)
- [![Doc](https://readthedocs.org/projects/xorbits/badge/?version=latest&style=for-the-badge)](https://doc.xorbits.io)
+ [![Doc](https://readthedocs.org/projects/xorbits/badge/?version=latest&style=for-the-badge)](https://xorbits.readthedocs.io/)
[![Slack](https://img.shields.io/badge/join_Slack-781FF5.svg?logo=slack&style=for-the-badge)](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg)
[![Twitter](https://img.shields.io/twitter/follow/xorbitsio?logo=twitter&style=for-the-badge)](https://twitter.com/xorbitsio)

@@ -16,7 +16,7 @@ Xorbits is an open-source computing framework that makes it easy to scale data s
from data preprocessing to tuning, training, and model serving. Xorbits can leverage multi-cores or GPUs to accelerate
computation on a single machine or scale out up to thousands of machines to support processing terabytes of data and training or serving large models.

- Xorbits provides a suite of best-in-class [libraries](https://doc.xorbits.io/en/latest/libraries/index.html) for data
+ Xorbits provides a suite of best-in-class [libraries](https://xorbits.readthedocs.io/en/latest/libraries/index.html) for data
scientists and machine learning practitioners. Xorbits provides the capability to scale tasks without the necessity for
extensive knowledge of infrastructure.

@@ -40,15 +40,15 @@ You can keep using your existing notebooks and still enjoy a significant speed b

### Process large datasets that pandas can't

- Xorbits can [leverage all of your computational cores](https://doc.xorbits.io/en/latest/getting_started/why_xorbits/pandas.html#boosting-performance-and-scalability-with-xorbits).
- It is especially beneficial for handling [larger datasets](https://doc.xorbits.io/en/latest/getting_started/why_xorbits/pandas.html#overcoming-memory-limitations-in-large-datasets-with-xorbits),
+ Xorbits can [leverage all of your computational cores](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/pandas.html#boosting-performance-and-scalability-with-xorbits).
+ It is especially beneficial for handling [larger datasets](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/pandas.html#overcoming-memory-limitations-in-large-datasets-with-xorbits),
where pandas may slow down or run out of memory.

### Lightning-fast speed

According to our benchmark tests, Xorbits surpasses other popular pandas API frameworks in speed and scalability.
- See our [performance comparison](https://doc.xorbits.io/en/latest/getting_started/why_xorbits/comparisons.html#performance-comparison)
- and [explanation](https://doc.xorbits.io/en/latest/getting_started/why_xorbits/fast.html).
+ See our [performance comparison](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/comparisons.html#performance-comparison)
+ , [explanation](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/fast.html) and [research paper](https://arxiv.org/abs/2401.00865).

### Leverage the Python ecosystem with native integrations

@@ -66,10 +66,9 @@ pip install xorbits
```

## Other resources
- * [Documentation](https://doc.xorbits.io)
- * [Examples and Tutorials](https://doc.xorbits.io/en/latest/getting_started/examples.html)
- * [Performance Benchmarks](https://xorbits.io/benchmark)
- * [Development Guide](https://doc.xorbits.io/en/latest/development/index.html)
+ * [Documentation](https://xorbits.readthedocs.io)
+ * [Performance Benchmarks](https://xorbits.readthedocs.io/en/latest/getting_started/why_xorbits/comparisons.html#performance-comparison)
+ * [Development Guide](https://xorbits.readthedocs.io/en/latest/development/index.html)
* [Research Paper on Xorbits' Internals](https://arxiv.org/abs/2401.00865)

## License
@@ -97,24 +96,24 @@ with other upcoming ones we will propose. Stay tuned!

| Platform | Purpose |
|-----------------------------------------------------------------------------------------------|----------------------------------------------------|
- | [Discourse Forum](https://discuss.xorbits.io) | Asking usage questions and discussing development. |
| [Github Issues](https://github.com/xorbitsai/xorbits/issues) | Reporting bugs and filing feature requests. |
- | [Slack](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg) | Collaborating with other Xorbits users. |
- | [StackOverflow](https://stackoverflow.com/questions/tagged/xorbits) | Asking questions about how to use Xorbits. |
| [Twitter](https://twitter.com/xorbitsio) | Staying up-to-date on new features. |
+ | [Slack](https://join.slack.com/t/xorbitsio/shared_invite/zt-1o3z9ucdh-RbfhbPVpx7prOVdM1CAuxg) | Collaborating with other Xorbits users. |

## Citing Xorbits

- If Xorbits could help you, please cite our paper which is accepted by ICDE 2024 Industry and Applications Track:
+ If Xorbits could help you, please cite our paper using the following metadata:

```
- @article{lu2023xorbits,
- title={Xorbits: Automating Operator Tiling for Distributed Data Science},
- author={Weizheng Lu and Kaisheng He and Xuye Qin and Chengjie Li and Zhong Wang and Tao Yuan and Feng Zhang and Yueguo Chen and Xiaoyong Du},
- year={2023},
- archivePrefix={arXiv},
- url={https://doi.org/10.48550/arXiv.2401.00865},
- eprinttype={arXiv},
- eprint={2401.00865},
+ @inproceedings{lu2024Xorbits,
+ title = {Xorbits: Automating Operator Tiling for Distributed Data Science},
+ shorttitle = {Xorbits},
+ booktitle = {2024 {{IEEE}} 40th {{International Conference}} on {{Data Engineering}} ({{ICDE}})},
+ author = {Lu, Weizheng and He, Kaisheng and Qin, Xuye and Li, Chengjie and Wang, Zhong and Yuan, Tao and Liao, Xia and Zhang, Feng and Chen, Yueguo and Du, Xiaoyong},
+ year = {2024},
+ month = may,
+ pages = {5211--5223},
+ issn = {2375-026X},
+ doi = {10.1109/ICDE60146.2024.00392},
}
```
2 changes: 1 addition & 1 deletion doc/source/development/contributing_documentation.rst
@@ -164,4 +164,4 @@ Building main branch documentation

When pull requests are merged into the Xorbits ``main`` branch, the main parts of
the documentation are also built by readthedocs. These docs are then hosted `here
- <https://doc.xorbits.io/en/latest/>`__.
+ <https://xorbits.readthedocs.io/>`__.
3 changes: 0 additions & 3 deletions doc/source/development/contributing_environment.rst
@@ -63,9 +63,6 @@ If no C compiler is installed, or you wish to upgrade, or you're using a differe
Linux distribution, consult your favorite search engine for compiler installation/update
instructions.

- Let us know if you have any difficulties by opening an issue or reaching out on our contributor
- community, join slack in `Community <https://xorbits.io/community>`_.
-
Step 2: install Node.js
-----------------------

7 changes: 4 additions & 3 deletions doc/source/index.rst
@@ -20,9 +20,7 @@ the capability to scale tasks without the necessity for extensive knowledge of i

- :ref:`Xorbits Train <xorbits_train_index>`: Train your own state-of-the-art models for ML and DL frameworks such as PyTorch, XGBoost, etc.

- - :ref:`Xorbits Tune <xorbits_tune_index>`: Finetune your models by running state of the art algorithms such as PEFT.
-
- - :ref:`Xorbits Inference <xorbits_inference_index>`: Scalable serving to deploy state-of-the-art models. Integrate with the most popular deep learning libraries, like PyTorch, ggml, etc.
+ - `Xorbits Inference <https://github.com/xorbitsai/inference>`_: Scalable serving to deploy state-of-the-art models. Integrate with the most popular deep learning libraries, like PyTorch, ggml, etc.

Xorbits features a familiar Python API that supports a variety of libraries, including pandas, NumPy, scikit-learn, PyTorch,
XGBoost, Xarray, etc. With a simple modification of just one line of code, your pandas workflow can be seamlessly scaled using
@@ -75,3 +73,6 @@ Getting involved
user_guide/index
reference/index
development/index
+
+
+ .. _xorbits_inference_index: https://github.com/xorbitsai/inference
13 changes: 1 addition & 12 deletions doc/source/libraries/xorbits_data/index.rst
@@ -8,20 +8,9 @@ Xorbits Data

Xorbits Data is a scalable data processing library for ML and DS workloads.

- Xorbits Data can leverage multi cores or GPUs to accelerate computation on a single machine,
+ Xorbits Data can leverage multi cores or GPUs to accelerate computation on a single machine,
or scale out up to thousands of machines to support processing terabytes of data.

- All APIs in Xorbits Data library implemented or planned include:
-
- ======================================= =========================================================
- API Implemented version or plan
- ======================================= =========================================================
- :ref:`xorbits.pandas <pandas_api>` v0.1.0
- :ref:`xorbits.numpy <numpy_api>` v0.1.0
- :ref:`xorbits.xarray` Planned in the future
- ======================================= =========================================================
-
-

.. toctree::
:maxdepth: 2
4 changes: 2 additions & 2 deletions doc/source/libraries/xorbits_data/numpy.rst
@@ -118,8 +118,8 @@ elements that we want, instead of the step::
>>> f = np.sin(x)


- However, the way of loading and saving arrays is quite different. Please see :ref:`io <routines.io>` for
- detailed info. Here's an example of creating and loading an HDF5 file::
+ However, the way of loading and saving arrays is quite different. Here's an example of creating
+ and loading an HDF5 file::

>>> import h5py # if you don't have h5py installed, run "pip install h5py" first
>>> arr = np.random.randn(1000)
12 changes: 0 additions & 12 deletions doc/source/libraries/xorbits_inference/index.rst

This file was deleted.

13 changes: 0 additions & 13 deletions doc/source/libraries/xorbits_train/index.rst
@@ -10,19 +10,6 @@ Xorbits Train

Xorbits Train scales model training for popular ML and DL frameworks such as PyTorch, XGBoost, scikit-learn, etc.

-
- All APIs in Xorbits Train library implemented or planned include:
-
- ======================================= =========================================================
- API Implemented version or plan
- ======================================= =========================================================
- :ref:`xorbits.xgboost <xgboost_api>` v0.4.0
- :ref:`xorbits.lightgbm <lightgbm_api>` v0.4.0
- ``xorbits.sklearn`` Planned in the near future
- ``xorbits.scipy`` Planned in the future
- ``xorbits.statsmodels`` Planned in the future
- ======================================= =========================================================
-
.. toctree::
:maxdepth: 2

11 changes: 0 additions & 11 deletions doc/source/libraries/xorbits_tune/index.rst

This file was deleted.

2 changes: 0 additions & 2 deletions doc/source/reference/index.rst
@@ -9,6 +9,4 @@ API Reference

xorbits/index
datasets/index
- xgboost/index
- lightgbm/index
experimental/index
2 changes: 1 addition & 1 deletion doc/source/reference/lightgbm/learning.rst
@@ -6,7 +6,7 @@ Learning API
.. currentmodule:: xorbits.lightgbm

.. autosummary::
- :toctree: _generate/
+ :toctree: generated/

predict
predict_proba
12 changes: 6 additions & 6 deletions doc/source/reference/lightgbm/sklearn.rst
@@ -11,15 +11,15 @@ LGBMClassifier
Constructor
~~~~~~~~~~~
.. autosummary::
- :toctree: _generate/
+ :toctree: generated/

LGBMClassifier


Attributes
~~~~~~~~~~
.. autosummary::
- :toctree: _generate/
+ :toctree: generated/

LGBMClassifier.fit
LGBMClassifier.get_params
@@ -37,15 +37,15 @@ LGBMRegressor
Constructor
~~~~~~~~~~~
.. autosummary::
- :toctree: _generate/
+ :toctree: generated/

LGBMRegressor


Attributes
~~~~~~~~~~
.. autosummary::
- :toctree: _generate/
+ :toctree: generated/

LGBMRegressor.fit
LGBMRegressor.get_params
@@ -63,15 +63,15 @@ LGBMRanker
Constructor
~~~~~~~~~~~
.. autosummary::
- :toctree: _generate/
+ :toctree: generated/

LGBMRanker


Attributes
~~~~~~~~~~
.. autosummary::
- :toctree: _generate/
+ :toctree: generated/

LGBMRanker.fit
LGBMRanker.get_params
1 change: 1 addition & 0 deletions doc/source/user_guide/best_practices.rst
@@ -12,3 +12,4 @@ practices, and helps users solve some common problems.
:maxdepth: 2

loading_data
+ storage_backend
2 changes: 1 addition & 1 deletion doc/source/user_guide/index.rst
@@ -18,6 +18,6 @@ Further information on any specific method can be obtained in the
:maxdepth: 2

deferred_execution
- best_practices
deployment
+ best_practices
logging
4 changes: 2 additions & 2 deletions doc/source/user_guide/loading_data.rst
@@ -1,10 +1,10 @@
.. _loading_data:

==============
- Loading data
+ Loading Data
==============

- Recommended data formats
+ Recommended Data Formats
-------------------------

Xorbits supports reading data from various data sources, including csv, parquet, sql, xml and other data formats,