Skip to content

Latest commit

 

History

History
522 lines (385 loc) · 21.4 KB

File metadata and controls

522 lines (385 loc) · 21.4 KB

Collaboration through Git

As introduced in the Python Basics outro, git is the predominant tool for decentralized code collaboration.

Platforms like GitHub fill an indispensable role in further easing the adoption and utilization of git and empowering the software communities to prosper.

Like Python, we will not cover how to install git. You can find plenty of information on its official site.

locussdk-collab

We will take this opportunity to initiate our locussdk-collab project to contribute ideas and recipes to help other users of LOCUS SDK/Notebook with similar needs. LOCUS SDK maintainers will also actively incorporate great ideas and recipes into the core project.

Let us begin with something trivial, and along the way, see how to use git to achieve version control and collaborations.

Empty repository

We will use GitHub to host the project. At this point, the repository is empty.

empty

For maintainers, the empty repository page on GitHub typically shows some useful commands that we can run from our local computer to get started. The command-line experience may vary depending on your choice of the operating system, command-line tool, and git client.

First, make a local directory (folder) and populate it with a README.md file and some content in it:

% mkdir -p locussdk-collab  # create the locussdk-collab directory if it doesn't exist
% cd locussdk-collab  # get into the directory
% # create a README.md with some content
% echo "# locussdk-collab\n\nCommunity collaboration on ideas and practical workflows using LOCUS SDK" >> README.md

The .md extension means that README.md interprets as a Markdown file. Its content would look like this:

# locussdk-collab

Community collaboration on ideas and practical workflows using LOCUS SDK

Next, let's git it up:

% git init  # initialize the directory as a local git repository
% git status  # check the status of our repository
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    README.md

nothing added to commit but untracked files present (use "git add" to track)

The output of git status offers valuable insights and suggestions. Based on the information, we should proceed to add the file to commit:

% git add README.md
% git status
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
    new file:   README.md

We now have code changes -- the addition of the new file README.md, to be committed:

% git commit -m "README - init with project introduction"
[main a61f24e] README - init with project introduction
 1 file changed, 3 insertions(+)
 create mode 100644 README.md

Conceptually speaking, each commit is a collection of file/code changes, along with a mandatory message conveying the objective and some signatures that git automatically generates or associates to it.

Whenever making change commits, try to follow some good conventions to have well-scoped code changes and pair them with a concise and relevant message.

We can examine all the commit history:

% git log
commit a61f24ee507f19a839686b75e166eaf574cfa3a0 (HEAD -> main)
Author: Runzhou Li (woozyking) <[email protected]>
Date:   Thu May 6 17:07:14 2021 -0400

    README - init with project introduction

So far, there is only one commit. As mentioned, other than the commit file/code changes, git automatically includes a commit hash, its author, and a date-time, along with the author's commit message.

Now we have the change committed locally. It is time to push it up to the GitHub hosted remote repository:

% git remote add origin [email protected]:EQWorks/locussdk-collab.git
% git push -u origin main

The first command adds a remote (repository) named origin that points to the GitHub hosted repository we recently created. The second command pushes up our local main branch to the remote branch with a matching name, using the -u option as a shorthand to make the association.

After pushing up the main branch with the first commit, we now have:

first commit

Branch out

Branches are another essential element of git. Through branching, collaborators with write access to the repository can safely experiment and make new commits without impacting the main branch until they are confident to merge.

GitHub takes branching one step further through forking or making a new remote clone of the original repository. Collaborators do not necessarily need to obtain write access to the original repository to make commits.

Through either branching or forking, once the collaborators are confident about the commits, they can suggest the changes to be pushed back into the original repository through pull requests.

Pull requests allow the collaborators and maintainers to discuss and vet the commits' changes, sometimes with the help of automated quality assurance processes, before merging the commits into a designated staple branch (usually main).

Let us make our first contribution to locussdk-collab by adding a simple recipe function to search for matching POI lists by a single name.

Since I have the write access to the repository, we will proceed with git native branching.

The first thing to do is to make a new local branch to introduce the changes:

% git checkout -b recipe/search-avail-poi-lists
Switched to a new branch 'recipe/search-avail-poi-lists'

We use a shorthand to checkout a newly created branch, using the -b option followed by the desired branch name. Depending on the conventions, branch names may or may not carry as much significance as commit messages. However, it is generally favorable to use a succinct and relevant branch name for yourself and other collaborators.

We will proceed to create a new directory within the local repository and develop our recipe Python script in it:

% mkdir -p recipes
% touch recipes/search_poi_lists.py

Contribute

Recall our objective for the recipe:

a simple recipe function to search for matching POI lists by a single name.

After some brief research, the pandas.Series.str.contains() method seems to be appropriate to do the job.

from locussdk import get_avail_poi_lists


def search_poi_lists(name):
    global_lists = get_avail_poi_lists(list_type='global')

    return global_lists[global_lists['name'].str.contains(name)]

When used:

>>> search_poi_lists('pizza')
                name  poi_list_id whitelabelid customerid
67       Bostonpizza           71         None       None
113       Pizzapizza          118         None       None
175  Neworleanspizza          180         None       None
422       Royalpizza          427         None       None
450       Ginospizza          455         None       None
786      Doublepizza          791         None       None

Let us commit this:

% git add recipes/search_poi_lists.py
% git commit -m "recipe - search available POI lists by name"
[recipe/search-poi-lists bcc472c] recipe - search available POI lists by name
 1 file changed, 7 insertions(+)
 create mode 100644 recipes/search_poi_lists.py

And push it up to the matching remote branch as we did with the initial README.md change. But this time, it will not go into the main staple branch just yet:

% git push -u origin recipe/search-poi-lists
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 12 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 503 bytes | 503.00 KiB/s, done.
Total 4 (delta 0), reused 0 (delta 0), pack-reused 0
remote:
remote: Create a pull request for 'recipe/search-poi-lists' on GitHub by visiting:
remote:      https://github.com/EQWorks/locussdk-collab/pull/new/recipe/search-poi-lists
remote:
To github.com:EQWorks/locussdk-collab.git
 * [new branch]      recipe/search-poi-lists -> recipe/search-poi-lists
Branch 'recipe/search-poi-lists' set up to track remote branch 'recipe/search-poi-lists' from 'origin'.

Once we have the branch pushed up, on GitHub the repository would prompt if we would like to make a pull request:

prompt

Let's go ahead and do that, as well as inviting some team members to review:

pr

Walk the edge

For someone who may be interested in the bleeding edge development of this recipe, they can clone the repository locally and test it out:

% git clone https://github.com/EQWorks/locussdk-collab.git
Cloning into 'locussdk-collab'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 7 (delta 0), reused 7 (delta 0), pack-reused 0
Receiving objects: 100% (7/7), done.
% cd locussdk-collab
% git checkout recipe/search-poi-lists
Branch 'recipe/search-poi-lists' set up to track remote branch 'recipe/search-poi-lists' from 'origin'.
Switched to a new branch 'recipe/search-poi-lists'

They can also fork this repository to their own GitHub account, make and commit changes, and initiate some pull requests to the original repository's recipe/search-poi-lists branch. Pull requests do not always have to go into the staple (usually main) branch. This flexibility is very empowering to groom a versatile and vibrant collaboration environment.

Iterate

reviews

After gathering some feedback in the pull request we made for the first recipe, let us incorporate comments and suggestions to improve things.

First, let's make it possible for a case-insensitive search for broader search scope, and make it False by default:

from locussdk import get_avail_poi_lists


def search_poi_lists(name, case=False):
    global_lists = get_avail_poi_lists(list_type='global')

    return global_lists[global_lists['name'].str.contains(name, case=case)]
>>> search_poi_lists('pizza')
                name  poi_list_id whitelabelid customerid
67       Bostonpizza           71         None       None
113       Pizzapizza          118         None       None
116       Pizzaville          121         None       None
117     Pizzadelight          122         None       None
139        Pizzanova          144         None       None
157         Pizzahut          162         None       None
175  Neworleanspizza          180         None       None
327         241Pizza          332         None       None
422       Royalpizza          427         None       None
450       Ginospizza          455         None       None
730     Mammas Pizza          735         None       None
786      Doublepizza          791         None       None
>>> # test when case=True
>>> search_poi_lists('pizza', case=True)
                name  poi_list_id whitelabelid customerid
67       Bostonpizza           71         None       None
113       Pizzapizza          118         None       None
175  Neworleanspizza          180         None       None
422       Royalpizza          427         None       None
450       Ginospizza          455         None       None
786      Doublepizza          791         None       None

Let's further examine the pandas.Series.str.contains() documentation. There are quite a few arguments like case that we can relay, using a Python mechanism known as unpacking, and a convention known as Keyword (or Named) Arguments. Together they are commonly expressed as **kwargs:

from locussdk import get_avail_poi_lists


def search_poi_lists(name, **kwargs):
    global_lists = get_avail_poi_lists(list_type='global')

    return global_lists[global_lists['name'].str.contains(name, **kwargs)]

Conceptually, we are relaying the Keyword Arguments that users of this function may optionally supply:

>>> search_poi_lists('some name', case=True, regex=False)

What happens within the function execution is that the pandas.Series.str.contains() method gets them as they are supplied to the function:

    return global_lists[global_lists['name'].str.contains(name, case=True, regex=True)]

Let's use another side example to illustrate the underlying unpacking behavior:

def concat(a, b):
    return a + b
# params dict
params = {
  'a': 'leo',
  'b': 'ric',
}
# unpack dict as keyword arguments a= and b=:
concat(**params)
# conceptually equivalent to manually call as:
concat(a=params['a'], b=params['b'])

Effectively we allow users of the search_poi_lists() function to fully enjoy the power behind the Pandas mechanism by doing less. We always prefer less code if it can achieve the same or better.

Let us make a commit based on this change since its scope is now well defined:

% git status
On branch recipe/search-poi-lists
Your branch is up to date with 'origin/recipe/search-poi-lists'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
  modified:   recipes/search_poi_lists.py

% git add -u  # add updated/modified file(s)

% git status
On branch recipe/search-poi-lists
Your branch is up to date with 'origin/recipe/search-poi-lists'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
  modified:   recipes/search_poi_lists.py

% git commit -m "recipe - search_poi_lists() support pd.Series.str.contains arguments such as 'case'"

% git status
On branch recipe/search-poi-lists
Your branch is ahead of 'origin/recipe/search-poi-lists' by 1 commit.
  (use "git push" to publish your local commits)

% git push
Enumerating objects: 7, done.
Counting objects: 100% (7/7), done.
Delta compression using up to 12 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 458 bytes | 458.00 KiB/s, done.
Total 4 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:EQWorks/locussdk-collab.git
   bcc472c..d017109  recipe/search-poi-lists -> recipe/search-poi-lists

Knowing our change commit is all well preserved on the remote, we can move further with other suggestions, such as extending beyond searching just the "global" POI lists:

from locussdk import get_avail_poi_lists


def search_poi_lists(name, **kwargs):
    # build parameters we want to pass into get_avail_poi_lists()
    params = {'list_type': 'global'}  # default search in global
    for k in ['list_type', 'whitelabel', 'customer']:
        if v := kwargs.pop(k, None):
            params[k] = v

    lists = get_avail_poi_lists(**params)

    return lists[lists['name'].str.contains(name, **kwargs)]
% git add -u
% git commit -m "recipe - search_poi_lists() support locussdk.get_avail_poi_lists() arguments such as 'list_type'"
[recipe/search-poi-lists 8eea16e] recipe - search_poi_lists() support locussdk.get_avail_poi_lists() arguments such as 'list_type'
 1 file changed, 13 insertions(+), 7 deletions(-)
 rewrite recipes/search_poi_lists.py (60%)

And adding typing, inline-documentation, and sensible defaults for both arguments and output return behaviors:

from pandas import DataFrame
from locussdk import get_avail_poi_lists


def search_poi_lists(name: str = '', **kwargs) -> DataFrame:
    '''Search available POI lists based on locussdk.get_avail_poi_lists().

    Args:
        name (str): name to search in the available POI lists.
        **kwargs:
            All locussdk.get_avail_poi_lists() supported arguments.
            All pandas.Series.str.contains() supported arguments.

    Returns:
        pandas.DataFrame that contains the search resulting POI lists.

    Examples:
    >>> search_poi_lists('pizza')
                name  poi_list_id whitelabelid customerid
    67       Bostonpizza           71         None       None
    113       Pizzapizza          118         None       None
    116       Pizzaville          121         None       None
    117     Pizzadelight          122         None       None
    139        Pizzanova          144         None       None
    157         Pizzahut          162         None       None
    175  Neworleanspizza          180         None       None
    327         241Pizza          332         None       None
    422       Royalpizza          427         None       None
    450       Ginospizza          455         None       None
    730     Mammas Pizza          735         None       None
    786      Doublepizza          791         None       None
    '''
    # build parameters we want to pass into get_avail_poi_lists()
    params = {'list_type': 'global'}  # default search in global
    for k in ['list_type', 'whitelabel', 'customer']:
        if v := kwargs.pop(k, None):
            params[k] = v

    lists = get_avail_poi_lists(**params)

    # return all available POI lists if no search string given
    if not name:
        return lists

    return lists[lists['name'].str.contains(name, **kwargs)]

You can examine the code changes before committing using the git diff command:

% git diff
diff --git a/recipes/search_poi_lists.py b/recipes/search_poi_lists.py
index 122dc77..31e59ea 100644
--- a/recipes/search_poi_lists.py
+++ b/recipes/search_poi_lists.py
@@ -1,7 +1,35 @@
+from pandas import DataFrame
 from locussdk import get_avail_poi_lists


-def search_poi_lists(name, **kwargs):
+def search_poi_lists(name: str = '', **kwargs) -> DataFrame:
+    '''Search available POI lists based on locussdk.get_avail_poi_lists().
+
+    Args:
+        name (str): name to search in the available POI lists.
+        **kwargs:
+            All locussdk.get_avail_poi_lists() supported arguments.
+            All pandas.Series.str.contains() supported arguments.
+
+    Returns:
+        pandas.DataFrame that contains the search resulting POI lists.
+
+    Examples:
+    >>> search_poi_lists('pizza')
+                name  poi_list_id whitelabelid customerid
+    67       Bostonpizza           71         None       None
+    113       Pizzapizza          118         None       None
+    116       Pizzaville          121         None       None
+    117     Pizzadelight          122         None       None
+    139        Pizzanova          144         None       None
+    157         Pizzahut          162         None       None
+    175  Neworleanspizza          180         None       None
+    327         241Pizza          332         None       None
+    422       Royalpizza          427         None       None
+    450       Ginospizza          455         None       None
+    730     Mammas Pizza          735         None       None
+    786      Doublepizza          791         None       None
+    '''
     # build parameters we want to pass into get_avail_poi_lists()
     params = {'list_type': 'global'}  # default search in global
     for k in ['list_type', 'whitelabel', 'customer']:
@@ -10,4 +38,8 @@ def search_poi_lists(name, **kwargs):

     lists = get_avail_poi_lists(**params)

+    # return all available POI lists if no search string given
+    if not name:
+        return lists
+
     return lists[lists['name'].str.contains(name, **kwargs)]

All looks good:

% git add -u
% git commit -m "recipe - add typing, docs and defaults for search_poi_lists()"
[recipe/search-poi-lists 7c78644] recipe - add typing, docs and defaults for search_poi_lists()
 1 file changed, 33 insertions(+), 1 deletion(-)

We should have 2 commits await, let's push them up to our remote repository:

% git status
On branch recipe/search-poi-lists
Your branch is ahead of 'origin/recipe/search-poi-lists' by 2 commits.
  (use "git push" to publish your local commits)

% git push
<output omitted>

On GitHub, I can go ahead and mark all the inline comments as resolved:

resolved

And once some of the reviewers get a chance to review again and possibly approve, we may have our first recipe officially merged back into the main branch, indicating that we are ready to share it with the world!

merge

updated

GitHub offers various other features such as issues for feature requests and bug reports and many more.

References