In this post, we will look at how blogplish (blog autopublish) was written. The script this post describes wrote itself. That sounds confusing, so let's just write it again.

Make a new git repo. Make a file called blogplish.py, using touch blogplish.py.

Add a single line of code, print("The script is working."), to this script.

You can run this file using the python command in your terminal:

$ python blogplish.py
The script is working.

blogplish.py


print("The script is working.")

When you're working on a task you've never done before, most of your time is usually spent figuring out what you need to do. In this case, we found ourselves stuck planning how to write the script (Do we use regular Bash, or the GitHub API? Do we get all commits at once, or go backwards using the HEAD~1 style syntax until there are no commits left? And so on). When you get that stuck, it's best to start by writing pseudocode that describes what you think you need to do overall:

blogplish.py


print("The script is working.")

"""
Write function to call Bash command from Python

Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

As for the first step, "Write function to call Bash command from Python", in theory I already had a sturdy function for flexibly running Linux commands from a Python script:

blogplish.py

from subprocess import Popen, PIPE


"""
Write function to call Bash command from Python

Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""


def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


output, error = call_sp('ls')
print(output)
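To make the helper's two conveniences concrete, here is a minimal sketch of how the positional arguments and the cwd keyword behave. It assumes Python 3 and a POSIX shell, and it adds universal_newlines=True (not part of the helper above) so that the output comes back as text rather than bytes. The echo and pwd commands are only illustrations.

```python
from subprocess import Popen, PIPE


def call_sp(command, *args, **kwargs):
    """Same helper as above, plus universal_newlines=True (an assumption for Python 3)."""
    if args:
        # Positional args are substituted into {} placeholders; they are not appended.
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE,
              universal_newlines=True, **kwargs)
    return p.communicate()


# The two extra arguments fill the two {} placeholders in the command string.
output, error = call_sp('echo {} {}', 'hello', 'world')
print(repr(output))  # 'hello world\n'

# cwd is passed straight through to Popen, so the command runs from that directory.
output, error = call_sp('pwd', cwd='/')
print(repr(output))  # '/\n'
```

Because the command goes through str.format, a call with extra arguments only has an effect when the command string contains placeholders for them.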

Run ls in your terminal, and see that it outputs only blogplish.py, the only file in our project (besides hidden .git files). The Python script also runs ls, via call_sp('ls'), so its output should match:

blogplish.py

from subprocess import Popen, PIPE



"""
Write function to call Bash command from Python

Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


output, error = call_sp('ls')
print(output)

Run ls in your terminal, and see that it outputs only blogplish.py, the only file in our project (besides hidden .git files). The Python script also runs ls, via call_sp('ls'), so its output should match:

cchilders:/blogplish (master)
$ python blogplish.py
blogplish.py
cchilders:/blogplish (master)
$ ls
blogplish.py

blogplish.py

from subprocess import Popen, PIPE



"""
Write function to call Bash command from Python

Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


output, error = call_sp('ls')
print(output)

Update our pseudocode:

blogplish.py

from subprocess import Popen, PIPE



"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


output, error = call_sp('ls')
print(output)

** ERROR: call_sp('ls', '-a', ...) doesn't work; *args is only used to substitute into the command string, as in call_sp('ls {}', '-a'). Fix this before publish **

As for Get all commit info, we can use git log for that. Update the call_sp call at the bottom of the script:

blogplish.py

from subprocess import Popen, PIPE



"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


output, error = call_sp('git log')
print(output)

Now run our script again. You can compare the output with git log in your terminal. You should see a summary of your commit history like:

cchilders:~/blogplish (master)
$ python blogplish.py
commit ea270e9a879b385580a855f1f83736ccce345de3
Author: Cody Childers <[email protected]>
Date:   Sun Jul 30 00:06:03 2017 -0500

    ** ERROR: call_sp('ls', '-a', ...) doesn't work; *args is only used to substitute into the command string like 'ls %s' etc. Fix this before publish **

    As for `Get all commit info`, we can use `git log` for that. Update `call_sp`:

commit bb19fca5f6461fbf8ca6e1870964021f818ba063
Author: Cody Childers <[email protected]>
Date:   Sun Jul 30 00:00:11 2017 -0500

    Update our pseudocode:

...etc...

blogplish.py

from subprocess import Popen, PIPE



"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


output, error = call_sp('git log')
print(output)

Next, we need to parse the output of git log. Look at what it outputs and take a few minutes to think about how you'd parse it to get the commit ID and message for each commit. Start a function to do the parsing:

blogplish.py

from subprocess import Popen, PIPE


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    pass


output, error = call_sp('git log')
print(output)

At first we tried this:

blogplish.py

import re
from subprocess import Popen, PIPE



"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    # https://stackoverflow.com/questions/10974932/split-string-based-on-a-regular-expression
    commits_array = re.split(r"commit \w{40}", text_output)
    print(commits_array)


output, error = call_sp('git log')
print(output)

parse_git_log_info(output)

But the problem was that it cut off the commit ID:

['', "\nAuthor: Cody Childers [email protected]\nDate: Sun Jul 30 00:15:39 2017 -0500\n\n Next, we need to parse the output of git log. Look at what it outputs and take a few minutes to think about how you'd parse it to get the commit ID and message for each commit.\n \n Start a function to do the parsing:\n\n", '\nAuthor: ...]

blogplish.py

import re
from subprocess import Popen, PIPE


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    # https://stackoverflow.com/questions/10974932/split-string-based-on-a-regular-expression
    commits_array = re.split(r"commit \w{40}", text_output)
    print(commits_array)


output, error = call_sp('git log')
print(output)

parse_git_log_info(output)
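For reference, re.split only discards the delimiter when the pattern has no capturing group. The sketch below is an alternative we did not end up using. It runs on a made-up two-commit log (the IDs are just repeated letters) and shows that wrapping the ID in parentheses keeps it in the result list.

```python
import re

# Invented sample standing in for real `git log` output.
sample_log = (
    "commit " + "a" * 40 + "\nAuthor: A\n\n    first message\n\n"
    "commit " + "b" * 40 + "\nAuthor: B\n\n    second message\n"
)

# Without a group, the matched "commit <id>" text is thrown away.
without_ids = re.split(r"commit \w{40}", sample_log)

# With a capturing group, each captured ID is kept in the result list.
parts = re.split(r"commit (\w{40})", sample_log)

# parts[0] is the text before the first commit (empty here); after that,
# IDs and bodies alternate, so they can be paired up.
pairs = list(zip(parts[1::2], parts[2::2]))
for commit_id, body in pairs:
    print(commit_id, repr(body.strip()))
```

This still splits on any "commit" followed by 40 word characters anywhere in the text, so it does nothing about false matches inside commit messages.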

We were able to split the git log output by using re.findall. The re module is Python's regular expression library, which lets you find text of interest by pattern. An easy way to write and test regexes is to go to pythex.org.

The time.sleep(3) call pauses the script after it prints the number of matches, so we could check that count (we expected roughly 10-15 commits) before the commits themselves scrolled past:

blogplish.py

import re
import time
from subprocess import Popen, PIPE


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    # https://stackoverflow.com/questions/4697882/how-can-i-find-all-matches-to-a-regular-expression-in-python
    # https://stackoverflow.com/questions/1870954/python-regular-expression-across-multiple-lines
    rgx = re.compile(r"commit \w{40}.*?(?=commit)", re.DOTALL)
    commits_array = re.findall(rgx, text_output)
    print(len(commits_array))
    time.sleep(3)
    for item in commits_array:
        print(item)
        print('\n\n\n\n')


output, error = call_sp('git log')

parse_git_log_info(output)

This seemed like a great start, but we soon noticed that commit messages into which we had copy-pasted the output of git log broke our parse_git_log_info function, because they also matched commit \w{40}:

...
commit 3e4aca9f102229c890ef73967f4a4c1c61a51a73
Author: Cody Childers <[email protected]>
Date:   Sun Jul 30 00:08:46 2017 -0500

    Now run our script again. You can compare the output with `git log` in your terminal. You should see a summary of your commit history like:

    ```
    cchilders:~/blogplish (master)
    $ python blogplish.py
    commit ea270e9a879b385580a855f1f83736ccce345de3
    Author: Cody Childers <[email protected]>
    Date:   Sun Jul 30 00:06:03 2017 -0500

    ** ERROR: call_sp('ls', '-a', ...) doesn't work; *args is only used to substitute into the command string like 'ls %s' etc. Fix this before publish **

    As for `Get all commit info`, we can use `git log` for that. Update `call_sp`:

    commit bb19fca5f6461fbf8ca6e1870964021f818ba063
    Author: Cody Childers <[email protected]>
    Date:   Sun Jul 30 00:00:11 2017 -0500
    ...
    ```
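The same failure can be reproduced on a tiny, made-up log. In this sketch the sample text and the commit IDs (repeated letters) are invented. It has two real commits, and the first commit's message contains a pasted commit header.

```python
import re

rgx = re.compile(r"commit \w{40}.*?(?=commit)", re.DOTALL)

real_newest = "a" * 40
real_oldest = "b" * 40
pasted_fake = "c" * 40  # only appears inside a commit message

sample_log = (
    "commit " + real_newest + "\nAuthor: A\n\n"
    "    pasted output:\n"
    "    commit " + pasted_fake + "\n"
    "    Author: C\n\n"
    "commit " + real_oldest + "\nAuthor: B\n\n"
    "    initial version\n"
)

matches = rgx.findall(sample_log)
# The second whitespace-separated token of each match is the commit ID.
ids = [match.split()[1] for match in matches]

print(len(matches))
print(ids)
```

The count comes out as 2, which looks right for two commits, but the IDs are the newest real commit and the pasted one. The oldest real commit is missing entirely, because the lookahead needs another "commit" after it and nothing follows the last entry in the log.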

This threw a wrench in our plan of having one distinct function to split the commits into an array and another to parse each commit one by one. Instead, we ended up with a rambling parser that goes through the entire output line by line. Hideous, but it works:


blogplish.py

import re
from subprocess import Popen, PIPE


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    commit_count = 0
    commit_start_rgx = r"^commit \w{40}"
    lines = text_output.split('\n')
    # commits_array = []
    current_commit_string = ""
    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            commit_count += 1
            print(line + " matched the start of a commit")
    print("\n")
    print(commit_count)
    # return commits_array


output, error = call_sp('git log')

parse_git_log_info(output)
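The reason checking each line separately helps is that git log indents commit message lines by four spaces, while real commit headers start at the beginning of the line. A quick sketch with an invented ID shows the difference:

```python
import re

commit_start_rgx = r"^commit \w{40}"

header_line = "commit " + "e" * 40            # a real commit header
quoted_line = "    commit " + "e" * 40        # the same text pasted into a message

# re.match only matches at the start of the string, so the indented copy fails.
header_matches = bool(re.match(commit_start_rgx, header_line))
quoted_matches = bool(re.match(commit_start_rgx, quoted_line))

print(header_matches)  # True
print(quoted_matches)  # False
```

Since re.match already anchors at the start of the string, the ^ in the pattern is redundant here, though it does make the intent explicit.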

This parser goes line by line, checking whether each line starts a new commit block (match = re.match(commit_start_rgx, line)). If it doesn't, the parser adds the line to the current commit message, unless the line starts with 'Author: ' or 'Date: '. If the line does match "^commit (?P<commit_id>\w{40})", the parser appends the commit collected so far to the final results (except on the first pass, where current_commit_id is still None) and starts collecting a new one. While it isn't as clean looking as smaller parsers, I always find this line-by-line style to be less error prone for tricky text parsing.

blogplish.py

import re
from subprocess import Popen, PIPE



"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

    return commits_array


output, error = call_sp('git log')

print(parse_git_log_info(output))
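One thing to watch in the parser above: a commit is only appended when the next commit header shows up, so the last commit in the log (the oldest one) never reaches commits_array, and message lines are joined with no separator between them. Below is a sketch of a variant that handles both. It is a possible adjustment rather than the version used for the rest of this post, and the sample log, author lines, and IDs are invented.

```python
import re

COMMIT_START_RGX = re.compile(r"^commit (?P<commit_id>\w{40})")


def make_entry(commit_id, message_lines):
    """Build one result dict, joining message lines with newlines."""
    return {'commit_id': commit_id, 'message': '\n'.join(message_lines).strip()}


def parse_git_log_info(text_output):
    commits_array = []
    current_commit_id = None
    message_lines = []

    for line in text_output.split('\n'):
        match = COMMIT_START_RGX.match(line)
        if match:
            # Store the previous commit (there is none on the first header).
            if current_commit_id:
                commits_array.append(make_entry(current_commit_id, message_lines))
            current_commit_id = match.group('commit_id')
            message_lines = []
        elif not line.startswith(('Author: ', 'Date: ')):
            # Drop git's indentation; this also flattens indentation inside messages.
            message_lines.append(line.strip())

    # The loop never sees a header after the last commit, so store it here.
    if current_commit_id:
        commits_array.append(make_entry(current_commit_id, message_lines))
    return commits_array


sample_log = (
    "commit " + "a" * 40 + "\nAuthor: A <a@example.com>\n"
    "Date:   Sun Jul 30 00:08:46 2017 -0500\n\n"
    "    Paste of git log output:\n"
    "    commit " + "c" * 40 + "\n\n"
    "commit " + "b" * 40 + "\nAuthor: B <b@example.com>\n"
    "Date:   Sun Jul 30 00:00:11 2017 -0500\n\n"
    "    first commit\n"
)

parsed = parse_git_log_info(sample_log)
for entry in parsed:
    print(entry['commit_id'][:7], repr(entry['message']))
```

On this sample the result has two entries, the pasted header stays inside the first message, and the final entry is the oldest commit.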

Now, let's work on the Add entire files that were changed to final string part. To do this, we want to first find the files that were changed in each commit:

blogplish.py

import re
from subprocess import Popen, PIPE



"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

    return commits_array


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    return output.split('\n')


output, error = call_sp('git log')

parsed_commits = parse_git_log_info(output)

first_commit = parsed_commits[0]
first_commit_id = first_commit['commit_id']

changed_files = get_files_that_were_changed_in_commit(first_commit_id)
print(changed_files)

We have a small issue however, as the output is ['blogplish.py', '']. We can prune empty lines out of our result using a list comprehension:

blogplish.py

import re
from subprocess import Popen, PIPE



"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

    return commits_array


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


output, error = call_sp('git log')

parsed_commits = parse_git_log_info(output)

first_commit = parsed_commits[0]
first_commit_id = first_commit['commit_id']

changed_files = get_files_that_were_changed_in_commit(first_commit_id)
print(changed_files)
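The empty string appears because the command output ends with a newline, and split('\n') returns whatever follows the last separator, even when that is nothing. A small sketch (the second filename is made up) compares the list comprehension with str.splitlines(), which does not produce the trailing empty entry:

```python
raw_output = "blogplish.py\nREADME.md\n"  # README.md is a made-up second file

split_result = raw_output.split('\n')
filtered_result = [name for name in raw_output.split('\n') if name]
splitlines_result = raw_output.splitlines()

print(split_result)       # ['blogplish.py', 'README.md', '']
print(filtered_result)    # ['blogplish.py', 'README.md']
print(splitlines_result)  # ['blogplish.py', 'README.md']
```

The list comprehension additionally drops blank lines in the middle of the output, which splitlines() would keep.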

Now that we know which files were changed in any commit, we need to get the contents of the file at that point in time:

blogplish.py

import re
from subprocess import Popen, PIPE



"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

    return commits_array


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


output, error = call_sp('git log')

parsed_commits = parse_git_log_info(output)

first_commit = parsed_commits[0]
first_commit_id = first_commit['commit_id']

changed_files = get_files_that_were_changed_in_commit(first_commit_id)

for changed_file in changed_files:
    contents = get_contents_of_certain_file_in_certain_commit(first_commit_id, changed_file)
    print(contents)
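To try the two git commands in isolation, the sketch below builds a throwaway repository with two commits and inspects the first one. It is not part of blogplish. It assumes Python 3 with git on the PATH, uses a hypothetical run_git helper that checks the exit status instead of stderr, and commits under a made-up identity. As far as I know, git diff-tree lists nothing for a commit with no parent unless --root is passed, so the flag is included here.

```python
import os
import shutil
import subprocess
import tempfile


def run_git(args, cwd):
    """Run git with a list of arguments (no shell) and return stdout as text.

    check=True raises CalledProcessError on a non-zero exit status, so a
    warning written to stderr is not mistaken for a failure.
    """
    result = subprocess.run(['git'] + args, cwd=cwd, check=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout


def demo():
    """Build a two-commit repo, then list and read the first commit's file."""
    # Made-up identity so committing works on a machine with no git config.
    identity = ['-c', 'user.name=Demo', '-c', 'user.email=demo@example.com',
                '-c', 'commit.gpgsign=false']
    with tempfile.TemporaryDirectory() as repo:
        run_git(['init', '-q'], repo)
        commit_ids = []
        for body in ('print("v1")\n', 'print("v2")\n'):
            with open(os.path.join(repo, 'blogplish.py'), 'w') as f:
                f.write(body)
            run_git(['add', 'blogplish.py'], repo)
            run_git(identity + ['commit', '-q', '-m', 'update'], repo)
            commit_ids.append(run_git(['rev-parse', 'HEAD'], repo).strip())
        first_id = commit_ids[0]
        # --root makes diff-tree list files for a commit that has no parent.
        changed = run_git(['diff-tree', '--no-commit-id', '--name-only',
                           '-r', '--root', first_id], repo).splitlines()
        # <commit>:<path> shows the file as it was in that commit.
        contents = run_git(['show', first_id + ':blogplish.py'], repo)
        return changed, contents


if shutil.which('git'):
    print(demo())
```

Although the file on disk holds the second version by the end, git show with the first commit's ID returns the first version.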

To double-check this, we used the first commit of the blogplish project and got:

cchilders:~/blogplish (master)
$ python blogplish.py
print("The script is working.")

It's working.

blogplish.py

import re
from subprocess import Popen, PIPE



"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

    return commits_array


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


output, error = call_sp('git log')

parsed_commits = parse_git_log_info(output)

first_commit = parsed_commits[0]
first_commit_id = first_commit['commit_id']

changed_files = get_files_that_were_changed_in_commit(first_commit_id)

# for changed_file in changed_files:
#     contents = get_contents_of_certain_file_in_certain_commit(first_commit_id, changed_file)
#     print(contents)

print(get_contents_of_certain_file_in_certain_commit('b37ae0371d1', 'blogplish.py'))

To get the diff of a file at a certain point in time, we use git diff {older_commit_id}..{newer_commit_id} {filename} syntax. My output for the diff of the most recent commit looked like:

cchilders:~/blogplish (master)
$ python blogplish.py
diff --git a/blogplish.py b/blogplish.py
index 0285d88..7be3810 100644
--- a/blogplish.py
+++ b/blogplish.py
@@ -82,6 +82,8 @@ first_commit_id = first_commit['commit_id']

 changed_files = get_files_that_were_changed_in_commit(first_commit_id)

-for changed_file in changed_files:
-    contents = get_contents_of_certain_file_in_certain_commit(first_commit_id, changed_file)
-    print(contents)
+# for changed_file in changed_files:
+#     contents = get_contents_of_certain_file_in_certain_commit(first_commit_id, changed_file)
+#     print(contents)
+
+print(get_contents_of_certain_file_in_certain_commit('b37ae0371d1', 'blogplish.py'))
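Since git log lists commits newest first, each commit's diff pairs it with the entry that follows it in the list. Here is a minimal sketch of that pairing and of building the command string. The helper names and the abbreviated commit IDs are made up, and shlex.quote is an extra precaution for filenames with spaces, since the command is run through a shell.

```python
import shlex


def pair_commits_for_diff(commit_ids_newest_first):
    """Return (older_id, newer_id) pairs for each pair of neighbouring commits."""
    newer_ids = commit_ids_newest_first[:-1]
    older_ids = commit_ids_newest_first[1:]
    return [(older, newer) for newer, older in zip(newer_ids, older_ids)]


def build_diff_command(older_commit_id, newer_commit_id, filename):
    """Build the git diff command; -- separates the revisions from the path."""
    return 'git diff {}..{} -- {}'.format(
        older_commit_id, newer_commit_id, shlex.quote(filename))


commit_ids = ['ccc333', 'bbb222', 'aaa111']  # made-up IDs, newest first
for older, newer in pair_commits_for_diff(commit_ids):
    print(build_diff_command(older, newer, 'blogplish.py'))
# git diff bbb222..ccc333 -- blogplish.py
# git diff aaa111..bbb222 -- blogplish.py
```

The oldest commit has no older neighbour, so it gets no pair and would need separate handling.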

blogplish.py

import re
import sys
from subprocess import Popen, PIPE

THIS_SCRIPT_NAME = sys.argv[0]


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

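    # flush the last commit: the loop above only appends when it hits the next header
    if current_commit_id:
        commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})

    return commits_array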


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


def get_diff_of_certain_file_in_certain_commit(newer_commit_id, older_commit_id, filename):
    """
    head diff means how many commits back, as in

        HEAD~3

    means 3 commits back
    """
    # "get dif of a certain file in certain commit": https://stackoverflow.com/questions/42357521/generate-diff-file-of-a-specific-commit-in-git
    command = 'git diff {older_commit_id}..{newer_commit_id} {filename}'.format(older_commit_id=older_commit_id, newer_commit_id=newer_commit_id, filename=filename)
    raw_diff, error = call_sp(command)
    if error:
        raise Exception("Error in get_diff_of_certain_file_in_certain_commit():\n\n" + error)
    return raw_diff


output, error = call_sp('git log')

parsed_commits = parse_git_log_info(output)

first_commit = parsed_commits[0]
first_commit_id = first_commit['commit_id']

changed_files = get_files_that_were_changed_in_commit(first_commit_id)

# for changed_file in changed_files:
#     contents = get_contents_of_certain_file_in_certain_commit(first_commit_id, changed_file)
#     print(contents)

# print(get_contents_of_certain_file_in_certain_commit('b37ae0371d1', 'blogplish.py'))

a_diff_2_commits_back = get_diff_of_certain_file_in_certain_commit('c4b7c7cabccc350eef5ef80344f', 'f66b7bfd0f82d5b987d9f71f', THIS_SCRIPT_NAME)
print(a_diff_2_commits_back)
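
The diff helper above formats the commit ids and filename into one shell string. As a minimal sketch (not the code this post was generated from), the same git diff call can be built as an argument list and run without a shell, so a filename with spaces is passed through intact. The names `build_diff_command` and `run_git` are made up for this example, and it assumes Python 3.7+ for `capture_output` and `text`.

```python
import subprocess


def build_diff_command(older_commit_id, newer_commit_id, filename):
    """Return the argument list for `git diff older..newer -- filename`."""
    commit_range = "{}..{}".format(older_commit_id, newer_commit_id)
    # "--" tells git that everything after it is a path, not a revision
    return ["git", "diff", commit_range, "--", filename]


def run_git(argv, cwd=None):
    """Run a git command without a shell and return its stdout as text."""
    result = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    # checking the exit code avoids treating harmless stderr warnings as failures
    if result.returncode != 0:
        raise RuntimeError("git failed:\n\n" + result.stderr)
    return result.stdout
```

Inside a repo with at least two commits, `run_git(build_diff_command('HEAD~1', 'HEAD', 'blogplish.py'))` should return the change made by the most recent commit.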

We're finally ready to combine these 3 functions into an autogenerated markdown file for our blog post. We started by wrapping the calls in a function and printing the parsed commit data to review it:

blogplish.py

import re
import sys
from subprocess import Popen, PIPE

THIS_SCRIPT_NAME = sys.argv[0]


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

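    # flush the last commit: the loop above only appends when it hits the next header
    if current_commit_id:
        commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})

    return commits_array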


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


def get_diff_of_certain_file_in_certain_commit(newer_commit_id, older_commit_id, filename):
    """
    head diff means how many commits back, as in

        HEAD~3

    means 3 commits back
    """
    # "get dif of a certain file in certain commit": https://stackoverflow.com/questions/42357521/generate-diff-file-of-a-specific-commit-in-git
    command = 'git diff {older_commit_id}..{newer_commit_id} {filename}'.format(older_commit_id=older_commit_id, newer_commit_id=newer_commit_id, filename=filename)
    raw_diff, error = call_sp(command)
    if error:
        raise Exception("Error in get_diff_of_certain_file_in_certain_commit():\n\n" + error)
    return raw_diff


def auto_blogplish_blog():
    blog_post = ""

    output, error = call_sp('git log')

    parsed_commits = parse_git_log_info(output)
    print(parsed_commits)

    # first_commit = parsed_commits[0]
    # first_commit_id = first_commit['commit_id']
    #
    # changed_files = get_files_that_were_changed_in_commit(first_commit_id)
    #
    # # for changed_file in changed_files:
    # #     contents = get_contents_of_certain_file_in_certain_commit(first_commit_id, changed_file)
    # #     print(contents)
    #
    # # print(get_contents_of_certain_file_in_certain_commit('b37ae0371d1', 'blogplish.py'))
    #
    # a_diff_2_commits_back = get_diff_of_certain_file_in_certain_commit('c4b7c7cabccc350eef5ef80344f', 'f66b7bfd0f82d5b987d9f71f', THIS_SCRIPT_NAME)
    # print(a_diff_2_commits_back)


auto_blogplish_blog()
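
A line-by-line parser of this shape has to remember the commit it is still holding when the input runs out, otherwise the last entry in the log (the oldest commit) is lost. Here is a small sketch that can be run on pasted text without a repo. `parse_log_text` and `SAMPLE_LOG` are invented for the example, and the sample imitates the default git log layout rather than real output from this project.

```python
import re

COMMIT_HEADER = re.compile(r"^commit (?P<commit_id>[0-9a-f]{40})")

# two fake commits, newest first, in the default `git log` layout
SAMPLE_LOG = "\n".join([
    "commit " + "1" * 40,
    "Author: Someone <someone@example.com>",
    "Date:   Mon Jan 1 10:00:00 2018 -0600",
    "",
    "    add diff function",
    "",
    "commit " + "2" * 40,
    "Author: Someone <someone@example.com>",
    "Date:   Sun Dec 31 09:00:00 2017 -0600",
    "",
    "    first commit",
    "",
])


def parse_log_text(text_output):
    """Return [{'commit_id': ..., 'message': ...}, ...] in the order given."""
    commits = []
    current_id = None
    message_lines = []

    for line in text_output.split("\n"):
        match = COMMIT_HEADER.match(line)
        if match:
            # a new header closes the previous commit, if there was one
            if current_id is not None:
                commits.append({"commit_id": current_id, "message": " ".join(message_lines)})
            current_id = match.group("commit_id")
            message_lines = []
        elif line.startswith(("Author: ", "Date: ")):
            continue
        elif line.strip():
            message_lines.append(line.strip())

    # no header follows the final commit, so it has to be appended here
    if current_id is not None:
        commits.append({"commit_id": current_id, "message": " ".join(message_lines)})
    return commits
```

Joining the message lines with a space is a choice made for this sketch; a multi-line commit message could just as well be joined with newlines.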

git log returns commits newest first, but we write tutorials from start to finish, so the order is backwards for our purposes. Reversing a list in Python is very easy:

blogplish.py

import re
import sys
from subprocess import Popen, PIPE

THIS_SCRIPT_NAME = sys.argv[0]


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

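    # flush the last commit: the loop above only appends when it hits the next header
    if current_commit_id:
        commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})

    return commits_array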


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


def get_diff_of_certain_file_in_certain_commit(newer_commit_id, older_commit_id, filename):
    """
    head diff means how many commits back, as in

        HEAD~3

    means 3 commits back
    """
    # "get dif of a certain file in certain commit": https://stackoverflow.com/questions/42357521/generate-diff-file-of-a-specific-commit-in-git
    command = 'git diff {older_commit_id}..{newer_commit_id} {filename}'.format(older_commit_id=older_commit_id, newer_commit_id=newer_commit_id, filename=filename)
    raw_diff, error = call_sp(command)
    if error:
        raise Exception("Error in get_diff_of_certain_file_in_certain_commit():\n\n" + error)
    return raw_diff


def auto_blogplish_blog():
    blog_post = ""

    output, error = call_sp('git log')

    parsed_commits = parse_git_log_info(output)
    # "reverse a list python": https://stackoverflow.com/questions/3940128/how-can-i-reverse-a-list-in-python
    parsed_commits.reverse()
    print(parsed_commits)

    # first_commit = parsed_commits[0]
    # first_commit_id = first_commit['commit_id']
    #
    # changed_files = get_files_that_were_changed_in_commit(first_commit_id)
    #
    # # for changed_file in changed_files:
    # #     contents = get_contents_of_certain_file_in_certain_commit(first_commit_id, changed_file)
    # #     print(contents)
    #
    # # print(get_contents_of_certain_file_in_certain_commit('b37ae0371d1', 'blogplish.py'))
    #
    # a_diff_2_commits_back = get_diff_of_certain_file_in_certain_commit('c4b7c7cabccc350eef5ef80344f', 'f66b7bfd0f82d5b987d9f71f', THIS_SCRIPT_NAME)
    # print(a_diff_2_commits_back)


auto_blogplish_blog()
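
One thing to watch with list.reverse(): it changes the list in place and returns None, so writing `parsed_commits = parsed_commits.reverse()` would throw the data away. A short sketch of the three common ways to reverse, using made-up commit data:

```python
# made-up data in the newest-first order that git log uses
commits_newest_first = [
    {"commit_id": "ccc", "message": "third"},
    {"commit_id": "bbb", "message": "second"},
    {"commit_id": "aaa", "message": "first"},
]

# Option 1: slicing returns a new list and leaves the original alone
oldest_first_copy = commits_newest_first[::-1]

# Option 2: reversed() returns an iterator; wrap it in list() to index into it
oldest_first_list = list(reversed(commits_newest_first))

# Option 3: list.reverse() mutates in place and returns None
in_place = list(commits_newest_first)
return_value = in_place.reverse()
```

Another route is to ask git for the order directly with `git log --reverse`, which skips the Python step entirely.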

Now we want to start iterating over the commit data, generating the text. The order will go:

1. commit message
2. the diff of each file that was changed
3. the total contents of each file that was changed

First, get the commit messages added in the correct order:

blogplish.py

import re
import sys
from subprocess import Popen, PIPE

THIS_SCRIPT_NAME = sys.argv[0]


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

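    # flush the last commit: the loop above only appends when it hits the next header
    if current_commit_id:
        commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})

    return commits_array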


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


def get_diff_of_certain_file_in_certain_commit(newer_commit_id, older_commit_id, filename):
    """
    head diff means how many commits back, as in

        HEAD~3

    means 3 commits back
    """
    # "get dif of a certain file in certain commit": https://stackoverflow.com/questions/42357521/generate-diff-file-of-a-specific-commit-in-git
    command = 'git diff {older_commit_id}..{newer_commit_id} {filename}'.format(older_commit_id=older_commit_id, newer_commit_id=newer_commit_id, filename=filename)
    raw_diff, error = call_sp(command)
    if error:
        raise Exception("Error in get_diff_of_certain_file_in_certain_commit():\n\n" + error)
    return raw_diff


def auto_blogplish_blog():
    blog_post = ""

    output, error = call_sp('git log')

    parsed_commits = parse_git_log_info(output)
    # "reverse a list python": https://stackoverflow.com/questions/3940128/how-can-i-reverse-a-list-in-python
    parsed_commits.reverse()

    first_commit = parsed_commits[0]
    first_commit_id = first_commit['commit_id']

    for index, commit_data in enumerate(parsed_commits):
        blog_post += commit_data['message']
        blog_post += '\n\n\n\n'

    # changed_files = get_files_that_were_changed_in_commit(first_commit_id)

    # for changed_file in changed_files:
    #     contents = get_contents_of_certain_file_in_certain_commit(first_commit_id, changed_file)
    #     print(contents)

    # print(get_contents_of_certain_file_in_certain_commit('b37ae0371d1', 'blogplish.py'))

    # a_diff_2_commits_back = get_diff_of_certain_file_in_certain_commit('c4b7c7cabccc350eef5ef80344f', 'f66b7bfd0f82d5b987d9f71f', THIS_SCRIPT_NAME)
    # print(a_diff_2_commits_back)

    return blog_post


blog_text = auto_blogplish_blog()
print(blog_text)

I did step 3, adding the total contents of each changed file, second:

blogplish.py

import re
import sys
from subprocess import Popen, PIPE

THIS_SCRIPT_NAME = sys.argv[0]


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

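    # flush the last commit: the loop above only appends when it hits the next header
    if current_commit_id:
        commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})

    return commits_array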


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


def get_diff_of_certain_file_in_certain_commit(newer_commit_id, older_commit_id, filename):
    """
    head diff means how many commits back, as in

        HEAD~3

    means 3 commits back
    """
    # "get dif of a certain file in certain commit": https://stackoverflow.com/questions/42357521/generate-diff-file-of-a-specific-commit-in-git
    command = 'git diff {older_commit_id}..{newer_commit_id} {filename}'.format(older_commit_id=older_commit_id, newer_commit_id=newer_commit_id, filename=filename)
    raw_diff, error = call_sp(command)
    if error:
        raise Exception("Error in get_diff_of_certain_file_in_certain_commit():\n\n" + error)
    return raw_diff


def auto_blogplish_blog():
    blog_post = ""

    output, error = call_sp('git log')

    parsed_commits = parse_git_log_info(output)
    # "reverse a list python": https://stackoverflow.com/questions/3940128/how-can-i-reverse-a-list-in-python
    parsed_commits.reverse()

    for index, commit_data in enumerate(parsed_commits):
        blog_post += commit_data['message']
        blog_post += '\n\n\n\n'
        commit_id = commit_data['commit_id']

        changed_files = get_files_that_were_changed_in_commit(commit_id)
        if changed_files:
            blog_post += '$$$ Entire contents of changed files: $$$\n\n'
        for changed_file in changed_files:
            contents = get_contents_of_certain_file_in_certain_commit(commit_id, changed_file)
            blog_post += '## ' + changed_file + ': ##\n\n'
            blog_post += contents
            blog_post += '\n\n\n\n'

    # print(get_contents_of_certain_file_in_certain_commit('b37ae0371d1', 'blogplish.py'))

    # a_diff_2_commits_back = get_diff_of_certain_file_in_certain_commit('c4b7c7cabccc350eef5ef80344f', 'f66b7bfd0f82d5b987d9f71f', THIS_SCRIPT_NAME)
    # print(a_diff_2_commits_back)

    return blog_post


blog_text = auto_blogplish_blog()
print(blog_text)
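
The file contents are appended as raw text here, so a markdown renderer will not treat them as code. One possible way to handle that, sketched below, is a small helper that puts each file inside a fenced block under its filename. `format_file_section` and its `language` parameter are hypothetical and not part of blogplish.

```python
# three backticks, built by repetition so this example can itself sit inside a fence
FENCE = "`" * 3


def format_file_section(filename, contents, language="python"):
    """Return a markdown section: a bold filename, then the contents in a fenced block."""
    # the closing fence must start on its own line
    body = contents if contents.endswith("\n") else contents + "\n"
    return "**{}**\n\n{}{}\n{}{}\n\n".format(filename, FENCE, language, body, FENCE)
```

In the loop, `blog_post += format_file_section(changed_file, contents)` would stand in for the three separate appends. A file that itself contains a line of three backticks would end the fence early, so this sketch only suits files without them.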

Now step 2, adding the diff of each file that was changed:

blogplish.py

import re
import sys
from subprocess import Popen, PIPE

THIS_SCRIPT_NAME = sys.argv[0]


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

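    # flush the last commit: the loop above only appends when it hits the next header
    if current_commit_id:
        commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})

    return commits_array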


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


def get_diff_of_certain_file_in_certain_commit(newer_commit_id, older_commit_id, filename):
    """
    head diff means how many commits back, as in

        HEAD~3

    means 3 commits back
    """
    # "get dif of a certain file in certain commit": https://stackoverflow.com/questions/42357521/generate-diff-file-of-a-specific-commit-in-git
    command = 'git diff {older_commit_id}..{newer_commit_id} {filename}'.format(older_commit_id=older_commit_id, newer_commit_id=newer_commit_id, filename=filename)
    raw_diff, error = call_sp(command)
    if error:
        raise Exception("Error in get_diff_of_certain_file_in_certain_commit():\n\n" + error)
    return raw_diff


def auto_blogplish_blog():
    blog_post = ""

    output, error = call_sp('git log')

    parsed_commits = parse_git_log_info(output)
    # "reverse a list python": https://stackoverflow.com/questions/3940128/how-can-i-reverse-a-list-in-python
    parsed_commits.reverse()

    for index, commit_data in enumerate(parsed_commits):
        blog_post += commit_data['message']
        blog_post += '\n\n\n\n'
        this_commit_id = commit_data['commit_id']

        changed_files = get_files_that_were_changed_in_commit(this_commit_id)

        if changed_files:
            if index > 0:
                blog_post += '$$$ Diffs of changed files: $$$\n\n'
                for changed_file in changed_files:
                    older_commit_id = parsed_commits[index - 1]['commit_id']
                    # the signature is (newer_commit_id, older_commit_id, filename); swapping them reverses the diff
                    this_diff = get_diff_of_certain_file_in_certain_commit(this_commit_id, older_commit_id, changed_file)
                    blog_post += '## ' + changed_file + ': ##\n\n'
                    blog_post += this_diff
                    blog_post += '\n\n\n\n'

            blog_post += '$$$ Entire contents of changed files: $$$\n\n'
            for changed_file in changed_files:
                contents = get_contents_of_certain_file_in_certain_commit(this_commit_id, changed_file)
                blog_post += '## ' + changed_file + ': ##\n\n'
                blog_post += contents
                blog_post += '\n\n\n\n'

    return blog_post


blog_text = auto_blogplish_blog()
print(blog_text)
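
Each commit is diffed against the one before it, and for a forward diff the older id goes on the left of the `..`. A minimal sketch of that pairing, assuming a linear history that is already sorted oldest first; `pair_with_previous` and `diff_range` are names invented for this example.

```python
def pair_with_previous(commits_oldest_first):
    """Return a list of (older_commit_id, newer_commit_id) for each commit after the first."""
    ids = [commit["commit_id"] for commit in commits_oldest_first]
    # zip stops at the shorter input, so the first commit (nothing before it) is skipped
    return list(zip(ids, ids[1:]))


def diff_range(older_commit_id, newer_commit_id):
    """Build the older..newer range that git diff reads as a forward diff."""
    return "{}..{}".format(older_commit_id, newer_commit_id)
```

Pairing the ids up front keeps the older/newer order in one place, which makes it harder to pass them to the diff function the wrong way round.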

The output is still rough, and the diffs printed out are hard to read. It could use a CLI that takes you through each commit and each file and lets the author pick how to show the changes in the blog. A JavaScript UI might be much easier than a CLI, since you can click what to keep and edit text in place. Overall, in 1 day, after work, with no beer or caffeine in the house, I'd say Servando and I did pretty good.

blogplish.py

import re
import sys
from subprocess import Popen, PIPE

THIS_SCRIPT_NAME = sys.argv[0]


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory):

        call_sp('ls', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
                current_commit_message_string += line

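    # flush the last commit: the loop above only appends when it hits the next header
    if current_commit_id:
        commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})

    return commits_array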


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


def get_diff_of_certain_file_in_certain_commit(newer_commit_id, older_commit_id, filename):
    """
    head diff means how many commits back, as in

        HEAD~3

    means 3 commits back
    """
    # "get dif of a certain file in certain commit": https://stackoverflow.com/questions/42357521/generate-diff-file-of-a-specific-commit-in-git
    command = 'git diff {older_commit_id}..{newer_commit_id} {filename}'.format(older_commit_id=older_commit_id, newer_commit_id=newer_commit_id, filename=filename)
    raw_diff, error = call_sp(command)
    if error:
        raise Exception("Error in get_diff_of_certain_file_in_certain_commit():\n\n" + error)
    return raw_diff


def auto_blogplish_blog():
    blog_post = ""

    output, error = call_sp('git log')

    parsed_commits = parse_git_log_info(output)
    # "reverse a list python": https://stackoverflow.com/questions/3940128/how-can-i-reverse-a-list-in-python
    parsed_commits.reverse()

    for index, commit_data in enumerate(parsed_commits):
        blog_post += commit_data['message']
        blog_post += '\n\n\n\n'
        this_commit_id = commit_data['commit_id']

        changed_files = get_files_that_were_changed_in_commit(this_commit_id)

        if changed_files:
            if index > 0:
                blog_post += '$$$ Diffs of changed files: $$$\n\n'
                for changed_file in changed_files:
                    older_commit_id = parsed_commits[index - 1]['commit_id']
                    # the signature is (newer_commit_id, older_commit_id, filename); swapping them reverses the diff
                    this_diff = get_diff_of_certain_file_in_certain_commit(this_commit_id, older_commit_id, changed_file)
                    blog_post += '## ' + changed_file + ': ##\n\n'
                    blog_post += this_diff
                    blog_post += '\n\n\n\n'

            blog_post += '$$$ Entire contents of changed files: $$$\n\n'
            for changed_file in changed_files:
                contents = get_contents_of_certain_file_in_certain_commit(this_commit_id, changed_file)
                blog_post += '## ' + changed_file + ': ##\n\n'
                blog_post += contents
                blog_post += '\n\n\n\n'

    return blog_post


blog_text = auto_blogplish_blog()
print(blog_text)

Before the blog you're reading was autogenerated for publishing, I commented out the lines that show the diffs. It also lacked some autoformatting, such as code blocks in ``` and pretty-printed filenames:

blogplish.py

import re
import sys
from subprocess import Popen, PIPE

THIS_SCRIPT_NAME = sys.argv[0]


"""
Get all commit info

For each commit in the commit info:
    Add commit message to a final string
    Add changes to final string
    Add entire files that were changed to final string
"""

def call_sp(command, *args, **kwargs):
    """ you can run command from any directory you want by passing in a kwarg of 'cwd' (current working directory).
    any extra positional args are substituted into the command with str.format():

        call_sp('ls {}', '-a', cwd='/home/username/projects/awesomeproject')
    """
    if args:
        command = command.format(*args)
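    # return text rather than bytes on Python 3, so callers can split on '\n'
    kwargs.setdefault('universal_newlines', True)
    p = Popen(command, shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, **kwargs)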
    output, err = p.communicate()
    return output, err


def parse_git_log_info(text_output):
    """ returns a commits_array like:

        [
            {'commit_id': '23hj3sz...', 'message': 'cleanup cruft'},
            {'commit_id': 'df8dje...', 'message': 'Changed paypal api setting to...'},
            ...
        ]
    """
    commit_start_rgx = r"^commit (?P<commit_id>\w{40})"
    lines = text_output.split('\n')
    commits_array = []
    current_commit_id = None
    current_commit_message_string = ""

    for line in lines:
        match = re.match(commit_start_rgx, line)
        if match:
            # this if block fails only once, on the first pass through
            if current_commit_id:
                commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})
            current_commit_id = match.group('commit_id')
            current_commit_message_string = ""
        else:
            if not line.startswith('Author: ') and not line.startswith('Date: '):
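                current_commit_message_string += line

    # the loop only appends a commit once the next one starts, so add the last commit here
    if current_commit_id:
        commits_array.append({'commit_id': current_commit_id, 'message': current_commit_message_string.strip()})

    return commits_array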


def get_files_that_were_changed_in_commit(commit_id):
    # "get files that were changed in a commit": https://stackoverflow.com/questions/424071/how-to-list-all-the-files-in-a-commit
    output, error = call_sp('git diff-tree --no-commit-id --name-only -r %s' % commit_id)
    if error:
        raise Exception("Error in get_files_that_were_changed_in_commit():\n\n" + error)
    changed_files_intermediary = output.split('\n')
    # at first got a result like ['blogplish.py', '']
    changed_files = [this_file for this_file in changed_files_intermediary if this_file]
    return changed_files


def get_contents_of_certain_file_in_certain_commit(commit_id, filename):
    # "get contents of a certain file in a commit": https://stackoverflow.com/questions/2497051/how-can-i-show-the-contents-of-a-file-at-a-specific-state-of-a-git-repo
    output, error = call_sp('git show %s:%s' % (commit_id, filename))
    if error:
        raise Exception("Error in get_contents_of_certain_file_in_certain_commit():\n\n" + error)
    return output


def get_diff_of_certain_file_in_certain_commit(older_commit_id, newer_commit_id, filename):
    """
    returns the diff of filename going from older_commit_id to newer_commit_id.

    commit ids can also be relative refs, as in

        HEAD~3

    which means 3 commits back
    """
    # "get dif of a certain file in certain commit": https://stackoverflow.com/questions/42357521/generate-diff-file-of-a-specific-commit-in-git
    command = 'git diff {older_commit_id}..{newer_commit_id} {filename}'.format(older_commit_id=older_commit_id, newer_commit_id=newer_commit_id, filename=filename)
    raw_diff, error = call_sp(command)
    if error:
        raise Exception("Error in get_diff_of_certain_file_in_certain_commit():\n\n" + error)
    return raw_diff


def auto_blogplish_blog():
    blog_post = ""

    output, error = call_sp('git log')

    parsed_commits = parse_git_log_info(output)
    # "reverse a list python": https://stackoverflow.com/questions/3940128/how-can-i-reverse-a-list-in-python
    parsed_commits.reverse()

    for index, commit_data in enumerate(parsed_commits):
        blog_post += commit_data['message']
        blog_post += '\n\n\n\n'
        this_commit_id = commit_data['commit_id']

        changed_files = get_files_that_were_changed_in_commit(this_commit_id)

        if changed_files:
            # if index > 0:
            #     blog_post += '$$$ Diffs of changed files: $$$\n\n'
            #     for changed_file in changed_files:
            #         older_commit_id = parsed_commits[index - 1]['commit_id']
            #         this_diff = get_diff_of_certain_file_in_certain_commit(older_commit_id, this_commit_id, changed_file)
            #         blog_post += '## ' + changed_file + ': ##\n\n'
            #         blog_post += this_diff
            #         blog_post += '\n\n\n\n'

            # blog_post += '$$$ Entire contents of changed files: $$$\n\n'
            for changed_file in changed_files:
                contents = get_contents_of_certain_file_in_certain_commit(this_commit_id, changed_file)
                # blog_post += '## ' + changed_file + ': ##\n\n'
                blog_post += '`' + changed_file + '`\n\n'
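                # put the fences on their own lines so the first line of the file isn't read as the fence's language tag
                blog_post += "```\n" + contents.rstrip('\n') + "\n```"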
                blog_post += '\n\n\n\n'

    return blog_post


blog_text = auto_blogplish_blog()
print(blog_text)
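
The formatting step in the listing above can be pulled out into a small helper. This is a minimal sketch, assuming Python 3; the names `longest_backtick_run` and `format_file_section` are hypothetical and not part of blogplish.py. It also picks a fence longer than any run of backticks inside the file, since blogplish.py itself contains backticks.

```python
import re


def longest_backtick_run(text):
    """Return the length of the longest run of consecutive backticks in text."""
    runs = re.findall(r"`+", text)
    return max((len(run) for run in runs), default=0)


def format_file_section(filename, contents):
    """Render one changed file as a Markdown filename line plus a fenced code block."""
    # Use a fence longer than any backtick run in the contents, so a file that
    # itself contains backticks cannot close the block early.
    fence = "`" * max(3, longest_backtick_run(contents) + 1)
    body = contents.rstrip("\n")
    return "`" + filename + "`\n\n" + fence + "\n" + body + "\n" + fence + "\n\n"


if __name__ == "__main__":
    print(format_file_section("blogplish.py", 'print("The script is working.")\n'))
```

Inside `auto_blogplish_blog()`, the loop over `changed_files` could then append `format_file_section(changed_file, contents)` to `blog_post` instead of building the fence by hand.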