Skip to content

Commit

Permalink
doc: expand coding standards
Browse files Browse the repository at this point in the history
  • Loading branch information
emdupre committed May 13, 2024
1 parent e49e5af commit 87fb4b8
Show file tree
Hide file tree
Showing 2 changed files with 78 additions and 20 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
.DS_Store
.token
/.quarto/
config.toml
Expand Down
97 changes: 77 additions & 20 deletions labguide/research/coding_standards.md
Original file line number Diff line number Diff line change
@@ -1,30 +1,87 @@
# Coding standards

- Code should be readable
Code is a core research product of the lab.
We expect that lab members write code with the intention of it being reviewed (and potentially re-used) by other lab members.

- All lab members should be
familiar with principles of readable coding:
To help in this, all lab members should be familiar with principles of clean code.
Dr. Poldrack has a [presentation on this topic](https://github.com/poldrack/clean_coding/blob/master/CleanCoding_Python.pdf) that we encourage you to review.
For a more in-depth introduction, two particularly useful recommendations here are:
- [Art of Readable Code](https://www.oreilly.com/library/view/the-art-of/9781449318482/)
- [Clean Code](https://www.oreilly.com/library/view/clean-code-a/9780136083238/)

- Code should be modular
- Functions should do a single
thing that is clearly expressed in the name of the function
- Functions should include a
docstring that clearly specifies input and output
When writing readable code, many different design patterns can be followed.
We therefore provide some general purpose recommendations below as well as some Python examples taken from [Dr. Poldrack's clean code tutorial](https://github.com/poldrack/clean_coding/tree/master/python_example).

- Code should be portable
- Any absolute paths should be
specified as a variable in a single location, or preferably as a
command line argument
- Any required environment
variables should be clearly described
- Any non-standard requirements
(e.g. Python libraries not available through PYPI) should be
described with instructions on how to install
We also recommend checking out relevant tutorials, like the [Good Research Code Handbook](https://goodresearch.dev/index.html).

- Important functions should be
tested
## Code should be modular

Write code such that it can be reviewed as individual "units," each of which have one well-scoped function.


- Functions should do a single thing that is clearly expressed in the name of the function
- Functions should include a docstring that clearly specifies input and output

```python
sc=[]
for i in range(data.shape[1]):
if data.columns[i].split('.')[0][-7:] == '_survey':
sc=sc+[data.columns[i]]
data=data[sc]
```

Compare this with a modularized refactoring:

```python
def extract_surveys_from_behavioral_data(behavioral_data_raw):
"""
Extract survey data from behavioral data.
survey variables are labeled <survey_name>_survey.<variable name>
so filter on variables that include "_survey" in their name
Parameters
----------
behavioral_data_raw : pandas.DataFrame
"""
survey_variables = [i for i in behavioral_data_raw.columns if i.find('_survey') > -1]
return(behavioral_data_raw[survey_variables])
```

## Code should be portable

Aim to be able to execute your code on a new machine.

- Any absolute paths should be specified as a variable in a single location, or preferably as a command line argument
- Any required environment variables should be clearly described
- Any non-standard requirements (e.g. Python libraries not available through PyPI) should be described with instructions on how to install

Here is an example of what _not_ to do:

```python
h=read_csv('https://raw.githubusercontent.com/poldrack/clean_coding/master/data/health.csv',index_col=0)[hc].dropna().mean(1)
```

Compare this with a modular, portable refactoring:

```python
# load health data
def load_health_data(datadir, filename='health.csv'):
return(pd.read_csv(os.path.join(datadir, filename), index_col=0))
```

## Important functions should be tested

Functions that are critical for correct outputs should be tested.
At a minimum, unit tests should be written to check that the correctness of outputs.
For example, here is a minimal unit test to ensure that two data frames have the same index:

```python
def confirm_data_frame_index_alignment(df1, df2):
assert all(df1.index == df2.index)
```

More detailed recommendations on testing---including testing frameworks such as PyTest---are available in the [Hitchhiker's Guide to Python](https://docs.python-guide.org/writing/tests/).

## Python packaging

For projects that aim to develop pip-installable packages should follow current best-practices in Python Packing.
As of May 2024, this is outlined in [this blog post](https://effigies.gitlab.io/posts/python-packaging-2023/) by lab member Chris Markiewicz.

0 comments on commit 87fb4b8

Please sign in to comment.