diff --git a/README.md b/README.md
index 7817216..281f5c2 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,6 @@
-# Recipes for data analysis: _An attempt to bring structure to an unstructured activity_
+# Recipes for data analysis
+
+_An attempt to bring structure to an unstructured activity._

 According to most sources, data-analysis is a _well-defined_ process in which specialists (data-analysts) *clean, transform, model and question data* for *helping businesses make intelligent decisions*.

@@ -9,10 +11,14 @@ The lack of documentation leads to large differences in how data-analysts operat

 This project lists a collection of recipes (*do's and don'ts*) for analysts to make them more effective in their data analysis engagements. Listed recipes are grouped into the following four categories:

-1. [General](https://github.com/srctaha/recipes-for-data-analysis/blob/master/1-general.md)
-1. [Software](https://github.com/srctaha/recipes-for-data-analysis/blob/master/2-software.md)
-1. [Statistics](https://github.com/srctaha/recipes-for-data-analysis/blob/master/3-statistics.md)
-1. [Social](https://github.com/srctaha/recipes-for-data-analysis/blob/master/4-social.md)
+1. [General](https://github.com/srctaha/recipes-for-data-analysis/blob/master/general.md)
+1. [Software](https://github.com/srctaha/recipes-for-data-analysis/blob/master/software.md)
+1. [Statistics](https://github.com/srctaha/recipes-for-data-analysis/blob/master/statistics.md)
+1. [Social](https://github.com/srctaha/recipes-for-data-analysis/blob/master/social.md)
+1. [Writing](https://github.com/srctaha/recipes-for-data-analysis/blob/master/writing.md)
+1. [Politics](https://github.com/srctaha/recipes-for-data-analysis/blob/master/politics.md)
+
+`TODO:` Link references to recipes.

 ---

diff --git a/1-general.md b/general.md
similarity index 86%
rename from 1-general.md
rename to general.md
index 8db3d24..08be3dd 100644
--- a/1-general.md
+++ b/general.md
@@ -54,16 +54,6 @@ Define the deliverables of your work as follows:

 TODO: explain the purpose of (3)

-### Build a solid writing habit
-Data analysts often face with the challenge of demonstrating their sophistication as a communicator, which requires them to develop advanced writing skills. To achieve effectiveness in writing, book 15~30 min everyday in your calendar and practice.
-
-### Be thoughtful while using jargon and acronyms in your documents
-> AGC issues are highly associated with SR churn.
-
-Although such beginnings might be common in presentations and emails reporting analysis findings, and although _some_ clients might not have much difficulty in decoding such abbreviations, it would be wise to spell out terms on their first occurrence.
-
-Please consider _future yourself_ as one of the current clients as well, and prepare documents accordingly.
-
 ### Aim high while defining your role as an analyst
 Say you work for an insurance company, and the property insurance team wants you to take a look at existing policyholder information and publicly available flood risk data to investigate if there are things that they need to reconsider in their operations.

@@ -81,4 +71,11 @@ What value do you think you would generate under those 5 scenarios if we would r

 Going from *raw data servicing* to *new product proposals*, your value proposition would increase exponentially. Challenges would definitely increase exponentially too, but do not let those challenges define your role.

+### Extra
+>* Remember that time management is the key
+>* Establish a weekly day to screen and process requests
+>* Give presentations, but find a way to spread knowledge more deeply and continuously
+>* Draw your processes and find a way to attach value to every project
+>* Try to stay lean and simple as long as you can
+
 [Return to README](https://github.com/srctaha/recipes-for-data-analysis/blob/master/README.md)
diff --git a/politics.md b/politics.md
new file mode 100644
index 0000000..7fb58b3
--- /dev/null
+++ b/politics.md
@@ -0,0 +1,11 @@
+## Politics
+
+### Summary
+>* Find friendly managers and start working with them
+>* Start producing value as soon as possible
+>* Communicate everything you do to as many people as possible at every level
+>* Be nice, explain everything clearly and help everyone
+>* Avoid jerks, for real
+>* Find a lightning rod to absorb damage in your stead
+
+[Return to README](https://github.com/srctaha/recipes-for-data-analysis/blob/master/README.md)
diff --git a/references.md b/references.md
index ca64843..6597d98 100644
--- a/references.md
+++ b/references.md
@@ -10,5 +10,6 @@
 1. Cheng-Tao Chu, 2014, "Machine Learning Done Wrong", [Blog-post](http://ml.posthaven.com/machine-learning-done-wrong)
 1. Arthur Charpentier, 2015, "Variable Importance with Correlated Features", [Blog-post](https://freakonometrics.hypotheses.org/20545)
 1. Claudia Perlich, "All the Data and Still Not Enough!" at Strata + Hadoop World in New York, 2014, [Video](https://www.oreilly.com/learning/all-the-data-and-still-not-enough)
+1. "The most difficult thing in data science: Politics", [Blog-post](https://www.rdisorder.eu/2017/09/13/most-difficult-thing-data-science-politics/)

 [Return to README](https://github.com/srctaha/recipes-for-data-analysis/blob/master/README.md)
diff --git a/4-social.md b/social.md
similarity index 100%
rename from 4-social.md
rename to social.md
diff --git a/2-software.md b/software.md
similarity index 100%
rename from 2-software.md
rename to software.md
diff --git a/3-statistics.md b/statistics.md
similarity index 100%
rename from 3-statistics.md
rename to statistics.md
diff --git a/writing.md b/writing.md
new file mode 100644
index 0000000..cc90e71
--- /dev/null
+++ b/writing.md
@@ -0,0 +1,24 @@
+## Writing
+
+### Build a solid writing habit
+Data analysts often face the challenge of demonstrating their sophistication as communicators, which requires them to develop advanced writing skills. To achieve effectiveness in writing, book 15 to 30 minutes every day in your calendar and practice.
+
+### Be thoughtful while using jargon and acronyms in your documents
+> AGC issues are highly associated with SR churn.
+
+Although such beginnings might be common in presentations and emails reporting analysis findings, and although _some_ clients might not have much difficulty in decoding such abbreviations, it would be wise to spell out terms on their first occurrence.
+
+Please consider _your future self_ as one of the current clients as well, and prepare documents accordingly.
+
+### Write in a way that your message is _falsifiable_
+TODO: Explain
+
+### Adverbs are often _unnecessary_. Use them wisely
+> Write with _nouns_ and _verbs_, not with adjectives and adverbs. The adjective hasn't been built that can pull a weak or inaccurate noun out of a tight place. — Strunk and White in "The Elements of Style"
+
+> The adverb is not your friend. — Stephen King in "On Writing"
+
+* Intensifying adverbs such as _very_, _quite_, _incredibly_, and _extremely_ are mostly pointless, as they do not alter the argument.
+* Hedging adverbs such as _almost_, _generally_, _sometimes_, and _usually_ indicate uncertainty.
+
+[Return to README](https://github.com/srctaha/recipes-for-data-analysis/blob/master/README.md)