AD suggested changes to ReadMe for senior role #4

Open · wants to merge 13 commits into `main`
Conversation

AmandaDoyle (Member)

Happy to talk through reasoning.


@SashaWeinstein left a comment


This does incorporate most of the changes I had in mind. I still think we should make it harder, more confusing, and more repetitive, but I see that you have different thoughts on this.

README.md Outdated

### Task 2: Data Aggregation
To download 311 service request records, write a script that takes two parameters passed from the command line: number of days and responding agency acronym. For example, if a user wanted to get all service request records created in the last week where DSNY is the responding agency, they would pass `7` and `DSNY` as the parameters. For this exercise, we ask that you download all 311 service requests filed in the **last seven days** where **HPD** is the responding agency. Save the data as a csv named `raw.csv` in a folder called `data`.
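A minimal sketch of such a script, assuming the Socrata CSV endpoint for the NYC 311 Service Requests dataset (the dataset ID, field names, and row limit below are assumptions, not part of the challenge text):

```python
import argparse
from datetime import datetime, timedelta

# Assumed Socrata endpoint for the 311 Service Requests dataset
API_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.csv"

def build_params(days, agency, now=None):
    """Build Socrata query parameters for the last `days` days of one agency."""
    now = now or datetime.utcnow()
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%S")
    return {
        "$where": f"created_date >= '{cutoff}' AND agency = '{agency}'",
        "$limit": 500000,  # raise the small default row cap
    }

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Download 311 service requests")
    parser.add_argument("days", type=int, help="look-back window in days, e.g. 7")
    parser.add_argument("agency", help="responding agency acronym, e.g. DSNY")
    return parser.parse_args(argv)

if __name__ == "__main__":
    import pathlib
    import requests
    args = parse_args()
    resp = requests.get(API_URL, params=build_params(args.days, args.agency))
    resp.raise_for_status()
    pathlib.Path("data").mkdir(exist_ok=True)
    pathlib.Path("data/raw.csv").write_bytes(resp.content)
```

Run as, e.g., `python download.py 7 HPD`.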


The reason I would prefer to have them choose the number of days and responding agency is that it has them read the source data themselves and see what the responding agencies are. Having them save multiple files with names of their choice tests their ability to cache well-named files. I would prefer to see `HPD_last_7.csv` and `DOT_last_10.csv`, with the filenames constructed by the Python code, rather than `data1.csv` and `data2.csv`.

Additionally, if we ask them to read the whole challenge before starting, they will know not to choose one day or 1,000 days, as these don't produce such good plots.
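A filename like that can be built directly from the CLI arguments; a minimal sketch (the helper name is hypothetical):

```python
def cache_filename(agency, days):
    """Build a descriptive cache path from the CLI arguments."""
    return f"data/{agency}_last_{days}.csv"
```

For example, `cache_filename("HPD", 7)` yields `data/HPD_last_7.csv`.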


@SashaWeinstein That makes sense to me and tests their data acumen, but I can see @AmandaDoyle's point as well.

Write a process to produce a time series table based on the `data/raw.csv` file we created in **Task 1** that has the following fields:

- `created_date_time`: the timestamp of request creation by date and hour OR just date
- `complaint_type`: the type of the complaint


Having them pass the complaint type as an argument, and having it be optional, tests something that Task 1 doesn't: optional args require a different implementation on both the argparse side and the data processing side.
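A hedged sketch of both sides of that (argument names and row shape are assumptions):

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Aggregate 311 requests")
    parser.add_argument("input_csv", help="path to the cached raw csv")
    parser.add_argument("--complaint-type", default=None,
                        help="optional filter; aggregate every type when omitted")
    return parser.parse_args(argv)

def filter_rows(rows, complaint_type=None):
    """Keep all rows when no type is given, otherwise filter to one type."""
    if complaint_type is None:
        return list(rows)
    return [r for r in rows if r["complaint_type"] == complaint_type]
```

The `default=None` sentinel is what lets the data-processing side distinguish "no filter requested" from a real value.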


### Task 4: Spatial data processing
Create a multi-line plot to show the total service request counts by `created_date_time` for each `complaint_type`. Make sure you store the image of the plot in the `data` folder as a `.png` file.


I think having them produce multiple plots from the multiple `.csv`s they cached is a good test of writing reusable data viz code that sets axes/titles programmatically based on what it's passed.
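A minimal sketch of such a reusable plotting function (the column names and the `source_name` convention are assumptions, not part of the challenge):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def plot_counts(df, source_name, out_path):
    """Multi-line plot of request counts per complaint type.

    Assumes `df` has created_date_time, complaint_type, and count columns.
    The title is derived from `source_name`, so the same function works
    for every cached csv (e.g. HPD_last_7, DOT_last_10).
    """
    fig, ax = plt.subplots(figsize=(10, 5))
    for ctype, grp in df.groupby("complaint_type"):
        ax.plot(grp["created_date_time"], grp["count"], label=ctype)
    ax.set_title(f"311 service requests by complaint type: {source_name}")
    ax.set_xlabel("created_date_time")
    ax.set_ylabel("request count")
    ax.legend(fontsize="small")
    fig.autofmt_xdate()
    fig.savefig(out_path)
    plt.close(fig)
```

Calling it once per cached file keeps the titles and filenames consistent without any copy/paste.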


I agree with Sasha, especially after the work we've been doing with the QAQC app. It's great to be able to communicate effective data viz in succinct code, especially when it comes to the little formatting issues that inevitably come up.

README.md Outdated

At Data Engineering, we enhance datasets with geospatial attributes, such as point locations and administrative boundaries. To help us better understand the data from **Python Task 1**, we would like you to join the initial raw data to an NYC administrative boundary. Then create a choropleth map of the 7-day total count of complaints where `HPD` is the responding agency for a specific `complaint_type` of your choice.
Depending on how you generate the map, you can store the map as a `.png` or `.html` under the `data` folder.


This seems good to me

> Note: Depending on your preference, you can use [Postgres](https://www.postgresql.org/), which is preferred; however, if you are familiar with [SQLite](https://docs.python.org/3/library/sqlite3.html) (much easier to set up and use), you can use that too.
- Set up a PostGIS container using an image. [Here](https://registry.hub.docker.com/r/postgis/postgis/) is the one we use.
- Load the `data/raw.csv` into a database and name the table `sample_311`. Make sure this process is captured in a script.
- Perform the same aggregation as in **Python Task 2** in SQL and store the results in a table (same name as the corresponding csv file).
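A minimal sketch of the SQLite route (the result table name `hpd_last_7` and the hourly `substr` bucketing are assumptions following the naming discussion above):

```python
import csv
import sqlite3

# Aggregation mirroring Python Task 2: counts per hour per complaint type.
# The result table name is assumed to match the corresponding csv file.
AGG_SQL = """
CREATE TABLE hpd_last_7 AS
SELECT substr(created_date, 1, 13) AS created_date_hour,
       complaint_type,
       COUNT(*) AS total
FROM sample_311
GROUP BY substr(created_date, 1, 13), complaint_type
"""

def load_and_aggregate(csv_path, db_path=":memory:"):
    """Load a raw csv into `sample_311`, then build the aggregate table."""
    con = sqlite3.connect(db_path)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        con.execute(f"CREATE TABLE sample_311 ({cols})")
        con.executemany(f"INSERT INTO sample_311 VALUES ({marks})", reader)
    con.execute(AGG_SQL)
    con.commit()
    return con
```

The same `CREATE TABLE ... AS SELECT` statement works nearly unchanged against Postgres; only the timestamp-truncation function would differ (e.g. `date_trunc`).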


Seems good to me; you've read my thoughts on the file name and on having the interviewee find the image themselves.


mbh329 commented Aug 4, 2022

All looks good to me. Language is clear.

AmandaDoyle and others added 12 commits August 8, 2022 11:37
- Only ask for bash scripting as a bonus item for Task 1. The data challenges we got for the data engineering position in the summer/fall of 2022 had a lot of copy/pasted bash code. The challenge will be faster to complete if we only ask for it once.
- I got hung up on how to describe the date filter. I'm not sure if this language is sufficiently clear:
  > Write a Python script that pulls data from the NYC Open Data API based on two filters. The first filter is on responding agency. The second filter is an integer date filter to only get calls `n` days before the current date.
- Add a couple of sentences to remind the interviewee that they need to find new administrative boundaries to aggregate on. I think the original instructions were actually clearer than I assumed, so I'm less sure this upgrade is actually needed. Figured I would let the team give some input.
- Add a second bonus task to SQL/Docker Task 2: push an image with the setup and code to Docker Hub so we can pull it down and run the code.
- Include a list of administrative boundaries that aren't valid choices for Python Task 4.
- Clarified instructions for Python Task 1.
- Clarified the second introduction paragraph.
- Clarified instructions in the Docker bonus task.
@SashaWeinstein

Does it make sense to close this PR now that we know we want to keep the advanced data challenge on a separate branch from main?
