chipotle-clustering-challenge

BeCode team challenge

Must-have features

  • A visualisation of the USA with Chipotle locations
  • A visualisation of the different clusters
  • An intrinsic-analysis comparison of the clusters produced by at least 2 methods with varying arguments (using Euclidean distance as the criterion)
  • A chosen centroid to live at. Argue why the chosen centroid is superior to the others. Example arguments:
    • highest density
    • greatest uninterrupted chain of Chipotle locations with the smallest link-to-link distance
    • ...
  • A GitHub page where the results are visualised

Steps

  • Create the repository
  • Install geopandas
  • Plot the US map
  • Visualize your data on this map
  • Plot a dendrogram of your data to help you decide the appropriate clustering resolution
  • Compare and analyse different clustering methods using intrinsic analysis to decide on a method
  • Choose a centroid/address to live at
  • Publish your results to a GitHub page with an explanation of your method

Visualisation

The "chipotle_stores" dataset, loaded as "df"

chipotle_stores

The "states" dataset, loaded using geopandas

states

First, we drew the map of the states using geopandas...

states_map

...in order to be able to plot our points directly on the map.

chipotle_map
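As a rough sketch of how points can be placed on a geopandas map: the store table is turned into a GeoDataFrame of points and drawn on the same axes as the state outlines. The column names `latitude` and `longitude`, the toy rows, and the helper `plot_stores` are assumptions for illustration, not taken from the project code.

```python
import geopandas as gpd
import pandas as pd

# Toy stand-in for df; the "latitude"/"longitude" column names are assumptions.
df = pd.DataFrame({
    "state": ["California", "New York", "Illinois"],
    "latitude": [34.05, 40.71, 41.88],
    "longitude": [-118.24, -74.00, -87.63],
})

# Build point geometries. Note the order: x is longitude, y is latitude.
stores = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",  # plain WGS84 latitude/longitude
)


def plot_stores(states, stores):
    """Hypothetical helper: draw state outlines, then store points on top.

    Assumes `states` is a GeoDataFrame of polygons with a CRS set.
    """
    ax = states.plot(color="white", edgecolor="black", figsize=(12, 7))
    # Reproject the points so both layers share one coordinate system.
    stores.to_crs(states.crs).plot(ax=ax, markersize=4, color="red")
    return ax
```

Passing the `ax` returned by the first `plot` call into the second is what overlays the two layers on a single figure.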

Data analysis

Next, we keep only the relevant states

value_counts

This is easy with the following code, which keeps only the states with more than 20 stores:

df = df.groupby('state').filter(lambda x: len(x) > 20)
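To see what this filter does, here is a small sketch on made-up data. Only the `state` column name comes from the line above; the row counts are invented, and the boolean-mask variant is an alternative shown for comparison rather than something the project uses.

```python
import pandas as pd

# Toy data standing in for the real table (counts are made up).
df = pd.DataFrame({
    "state": ["California"] * 25 + ["Ohio"] * 21 + ["Alaska"] * 2,
})

# Number of stores per state, largest first.
counts = df["state"].value_counts()

# Keep only the rows whose state has more than 20 stores.
filtered = df.groupby("state").filter(lambda g: len(g) > 20)

# Equivalent boolean-mask version, which avoids calling a Python
# function once per group.
mask = df["state"].map(counts) > 20
filtered_fast = df[mask]
```

Both versions keep the original row order and index labels, so the index has gaps after filtering.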

Next, we:

  • Check the number of invalid metric entries.
  • Adjust the index.
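One possible way to carry out these two steps with pandas is sketched below. It assumes that "invalid" means a missing or out-of-range coordinate and that the columns are named `latitude` and `longitude`; both are assumptions, and the rows are toy values.

```python
import numpy as np
import pandas as pd

# Toy rows with a gappy index, as left behind by an earlier filter.
df = pd.DataFrame(
    {
        "state": ["Ohio", "Ohio", "Texas", "Texas"],
        "latitude": [39.96, np.nan, 29.76, 95.0],  # NaN and 95.0 are invalid
        "longitude": [-83.00, -82.99, -95.37, -95.40],
    },
    index=[3, 7, 12, 20],
)

# Count missing values in each coordinate column.
missing_per_column = df[["latitude", "longitude"]].isna().sum()

# A row is valid when both coordinates fall inside their legal ranges.
# between() returns False for NaN, so missing values count as invalid too.
valid = df["latitude"].between(-90, 90) & df["longitude"].between(-180, 180)
n_invalid = int((~valid).sum())

# Drop the invalid rows and renumber the index from 0.
clean = df[valid].reset_index(drop=True)
```

Using `drop=True` discards the old index labels instead of keeping them as an extra column.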

After filtering, our map looks much clearer:

chipotle_map

Dendrogram

chipotle_map
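A dendrogram like this can be computed with SciPy's hierarchical clustering tools. The sketch below uses Ward linkage on a handful of made-up coordinates; the linkage method and the cut into two clusters are assumptions, since the README does not say which settings were used.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Toy (latitude, longitude) pairs forming two obvious groups (made-up values).
coords = np.array([
    [34.05, -118.24], [34.10, -118.30], [33.95, -118.20],
    [40.71, -74.00], [40.75, -73.98], [40.68, -74.05],
])

# Ward linkage on Euclidean distances between the raw coordinates.
Z = linkage(coords, method="ward")

# no_plot=True returns the tree layout without drawing anything;
# drop it (with matplotlib installed) to render the figure.
tree = dendrogram(Z, no_plot=True)

# Cut the tree into at most two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
```

The heights at which branches merge in the dendrogram suggest where to cut: a long vertical gap between merges indicates a natural number of clusters.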

Now let's cluster the data

Clusters

cluster
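The intrinsic comparison asked for in the must-have features can be sketched with the silhouette score, which needs only the points and their labels. The example below compares k-means and agglomerative clustering over a few cluster counts on synthetic blobs; the choice of these two methods, the silhouette metric, and the data are all assumptions, not the project's actual setup.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

# Three synthetic, well-separated blobs of (latitude, longitude) points.
rng = np.random.default_rng(0)
centers = np.array([[34.0, -118.0], [40.7, -74.0], [41.9, -87.6]])
coords = np.vstack([c + rng.normal(scale=0.2, size=(30, 2)) for c in centers])

scores = {}
for k in (2, 3, 4):
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords)
    ag_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(coords)
    # Silhouette ranges from -1 to 1; higher means tighter, better separated clusters.
    scores[("kmeans", k)] = silhouette_score(coords, km_labels, metric="euclidean")
    scores[("agglomerative", k)] = silhouette_score(coords, ag_labels, metric="euclidean")

# The (method, k) pair with the highest silhouette score.
best = max(scores, key=scores.get)
```

Euclidean distance on raw latitude and longitude treats degrees as flat units, which is a simplification that matches the challenge's stated criterion.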

Center of Clusters

Snip20201215_8
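Cluster centres can be computed as the mean coordinate of each cluster, and a simple density proxy can then help argue for one centroid over another. In the sketch below the `cluster`, `latitude`, and `longitude` column names, the toy rows, and the "stores per unit of spread" density measure are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy stores with cluster labels already assigned (made-up values).
df = pd.DataFrame({
    "latitude": [34.05, 34.10, 33.95, 40.71, 40.75],
    "longitude": [-118.24, -118.30, -118.20, -74.00, -73.98],
    "cluster": [0, 0, 0, 1, 1],
})

# Centroid of each cluster: the mean latitude and longitude of its stores.
centroids = df.groupby("cluster")[["latitude", "longitude"]].mean()
sizes = df["cluster"].value_counts().sort_index()

# Euclidean distance from each store to its own cluster centroid.
offsets = (
    df[["latitude", "longitude"]].to_numpy()
    - centroids.loc[df["cluster"]].to_numpy()
)
df["dist_to_centroid"] = np.linalg.norm(offsets, axis=1)
spread = df.groupby("cluster")["dist_to_centroid"].mean()

# Crude density proxy: number of stores divided by their mean spread.
density = sizes / spread
best_cluster = density.idxmax()
```

A small, tight cluster can beat a larger, looser one under this proxy, so it is worth checking the result against the map before settling on a centroid.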

Clusters @ California

ca

Center of Clusters @ California

Snip20201215_7