katossky / panorama-bigdata Public

Challenges of Big Data

Objectives

understand the real challenges of massive data (big data)
demystify the "cloud" and "big data": many problems are simply ill-posed and need no special treatment
use the command line
use remote computing infrastructure
use distributed computing infrastructure
use stream-processing infrastructure

to be completed

Course structure and session organization

Lectures

Lab sessions (TP)


to be specified

Exams

Proposal from @katossky:

Kahoot quizzes at the start of each lecture and right after the break
multiple-choice quizzes on Moodle after each lab session
an additional exam, to be defined (lab report? written exam? mini-project, e.g. reading an article?)

Bibliography

Karau, H., Konwinski, A., Wendell, P. and Zaharia, M. (2019). Learning Spark. O'Reilly Media.
Official Amazon EMR documentation: https://docs.aws.amazon.com/fr_fr/emr/latest/ReleaseGuide/emr-release-components.html
RStudio documentation for EMR: https://spark.rstudio.com/examples/yarn-cluster-emr/#set-up-the-cluster

Next steps

CM3 (lecture 3)

must have

nice to have

to do next
- re-read Shadi's course and include relevant topics
[ ]

After the course has ended:

clean up directory
update readme

Before next session of the course

Romaric's read
Read books:
- Principles of Distributed Databases
- Distributed Computing for Big Data Analytics
- https://hadoop.apache.org/docs/stable

Possible improvements

include more statistical algorithms
in the computing course, add a reference to pay-as-you-go
add more cloud providers, such as IBM, OpenStack, DigitalOcean...
dif.. scale-in scale-out in the first lecture
make titles consistent
add colors (if possible by CSS) and images
read "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services"
add the distinction between single pass, double pass, etc.
mention the concept of a single point of failure in the introduction to the distributed systems part
