
Freeman-gif/msds692

 
 


Data Acquisition

Description

Data science offers many interesting challenges, from identifying relevant research questions to selecting suitable features, training models, and interpreting results. All of these tasks depend on a well-organized, structured dataset that can be analyzed and modeled. By common industry estimates, collecting and preparing data accounts for roughly 75% of a typical analytical project.

Although this course is called “Data Acquisition,” it is important to recognize that obtaining data is merely the first step. After acquiring the data, we must organize it into structured formats and often extract meaningful insights from raw data. For example, we may need to distill a Twitter feed into a single positive or negative sentiment score for a specific user. This course will provide you with the essential skills to collect, structure, consolidate, and extract insights from diverse data sources, preparing you for effective analytical work. Along the way, you will develop expertise in various tools and technologies, including the command line, git, Selenium, Flask and APIs.
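As a rough illustration of the Twitter example above, the sketch below reduces a list of tweets to one sentiment number per user. It is a minimal sketch, not the course's method: the word lists, the function names `tweet_score` and `user_sentiment`, and the sample feed are all hypothetical, and a real project would likely use a curated lexicon or a trained model.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "sad", "terrible"}


def tweet_score(text):
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def user_sentiment(tweets):
    """Average the per-tweet scores into a single number for one user.

    A value above 0 leans positive, below 0 leans negative.
    An empty feed is treated as neutral (0.0).
    """
    if not tweets:
        return 0.0
    return sum(tweet_score(t) for t in tweets) / len(tweets)


# Made-up sample feed standing in for one user's tweets.
feed = [
    "I love this course, great labs!",
    "Awful traffic today.",
    "Happy with my results.",
]
print(user_sentiment(feed))
```

Averaging keeps the score comparable between users who tweet at different rates; summing instead would favor prolific accounts.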

Grading

Exams 60%, HWs 30%, Labs 10%

To successfully pass the course, a student must achieve a minimum grade of 50% on the exams.
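The weighting and the exam requirement can be worked through with a short sketch. It assumes each component has already been reduced to a single percentage between 0 and 100; how individual exams, homeworks, and labs are combined within a component is not stated here, and the function names are hypothetical.

```python
def weighted_score(exams, hws, labs):
    """Overall course percentage: exams 60%, HWs 30%, labs 10%.

    Inputs are assumed to be percentages in the range 0-100.
    Integer weights are used so round inputs give exact results.
    """
    return (60 * exams + 30 * hws + 10 * labs) / 100


def meets_exam_minimum(exams):
    """Check the stated requirement of at least 50% on the exams."""
    return exams >= 50


# Made-up example: strong homework and lab scores do not offset
# an exam percentage below the minimum.
score = weighted_score(exams=45, hws=100, labs=100)
print(score, meets_exam_minimum(45))
```

In this made-up case the weighted score is 67, yet the exam requirement is not met, which is why the two checks are kept separate.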

Instructor

Yannet Interian (yinterian@usfca)

Syllabus

Here is a tentative schedule for the course:

| Date | Topic | Notebooks | Homework / Lab |
| --- | --- | --- | --- |
| Aug 25 | HTML & BeautifulSoup | lecture_1_beautifulsoup.ipynb | HW1: BeautifulSoup (due Aug 31) |
| Aug 28 | Terminal & command-line | | Lab 1 |
| Sep 1 | Command-line & git | | Lab 2 |
| Sep 4 | Labor Day - No Class | | |
| Sep 8 | Data Formats, XML, XPath | lecture_4_xml_xpath.ipynb, lab_3.ipynb | HW2: Data pipeline (due Sep 14), Lab 3 |
| Sep 11 | Text feature extraction | lecture_5_feature_from_text.ipynb | |
| Sep 15 | Text manipulation, spaCy | lecture_6_spacy.ipynb, lab_lec_6.ipynb | HW3: TFIDF (due Sep 21), Lab 4 |
| Sep 18 | Exam / Hashtable implementations | lecture_7_hashtable.ipynb, lecture_7_dict.ipynb | |
| Sep 22 | Scraping with Selenium | | HW4: Hash table (due Sep 28) |
| Sep 25 | Selenium Part 2 | | |
| Sep 29 | Flask | | HW5: Recommending server (due Oct 5) |
| Oct 2 | REST API | | |
| Oct 6 | End to end Data Science Project | | |
| Oct 9 | GPT for creating labels | | |
| Oct 13 | Exam | | |

