Description

The field of data science offers many interesting challenges, from identifying relevant research questions to selecting features, training models, and interpreting results. All of these tasks, however, depend on a well-organized, structured dataset that can be analyzed and modeled. According to industry experts, collecting and preparing data typically takes up around 75% of an analytical project.
Although this course is called “Data Acquisition,” obtaining data is only the first step. Once we have acquired data, we must organize it into structured formats and often extract meaningful information from it. For example, we may need to distill a user's Twitter feed into a single positive or negative sentiment score. This course will give you the essential skills to collect, structure, consolidate, and extract insights from diverse data sources in preparation for analytical work. Along the way, you will gain experience with tools and technologies including the command line, git, Selenium, Flask, and APIs.
Grading: Exams 60%, HWs 30%, Labs 10%
To pass the course, a student must score at least 50% on the exams.
Instructor: Yannet Interian (yinterian@usfca)
Here is a tentative schedule for the course:
Date | Topic | Notebooks | Homework / Lab |
---|---|---|---|
Aug 25 | HTML & BeautifulSoup | lecture_1_beautifulsoup.ipynb | HW1: BeautifulSoup (due Aug 31) |
Aug 28 | Terminal & command line | | Lab 1 |
Sep 1 | Command line & git | | Lab 2 |
Sep 4 | Labor Day (no class) | | |
Sep 8 | Data formats, XML, XPath | lecture_4_xml_xpath.ipynb, lab_3.ipynb | HW2: Data pipeline (due Sep 14), Lab 3 |
Sep 11 | Text feature extraction | lecture_5_feature_from_text.ipynb | |
Sep 15 | Text manipulation, spaCy | lecture_6_spacy.ipynb, lab_lec_6.ipynb | HW3: TF-IDF (due Sep 21), Lab 4 |
Sep 18 | Exam / Hash table implementations | lecture_7_hashtable.ipynb, lecture_7_dict.ipynb | |
Sep 22 | Scraping with Selenium | | HW4: Hash table (due Sep 28) |
Sep 25 | Selenium, part 2 | | |
Sep 29 | Flask | | HW5: Recommending server (due Oct 5) |
Oct 2 | REST APIs | | |
Oct 6 | End-to-end data science project | | |
Oct 9 | GPT for creating labels | | |
Oct 13 | Exam | | |