Description Data science offers many interesting challenges, from identifying relevant research questions to selecting suitable features, training models, and interpreting results. All of them depend on a well-organized, structured dataset that can be analyzed and modeled. By common industry estimates, collecting and preparing data accounts for roughly 75% of an analytical project.
Although this course is called “Data Acquisition,” obtaining data is only the first step. After acquiring the data, we must organize it into structured formats and often extract meaningful information from the raw content. For example, we may need to distill a Twitter feed into a single positive or negative sentiment score for a specific user. This course teaches the essential skills to collect, structure, consolidate, and extract insights from diverse data sources in preparation for analytical work. Along the way, you will gain experience with tools and technologies including the command line, Git, Selenium, Flask, and APIs.
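To make the sentiment example concrete, here is a minimal Python sketch of reducing a feed of tweets to one score per user. It is an illustration only: the word lists, the function names `tweet_score` and `user_sentiment`, and the sample feed are all hypothetical, and a toy word-counting approach stands in for whatever method the course actually teaches.

```python
import re

# Hypothetical toy lexicons (assumption); a real project would use a
# curated lexicon or a trained model instead.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def tweet_score(text):
    """Score one tweet in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Tweets with no lexicon words are treated as neutral.
    return (pos - neg) / total if total else 0.0


def user_sentiment(tweets):
    """Average the per-tweet scores into a single number for the user."""
    if not tweets:
        return 0.0
    return sum(tweet_score(t) for t in tweets) / len(tweets)


# Made-up sample feed for one user (assumption, not real data).
feed = [
    "I love this course, the labs are great",
    "Traffic was bad but the talk was good",
    "Reading about data formats",
]
score = user_sentiment(feed)
label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Averaging per-tweet scores keeps one long tweet from dominating the result; other aggregation choices (for example, a majority vote over tweets) would be equally reasonable.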
Grading Exams 60%, HWs 30%, Labs 10%
Instructor Yannet Interian (yinterian@usfca)
Topics
- HTML & BeautifulSoup
- Terminal, command line, and Git
- Pandas data manipulation
- Data Formats, XML, XPath
- Text feature extraction, text manipulation, spaCy
- Scraping the web with Selenium
- Flask
- REST APIs
- End-to-end data science project
- ChatGPT and prompt engineering for creating labels