Data Wrangling Syllabus

DATA WRANGLING

UNIT-1

Introduction to Data Wrangling: What Is Data Wrangling? - Importance of Data Wrangling - How Is Data Wrangling Performed? - Tasks of Data Wrangling - Data Wrangling Tools - Introduction to Python - Python Basics - Data Meant to Be Read by Machines - CSV Data - JSON Data - XML Data.

UNIT-2

Working with Excel Files and PDFs: Installing Python Packages - Parsing Excel Files - Getting Started with Parsing - PDFs and Problem Solving in Python - Programmatic Approaches to PDF Parsing - Converting PDF to Text - Parsing PDFs Using pdfminer - Acquiring and Storing Data - Databases, A Brief Introduction - Relational Databases, MySQL and PostgreSQL - Non-Relational Databases, NoSQL - When to Use a Simple File - Alternative Data Storage.

UNIT-3

Data Cleanup: Why Clean Data? - Data Cleanup Basics - Identifying Values for Data Cleanup - Formatting Data - Finding Outliers and Bad Data - Finding Duplicates - Fuzzy Matching - RegEx Matching - Normalizing and Standardizing the Data - Saving the Data - Determining Suitable Data Cleanup - Scripting the Cleanup - Testing with New Data.

UNIT-4

Data Exploration and Analysis: Exploring Data - Importing Data - Exploring Table Functions - Joining Numerous Datasets - Identifying Correlations - Identifying Outliers - Creating Groupings - Analyzing Data - Separating and Focusing the Data - Presenting Data - Visualizing the Data - Charts - Time-Related Data - Maps - Interactives - Words - Images, Video, and Illustrations - Presentation Tools - Publishing the Data - Open Source Platforms.

UNIT-5

Web Scraping: What to Scrape and How - Analyzing a Web Page - Network/Timeline - Interacting with JavaScript - In-Depth Analysis of a Page - Getting Pages - Reading a Web Page - Reading a Web Page with LXML - XPath - Advanced Web Scraping - Browser-Based Parsing - Screen Reading with Selenium - Screen Reading with Ghost.Py - Spidering the Web - Building a Spider with Scrapy - Crawling Whole Websites with Scrapy.


TEXT BOOKS:

1. Jacqueline Kazil & Katharine Jarmul, “Data Wrangling with Python”, O’Reilly Media, Inc., 2016.
2. William McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, 2nd Edition, O’Reilly Media, Inc., 2017.
