Statistics for Data Science and Analytics
Description
Statistics for Data Science and Analytics is a comprehensive guide to statistical analysis using Python, presenting important topics useful for data science such as prediction, correlation, regression, and data exploration. The authors provide an introduction to statistical science and big data, as well as an overview of Python data structures and operations.
Resources
-
Getting Started with Python
Instructions for installing Python
-
Videos
Videos Mentioned in the Text
-
Try It Yourself
These files contain short answers to the “Try It Yourself” questions in the text.
-
Exercise Answers for Instructors
Please fill out this form to request the answers for each chapter.
-
GitHub Repository
You will find the following files:
- datasets.zip: contains all datafiles used in the book
- notebooks.zip: Jupyter notebooks with code from chapters and the Python sections of each chapter
- python.zip: raw Python files
-
Instructor Digital Evaluation Copy Request
Please complete and submit this form to request your digital evaluation copy.
About Us
-
Peter Bruce
Peter Bruce is the Founder of the Institute for Statistics Education at Statistics.com, a privately owned online educational institution. Since its creation in 2002, the Institute has specialized in introductory and graduate-level online education in statistics, machine learning, data science, optimization, and other subjects in quantitative analytics.
Prior to founding the Institute, in partnership with the noted economist Julian Simon, Peter continued and commercialized the development of Simon's Resampling Stats, a tool for bootstrapping and resampling. In his work at Cytel Software Corp., he developed Box Sampler along similar lines, and helped bring XLMiner, a machine learning add-in for Excel, to market. He is a co-author of Practical Statistics for Data Science, Machine Learning for Business Analytics, and Responsible Data Science. He is also the author of Introductory Statistics and Analytics.
Prior to his work in statistics, Peter worked in the US diplomatic corps as a Foreign Service Officer.
-
Peter Gedeck
Dr. Peter Gedeck holds a Ph.D. in chemistry. He worked for twenty years as a computational chemist in drug discovery at Novartis in the United Kingdom, Switzerland, and Singapore. His research interests include the application of statistical and machine learning methods to problems in drug discovery. He is a scientist in the research informatics team at Collaborative Drug Discovery, which offers the pharmaceutical industry cloud-based software to manage the huge amount of data involved in the drug discovery process.
Peter’s specialty is the development of machine learning algorithms to predict biological and physicochemical properties of drug candidates. His scientific work is published in more than 50 peer-reviewed articles and five books.
Peter is also a lecturer at the University of Virginia's School of Data Science, where he teaches courses in the Master's program.
-
Janet Dobbins
Janet Dobbins is the Chair of the Board of Directors for Data Community DC (dc2), a nonprofit 501(c)(3) organization committed to connecting and promoting the work of data professionals in the National Capital Region by fostering education, opportunity, and professional development through high-quality, community-driven events, resources, products, and services. She worked for nearly twenty years as a Vice President of Strategic Partnerships at The Institute for Statistics Education at Statistics.com. She directed community outreach, communication, and marketing efforts, working with colleges, universities, and industry teams to develop innovative curriculum and help teams acquire necessary skills.
Janet co-organizes monthly Data Science DC meetups and works with the Arlington Tech Program (an alternative Arlington Public High School) to create a mentorship program for girls in grades 9-12 who are interested in STEM.