Go from messy, unstructured artifacts stored in SQL and NoSQL databases
to a neat, well-organized dataset with this quick reference for the busy
data scientist. Understand text mining, machine learning, and network
analysis; process numeric data with the NumPy and Pandas modules;
describe and analyze data using statistical and network-theoretical
methods; and see actual examples of data analysis at work. This one-stop
solution covers the essential data science you need in Python.

Data science is one of the fastest-growing disciplines in terms of
academic research, student enrollment, and employment. Python, with its
flexibility and scalability, is quickly overtaking the R language for
data-scientific projects. Keep Python data-science concepts at your
fingertips with this modular, quick reference to the tools used to
acquire, clean, analyze, and store data.

This one-stop solution covers essential Python, databases, network
analysis, natural language processing, elements of machine learning, and
visualization. Access structured and unstructured text and numeric data
from local files, databases, and the Internet. Arrange, rearrange, and
clean the data. Work with relational and non-relational databases, data
visualization, and simple predictive analysis (regressions, clustering,
and decision trees). See how typical data analysis problems are handled.
And try your hand at your own solutions to a variety of medium-scale
projects that are fun to work on and look good on your resume.

Keep this handy quick guide at your side whether you’re a student, an
entry-level data science professional converting from R to Python, or a
seasoned Python developer who doesn’t want to memorize every function.

You need a decent distribution of Python 3.3 or above that includes at
least NLTK, Pandas, NumPy, Matplotlib, NetworkX, scikit-learn, and
BeautifulSoup. A great distribution that meets the requirements is
Anaconda, available for free from www.continuum.io. If you plan to set
up your own database servers, you also need MySQL (www.mysql.com) and
MongoDB (www.mongodb.com). Both packages are free and run on Windows,
Linux, and Mac OS.
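
One way to confirm that a Python installation meets the requirements above is a short check like the following. It is only a sketch: the import names in REQUIRED (for example sklearn for scikit-learn and bs4 for BeautifulSoup) are the names these packages are commonly imported under, the helper missing_packages is a hypothetical name introduced here, and importlib.util.find_spec needs Python 3.4 or later.

```python
import importlib.util
import sys

# Display name -> import name. The import names are assumptions based on
# how these packages are commonly imported; adjust them if your setup differs.
REQUIRED = {
    "NLTK": "nltk",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "NetworkX": "networkx",
    "scikit-learn": "sklearn",
    "BeautifulSoup": "bs4",
}


def missing_packages(required=REQUIRED):
    """Return the display names of packages that cannot be located."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    # The text asks for Python 3.3 or above.
    if sys.version_info < (3, 3):
        print("Python 3.3 or above is required")
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks each module up without importing it, so it runs quickly and does not fail if a package is present but broken.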

Dmitry Zinoviev has an MS in Physics from Moscow State University
and a PhD in Computer Science from Stony Brook University. His research
interests include computer simulation and modeling, network science,
social network analysis, and digital humanities. He has been teaching at
Suffolk University in Boston, MA since 2001.