Books to read if you want to become a data scientist
Moving from academia to industry
If you are a graduate student in a quantitative discipline (engineering, biology, even sociology), you can make it in industry as a data scientist. But you do need a bit of preparation because data science is more than just statistics.
How to prepare for the interview
You have the necessary mathematics and programming skills, but are probably unfamiliar with business problems (e.g., lifetime value), tools (e.g., data warehouses), frameworks (e.g., cloud computing), or visualization methods (e.g., Harvey balls). These are what employers want, and what they will ask you about in interviews. Round out your knowledge, and you will be a strong candidate for a job that typically pays $95k/yr to start.
How? I suggest that you read these five books:
- Effective Pandas by Matt Harrison
- Choose any one of these three books on data warehouses: (a) BigQuery: The Definitive Guide (b) Jumpstart Snowflake (c) AWS Redshift. I prefer BigQuery because it has a great free tier, but then I’m biased.
- Predictive Analytics by Eric Siegel
- Art of Statistics by David Spiegelhalter
- Fundamentals of Data Visualization by Claus Wilke
More details on why I chose these five books are on the Shepherd books site.
Full stack data scientist
While these books will get you in the door at most companies, they are not enough at the highest-paying firms. Those positions require that you know how to operationalize data science and take it from idea to production.
If you want to get an idea of what a “full-stack data scientist” in industry does, read my book “Data Science on the Google Cloud Platform” where I work through a typical business problem from ideation all the way to operationalization.
Even if you are not going to be working in Google Cloud, it should help orient you in a firm that uses AWS or Azure.
From Data Science to Machine Learning
The last few chapters of the book above are about machine learning. That is because the trend in industry is to automate decisions made on the basis of data. The way to do such automation, whether it is for forecasting (such as demand forecasting), personalization (such as recommendation systems), or extracting operational insights (such as document AI or call center AI), is through machine learning.
To go deeper on machine learning, I recommend these books.
Good luck!