Many people complain that they are not practicing in their academic field or that their degree is essentially useless. I do not feel that way at all. In fact, I am a strong advocate for formal post-secondary education in fields like mathematics, statistics, and computer science for aspiring data scientists. While it is possible for some non-traditional students to excel, and I applaud them for diversifying the field, that path may prove more difficult. My own journey includes a BS in Mathematics and an MS in Statistics, with a mix of applied and theoretical coursework.
Data science is a career where building a solid theoretical and practical foundation is crucial, but it doesn’t stop there. You’ll inevitably encounter gaps in your knowledge, such as tools you haven’t used or frameworks you haven’t learned, and that’s okay. A data scientist is a lifelong student, always expanding their skillset to meet the demands of an ever-changing field.
When you’re a data scientist, do you want to be the one who just runs pre-built models, or the one who deeply understands the tools and processes?
Math and Stats: The Foundation of Data Science
This will probably be the most controversial thing in this guide: data science is a branch of applied mathematics. Some may argue that it is in fact a branch of computer science and I would also agree with them – but I’d remind them that compute is a math term. Without it, it would just be “uter science,” and that’s ridiculous.
So, a data scientist should have a strong mathematical background.
Mathematics
Calculus
Calculus is foundational. Without it, optimizing most models would be impossible. Many models aim to minimize an error or distance function—a basic topic covered in univariate calculus (Calculus I). However, most models involve multiple inputs, and sometimes multiple outputs, meaning their optimization functions are often multivariable (Calculus III).
An introduction to calculus is essential, but equally important is understanding how its logic scales up to higher dimensions. While you likely won’t compute these values by hand, you should grasp why they work and recognize their potential pitfalls. To build a solid foundation, you should aim to complete three semesters or an equivalent of calculus.
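To make the link between calculus and model fitting concrete, here is a minimal sketch of gradient descent on a mean squared error. It fits a line with two parameters, so the error is a function of two variables and the update uses both partial derivatives. The toy data, the function names, the learning rate, and the step count are all illustrative assumptions, not a recommended setup.

```python
# Minimal sketch: fit y ~ w*x + b by gradient descent on mean squared error.
# Data, learning rate, and step count are illustrative assumptions.

def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient(w, b, xs, ys):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


def fit(xs, ys, learning_rate=0.05, steps=5000):
    """Repeatedly step against the gradient, starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradient(w, b, xs, ys)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1 with no noise
w, b = fit(xs, ys)
print(w, b, mse(w, b, xs, ys))
```

One pitfall shows up quickly if you experiment: a learning rate that is too large makes the updates overshoot and diverge instead of settling at the minimum, which is exactly the kind of behavior the calculus helps you anticipate.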
Linear Algebra
Most mathematics problems, especially in applied fields, reduce to linear algebra problems—and this is especially true in data science. Your dataset is essentially a giant matrix of values on which you, the practitioner, perform mathematical functions. This is the essence of what data science is. Computers have simply allowed our matrices to become much larger.
An introduction to linear algebra is essential, covering topics like basic matrix operations, decompositions, and eigen-operations. If possible, beyond an introduction to linear algebra, aim for a few additional courses that either focus on linear algebra or incorporate it extensively, as this foundation will be invaluable in your work as a data scientist.
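As a small illustration of treating a dataset as a matrix, the sketch below stores three observations of two features as rows, forms the product of the transposed matrix with itself, and approximates that product's dominant eigenvalue and eigenvector by power iteration. The helper names, the toy numbers, and the iteration count are assumptions made for the example. In practice you would reach for a numerical library rather than hand-written loops.

```python
# Minimal sketch: a dataset as a matrix, plus one "eigen-operation".
# Helper names, data, and iteration count are illustrative assumptions.

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product of a (n x k) and b (k x m)."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def power_iteration(m, steps=100):
    """Approximate the dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    v = [1.0] * len(m)
    for _ in range(steps):
        mv = matvec(m, v)
        norm = sum(x * x for x in mv) ** 0.5
        v = [x / norm for x in mv]
    # Rayleigh quotient: v is unit length, so v . (M v) estimates the eigenvalue.
    eigenvalue = sum(x * y for x, y in zip(v, matvec(m, v)))
    return eigenvalue, v


# Three observations (rows) of two features (columns).
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]

gram = matmul(transpose(X), X)  # 2 x 2 symmetric matrix X^T X
eigenvalue, eigenvector = power_iteration(gram)
print(gram, eigenvalue, eigenvector)
```

The eigenvector found this way points along the direction in feature space where the (uncentered) data has the most spread, which is the idea that decompositions such as principal component analysis build on.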
Proofs and Logic
Although not a strict requirement, I strongly recommend taking at least one or two proof-based courses. These are the pure math-type courses, such as Advanced Calculus, Real Analysis, Probability Theory, or Mathematical Statistics. While it might feel odd to label Mathematical Statistics as pure math, my advocacy is for the proof-based thought process it cultivates.
Proof-based courses are not only helpful but often essential for developing clear, logical thinking and problem-solving skills, which are invaluable when debugging models or verifying assumptions in data workflows. They teach you how to structure ideas, test each step of an argument, and build confidence in your reasoning, all of which are critical for tackling complex problems in data science.
Statistics
Take as many statistics courses as you can! One of my statistics professors once said, ‘Data science is just statistics done poorly.’ While I strongly disagree with that sentiment, it can become true if you lack the statistical knowledge to recognize and avoid statistical malpractice.
To achieve entry-level proficiency, you’ll likely need at least two or three solid statistics courses. I recommend two applied, calculus-based statistics courses at the junior or senior undergraduate level, along with Probability Theory and Mathematical Statistics. Bonus: the latter also fulfills my earlier recommendation for a proof-based course.
Computer Science
Computer science provides the tools and frameworks that enable data scientists to apply their theoretical knowledge in practical and scalable ways. While mathematics and statistics form the backbone of data science, computer science ensures that these concepts can be implemented efficiently and effectively. Key areas to focus on include programming fundamentals, algorithms and data structures, and databases. These topics are essential for optimizing workflows, handling large datasets, and understanding the computational underpinnings of data science tools and algorithms.
For a well-rounded education, I recommend starting with an Introduction to Computer Science course to build a strong foundation in programming and core computational concepts. From there, a course on Algorithms and Data Structures is critical for solving complex problems and writing optimized code, which is directly relevant to tasks like model training and feature engineering. Lastly, a course on Databases and SQL is indispensable, as most data science workflows involve retrieving, cleaning, and manipulating large datasets stored in relational databases. Like mathematics and statistics, you cannot have too many computer science courses.
While these topics are a bit better suited to self-study compared to mathematics and statistics, formal instruction is still highly valuable. Self-learning often requires more time, persistence, and discipline, making a structured course a worthwhile investment for many learners. Building this foundation will make data science-specific topics, such as machine learning, significantly easier to approach.
Data Science

More schools are now offering courses in data science, analytics, and machine learning. If you have the opportunity, absolutely take them—clear your schedule for them if you’re serious about becoming a data scientist. These courses will introduce you to the practical side of the field and familiarize you with common practices.
However, in my experience, they’re often not as deep or impactful as foundational courses in mathematics, statistics, or computer science. Your mileage may vary, and I still recommend taking them, but remember that the hard part of data science lies in its foundations, not just the practice of applying pre-made tools. If you want to truly be a scientist and not just a tool technician, prioritize building a strong intellectual base.
Degrees

Data scientists can come from a variety of educational backgrounds beyond the core fields of mathematics, statistics, and computer science. Degrees in other areas can help you develop an interesting niche within a specific domain. However, the mathematical demands of data science often make a STEM background essential. While fields like the humanities aren’t entirely out of the question, you’ll need to ensure you complete the critical foundational coursework.
Choosing a degree in math, statistics, or computer science will give you a deeper understanding than the minimums outlined in this guide. I also recommend seeking a balance between theoretical and practical learning. For example, a pure math or classical statistics degree might leave gaps in practical skills, while a computer science degree might lack the statistical depth needed for robust data science practice. In such cases, supplementing your education with targeted courses or self-study will be necessary to ensure well-rounded development.
Nearly all data scientist positions require at least a bachelor’s degree, and many demand a master’s or even a doctorate. While a master’s degree is not strictly necessary, you should plan to pursue one eventually to remain competitive in the field. Ideally, this should be in one of the core fields discussed earlier. Alternatively, if your bachelor’s degree is in a core field, a master’s program could be an opportunity to develop domain expertise and broaden your skill set.
On MOOCs, Certifications, and Bootcamps
MOOCs, certifications, and bootcamps can be valuable resources for certain learners, especially those looking to gain initial exposure to data science concepts. However, I believe it’s important to address a harsh reality: most of these programs are more akin to data analysis bootcamps than true data science training. They often lack the depth in mathematics, statistics, and computer science that is critical for a robust data science foundation.
While it is possible to build a successful career from these resources, it requires a disproportionately high level of additional effort to overcome their limitations, both in hiring and in practical knowledge. Without substantial self-study and supplementary learning, graduates of such programs are often at a severe disadvantage compared to candidates with formal education in STEM fields.
Closing
If you are new, the number of difficult courses may seem daunting. However, this is appropriate for the level of expertise required. If you are not interested in studying this much math, that is okay. There are plenty of data analyst and business analyst positions, many even with the title of data scientist. If you simply do not want to do this in a formal setting, self-study is difficult but not impossible. You will not need to become a calculus computing machine, but you do need to understand the concepts functionally to be a competent scientist.
Becoming a data scientist isn’t about taking shortcuts; it’s about building a foundation that lasts. Whether through formal education or disciplined self-study, the effort you invest into a rigorous path today will pay dividends throughout your career.
In the next, and likely final, post of this series, we will discuss resources for aspiring data scientists.
Let me know your thoughts—how did you build your data science foundation? What challenges have you faced along the way? I’d love to hear your experiences in the comments!