Data Science is a booming career. The U.S. Bureau of Labor Statistics (BLS) projects 36% growth over the next ten years, with a median income of over $108,000 in the US. With such growth and income potential, it is no wonder people are trying to transition into it. If you’re drawn to technology, problem-solving, and uncovering insights hidden in data, it can be an incredibly fulfilling career with real impacts across multiple industries. However, as rewarding as it is, data science is no easy field to enter. Many underestimate the preparation required, not just to land a job, but to truly excel. Let’s forget the hype and discuss the real, intellectually rigorous path to get you started.
This article is the first in a series exploring the essential foundations for becoming a capable data scientist. In this post, we’ll discuss the essential programming skills every aspiring data scientist needs and what hiring managers should look for.
When you’re a data scientist, do you want to be the one who just runs pre-built models, or the one who deeply understands the tools and processes?
What is a Data Scientist?
Data science is a field that combines mathematics, statistics, computer science, and domain knowledge to inform decision makers and solve problems. Data scientists do not simply apply tools. In fact, it is this distinction that separates scientists from technicians. A technician operates tools without necessarily understanding why they work, but a scientist knows the field. Scientists may not know every intricacy of a tool, but they should be able to explain why they chose it and why it produces the results it does. While tools are critical to the profession, their usefulness depends on proper use and understanding. Furthermore, these tools may change and evolve, but the foundational knowledge endures.
Programming
Basic to intermediate programming is a foundational requirement for the field. While Excel is an important tool and can be indispensable for smaller datasets, quick calculations, and data cleaning, a data scientist needs more robust tools capable of handling large-scale data and complex processes. These tasks are simply beyond Excel’s capabilities. Moreover, Excel lacks many essential machine learning functions, making it unsuitable for advanced data science workflows. Hence, you must learn at least one high-level programming language, most commonly Python or R.

Bilingualism: Python and R
While it’s not imperative to learn both Python and R, I highly recommend it. Being bilingual in programming languages, especially object-oriented ones, allows you to learn new languages more quickly. Knowing both also signals to employers that you’re a competent and versatile programmer.
If you’ve taken an introductory computer science course in Java or a C-family language, you might skip R and focus on Python. Python is the industry standard for data science and is also widely used for general-purpose programming. R, on the other hand, is more specialized for classical statistics. Python’s versatility and extensive libraries make it ideal for machine learning and deep learning applications, while R’s strength in statistical analysis and data visualization caters to different aspects of data science.
Object-oriented programming (OOP) is essential for creating modular, reusable, and scalable code, which makes it invaluable for managing the complexity of large projects. Learning at least one OOP language, such as Python or Java, equips you with the skills to design systems that are both efficient and adaptable to future changes.
Common OOP Languages: Python, Java, C++, C#, Swift, Ruby
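To make "modular and reusable" concrete, here is a minimal sketch in Python. The class names (`Scaler`, `RangeScaler`, `ZScoreScaler`) are hypothetical and written only for illustration; they are not taken from any library. The idea it tries to show is that shared behavior is written once in a base class, and each new technique only has to supply its own details.

```python
from statistics import mean, pstdev


class Scaler:
    """Hypothetical base class: a scaler learns from data, then transforms it."""

    def fit(self, values):
        raise NotImplementedError

    def transform(self, values):
        raise NotImplementedError

    def fit_transform(self, values):
        # Shared behavior is written once here and inherited by every subclass.
        self.fit(values)
        return self.transform(values)


class RangeScaler(Scaler):
    """Rescales values to the range [0, 1] (assumes the values are not all equal)."""

    def fit(self, values):
        self.low, self.high = min(values), max(values)

    def transform(self, values):
        span = self.high - self.low
        return [(v - self.low) / span for v in values]


class ZScoreScaler(Scaler):
    """Centers values on 0 using the population standard deviation."""

    def fit(self, values):
        self.mu, self.sigma = mean(values), pstdev(values)

    def transform(self, values):
        return [(v - self.mu) / self.sigma for v in values]


ages = [20, 30, 40]
range_scaled = RangeScaler().fit_transform(ages)
z_scaled = ZScoreScaler().fit_transform(ages)
print(range_scaled)  # [0.0, 0.5, 1.0]
print(z_scaled)
```

Code that calls `fit_transform` does not need to know which scaler it was handed, so adding a third technique later means adding one small class rather than editing existing code.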
As for SAS, while it’s still used in some industries, it’s not suitable as a data scientist’s only language, partly because it is more procedural than object-oriented. SAS is excellent for basic regression models and provides clean, insightful output for diagnostics, but it struggles to implement cutting-edge techniques like neural networks. While it remains a solid product, it is not widely considered a standard in modern data science.
Programming Language Tips for Hiring Managers
Now that we’ve covered foundational technical skills, let’s discuss how these translate to hiring decisions. When evaluating candidates, prefer those who are multilingual in programming languages. For example, a data scientist proficient in Python, R, and SQL demonstrates not only technical versatility but also an ability to adapt to different tools and environments. Give extra weight to object-oriented languages, since these tend to be more versatile. This adaptability also indicates that the candidate is likely to grow with your team and stay relevant in a rapidly evolving field.
Don’t get too caught up in whether they already know your specific preferred language. If they’ve mastered two, and especially three or more, languages, they likely have the capacity to pick up new ones quickly. Exceptions might include highly specialized tools like SAS, since experience with them does not transfer as well to object-oriented languages. That said, knowing these languages and tools is by no means a liability.
Common Specialized Tools and Functional Languages¹:
SAS, MATLAB, Minitab, R, SPSS, Stata, EViews, SQL
SQL: An Indispensable Tool
While Python and R are essential for analysis and modeling, Structured Query Language (SQL) stands apart as the backbone of data retrieval and is an indispensable skill for any data scientist. It is the most common language for retrieving datasets from databases, whether on a local server or a cloud system. Because database engines optimize queries and run them where the data is stored, SQL is often faster for major dataset manipulations, such as joining or filtering, than tools like Python’s pandas library, which must first load the data into memory. The difference is most noticeable with massive datasets stored in cloud databases, where only the rows you actually need have to leave the database.
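As a rough illustration of joining and filtering inside the database, the sketch below uses Python’s built-in `sqlite3` module with an in-memory database standing in for a real server or cloud system. The `customers` and `orders` tables and their contents are made up for the example. With four rows it says nothing about speed; it only contrasts where the work happens.

```python
import sqlite3

# An in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'West'), (2, 'East'), (3, 'West');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.0), (13, 1, 60.0);
""")

# Approach 1: join, filter, and aggregate in SQL. Only the result rows come back.
query = """
    SELECT c.id, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE c.region = ?
    GROUP BY c.id
    ORDER BY c.id
"""
west_totals = conn.execute(query, ("West",)).fetchall()
print(west_totals)  # [(1, 85.0), (3, 15.0)]

# Approach 2: pull every row of both tables into Python, then join and filter there.
regions = dict(conn.execute("SELECT id, region FROM customers"))
totals = {}
for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
    if regions[customer_id] == "West":
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
client_side = sorted(totals.items())
print(client_side == west_totals)  # True

conn.close()
```

Both approaches give the same answer, but the second one transfers every row of both tables before discarding most of them, which is the cost that grows with table size.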
Unfortunately, SQL seems to be almost entirely overlooked in many post-secondary programs. However, it’s relatively simple to learn. You can easily pick it up on your own or on the job with some preparation. Just make sure you are familiar with its purpose and basic functionality.
Learning Tools for R, Python, and SQL²: DataCamp and Dataquest
SQL Tips for Hiring Managers
While many aspiring data scientists focus heavily on Python and R, SQL remains one of the most essential tools for working with structured data. When onboarding a data scientist, expect to teach entry-level candidates the basics since it is often ignored in schools. SQL can be learned quickly by those with strong programming fundamentals, and its practical applications—like querying databases, joining tables, and optimizing data pipelines—are critical for real-world projects.
Don’t underestimate the importance of SQL proficiency, even for advanced roles. A candidate may have a strong grasp of machine learning or statistical methods but still need to refine their database skills. Encourage learning on the job while ensuring they understand the necessity of this skill.
Final Thoughts for Aspiring Data Scientists
Mastering programming is not just a box to check—it’s a continuous journey that underpins everything you’ll do as a data scientist. Python, R, and SQL form the foundation, but the ability to adapt and grow as new tools and technologies emerge will set you apart. Data science thrives at the intersection of curiosity and capability, and your programming skills will be the bridge to uncovering meaningful insights in data. The more you invest in building these skills now, the more prepared you’ll be to tackle the challenges and opportunities that await.
Final Thoughts for Hiring Managers
When hiring a data scientist, remember that technical skills like programming and SQL are important, but adaptability, problem-solving, and the ability to learn quickly are even more critical. Look for candidates who demonstrate curiosity and a solid foundation in multiple programming languages, and don’t get too caught up in whether they meet every technical requirement upfront. The best hires are those who can grow with your team, bridge the gap between technical expertise and business needs, and continually evolve in a fast-changing field. By focusing on these qualities, you’ll be better equipped to build a data science team capable of delivering real value to your organization.
In my next post, we will examine which classes are essential, and which are optional but valuable. How can you get the most out of formal education?
1. You may reasonably disagree with some of these categorizations. MATLAB, for example, is capable of object-oriented programming.
2. Personal recommendation, not sponsored.