Kirill Eremenko

If you are reading this article, I imagine that you, like many other data scientists, are wondering which programming language to learn first. Whether or not you have experience with other coding tools, the individual features of R and Python, including their vast arrays of libraries and packages, may initially seem daunting. But don’t worry, we’re here to help!

To no one’s surprise, both R and Python boast their own advantages for a multitude of applications, and both are widely used by professionals across the global data science community. This article will help you decide which has the right tools to get you going.

To begin, it’s a good idea to revisit what exactly you want to use the programming language for in your data science work. For example, a data scientist working predominantly on genetics research may find themselves among R users (it’s heavily used across genetics and popular with bioinformaticians), whereas someone working on models for image analysis, say an employee at Tesla creating self-driving car technology, might find themselves working with people who prefer Python because of its sophisticated image manipulation tools. Ultimately, it’s still your choice. While blindly doing what everyone else is doing is rarely a good philosophy, do take the time to discover why these professionals prefer certain languages. It is important to be able to “speak” the same language as your future peers.

Who uses R and what’s its purpose?

R was initially created as a platform for statistical computing, hosting all the classical tests, time-series analysis, clustering, and more. It has a large community of data miners, which means lots of accessible packages from both R developers and users. For graphics, there is a multitude of packages and layers for plotting and analyzing graphs, such as ggplot2. Importantly, R has also emerged onto the modern artificial intelligence scene, providing tools for neural networks, machine learning, and Bayesian inference, and it is compatible with deep learning packages such as MXNet and TensorFlow. You can read more about these in Quick list of useful R packages.

R has a solid following not only among data scientists but, to a large extent, among statisticians and people in associated fields that require data manipulation (for instance medicine, finance, and the social sciences). For us data scientists, finding a widely used language is important; we want to be able to speak to as many disciplines as possible within one language, so that our findings are easily translatable.

Who uses Python and what’s its purpose?

On the other side of the court, Python is an excellent tool for programmers and developers across the board. Whether you are developing algorithms for simulating biomolecules or delivering anti-spam software, you’ll find yourself at home using its interface and array of functions. First released in 1991 (development began in late 1989), it is often cited as one of the most significant general-purpose object-oriented programming languages. Python has an ever-growing popularity among new programmers (data scientists among them), which of course means a rich community of users and troubleshooters.

Similarly, on the hot topic of artificial intelligence, Python is also the most popular choice; it has tools for machine learning and neural networks, including TensorFlow. Additionally, for more general purposes, its users benefit from libraries such as NumPy for numerical computing, pandas for data preparation and analysis, and seaborn for generating plots.

R vs Python: Limitations

Now to the more interesting part: how do they match up? Uncovering limitations early is possibly one of the most important pieces of advice I can give. I speak from experience here. I once moved from MATLAB, where there is an enormous amount of online support (and usually some wonderful person who has written exactly the code you need), to LabVIEW, where there was little to no online presence. I know all too well the sensation of panicking over a bug I could not solve, and the frustration of not having considered these obvious potential limitations beforehand.

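Some of the main things to consider for a data science application are processing speed, the online community, the learning curve, the user interface, and how widely the language is used in your field.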

Let’s have a look at how each fares on these topics.

Processing speed:

R is considered to be slow. It requires its objects to be stored in physical memory (RAM), meaning it’s not a great option when trying to harness Big Data. That being said, faster processors are reducing this limitation, and there are various packages focused on tackling it. Python, however, is generally considered better suited to large datasets and can load large files faster.
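When a dataset is too large to hold in memory, a common workaround in either language is to process it in pieces instead of loading it all at once. The following is a minimal sketch of that idea using only Python’s standard library; the function name streaming_mean, the column name, and the sample data are hypothetical and exist purely for illustration.

```python
import csv
import io


def streaming_mean(lines, column):
    """Average one numeric column while holding only one row in memory at a time."""
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    for row in reader:
        total += float(row[column])
        count += 1
    # Return NaN rather than dividing by zero when there are no data rows.
    return total / count if count else float("nan")


# Stand-in for a large file; in practice you would pass an open file handle.
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
mean_value = streaming_mean(sample, "value")
print(mean_value)
```

Because the reader yields one row at a time, memory use stays roughly constant however long the file is, at the cost of only being able to compute statistics that can be updated incrementally.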

Online community:

As I mentioned, both R and Python have a widely backed support network for you to reach out to. This is an invaluable source of help for those bugs you just can’t seem to troubleshoot on your own.

Steep learning curve:

This may or may not be considered as a limitation of R, but its steep learning curve is due to its extensive power for statisticians. Being developed by experts in the field, R is an incredible tool, but you pay the price for this with your initial investment of time. On the other hand, Python is very attractive to new programmers for its ease of use and its relative accessibility.

Both languages will require you to get familiar with terminology that may initially seem daunting and confusing (like the difference between a “package” and a “library”). Python’s set-up has the edge over R’s in terms of user-friendliness, which again comes back to R being developed by statisticians and based heavily on its mature predecessor, S. Python, on the other hand, is unrelentingly strict about syntax and will refuse to run code that contains easily missed mistakes (though this strictness pays off in the long run, as it makes us better, neater code writers). R has the lovely attribute, valued by its many academic users, of giving you much more control over the design of your graphics, with a variety of display exports and formats. Importantly, both are interpreted languages, which tends to make spotting bugs much easier than in compiled languages such as C++, because you can run code line by line and inspect the results immediately.
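To illustrate how strict Python is about syntax, here is a small sketch that asks the interpreter to compile two nearly identical snippets without running them. The only difference is the indentation of the loop body. The helper name check_syntax and the two snippets are made up for this example.

```python
# Two versions of the same loop; the second is missing its indentation.
source_ok = "for i in range(3):\n    total = i * 2\n"
source_bad = "for i in range(3):\ntotal = i * 2\n"


def check_syntax(source):
    """Return None if the source compiles, otherwise the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return type(err).__name__
    return None


result_ok = check_syntax(source_ok)
result_bad = check_syntax(source_bad)
print(result_ok, result_bad)
```

The badly indented version is rejected before a single line executes, which is exactly the kind of fault that is easy to miss by eye but that Python will not let through.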

User-friendly interface:

RStudio is widely considered the favorite platform for working in R, and once you begin familiarizing yourself with it, you’ll understand why. It’s classified as an integrated development environment (IDE) and comprises a console for direct code execution along with tools for plotting, interactive graphics, debugging, and workspace management. See RStudio IDE Features for a more detailed guide.

Python has numerous IDEs to choose from. The benefit of this is that you can pick one that feels familiar based on your background. For instance, those coming from a computer science background often favor Spyder, whereas beginners in the field tend to find PyCharm accessible and intuitive.

Widely used:

We’ve touched on this topic already, and I would stress that the answer depends on your chosen field. If you are leaning towards academia, finance, or healthcare, R will most likely be much more widely spoken, and you’ll want to take advantage of that. Those of you interested in software development, automation, or robotics, on the other hand, may find yourselves immersed in the Python community.

R vs Python: Advantages

R:

Python:

With all this in mind, choosing a language to begin with depends heavily on what you want from it. If you are the kind of data scientist who specializes in statistical analysis, or you work in research, you may find R works best for you. However, if you see yourself branching across multiple disciplines, you could make use of Python’s generality and diverse network. You may also agree that it would benefit you to eventually learn both (at least enough to be able to read the other’s syntax) as you get to know each for its respective strengths. This will undoubtedly open more doors for you in terms of landing jobs and, more importantly, give you the clarity to decide what career path you want to take. But don’t be overwhelmed; learning the second language will be easier than the first! You will no doubt also find yourself excited about opening up a whole new community to immerse yourself in as you grow as a data scientist.

Good luck and happy coding!

Page Last Updated: April 2020
