Top 3 Data Science Skills Trending in 2018
Data science jobs didn’t exist just a decade ago, and now they’re the hottest jobs on the market. For the last 3 years in a row, Glassdoor has named data scientist as the “best job” based on the number of job openings, salary, and job satisfaction scores.
With innovations in machine learning and artificial intelligence, the ability to predict the future is what makes data science so powerful for businesses. Machine learning can learn and detect patterns in data that humans can’t discern. For example, by sifting through historical call data and identifying patterns, a model can flag whether an incoming call is likely a scam. You can also use data science to predict how many customers are going to churn in the next month. Data science can be an extremely powerful tool for augmenting the core strengths of a business.
Data science is also an interdisciplinary skill. If you think of a Venn diagram with three circles—machine learning, probability and statistics, and computer science—data science sits at the intersection of these three fields. This interdisciplinary nature of data science means that traditional educational institutions have difficulty keeping up with the rapid changes in this space.
What are the hottest trends and data science skills your employees should stay on top of in 2018?
Python: the fastest-growing programming language
Python is rising in popularity in the world of data science and machine learning. Five years ago, most data scientists gravitated to R as their programming language of choice. But today, Stack Overflow ranks Python as the fastest-growing major programming language, and many universities are switching from teaching Java or C++ to Python because it’s a much easier language to learn and use. For employees who need to acquire Python skills, my course Python for Data Science and Machine Learning Bootcamp on Udemy and Udemy for Business is a good place to start.
Data visualization skills: Plotly and Dash
There’s also a growing number of data visualization code libraries in the Python ecosystem that make it easy for data scientists to create cool graphics. Instead of relying on third parties like Tableau or Chartio, data scientists can use these Python code libraries to visualize their data with fully customized control.
For example, if your sales team wanted to understand customer churn by region, your data science team could visualize this data in a colorful map. Instead of building this graphic using code from scratch, data scientists can rely on popular data visualization libraries in Python such as Matplotlib and Seaborn for static graphics or Plotly for interactive graphics.
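As a rough illustration of the static option, here is a minimal Matplotlib sketch. The region names and churn figures are made up for this example, plot_churn is a hypothetical helper, and it draws a simple bar chart rather than a map to keep the code short.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical churn rates by region (illustrative numbers, not real data)
churn_by_region = {"North": 0.12, "South": 0.08, "East": 0.15, "West": 0.10}


def plot_churn(churn):
    """Draw one bar per region showing its churn rate as a percentage."""
    regions = list(churn)
    rates = [churn[region] * 100 for region in regions]

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(regions, rates, color="steelblue")
    ax.set_xlabel("Region")
    ax.set_ylabel("Churn rate (%)")
    ax.set_title("Customer churn by region (illustrative data)")
    return fig, ax


fig, ax = plot_churn(churn_by_region)
```

Calling fig.savefig("churn_by_region.png") afterward would write the chart to an image file you can share with the sales team.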
In fact, the trend in data visualization is quickly moving toward highly interactive graphics. Plotly released Dash in June 2017. Dash lets you create fully interactive graphics that can be linked to each other (similar to what third-party vendors like Tableau do). You can zoom in and out, or you can input data and get updates in real time. For example, if you’re looking at a global map of customer churn by region, you can then zoom in and see detailed graphs by country, such as the dollar amount lost and the sales reps responsible. Moreover, your data scientists don’t need to program these graphics from scratch or rely on third parties like Tableau. With the Dash code library, they can create beautiful interactive graphics largely by adapting existing example code, and they also have the flexibility to integrate any other Python libraries they use.
In 2018, Plotly and Dash have quickly become highly in-demand skills for creating visualizations with Python. In response to this trend, I’ve already created a course on Udemy on how to use this new technology. You won’t find this kind of training anywhere else, as universities and other learning resources aren’t as agile at keeping up with the latest innovations. But on Udemy and Udemy for Business, your data scientists can stay on top of this new skill with my recently released course Interactive Python Dashboards with Plotly and Dash.
Right now, Plotly offers only one official training course on Dash: a 2-day in-person class that is very expensive. However, my cost-effective course on Udemy and Udemy for Business is self-paced, takes only 10 hours, and delivers the same information in easy-to-understand videos along with a custom corresponding digital book. We also offer hands-on practice assignments by asking students to recreate data visualization dashboards.
Spark: faster and easier for processing large data sets
Although Spark has been around for a while, it’s exploding in popularity in 2018. Why is Spark taking off? Huge data sets can’t be analyzed on a single computer because one machine doesn’t have enough memory or storage to hold them. Let’s say you have a data set that’s 100 gigabytes. Companies like Uber and Walmart analyze substantial data sets like this every day.
Traditionally, to work with data at that scale you had to learn Java and Hadoop MapReduce, sync up several servers, and hope you didn’t make any errors along the way. Apache Spark processes data in memory, while Hadoop MapReduce has to read from and write to disk between steps, making it a much slower process. MapReduce code is also more complex and prone to errors, unlike the simple API calls available with Spark.
Spark was created at the AMPLab at the University of California, Berkeley, and was later donated to the Apache Software Foundation, which maintains it as a fully open source project. By keeping data in memory, Spark can run up to 100x faster than its Hadoop MapReduce counterpart on some workloads. This in-memory processing can help deliver near real-time analytics for data from Internet of Things sensors, social media sites, security analytics, and more.
The truly great aspect of Spark is its accessible API. Often you have to trade off simpler syntax against performance. But in this case, Spark is both faster for many use cases and offers an easier-to-use, more abstract API (in several languages). With Spark, you truly get the best of both worlds.
Does this mean Hadoop MapReduce will become obsolete? I don’t think so: there are many legacy systems running on Hadoop, and there are use cases where you will still want to implement a Hadoop MapReduce-based solution. However, for data science teams that want to stay on the cutting edge, I teach two courses on Spark on Udemy and Udemy for Business: Spark and Python for Big Data with PySpark and Scala and Spark for Big Data and Machine Learning.
Stay agile and keep on learning
With the rapid advances in AI and machine learning, data science is likely to transform into something entirely different in the next 10 years. Your job as a data scientist and as a learning & development leader is to stay on top of these innovations and ever-evolving skills.