Cloud computing and the top skills needed by data scientists
To put it simply, ‘cloud’ means ‘the internet’, and ‘cloud computing’ is the delivery of computing services over the internet. Cloud computing lets users rent servers, storage, databases and computing power from cloud providers on a pay-as-you-go basis. Some providers also supply software and analytics on the cloud. According to Microsoft, cloud services fall into four main categories: Infrastructure as a service (IaaS), Platform as a service (PaaS), Serverless computing and Software as a service (SaaS). IaaS is the renting of IT infrastructure, including servers, virtual machines, storage, networks and operating systems. PaaS provides an environment for developing and managing web or mobile software applications. Serverless computing lets developers concentrate on building app functionality while the cloud provider handles the management and maintenance of the infrastructure needed to run it. SaaS supplies software applications on demand over the internet.
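As a rough illustration of the pay-as-you-go idea, the sketch below computes a monthly bill from metered usage. The rates, the `Usage` fields and the `monthly_bill_cents` helper are all hypothetical and chosen only to keep the arithmetic simple; they are not any provider's real prices or API.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Metered consumption for one month (hypothetical units)."""
    vm_hours: int           # hours a rented virtual machine was running
    storage_gb_months: int  # gigabytes stored, averaged over the month


# Hypothetical rates in cents, for illustration only.
VM_RATE_CENTS_PER_HOUR = 10
STORAGE_RATE_CENTS_PER_GB_MONTH = 2


def monthly_bill_cents(usage: Usage) -> int:
    """Pay-as-you-go: the bill is driven purely by what was consumed."""
    compute = usage.vm_hours * VM_RATE_CENTS_PER_HOUR
    storage = usage.storage_gb_months * STORAGE_RATE_CENTS_PER_GB_MONTH
    return compute + storage


if __name__ == "__main__":
    light_month = Usage(vm_hours=100, storage_gb_months=50)
    idle_month = Usage(vm_hours=0, storage_gb_months=50)
    print(monthly_bill_cents(light_month), monthly_bill_cents(idle_month))
```

The point of the sketch is that a month with no compute usage costs only the storage that was kept, in contrast to owning hardware whose cost is fixed regardless of use.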
The major players
The main players in the cloud computing market include Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), IBM Cloud, VMware Cloud, Oracle Cloud and Alibaba Cloud. According to Canalys, a global technology market analyst firm, AWS holds 32.3% of the market, followed by Microsoft Azure with 16.9% and GCP with 5.8%. Alibaba, an emerging Chinese challenger with a 4.9% share, could shake up the pecking order in the coming years. AWS still leads the race, building on its first-mover advantage, having started cloud services in 2006, seven years before its competitors entered the market. In terms of sales growth, however, GCP grew 88% year on year from 2018 to 2019, quickly expanding its footprint in the cloud computing industry.
Cloud computing for Data Scientists
Cloud computing is becoming increasingly vital not just for software developers but also for big data analytics. It makes expanding computing power and deploying data solutions much easier, which is handy for data scientists digging into large datasets. Each of the three major cloud providers offers a set of powerful tools for data scientists.
For AWS, widely known tools include Redshift, EC2, EMR, S3, Data Pipeline and Database Migration Service. Customers include Standard Chartered Bank and S&P Global Ratings (financial services), Skyscanner (travel & hospitality), Nielsen (marketing & advertising), Royal Dutch Shell (energy) and The Guardian (media).
Microsoft Azure, on the other hand, provides Azure SQL, DocumentDB, Azure Table and Azure Blob for data storage, HDInsight as a Hortonworks distribution of Hadoop (including Hive, MapReduce, Spark, etc.) and Azure ML for easy implementation of machine learning algorithms. A plus of using Azure is that all the tools mentioned above can be integrated with Microsoft Excel and Power BI, which makes results easier to visualise and more accessible to people with different levels of technical skill. Customers of its data-related services in the UK include Concentra, NEL and Presence Orb.
For GCP, widely used services include Google BigQuery for data collection and exploration, the Vision, Speech, Translate and Natural Language APIs for data extraction and transformation, Cloud Dataprep, Cloud Dataflow and Apache Beam for data cleansing, Data Studio for visualisation, and TensorFlow for machine learning. Customers in Europe include HSBC (financial services), Sky UK and ITV (media), Philips (manufacturing), and AB InBev, Burger King, Ferrero and Morrisons (retail and consumer goods).
Skills needed for Cloud Computing jobs
According to a study of tech job listings on SimplyHired, Indeed, Monster and LinkedIn in December 2019, demand for cloud platform skills is rising. AWS showed up in around 20% of listings with the keyword ‘Data Scientist’, while Azure appeared in around 10%. A qualified data scientist therefore needs to hone their cloud computing skills and learn to carry out each stage of the data pipeline on the cloud. These stages include data acquisition, data cleansing, data transformation and data mining, as well as model training and testing, using the toolkits provided by the major cloud platforms, especially AWS and Azure.
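To make the pipeline stages concrete, here is a minimal sketch that runs acquisition, cleansing, transformation, model training and model testing locally, using only the Python standard library. It is an illustration under stated assumptions, not a cloud implementation: the inline `RAW_CSV` data is made up and stands in for a file that would normally be pulled from cloud storage, and every function name is hypothetical. On a cloud platform, each stage would typically be handled by one of the managed services mentioned earlier.

```python
import csv
import io
import statistics

# Made-up sample data standing in for a file fetched from cloud storage.
RAW_CSV = """hours,score
1,52
2,54
,60
3,56
4,58
5,60
6,62
"""


def acquire(text):
    """Data acquisition: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Data cleansing: drop rows that have any missing field."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def transform(rows):
    """Data transformation: convert string fields to (x, y) float pairs."""
    return [(float(r["hours"]), float(r["score"])) for r in rows]


def train(pairs):
    """Model training: least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def evaluate(model, pairs):
    """Model testing: mean absolute error on held-out pairs."""
    slope, intercept = model
    errors = [abs(slope * x + intercept - y) for x, y in pairs]
    return statistics.fmean(errors)


def run_pipeline(text, n_test=2):
    """Chain the stages; the last n_test rows are held out for testing."""
    pairs = transform(cleanse(acquire(text)))
    train_pairs, test_pairs = pairs[:-n_test], pairs[-n_test:]
    model = train(train_pairs)
    return model, evaluate(model, test_pairs)


if __name__ == "__main__":
    model, mae = run_pipeline(RAW_CSV)
    print("slope, intercept:", model, "test MAE:", mae)
```

Keeping each stage as a separate function mirrors how the work is usually split across services on a cloud platform, so a single stage can be swapped for a managed tool without touching the others.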