Published February 1st, 2016 by Assaf Trafikant
Analyst vs. Data Scientist vs. Big Data: What’s the Difference?
Let’s lay it all out on the table: demand for analysts and data specialists in the digital arena is immense. As for supply, well, that’s a different story. I’m going to try to make some sense of the market, the required knowledge, and how you can attain the desired skills – whether you are a marketer, UX expert, designer, advertiser, statistician, optimizer, or whatever.
I realized we need this article because I get approached by companies that mix up their demands and want their recruits to know everything in every discipline. This usually results in jobs with way too many responsibilities that have nothing to do with one another, designed for a person who doesn’t exist, and the hire eventually turns out to be a big miss. Something along the lines of: “An experienced band seeks drummer who sings opera, with ten years of experience in folk street performance. Knowledge of kitchen building – an advantage.”
The Knowledge Disciplines
A note before we start. There are many terms out there, with plenty of interpretations for each title and role. The terminology keeps changing, so don’t stick to one description, but rather try and understand the business aspects of each discipline, what it contributes to the organization, and what type of knowledge it requires. These are the main positions, but there are plenty of subgroups and niches I won’t get into. To make things even more complicated, many companies stick the Big Data label on everything, as if it were some sort of a spice you have to add to every dish; otherwise, it wouldn’t taste good.
Business Intelligence (BI)
Job Description
The classic, and possibly oldest, role. Imagine a large mobile service provider. Its database contains billions of rows at any given moment. Each call, each text message that’s sent out, bounces from switch to switch. Every charge, every call made to the call center – it’s all documented in a huge, complex database. BI analysts are in charge of analyzing all the data that accumulates in this immense database. Some of them use systems that connect to this database and simplify reporting – Tableau, Business Objects, etc. Most of them need to know SQL and may even have to run queries directly on the database, without the help of additional software. It all depends on how much data the company has and how it’s laid out. BI teams and analysts need to answer questions, produce complex numbers, and present them to the organization’s internal clients. If that sounds just like an analyst – you’re right. In some cases, these two roles have merged into one.
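To make "running queries directly on the database" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `calls` table, its columns, and the sample rows are all made up for illustration; a real carrier's schema would be far larger and more complex.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls ("
    "customer_id INTEGER, call_date TEXT, duration_sec INTEGER, charge_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [
        (1, "2016-01-03", 120, 40),
        (1, "2016-01-15", 600, 200),
        (2, "2016-01-07", 60, 20),
        (2, "2016-02-01", 300, 100),
        (3, "2016-01-20", 900, 300),
    ],
)

# Business question: which customers talked the most in January 2016?
query = """
    SELECT customer_id,
           COUNT(*)          AS calls,
           SUM(duration_sec) AS total_sec,
           SUM(charge_cents) AS total_charge_cents
    FROM calls
    WHERE call_date >= '2016-01-01' AND call_date < '2016-02-01'
    GROUP BY customer_id
    ORDER BY total_sec DESC
"""
january_usage = conn.execute(query).fetchall()
for row in january_usage:
    print(row)
```

The question comes first and the SQL follows from it. Tools like Tableau generate similar queries behind the scenes.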
Another role is the BI Engineer (or BI Developer, or Data Warehouse Engineer): the people in charge of preparing the databases and their structure for the BI teams. They’re the architects who take the databases and create a more convenient and efficient layer on top of them, from which information can be extracted (sometimes known as “cubes”), and eventually provide the BI analysts with a set of tools for executing more efficient queries. They’re also the ones who build BI worlds and develop dashboards, and they are primarily developers.
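A rough way to picture that "more convenient layer" is a summary table built once from the raw rows, so that analysts query the small table instead of the huge one. The sketch below is a simplification under assumed names (`calls`, `usage_by_region_day`); a real cube has more dimensions and is built with dedicated tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw table, invented for this illustration.
conn.execute(
    "CREATE TABLE calls ("
    "customer_id INTEGER, region TEXT, call_date TEXT, duration_sec INTEGER)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [
        (1, "north", "2016-01-03", 120),
        (2, "north", "2016-01-03", 60),
        (3, "south", "2016-01-03", 900),
        (1, "north", "2016-01-04", 600),
        (3, "south", "2016-01-04", 300),
    ],
)

# The engineer's step: roll raw rows up into one row per (region, day).
conn.execute(
    """
    CREATE TABLE usage_by_region_day AS
    SELECT region, call_date, COUNT(*) AS calls, SUM(duration_sec) AS total_sec
    FROM calls
    GROUP BY region, call_date
    """
)

# The analyst's step: answer questions from the small summary table.
north_total = conn.execute(
    "SELECT SUM(total_sec) FROM usage_by_region_day WHERE region = 'north'"
).fetchone()[0]
summary_rows = conn.execute(
    "SELECT COUNT(*) FROM usage_by_region_day"
).fetchone()[0]
print(north_total, summary_rows)
```

With five raw rows the saving is invisible. With billions, a pre-aggregated table can be the difference between a query that returns in seconds and one that doesn't return at all.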
Incidentally, once the digital world became measurable and analytics systems evolved, some of them with external connectivity, some companies and organizations began pulling out information from them and pouring it into their internal databases, using the BI system.
Again, the term BI is pretty broad, but its everyday (and narrow) use usually refers to managing and analyzing internal organizational information. If you want to nitpick, you could say that Google Analytics is also a BI system, in a sense.
Required Knowledge
A BI analyst has to be an SQL whiz with strong analytical skills and have an excellent command of Excel and other BI tools. Development skills – an advantage.
Data Scientist
I can’t give a detailed explanation without someone popping up and saying that’s right, and that’s wrong, and here’s what I missed. It’s uncharted territory and incorporates knowledge from different disciplines, so you can’t say there’s one single kind of data scientist, just like you can’t say there’s only one single kind of software developer. There are dozens, and you have to find the one that fits your needs and infrastructure.
If BI analysts answer questions, then data scientists try to find out the reason why the numbers come out as they do – or in other words, these are the why people. And because they’re the why people, they are statistics-oriented: tons of math, building mathematical models, developing machine learning models, etc.
Data scientists are usually also in charge of connecting between different disciplines, data cleansing, and building an infrastructure that can sustain vast quantities of data processing.
Required Knowledge
Tons of it, depending on the need, existing infrastructure, and type of activity. This is my ranking, and you can argue over it in the comments. It’s a very partial list that doesn’t do the field justice, but you have to cut it somewhere:
- Statistics (the more profound the better)
- Game theory
- Data mining tools – start with SPSS or SAS
- Machine Learning
- Anything you learn in computer science studies, including data communication
- Languages – Python, R, C++, Ruby
- Technologies such as Hadoop and Node.js
- Databases – oh, so many! MongoDB, Cassandra, Redshift. The list is endless.
The number of fields is incredible, infinite, and you don’t need to know everything, for two reasons: one, it’s impossible to be a pro at everything; two, data scientists don’t usually work alone, and each member of the team contributes their part.
What I didn’t include in this list but is always good to have is a sense of business. Research is based on understanding the company’s business model and product, the areas that need to be researched, and the contribution of that information to improving the entire ecosystem (and bottom line).
How To Become A Data Scientist?
You can’t argue with the fact that data scientist is the most sought-after job in the world of products and technology today, even more than talented developers, in some senses. But not only is the pool extremely small – sometimes it is downright nonexistent. There are several reasons for this. We’ve seen that the fields of expertise are many. A Ruby developer won’t necessarily know Python, and an SQL whiz won’t necessarily have mastered R. There are some fantastic algorithm and statistics experts out there, but their programming abilities are at the pseudo level (pseudocode that doesn’t run but rather represents a working principle), and they require developers to back them up. Whatever the case may be, it’s hard to find one person to answer all of those needs. If you want to be a highly sought-after, successful data scientist, roll up your sleeves and add more skills, such as programming languages, on top of the core studies (computer science, math, statistics). In terms of frameworks and technologies, join Big Data groups, forums, and meetups and learn from them. I’ll try to crack the whole Big Data thing by the end of this article.
Digital Analysts
There are two types of analysts. The first, irrelevant to this article, are those who handle data collection, market research, financial analysis, risk assessment, etc. They usually work for retail, finance, or consulting firms. The second type is digital analysts, and they’re the reason we’re here.
Digital analysts do the following:
- Determine what needs to be measured, and help implement (or actively implement) the measuring tool (Google Analytics, Mixpanel, Adobe Analytics, and more).
- A/B testing or multivariate testing.
- Competitive analysis
- Ad campaign performance analysis.
- User behavior analysis
- UX analysis
- Be fluent in “product” lingo
- Give recommendations for optimizing the site/product/process/campaign based on empirical data, points of reference, tracking, etc.
- Ongoing analyses, building dashboards and reports
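As a small illustration of the ad campaign performance analysis mentioned in the list above, here is a sketch that computes three common metrics: click-through rate, conversion rate, and cost per acquisition. The `Campaign` class, the campaign names, and all the numbers are invented for the example; in practice the figures would come from your ad platform's reports.

```python
import math
from dataclasses import dataclass


@dataclass
class Campaign:
    """Hypothetical campaign record; the field names are illustrative."""
    name: str
    impressions: int
    clicks: int
    conversions: int
    cost: float

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks per impression."""
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def conversion_rate(self) -> float:
        """Conversions per click."""
        return self.conversions / self.clicks if self.clicks else 0.0

    @property
    def cost_per_acquisition(self) -> float:
        """Cost per conversion; infinite when nothing converted."""
        return self.cost / self.conversions if self.conversions else math.inf


campaigns = [
    Campaign("social_video", 50_000, 500, 10, 300.0),
    Campaign("search_brand", 10_000, 500, 25, 250.0),
]

# Rank campaigns from cheapest to most expensive acquisition.
ranked = sorted(campaigns, key=lambda c: c.cost_per_acquisition)
for c in ranked:
    print(
        f"{c.name}: CTR {c.ctr:.2%}, "
        f"CR {c.conversion_rate:.2%}, CPA {c.cost_per_acquisition:.2f}"
    )
```

Both made-up campaigns bring the same number of clicks, yet one converts far better per dollar. That gap is what an analyst is expected to spot and explain.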
Required Knowledge
If you want to be a good analyst or Conversion Rate Optimizer, here’s what you have to know (in order of importance):
- Be versed in the digital world. It’s hard to study analytics in a vacuum. The world of digital marketing has its own lingo, and every analyst must know what SEO is and how it’s done, what inbound marketing is, how to do social, and how a Google or Facebook campaign is built. The only way to learn (apart from courses) is through doing, actively participating in groups, and continuous learning. I participate in dozens of groups and forums, ranging from web design to UX, Google advertising, growth hacking, product management, and many other topics.
- Speak product. What’s a process? Who’s the client? How do you create a description, how do you characterize, what’s UX, how does product development work, how is a product made and managed, and understanding the business behind it. This is the second foundation. Without the two, you’re working in a useless vacuum.
- Google Analytics. Knowing everything from the bottom to the top. Advantages, disadvantages, how it works, behind the scenes, data collecting capabilities, techniques, and methodologies – everything this tool has to offer, including campaign optimization, UX analysis, traffic analysis, segmentation, understanding user behavior, and the tool’s capabilities.
- Additional tools. Knowledge of statistics, full command of Excel.
- A/B testing tools. The two I like to work with are Optimizely and Visual Website Optimizer (VWO). You should become versed in one of them, as well as in the methodology. You can also start with Google Optimize.
- Be familiar with Google Ads and Facebook Ads.
- Google Tag Manager. A useful bonus, but mostly for development-oriented analysts.
- Video tracking tools. Heatmap and video tracking tools such as Hotjar, Crazy Egg, and others. You should know at least one of them well.
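On the A/B testing methodology mentioned in the list above: the testing tools do the statistics for you, but it helps to know roughly what sits underneath. One common approach (an assumption here, not necessarily what Optimizely or VWO use internally) is a two-proportion z-test. A minimal sketch with made-up visitor numbers:

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Uses the pooled-proportion z-test; reasonable for large samples only.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: variant A converts 200 of 10,000 visitors, B 260 of 10,000.
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be pure chance. It says nothing about whether the test was set up properly, which is where the methodology (and the analyst) comes in.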
How To Become An Analyst Without Any Real Experience
It’s a problem shared by almost all professions. The answer is always the same, and it’s a two-parter.
Few are willing to “pay” for an inexperienced employee to study on the job. It’s a risk: you invest in someone and teach them what they need, but they could leave after only a few months, and the whole thing turns out to be a financially misguided move. The solution is to agree on a training period. An internship, if you will. The employer will share everything you need to know and put time and effort into training, and in return, you will be paid a lower salary than the standard. This is common in other fields, too (law, accounting, medicine). Want skills? The tuition has to be shared by both parties – the employer takes on less risk, and you get to know the system from the inside and gain experience.
The second option is less friendly, and mostly requires some guts (it’s not as hard as it seems): go solo. No, I don’t mean you have to start your own company (but small business is a good start) and start advertising on Google. Offer your services to agencies at an intern fee. Something like:
Hi, my name is [—] and I’m starting with Google Tag Manager. I learned the basics at [—] and took an online course at [—]. I did some cool stuff like [—], but still don’t have enough experience to work as an analyst. That’s why I’m offering my services in small tasks such as [—], for a friendly fee of [—] per hour. Jack, your next analyst
You can contact potential employers directly, go to agencies, or spread the word online, and hope to hit the right person. You probably will.
And Now: Big Data
This is, without a doubt, the most overused buzzword of the decade. I’d even go as far as saying that it’s being abused, so let’s set things straight. Big data refers to the management of massive quantities of data in an organization. And by massive, I don’t mean a website or an app with a million hits per day, but more like 100 million. At least. Think Instagram, Facebook, and other networks (even Fiverr). Such a volume of data requires several things of the organization:
- Infrastructure – servers, networks, extensive use of cloud services. Not much to go into here.
- Databases or environments that can hold such data quantities in structures that allow quick analysis and processing. And no, a conventional SQL database won’t cut it: complex queries could take three days to process. There are specially structured databases, or ones with a smart management layer for swift and efficient data extraction – a unique, decentralized data structure instead of tables. I’m talking Amazon Redshift, MongoDB, Hadoop, and others. Most major companies invest millions in developing innovative data environments (which later become official products).
- Data extraction and management layer – how do you run a query over 100 million entries? How do you build a table with 100 columns? That’s why we have different information management systems that can access databases and efficiently execute actions based on complex algorithms. Some big names on the scene: Hive, Splunk, Pig, and others.
- Interface – at the end of the day, you need a system to present all that data, right? The organization develops a control system with dashboards and analysis tools to be used with these environments. This could be an off-the-shelf solution that interacts with the database, or the organization might create one from scratch.
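To give a feel for how a query over 100 million entries can be made manageable, here is a toy sketch of the split-and-merge idea that frameworks such as Hadoop popularized: summarize small chunks independently, then combine the partial results. This is not how Hive, Pig, or any specific product is implemented; the function names, the event format, and the data are all made up for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[str, int]  # (country, hits): a hypothetical event format


def read_in_chunks(events: Iterable[Event], chunk_size: int) -> Iterator[List[Event]]:
    """Yield the events in fixed-size chunks so no step needs them all at once."""
    chunk: List[Event] = []
    for event in events:
        chunk.append(event)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def summarize_chunk(chunk: List[Event]) -> Counter:
    """'Map' step: total hits per country within a single chunk."""
    totals: Counter = Counter()
    for country, hits in chunk:
        totals[country] += hits
    return totals


def merge_summaries(summaries: Iterable[Counter]) -> Counter:
    """'Reduce' step: add the per-chunk totals together."""
    merged: Counter = Counter()
    for summary in summaries:
        merged.update(summary)
    return merged


events = [("US", 3), ("IL", 1), ("US", 2), ("DE", 5), ("IL", 4), ("DE", 1), ("US", 1)]
totals = merge_summaries(
    summarize_chunk(chunk) for chunk in read_in_chunks(events, chunk_size=3)
)
print(dict(totals))
```

In a real big data environment, each chunk would live on a different machine and the summaries would be computed in parallel. The merging logic stays conceptually the same.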
In between are hundreds of products and technologies that make work more efficient and solve big data management problems, and it seems that not a day goes by without some new big data technology popping up. The field develops rapidly – too rapidly – and causes headaches for companies that realize, only one year after adopting the latest technology, that its creator has decided to change course and they now need to switch to a new platform.
What Does Big Data Have To Do With Digital Analysts?
Google Analytics and co. aren’t designed to handle masses of data (not even Google Analytics’ premium version). Moreover, these tools aren’t good for the highly complex queries needed by a company that produces millions of hits per day. So companies develop a big data environment, but at the end of the day, an analyst and/or a data scientist is needed to work it and make the most of the data. For most analysts, it doesn’t matter whether or not the system is based on big data technology. It’s all transparent to them. Give them an interface and the ability to analyze data, and it’s all peachy. Data scientists will dive deeper, but they, too, use the data management layer to run complex queries. Neither has to know if it’s Redshift or any other big data infrastructure. Keep infrastructure, management layer, and analysis requirements separate.
What Kind Of An Analyst Am I?
Knowing many systems is excellent, but a good analyst isn’t measured by the number of systems they’ve worked with. They’re measured by their analytical skills and understanding of the analytics world – what can be measured, what analyses can be performed, where you should dig, and where you might find gold. Here, for example, we simply have good analysts. It doesn’t matter how much they rock Google Tag Manager or how many hours they’ve clocked in R. You can learn anything relatively easily. The difference is what you have in your noggin and how you understand the methodology, your dedication, thoroughness, brightness, and, yes, experience, and the ability to sit down and learn.
I know a few data scientists who’ve studied the digital world and become experts at Google Analytics. I know several Google Analytics pros who’ve studied R and statistics to become better (myself included).
A Word To Recruiters
There aren’t many jobs out there, and plenty of companies take the “catch-all” approach and publish one position that seems to incorporate every item covered here. So please, recruiters – separate the “must” from the “nice to have.” Don’t just say “big data” without explaining what you mean and how relevant it is to the job.