How big data tool experience tracks with tech salaries
Data analysts and engineers who know how to use the advanced, recently developed tools of big data typically can pull in high salaries.
Big data pays big money.
That’s the conclusion of a couple of recent surveys that found that data analysts and engineers with big data chops are earning more than $120,000, compared with the reported average IT salary of $89,450. And Storm and Spark users can pull down $150,000, compared to the median total salary of all data analysts and engineers of $98,000.
"Big data made a big showing last year and we're seeing it this year too," said Shravan Goli, president of tech job consultancy Dice.com, in a statement. "Tech professionals who analyze and mine information in a way that makes an impact on overall business goals have proven to be incredibly valuable to companies. The proof is in the pay."
And while that is surely good news for data scientists in the financial and marketing sectors, government agencies are getting pinched. Like private-sector enterprises, they see there are insights and efficiencies to be had through analysis of big data, but agencies can’t compete on salary.
The median total salary of government data analysts and engineers was significantly lower, by approximately $17,000, than median salaries earned by data analysts and engineers across other industries, according to a recent salary survey by O’Reilly Media, which also analyzed the tools used by data professionals. Unsurprisingly, respondents who work for government vendors reported higher salaries.
Other contributing salary factors included age, gender, years in the field, employee level, degrees held and usage of cloud technology. Among O’Reilly’s findings:
- Every year of age added $1,100, with an additional $1,400 for every year of experience working in data.
- Women earned a median of $13,000 less than men, a number consistent with the general U.S. population.
- Those with doctorates earned $11,000 more, and every position increase added an average $10,000 to salary.
- Those using cloud technology earned $13,000 more than those who didn’t.
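As a rough illustration of how these figures add up, the sketch below sums the reported adjustments for a hypothetical respondent. The function name, its parameters and the choice to simply add the adjustments together are assumptions made for illustration. The survey's baseline salary and full model are not given here, so the result is only an offset from an unspecified baseline, and the gender gap, which is reported as a difference in medians, is left out.

```python
# Illustrative only: adds up the per-factor figures quoted above.
# The additive form and all names here are assumptions, not O'Reilly's model.
PER_YEAR_OF_AGE = 1_100
PER_YEAR_OF_DATA_EXPERIENCE = 1_400
DOCTORATE = 11_000
PER_POSITION_INCREASE = 10_000
CLOUD_USER = 13_000


def salary_offset(extra_years_of_age, extra_years_in_data,
                  has_doctorate=False, position_increases=0, uses_cloud=False):
    """Return the dollar offset relative to an unspecified baseline respondent."""
    offset = extra_years_of_age * PER_YEAR_OF_AGE
    offset += extra_years_in_data * PER_YEAR_OF_DATA_EXPERIENCE
    offset += DOCTORATE if has_doctorate else 0
    offset += position_increases * PER_POSITION_INCREASE
    offset += CLOUD_USER if uses_cloud else 0
    return offset


# Five years older, five more years in data, one level up, uses cloud tools.
print(salary_offset(5, 5, position_increases=1, uses_cloud=True))  # 35500
```

Under these assumptions, the cloud and seniority adjustments outweigh several years of age and experience combined.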
High-end, high pay
Data engineers who have experience with Storm and Spark earn the highest median salaries, according to O’Reilly.
Apache Storm is a distributed, fault-tolerant, real-time computation system for processing large volumes of high-velocity data. Its speed makes it useful for real-time analytics, machine learning and continuous computation.
Apache Spark is a big data processing framework that improves on traditional Hadoop-based analytics. It uses in-memory primitives and other enhanced technologies to outperform MapReduce and offers more computational options, with tool libraries for enhanced SQL querying, streaming data analytics, machine learning and more.
Other high-salary tools were IBM’s Netezza, Cassandra, Amazon Elastic MapReduce, Homegrown (avt), Pig, Hortonworks, Teradata and HBase (all with median salaries over $130,000).
The more tools a data professional used, the higher the salary. Those using up to 10 tools earned a median salary of $82,000, rising to $110,000 for those using 11 to 20 tools and $143,000 for those using more than 20.
The tools most typically used by respondents were programming languages, databases, Hadoop distributions, visualization applications, business intelligence programs, operating systems or statistical packages.
Aside from operating systems, SQL was the most commonly used tool, followed by Excel, with R and Python close behind. Over 50 percent of respondents used each of these top four data tools, followed by Java and JavaScript with 32 percent and 29 percent respectively. MySQL was the most popular database, closely followed by Microsoft SQL Server.
The study also looked at tools commonly used together and tried to determine the relationship between tool clusters and salaries.
These clusters were:
- Cluster 1: Windows; C#; SPSS; Visual Basic, VBA; SQL; Business Objects; Oracle BI; PowerPivot; Excel; Oracle; SAS; Microstrategy; MS SQL Server.
- Cluster 2: Linux; Java; Redis; Hive; Amazon Elastic MapReduce (EMR); MongoDB; Homegrown ML Tools; Storm; Cloudera; Apache Hadoop; Hortonworks; Spark; MapR; Cassandra; HBase; Pentaho; Mahout; Splunk; Scala; Pig.
- Cluster 3: Python; R; Matlab; Natural Language/Text Processing; Continuum Analytics (NumPy + SciPy); Network/Social Graph; libsvm; Weka.
- Cluster 4: Mac OS X; JavaScript; MySQL; PostgreSQL; D3; Ruby; Google Chart Tools/Image API; SQLite.
- Cluster 5: Unix; C++; Perl; C.
After discarding Clusters 4 and 5 because they were not significant indicators of salary, O’Reilly determined that users of Cluster 2 and Cluster 3 tools earn more, with each tool from Cluster 2 contributing $1,645 to the expected total salary and each tool from Cluster 3 contributing $1,900.
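To make the per-tool figures concrete, here is a minimal sketch that totals the contribution for a given mix of Cluster 2 and Cluster 3 tools. The function and dictionary names are hypothetical, and treating the figures as a simple sum is an assumption; the report's full model includes other factors not reproduced here.

```python
# Illustrative only: per-tool figures quoted from the article.
# Treating them as a simple sum is an assumption, not the report's full model.
PER_TOOL = {"cluster_2": 1_645, "cluster_3": 1_900}


def tool_contribution(tool_counts):
    """Sum the per-tool contributions for Cluster 2 and Cluster 3 tools.

    tool_counts maps a cluster name to how many of its tools a person uses;
    clusters with no figure quoted in the article are ignored.
    """
    return sum(PER_TOOL.get(cluster, 0) * count
               for cluster, count in tool_counts.items())


# Someone using Spark, Storm and Hive (Cluster 2) plus Python and R (Cluster 3).
print(tool_contribution({"cluster_2": 3, "cluster_3": 2}))  # 8735
```

In this example, three Cluster 2 tools add $4,935 and two Cluster 3 tools add $3,800 to the expected total salary.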
The report confirms trends that have been evolving for some time: Hadoop is on the rise, cloud-based data services are important and those who know how to use the advanced, recently developed tools of big data typically earn high salaries.
“For future research we would like to drill down into more detail about the actual roles, tasks, and goals of data scientists, data engineers, and other people operating in the data space. After all, an individual’s contribution – and thus his salary – is not just a function of demographics, level/position, and tool use, but also of what he actually does at his organization,” noted John King and Roger Magoulas, writers of the report.