Randstad Groep · Location: Diemen · 17 August 2021
Are you an experienced Data Engineer looking for an international, creative and innovative environment? Would you like to work on a self-service data platform, making sure our data makes its way from a vast array of sources to the right place?
At the IT Department of Randstad Groep Nederland (HQ) we are looking for you! We’re looking for a Senior Data Engineer available to join our internal team immediately.
Data Engineering at Randstad Groep Nederland (HQ)
As a member of the DataHub Team you are responsible for the development and maintenance of the Randstad data lake and the services offered to data scientists and data analysts.
The DataHub Team uses a variety of technologies and is responsible for its own infrastructure.
We provide a platform that distributes data to data scientists and analysts across the organization, so they can make use of all the data generated within Randstad Groep Nederland.
You will be part of an agile team and play a vital role in the design and development of a cloud-based data platform.
Build and manage the DataHub, which includes:
A front-end data catalog
DataHub users and data science projects management
An AWS S3-based data lake
Develop ways to improve self-service data consumption and data publishing:
Build and manage ETL pipelines in Airflow, which are responsible for ingesting the data and making it available to users
Develop standard ways to deliver data in the DataHub
Develop CI/CD pipelines for data consuming teams to let them develop their products
You will be responsible for producing quality code and reusable components.
Using containerization, CI/CD and other automation technologies, you will be responsible for creating a backend that is highly available and scalable, while remaining secure and easy to deploy and manage.
Together with the rest of the team you will be involved in the full product development process, from design and implementation to testing, documentation and automated deployment.
Respond to and resolve operational incidents, performing root cause analysis and managing changes required to prevent future occurrences.
In this team you will have a wide range of responsibilities and should be willing to adapt to many different challenges.
Discuss requirements and future improvements with the users of the platform, and come up with proposals for how they can use it.
Manage and develop our data persistence environments (data lake, storage, etc.) to ensure that data is properly secured and available to users.
Monitor systems for uptime and performance.
The data lake we maintain is currently partly in Redshift and is moving fully to S3. The new S3 data lake will be accessed through the Trino query engine, which runs on an auto-scaling EKS cluster, while raw data is ingested via Spark on an EMR cluster that uses a fleet of Spot Instances. We have also built a Django-based metadata catalog that doubles as a portal for monitoring the data and providing services to our consumers.
For general use we offer data subscriptions through scheduled unloads to a project space on S3. We also offer tools for working with machine learning models in SageMaker notebooks.
You will design and set up infrastructure on AWS for the expanding services of our platform and develop Airflow DAGs that represent our data pipelines. Most of our code is written in Python.
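To give a feel for what "DAGs that represent our data pipelines" means, here is a minimal sketch in plain Python. It uses the standard library's graphlib module rather than Airflow itself, and the task names are hypothetical illustrations, not the actual DataHub pipeline steps. It only shows the core idea: each task declares what it depends on, and tasks run in an order that respects those dependencies.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
# These names are assumptions for illustration only.
PIPELINE = {
    "ingest_raw": set(),
    "transform_with_spark": {"ingest_raw"},
    "register_in_catalog": {"transform_with_spark"},
    "unload_to_project_space": {"transform_with_spark"},
}


def execution_order(dag):
    """Return task names in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies form a cycle,
    in which case the graph is not a valid DAG.
    """
    return list(TopologicalSorter(dag).static_order())


def run_pipeline(dag, actions):
    """Call each task's action in dependency order and return that order."""
    order = execution_order(dag)
    for task in order:
        actions[task]()
    return order


if __name__ == "__main__":
    # Stand-in actions that just print the task name.
    actions = {name: (lambda n=name: print("running", n)) for name in PIPELINE}
    run_pipeline(PIPELINE, actions)
```

In Airflow the same structure is expressed with operators and dependency declarations between tasks, and the scheduler takes care of ordering, retries and scheduling.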
So what would we like you to know and bring to the table?
If you really want to impress us, you can do so by having experience with containerization platforms, Spark, Jenkins, Django, EMR, Kubernetes/EKS and Presto/Trino.
What do we offer?
Does this sound like the right next step for you? Fantastic! Apply directly by clicking "Apply", or contact our Senior Staff Specialist for more information (firstname.lastname@example.org | 0651578290).
This is a full time (40h/week) role. Read more about working at RGN IT and our benefits.