Data Masters Internship Program

A head start on your future in Data Science

Begin your professional journey in Data Science with our innovative program that offers exciting opportunities to get a head start on your career in data.

WHY COMPLETE THE DATA MASTERS INTERNSHIP PROGRAM

Data Masters is one of the largest data consultancy companies in Southeastern Europe. Our goal is to transform the businesses of today and help them gain a competitive advantage by implementing Data Democratization and teaching them how to use the power of data in their everyday operations. 

Knowledge-sharing is an integral part of Data Masters. We strive to expand the data community and build experts in the Data Science field by imparting our knowledge and expertise to all who aspire to enter the data world. 

If this speaks to you and you want to take the first step toward a career in the Data Science field, then you are in the right place.

ABOUT THE PROGRAM

As part of this program, you will get the unique opportunity to experience real-life projects and work on Business Intelligence solutions and Data Engineering challenges to fast-track your career into Data Science.

Through this internship, you will gain valuable insights into what it’s like to solve meaningful challenges with our diverse and forward-thinking team at Data Masters. The program will show you what kind of projects we work on at Data Masters and will attempt to simulate the challenges our consultants face every day – new terminology, ambiguity about the client goals, and challenging data analysis. All those aspects form an integral part of our day-to-day work.

The program itself is divided into two segments: Business Intelligence and Data Engineering. Each task in the program aligns with a project stage and follows a real-world example to bring it to life.

We recognize that these tasks are challenging and that there are undoubtedly phrases and terminology you may not have heard before – don’t worry. We have tried to make this experience as true to life as possible. Therefore, we ask that you seek out independent sources of information and do your own research, as required, to help guide you through the tasks.

Skills you will learn and practice:

Business understanding

Hypothesis framing

Communication

Programming

Exploratory Data Analysis

Data Visualization

Creativity

Mathematical Modelling

Model Evaluation

Client Communication

How it works:

Apply for a non-technical interview by filling out the form and uploading your CV.
Take an exam evaluating your current knowledge, choosing a combination of two of the three options: SQL, Python, and Power BI.
Get assigned to a Business Intelligence or Data Engineering project and work on it with a dedicated mentor.

Internship program tasks

Choose the project that fits you best

Business Intelligence project

Objective:

This stage involves gathering requirements from stakeholders to understand their data needs and business objectives. It sets the foundation for the entire project by defining what data will be collected, how it will be used, and what insights are required.

Actions:

Simulate client communication to understand the data needs and business goals.

Determine what specific data will be collected based on the gathered requirements.

Focus on collecting and using data that delivers insights aligned with the organization's strategic goals.

Objective:

To design a data warehouse structure that efficiently supports analysis by defining entities, attributes, and their relationships using dimensional modeling techniques in MSSQL. This will ensure data is organized for easy access and insightful queries.

Actions:

Choose dimensional modeling techniques (e.g., star schema, snowflake schema).
Create entities, attributes, relationships, and hierarchies in MSSQL.
Create entity-relationship diagrams and data dictionaries.
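
If you have not built a dimensional model before, the following is a minimal, purely illustrative sketch of a star schema: one fact table referencing two dimension tables, created in MSSQL from Python via pyodbc. The table and column names (DimDate, DimCustomer, FactSales) and the connection string are hypothetical and not part of the program brief.

```python
# Minimal star-schema DDL for a hypothetical sales subject area,
# executed against MSSQL via pyodbc. All names are illustrative only.
import pyodbc

DDL = """
CREATE TABLE dbo.DimDate (
    DateKey      INT           NOT NULL PRIMARY KEY,  -- e.g. 20240131
    FullDate     DATE          NOT NULL,
    [Year]       INT           NOT NULL,
    [Month]      INT           NOT NULL
);

CREATE TABLE dbo.DimCustomer (
    CustomerKey  INT IDENTITY(1,1) PRIMARY KEY,       -- surrogate key
    CustomerID   NVARCHAR(20)  NOT NULL,              -- business key
    CustomerName NVARCHAR(100) NOT NULL,
    City         NVARCHAR(50)  NULL
);

CREATE TABLE dbo.FactSales (
    SalesKey     BIGINT IDENTITY(1,1) PRIMARY KEY,
    DateKey      INT NOT NULL REFERENCES dbo.DimDate (DateKey),
    CustomerKey  INT NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
    Quantity     INT            NOT NULL,
    SalesAmount  DECIMAL(18, 2) NOT NULL
);
"""

# The connection string is a placeholder; adjust server, database and auth.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=InternshipDW;Trusted_Connection=yes;"
)
with conn:
    conn.cursor().execute(DDL)
```

Note the common design choice here: dimensions carry a surrogate key for joins inside the warehouse and keep the source system's business key as an ordinary column.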

Objective:

Populate the data warehouse with comprehensive data for analysis by extracting relevant information from different source systems (databases, CRM, files) using SSIS packages and CDC techniques. This ensures the data warehouse has the most up-to-date information for accurate reporting and insights.

Actions:

Identify source systems and data sources.
Develop SSIS packages for extracting data from source systems.
Create entities for the staging area.
Implement change data capture (CDC) for capturing incremental changes.
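
SSIS packages are built in the Visual Studio designer rather than written as code, so as an illustration of the CDC portion only, here is a hedged Python sketch that enables SQL Server change data capture on a hypothetical dbo.Orders source table and reads the captured changes for the staging area. All object names are assumptions.

```python
# Sketch of SQL Server change data capture (CDC) for incremental extraction.
# The source table dbo.Orders and its capture instance are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=SourceDB;Trusted_Connection=yes;"
)
cur = conn.cursor()

# One-time setup: enable CDC on the database and on the source table.
cur.execute("EXEC sys.sp_cdc_enable_db;")
cur.execute("""
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Orders',
         @role_name     = NULL;
""")
conn.commit()

# Incremental pull: read all changes captured for the current LSN window.
cur.execute("""
    DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
    DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');
""")
changes = cur.fetchall()   # rows to land in the staging area
```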

Objective:

This stage ensures the data warehouse contains reliable information for analysis. Extracted data is cleansed and standardized using SSIS transformations to guarantee consistency and quality. Furthermore, data validation and the application of business rules transform the data into a trustworthy and insightful format.

Actions:

Cleanse and standardize data using SSIS transformations.
Perform data validation and quality checks.
Apply business rules and calculations for insights generation.
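
In the program this logic lives inside SSIS data-flow transformations; the short pandas sketch below simply mirrors the kind of standardization, validation, and business-rule steps involved. The column names, rules, and reject-file handling are hypothetical.

```python
# Illustrative cleansing and validation logic, mirroring what SSIS data-flow
# transformations would do. Column names and rules are hypothetical.
import pandas as pd

def clean_orders(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()

    # Standardize: trim text, normalize casing, parse dates and numbers.
    df["customer_name"] = df["customer_name"].str.strip().str.title()
    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
    df["quantity"] = pd.to_numeric(df["quantity"], errors="coerce")
    df["unit_price"] = pd.to_numeric(df["unit_price"], errors="coerce")

    # Validate: keep rows passing basic quality checks, log the rejects.
    valid = (
        df["order_date"].notna()
        & (df["quantity"] > 0)
        & (df["unit_price"] >= 0)
    )
    df[~valid].to_csv("rejected_rows.csv", index=False)  # quality-check output
    df = df[valid]

    # Business rule: derive the measure used downstream for insights.
    df["sales_amount"] = df["quantity"] * df["unit_price"]
    return df
```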

Objective:

Deliver the transformed data for analysis and reporting by efficiently loading it into the data warehouse. This involves selecting the optimal loading strategy (batch, incremental, real-time) based on project needs. SSIS packages will then be developed to seamlessly transfer the data into the MSSQL environment, ensuring the data warehouse remains current with the latest insights.

Actions:

Choose a loading strategy (e.g., batch processing, incremental, or real-time loading).
Develop SSIS packages for loading transformed data into MSSQL.
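
As one possible illustration of an incremental loading strategy, the sketch below upserts rows from a hypothetical staging table into a warehouse dimension using a T-SQL MERGE statement executed from Python; in the program itself this step would typically be wrapped in an SSIS package. All table names are assumptions.

```python
# Incremental-load sketch: upsert from a staging table into a warehouse
# dimension with T-SQL MERGE. Table and column names are hypothetical.
import pyodbc

MERGE_SQL = """
MERGE dbo.DimCustomer AS target
USING stg.Customer    AS source
      ON target.CustomerID = source.CustomerID          -- business key
WHEN MATCHED THEN
    UPDATE SET target.CustomerName = source.CustomerName,
               target.City         = source.City
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, CustomerName, City)
    VALUES (source.CustomerID, source.CustomerName, source.City);
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=InternshipDW;Trusted_Connection=yes;"
)
with conn:
    conn.cursor().execute(MERGE_SQL)
```

Because MERGE only inserts missing keys and updates existing ones, re-running the load is safe, which is one reason incremental strategies are often preferred over full reloads.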

Objective:

This stage provides user-friendly interfaces for data access, including BI dashboards and reporting tools. By creating a dashboard to visualize and monitor key metrics, users can gain valuable insights without needing extensive technical expertise.

Actions:

Create a dashboard to visualize and monitor key metrics.

Objective:

Showcase the completed data pipeline and analytics dashboard.

Actions:

Record a video walk-through explaining your project and presenting the dashboard.

Highlight how the pipeline ingests, processes, and transforms data for meaningful insights.

Share best practices, challenges encountered, and lessons learned.

Data Engineering project

Objective:

Understand the architecture and tools required for streaming data, and set up the infrastructure.

Actions:

Learn about the overall architecture: streaming services, orchestration, data lake, data warehouse, and visualization.

Study the tools and technologies being used, including GCP, Terraform, Docker, Kafka, Spark Streaming, Airflow, dbt, and BigQuery.

Set up the foundational infrastructure by configuring GCP and Terraform.
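
Terraform configurations are written in HCL rather than Python, so they are not reproduced here; the small Python snippet below is only a hypothetical post-provisioning sanity check that the data-lake bucket and BigQuery dataset created by Terraform are reachable. The project, bucket, and dataset names are placeholders, and application-default GCP credentials are assumed.

```python
# Post-Terraform sanity check: confirm the hypothetical data-lake bucket and
# BigQuery dataset exist and are reachable with the configured credentials.
from google.cloud import bigquery, storage

PROJECT_ID = "internship-streaming-demo"   # hypothetical project id
BUCKET     = "internship-data-lake"        # hypothetical bucket name
DATASET    = "analytics"                   # hypothetical BigQuery dataset

storage_client = storage.Client(project=PROJECT_ID)
print("Bucket exists:", storage_client.bucket(BUCKET).exists())

bq_client = bigquery.Client(project=PROJECT_ID)
datasets = [d.dataset_id for d in bq_client.list_datasets()]
print("Dataset provisioned:", DATASET in datasets)
```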

Objective:

Set up a real-time streaming pipeline.

Actions:

Create a Kafka instance to receive messages from the streaming service.
Stream data using Kafka and process it in real time with Spark Streaming.
Periodically store processed data to the data lake.
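
To make the streaming stage concrete, here is a minimal PySpark Structured Streaming sketch that consumes a Kafka topic, parses the messages, and periodically lands Parquet files in a data-lake path. The broker address, topic, message schema, and paths are hypothetical, and the Kafka and GCS connectors are assumed to be on the Spark classpath.

```python
# Minimal Spark Structured Streaming job: consume a Kafka topic and land the
# processed records in the data lake as Parquet. All names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("streaming-ingest").getOrCreate()

schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")   # hypothetical broker
    .option("subscribe", "events")                          # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers raw bytes; parse the JSON payload into typed columns.
events = (
    raw.select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "gs://internship-data-lake/events/")           # data-lake path
    .option("checkpointLocation", "gs://internship-data-lake/_checkpoints/events/")
    .trigger(processingTime="5 minutes")                           # periodic writes
    .start()
)
query.awaitTermination()
```

The checkpoint location is what lets the job restart without reprocessing or losing Kafka offsets, which is the usual way to keep the periodic data-lake writes exactly once per batch.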

Objective:

Implement an hourly batch job that transforms streaming data for analytics.

Actions:

Configure an Apache Airflow instance to trigger the hourly batch processing job.
Use Airflow to execute data transformations using dbt.
Populate tables in BigQuery with the transformed data to support dashboard analytics.
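
A minimal sketch of what this orchestration might look like: an Airflow DAG scheduled hourly that calls dbt run and dbt test through BashOperator tasks, with the dbt models responsible for writing the transformed tables to BigQuery. The project path, target name, and DAG id are assumptions.

```python
# Hourly Airflow DAG that triggers dbt to transform the data-lake extracts
# into BigQuery tables. Paths and project names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hourly_dbt_transform",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    run_dbt = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/internship_project && dbt run --target prod",
    )
    test_dbt = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt/internship_project && dbt test --target prod",
    )

    run_dbt >> test_dbt  # transform first, then validate the BigQuery tables
```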

Objective:

Create a dashboard to visualize and monitor key metrics.

Actions:

Design and implement a Google Data Studio dashboard to visualize the processed data.
Define and analyze key metrics.
Monitor data freshness and pipeline performance.
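
The dashboard itself is built in Google Data Studio, but data freshness can also be checked programmatically; the hypothetical sketch below queries BigQuery for how far behind the latest event is. The project, dataset, table, and timestamp column names are placeholders.

```python
# Simple freshness check for the dashboard's source table in BigQuery.
# Project, dataset, table, and timestamp column are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="internship-streaming-demo")

query = """
    SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(event_time), MINUTE) AS minutes_behind
    FROM `internship-streaming-demo.analytics.events_hourly`
"""
minutes_behind = next(iter(client.query(query).result())).minutes_behind

# Alert if the hourly batch has not landed fresh data recently.
if minutes_behind > 120:
    print(f"Data is stale: last event was {minutes_behind} minutes ago.")
else:
    print(f"Data is fresh: last event was {minutes_behind} minutes ago.")
```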

Objective:

Showcase the completed data pipeline and analytics dashboard.

Actions:

Record a video walk-through explaining your project and presenting the dashboard.
Highlight how the pipeline ingests, processes, and transforms data for meaningful insights.
Share best practices, challenges encountered, and lessons learned.

APPLY HERE