| Dimension | Data Engineer | Data Scientist |
|---|---|---|
| Primary focus | Infrastructure, pipelines, storage, data access | Modelling, prediction, experimentation, insight |
| Typical outputs | ETL/ELT pipelines, data warehouses and lakes, APIs, streaming jobs | Predictive models, A/B test analyses, dashboards, recommendations |
| Core languages | Python, SQL, Scala/Java | Python, R, SQL |
| Key tools | Spark, Kafka, Airflow, dbt, AWS/Azure/GCP | scikit-learn, TensorFlow, PyTorch, Jupyter, Tableau/Power BI |
| Main collaborators | DevOps, IT architects, cloud security, analysts | Product managers, marketing, finance, operations, leadership |
| Success metrics | Pipeline uptime, data freshness, data quality, system cost | Model performance (AUC, accuracy), business KPI uplift |
Source: Dreamix.eu, Rice University, University of the Cumberlands, IBM