Bachelor in Data Science
The course aims to provide hands-on experience with state-of-the-art distributed data processing environments. First, it addresses cloud computing concepts and tools. Then, it provides a bottom-up overview of distributed storage and processing technologies, emphasizing scalability and usability. Finally, it introduces the basic systems and information security concepts required to use current distributed environments safely.
The grade has two components:
Submitted projects must be fully authored by the students and must not contain material (text, code, …) taken from third parties, obtained online, or produced with AI tools, unless it is explicitly marked and authorized by the instructors. See the Academic Regulations and Code of Ethical Conduct for more information.
| # | Date | Topic | Mat. | Read |
|---|---|---|---|---|
| T1 | 18/9/25 | Introduction. | 🗎 | |
| PL1 | 19/9/25 | Linux installation and basics. | 🗎 | |
| T2 | 25/9/25 | Virtualization and cloud. | 🗎 | R1 1-5; R2 9,24 |
| PL2 | 26/9/25 | Provisioning. | 🗎 | |
| PL3 | 3/10/25 | Cloud access. | 🗎 | |
| T3 | 9/10/25 | Storage management. | 🗎 | R2 20 |
| PL4 | 10/10/25 | Instance management. | 🗎 (V) | |
| T4 | 16/10/25 | File formats. | 🗎 | R3 |
| PL5 | 17/10/25 | Instance management (cont). | | |
| PL6 | 23/10/25 | Storage and files. | 🗎 (V) | |
| PL7 | 24/10/25 | Storage and files (cont). | | |
| T5 | 30/10/25 | Query execution. | 🗎 | R4 4 |
| PL8 | 31/10/25 | Query execution. | 🗎 | |
| PL9 | 6/11/25 | Query execution (cont). | | |
| T6 | 7/11/25 | Query optimization. | 🗎 | R4 4 |
| T7 | 13/11/25 | Distributed execution. | 🗎 | R4 3; R5 |
| PL10 | | Distributed execution. | 🗎 | |
| T8 | 20/11/25 | Security. | 🗎 | R6 1-3 |
| PL10 | 21/11/25 | Orchestration. | 🗎 (V) | |
| PL11 | 27/11/25 | Project. | | |
| PL12 | 28/11/25 | Project. | | |
| T9 | 4/12/25 | Cryptography. | | R6 14 |
| PL12 | 5/12/25 | Project. | | |
| # | Title |
|---|---|
| R1 | Fox, A., et al. Above the Clouds: A Berkeley View of Cloud Computing. Technical Report UCB/EECS-2009-28, Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, 2009. |
| R2 | Nemeth, E., Snyder, G., Hein, T.R., Whaley, B., Mackin, D. UNIX and Linux System Administration Handbook (5th Edition), Addison-Wesley Professional, 2017. |
| R3 | Aditya Somani. A Data Engineer’s Guide to Columnar Storage |
| R4 | J. M. Hellerstein, M. Stonebraker, and J. Hamilton. Architecture of a Database System. Foundations and Trends® in Databases, vol. 1, no. 2, pp. 141–259, 2007. |
| R5 | Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters, Comm. ACM, 2008. |
| R6 | Dieter Gollmann. Computer Security. Wiley, 2011. |
| A1 | J. Pereira. Introdução ao Unix. U. Minho, 2025. |
| A2 | Alex Braunton. Hands-On DevOps with Vagrant, Packt Publishing, 2018. |
| A3 | Mark Needham, Michael Hunger and Michael Simons. DuckDB in Action. Manning, 2024. |