“Welcome to the Tunnel Engineering: AI & Data Science series. In this series, we explore how artificial intelligence and data-driven approaches are transforming tunnelling projects, from planning to operation.”.
“Before diving deeper, let’s place today’s topic into context. In the data science life cycle, data preparation is a foundational step. In this video, we will tackle the data preparation part of the cycle — focusing on how raw data is transformed into reliable, usable information for analysis.”.
"The data management cycle involves the systematic handling of data throughout its lifecycle. Today, we'll focus on the crucial pre processing steps: data integration, data cleaning, and data storage.".
[Audio] "Data integration involves combining data from diverse sources into a cohesive dataset. In tunnelling projects, this means integrating: field (sensor) data collected during exploratory campaigns (for example, drilling, seismics, … and excavation (for example, T-B-M--, borehole drilling, advance drilling, seismics, …) geological mapping laboratory test results (UCS, triax, …) historical records (for example, reports, …) to form a comprehensive data repository. This step ensures that all relevant information is consolidated, providing a holistic view of the project.".
[Audio] "The goal of data integration is to form a comprehensive data repository. This step ensures that all relevant information is consolidated, providing a holistic view of the project.".
[Audio] "Engineers must combine multiple data formats, ensuring consistency across datasets. This involves merging data from various formats such as C-S-V--, Excel, textfiles, scanned reports, and database files into a unified dataset. Ensuring consistency is crucial for creating a dataset that can be accurately and reliably analysed, eliminating discrepancies that could negatively influence the results.".
[Audio] "Data cleaning is the process of: identifying and correcting errors, filling in or highlighting missing values, and resolving inconsistencies. This step ensures the accuracy and reliability of data used for analysis. It involves checking for anomalies, such as incorrect or duplicate entries, and rectifying them to maintain data integrity.".
[Audio] "Removing outliers is essential in data cleaning. Outliers are data points that significantly deviate from other observations and can distort analysis. By identifying and removing or adjusting these outliers, we can prevent skewed results and ensure that the data accurately reflects the true conditions of the rock mass and tunnelling environment.".
[Audio] "Standardizing data formats ensures uniformity and compatibility across various datasets. This involves converting data into a consistent format, such as ensuring all measurements are in the same units or that time formats are standardized. Standardization simplifies data integration and analysis, making it easier to combine datasets and draw meaningful conclusions." In the context of feature scaling, data is generally processed in one of two ways: data standardization or data normalization, sometimes referred to as min max scaling. Data normalization refers to shifting the values of your data so they fall between 0 and 1. Data standardization, in this context, is used as a scaling technique to establish the mean and the standard deviation at 0 and 1, respectively..
[Audio] "After cleaning and standardization, data validation ensures that the dataset is accurate and complete. This step involves cross checking the data against known standards or expected values and correcting any remaining errors. Validation is crucial to confirm that the dataset is ready for further analysis or modelling, ensuring that all data is reliable and accurate.".
[Audio] "Finally, data storage involves securely saving the integrated, cleaned, standardized and validated data. Efficient storage solutions, like databases and cloud storage, allow for easy access and retrieval of data for analysis. Secure storage ensures that the data is protected from loss or unauthorized access, maintaining its integrity and availability for future use.".
[Audio] "In conclusion, the data management cycle is vital for ensuring that data is reliable, accurate, and ready for analysis. From integration and cleaning to standardization, validation, and storage, each step plays a crucial role in transforming raw data into valuable insights. By meticulously following these steps, engineers and analysts can make informed decisions, drive project success, and ultimately advance the field of tunnelling engineering.".