Hands-On Labs

Data Lab Manual

AI Masterclass Practical Exercises for Manufacturing Data Analysis

Module Overview

This manual provides hands-on experience with real manufacturing datasets. Each lab follows a consistent structure designed for practical learning.

  • 6-8 hours total (60-90 minutes per lab)
  • 5 labs + capstone with progressive learning
  • Hands-on practice with real manufacturing data
Learning Objectives
  • Load and explore industrial datasets using Python and pandas
  • Perform statistical analysis on manufacturing process data
  • Visualize torque-angle curves and vibration patterns
  • Prepare datasets for machine learning applications
  • Interpret analytical results in a business context
  • Build basic classification models for quality prediction
  • Make data-driven decisions for quality control and predictive maintenance
  • Communicate analytical findings to plant operations teams
Prerequisites
  • Basic Python knowledge (or willingness to learn)
  • Python 3.x environment with pandas, numpy, matplotlib, scikit-learn
  • OR KNIME Analytics Platform installed
  • Understanding of basic statistics (mean, standard deviation, correlation)
Dataset Overview
Dataset | Rows | Columns | Size | Purpose
conditionMonitoring.csv | 2,000 | 69 | 1.6 MB | Vibration analysis
processTemperature.xlsx | 200 | 5 | Small | Thermal modeling
angleTorque.csv | 4,000 | 1,002 | 19.9 MB | Torque-angle curves
processData.csv | 4,000 | 26 | 588 KB | ML features
breakaway.csv | 219 | 3 | ~20 KB | Quality metrics

Lab Exercises

Condition Monitoring - Vibration Analysis

60-90 min • Beginner

Vibration monitoring is a cornerstone of predictive maintenance in manufacturing. By analyzing frequency-domain vibration data from accelerometers mounted on production machinery, we can detect early signs of bearing wear, misalignment, imbalance, and other mechanical faults before they cause costly failures or quality defects.

Learning Objectives
  • Understand frequency-domain vibration analysis
  • Identify patterns across X, Y, Z acceleration axes
  • Calculate statistical features for anomaly detection
  • Visualize vibration spectra
  • Interpret vibration signatures in manufacturing context
  • Build simple anomaly detection using statistical methods
Dataset: conditionMonitoring.csv
File: conditionMonitoring.csv • Rows: 2,000 • Columns: 69 • Size: 1.6 MB
Column Name | Type | Description | Units | Range
Condition | String | Machine operating state | Categorical | Off, On, etc.
xAcc010Hz - xAcc120Hz | Float | X-axis acceleration at frequencies 10-120 Hz | m/s² | 0-20
yAcc010Hz - yAcc120Hz | Float | Y-axis acceleration at frequencies 10-120 Hz | m/s² | 0-15
zAcc010Hz - zAcc120Hz | Float | Z-axis acceleration at frequencies 10-120 Hz | m/s² | 0-170

Step 1: Load and Explore the Data

Load the dataset and perform basic exploration

python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the dataset (adjust the file name/path to match your copy)
df = pd.read_csv('Dataset - conditionMonitoring.csv')

# Basic exploration
print(f"Dataset Shape: {df.shape}")
print(f"\nColumn Names:\n{df.columns.tolist()}")
print(f"\nData Types:\n{df.dtypes}")
print(f"\nFirst 5 Rows:\n{df.head()}")
print(f"\nBasic Statistics:\n{df.describe()}")

Step 2: Understand the Column Structure

Separate columns by axis and extract frequencies

python
# Separate columns by axis
x_cols = [col for col in df.columns if col.startswith('xAcc')]
y_cols = [col for col in df.columns if col.startswith('yAcc')]
z_cols = [col for col in df.columns if col.startswith('zAcc')]

print(f"X-axis columns: {len(x_cols)}")
print(f"Y-axis columns: {len(y_cols)}")
print(f"Z-axis columns: {len(z_cols)}")

# Extract frequencies from column names (e.g., 'xAcc010Hz' -> 10)
frequencies = [int(col[4:7]) for col in x_cols]
print(f"\nFrequencies (Hz): {frequencies}")

# Examine the Condition column
print(f"\nCondition values:\n{df['Condition'].value_counts()}")

Step 3: Visualize Vibration Spectrum

Plot average vibration spectrum for each axis

python
# Plot average vibration spectrum for each axis
fig, axes = plt.subplots(3, 1, figsize=(12, 10))

# Calculate mean values for each frequency
x_means = df[x_cols].mean().values
y_means = df[y_cols].mean().values
z_means = df[z_cols].mean().values

# X-axis spectrum
axes[0].bar(frequencies, x_means, color='blue', alpha=0.7)
axes[0].set_title('X-Axis Average Vibration Spectrum')
axes[0].set_xlabel('Frequency (Hz)')
axes[0].set_ylabel('Acceleration (m/s²)')

# Y-axis spectrum
axes[1].bar(frequencies, y_means, color='green', alpha=0.7)
axes[1].set_title('Y-Axis Average Vibration Spectrum')
axes[1].set_xlabel('Frequency (Hz)')
axes[1].set_ylabel('Acceleration (m/s²)')

# Z-axis spectrum
axes[2].bar(frequencies, z_means, color='red', alpha=0.7)
axes[2].set_title('Z-Axis Average Vibration Spectrum')
axes[2].set_xlabel('Frequency (Hz)')
axes[2].set_ylabel('Acceleration (m/s²)')

plt.tight_layout()
plt.show()

Step 4: Statistical Feature Extraction

Calculate statistical features for anomaly detection

python
# Calculate statistical features for each sample
def extract_vibration_features(row, axis_cols):
    values = row[axis_cols].values
    return {
        'mean': np.mean(values),
        'std': np.std(values),
        'max': np.max(values),
        'min': np.min(values),
        'range': np.max(values) - np.min(values),
        'rms': np.sqrt(np.mean(values**2))
    }

# Extract features for X-axis
x_features = df.apply(lambda row: extract_vibration_features(row, x_cols), axis=1)
x_features_df = pd.DataFrame(x_features.tolist())
x_features_df.columns = ['x_' + col for col in x_features_df.columns]

print("X-axis Feature Statistics:")
print(x_features_df.describe())

Step 5: Anomaly Detection Exercise

Simple anomaly detection using Z-score

python
from scipy import stats

# Calculate overall vibration energy (RMS across all frequencies)
df['total_rms'] = np.sqrt((df[x_cols]**2).mean(axis=1) +
                          (df[y_cols]**2).mean(axis=1) +
                          (df[z_cols]**2).mean(axis=1))

# Calculate Z-scores
df['rms_zscore'] = stats.zscore(df['total_rms'])

# Identify potential anomalies (|Z| > 2)
anomalies = df[np.abs(df['rms_zscore']) > 2]
print(f"Number of potential anomalies: {len(anomalies)}")
print(f"Anomaly percentage: {len(anomalies)/len(df)*100:.2f}%")
Interpretation Framework
Understanding Normal Results
  • X and Y axes: Low, consistent readings (typically 2-10 m/s²)
  • Z-axis: Dominated by gravity component at low frequencies (~150 m/s² at 10 Hz)
  • Standard deviation: Relatively low within each frequency band (< 5 m/s² variation)
Problem Pattern Indicators
Pattern | Cause | Action
Elevated readings at 1x running speed | Imbalance | Schedule balancing
Peaks at 2x running speed | Misalignment | Check coupling alignment
Multiple harmonic peaks | Bearing defect | Inspect bearing immediately
Broadband increase across all frequencies | Looseness | Check mounting bolts
Gradual increase over time | Wear progression | Monitor closely
Troubleshooting Guide
Error | Cause | Solution
FileNotFoundError | Incorrect file path | Verify file location with os.listdir()
KeyError: 'xAcc010Hz' | Column name mismatch | Check exact column names with df.columns
ValueError: cannot convert float NaN | Missing data | Use df.dropna() or df.fillna(0)
MemoryError | Dataset too large | Read in chunks or use dtype specifications
Extension Exercises
Exercise 1.1: Condition-Based Analysis (Intermediate)

Compare vibration signatures between different machine conditions. Create a visualization showing how the frequency spectrum changes when the machine is On versus Off.

Exercise 1.2: Frequency Band Analysis (Advanced)

Divide the frequency spectrum into bands (10-40 Hz, 45-80 Hz, 85-120 Hz) and calculate the energy contribution of each band.

Exercise 1.3: Anomaly Detection Model (Advanced)

Build an Isolation Forest model to detect anomalies in the vibration data. Compare results with the Z-score method.
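
One possible starting point is sketched below. It reuses df, the axis column lists from Step 2, and the rms_zscore column from Step 5; the contamination value is an assumed tuning parameter, not a known defect rate.

python
from sklearn.ensemble import IsolationForest

# Fit an Isolation Forest on the raw frequency-domain features.
# contamination=0.05 is an assumed starting value to tune, not a known defect rate.
iso = IsolationForest(contamination=0.05, random_state=42)
df['iso_label'] = iso.fit_predict(df[x_cols + y_cols + z_cols])  # -1 = anomaly, 1 = normal

# Compare with the |Z| > 2 flags from Step 5
df['zscore_flag'] = np.abs(df['rms_zscore']) > 2
print(pd.crosstab(df['iso_label'] == -1, df['zscore_flag'],
                  rownames=['IsolationForest anomaly'],
                  colnames=['Z-score anomaly']))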

Key Takeaways

  • Frequency-domain analysis reveals specific mechanical issues
  • Different axes show different fault signatures
  • Statistical features (RMS, std) enable automated monitoring
  • Z-score is a simple but effective anomaly detection method
  • Vibration monitoring prevents costly unplanned downtime

Process Temperature Analysis

60 min • Beginner

Thermal management is critical in manufacturing processes. Understanding the relationship between power consumption, cooling systems, ambient conditions, and resulting process temperatures enables optimization of energy usage and early detection of cooling system degradation.

Learning Objectives
  • Perform correlation analysis between process variables
  • Build regression models to predict temperature
  • Visualize multivariate relationships
  • Identify optimal operating conditions
  • Interpret regression coefficients in manufacturing context
Dataset: processTemperature.xlsx
File: processTemperature.xlsx • Rows: 200 • Columns: 5 • Size: Small
Column Name | Type | Description | Units | Range
id | Integer | Unique observation identifier | N/A | 1-200
power_kW | Float | Electrical power consumption | kilowatts | 0.9-9.7
fan_RPM | Float | Cooling fan rotational speed | RPM | 750-2720
ambientTemp_C | Float | Surrounding air temperature | °C | 20.7-22.7
processTemp_C | Float | Measured process temperature | °C | 36.6-67.3
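
As an orientation for this lab, a minimal sketch of the core workflow (correlation analysis followed by a linear regression) is shown below. It assumes the column names listed above; the 80/20 split and random seed are illustrative choices.

python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# Load the Excel file (requires openpyxl); adjust the path to your copy
df = pd.read_excel('processTemperature.xlsx')

# Correlation analysis between process variables
print(df[['power_kW', 'fan_RPM', 'ambientTemp_C', 'processTemp_C']].corr())

# Linear regression: predict process temperature from power, fan speed, ambient
X = df[['power_kW', 'fan_RPM', 'ambientTemp_C']]
y = df['processTemp_C']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Coefficients have physical meaning, e.g. °C of process temperature per kW
print(dict(zip(X.columns, model.coef_)))
print(f"R²: {r2_score(y_test, y_pred):.3f}")
print(f"RMSE: {mean_squared_error(y_test, y_pred) ** 0.5:.2f} °C")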
Troubleshooting Guide
Error | Cause | Solution
openpyxl not found | Missing Excel library | pip install openpyxl
ValueError: Input contains NaN | Missing data in features | df.dropna() before splitting
Singular matrix | Multicollinearity | Remove highly correlated features

Key Takeaways

  • Correlation analysis reveals variable relationships
  • Linear regression quantifies feature impacts
  • Coefficient interpretation connects to physical meaning
  • Model optimization enables energy savings
  • R² and RMSE measure prediction quality

Torque-Angle Curve Analysis

90 min • Intermediate

In automotive and precision assembly, tightening operations must meet exact specifications. Torque-angle curves capture the complete signature of each tightening, enabling detection of defects that simple torque-only measurements miss.

Learning Objectives
  • Load and visualize high-dimensional curve data
  • Extract meaningful features from time-series curves
  • Identify quality patterns in tightening operations
  • Compare good vs defective tightening signatures
  • Build classification models for quality prediction
Dataset: angleTorque.csv
File: angleTorque.csv • Rows: 4,000 • Columns: 1,002 • Size: 19.9 MB
Column Name | Type | Description | Units | Range
Result | String | Quality outcome | Categorical | OK, NOK
MaxTorque | Float | Maximum torque achieved | Nm | Varies
Torque_0 to Torque_999 | Float | Torque at each angle step | Nm | 0-max
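
As a starting point, the sketch below loads the curves, compares a few OK and NOK signatures visually, and reduces each curve to a handful of features. It assumes the column names above; the number of curves plotted and the chosen features are illustrative, not prescribed steps.

python
import pandas as pd
import matplotlib.pyplot as plt

# Load the curves (19.9 MB fits in memory on most machines)
df = pd.read_csv('angleTorque.csv')
torque_cols = [c for c in df.columns if c.startswith('Torque_')]

# Compare a few OK and NOK tightening signatures
fig, ax = plt.subplots(figsize=(10, 5))
for result, color in [('OK', 'green'), ('NOK', 'red')]:
    subset = df.loc[df['Result'] == result, torque_cols].head(5)
    for _, curve in subset.iterrows():
        ax.plot(curve.to_numpy(dtype=float), color=color, alpha=0.5)
ax.set_xlabel('Angle step')
ax.set_ylabel('Torque (Nm)')
ax.set_title('Torque-angle curves: OK (green) vs NOK (red)')
plt.show()

# Reduce each 1,000-point curve to a handful of features for classification
curves = df[torque_cols].to_numpy(dtype=float)
features = pd.DataFrame({
    'max_torque': curves.max(axis=1),
    'final_torque': curves[:, -1],
    'mean_torque': curves.mean(axis=1),
    'torque_std': curves.std(axis=1),
})
print(features.describe())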
Troubleshooting Guide
Error | Cause | Solution
MemoryError | Large dataset (19.9 MB) | Read in chunks: pd.read_csv(..., chunksize=1000)
Slow plotting | Too many curves | Sample a subset: df.sample(100)

Key Takeaways

  • Torque-angle curves reveal hidden quality issues
  • Feature extraction reduces dimensionality
  • Curve shape analysis enables classification
  • Real-time monitoring prevents quality escapes
  • High-dimensional data requires careful handling

Engineered Features Classification

90 min • Intermediate

Machine learning classification enables automated quality prediction based on process parameters. This lab covers the complete workflow from feature analysis through model evaluation, with emphasis on handling imbalanced datasets common in manufacturing.

Learning Objectives
  • Analyze pre-engineered feature sets
  • Handle imbalanced classification problems
  • Build and tune Random Forest models
  • Interpret feature importance
  • Evaluate with appropriate metrics
Dataset: processData.csv
File: processData.csv • Rows: 4,000 • Columns: 26 • Size: 588 KB
Column Name | Type | Description | Units | Range
Result | String | Quality outcome | Categorical | OK, NOK
Feature_1 to Feature_25 | Float | Engineered process features | Various | Normalized
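
A minimal modeling sketch is shown below, assuming the column names above; class_weight='balanced' and the other hyperparameters are illustrative starting values for the OK/NOK imbalance, not tuned settings.

python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

df = pd.read_csv('processData.csv')
X = df.drop(columns=['Result'])
y = df['Result']

# Stratified split preserves the OK/NOK ratio in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# class_weight='balanced' compensates for the small number of NOK samples
clf = RandomForestClassifier(
    n_estimators=200, class_weight='balanced', random_state=42)
clf.fit(X_train, y_train)

# Per-class precision and recall matter more than accuracy on imbalanced data
print(classification_report(y_test, clf.predict(X_test)))

# Feature importance points at the process parameters driving NOK outcomes
importances = pd.Series(clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))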
Troubleshooting Guide
Error | Cause | Solution
Class imbalance warning | Few NOK samples | Use class_weight='balanced'
Overfitting | Complex model | Increase min_samples_leaf, reduce max_depth

Key Takeaways

  • Class imbalance is common in quality data
  • Random Forest handles non-linear relationships
  • Feature importance guides process improvement
  • Precision-recall tradeoff affects business decisions
  • Cross-validation ensures robust evaluation

Breakaway Torque Quality Control

60 min • Beginner

Statistical Process Control (SPC) is fundamental to manufacturing quality. Control charts detect process shifts, while capability indices quantify how well a process meets specifications. This lab applies these classic methods to breakaway torque measurements.

Learning Objectives
  • Create and interpret control charts
  • Calculate process capability indices (Cp, Cpk)
  • Apply Western Electric run rules
  • Identify special cause variation
  • Make process improvement recommendations
Dataset: breakaway.csv
File: breakaway.csv • Rows: 219 • Columns: 3 • Size: ~20 KB
Column Name | Type | Description | Units | Range
Sample | Integer | Sample identifier | N/A | 1-219
Torque | Float | Measured breakaway torque | Nm | Varies
Specification | String | Pass/Fail status | Categorical | Pass, Fail
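
A minimal SPC sketch follows, assuming the columns above. The control limits use the overall standard deviation (a simplification of the moving-range method), and the specification limits LSL and USL are placeholders to replace with the actual tightening specification.

python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('breakaway.csv')
torque = df['Torque']

# Individuals control chart: centerline ± 3 standard deviations
mean, std = torque.mean(), torque.std()
ucl, lcl = mean + 3 * std, mean - 3 * std

plt.figure(figsize=(12, 4))
plt.plot(df['Sample'], torque, marker='o', linestyle='-', markersize=3)
plt.axhline(mean, color='green', label='Mean')
plt.axhline(ucl, color='red', linestyle='--', label='UCL (+3σ)')
plt.axhline(lcl, color='red', linestyle='--', label='LCL (-3σ)')
plt.xlabel('Sample')
plt.ylabel('Breakaway torque (Nm)')
plt.legend()
plt.show()

# Process capability: placeholder spec limits, replace with the real ones
LSL, USL = 10.0, 20.0  # assumed values for illustration only
cp = (USL - LSL) / (6 * std)
cpk = min(USL - mean, mean - LSL) / (3 * std)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")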
Troubleshooting Guide
Error | Cause | Solution
Negative Cpk | Mean outside spec limits | Process centering required
Control limits too wide | High variation | Investigate assignable causes

Key Takeaways

  • Control charts detect process shifts early
  • Cpk measures process capability vs specifications
  • Run rules identify non-random patterns
  • SPC enables proactive quality management
  • Data-driven decisions improve consistency

Capstone Project

120 min • Advanced

Multi-Dataset Integration Challenge

You are a data analyst at a production facility. Management has requested a comprehensive quality dashboard that integrates vibration monitoring, temperature control, and tightening quality data.

Project Requirements
  • Load at least 3 datasets from the lab exercises (a starter loading sketch follows this list)
  • Create a unified analysis combining multiple data sources
  • Build at least one predictive model
  • Generate visualizations for executive presentation
  • Document findings and recommendations
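
To get started, a minimal loading-and-summary scaffold might look like the sketch below. File names follow the earlier labs (adjust paths to your environment), and how the sources are actually linked, whether by machine, shift, or batch, is a design decision this scaffold deliberately leaves open.

python
import pandas as pd

# Load three of the lab datasets; adjust paths to your environment
vibration = pd.read_csv('Dataset - conditionMonitoring.csv')
temperature = pd.read_excel('processTemperature.xlsx')
tightening = pd.read_csv('processData.csv')

# A per-dataset KPI summary is one way to seed the dashboard;
# linking the sources (machine, shift, batch) is left to your design.
summary = pd.DataFrame([
    {'source': 'vibration', 'rows': len(vibration),
     'metric': 'mean acceleration across all frequency columns',
     'value': vibration.filter(regex='Acc').to_numpy().mean()},
    {'source': 'temperature', 'rows': len(temperature),
     'metric': 'mean process temperature (°C)',
     'value': temperature['processTemp_C'].mean()},
    {'source': 'tightening', 'rows': len(tightening),
     'metric': 'NOK rate (%)',
     'value': (tightening['Result'] == 'NOK').mean() * 100},
])
print(summary)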
Evaluation Criteria
Criterion | Weight | Description
Data Quality | 20% | Proper handling of missing values, outliers, data types
Analysis Depth | 30% | Meaningful insights from each dataset
Integration | 25% | Connections between different data sources
Presentation | 25% | Clear visualizations and documentation