✨ v1.2 SaaS-Grade Engine

Hybrid Intelligent Dataset Cleaner.

A local-first data sanitization pipeline that combines deterministic rules, statistical models, and deep autoencoders to protect data integrity at every stage.

System Health: Active

Rule Engine: Deterministic

Neural Net: PyOD / Autoencoder

SmartImputer: Variance Aware

Privacy: Local-First (MD5)

Data Integrity First

CleanIQ operates under a strict order of operations: Format > Fix > Impute > Flag > Report.

PyOD Neural Nets

Deep autoencoders flag multivariate anomalies by reconstruction error: rows the network cannot reproduce accurately are marked as suspicious.

Local-First Privacy

Structural fingerprints are learned without sending raw sensitive data to the cloud.

Architecture

The Data Sanitization Pipeline

CleanIQ runs its stages in sequence, moving from cheap deterministic rules to deep-learning inference, to minimize computational overhead while maximizing anomaly detection precision.

Rule Engine

Deterministic cleaning such as whitespace stripping and boolean normalization.
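
A minimal sketch of what a deterministic rule stage could look like, assuming cells arrive as strings. The function names and the token sets are hypothetical, chosen for illustration; they are not taken from CleanIQ's code.

```python
# Hypothetical token sets; a real rule engine may accept more spellings.
TRUE_TOKENS = {"true", "yes", "y", "t"}
FALSE_TOKENS = {"false", "no", "n", "f"}


def clean_cell(value: str) -> str:
    """Strip surrounding whitespace and collapse internal whitespace runs."""
    return " ".join(value.split())


def normalize_boolean_column(values):
    """Convert a column to True/False/None only if every non-empty cell
    is a recognized boolean token; otherwise return the cleaned strings."""
    cleaned = [clean_cell(v) for v in values]
    non_empty = [v.lower() for v in cleaned if v]
    known = TRUE_TOKENS | FALSE_TOKENS
    if not non_empty or not all(v in known for v in non_empty):
        return cleaned  # not a boolean column: leave values as text
    return [None if not v else v.lower() in TRUE_TOKENS for v in cleaned]
```

Converting a column only when every non-empty cell matches keeps the rule deterministic and avoids partially rewriting free-text columns.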

Classical ML

SmartImputer selects an imputation strategy (mean, median, or mode) based on the skewness of each column.
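
One way skewness-based strategy selection might work is sketched below. The names `choose_strategy` and `impute`, and the skewness threshold of 1.0, are assumptions made for this example; the actual SmartImputer may use different rules.

```python
import statistics


def choose_strategy(values, skew_threshold=1.0):
    """Pick 'mean', 'median', or 'mode' for a column with None as missing."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    if not numeric:
        return "mode"  # categorical data: use the most frequent value
    if len(observed) < 3:
        return "median"  # too few points to estimate skewness
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed)
    if std == 0:
        return "mean"  # constant column: every strategy agrees
    # Population skewness: the mean of cubed z-scores.
    skew = sum(((v - mean) / std) ** 3 for v in observed) / len(observed)
    return "median" if abs(skew) > skew_threshold else "mean"


def impute(values, skew_threshold=1.0):
    """Return (filled_values, strategy_used)."""
    strategy = choose_strategy(values, skew_threshold)
    observed = [v for v in values if v is not None]
    fill_fn = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = fill_fn(observed)
    return [fill if v is None else v for v in values], strategy
```

The median is preferred for heavily skewed columns because a few extreme values pull the mean away from the typical value.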

Statistical Outliers

Isolation Forest and interquartile range (IQR) checks for statistical outliers.
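
The IQR half of this stage can be sketched with the standard library alone. The function name `iqr_outliers` and the conventional multiplier `k=1.5` are assumptions for illustration. The Isolation Forest half is not shown here because it would typically rely on a third-party library such as scikit-learn.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Example: the last reading is far outside the bulk of the data.
readings = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(readings)
```

Returning indices rather than dropping rows matches the pipeline's "Flag" step: the outlier is reported, not silently removed.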

Neural Net

PyOD Autoencoder for high-dimensional structural flags.

User guides

The Core CleanIQ Workflows

Each guide maps to a real application surface so users can move from documentation into the product without hunting.

Upload

Bring CSV or TSV files into CleanIQ with plan-aware validation before a dataset record is created.

1. Select a file
2. Validate size and format
3. Create the dataset record


Profile

Inspect row counts, column counts, file size, status, and quality score before any transformation is applied.

1. Open dataset
2. Review metadata
3. Find quality signals


Transform

Build auditable cleaning flows with operations that are previewed before they become exports.

1. Pick operations
2. Preview results
3. Apply with audit context


Export

Download cleaned data and keep teams aligned with consistent dataset history and export surfaces.

1. Choose dataset
2. Select format
3. Download output


Engine Insight

SaaS-Grade Training

The training pipeline handles structural fingerprinting and synthetic corruption at scale.

Recursive Target Scanning

Recursively builds MD5-hashed schemas from tabular data with no cap on row count, silently traversing thousands of sub-directories.
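
As an illustration of structural fingerprinting, the sketch below hashes only column names and inferred types, so raw cell values never enter the fingerprint. The helpers `infer_type` and `schema_fingerprint`, the `sample_rows` parameter, and the layout string format are all hypothetical; the real training pipeline may derive its schemas differently.

```python
import csv
import hashlib
import io


def infer_type(samples):
    """Classify a column as 'int', 'float', 'str', or 'empty' from samples."""
    non_empty = [s.strip() for s in samples if s.strip()]
    if not non_empty:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for s in non_empty:
                cast(s)
            return name
        except ValueError:
            continue
    return "str"


def schema_fingerprint(csv_text, sample_rows=100):
    """Return an MD5 hex digest of the column layout (names and types only)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = [[] for _ in header]
    for row_number, row in enumerate(reader):
        if row_number >= sample_rows:
            break  # type inference only needs a sample, not every row
        for i, cell in enumerate(row[: len(header)]):
            columns[i].append(cell)
    layout = "|".join(
        f"{name.strip().lower()}:{infer_type(col)}"
        for name, col in zip(header, columns)
    )
    return hashlib.md5(layout.encode("utf-8")).hexdigest()
```

Two files with the same columns and types produce the same digest regardless of their contents, which is how a count such as "unique layouts" could be computed. MD5 serves here as a compact identifier, not as a security measure.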

Synthetic Dirty Generator

Deliberately injects mojibake (mis-decoded text), stray delimiters, and swaps to push the autoencoders to their limits.
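
The corruption types named above could be simulated roughly as follows. This is a sketch under stated assumptions: all function names, the `rate` and `seed` parameters, and the reading of "swaps" as exchanging two cells within a row are illustrative guesses, not the generator's actual behavior.

```python
import random


def inject_mojibake(text):
    """Simulate UTF-8 bytes misread as Latin-1 (for example 'é' becomes 'Ã©')."""
    return text.encode("utf-8").decode("latin-1")


def inject_stray_delimiter(row, rng, delimiter=","):
    """Append a stray delimiter to one randomly chosen cell."""
    dirty = list(row)
    i = rng.randrange(len(dirty))
    dirty[i] = dirty[i] + delimiter
    return dirty


def swap_cells(row, rng):
    """Exchange two randomly chosen cells in the row."""
    dirty = list(row)
    if len(dirty) < 2:
        return dirty
    i, j = rng.sample(range(len(dirty)), 2)
    dirty[i], dirty[j] = dirty[j], dirty[i]
    return dirty


def corrupt_rows(rows, rate=0.3, seed=0):
    """Corrupt roughly `rate` of the rows; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        if not row or rng.random() >= rate:
            out.append(list(row))  # leave this row clean
            continue
        kind = rng.choice(("mojibake", "delimiter", "swap"))
        if kind == "mojibake":
            out.append([inject_mojibake(cell) for cell in row])
        elif kind == "delimiter":
            out.append(inject_stray_delimiter(row, rng))
        else:
            out.append(swap_cells(row, rng))
    return out
```

Seeding the generator means a clean file and its corrupted twin can be regenerated on demand, which is useful when comparing model versions against the same dirty data.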

train_models.py

$ python train_models.py --no-synthetic

Found 650 training files (35.63 GB total)
Max rows per file: All (memory-efficient)
Schema diversity: 23 unique layouts
Training DeepAutoEncoder... Done.
Training IsolationForest... Done.

Plan limits

Governance & Scale

Upload, run, storage, and retention limits are calculated based on your active subscription tier.

Plan	Price	Max file	Files / month	Runs / month	Storage	Downloads	Seats	Share links	Retention
Starter	$0 forever	5 MB	10	10	100 MB	500 MB	1	0	7 days
Plus	$29/month	25 MB	100	200	2 GB	10 GB	3	25	30 days
Pro	$79/month	100 MB	500	1,000	20 GB	100 GB	10	250	180 days
Enterprise	Custom	250 MB	Unlimited	Unlimited	Custom	Custom	Unlimited	Unlimited	3,650 days
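
To show how the "Max file" column could drive plan-aware upload validation, here is a hedged sketch. The per-plan limits come from the table above; the function `validate_upload`, the use of binary megabytes, and the choice of Python (the product itself may implement this elsewhere in its stack) are assumptions.

```python
import os

MB = 1024 * 1024  # assumption: the table's "MB" means binary megabytes

# Max file size per plan, taken from the plan table.
MAX_FILE_BYTES = {
    "Starter": 5 * MB,
    "Plus": 25 * MB,
    "Pro": 100 * MB,
    "Enterprise": 250 * MB,
}

ALLOWED_EXTENSIONS = {".csv", ".tsv"}


def validate_upload(filename, size_bytes, plan):
    """Return a list of validation errors; an empty list means the file passes."""
    errors = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported format: {extension or 'no extension'}")
    limit = MAX_FILE_BYTES.get(plan)
    if limit is None:
        errors.append(f"unknown plan: {plan}")
    elif size_bytes > limit:
        errors.append(f"file exceeds the {limit // MB} MB limit for {plan}")
    return errors
```

Collecting every error instead of stopping at the first lets the upload screen report a wrong format and an oversized file in a single pass.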

Implementation Hub

Developer References

Next.js App Router
Firebase Admin SDK
Firestore Rules
Cloud Storage Rules
Papa Parse
PyOD Documentation