Senior Data Solution Architect with 11+ years’ experience designing and optimizing scalable data solutions. Expert in ETL pipelines (Talend, NiFi, Airflow, Informatica), big data processing, and cloud architectures across AWS, Azure, and GCP. Skilled in data warehouse modeling (Star schema, Snowflake schema, Data Vault) and big data tools (Hadoop, Spark, Kafka, HDFS) for batch and real-time streaming workloads. Strong in data governance, including data quality, metadata management, and regulatory compliance (HIPAA, GDPR). Experienced in deploying ML models (Scikit-learn, TensorFlow, PyTorch) via Databricks. Proficient in data visualization (Tableau, Power BI, QuickSight, Plotly) to deliver insights. Adept at DevOps practices with Docker, Kubernetes, and CI/CD pipelines for efficient delivery.
A little more about me!
A quick snapshot of my toolkit
Designed and led the development of a real-time healthcare analytics platform integrating EHR and claims data using Apache Kafka, Apache Flink, and AWS Kinesis.
Enabled predictive insights for population health management and reduced data processing latency by 60%.
Deployed HIPAA-compliant data pipelines with Apache NiFi and Apache Airflow on AWS, enhancing care quality and regulatory compliance.
Led the migration of legacy on-premises data infrastructure to a unified cloud-based lakehouse using Databricks and Delta Lake on Azure.
Streamlined ETL workflows using Apache Spark and Talend, improving data refresh rates by 70%.
Integrated machine learning models with MLflow to forecast energy demand, increasing predictive accuracy by 30%.
Developed scalable ETL pipelines with Apache Beam, Python, and Google Cloud Dataflow, processing over 10 million financial records daily.
Designed a cloud-native data lake on GCP, enabling seamless access to structured and unstructured data for cross-team analytics.
Implemented automated data validation and quality checks using Great Expectations, reducing data inconsistencies by 40%.
Designed and deployed a centralized ML Feature Store using Databricks, MLflow, and Feast, enabling 3× faster model iterations. Reduced fraud detection false positives by 18% through real-time feature engineering.