shorya sharma · Beyond Batch: Real-Time ETL with Spark Structured Streaming on Databricks using best practices… · In the dynamic landscape of big data analytics, the demand for real-time insights has become paramount. As organizations strive to make… · 7 min read · Dec 26, 2023
shorya sharma · Payment Risk and Fraud: Part 3 — A Decision Tree Case Study for Enhanced Business Performance · This is the final part of our series, in which we work through a case study and develop a decision tree to reduce fraud. · 4 min read · Nov 28, 2023
shorya sharma · Payment Risk and Fraud: Part 2 — Understanding Different Transaction Risks · Welcome back to our journey into the intricate landscape of payment risks and fraud. In Part 1, we laid the foundation, understanding the… · 6 min read · Nov 18, 2023
shorya sharma · Payment Risk and Fraud: Part 1 — The Overview · Picture this: you’re buying something online, and in the blink of an eye, your payment zips through the internet to make the purchase… · 10 min read · Nov 11, 2023
shorya sharma · Data Engineering Interview Questions · In this blog, we’ll delve into some common data engineering questions and solutions, showcasing the techniques and best practices that… · 5 min read · Sep 23, 2023
shorya sharma · Pyspark Interview Preparation Part 3: Coding Practice · In this blog, we will cover two PySpark questions for interview preparation; both questions are in the form of case studies that companies… · 4 min read · Sep 2, 2023
shorya sharma · Unlocking Insights: Interpreting Clustering Results through Decision Trees · In the world of data analysis and pattern recognition, clustering stands as a powerful technique to uncover underlying structures within… · 5 min read · Jul 22, 2023
shorya sharma · Credit card fraud detection with Snap ML and Scikit learn · What is Scikit Learn? · 6 min read · Jul 2, 2023
shorya sharma · Machine learning with Pyspark MLlib: Part 1 Regression · MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. At a high level, it… · 7 min read · Jun 20, 2023
shorya sharma · Developing a DataLake using Dataproc · This is the third and final part of our series on the basics of data engineering on Google Cloud, and hence it will be a long one. By the end of this… · 7 min read · Apr 29, 2023