Spark Architecture

Apache Spark is a unified, open-source, distributed data processing engine for big data. In this article, we discuss the Spark architecture, its distributed nature, and how it processes high volumes of data.

Posted August 4, 2022 by Rohith ‐ 7 min read

apache spark bigdata architecture transformations distributed-system actions rdd
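As a rough illustration of the idea behind this architecture, a driver splits the input into partitions, executors run the same task on each partition in parallel, and the driver combines the partial results. The sketch below is a hypothetical toy model that uses only the Python standard library. The names `split_into_partitions`, `run_task`, and `run_job` are made up for illustration, and threads merely stand in for executors. In a real Spark cluster, executors are separate JVM processes on worker nodes allocated by a cluster manager.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side (hypothetical): split the input into roughly equal partitions."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_task(partition):
    """Executor side (hypothetical): the same task runs independently on each partition."""
    return sum(x * x for x in partition)


def run_job(data, num_partitions=4, num_executors=2):
    """Driver (hypothetical): schedule one task per partition, then combine partial results."""
    partitions = split_into_partitions(data, num_partitions)
    # Threads stand in for executors in this toy model.
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_results = list(pool.map(run_task, partitions))
    return sum(partial_results)


print(run_job(list(range(1, 11))))  # sum of squares of 1..10 -> 385
```

Because each partition is processed independently, adding more executors (or machines) lets the same job handle more data. This is the property Spark relies on to scale.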

RDD in Spark

RDD (Resilient Distributed Dataset) is the fundamental data structure of Spark. It is the primary data abstraction in Apache Spark and Spark Core.

Posted August 31, 2022 by Rohith ‐ 5 min read

apache spark bigdata distributed-system spark-fundamentals rdd
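To make the RDD concepts concrete, the following is a hypothetical toy model written in plain Python. It is not the Spark API. `ToyRDD` and its methods are invented for illustration. The model shows three RDD properties: the data is split into partitions, transformations such as `map` and `filter` are lazy and only record lineage, and an action such as `collect` or `count` triggers computation. Replaying the recorded lineage is also how a lost partition can be rebuilt, which is where the "resilient" in the name comes from.

```python
class ToyRDD:
    """Hypothetical toy model of an RDD: immutable, partitioned, lazily evaluated."""

    def __init__(self, partitions, lineage=()):
        self._source = partitions  # base data, one list per partition
        self._lineage = lineage    # tuple of (name, function) transformations

    def map(self, f):
        # Transformation: returns a new ToyRDD; nothing is computed yet.
        return ToyRDD(self._source, self._lineage + (("map", f),))

    def filter(self, pred):
        # Transformation: also lazy, only extends the lineage.
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def _compute_partition(self, index):
        # Replay the lineage on one source partition. The same replay could
        # rebuild a lost partition without replicating the computed data.
        items = list(self._source[index])
        for name, f in self._lineage:
            if name == "map":
                items = [f(x) for x in items]
            else:
                items = [x for x in items if f(x)]
        return items

    def collect(self):
        # Action: triggers computation of every partition.
        result = []
        for i in range(len(self._source)):
            result.extend(self._compute_partition(i))
        return result

    def count(self):
        # Action: counts the elements across all partitions.
        return sum(len(self._compute_partition(i)) for i in range(len(self._source)))


rdd = ToyRDD([[1, 2, 3], [4, 5, 6]])
evens_squared = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(evens_squared.collect())  # [4, 16, 36]
print(evens_squared.count())    # 3
```

Note that `rdd` itself is unchanged after the transformations: each transformation returns a new object, mirroring the immutability of real RDDs.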

RDD/DataFrame Type Check

RDD and DataFrame instance types can be checked in various ways in Python, Scala, and Java.

Posted August 6, 2022 by Rohith ‐ 2 min read

apache spark bigdata rdd dataframe python java scala

