bigdata

RDD in Spark

An RDD (Resilient Distributed Dataset) is the fundamental data structure of Apache Spark and the primary data abstraction in Spark Core.

Posted August 31, 2022 by Rohith ‐ 5 min read

apache spark bigdata distributed-system spark-fundamentals rdd

Parallelize() In Spark

The parallelize() method of SparkContext creates an RDD from a list of elements.

Posted August 31, 2022 by Rohith ‐ 3 min read

apache spark bigdata distributed-system spark-fundamentals

RDD/DataFrame Type Check

The instance type of an RDD or DataFrame can be checked in several ways in Python, Scala, and Java.

Posted August 6, 2022 by Rohith ‐ 2 min read

apache spark bigdata rdd dataframe python java scala
