distributed-system

Spark Session

SparkSession is the entry point Spark applications use to create RDDs, DataFrames, and Datasets.

Posted August 22, 2022 by Rohith ‐ 8 min read

apache spark bigdata distributed-system spark-session

Spark Context

SparkContext was the entry point for Spark applications before Spark 2.0. SparkSession was then introduced as a common entry point that unifies SparkContext, SQLContext, StreamingContext, and HiveContext. SparkContext is still used after the Spark 2.x release and can be reached through SparkSession.sparkContext.

Posted August 23, 2022 by Rohith ‐ 4 min read

apache spark bigdata distributed-system
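The relationship described in these two summaries can be sketched in plain Python. This is a toy model, not PySpark: `ToySession`, `ToyContext`, and `get_or_create` are hypothetical names chosen for illustration. It assumes the behavior summarized above: one session per application, obtained through a get-or-create call, which owns the lower-level context and exposes it as `sparkContext`.

```python
from typing import Optional


class ToyContext:
    """Hypothetical stand-in for SparkContext: the low-level handle to the cluster."""

    def __init__(self, app_name: str) -> None:
        self.app_name = app_name


class ToySession:
    """Hypothetical stand-in for SparkSession: a single entry point that owns a context."""

    _active: Optional["ToySession"] = None  # the one session shared by the application

    def __init__(self, app_name: str) -> None:
        # The session creates and keeps the lower-level context, so code that
        # still needs the context can reach it through the session.
        self.sparkContext = ToyContext(app_name)

    @classmethod
    def get_or_create(cls, app_name: str) -> "ToySession":
        # Return the existing session if there is one; otherwise build a new one.
        # A second call therefore does not start a second context.
        if cls._active is None:
            cls._active = cls(app_name)
        return cls._active


first = ToySession.get_or_create("demo")
second = ToySession.get_or_create("other-name")
print(first is second)               # True
print(second.sparkContext.app_name)  # demo
```

In real PySpark the corresponding calls are `SparkSession.builder.appName(...).getOrCreate()` and `spark.sparkContext`; the toy only mirrors the get-or-create and "session owns the context" ideas.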

RDD in Spark

RDD (Resilient Distributed Dataset) is the fundamental data structure of Spark. It is the primary data abstraction in Apache Spark and Spark Core.

Posted August 31, 2022 by Rohith ‐ 5 min read

apache spark bigdata distributed-system spark-fundamentals rdd
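To make the definition concrete, here is a minimal toy model in plain Python, assuming the usual RDD properties: data split into partitions, transformations that return a new RDD instead of modifying the old one, lazy evaluation until an action such as `collect()`, and lineage that lets any partition be recomputed. `ToyRDD` and its `compute` method are hypothetical; Spark's real RDD class is far more involved.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Toy model of an RDD (hypothetical, not Spark's class).

    A ToyRDD holds either source partitions or a parent plus a per-partition
    function. That parent chain is its lineage.
    """

    def __init__(
        self,
        partitions: Optional[List[list]] = None,
        parent: Optional["ToyRDD"] = None,
        fn: Optional[Callable[[list], list]] = None,
    ) -> None:
        self._partitions = partitions
        self._parent = parent
        self._fn = fn

    def map(self, f: Callable) -> "ToyRDD":
        # Transformation: returns a new ToyRDD and computes nothing yet.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        # Transformation: likewise lazy; the parent is left unchanged.
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def num_partitions(self) -> int:
        # map and filter keep the parent's partitioning.
        if self._parent is None:
            return len(self._partitions)
        return self._parent.num_partitions()

    def compute(self, i: int) -> list:
        # Rebuild partition i by walking the lineage back to the source data.
        if self._parent is None:
            return list(self._partitions[i])
        return self._fn(self._parent.compute(i))

    def collect(self) -> list:
        # Action: only here is work actually done, one partition at a time.
        result: list = []
        for i in range(self.num_partitions()):
            result.extend(self.compute(i))
        return result


source = ToyRDD(partitions=[[1, 2], [3, 4, 5]])
even_squares = source.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())  # [4, 16]
print(source.collect())        # [1, 2, 3, 4, 5]
```

Because `compute(i)` needs only the source data and the chain of functions, a lost partition could be rebuilt by calling it again, which is the idea behind "resilient" in the name.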

Parallelize() In Spark

parallelize() is the SparkContext method used to create an RDD from a local list of elements.

Posted August 31, 2022 by Rohith ‐ 3 min read

apache spark bigdata distributed-system spark-fundamentals
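One step of `parallelize()` is splitting the local list into a number of slices, one per partition. The hypothetical helper `slice_into_partitions` below sketches that step in plain Python. It is modeled on the even-split arithmetic in Spark's Scala `ParallelCollectionRDD`; PySpark groups elements into serialized batches before slicing, so exact boundaries in PySpark can differ.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_into_partitions(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Hypothetical helper: split `data` into `num_slices` contiguous chunks.

    Slice i covers positions [i * n // num_slices, (i + 1) * n // num_slices),
    so chunk sizes differ by at most one element.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be positive")
    n = len(data)
    return [
        list(data[(i * n) // num_slices : ((i + 1) * n) // num_slices])
        for i in range(num_slices)
    ]


print(slice_into_partitions([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4, 5]]
print(slice_into_partitions(list(range(7)), 3))   # [[0, 1], [2, 3], [4, 5, 6]]
```

In PySpark the real call is `sc.parallelize(data, numSlices)`, and `rdd.glom().collect()` returns the actual contents of each partition, which is a quick way to check how the data was split.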
