Rohith

Creator of whiletrue.live.

Spark Architecture

Apache Spark is a unified, open-source, distributed data processing engine for big data. In this article, we discuss the Spark architecture, its distributed nature, and how it processes high volumes of data.

Posted August 4, 2022 by Rohith ‐ 7 min read

⌖ apache spark bigdata architecture transformations distributed-system actions rdd

Data Types in SQL

Data types define the kind of data that can be stored in a column of a database table. For example, to store string data in a particular column, we have to declare that column with a string data type.

Posted September 13, 2022 by Anusha and Rohith ‐ 4 min read

⌖ sql rdbms data-types

Spark Memory Management

The main feature of Apache Spark is its ability to run computations in memory, so memory management plays a very important role in the whole system. In this article, we dive into Spark memory management.

Posted August 9, 2022 by Rohith ‐ 11 min read

⌖ apache spark bigdata architecture memory jvm yarn heap off-heap distributed-system gc

Operators in SQL

In SQL, operators are reserved words and characters, typically used in the WHERE clause of a query. An operator can be either unary or binary: a unary operator takes one operand, whereas a binary operator takes two.

Posted September 13, 2022 by Anusha and Rohith ‐ 4 min read

⌖ sql rdbms operators

Spark Session

Spark Session is the entry point for Spark applications to create RDDs, DataFrames, and Datasets.

Posted August 22, 2022 by Rohith ‐ 8 min read

⌖ apache spark bigdata distributed-system spark-session