transformations

Get Parquet Schema Using Python

Parquet is widely used in data transformations. Every Parquet file has a schema associated with it. Because Parquet is a binary format, its contents cannot be read in a text editor. In this article, we use the pyarrow Python package to extract the Parquet schema.

Posted August 17, 2022 by Rohith ‐ 1 min read

quick-references python transformations parquet blog

Parquet Avro Type Mapping

Avro and Parquet are used in many data processing frameworks such as Kafka and Spark, so it is important to know the data types each format supports. This article lists the mapping between Avro and Parquet data types.

Posted August 17, 2022 by Rohith ‐ 1 min read

avro parquet data-types transformations

Data Types in Parquet

Parquet is used in many data processing frameworks such as Apache Flink and Spark. It is important to know the data types supported in the Parquet data format.

Posted August 17, 2022 by Rohith ‐ 1 min read

parquet data-types transformations

Data Types in Avro

Avro is used in many data processing frameworks such as Kafka and Spark. It is important to know the data types supported in the Avro data format.

Posted August 17, 2022 by Rohith ‐ 1 min read

avro data-types transformations

Spark Architecture

Apache Spark is a unified, open-source, distributed data processing engine for big data. In this article, we discuss the Spark architecture, its distributed nature, and how it processes high volumes of data.

Posted August 4, 2022 by Rohith ‐ 7 min read

apache spark bigdata architecture transformations distributed-system actions rdd
