Create Column Filled With Null Using PySpark

Posted July 29, 2022 by Rohith ‐ 1 min read

While working with Spark DataFrames, developers sometimes need to create a column filled with null values.

The following example shows how to create a column filled with null values in a PySpark DataFrame.

Example: assume the PySpark DataFrame you are working with is df.

from pyspark.sql.types import StringType
from pyspark.sql.functions import lit

df = df.withColumn("nullFilledColumnName", lit(None).cast(StringType()))

Explanation:

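  • withColumn() is a PySpark DataFrame method that returns a new DataFrame with the named column added (or replaced, if a column with that name already exists).
  • lit() creates a Column holding a literal (constant) value, so every row of the new column gets that same value.
  • cast() is a Column method that converts the column to the given data type.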

Here, nullFilledColumnName is the name of the new column, and lit(None) supplies its value: Python's None is converted to null on the JVM, where Spark runs. On its own, lit(None) produces a column of Spark's NullType, so we cast it to whatever type we actually want (StringType in this example). Spark can cast a null literal to any data type and the result is still null, so the cast works for any target type.

