Create Column Filled With Null Using PySpark
Posted July 29, 2022 by Rohith ‐ 1 min read
While working with Spark DataFrames, developers sometimes have to create a column filled with null.
The following example shows how to create a column filled with null values in a PySpark DataFrame.
Example: suppose the PySpark DataFrame you are working with is named df.
from pyspark.sql.types import StringType
from pyspark.sql.functions import lit
df = df.withColumn("nullFilledColumnName", lit(None).cast(StringType()))
Explanation:
withColumn() is the PySpark DataFrame method used to add a new column (or replace an existing column of the same name).
lit() is the function used to create a column that holds a literal (constant) value.
cast() is the Column method used to convert a column to a given data type.
Here, we have given nullFilledColumnName as the column name and used lit to fill the column with None (internally Spark converts it to null, as Spark runs on the JVM). We then cast the column to whatever type we want. Without the cast, the column would have Spark's null type (NullType); because null is a valid value for every data type, casting it to any type works without issues.