Transforming text columns – Cleansing, Transforming, and Shaping Data

Transforming text columns

Column transformations for text data type columns include the following:

  • Split Column – This allows you to split a column of text by delimiter, number of characters, positions, lowercase to uppercase, uppercase to lowercase, digit to non-digit, and non-digit to digit.
  • Format – This provides formatting options that are applied to every row of the selected column. Options include changing all text to lowercase or uppercase, capitalizing each word, and trimming spaces at the beginning or end of the text. This can also be used to add a prefix or suffix to text or remove control characters.
  • Extract – Extract is like split column capability, but it just leaves the extract value in the existing column rather than splitting a new column into two. For example, if there are delimiters in your text and you only need the text before the delimiter, then this can be used rather than splitting and then removing the unneeded column.
  • Parse – When working with XML or JSON data, this is used to transform the embedded XML or JSON data and turn it into tabular data.

Transforming number columns

Number data type columns can be transformed in these ways:

  • Statistics – Provides basic statistical functions to perform against the selected number column. Functions include sum, minimum, maximum, median, average, standard deviation, count values, and count distinct values.
  • Standard – Performs standard mathematic operations such as adding, subtracting, multiplying, dividing, integer-divide, modulo, percentage, and percentage of a value you enter against each data value in the column.
  • Scientific – Performs scientific mathematic operations such as absolute value, power (square, cube, n), square root, exponent, log (base-10, natural), or factorial against each data value in the column.
  • Trigonometry – Calculates trigonometric functions against each data value in the column. Functions include sine, cosine, tangent, arcsine, arccosine, and arctangent.
  • Rounding – Calculates the rounded value of each data value in the column. Specified as rounding up, down, or to a specified decimal place.
  • Information – Transforms a number into a true/false type based on whether the data value is odd or even. Can also change data values into 1 or -1 depending on the sign of the numeric value.

Author: Noah Walker

Leave a Reply

Your email address will not be published. Required fields are marked *