Thursday, September 10, 2009

Remove spaces and nonprinting characters from text

http://office.microsoft.com/en-us/help/HP100627431033.aspx
Applies to: Microsoft Office Excel 2007


Sometimes text values contain leading, trailing, or multiple embedded space characters (Unicode character set values 32 and 160), or nonprinting characters (Unicode character set values 0 to 31, 127, 129, 141, 143, 144, and 157). Unicode is a character encoding standard developed by the Unicode Consortium; by using more than one byte to represent each character, it enables almost all of the written languages in the world to be represented by using a single character set. Such space and nonprinting characters can cause unexpected results when you sort, filter, or search. For example, users may make typographical errors by inadvertently adding extra space characters, or text imported from external sources may contain embedded nonprinting characters. Because these characters are not easily noticed, the unexpected results may be difficult to understand. To remove these unwanted characters, you can use a combination of the TRIM, CLEAN, and SUBSTITUTE functions.
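
Before cleaning, it can help to see which of these characters a value actually contains. The following is a minimal Python sketch, not part of Excel: the function name find_problem_characters is hypothetical, and it assumes the decimal values listed above map directly to Python code points. Ordinary spaces (value 32) are not flagged, because single spaces between words are legitimate.

```python
# Decimal values the article lists as nonprinting characters.
NONPRINTING_CODES = set(range(0, 32)) | {127, 129, 141, 143, 144, 157}
NBSP_CODE = 160  # nonbreaking space


def find_problem_characters(text):
    """Return (index, code, kind) for each flagged character in text."""
    found = []
    for index, ch in enumerate(text):
        code = ord(ch)
        if code == NBSP_CODE:
            found.append((index, code, "nonbreaking space"))
        elif code in NONPRINTING_CODES:
            found.append((index, code, "nonprinting"))
    return found


# "XY", then a BEL character (value 7), then "453".
print(find_problem_characters("XY" + chr(7) + "453"))  # [(2, 7, 'nonprinting')]
```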

The TRIM function removes spaces from text except for single spaces between words. The CLEAN function removes nonprinting characters from text. Both functions were designed to work with 7-bit ASCII, which is a subset of the ANSI character set, an 8-bit character set used by Microsoft Windows that can represent up to 256 characters (0 through 255). It's important to understand that the first 128 values (0 to 127) in 7-bit ASCII represent the same characters as the first 128 values in the Unicode character set.

The TRIM function was designed to trim the 7-bit ASCII space character (value 32) from text. In the Unicode character set, there is an additional space character called the nonbreaking space character that has a decimal value of 160. This character is commonly used in Web pages as the HTML entity &nbsp;. By itself, the TRIM function does not remove this nonbreaking space character.
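
The behavior described above can be modeled outside Excel. This is a rough Python sketch, assuming TRIM only ever treats value 32 as a space; excel_trim is a hypothetical name, not an Excel or Python API.

```python
def excel_trim(text):
    """Approximate TRIM: drop leading and trailing spaces (value 32)
    and collapse runs of spaces between words to a single space."""
    # Split on the ASCII space only. A bare text.split() would also split
    # on the nonbreaking space, which TRIM does not treat as a space.
    return " ".join(part for part in text.split(" ") if part)


NBSP = chr(160)

# Ordinary spaces are trimmed and collapsed.
print(repr(excel_trim("  BD   122 ")))

# A leading nonbreaking space survives, as it does with TRIM alone.
print(excel_trim(NBSP + "BD 122").startswith(NBSP))
```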

The CLEAN function was designed to remove the first 32 nonprinting characters in the 7-bit ASCII code (values 0 through 31) from text. In the Unicode character set, there are additional nonprinting characters (values 127, 129, 141, 143, 144, and 157). By itself, the CLEAN function does not remove these additional nonprinting characters.
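
A comparable Python sketch of CLEAN, under the assumption that it strips exactly values 0 through 31 and nothing else (excel_clean is a hypothetical name):

```python
def excel_clean(text):
    """Approximate CLEAN: remove characters with values 0 through 31."""
    return "".join(ch for ch in text if ord(ch) > 31)


BEL = chr(7)    # inside the range CLEAN handles
DEL = chr(127)  # outside that range

print(excel_clean("XY" + BEL + "453"))         # XY453
print(DEL in excel_clean("MN" + DEL + "987"))  # True: DEL is left in place
```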

To remove these higher-value characters, use the SUBSTITUTE function to replace them with the 7-bit ASCII characters for which the TRIM and CLEAN functions were designed.
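
Put together, the substitution approach might look like the following Python sketch. It is an illustration, not Excel's implementation: the names scrub, excel_trim, and excel_clean are hypothetical, and mapping every extra nonprinting value to BEL (value 7) is one choice among several, since any value from 0 through 31 would be removed by the CLEAN step.

```python
EXTRA_NONPRINTING = (127, 129, 141, 143, 144, 157)


def excel_trim(text):
    # Trim and collapse only the ASCII space (value 32).
    return " ".join(part for part in text.split(" ") if part)


def excel_clean(text):
    # Remove values 0 through 31 only.
    return "".join(ch for ch in text if ord(ch) > 31)


def scrub(text):
    """Substitute first, then clean, then trim."""
    # Like SUBSTITUTE(text, CHAR(160), CHAR(32)).
    text = text.replace(chr(160), chr(32))
    # Like SUBSTITUTE(text, CHAR(n), CHAR(7)) for each extra value.
    for code in EXTRA_NONPRINTING:
        text = text.replace(chr(code), chr(7))
    # Trim last, because removing a character can leave adjacent spaces.
    return excel_trim(excel_clean(text))


dirty = chr(160) + "BD" + chr(160) * 2 + "122" + chr(127) + " "
print(repr(scrub(dirty)))  # 'BD 122'
```

Trimming after cleaning is a deliberate ordering in this sketch: a removed control character that sat between two spaces would otherwise leave a double space behind.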
Example

The example may be easier to understand if you copy it to a blank worksheet.

How to copy an example

1. Create a blank workbook or worksheet.
2. Select the example in the Help topic.

Note: Do not select the row or column headers.

3. Press CTRL+C.
4. In the worksheet, select cell A1, and press CTRL+V.
5. To switch between viewing the results and viewing the formulas that return the results, press CTRL+` (grave accent), or on the Formulas tab, in the Formula Auditing group, click the Show Formulas button.


     A
1    Data
2    BD 122
3    ="XY"&CHAR(7)&"453"
4    BD 122
5    MN987

Formula    Description (Result)
=TRIM(A2)    Removes the trailing space from the string "BD 122 " (BD 122)
=CLEAN(A3)    Removes the nonprinting BEL character (ASCII value of 7) from the string value created by the expression ="XY"&CHAR(7)&"453" (XY453)
=TRIM(SUBSTITUTE(A4,CHAR(160),CHAR(32)))    Replaces each nonbreaking space character (Unicode value of 160) with a space character (ASCII value of 32) by using the SUBSTITUTE function, and then removes the leading and multiple embedded spaces from the string " BD 122" (BD 122)
=CLEAN(SUBSTITUTE(A5,CHAR(127),CHAR(7)))    Replaces the nonprinting DEL character (ASCII value of 127) with a BEL character (ASCII value of 7) by using the SUBSTITUTE function, and then removes the BEL character from the string "MN987" (MN987)
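
The four formulas can be checked against a small Python model. This is only a sketch: the hidden characters in cells A2, A4, and A5 are not visible in the text above, so the cell contents below are assumptions inferred from the descriptions, and trim and clean are hypothetical stand-ins for the Excel functions.

```python
NBSP, BEL, DEL = chr(160), chr(7), chr(127)


def trim(text):
    # Stand-in for TRIM: handles only the ASCII space (value 32).
    return " ".join(part for part in text.split(" ") if part)


def clean(text):
    # Stand-in for CLEAN: removes values 0 through 31.
    return "".join(ch for ch in text if ord(ch) > 31)


# Assumed cell contents (hidden characters are guesses from the descriptions).
cells = {
    "A2": "BD 122 ",                          # trailing space
    "A3": "XY" + BEL + "453",                 # embedded BEL
    "A4": NBSP + "BD" + NBSP + NBSP + "122",  # nonbreaking spaces
    "A5": "MN" + DEL + "987",                 # embedded DEL
}

results = {
    "=TRIM(A2)": trim(cells["A2"]),
    "=CLEAN(A3)": clean(cells["A3"]),
    "=TRIM(SUBSTITUTE(A4,CHAR(160),CHAR(32)))": trim(cells["A4"].replace(NBSP, " ")),
    "=CLEAN(SUBSTITUTE(A5,CHAR(127),CHAR(7)))": clean(cells["A5"].replace(DEL, BEL)),
}

for formula, value in results.items():
    print(f"{formula} -> {value!r}")
```

Under these assumptions the four results are BD 122, XY453, BD 122, and MN987.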
Function details

TRIM

CLEAN

SUBSTITUTE
