Batch Text Replacer - BrineSoft Online Help

File Encoding

Auto
The file encoding is resolved automatically. First, the beginning of the file is checked for the zero-width no-break space character (also known as the byte order mark). If it is found, the file encoding is determined from it. If it is not found, the content of the file is examined. If the majority of even bytes are non-zero and the majority of odd bytes are zero, the file is considered Unicode. If the ratio of non-zero to zero bytes is reversed, the file is considered Unicode (big endian). If this method doesn't yield a result, the content is searched for an occurrence of a byte higher than 127. If one is found, the byte sequence is examined, and if it does not violate the rules of UTF-8 encoding, the file is considered UTF-8. Otherwise, the encoding defaults to ANSI.
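The detection order described above can be sketched in Python. This is only an illustration of the documented steps, not the program's actual code: the function name `detect_encoding` is hypothetical, "even" bytes are assumed to be those at offsets 0, 2, 4, ... counted from the start of the file, and "majority" is assumed to mean more than half.

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Guess a file encoding following the steps documented for Auto."""
    # Step 1: look for a byte order mark at the start of the data.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "Unicode"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "Unicode (big endian)"

    # Step 2: compare zero and non-zero bytes at even and odd offsets.
    # Assumption: offsets are counted from zero.
    even, odd = data[0::2], data[1::2]
    even_nonzero = sum(b != 0 for b in even)
    odd_nonzero = sum(b != 0 for b in odd)
    even_mostly_nonzero = even_nonzero * 2 > len(even)
    even_mostly_zero = (len(even) - even_nonzero) * 2 > len(even)
    odd_mostly_nonzero = odd_nonzero * 2 > len(odd)
    odd_mostly_zero = (len(odd) - odd_nonzero) * 2 > len(odd)
    if even_mostly_nonzero and odd_mostly_zero:
        return "Unicode"
    if even_mostly_zero and odd_mostly_nonzero:
        return "Unicode (big endian)"

    # Step 3: if any byte is higher than 127, test the UTF-8 rules.
    if any(b > 127 for b in data):
        try:
            data.decode("utf-8")  # raises on any UTF-8 violation
            return "UTF-8"
        except UnicodeDecodeError:
            pass

    # Step 4: nothing matched, so fall back to ANSI.
    return "ANSI"
```

For example, `detect_encoding("hello".encode("utf-16-le"))` returns `"Unicode"`, while `detect_encoding(b"hello")` falls through every check and returns `"ANSI"`.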

ANSI
The file is processed as if it were ANSI encoded, regardless of its content.

UTF-8
The file is processed as if it were UTF-8 encoded, regardless of its content.

Unicode
The file is processed as if it were Unicode encoded, regardless of its content. The byte order mark is also ignored.

Unicode (big endian)
The file is processed as if it were Unicode (big endian) encoded, regardless of its content. The byte order mark is also ignored.

The file encoding in use also determines the encoding of the replace-with strings. If the source file contains neither a byte order mark nor a character higher than 127, it can be considered a valid file in both ANSI and UTF-8 encoding. A find-what string that doesn't contain a character higher than 127 will be found in either encoding. If the replace-with string contains characters higher than 127 (such as letters with diacritics), the proper file encoding should be selected explicitly. In all other cases, the Auto option will yield correct results.
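The following Python sketch shows why the choice matters only for characters higher than 127. It is an illustration, not part of the product: the helper `encoded_forms` is hypothetical, and Windows-1252 is assumed as a stand-in for ANSI, whose actual code page depends on the system.

```python
# Assumption: Windows-1252 stands in for "ANSI"; the real ANSI code page
# depends on the system's regional settings.
ANSI_CODEC = "cp1252"


def encoded_forms(text: str) -> tuple:
    """Return the (ANSI, UTF-8) byte sequences for the same text."""
    return text.encode(ANSI_CODEC), text.encode("utf-8")


# A string without characters higher than 127 has identical bytes in both.
plain_ansi, plain_utf8 = encoded_forms("cafe")
print(plain_ansi == plain_utf8)   # True

# A letter with a diacritic (here U+00E9) is written differently.
accent_ansi, accent_utf8 = encoded_forms("caf\u00e9")
print(accent_ansi)   # b'caf\xe9'
print(accent_utf8)   # b'caf\xc3\xa9'
```

If such a replace-with string were written into a file in the wrong encoding, the accented letter would be stored as the wrong bytes, which is why the encoding should be chosen by hand in that case.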

See also
Overview, Setting It Up, Files