Boneyard Tools

Text File Encoding Detector

Drop in a text file to see what encoding it actually uses. This tool reads the raw bytes and checks for a byte-order mark, then scans for the bit patterns of valid UTF-8 to decide between UTF-8, ASCII, and the single-byte Latin family. It also reports the dominant line ending and whether the file is really binary. Everything is read in your browser and nothing is uploaded.

How to detect a text file's encoding

  1. Drag any text file onto the box, or click browse to pick one.
  2. Read the detected encoding, byte-order mark, and line ending.
  3. Use the binary and confidence flags to confirm it is really text.

Examples

A Windows export with a BOM

notes.txt saved as UTF-8 from a Windows editor
Encoding: UTF-8 with BOM, line ending: CRLF, confidence: high

Frequently asked questions

Is my file uploaded anywhere?

No. The file is read and analyzed entirely in your browser using JavaScript. Nothing is sent to a server, so even confidential text stays on your device.

How can it tell UTF-8 from Latin-1 without a BOM?

It scans the bytes for the bit patterns UTF-8 requires: each lead byte (110xxxxx, 1110xxxx, or 11110xxx) must be followed by exactly one, two, or three continuation bytes of the form 10xxxxxx (0x80 to 0xBF). If every byte above 0x7F belongs to a valid sequence, the file is reported as UTF-8; if there are no such bytes at all, it is plain ASCII. If the rules are broken but the bytes are otherwise printable, the file is almost certainly in a single-byte code page such as Windows-1252 or ISO-8859-1.
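
The scan described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function name classifyBytes and its return labels are made up for the example, and it checks only the lead and continuation bit patterns. It skips the "otherwise printable" check, and a strict validator would also reject overlong forms, surrogates, and code points above U+10FFFF.

```javascript
// Hypothetical helper: classify a Uint8Array as "ASCII", "UTF-8",
// or "single-byte" using only the UTF-8 bit patterns.
function classifyBytes(bytes) {
  let sawHighByte = false;
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    if (b < 0x80) {
      i += 1; // plain ASCII byte
      continue;
    }
    sawHighByte = true;

    // Number of continuation bytes the lead byte announces.
    let extra;
    if ((b & 0xe0) === 0xc0) extra = 1; // 110xxxxx
    else if ((b & 0xf0) === 0xe0) extra = 2; // 1110xxxx
    else if ((b & 0xf8) === 0xf0) extra = 3; // 11110xxx
    else return "single-byte"; // stray continuation or invalid lead byte

    for (let k = 1; k <= extra; k++) {
      const c = bytes[i + k]; // undefined if the file ends mid-sequence
      if (c === undefined || (c & 0xc0) !== 0x80) return "single-byte";
    }
    i += extra + 1;
  }
  return sawHighByte ? "UTF-8" : "ASCII";
}

// "café" encoded as UTF-8 (C3 A9) and as Latin-1 (E9).
console.log(classifyBytes(Uint8Array.from([0x63, 0x61, 0x66, 0xc3, 0xa9])));
console.log(classifyBytes(Uint8Array.from([0x63, 0x61, 0x66, 0xe9])));
```

In a browser, the standard TextDecoder constructed with the fatal option set to true offers a stricter check, since its decode call throws on any invalid sequence.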

What is a byte-order mark (BOM)?

A BOM is a short signature at the very start of a file that names its encoding. UTF-8 with a BOM begins with EF BB BF, UTF-16 LE with FF FE, and UTF-16 BE with FE FF. The UTF-32 marks are the UTF-16 ones padded with two zero bytes: FF FE 00 00 for UTF-32 LE and 00 00 FE FF for UTF-32 BE. When a BOM is present the encoding is effectively settled, which is why those results show high confidence.
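
As a rough sketch of how such a check might look (the helper name detectBom is hypothetical and this is not taken from the tool's source), the signatures can be compared against the first bytes of the file. The 4-byte UTF-32 marks (FF FE 00 00 and 00 00 FE FF) are tested before the 2-byte UTF-16 ones, because UTF-32 LE starts with the same FF FE as UTF-16 LE.

```javascript
// Hypothetical helper: return the encoding named by a BOM, or null if none.
function detectBom(bytes) {
  const startsWith = (sig) =>
    bytes.length >= sig.length && sig.every((b, i) => bytes[i] === b);

  // Longest signatures first. Note that FF FE 00 00 could in principle be
  // a UTF-16 LE BOM followed by U+0000; this sketch assumes UTF-32 LE.
  if (startsWith([0xff, 0xfe, 0x00, 0x00])) return "UTF-32 LE";
  if (startsWith([0x00, 0x00, 0xfe, 0xff])) return "UTF-32 BE";
  if (startsWith([0xef, 0xbb, 0xbf])) return "UTF-8";
  if (startsWith([0xff, 0xfe])) return "UTF-16 LE";
  if (startsWith([0xfe, 0xff])) return "UTF-16 BE";
  return null;
}

// A UTF-8 file with a BOM, like the notes.txt example: EF BB BF then "hi".
console.log(detectBom(Uint8Array.from([0xef, 0xbb, 0xbf, 0x68, 0x69])));
```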

Why does it sometimes report Latin-1 with only medium confidence?

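Windows-1252 and ISO-8859-1 (Latin-1) assign the same characters to almost every byte and neither carries a marker. They differ only in the range 0x80 to 0x9F, where Windows-1252 places printable characters such as curly quotes and ISO-8859-1 places rarely used control codes. A file with no bytes in that range decodes identically under both, so content alone often cannot tell them apart. The tool reports the family and lowers the confidence to signal that the exact code page is a guess.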
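
One hint is sometimes available, although it is only a heuristic and this page does not say whether the tool uses it: the two code pages differ in the range 0x80 to 0x9F, which holds printable characters in Windows-1252 and control codes in ISO-8859-1. A sketch of that check, with a made-up function name:

```javascript
// Hypothetical helper: true if any byte falls in 0x80..0x9F, the only
// range where Windows-1252 and ISO-8859-1 disagree. A hit suggests
// Windows-1252; no hit means the two decodings are identical.
function hasC1RangeBytes(bytes) {
  return bytes.some((b) => b >= 0x80 && b <= 0x9f);
}

// 0x93 and 0x94 are curly double quotes in Windows-1252.
console.log(hasC1RangeBytes(Uint8Array.from([0x93, 0x68, 0x69, 0x94])));
```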

What does it mean when a file is flagged as binary?

A file is treated as binary when it contains a NUL (0x00) byte or an unusually high share of control bytes. Tabs, line feeds, and carriage returns are normal in text and never count against it. Binary files have no meaningful text encoding, so the result is simply Binary.
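
A minimal sketch of that rule follows. The function name looksBinary and the 10% threshold are assumptions chosen for illustration, since the page does not state what share of control bytes counts as unusually high. The sketch also assumes it runs only on files without a UTF-16 or UTF-32 BOM, because those encodings contain NUL bytes in ordinary text.

```javascript
// Hypothetical helper: flag a Uint8Array as binary if it holds a NUL byte
// or too many control bytes. maxControlShare is an assumed threshold.
function looksBinary(bytes, maxControlShare = 0.1) {
  if (bytes.length === 0) return false;
  let control = 0;
  for (const b of bytes) {
    if (b === 0x00) return true; // a single NUL is enough
    // Tab, line feed, and carriage return are normal in text.
    const isTextWhitespace = b === 0x09 || b === 0x0a || b === 0x0d;
    if ((b < 0x20 || b === 0x7f) && !isTextWhitespace) control += 1;
  }
  return control / bytes.length > maxControlShare;
}

// The start of a PNG header followed by a NUL byte.
console.log(looksBinary(Uint8Array.from([0x89, 0x50, 0x4e, 0x47, 0x00])));
```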

How does it detect the line ending?

It counts the newline styles in the bytes: CRLF (the bytes 0D 0A, common on Windows), a lone LF (0A, common on macOS and Linux), and a lone CR (0D, used by classic Mac OS). If only one style appears it is reported by name; if several appear the result is mixed, and if there are no line breaks it is none.
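
The counting can be sketched like this. The function name detectLineEnding is hypothetical, and the sketch assumes ASCII, UTF-8, or a single-byte code page, where 0D and 0A always mean CR and LF; UTF-16 or UTF-32 data would need to be scanned by code unit instead.

```javascript
// Hypothetical helper: report "CRLF", "LF", "CR", "mixed", or "none".
function detectLineEnding(bytes) {
  let crlf = 0;
  let lf = 0;
  let cr = 0;
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x0d) {
      if (bytes[i + 1] === 0x0a) {
        crlf += 1;
        i += 1; // skip the LF that belongs to this CRLF pair
      } else {
        cr += 1; // lone CR
      }
    } else if (bytes[i] === 0x0a) {
      lf += 1; // lone LF
    }
  }
  const seen = [["CRLF", crlf], ["LF", lf], ["CR", cr]].filter(([, n]) => n > 0);
  if (seen.length === 0) return "none";
  if (seen.length > 1) return "mixed";
  return seen[0][0];
}

console.log(detectLineEnding(new TextEncoder().encode("one\r\ntwo\r\n")));
```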
