MySQL 4.0 -> 5.0, unicode

Thursday, 03. 2. 2006  –  Category: all, sw

Things:

  • Loading an ISO-8859-1 encoded dump into a UTF-8 database breaks: fields are truncated at the first invalid character. Not terribly surprising, but MySQL is silent about the breakage.
  • Content which looks like, and is declared as, ISO-8859-1 might actually get rendered by browsers as CP1252 (aka MS-ANSI, Windows-1252).
  • The difference between the two is that 8859-1 doesn't use 0x80 to 0x9f for printable characters, but Windows does, for long hyphens, ellipses, curly quotes etc.
  • iconv -f cp1252 -t utf-8 is your friend
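
To make the 0x80 to 0x9f point concrete, here is a small Python sketch of the same conversion iconv does. The sample bytes and the helper name cp1252_to_utf8 are made up for illustration and are not from the actual dump; the codec names "cp1252" and "latin-1" are the standard Python ones.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes that are really CP1252 as UTF-8 (hypothetical helper)."""
    # Strict decoding (the default) raises UnicodeDecodeError on the few bytes
    # CP1252 leaves undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d) rather than
    # silently dropping or truncating data.
    return raw.decode("cp1252").encode("utf-8")


# Made-up sample: "wait" + 0x85 + " done " + 0x97, as a Windows editor might save it.
sample = b"wait\x85 done \x97"

# Read as CP1252, 0x85 and 0x97 are an ellipsis and a long dash.
as_cp1252 = sample.decode("cp1252")

# Read as ISO-8859-1, the same bytes become invisible C1 control characters.
as_latin1 = sample.decode("latin-1")

print(repr(as_cp1252))
print(repr(as_latin1))
print(cp1252_to_utf8(sample))
```

Decoding as ISO-8859-1 never fails, which is why mislabelled content goes unnoticed until it lands somewhere stricter, such as a UTF-8 column.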

