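Today, while parsing XML, I ran into a file that someone had edited in an editor and saved as UTF-8 with a BOM. Parsing it with SaxReader threw an exception, because a file saved with a BOM starts with the bytes EF BB BF. I tried several approaches and none of them worked, so I settled on a workaround: read the content and write it out again as a new file. The code comes from the web; the link is given at the end.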
import java.io.*;

public class UTF8ToAnsiUtils {

    // FEFF because this is the Unicode char represented by the UTF-8 byte order mark (EF BB BF).
    public static final String UTF8_BOM = "\uFEFF";

    public static void main(String args[]) {
        try {
            if (args.length != 2) {
                System.out.println("Usage : java UTF8ToAnsiUtils utf8file ansifile");
                System.exit(1);
            }

            boolean firstLine = true;
            FileInputStream fis = new FileInputStream(args[0]);
            BufferedReader r = new BufferedReader(new InputStreamReader(fis, "UTF8"));
            FileOutputStream fos = new FileOutputStream(args[1]);
            Writer w = new BufferedWriter(new OutputStreamWriter(fos, "Cp1252"));
            for (String s = ""; (s = r.readLine()) != null;) {
                if (firstLine) {
                    s = UTF8ToAnsiUtils.removeUTF8BOM(s);
                    firstLine = false;
                }
                w.write(s + System.getProperty("line.separator"));
                w.flush();
            }

            w.close();
            r.close();
            System.exit(0);
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }

    private static String removeUTF8BOM(String s) {
        if (s.startsWith(UTF8_BOM)) {
            s = s.substring(1);
        }
        return s;
    }
}
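If writing a second file is inconvenient, the same idea can probably be applied to the stream itself. The following is only a minimal sketch and not part of the original code: the class name BomSkipper and the method skipUtf8Bom are made up for illustration. It peeks at the first three bytes and pushes them back unless they are exactly EF BB BF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper (name chosen for this sketch): drops a leading UTF-8 BOM from a stream.
public class BomSkipper {

    // The UTF-8 byte order mark: EF BB BF.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // Read up to three bytes; a single read() call may return fewer, so loop.
        int n = 0;
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before three bytes were available
            }
            n += r;
        }

        boolean hasBom = n == UTF8_BOM.length;
        for (int i = 0; hasBom && i < UTF8_BOM.length; i++) {
            hasBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }

        // Not a BOM: put the bytes back so the caller sees the stream unchanged.
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(xml));
        System.out.println((char) in.read()); // prints <
    }
}
```

The returned stream can then be handed to whatever XML parser is in use instead of the raw file stream. Unlike the conversion above, this leaves the remaining bytes untouched, so the content stays UTF-8 rather than being re-encoded as Cp1252.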