Converting a byte array to a String (Java)

hot-time 2020. 9. 15. 19:24

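I'm writing a web application on Google App Engine. It lets people edit HTML code that is stored as an .html file in the blobstore.

I'm using fetchData to return a byte[] of all the characters in the file. I'm trying to print this to HTML so that the user can edit the HTML code. Everything works great!

My only problem now is this:

The byte array has some issues when it is converted back to a String. Smart quotes and a couple of other characters come out looking funky (as ?'s, Japanese symbols, and so on). Specifically, it is several bytes with negative values that cause the problem.

The smart quotes come back as -108 and -109 in the byte array. Why is this, and how can I decode the negative bytes to show the correct characters?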

The byte array contains characters in a specific encoding (which you need to know). The way to convert it to a String is:

String decoded = new String(bytes, "UTF-8");  // example for one encoding type

By the way, the raw bytes can appear as negative decimals simply because the Java data type byte is signed; it covers the range from -128 to 127.
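
To make the signed-byte point concrete, here is a minimal sketch that maps the two values from the question to their unsigned and hexadecimal forms. The class and method names (SignedByteDemo, toUnsigned, toHex) are made up for this illustration and are not part of the original answer.

```java
class SignedByteDemo {
    // Reinterprets a signed Java byte (-128..127) as its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Formats a byte as two uppercase hex digits, for example -109 becomes "93".
    static String toHex(byte b) {
        return String.format("%02X", toUnsigned(b));
    }

    public static void main(String[] args) {
        byte[] fromQuestion = {-108, -109};
        for (byte b : fromQuestion) {
            System.out.println(b + " -> " + toUnsigned(b) + " -> 0x" + toHex(b));
        }
    }
}
```

Running it should print `-108 -> 148 -> 0x94` and `-109 -> 147 -> 0x93`, so the "negative" bytes are just the values 0x94 and 0x93 seen through a signed type.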

-109 = 0x93: Control Code "Set Transmit State"

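The value -109 (0x93), read as a code point, is a non-printable control character in Unicode, and a lone 0x93 byte is not a valid UTF-8 sequence either. So UTF-8 is not the correct encoding for that character stream.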

In "Windows-1252", 0x93 is the "smart quote" you are looking for, so the Java name of that encoding is "Cp1252". The next line provides test code:

System.out.println(new String(new byte[]{-109}, "Cp1252"));
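
As a slightly larger sketch of the same test, the class below decodes one byte sequence with both charsets and prints code points rather than glyphs, so the result does not depend on the console encoding. The names SmartQuoteDecodeDemo, decode and codePoints are hypothetical, and the sketch assumes the runtime provides the windows-1252 charset (standard JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SmartQuoteDecodeDemo {
    // Decodes the bytes with the given charset; the String constructor
    // substitutes a replacement character for input it cannot decode.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Lists the UTF-16 code units of a string in hex, e.g. "U+0048 U+0069".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {-109, 'H', 'i', -108}; // 0x93 'H' 'i' 0x94
        System.out.println("windows-1252: "
                + codePoints(decode(raw, Charset.forName("windows-1252"))));
        System.out.println("UTF-8:        "
                + codePoints(decode(raw, StandardCharsets.UTF_8)));
    }
}
```

With windows-1252 the two outer bytes should come back as U+201C and U+201D (the left and right smart quotes); with UTF-8 they should come back as U+FFFD, the replacement character.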

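Java 7 and above

You can also pass the desired encoding to the String constructor as a Charset constant from StandardCharsets. This may be safer than passing the encoding as a String, as suggested in the other answers.

For example, for UTF-8 encoding: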

String bytesAsString = new String(bytes, StandardCharsets.UTF_8);
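
One way to read "safer" here: the String-name overload declares the checked UnsupportedEncodingException and only discovers a bad name at run time, whereas a misspelled StandardCharsets constant does not compile. The sketch below illustrates that difference; CharsetConstantDemo, its methods and the charset name "no-such-charset" are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class CharsetConstantDemo {
    // Charset overload: no checked exception to handle, and a misspelled
    // constant is rejected by the compiler.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // String-name overload: an unknown name is only detected at run time.
    static boolean canDecodeWithName(byte[] bytes, String charsetName) {
        try {
            new String(bytes, charsetName);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        System.out.println(decodeUtf8(bytes).length());
        System.out.println(canDecodeWithName(bytes, "UTF-8"));
        System.out.println(canDecodeWithName(bytes, "no-such-charset"));
    }
}
```

The two bytes decode to a single character, the valid name is accepted, and the made-up name is reported as unsupported.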

You can try this (note that it decodes using the platform's default charset):

String s = new String(bytearray);

public class Main {

    /**
     * Example method for converting a byte to a String.
     */
    public void convertByteToString() {

        byte b = 65;

        //Using the static toString method of the Byte class
        System.out.println(Byte.toString(b));

        //Using simple concatenation with an empty String
        System.out.println(b + "");

        //Creating a byte array and passing it to the String constructor
        System.out.println(new String(new byte[] {b}));

    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        new Main().convertByteToString();
    }
}

Output

65
65
A

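import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public static String readFile(String fn) throws IOException
{
    File f = new File(fn);

    byte[] buffer = new byte[(int) f.length()];
    // readFully keeps reading until the buffer is full; a single
    // InputStream.read call may return fewer bytes than requested.
    // try-with-resources closes the stream even if reading fails.
    try (DataInputStream is = new DataInputStream(new FileInputStream(f))) {
        is.readFully(buffer);
    }

    return new String(buffer, "UTF-8"); // use desired encoding
}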

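I suggest Arrays.toString(byte_array);

It depends on your purpose. For example, I wanted to save a byte array in exactly the format you see when debugging, such as [1, 2, 3]. If you want to save exactly the same values without converting the bytes to character form, use Arrays.toString(byte_array). But if you want to save characters instead of bytes, use String s = new String(byte_array). In that case, s holds the character form of [1, 2, 3].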
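
The difference between the two conversions can be sketched as follows. ByteArrayTextDemo and its method names are hypothetical, and the sketch passes US_ASCII explicitly (instead of relying on the default charset, as the answer does) so that the result is the same on every machine.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteArrayTextDemo {
    // Keeps the numeric values: each byte is printed as a decimal number.
    static String asNumbers(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    // Interprets the bytes as encoded characters (ASCII in this sketch).
    static String asCharacters(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = {72, 105, 33};
        System.out.println(asNumbers(data));    // [72, 105, 33]
        System.out.println(asCharacters(data)); // Hi!
    }
}
```

Arrays.toString is useful for logging or debugging the raw values, but it is not a decoding step: the result cannot be displayed as the original text.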

The previous answer from Andreas_D is good. I'm just going to add that wherever you are displaying the output there will be a font and a character encoding and it may not support some characters.

To work out whether it is Java or your display that is a problem, do this:

    for(int i=0;i<str.length();i++) {
        char ch = str.charAt(i);
        System.out.println(i+" : "+ch+" "+Integer.toHexString(ch)+((ch=='\ufffd') ? " Unknown character" : ""));
    }

Java will have mapped any bytes it cannot decode to 0xfffd, the official replacement character for unknown characters. If you see a '?' in the output, but it is not mapped to 0xfffd, it is your display font or encoding that is the problem, not Java.
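
The loop above looks for 0xfffd after decoding. A complementary sketch, not taken from the answer, is to ask a CharsetDecoder to report bad input instead of replacing it, which tells you up front whether the bytes are valid in the charset you assumed. StrictDecodeDemo and decodesCleanly are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {
    // Returns true only if the whole array decodes without any malformed
    // or unmappable byte sequence in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] smartQuote = {-109}; // 0x93
        byte[] ascii = {72, 105};   // "Hi"
        System.out.println(decodesCleanly(smartQuote, StandardCharsets.UTF_8));
        System.out.println(decodesCleanly(ascii, StandardCharsets.UTF_8));
    }
}
```

A false result for the smart-quote byte points at the wrong charset on the Java side; a true result combined with a '?' on screen points at the display.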

Reference URL: https://stackoverflow.com/questions/5673059/converting-byte-array-to-string-java
