IT story

What are "connecting characters" in Java identifiers?

hot-time 2020. 5. 6. 21:06


I'm reading SCJP and I have a question about this line:

"Identifiers must start with a letter, a currency character ($), or a connecting character such as the underscore (_). Identifiers cannot start with a number!"

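It says a valid identifier name can start with a connecting character such as the underscore. I thought the underscore was the only valid option. Are there other connecting characters?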


Here is the list of connecting characters. These are characters used to connect words.

http://www.fileformat.info/info/unicode/category/Pc/list.htm

U+005F _ LOW LINE
U+203F ‿ UNDERTIE
U+2040 ⁀ CHARACTER TIE
U+2054 ⁔ INVERTED UNDERTIE
U+FE33 ︳ PRESENTATION FORM FOR VERTICAL LOW LINE
U+FE34 ︴ PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
U+FE4D ﹍ DASHED LOW LINE
U+FE4E ﹎ CENTRELINE LOW LINE
U+FE4F ﹏ WAVY LOW LINE
U+FF3F ＿ FULLWIDTH LOW LINE
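To check the list above against your own JDK, here is a minimal sketch (the class name ConnectorListCheck is made up for illustration) that asks Character.getType and Character.isJavaIdentifierStart about each listed code point. It assumes the JDK's bundled Unicode data still classifies all ten as connector punctuation (Pc).

```java
import java.util.List;

/** Hypothetical helper: checks each listed Pc code point on the running JDK. */
final class ConnectorListCheck {
    // Code points copied from the list above (U+005F ... U+FF3F).
    static final List<Integer> LISTED = List.of(
            0x005F, 0x203F, 0x2040, 0x2054, 0xFE33,
            0xFE34, 0xFE4D, 0xFE4E, 0xFE4F, 0xFF3F);

    /** True if cp is connector punctuation and may start an identifier. */
    static boolean isConnectorStart(int cp) {
        return Character.getType(cp) == Character.CONNECTOR_PUNCTUATION
                && Character.isJavaIdentifierStart(cp);
    }

    public static void main(String[] args) {
        for (int cp : LISTED) {
            System.out.printf("U+%04X %s %s -> %b%n", cp,
                    new String(Character.toChars(cp)),
                    Character.getName(cp), isConnectorStart(cp));
        }
    }
}
```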

This compiles in Java 7:

int _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿;
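Since Java 9 a lone underscore is a keyword, so that declaration no longer compiles as-is on newer JDKs. A sketch that keeps the other nine connector characters as names, written as Unicode escapes so the source file stays plain ASCII (class and method names are hypothetical):

```java
/** Hypothetical demo: locals whose names are single connector characters. */
final class ConnectorOnlyNames {
    static int sum() {
        // Unicode escapes are translated before tokenizing, so each line
        // declares a variable named by exactly one connector character.
        int \u203F = 1;  // UNDERTIE
        int \u2040 = 2;  // CHARACTER TIE
        int \u2054 = 3;  // INVERTED UNDERTIE
        int \uFE33 = 4;  // PRESENTATION FORM FOR VERTICAL LOW LINE
        int \uFE34 = 5;  // PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
        int \uFE4D = 6;  // DASHED LOW LINE
        int \uFE4E = 7;  // CENTRELINE LOW LINE
        int \uFE4F = 8;  // WAVY LOW LINE
        int \uFF3F = 9;  // FULLWIDTH LOW LINE (not the ASCII keyword _)
        return \u203F + \u2040 + \u2054 + \uFE33 + \uFE34
                + \uFE4D + \uFE4E + \uFE4F + \uFF3F;
    }

    public static void main(String[] args) {
        System.out.println(sum()); // 45
    }
}
```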

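Here is an example. In this case, tp is the name of a column and also the value for a given row; the ︴ characters keep the two identifiers apart.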

Column<Double> ︴tp︴ = table.getColumn("tp", double.class);

double tp = row.getDouble(︴tp︴);
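The table/row API in that example isn't shown, so here is a self-contained sketch that swaps in a plain Map as a hypothetical row. It only illustrates that ︴tp︴ and tp are two different identifiers; the escape \uFE34 is used so the file compiles regardless of source encoding.

```java
import java.util.Map;

/** Hypothetical demo: a column-name variable and a value variable side by side. */
final class ConnectorIdentifierDemo {
    static double valueOf(Map<String, Double> row) {
        // \uFE34 is PRESENTATION FORM FOR VERTICAL WAVY LOW LINE, a connector
        // character, so the next line declares a variable whose name is tp
        // wrapped in that character; it holds the column name.
        String \uFE34tp\uFE34 = "tp";
        // tp is a separate identifier holding the value for this row.
        double tp = row.get(\uFE34tp\uFE34);
        return tp;
    }

    public static void main(String[] args) {
        // Hypothetical row standing in for the table API: column name -> value.
        Map<String, Double> row = Map.of("tp", 1.5);
        System.out.println(valueOf(row)); // 1.5
    }
}
```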

The following code

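for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    // identifier starts that are not letters; toChars avoids truncating supplementary code points
    if (Character.isJavaIdentifierStart(i) && !Character.isAlphabetic(i))
        System.out.print(new String(Character.toChars(i)) + " ");
}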

prints

$ _ ¢ ¥ ৲ ৲ ৳ ৻ ૱ ௹ ฿ ៛ ‿ ⁀ ⁔ ₠ ₡ ₧ ₣ ₤ ₥ ₫ € ₭ ₮ ₯ ₰ ₱ ₷ ₳ ₵ ₶ ₷ ︳ ︴ ﹍ ﹍ ﹎ ﹏ ﹩ $ _ ¢ £ ¥ ₩
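Why are the non-letters in that output only currency symbols and connectors? Per the Character Javadoc, isJavaIdentifierStart accepts letters, letter numbers, currency symbols (Sc) and connector punctuation (Pc), while isAlphabetic already covers the letter categories and letter numbers. A sketch (class name hypothetical) that collects the general categories of the printed code points, assuming that reading of the Javadoc:

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: which general categories make up the output above? */
final class NonAlphabeticStarts {
    static Set<Integer> categories() {
        Set<Integer> types = new TreeSet<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isJavaIdentifierStart(cp) && !Character.isAlphabetic(cp)) {
                types.add(Character.getType(cp));
            }
        }
        return types;
    }

    public static void main(String[] args) {
        for (int type : categories()) {
            String name = type == Character.CURRENCY_SYMBOL ? "CURRENCY_SYMBOL (Sc)"
                    : type == Character.CONNECTOR_PUNCTUATION ? "CONNECTOR_PUNCTUATION (Pc)"
                    : "other type " + type;
            System.out.println(name);
        }
    }
}
```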


Iterate over all 65k chars and ask Character.isJavaIdentifierStart(c). The answer is "undertie", decimal 8255.
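That search, narrowed to connector punctuation, can be sketched as follows (class name hypothetical). Since the underscore is the only Pc character below U+203F, on current JDKs it should stop at 8255, UNDERTIE.

```java
/** Hypothetical helper: first BMP connector character after '_'. */
final class FirstConnectorAfterUnderscore {
    /** Returns the first code point after '_' (up to 0xFFFF) that is connector
     *  punctuation and a legal identifier start, or -1 if there is none. */
    static int find() {
        for (int c = '_' + 1; c <= 0xFFFF; c++) {
            if (Character.getType(c) == Character.CONNECTOR_PUNCTUATION
                    && Character.isJavaIdentifierStart(c)) {
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int c = find();
        System.out.println(c + " " + Character.getName(c)); // 8255 UNDERTIE
    }
}
```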


The definitive specification of legal Java identifiers is in the Java Language Specification.
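As a rough illustration of the JLS rule (a Java letter followed by any number of Java letters or digits), here is a hypothetical helper built only on Character.isJavaIdentifierStart and isJavaIdentifierPart. It deliberately does not reject keywords or the literals true, false and null, which the JLS also excludes; javax.lang.model.SourceVersion.isIdentifier and isKeyword can be combined for that.

```java
/** Hypothetical helper: checks only the lexical shape of an identifier. */
final class IdentifierShape {
    static boolean hasIdentifierShape(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk the remaining code points so supplementary characters count once.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"tp", "\u203Ftp", "$amount", "2fast", "my-var"};
        for (String s : samples) {
            System.out.println(s + " -> " + hasIdentifierShape(s));
        }
    }
}
```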


Here is the list of connector characters in Unicode. You won't find them on a keyboard.

U+005F LOW LINE _
U+203F UNDERTIE ‿
U+2040 CHARACTER TIE ⁀
U+2054 INVERTED UNDERTIE ⁔
U+FE33 PRESENTATION FORM FOR VERTICAL LOW LINE ︳
U+FE34 PRESENTATION FORM FOR VERTICAL WAVY LOW LINE ︴
U+FE4D DASHED LOW LINE ﹍
U+FE4E CENTRELINE LOW LINE ﹎
U+FE4F WAVY LOW LINE ﹏
U+FF3F FULLWIDTH LOW LINE ＿


A connecting character is used to connect two characters.

In Java, a connecting character is one for which Character.getType(int codePoint) / Character.getType(char ch) returns a value equal to Character.CONNECTOR_PUNCTUATION.

Note that in Java, character information is based on the Unicode standard, which identifies connecting characters by assigning them the general category Pc, an alias for Connector_Punctuation.

The following code snippet,

for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    if (Character.getType(i) == Character.CONNECTOR_PUNCTUATION
            && Character.isJavaIdentifierStart(i)) {
        System.out.println("character: " + String.valueOf(Character.toChars(i))
                + ", codepoint: " + i + ", hexcode: " + Integer.toHexString(i));
    }
}

prints the connecting characters that can be used to start an identifier on jdk1.6.0_45:

character: _, codepoint: 95, hexcode: 5f
character: ‿, codepoint: 8255, hexcode: 203f
character: ⁀, codepoint: 8256, hexcode: 2040
character: ⁔, codepoint: 8276, hexcode: 2054
character: ・, codepoint: 12539, hexcode: 30fb
character: ︳, codepoint: 65075, hexcode: fe33
character: ︴, codepoint: 65076, hexcode: fe34
character: ﹍, codepoint: 65101, hexcode: fe4d
character: ﹎, codepoint: 65102, hexcode: fe4e
character: ﹏, codepoint: 65103, hexcode: fe4f
character: ＿, codepoint: 65343, hexcode: ff3f
character: ･, codepoint: 65381, hexcode: ff65

The following compiles on jdk1.6.0_45:

int _, ‿, ⁀, ⁔, ・, ︳, ︴, ﹍, ﹎, ﹏, ＿, ･ = 0;

Apparently, the above declaration fails to compile on jdk1.7.0_80 and jdk1.8.0_51 because of the following two connecting characters (so much for backward compatibility... oops!):

character: ・, codepoint: 12539, hexcode: 30fb
character: ･, codepoint: 65381, hexcode: ff65
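A plausible explanation, not confirmed by the original answer: Java 6 was based on Unicode 4.0, where these two middle dots were classified Pc, while the Unicode data in later JDKs (Java 7 uses Unicode 6.0) classifies them as Po (other punctuation), so they stop being legal identifier characters. A sketch (class name hypothetical) to see how the running JDK classifies them:

```java
import java.util.Locale;

/** Hypothetical helper: reports the category and identifier-start status of a code point. */
final class MiddleDotCheck {
    static String describe(int cp) {
        int type = Character.getType(cp);
        String category = type == Character.CONNECTOR_PUNCTUATION ? "Pc"
                : type == Character.OTHER_PUNCTUATION ? "Po"
                : "type " + type;
        return String.format(Locale.ROOT, "U+%04X %s, identifier start: %b",
                cp, category, Character.isJavaIdentifierStart(cp));
    }

    public static void main(String[] args) {
        // Expected on JDK 7 and later: Po, identifier start: false
        System.out.println(describe(0x30FB)); // KATAKANA MIDDLE DOT
        System.out.println(describe(0xFF65)); // HALFWIDTH KATAKANA MIDDLE DOT
    }
}
```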

Anyway, details aside, the exam focuses only on the Basic Latin character set.

Also, for the rules on legal identifiers in Java, see the Java Language Specification, and use the Character class APIs to get more details.


One of the most, well, fun characters allowed in Java identifiers (though not at the start) is the Unicode character named "Zero Width Non Joiner" (&zwnj;, U+200C, https://en.wikipedia.org/wiki/Zero-width_non-joiner).

I once had this in a piece of XML, inside an attribute value holding a reference to another part of the same XML. Since the ZWNJ is "zero width", it cannot be seen (except when stepping through the text with the cursor, where it shows up right at the preceding character). It also couldn't be seen in the log file or the console output. But it was there all along: copying and pasting the value into search fields picked it up, so the referenced position was not found, whereas typing the (visible part of the) string into the search field did find it. It took me a while to figure this out.

Typing a Zero-Width-Non-Joiner is actually quite easy (too easy) when using the European keyboard layout, at least in its German variant, e.g. "Europatastatur 2.02" - it is reachable with AltGr + ".", two keys which unfortunately are located directly next to each other on most keyboards and can easily be hit together accidentally.

Back to Java: I thought well, you could write some code like this:

void foo() {
    int i = 1;
    int i‌ = 2;
}

with the second i followed by a zero-width non-joiner (I can't do that in the snippet above in Stack Overflow's editor), but that didn't work. IntelliJ (16.3.3) did not complain, but javac (Java 8) reported that the identifier was already defined. It seems javac does accept the ZWNJ as part of an identifier, but reflection shows that the ZWNJ is stripped from the identifier, which does not happen to characters like ‿.
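This stripping is consistent with Character.isIdentifierIgnorable: U+200C is a Unicode format character, which that method reports as ignorable in an identifier, and recent editions of the JLS (§3.8) compare identifiers after ignoring ignorable characters. A sketch (names hypothetical) that applies the same filter, so i plus ZWNJ collapses to plain i while ‿ survives:

```java
/** Hypothetical helper: removes identifier-ignorable code points from a name. */
final class IgnorableCheck {
    static final int ZWNJ = 0x200C;     // ZERO WIDTH NON-JOINER (format character)
    static final int UNDERTIE = 0x203F; // connector punctuation, not ignorable

    static String stripIgnorable(String id) {
        StringBuilder sb = new StringBuilder();
        id.codePoints()
          .filter(cp -> !Character.isIdentifierIgnorable(cp))
          .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Character.isIdentifierIgnorable(ZWNJ));     // true
        System.out.println(Character.isIdentifierIgnorable(UNDERTIE)); // false
        String withZwnj = "i" + new String(Character.toChars(ZWNJ));
        System.out.println(stripIgnorable(withZwnj).equals("i"));     // true
    }
}
```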


The list of characters you can use inside your identifiers (rather than just at the start) is much more fun:

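for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++)
    // identifier parts that are not letters; toChars avoids truncating supplementary code points
    if (Character.isJavaIdentifierPart(i) && !Character.isAlphabetic(i))
        System.out.print(new String(Character.toChars(i)) + " ");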

The list is:

I wanted to post the output, but it's forbidden by the SO spam filter. That's how fun it is!

It includes most of the control characters! I mean bells and shit! You can make your source code ring the fn bell! Or use characters which will only be displayed sometimes, like the soft hyphen.
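The control characters get in because isJavaIdentifierPart accepts everything isIdentifierIgnorable does: ISO control characters that are not whitespace, plus format characters such as the soft hyphen (U+00AD). Being ignorable, they presumably don't change which identifier a name refers to, as in the ZWNJ story above. A sketch (class name hypothetical) to count the accepted control characters:

```java
/** Hypothetical helper: control characters that Java accepts inside identifiers. */
final class ControlCharsInIdentifiers {
    /** Counts ISO control characters (U+0000..U+001F, U+007F..U+009F) that are identifier parts. */
    static int countControlParts() {
        int count = 0;
        for (int cp = 0; cp <= 0x9F; cp++) {
            if (Character.isISOControl(cp) && Character.isJavaIdentifierPart(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(Character.isJavaIdentifierPart(0x07)); // BEL: true
        System.out.println(Character.isJavaIdentifierPart(0xAD)); // SOFT HYPHEN: true
        System.out.println(Character.isJavaIdentifierPart('\t')); // TAB: false
        System.out.println(countControlParts());
    }
}
```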

Source URL: https://stackoverflow.com/questions/11774099/what-are-connecting-characters-in-java-identifiers
