'&'를 '&'로 인코딩해야합니까?

IT story

'&'를 '&'로 인코딩해야합니까?

hot-time 2020. 5. 12. 08:01

'&'를 '&'로 인코딩해야합니까?

&내 사이트에서 HTML5 및 UTF-8과 함께 ' '기호를 사용하고 있습니다 <title>. Google은 제목의 모든 브라우저와 마찬가지로 SERP에 앰퍼샌드를 표시합니다.

http://validator.w3.org 가 나에게 이것을주고 있습니다 :

& 문자 참조를 시작하지 않았습니다. (그리고 아마도로 탈출했을 것 &입니다.)

정말로해야합니까 &?

유효성 검사를 위해 유효성을 검사하는 내 페이지에 대해 소란스럽지 않지만 이것에 대한 사람들의 의견을 듣고 궁금하고 중요한 이유는 무엇인지 궁금합니다.

예. 오류에서 알 수 있듯이 HTML에서 속성은 구문 분석되었음을 의미하는 #PCDATA입니다. 이는 속성에서 문자 엔터티를 사용할 수 있음을 의미합니다. &그 자체로 사용 하는 것은 잘못이며 관대 한 브라우저에 적합하지 않으며 이것이 XHTML이 아닌 HTML이라는 사실 때문에 구문 분석이 중단됩니다. 그냥 탈출하면 &모든 것이 잘 될 것입니다.

HTML5를 사용하면 이스케이프를 해제 할 수 있지만 다음에 나오는 데이터가 유효한 문자 참조처럼 보이지 않을 때만 가능합니다. 그러나 어떤 것이 필요하고 어떤 것이 필요하지 않은지에 대해 걱정하는 것보다이 심볼의 모든 인스턴스를 피하는 것이 좋습니다.

이 점을 명심하십시오. 이스케이프하지 않고 & amp;로 이동하지 않으면 생성 한 데이터 (코드가 매우 유효하지 않은 경우)에 적합하지 않으며, 태그 구분 기호를 이스케이프 처리하지 않을 수도 있습니다. 이는 사용자가 제출 한 데이터에 큰 문제입니다. HTML 및 스크립트 삽입, 쿠키 도용 및 기타 악용으로 이어질 수 있습니다.

코드를 피하십시오. 앞으로 많은 문제를 해결할 것입니다.

유효성 검사 외에, 특정 문자를 인코딩하는 것이 웹 페이지로 올 바르고 안전하게 렌더링 할 수 있도록 HTML 문서에 중요하다는 사실이 남아 있습니다.

저에게있어 모든 상황에서 &와 같이 인코딩 &하는 것은 더 쉬운 규칙이며 오류와 실패 가능성을 줄입니다.

다음을 비교하십시오 : 어느 것이 더 쉬운가요? 이는 쉽게 까지 놈에 ?

방법론 1

앰퍼샌드 문자가 포함 된 내용을 작성하십시오.
그것들을 모두 인코딩하십시오.

방법론 2

(소금 한알로주세요)

앰퍼샌드 문자가 포함 된 내용을 작성하십시오.
사례별로 각 앰퍼샌드를보십시오. 다음을 결정하십시오.
- 그것은 격리되어 있으며, 분명하게 앰퍼샌드입니다. 예. volt & amp
  >이 경우 인코딩을 방해하지 마십시오.
- 독립 체는 아니지만 결과 엔티티가 존재하지 않으며 엔티티 목록이 절대로 진화 할 수 없기 때문에 존재하지 않기 때문에 모호하지 않다고 생각합니다. 예를 들어 amp&volt
  >이 경우 인코딩을 방해하지 마십시오.
- 고립되지 않고 모호합니다. 예. volt&amp
  > 인코딩하십시오.

나는 이것을 철저히 연구하고 여기에 나의 발견에 대해 썼다 : http://mathiasbynens.be/notes/ambiguous-ampersands

또한 세미콜론으로 끝나지 않는 모호한 앰퍼샌드 또는 문자 참조에 대한 마크 업을 확인하는 데 사용할 수 있는 온라인 도구 를 만들었습니다 . 둘 다 유효하지 않습니다. (현재 HTML 유효성 검사기가이 작업을 올바르게 수행하지 않습니다.)

HTML5 규칙은 HTML4와 다릅니다. 앰퍼샌드가 매개 변수 이름을 시작하는 것처럼 보이지 않는 한 HTML5에서는 필요하지 않습니다. "& copy = 2"는 예를 들어 & copy; 저작권 기호입니다.

그러나 다음 텍스트에 따라 인코딩하거나 인코딩하지 않기로 결정하기가 더 어려워 보입니다. 가장 쉬운 방법은 아마도 항상 인코딩하는 것입니다.

나는 이것이 "브라우저가 신경 쓰지 않을 때 왜 스펙을 따르는가"라는 질문에 더 많은 문제로 바뀌 었다고 생각합니다. 다음은 일반적인 답변입니다.

표준은 "현재"가 아닙니다. 그것들은 "미래"입니다. 개발자로서 웹 표준을 따르는 경우 브라우저 공급 업체가 해당 표준을 올바르게 구현할 가능성이 높으며 CSS 해킹, 기능 감지 및 브라우저 감지가 필요하지 않은 완전히 상호 운용 가능한 웹에 더 가까이 다가갑니다. 레이아웃이 특정 브라우저에서 깨지는 이유 또는 해결 방법을 알 필요가없는 곳.

특히 HTML5에 & amp; 특정 상황에서 HTML5 doctype을 사용하고 있으며 사용자가 HTML5 호환 브라우저를 사용할 것으로 예상하는 경우에는 그렇게 할 이유가 없습니다.

글쎄, 그것이 사용자 입력에서 온다면 명백한 이유에서 절대적으로 그렇습니다. 이 웹 사이트가 그렇게하지 않았다고 생각하십시오.이 질문의 제목은 '&'를 '&'로 인코딩해야합니까?

그것이 echo '<title>Dolce & Gabbana</title>';엄밀히 말하면 꼭 할 필요는 없습니다. 더 나을 것이지만, 사용자가 없으면 차이를 알 수 있습니다.

당신의 title실제 모습을 보여 주 시겠습니까? 제출할 때

<!DOCTYPE html>
<html>
<title>Dolce & Gabbana</title>
<body>
<p>am i allowed loose & mpersands?</p>
</body>
</html>

에 http://validator.w3.org/을 - 명시 적으로 실험 HTML 5 모드를 사용하도록 요구 - 그것은에 대한 불만이 없습니다 &들 ...

HTML에서 a &는 문자 참조 또는 엔티티 참조 중 하나의 참조 시작을 나타냅니다 . 이 시점에서 구문 분석기는 #문자 참조를 표시하거나 엔티티 참조를 표시하는 엔티티 이름을 나타내는 것으로 예상됩니다 ;. 이것이 정상적인 행동입니다.

참조 이름 또는 단지 참조 개구하지만 &공백 또는 다른 구분 기호 뒤에 같은 ", ', <, >, &, 종료 ;, 심지어 참조 일반을 대표하는 것은 &생략 할 수 있습니다 :

<p title="&amp;">foo &amp; bar</p>
<p title="&amp">foo &amp bar</p>
<p title="&">foo & bar</p>

이러한 경우에만 끝 ;또는 참조 자체를 생략 할 수 있습니다 (적어도 HTML 4에서는). HTML 5에는 끝이 필요하다고 생각합니다 ;.

But the specification recommends to always use a reference like the character reference & or the entity reference & to avoid confusion:

Authors should use "&" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&" in attribute values since character references are allowed within CDATA attribute values.

If the user passes it to you, or it will wind up in a URL, you need to escape it.

If it appears in static text on a page? All browsers will get this one right either way, you don't worry much about it, since it will work.

Yes, you should try to serve valid code if possible.

Most browsers will silently correct this error, but there is a problem with relying on the error handling in the browsers. There is no standard for how to handle incorrect code, so it's up to each browser vendor to try to figure out what to do with each error, and the results may vary.

Some examples where browsers are likely to react differently is if you put elements inside a table but outside the table cells, or if you nest links inside each other.

For your specific example it's not likely to cause any problems, but error correction in the browser might for example cause the browser to change from standards compliant mode into quirks mode, which could make your layout break down completely.

So, you should correct errors like this in the code, if not for anything else so to keep the error list in the validator short, so that you can spot more serious problems.

A couple of years ago, we got a report that one of our web apps wasn't displaying correctly in Firefox. It turned out that the page contained a tag that looked like

<div style="..." ... style="...">

When faced with a repeated style attribute, IE combines both of the styles, while Firefox only uses one of them, hence the different behavior. I changed the tag to

<div style="...; ..." ...>

and sure enough, it fixed the problem! The moral of the story is that browsers have more consistent handling of valid HTML than of invalid HTML. So, fix your damn markup already! (Or use HTML Tidy to fix it.)

I was checking why Image URL's need escaping, hence tried it in https://validator.w3.org. The explanation is pretty nice. It highlights that even URL's need to be escaped. [PS:I guess it will unescaped when its consumed since URL's need &. Can anyone clarify?]

<img alt="" src="foo?bar=qut&qux=fop" />

An entity reference was found in the document, but there is no reference by that name defined. Often this is caused by misspelling the reference name, unencoded ampersands, or by leaving off the trailing semicolon (;). The most common cause of this error is unencoded ampersands in URLs as described by the WDG in "Ampersands in URLs". Entity references start with an ampersand (&) and end with a semicolon (;). If you want to use a literal ampersand in your document you must encode it as "&" (even inside URLs!). Be careful to end entity references with a semicolon or your entity reference may get interpreted in connection with the following text. Also keep in mind that named entity references are case-sensitive; &Aelig; and æ are different characters. If this error appears in some markup generated by PHP's session handling code, this article has explanations and solutions to your problem.

if & is used in html then you should escape it

If & is used in javascript strings e.g. an alert('This & that'); or document.href you don't need to use it.

If you're using document.write then you should use it e.g. document.write(<p>this & that</p>)

It depends on the likelihood of a semicolon ending up near your &, causing it to display something quite different.

For example, when dealing with input from users (say, if you include the user-provided subject of a forum post in your title tags), you never know where they might be putting random semicolons, and it might randomly display strange entities. So always escape in that situation.

For your own static html, sure, you could skip it, but it's so trivial to include proper escaping, that there's no good reason to avoid it.

If you're really talking about the static text

<title>Foo & Bar</title>

stored in some file on the hard disk and served directly by a server, then yes: it probably doesn't need to be escaped.

However, since there is very little HTML content nowadays that's completely static, I'll add the following disclaimer that assumes that the HTML content is generated from some other source (database content, user input, web service call result, legacy API result, ...):

If you don't escape a simple &, then chances are you also don't escape a & or a   or <b> or <script src="http://attacker.com/evil.js"> or any other invalid text. That would mean that you are at best displaying your content wrongly and more likely are suspectible to XSS attacks.

In other words: when you're already checking and escaping the other more problematic cases, then there's almost no reason to leave the not-totally-broken-but-still-somewhat-fishy standalone-& unescaped.

not sure if this is useful to anyone... I was fighting this for a while... here is a glorious regex you can use to fix all your links, javascript, content. I had to deal with a ton of legacy content that nobody wanted to correct.

Add this to your Render override in your master page or control:

Please don't flame me for putting this in the wrong place:

// remove the & from href="blaw?a=b&b=c" and replace with &amp; 
//in urls - this corrects any unencoded & not just those in URL's
// this match will also ignore any matches it finds within <script> blocks AND
// it will also ignore the matches where the link includes a javascript command like
// <a href="javascript:alert{'& & &'}">blaw</a>
html = Regex.Replace(html, "&(?!(?<=(?<outerquote>[\"'])javascript:(?>(?!\\k<outerquote>|[>]).)*)\\k<outerquote>?)(?!(?:[a-zA-Z][a-zA-Z0-9]*|#\\d+);)(?!(?>(?:(?!<script|\\/script>).)*)\\/script>)", "&amp;", RegexOptions.Singleline | RegexOptions.IgnoreCase);

The link has a fairly good example of when and why you may need to escape & to &

https://jsfiddle.net/vh2h7usk/1/

흥미롭게도, 나는 내 대답에서 올바르게 표현하기 위해 캐릭터를 탈출해야했습니다. (응답 패널에서) 내장 코드 샘플 옵션 을 사용하려는 경우 입력 만하면 &됩니다. 그러나 수동으로 <code></code>요소를 사용하려면 올바르게 표현하기 위해 탈출해야합니다. :)

참고 URL : https://stackoverflow.com/questions/3493405/do-i-really-need-to-encode-as-amp

'IT story' 카테고리의 다른 글

X-Requested-With 헤더의 요점은 무엇입니까? (0)	2020.05.12
구문 강조를 즉시 수행 할 수있는 텍스트 영역? (0)	2020.05.12
세로 탭이란 무엇입니까? (0)	2020.05.12
페이지를 새로 고침하면 AngularJS HTML5 모드에서 잘못된 GET 요청이 발생합니다. (0)	2020.05.12
Java에서 2 개의 XML 문서를 비교하는 가장 좋은 방법 (0)	2020.05.12

현재글'&'를 '&'로 인코딩해야합니까?

hot-time

'&'를 '&'로 인코딩해야합니까?

'&'를 '&'로 인코딩해야합니까?

방법론 1

방법론 2

'IT story' 카테고리의 다른 글

'IT story'의 다른글

티스토리툴바

'&amp;'를 '&amp;'로 인코딩해야합니까?

'&'를 '&'로 인코딩해야합니까?

방법론 1

방법론 2

'IT story' 카테고리의 다른 글

'IT story'의 다른글

관련글

티스토리툴바

'&'를 '&'로 인코딩해야합니까?