RegEx: Grabbing values between quotation marks

hot-time 2020. 5. 10. 10:25

I have a value like this:

"Foo Bar" "Another Value" something else

What regex will return the values enclosed in the quotation marks (e.g. Foo Bar and Another Value)?

I've been using the following with great success:

(["'])(?:(?=(\\?))\2.)*?\1

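It supports nested quotes as well.

For those who want a deeper explanation of how this works, here's an explanation from user ephemient:

(["']) matches a quote; ((?=(\\?))\2.) gobbles a backslash if one exists and, whether or not that happens, matches one character; *? repeats this many times (non-greedily, so as not to eat the closing quote); \1 matches the same quote that was used for opening.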
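
As a rough illustration of the explanation above, here is a minimal Python sketch that applies the same pattern to the sample string from the question. The variable names are made up for this example, and slicing off the first and last character is only one possible way to drop the surrounding quotes.

```python
import re

# Pattern from the answer above: opening quote, lazily repeated
# "optional backslash + any character", then the same quote again.
QUOTED = re.compile(r'''(["'])(?:(?=(\\?))\2.)*?\1''')

text = '"Foo Bar" "Another Value" something else'

# group(0) is the whole match, quotes included.
matches = [m.group(0) for m in QUOTED.finditer(text)]
# Slice off the first and last character to keep only the value.
values = [s[1:-1] for s in matches]

print(matches)
print(values)
```

Note that re.findall would return the capture groups here rather than the whole match, which is why the sketch uses finditer and group(0).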

In general, the following regular expression fragment is what you are looking for:

"(.*?)"

This uses the non-greedy *? operator to capture everything up to, but not including, the next double quote. Then you use a language-specific mechanism to extract the matched text.

In Python, you could do:

>>> import re
>>> string = '"Foo Bar" "Another Value"'
>>> print(re.findall(r'"(.*?)"', string))
['Foo Bar', 'Another Value']

I would go for:

"([^"]*)"

[^"] is regex for any character except '"'.
The reason I use this over the non-greedy many operator is that I have to keep looking that one up just to make sure I get it right.
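
To make the comparison between the two simple patterns concrete, here is a small Python sketch. The second input string is an assumption added for illustration: it shows one case where the two patterns differ, because the dot does not match a newline by default while a negated character class does.

```python
import re

LAZY = re.compile(r'"(.*?)"')        # non-greedy version
NEGATED = re.compile(r'"([^"]*)"')   # negated character class version

single_line = '"Foo Bar" "Another Value" something else'
multi_line = '"line one\nline two"'  # hypothetical input with a newline

lazy_single = LAZY.findall(single_line)
negated_single = NEGATED.findall(single_line)

# Without re.DOTALL, "." stops at the newline, so the lazy version
# finds nothing here; [^"] has no such restriction.
lazy_multi = LAZY.findall(multi_line)
negated_multi = NEGATED.findall(multi_line)

print(lazy_single, negated_single)
print(lazy_multi, negated_multi)
```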

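Let's look at two efficient ways to deal with escaped quotes. These patterns are not designed to be concise or aesthetic, but to be efficient.

These ways use first-character discrimination to quickly find quotes in the string without the cost of an alternation. (The idea is to quickly discard characters that are not quotes without testing both branches of the alternation.)

The content between quotes is described with an unrolled loop (instead of a repeated alternation), which is more efficient too: [^"\\]*(?:\\.[^"\\]*)*

To deal with strings whose quotes are not balanced, you can use possessive quantifiers instead, [^"\\]*+(?:\\.[^"\\]*)*+, or a workaround that emulates them, to prevent too much backtracking. You can also decide that a quoted part runs from an opening quote to the next (non-escaped) quote or to the end of the string. In that case there is no need for possessive quantifiers; you only need to make the last quote optional.

Note: sometimes quotes are escaped not with a backslash but by repeating the quote. In that case the content subpattern looks like this: [^"]*(?:""[^"]*)*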
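
The two content subpatterns above can be tried out with a short Python sketch. It wraps each subpattern in a pair of double quotes and sticks to plain (non-possessive) quantifiers; the sample strings and names are assumptions made up for this example.

```python
import re

# Backslash-escaped quotes: unrolled loop between two double quotes.
# re.DOTALL lets "\\." also cover a backslash followed by a newline.
BACKSLASH_STYLE = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"', re.DOTALL)

# Quotes escaped by doubling them ("" inside a quoted value).
DOUBLED_STYLE = re.compile(r'"[^"]*(?:""[^"]*)*"')

backslash_text = r'say "a \"quoted\" word" and "plain"'
doubled_text = 'x "he said ""hi"" ok" y'

# With no capture groups, findall returns the whole matches.
backslash_matches = BACKSLASH_STYLE.findall(backslash_text)
doubled_matches = DOUBLED_STYLE.findall(doubled_text)

print(backslash_matches)
print(doubled_matches)
```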

The patterns avoid the use of a capture group and a backreference (something like (["']).....\1) and use a simple alternation instead, with ["'] factored out at the beginning.

Perl-like:

["'](?:(?<=")[^"\\]*(?s:\\.[^"\\]*)*"|(?<=')[^'\\]*(?s:\\.[^'\\]*)*')

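((?s:...) is syntactic sugar that switches on dotall/singleline mode inside the non-capturing group. If this syntax is not supported, you can easily switch this mode on for the whole pattern or replace the dot with [\s\S].)

(The way this pattern is written is totally "hand-driven" and doesn't take into account any internal optimizations of the engine.)

ECMA script: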

(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')

POSIX extended:

"[^"\\]*(\\(.|\n)[^"\\]*)*"|'[^'\\]*(\\(.|\n)[^'\\]*)*'

or simply:

"([^"\\]|\\.|\\\n)*"|'([^'\\]|\\.|\\\n)*'

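Because the ECMA script variant above only relies on [\s\S] and a plain lookahead, it should also be usable unchanged in other engines. The following Python sketch is a minimal check of that assumption on a made-up string containing both quote styles with backslash escapes.

```python
import re

# ECMA script variant from above, copied verbatim.
QUOTED = re.compile(
    r'''(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')'''
)

# Hypothetical input: a double-quoted value with an escaped double quote,
# then a single-quoted value with an escaped single quote.
text = 'a "x \\" y" b ' + "'it\\'s' c"

matches = QUOTED.findall(text)
print(matches)
```
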
Peculiarly, none of these answers produce a regex where the returned match is the text inside the quotes, which is what is asked for. MA-Madden tries but only gets the inside match as a captured group rather than the whole match. One way to actually do it would be :

(?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)

Examples for this can be seen in this demo https://regex101.com/r/Hbj8aP/1

The key here is the positive lookbehind at the start (the ?<=) and the positive lookahead at the end (the ?=). The lookbehind checks the character before the current position for a quote and, if one is found, the match starts from there. The lookahead checks the character ahead for a quote and, if one is found, the match stops just before it. The lookbehind group (the ["']) is wrapped in parentheses to capture whichever quote was found at the start; this is then used in the closing lookahead (?=\1) to make sure the match only stops at the corresponding quote.

The only other complication is that because the lookahead doesn't actually consume the end quote, it will be found again by the starting lookbehind which causes text between ending and starting quotes on the same line to be matched. Putting a word boundary on the opening quote (["']\b) helps with this, though ideally I'd like to move past the lookahead but I don't think that is possible. The bit allowing escaped characters in the middle I've taken directly from Adam's answer.
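
For readers who want to try the lookaround version outside regex101, here is a minimal Python sketch. It assumes the engine accepts a capture group inside a fixed-width lookbehind, which Python's re module does as long as the group is only referenced outside the lookbehind.

```python
import re

# Lookbehind for the opening quote (captured as group 1, followed by a
# word boundary), lazy content, lookahead for the same quote.
INSIDE_QUOTES = re.compile(r'''(?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)''')

text = '"Foo Bar" "Another Value" something else'

# The whole match (group 0) is already the text between the quotes.
values = [m.group(0) for m in INSIDE_QUOTES.finditer(text)]
print(values)
```

The word boundary is what keeps the space between the two quoted values from being reported as a match, so values that start with a non-word character would be skipped by this pattern.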

The regex in the accepted answer returns the values including their surrounding quotation marks: "Foo Bar" and "Another Value" as matches.

Here are regexes which return only the values between quotation marks (as the questioner was asking for):

Double quotes only (use value of capture group #1):

"(.*?[^\\])"

Single quotes only (use value of capture group #1):

'(.*?[^\\])'

Both (use value of capture group #2):

(["'])(.*?[^\\])\1

All support escaped and nested quotes.

A very late answer, but I'd like to add one:

(\"[\w\s]+\")

http://regex101.com/r/cB0kB8/1

This version accounts for escaped quotes and controls backtracking:

/(["'])((?:(?!\1)[^\\]|(?:\\\\)*\\[^\\])*)\1/

The pattern (["'])(?:(?=(\\?))\2.)*?\1 above does the job, but I am concerned about its performance (it's not bad but could be better). Mine below is ~20% faster.

The pattern "(.*?)" is just incomplete. My advice for everyone reading this is just DON'T USE IT!!!

For instance it cannot capture many strings (if needed I can provide an exhaustive test-case) like the one below:

$string = 'How are you? I\'m fine, thank you';

The rest of them are just as "good" as the one above.

If you really care both about performance and precision then start with the one below:

/(['"])((\\\1|.)*?)\1/gm

In my tests it covered every string I met but if you find something that doesn't work I would gladly update it for you.
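
As a quick check of the claim above, this Python sketch runs the pattern on the example string (written here as a Python literal) and compares it with a simple lazy pattern adapted to single quotes. The /gm flags are dropped because finditer already scans the whole string; the variable names are made up for this example.

```python
import re

# Pattern from this answer: an escaped quote (backslash + opening quote)
# or any character, repeated lazily between matching quotes.
ESCAPE_AWARE = re.compile(r'''(['"])((\\\1|.)*?)\1''')
# The simple lazy pattern, adapted to single quotes for comparison.
NAIVE = re.compile(r"'(.*?)'")

# The PHP literal 'How are you? I\'m fine, thank you' as raw text.
text = "'How are you? I\\'m fine, thank you'"

aware_values = [m.group(2) for m in ESCAPE_AWARE.finditer(text)]
naive_first = NAIVE.search(text).group(1)

print(aware_values)   # the full content, escaped quote included
print(naive_first)    # stops at the escaped quote
```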

Check my pattern in an online regex tester.

MORE ANSWERS! Here is the solution I used:

\"([^\"]*?icon[^\"]*?)\"

TL;DR: replace the word icon with what you're looking for in said quotes, and voila!

The way this works is that it looks for the keyword and doesn't care what else is between the quotes. E.g.:
id="fb-icon"
id="icon-close"
id="large-icon-close"
The regex looks for a quote mark ",
then for any possible group of characters that is not ",
until it finds icon,
then any possible group of characters that is not ",
and finally a closing ".
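
The keyword can be made a parameter. The Python sketch below builds the same pattern around an arbitrary keyword; the helper name quoted_values_containing and the sample attribute string are hypothetical, and re.escape is added on the assumption that the keyword should be treated as literal text.

```python
import re

def quoted_values_containing(keyword, text):
    """Return double-quoted values in text that contain keyword."""
    # Same shape as the pattern above, with the keyword escaped so that
    # regex metacharacters in it are matched literally.
    pattern = r'"([^"]*?' + re.escape(keyword) + r'[^"]*?)"'
    return re.findall(pattern, text)

sample = 'id="fb-icon" class="btn" id="icon-close" id="large-icon-close"'
found = quoted_values_containing("icon", sample)
print(found)
```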

I liked Eugen Mihailescu's solution to match the content between quotes whilst allowing to escape quotes. However, I discovered some problems with escaping and came up with the following regex to fix them:

(['"])(?:(?!\1|\\).|\\.)*\1

It does the trick and is still pretty simple and easy to maintain.

Demo (with some more test-cases; feel free to use it and expand on it).

PS: If you just want the content between the quotes in the full match ($0), and are not afraid of the performance penalty, use:

(?<=(['"])\b)(?:(?!\1|\\).|\\.)*(?=\1)

PPS: If your focus is solely on efficiency, go with Casimir et Hippolyte's solution; it's a good one.
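
The answer does not say which escaping problems it ran into, so the following Python sketch only shows one input where, as far as I can tell, the two patterns differ: a value that ends with an escaped backslash. The names and the sample string are assumptions for illustration.

```python
import re

# Pattern from the answer above: a character that is neither the opening
# quote nor a backslash, or a backslash followed by any character.
FIXED = re.compile(r'''(['"])(?:(?!\1|\\).|\\.)*\1''')
# The earlier pattern this answer set out to improve.
EARLIER = re.compile(r'''(['"])((\\\1|.)*?)\1''')

# First value ends with an escaped backslash: "a \\" followed by "b".
text = r'"a \\" "b"'

fixed_matches = [m.group(0) for m in FIXED.finditer(text)]
earlier_matches = [m.group(0) for m in EARLIER.finditer(text)]

print(fixed_matches)    # two separate quoted values
print(earlier_matches)  # runs past the first closing quote
```

The earlier pattern reads the second backslash plus the closing quote as an escaped quote, while the fixed pattern consumes the two backslashes as a pair first.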

I liked Axeman's more expansive version, but had some trouble with it. For example, it didn't match

foo "string \\ string" bar

or

foo "string1"   bar   "string2"

correctly, so I tried to fix it:

# opening quote
(["'])
   (
     # repeat (non-greedy, so we don't span multiple strings)
     (?:
       # anything, except not the opening quote, and not 
       # a backslash, which are handled separately.
       (?!\1)[^\\]
       |
       # consume any double backslash (unnecessary?)
       (?:\\\\)*       
       |
       # Allow backslash to escape characters
       \\.
     )*?
   )
# same character as opening quote
\1
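
A commented pattern like the one above needs the engine's extended (free-spacing) mode. As a minimal sketch, assuming Python and its re.VERBOSE flag, it can be compiled and run against the two strings mentioned earlier in this answer:

```python
import re

QUOTED = re.compile(r'''
    (["'])              # opening quote
    (
      (?:               # repeat (non-greedy, so we don't span strings)
        (?!\1)[^\\]     # anything except the opening quote or a backslash
        |
        (?:\\\\)*       # consume any double backslash
        |
        \\.             # allow backslash to escape characters
      )*?
    )
    \1                  # same character as opening quote
''', re.VERBOSE)

first = 'foo "string \\\\ string" bar'        # contains two backslashes
second = 'foo "string1"   bar   "string2"'

# Group 2 holds the content between the quotes.
first_values = [m.group(2) for m in QUOTED.finditer(first)]
second_values = [m.group(2) for m in QUOTED.finditer(second)]

print(first_values)
print(second_values)
```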

import re
string = "\" foo bar\" \"loloo\""
print(re.findall(r'"(.*?)"', string))

Just try this out; it works like a charm!

The \ in the string literal is the escape character.

echo 'junk "Foo Bar" not empty one "" this "but this" and this neither' | sed 's/[^\"]*\"\([^\"]*\)\"[^\"]*/>\1</g'

This will result in: >Foo Bar<><>but this<

Here I showed the result string between ><'s for clarity. Using the non-greedy version with this sed command, we first throw out the junk before and after the ""'s, then replace it with the part between the ""'s, surrounded by ><'s.

Building on Greg H.'s answer, I was able to create this regex to suit my needs.

I needed to match a specific value that was qualified by being inside quotes. It must be a full match; no partial match should trigger a hit,

e.g. "test" must not match "test2".

import re
needle, haystack = "test", 'x = "test"'  # sample inputs, assumed for illustration
reg = r"""(['"])(%s)\1"""
if re.search(reg % needle, haystack, re.IGNORECASE):
    print("winning...")

Hunter

A supplementary answer for the subset of Microsoft VBA coders only: one uses the library Microsoft VBScript Regular Expressions 5.5, which gives the following code:

Sub TestRegularExpression()

    Dim oRE As VBScript_RegExp_55.RegExp    '* Tools->References: Microsoft VBScript Regular Expressions 5.5
    Set oRE = New VBScript_RegExp_55.RegExp

    oRE.Pattern = """([^""]*)"""


    oRE.Global = True

    Dim sTest As String
    sTest = """Foo Bar"" ""Another Value"" something else"

    Debug.Assert oRE.test(sTest)

    Dim oMatchCol As VBScript_RegExp_55.MatchCollection
    Set oMatchCol = oRE.Execute(sTest)
    Debug.Assert oMatchCol.Count = 2

    Dim oMatch As Match
    For Each oMatch In oMatchCol
        Debug.Print oMatch.SubMatches(0)

    Next oMatch

End Sub

Unlike Adam's answer, I have a simple one that works:

(["'])(?:\\\1|.)*?\1

And just add parentheses if you want to get the content inside the quotes, like this:

(["'])((?:\\\1|.)*?)\1

Then $1 matches quote char and $2 matches content string.

This one worked for me:

|([\'"])(.*?)\1|i

I used it in a statement like this one:

preg_match_all('|([\'"])(.*?)\1|i', $cont, $matches);

and it worked great.

If you're trying to find strings that only have a certain suffix, such as dot syntax, you can try this:

\"([^\"]*?[^\"]*?)\".localized

Where .localized is the suffix.

Example:

print("this is something I need to return".localized + "so is this".localized + "but this is not")

It will capture "this is something I need to return".localized and "so is this".localized but not "but this is not".
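
The example above can be reproduced with a short Python sketch. Note that it is a simplified variant rather than the exact pattern from this answer: the repeated [^"]*? is collapsed into a single [^"]* and the dot before localized is escaped, on the assumption that a literal dot is intended.

```python
import re

# Simplified variant: quoted text immediately followed by ".localized".
LOCALIZED = re.compile(r'"([^"]*)"\.localized')

source_line = ('print("this is something I need to return".localized + '
               '"so is this".localized + "but this is not")')

localized_strings = LOCALIZED.findall(source_line)
print(localized_strings)
```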

Reference URL: https://stackoverflow.com/questions/171480/regex-grabbing-values-between-quotation-marks
