IT story

문자열에서 모든 특수 문자를 제거합니다.

hot-time 2020. 5. 12. 08:04
반응형

문자열에서 모든 특수 문자를 제거합니다.


가능한 중복 :
정규식 살균 (PHP)

URL에 문제가 있습니다. 무엇이든 포함 할 수있는 제목을 변환하고 모든 특수 문자를 제거하여 문자와 숫자 만 있고 공백을 하이픈으로 바꾸고 싶습니다.

이것은 어떻게 이루어 집니까? 정규식 (정규식)이 사용되는 것에 대해 많이 들었습니다 ...


쉬워요:

function clean($string) {
   $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.

   return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}

용법:

echo clean('a|"bc!@£de^&$f g');

출력합니다 : abcdef-g

편집하다:

간단한 질문입니다. 여러 개의 하이픈이 서로 옆에 있지 않도록하려면 어떻게해야합니까? 그것들을 단지 1로 바 꾸었습니까?

function clean($string) {
   $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
   $string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.

   return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}

최신 정보

아래 솔루션은 "SEO friendlier"버전입니다.

function hyphenize($string) {
    $dict = array(
        "I'm"      => "I am",
        "thier"    => "their",
        // Add your own replacements here
    );
    return strtolower(
        preg_replace(
          array( '#[\\s-]+#', '#[^A-Za-z0-9\. -]+#' ),
          array( '-', '' ),
          // the full cleanString() can be downloaded from http://www.unexpectedit.com/php/php-clean-string-of-utf8-chars-convert-to-similar-ascii-char
          cleanString(
              str_replace( // preg_replace can be used to support more complicated replacements
                  array_keys($dict),
                  array_values($dict),
                  urldecode($string)
              )
          )
        )
    );
}

function cleanString($text) {
    $utf8 = array(
        '/[áàâãªä]/u'   =>   'a',
        '/[ÁÀÂÃÄ]/u'    =>   'A',
        '/[ÍÌÎÏ]/u'     =>   'I',
        '/[íìîï]/u'     =>   'i',
        '/[éèêë]/u'     =>   'e',
        '/[ÉÈÊË]/u'     =>   'E',
        '/[óòôõºö]/u'   =>   'o',
        '/[ÓÒÔÕÖ]/u'    =>   'O',
        '/[úùûü]/u'     =>   'u',
        '/[ÚÙÛÜ]/u'     =>   'U',
        '/ç/'           =>   'c',
        '/Ç/'           =>   'C',
        '/ñ/'           =>   'n',
        '/Ñ/'           =>   'N',
        '/–/'           =>   '-', // UTF-8 hyphen to "normal" hyphen
        '/[’‘‹›‚]/u'    =>   ' ', // Literally a single quote
        '/[“”«»„]/u'    =>   ' ', // Double quote
        '/ /'           =>   ' ', // nonbreaking space (equiv. to 0x160)
    );
    return preg_replace(array_keys($utf8), array_values($utf8), $text);
}

The rationale for the above functions (which I find way inefficient - the one below is better) is that a service that shall not be named apparently ran spelling checks and keyword recognition on the URLs.

After losing a long time on a customer's paranoias, I found out they were not imagining things after all -- their SEO experts [I am definitely not one] reported that, say, converting "Viaggi Economy Perù" to viaggi-economy-peru "behaved better" than viaggi-economy-per (the previous "cleaning" removed UTF8 characters; Bogotà became bogot, Medellìn became medelln and so on).

There were also some common misspellings that seemed to influence the results, and the only explanation that made sense to me is that our URL were being unpacked, the words singled out, and used to drive God knows what ranking algorithms. And those algorithms apparently had been fed with UTF8-cleaned strings, so that "Perù" became "Peru" instead of "Per". "Per" did not match and sort of took it in the neck.

In order to both keep UTF8 characters and replace some misspellings, the faster function below became the more accurate (?) function above. $dict needs to be hand tailored, of course.

Previous answer

A simple approach:

// Remove all characters except A-Z, a-z, 0-9, dots, hyphens and spaces
// Note that the hyphen must go last not to be confused with a range (A-Z)
// and the dot, being special, is escaped with \
$str = preg_replace('/[^A-Za-z0-9\. -]/', '', $str);

// Replace sequences of spaces with hyphen
$str = preg_replace('/  */', '-', $str);

// The above means "a space, followed by a space repeated zero or more times"
// (should be equivalent to / +/)

// You may also want to try this alternative:
$str = preg_replace('/\\s+/', '-', $str);

// where \s+ means "zero or more whitespaces" (a space is not necessarily the
// same as a whitespace) just to be sure and include everything

Note that you might have to first urldecode() the URL, since %20 and + both are actually spaces - I mean, if you have "Never%20gonna%20give%20you%20up" you want it to become Never-gonna-give-you-up, not Never20gonna20give20you20up . You might not need it, but I thought I'd mention the possibility.

So the finished function along with test cases:

function hyphenize($string) {
    return 
    ## strtolower(
          preg_replace(
            array('#[\\s-]+#', '#[^A-Za-z0-9\. -]+#'),
            array('-', ''),
        ##     cleanString(
              urldecode($string)
        ##     )
        )
    ## )
    ;
}

print implode("\n", array_map(
    function($s) {
            return $s . ' becomes ' . hyphenize($s);
    },
    array(
    'Never%20gonna%20give%20you%20up',
    "I'm not the man I was",
    "'Légeresse', dit sa majesté",
    )));


Never%20gonna%20give%20you%20up    becomes  never-gonna-give-you-up
I'm not the man I was              becomes  im-not-the-man-I-was
'Légeresse', dit sa majesté        becomes  legeresse-dit-sa-majeste

To handle UTF-8 I used a cleanString implementation found online (link broken since, but a stripped down copy with all the not-too-esoteric UTF8 characters is at the beginning of the answer; it's also easy to add more characters to it if you need) that converts UTF8 characters to normal characters, thus preserving the word "look" as much as possible. It could be simplified and wrapped inside the function here for performance.

The function above also implements converting to lowercase - but that's a taste. The code to do so has been commented out.


Here, check out this function:

function seo_friendly_url($string){
    $string = str_replace(array('[\', \']'), '', $string);
    $string = preg_replace('/\[.*\]/U', '', $string);
    $string = preg_replace('/&(amp;)?#?[a-z0-9]+;/i', '-', $string);
    $string = htmlentities($string, ENT_COMPAT, 'utf-8');
    $string = preg_replace('/&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);/i', '\\1', $string );
    $string = preg_replace(array('/[^a-z0-9]/i', '/[-]+/') , '-', $string);
    return strtolower(trim($string, '-'));
}

참고URL : https://stackoverflow.com/questions/14114411/remove-all-special-characters-from-a-string

반응형