
How do I find the MIME type of a file in Python?

hot-time 2020. 5. 25. 08:08


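Let's say you want to save a bunch of files somewhere, for instance in BLOBs. Let's say you want to dish these files out via a web page and have the client automatically open the correct application/viewer.

Assumption: the browser figures out which application/viewer to use from the mime-type (content-type?) header in the HTTP response.

Based on that assumption, in addition to the bytes of the file, you also want to save the MIME type.

How would you find the MIME type of a file? I'm currently on a Mac, but this should also work on Windows.

Does the browser add this information when posting the file to the web page?

Is there a neat Python library for finding this information? A web service or a downloadable database?

The python-magic method suggested by toivotuo is outdated. The current trunk of python-magic is on GitHub, and based on the readme there, finding the MIME type is done like this: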

# For MIME types
>>> import magic
>>> mime = magic.Magic(mime=True)
>>> mime.from_file("testdata/test.pdf")
'application/pdf'
>>>

The mimetypes module in the standard library will determine/guess the MIME type from a file extension.
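As a minimal sketch of that extension-based lookup (the file names are made up for illustration and do not need to exist on disk):

```python
import mimetypes

# guess_type() only inspects the name, so the files do not need to exist.
# Both file names below are hypothetical examples.
pdf_type, pdf_encoding = mimetypes.guess_type("report.pdf")
unknown_type, unknown_encoding = mimetypes.guess_type("report")

print(pdf_type)      # type guessed from the ".pdf" suffix
print(unknown_type)  # None, because there is no suffix to look up
```

The second call shows the main limitation: without a usable extension there is nothing for mimetypes to go on.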

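If users are uploading files, the HTTP POST will contain the MIME type of the file alongside the data. For example, Django makes this data available as an attribute of the UploadedFile object.

A more reliable way than using the mimetypes library is to use the python-magic package.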

import magic
m = magic.open(magic.MAGIC_MIME)
m.load()
m.file("/tmp/document.pdf")

This is equivalent to using file(1).

In Django you can also make sure that the MIME type matches UploadedFile.content_type.
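One way such a comparison might look is sketched below. The helper names base_mime_type and content_type_matches are hypothetical, and the sketch assumes the claimed value would come from UploadedFile.content_type while the detected value would come from libmagic, which with MAGIC_MIME appends a charset parameter that has to be ignored when comparing.

```python
def base_mime_type(value):
    """Return the lowercase 'type/subtype' part, dropping parameters such as charset."""
    return value.split(";", 1)[0].strip().lower()


def content_type_matches(claimed, detected):
    """Compare a client-supplied content type with a detected one (hypothetical helper)."""
    return base_mime_type(claimed) == base_mime_type(detected)


# Example values of the kind a browser might send and libmagic might report.
same = content_type_matches("application/pdf", "application/pdf; charset=binary")
different = content_type_matches("image/png", "application/pdf; charset=binary")

print(same)
print(different)
```

A mismatch does not prove the upload is malicious, but it is a cheap signal that the client-supplied header should not be trusted as-is.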

This seems to be very easy:

>>> from mimetypes import MimeTypes
>>> import urllib 
>>> mime = MimeTypes()
>>> url = urllib.pathname2url('Upload.xml')
>>> mime_type = mime.guess_type(url)
>>> print mime_type
('application/xml', None)

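See Old Post

There are 3 different libraries that wrap libmagic.

2 of them are available on PyPI (so pip install will work):

filemagic
python-magic

Another one, similar to python-magic, is available directly in the latest libmagic sources, and it is the one you probably have in your Linux distribution.

In Debian, the python-magic package is this one; it is used as toivotuo described and, in my opinion, it is not obsolete as Simon Zimmermann said.

It seems to me to be another take (by the original author of libmagic).

Too bad it is not available directly on PyPI.

In Python 2.6: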

import subprocess

mime = subprocess.Popen("/usr/bin/file --mime PATH", shell=True, \
    stdout=subprocess.PIPE).communicate()[0]

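You didn't state which web server you are using, but Apache has a nice little module called Mime Magic which it uses to determine the type of a file when told to do so. It reads some of the file's content and tries to figure out what type it is based on the characters found. And as Dave Webb mentioned, the mimetypes module in Python will work, provided an extension is handy.

Alternatively, if you are sitting on a UNIX box you can use os.popen('file -i ' + fileName, mode='r') to grab the MIME type. Windows should have an equivalent command, but I'm unsure what it is.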
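For Python 3, a hedged sketch of the same idea using subprocess.run is shown below. The function name mime_type_via_file is hypothetical, and it assumes a file command on the PATH that accepts --brief and --mime-type; when no such command is found it returns None instead of failing.

```python
import os
import shutil
import subprocess
import tempfile


def mime_type_via_file(path):
    """Ask the file(1) command for a MIME type; return None if it is unavailable."""
    file_cmd = shutil.which("file")
    if file_cmd is None:
        return None
    # Passing a list (no shell) avoids quoting problems with unusual file names.
    result = subprocess.run(
        [file_cmd, "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


# Demonstration on a throwaway text file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello\n")
    sample_path = handle.name

detected = mime_type_via_file(sample_path)
os.remove(sample_path)
print(detected)
```

Returning None keeps the caller in control of the fallback, for example guessing from the extension on machines where the command is missing.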

@toivotuo's method worked best and most reliably for me under python3. My goal was to identify gzipped files that do not have a reliable .gz extension. I installed python3-magic.

import magic

filename = "./datasets/test"

def file_mime_type(filename):
    m = magic.open(magic.MAGIC_MIME)
    m.load()
    return(m.file(filename))

print(file_mime_type(filename))

for a gzipped file it returns: application/gzip; charset=binary

for an unzipped txt file (iostat data): text/plain; charset=us-ascii

for a tar file: application/x-tar; charset=binary

for a bz2 file: application/x-bzip2; charset=binary

and last but not least for me a .zip file: application/zip; charset=binary
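To illustrate why content-based detection works for files without a reliable extension, here is a deliberately tiny stand-in that checks a few well-known leading byte signatures. This is not libmagic and not how python3-magic is implemented; the SIGNATURES table and the sniff_mime_type name are made up for this sketch, and libmagic's real database covers far more formats.

```python
import bz2
import gzip

# A tiny, hand-picked signature table; libmagic's real database is far larger.
SIGNATURES = [
    (b"\x1f\x8b", "application/gzip"),
    (b"BZh", "application/x-bzip2"),
    (b"PK\x03\x04", "application/zip"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime_type(head):
    """Return a MIME type for the leading bytes of a file, or None if unrecognised."""
    for signature, mime_type in SIGNATURES:
        if head.startswith(signature):
            return mime_type
    return None


# Build sample data in memory instead of relying on files on disk.
gzip_bytes = gzip.compress(b"iostat data")
bzip2_bytes = bz2.compress(b"iostat data")

gzip_guess = sniff_mime_type(gzip_bytes[:8])
bzip2_guess = sniff_mime_type(bzip2_bytes[:8])
text_guess = sniff_mime_type(b"plain text")

print(gzip_guess)
print(bzip2_guess)
print(text_guess)
```

Only the first few bytes are needed, which is why reading a small prefix of a file or download is usually enough for this kind of check.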

Python bindings to libmagic

All the different answers on this topic are very confusing, so I’m hoping to give a bit more clarity with this overview of the different bindings of libmagic. Previously mammadori gave a short answer listing the available option.

libmagic

module name: magic
pypi: file-magic
source: https://github.com/file/file/tree/master/python

When determining a file's MIME type, the tool of choice is simply called file and its back-end is called libmagic. (See the Project home page.) The project is developed in a private cvs-repository, but there is a read-only git mirror on github.

Now this tool, which you will need if you want to use any of the libmagic bindings with python, already comes with its own python bindings called file-magic. There is not much dedicated documentation for them, but you can always have a look at the man page of the c-library: man libmagic. The basic usage is described in the readme file:

import magic

detected = magic.detect_from_filename('magic.py')
print('Detected MIME type: {}'.format(detected.mime_type))
print('Detected encoding: {}'.format(detected.encoding))
print('Detected file type name: {}'.format(detected.name))

Apart from this, you can also use the library by creating a Magic object using magic.open(flags) as shown in the example file.

Both toivotuo and ewr2san use these file-magic bindings included in the file tool. They mistakenly assume they are using the python-magic package. This seems to indicate that if both file and python-magic are installed, the Python module magic refers to the former.

python-magic

module name: magic
pypi: python-magic
source: https://github.com/ahupp/python-magic

This is the library that Simon Zimmermann talks about in his answer and which is also employed by Claude COULOMBE as well as Gringo Suave.

filemagic

module name: magic
pypi: filemagic
source: https://github.com/aliles/filemagic

Note: This project was last updated in 2013!

Due to being based on the same c-api, this library has some similarity with file-magic included in libmagic. It is only mentioned by mammadori and no other answer employs it.
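Since all three packages install a module called magic, it can help to check which one an import actually resolved to. The sketch below is a heuristic only: identify_magic_binding is a hypothetical helper, and the attribute names it probes (from_file for python-magic, a Magic class with id_filename for filemagic, detect_from_filename for file-magic) reflect my understanding of each package's public API and should be treated as assumptions.

```python
import types


def identify_magic_binding(module):
    """Guess which libmagic binding a module object named 'magic' belongs to."""
    # Assumption: python-magic exposes module-level from_file()/from_buffer().
    if hasattr(module, "from_file"):
        return "python-magic"
    # Assumption: filemagic exposes a Magic class with an id_filename() method.
    magic_class = getattr(module, "Magic", None)
    if magic_class is not None and hasattr(magic_class, "id_filename"):
        return "filemagic"
    # Assumption: file-magic (bundled with file) exposes detect_from_filename().
    if hasattr(module, "detect_from_filename"):
        return "file-magic"
    return "unknown"


# Stand-in objects so the sketch runs without any binding installed.
fake_python_magic = types.SimpleNamespace(from_file=lambda path, mime=False: None)
fake_file_magic = types.SimpleNamespace(detect_from_filename=lambda path: None)

python_magic_label = identify_magic_binding(fake_python_magic)
file_magic_label = identify_magic_binding(fake_file_magic)

print(python_magic_label)
print(file_magic_label)
```

In real use you would pass the imported module itself, as in identify_magic_binding(magic) after import magic.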

2017 Update

No need to go to github, it is on PyPi under a different name:

pip3 install --user python-magic
# or:
sudo apt install python3-magic  # Ubuntu distro package

The code can be simplified as well:

>>> import magic

>>> magic.from_file('/tmp/img_3304.jpg', mime=True)
'image/jpeg'

In Python 3.x, in a web app that receives a URL to a file which may have no extension or a fake extension, you should install python-magic using

pip3 install python-magic

For Mac OS X, you should also install libmagic using

brew install libmagic

Code snippet

import urllib
import magic
from urllib.request import urlopen

url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.readline())
print(mime_type)

alternatively you could put a size into the read

import urllib
import magic
from urllib.request import urlopen

url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.read(128))
print(mime_type)

The mimetypes module only recognises a file type based on the file extension. If you try to determine the type of a file without an extension, mimetypes will not work.

python 3 ref: https://docs.python.org/3.2/library/mimetypes.html

mimetypes.guess_type(url, strict=True) Guess the type of a file based on its filename or URL, given by url. The return value is a tuple (type, encoding) where type is None if the type can’t be guessed (missing or unknown suffix) or a string of the form 'type/subtype', usable for a MIME content-type header.

encoding is None for no encoding or the name of the program used to encode (e.g. compress or gzip). The encoding is suitable for use as a Content-Encoding header, not as a Content-Transfer-Encoding header. The mappings are table driven. Encoding suffixes are case sensitive; type suffixes are first tried case sensitively, then case insensitively.

The optional strict argument is a flag specifying whether the list of known MIME types is limited to only the official types registered with IANA. When strict is True (the default), only the IANA types are supported; when strict is False, some additional non-standard but commonly used MIME types are also recognized.

import mimetypes
print(mimetypes.guess_type("sample.html"))
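The (type, encoding) tuple and the strict flag described above can be seen in a small sketch. It uses a private MimeTypes instance so the global tables are not modified; the ".example" suffix and "application/x-example" type are made-up names registered only for illustration.

```python
import mimetypes

# A private MimeTypes instance, so the module-level tables are left untouched.
registry = mimetypes.MimeTypes()

# ".example" and "application/x-example" are made-up names for illustration,
# registered here as a non-standard (strict=False) mapping.
registry.add_type("application/x-example", ".example", strict=False)

tar_gz = registry.guess_type("archive.tar.gz")
strict_guess = registry.guess_type("data.example", strict=True)
lenient_guess = registry.guess_type("data.example", strict=False)

print(tar_gz)         # the type comes from ".tar", the encoding from ".gz"
print(strict_guess)   # the non-standard mapping is ignored
print(lenient_guess)  # the non-standard mapping is used
```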

I've tried a lot of examples, but with Django, mutagen plays nicely.

Example checking whether a file is an MP3:

from django.core.exceptions import ValidationError
from mutagen.mp3 import MP3, HeaderNotFoundError

try:
    audio = MP3(file)  # file: the uploaded file being validated
except HeaderNotFoundError:
    raise ValidationError('This file should be mp3')

The downside is that your ability to check file types is limited, but it's a great approach if you want not only to check the file type but also to access additional information.

This may be old already, but why not use UploadedFile.content_type directly from Django? Isn't it the same? (https://docs.djangoproject.com/en/1.11/ref/files/uploads/#django.core.files.uploadedfile.UploadedFile.content_type)

For byte-array data you can use magic.from_buffer(_byte_array, mime=True)

I try the mimetypes library first. If that doesn't work, I use the python-magic library instead.

import mimetypes

def guess_type(filename, buffer=None):
    mimetype, encoding = mimetypes.guess_type(filename)
    if mimetype is None:
        try:
            import magic
            if buffer:
                mimetype = magic.from_buffer(buffer, mime=True)
            else:
                mimetype = magic.from_file(filename, mime=True)
        except ImportError:
            pass
    return mimetype

You can use the imghdr Python module.

Reference URL: https://stackoverflow.com/questions/43580/how-to-find-the-mime-type-of-a-file-in-python
