모든 개발자는 데이터베이스에 대해 무엇을 알아야합니까? [닫은]
우리가 좋아하든 그렇지 않든, 대부분의 개발자는 아니지만 정기적으로 데이터베이스를 사용하거나 언젠가는 작업해야 할 수도 있습니다. 야생에서 오용 및 남용의 양과 매일 발생하는 데이터베이스 관련 질문의 양을 고려할 때 개발자가 설계하거나 작업하지 않더라도 개발자가 알아야 할 특정 개념이 있다고 말하는 것이 공정합니다. 오늘 데이터베이스. 그래서:
개발자와 다른 소프트웨어 전문가가 데이터베이스에 대해 알아야 할 중요한 개념은 무엇입니까?
응답 지침 :
목록을 짧게 유지하십시오.
답변 당 하나의 개념이 가장 좋습니다.
구체적으로 작성하십시오 .
"데이터 모델링"은 중요한 기술 일 수 있지만 정확히 무엇을 의미합니까?
당신의 근거를 설명하십시오.
개념이 왜 중요한가요? "인덱스 사용"이라고 말하지 마십시오. "모범 사례"에 빠지지 마십시오. 청중이 더 많은 것을 배우도록 설득하십시오.
귀하의 의견에 동의하십시오.
다른 사람의 답변을 먼저 읽으십시오. 하나의 높은 순위 답변은 두 개의 낮은 순위 답변보다 더 효과적인 진술입니다. 더 추가해야 할 경우 의견을 추가하거나 원본을 참조하십시오.
개인적으로 적용되지 않기 때문에 무언가를 공감하지 마십시오.
우리는 모두 다른 영역에서 일합니다. 여기서의 목표는 데이터베이스 초보자들에게 데이터베이스 디자인 및 데이터베이스 중심 개발에 대한 잘 정립되고 균형 잡힌 이해를 얻을 수있는 방향을 제공하는 것입니다.
개발자가 데이터베이스에 대해 알아야 할 첫 번째 사항은 다음과 같습니다. 데이터베이스 란 무엇 입니까? 작동 방식, 빌드 방식 및 데이터베이스의 데이터를 검색하거나 업데이트하는 코드를 작성하는 방법이 아닙니다. 그러나 그들은 무엇입니까?
불행히도, 이것에 대한 대답은 움직이는 목표입니다. 1970 년대부터 1990 년대 초 데이터베이스의 데이터베이스에서 데이터 공유를위한 데이터베이스가 사용되었습니다. 데이터베이스를 사용 중이고 데이터를 공유하지 않은 경우 학술 프로젝트에 참여했거나 본인을 포함하여 리소스를 낭비한 경우 데이터베이스를 설정하고 DBMS를 길들이는 것은 여러 번 악용 된 데이터의 관점에서 투자 회수에 막대한 투자가 필요한 엄청난 작업이었습니다.
지난 15 년 동안 데이터베이스는 단 하나의 응용 프로그램과 관련된 영구 데이터를 저장하는 데 사용되었습니다. MySQL 또는 Access 또는 SQL Server 용 데이터베이스를 구축하는 것은 일상적인 것이되어 데이터베이스는 거의 일반적인 응용 프로그램의 일부가되었습니다. 때로는 데이터의 실제 가치가 명백 해짐에 따라 초기 제한 미션이 미션 크리프에 의해 위로 밀려납니다. 불행하게도, 단일 목적을 염두에두고 설계된 데이터베이스는 전사적이며 업무상 중요한 역할을 수행하기 시작할 때 종종 실패합니다.
개발자가 데이터베이스에 대해 알아야 할 두 번째 사항 은 전 세계의 데이터 중심 관점 입니다. 데이터 중심 세계관은 대부분의 개발자가 배운 것보다 프로세스 중심 세계관과 다릅니다. 이 간격과 비교하여 구조적 프로그래밍과 객체 지향 프로그래밍의 간격은 비교적 작습니다.
개발자가 적어도 개요에서 배워야 할 세 번째 사항은 개념적 데이터 모델링, 논리적 데이터 모델링 및 물리적 데이터 모델링을 포함한 데이터 모델링입니다.
개념적 데이터 모델링 은 실제로 데이터 중심 관점에서 요구 사항을 분석하는 것입니다.
논리 데이터 모델링 은 일반적으로 개념적 데이터 모델링에서 발견 된 요구 사항에 특정 데이터 모델을 적용하는 것입니다. 관계형 모델은 다른 특정 모델보다 훨씬 많이 사용되므로 개발자는 관계형 모델을 확실히 배워야합니다. 사소한 요구 사항에 대해 강력하고 관련성있는 관계형 모델을 설계하는 것은 쉬운 일이 아닙니다. 관계형 모델을 잘못 이해하면 좋은 SQL 테이블을 작성할 수 없습니다.
실제 데이터 모델링 은 일반적으로 DBMS에 따라 다르며 개발자가 데이터베이스 빌더 또는 DBA가 아니라면 자세히 배울 필요는 없습니다. 개발자가 이해해야하는 것은 실제 데이터베이스 설계를 논리적 데이터베이스 설계와 분리 할 수있는 정도와 실제 설계를 조정하여 고속 데이터베이스를 생성 할 수있는 정도입니다.
다음으로 개발자가 배워야 할 것은 속도 (성능)는 중요하지만, 설계 범위를 측정하고 확장하는 능력, 프로그래밍의 단순성 등 다른 디자인 우수성 척도는 더욱 중요하다는 것 입니다.
마지막으로, 데이터베이스를 망친 사람 은 데이터의 가치가 데이터를 캡처 한 시스템보다 오래 지속 된다는 것을 이해해야 합니다 .
아휴!
좋은 질문. 다음은 특별한 순서가 아닌 몇 가지 생각입니다.
최소한 두 번째 정규 형식으로 정규화가 필수적입니다.
적절한 계단식 삭제 및 업데이트 고려 사항과 함께 참조 무결성도 필수적입니다.
점검 제한 조건의 적절하고 올바른 사용. 데이터베이스가 가능한 한 많은 작업을 수행하도록하십시오.
데이터베이스와 미들 티어 코드 모두에 비즈니스 로직을 분산시키지 마십시오. 가급적 중간 계층 코드에서 하나를 선택하십시오.
기본 키 및 클러스터 키에 대한 일관된 접근 방식을 결정하십시오.
인덱스를 과도하게 사용하지 마십시오. 현명하게 색인을 선택하십시오.
일관된 테이블 및 열 이름. 표준을 선택하고 준수하십시오.
데이터베이스에서 널값을 승인 할 열 수를 제한하십시오.
방아쇠를 가지고 다니지 마십시오. 그들은 사용하지만 서둘러 복잡한 일을 할 수 있습니다.
UDF에주의하십시오. 그것들은 훌륭하지만 쿼리에서 얼마나 자주 호출되는지 알지 못할 때 성능 문제를 일으킬 수 있습니다.
데이터베이스 디자인에 관한 Celko의 책을 얻으십시오. 남자는 거만하지만 그의 것을 알고 있습니다.
먼저 개발자는 데이터베이스에 대해 알아야 할 것이 있다는 것을 이해해야합니다. 그것들은 SQL에 넣고 결과 세트를 얻는 마술 장치 일뿐 만 아니라 자체 논리와 단점이있는 매우 복잡한 소프트웨어입니다.
둘째, 다른 목적으로 다른 데이터베이스 설정이 있습니다. 사용 가능한 데이터웨어 하우스가있는 경우 개발자가 온라인 트랜잭션 데이터베이스에서 히스토리 보고서를 작성하는 것을 원하지 않습니다.
셋째, 개발자는 조인을 포함한 기본 SQL을 이해해야합니다.
과거에는 개발자가 얼마나 밀접하게 참여하고 있는지에 달려 있습니다. 나는 개발자 였고 사실상 DBA 였고, DBA가 복도 아래로 내려 갔고, DBA가 자신의 영역에서 벗어난 곳에서 일했습니다. 개발자가 데이터베이스 디자인에 관여한다고 가정합니다.
그들은 최소한 처음 세 가지 정규 형식 인 기본 정규화를 이해해야합니다. 그 이상으로 DBA를 얻으십시오. 미국 법정에 대한 경험이있는 사람들 (그리고 여기에 임의의 TV 쇼가 포함됨)에는 "키, 전체 키 및 키에만 의존하는 것이기 때문에 Codd를 도와주십시오"라는 니모닉이 있습니다.
인덱스에 대한 단서가 필요합니다. 즉, 필요한 인덱스와 성능에 어떤 영향을 줄지에 대한 아이디어가 있어야합니다. 이것은 쓸모없는 인덱스가 없지만 쿼리를 돕기 위해 인덱스를 추가하는 것을 두려워하지 않음을 의미합니다. DBA에는 잔액과 같은 추가 정보를 남겨 두어야합니다.
데이터 무결성의 필요성을 이해하고 데이터를 확인하는 위치와 문제를 발견 한 경우 수행중인 작업을 가리킬 수 있어야합니다. 데이터베이스에있을 필요는 없지만 (사용자에게 의미있는 오류 메시지를 발행하기 어려울 수 있음) 어딘가에 있어야합니다.
계획을 얻는 방법과 일반적인 계획을 읽는 방법에 대한 기본 지식이 있어야합니다 (적어도 알고리즘의 효율성 여부를 알기에 충분합니다).
They should know vaguely what a trigger is, what a view is, and that it's possible to partition pieces of databases. They don't need any sort of details, but they need to know to ask the DBA about these things.
They should of course know not to meddle with production data, or production code, or anything like that, and they should know that all source code goes into a VCS.
I've doubtless forgotten something, but the average developer need not be a DBA, provided there is a real DBA at hand.
Basic Indexing
I'm always shocked to see a table or an entire database with no indexes, or arbitrary/useless indexes. Even if you're not designing the database and just have to write some queries, it's still vital to understand, at a minimum:
- What's indexed in your database and what's not:
- The difference between types of scans, how they're chosen, and how the way you write a query can influence that choice;
- The concept of coverage (why you shouldn't just write
SELECT *
); - The difference between a clustered and non-clustered index;
- Why more/bigger indexes are not necessarily better;
- Why you should try to avoid wrapping filter columns in functions.
Designers should also be aware of common index anti-patterns, for example:
- The Access anti-pattern (indexing every column, one by one)
- The Catch-All anti-pattern (one massive index on all or most columns, apparently created under the mistaken impression that it would speed up every conceivable query involving any of those columns).
The quality of a database's indexing - and whether or not you take advantage of it with the queries you write - accounts for by far the most significant chunk of performance. 9 out of 10 questions posted on SO and other forums complaining about poor performance invariably turn out to be due to poor indexing or a non-sargable expression.
Normalization
It always depresses me to see somebody struggling to write an excessively complicated query that would have been completely straightforward with a normalized design ("Show me total sales per region.").
If you understand this at the outset and design accordingly, you'll save yourself a lot of pain later. It's easy to denormalize for performance after you've normalized; it's not so easy to normalize a database that wasn't designed that way from the start.
At the very least, you should know what 3NF is and how to get there. With most transactional databases, this is a very good balance between making queries easy to write and maintaining good performance.
How Indexes Work
It's probably not the most important, but for sure the most underestimated topic.
The problem with indexing is that SQL tutorials usually don't mention them at all and that all the toy examples work without any index.
Even more experienced developers can write fairly good (and complex) SQL without knowing more about indexes than "An index makes the query fast".
That's because SQL databases do a very good job working as black-box:
Tell me what you need (gimme SQL), I'll take care of it.
And that works perfectly to retrieve the correct results. The author of the SQL doesn't need to know what the system is doing behind the scenes--until everything becomes sooo slooooow.....
That's when indexing becomes a topic. But that's usually very late and somebody (some company?) is already suffering from a real problem.
That's why I believe indexing is the No. 1 topic not to forget when working with databases. Unfortunately, it is very easy to forget it.
Disclaimer
The arguments are borrowed from the preface of my free eBook "Use The Index, Luke". I am spending quite a lot of my time explaining how indexes work and how to use them properly.
I just want to point out an observation - that is that it seems that the majority of responses assume database is interchangeable with relational databases. There are also object databases, flat file databases. It is important to asses the needs of the of the software project at hand. From a programmer perspective the database decision can be delayed until later. Data modeling on the other hand can be achieved early on and lead to much success.
I think data modeling is a key component and is a relatively old concept yet it is one that has been forgotten by many in the software industry. Data modeling, especially conceptual modeling, can reveal the functional behavior of a system and can be relied on as a road map for development.
On the other hand, the type of database required can be determined based on many different factors to include environment, user volume, and available local hardware such as harddrive space.
Avoiding SQL injection and how to secure your database
Every developer should know that this is false: "Profiling a database operation is completely different from profiling code."
There is a clear Big-O in the traditional sense. When you do an EXPLAIN PLAN
(or the equivalent) you're seeing the algorithm. Some algorithms involve nested loops and are O( n ^ 2 ). Other algorithms involve B-tree lookups and are O( n log n ).
This is very, very serious. It's central to understanding why indexes matter. It's central to understanding the speed-normalization-denormalization tradeoffs. It's central to understanding why a data warehouse uses a star-schema which is not normalized for transactional updates.
If you're unclear on the algorithm being used do the following. Stop. Explain the Query Execution plan. Adjust indexes accordingly.
Also, the corollary: More Indexes are Not Better.
Sometimes an index focused on one operation will slow other operations down. Depending on the ratio of the two operations, adding an index may have good effects, no overall impact, or be detrimental to overall performance.
I think every developer should understand that databases require a different paradigm.
When writing a query to get at your data, a set-based approach is needed. Many people with an interative background struggle with this. And yet, when they embrace it, they can achieve far better results, even though the solution may not be the one that first presented itself in their iterative-focussed minds.
Excellent question. Let's see, first no one should consider querying a datbase who does not thoroughly understand joins. That's like driving a car without knowing where the steering wheel and brakes are. You also need to know datatypes and how to choose the best one.
Another thing that developers should understand is that there are three things you should have in mind when designing a database:
Data integrity - if the data can't be relied on you essentially have no data - this means do not put required logic in the application as many other sources may touch the database. Constraints, foreign keys and sometimes triggers are necessary to data integrity. Don't fail to use them because you don't like them or don't want to be bothered to understand them.
Performance - it is very hard to refactor a poorly performing database and performance should be considered from the start. There are many ways to do the same query and some are known to be faster almost always, it is short-sighted not to learn and use these ways. Read some books on performance tuning before designing queries or database structures.
Security - this data is the life-blood of your company, it also frequently contains personal information that can be stolen. Learn to protect your data from SQL injection attacks and fraud and identity theft.
When querying a database, it is easy to get the wrong answer. Make sure you understand your data model thoroughly. Remember often actual decisions are made based on the data your query returns. When it is wrong, the wrong business decisions are made. You can kill a company from bad queries or loose a big customer. Data has meaning, developers often seem to forget that.
Data almost never goes away, think in terms of storing data over time instead of just how to get it in today. That database that worked fine when it had a hundred thousand records, may not be so nice in ten years. Applications rarely last as long as data. This is one reason why designing for performance is critical.
Your database will probaly need fields that the application doesn't need to see. Things like GUIDs for replication, date inserted fields. etc. You also may need to store history of changes and who made them when and be able to restore bad changes from this storehouse. Think about how you intend to do this before you come ask a web site how to fix the problem where you forgot to put a where clause on an update and updated the whole table.
Never develop in a newer version of a database than the production version. Never, never, never develop directly against a production database.
If you don't have a database administrator, make sure someone is making backups and knows how to restore them and has tested restoring them.
Database code is code, there is no excuse for not keeping it in source control just like the rest of your code.
Evolutionary Database Design. http://martinfowler.com/articles/evodb.html
These agile methodologies make database change process manageable, predictable and testable.
Developers should know, what it takes to refactor a production database in terms of version control, continious integration and automated testing.
Evolutionary Database Design process has administrative aspects, for example a column is to be dropped after some life time period in all databases of this codebase.
At least know, that Database Refactoring concept and methodologies exist. http://www.agiledata.org/essays/databaseRefactoringCatalog.html
Classification and process description makes it possible to implement tooling for these refactorings too.
From my experience with relational databases, every developer should know:
- The different data types:
Using the correct type for the correct job will make your DB design more robust, your queries faster and your life easier.
- Learn about 1xM and MxM:
This is the bread and butter for relational databases. You need to understand one-to-many and many-to-many relations and apply then when appropriate.
- "K.I.S.S." principle applies to the DB as well:
Simplicity always works best. Provided you have studied how DB work, you will avoid unnecessary complexity which will lead to maintenance and speed problems.
- Indices:
It's not enough if you know what they are. You need to understand when to used them and when not to.
also:
- Boolean algebra is your friend
- Images: Don't store them on the DB. Don't ask why.
- Test DELETE with SELECT
I would like everyone, both DBAs and developer/designer/architects, to better understand how to properly model a business domain, and how to map/translate that business domain model into both a normalized database logical model, an optimized physical model, and an appropriate object oriented class model, each one of which is (can be) different, for various reasons, and understand when, why, and how they are (or should be) different from one another.
I would say strong basic SQL skills. I've seen a lot of developers so far who know a little about databases but are always asking for tips about how to formulate a quite simple query. Queries are not always that easy and simple. You do have to use multiple joins (inner, left, etc.) when querying a well normalized database.
About the following comment to Walter M.'s answer:
"Very well written! And the historical perspective is great for people who weren't doing database work at that time (i.e. me)".
The historical perspective is in a certain sense absolutely crucial. "Those who forget history, are doomed to repeat it.". Cfr XML repeating the hierarchical mistakes of the past, graph databases repeating the network mistakes of the past, OO systems forcing the hierarchical model upon users while everybody with even just a tenth of a brain should know that the hierarchical model is not suitable for general-purpose representation of the real world, etcetera, etcetera.
As for the question itself:
Every database developer should know that "Relational" is not equal to "SQL". Then they would understand why they are being let down so abysmally by the DBMS vendors, and why they should be telling those same vendors to come up with better stuff (e.g. DBMS's that are truly relational) if they want to go on sucking hilarious amounts of money out of their customers for such crappy software).
And every database developer should know everything about the relational algebra. Then there would no longer be a single developer left who had to post these stupid "I don't know how to do my job and want someone else to do it for me" questions on Stack Overflow anymore.
I think a lot of the technical details have been covered here and I don't want to add to them. The one thing I want to say is more social than technical, don't fall for the "DBA knowing the best" trap as an application developer.
If you are having performance issues with query take ownership of the problem too. Do your own research and push for the DBAs to explain what's happening and how their solutions are addressing the problem.
Come up with your own suggestions too after you have done the research. That is, I try to find a cooperative solution to the problem rather than leaving database issues to the DBAs.
Simple respect.
- It's not just a repository
- You probably don't know better than the vendor or the DBAs
- You won't support it at 3 a.m. with senior managers shouting at you
Consider Denormalization as a possible angel, not the devil, and also consider NoSQL databases as an alternative to relational databases.
Also, I think the Entity-Relation model is a must-know for every developper even if you don't design databases. It'll let you understand thoroughly what's your database all about.
Never insert data with the wrong text encoding.
Once your database becomes polluted with multiple encodings, the best you can do is apply some kind combination of heuristics and manual labor.
Aside from syntax and conceptual options they employ (such as joins, triggers, and stored procedures), one thing that will be critical for every developer employing a database is this:
Know how your engine is going to perform the query you are writing with specificity.
The reason I think this is so important is simply production stability. You should know how your code performs so you're not stopping all execution in your thread while you wait for a long function to complete, so why would you not want to know how your query will affect the database, your program, and perhaps even the server?
This is actually something that has hit my R&D team more times than missing semicolons or the like. The presumtion is the query will execute quickly because it does on their development system with only a few thousand rows in the tables. Even if the production database is the same size, it is more than likely going to be used a lot more, and thus suffer from other constraints like multiple users accessing it at the same time, or something going wrong with another query elsewhere, thus delaying the result of this query.
Even simple things like how joins affect performance of a query are invaluable in production. There are many features of many database engines that make things easier conceptually, but may introduce gotchas in performance if not thought of clearly.
Know your database engine execution process and plan for it.
For a middle-of-the-road professional developer who uses databases a lot (writing/maintaining queries daily or almost daily), I think the expectation should be the same as any other field: You wrote one in college.
Every C++ geek wrote a string class in college. Every graphics geek wrote a raytracer in college. Every web geek wrote interactive websites (usually before we had "web frameworks") in college. Every hardware nerd (and even software nerds) built a CPU in college. Every physician dissected an entire cadaver in college, even if she's only going to take my blood pressure and tell me my cholesterol is too high today. Why would databases be any different?
Unfortunately, they do seem different, today, for some reason. People want .NET programmers to know how strings work in C, but the internals of your RDBMS shouldn't concern you too much.
It's virtually impossible to get the same level of understanding from just reading about them, or even working your way down from the top. But if you start at the bottom and understand each piece, then it's relatively easy to figure out the specifics for your database. Even things that lots of database geeks can't seem to grok, like when to use a non-relational database.
Maybe that's a bit strict, especially if you didn't study computer science in college. I'll tone it down some: You could write one today, completely, from scratch. I don't care if you know the specifics of how the PostgreSQL query optimizer works, but if you know enough to write one yourself, it probably won't be too different from what they did. And you know, it's really not that hard to write a basic one.
The order of columns in a non-unique index is important.
The first column should be the column that has the most variability in its content (i.e. cardinality).
This is to aid SQL Server ability to create useful statistics in how to use the index at runtime.
Understand the tools that you use to program the database!!!
I wasted so much time trying to understand why my code was mysteriously failing.
If you're using .NET, for example, you need to know how to properly use the objects in the System.Data.SqlClient
namespace. You need to know how to manage your SqlConnection
objects to make sure they are opened, closed, and when necessary, disposed properly.
You need to know that when you use a SqlDataReader
, it is necessary to close it separately from your SqlConnection
. You need to understand how to keep connections open when appropriate to how to minimize the number of hits to the database (because they are relatively expensive in terms of computing time).
- Basic SQL skills.
- Indexing.
- Deal with different incarnations of DATE/ TIME/ TIMESTAMP.
- JDBC driver documentation for the platform you are using.
- Deal with binary data types (CLOB, BLOB, etc.)
For some projects, and Object-Oriented model is better.
For other projects, a Relational model is better.
The impedance mismatch problem, and know the common deficiencies or ORMs.
RDBMS Compatibility
Look if it is needed to run the application in more than one RDBMS. If yes, it might be necessary to:
- avoid RDBMS SQL extensions
- eliminate triggers and store procedures
- follow strict SQL standards
- convert field data types
- change transaction isolation levels
Otherwise, these questions should be treated separately and different versions (or configurations) of the application would be developed.
Don't depend on the order of rows returned by an SQL query.
http://www.reddit.com/r/programming/comments/azdd7/programmers_sit_your_butt_down_i_need_to_have_a/
참고URL : https://stackoverflow.com/questions/1981526/what-should-every-developer-know-about-databases
'IT story' 카테고리의 다른 글
Python Pandas : 특정 값과 일치하는 열의 색인을 가져옵니다. (0) | 2020.05.06 |
---|---|
크론과 virtualenv (0) | 2020.05.06 |
CTOR의 의미는 무엇입니까? (0) | 2020.05.06 |
matplotlib을 사용하여 이미지를 회색조로 표시 (0) | 2020.05.06 |
Python 멀티 프로세싱 PicklingError : 피클 할 수 없습니다 (0) | 2020.05.06 |