Chapter 5
Databases
Computer systems need to keep in memory a plethora of information. Customer accounts, subscription status, or even
simply the website's content.
These pieces of information are stored via software called "database management systems." A large-scale website
utilizes multiple database technologies, each specialized for a particular use. Among others:
- Working with huge batches of data.
- Or conversely, on a single data point.
- Search engine.
- Identifying relationships.
- Storage of documents and media.
- Ultra-fast access.
Today, we will focus on relational database management systems. These are the technologies most commonly used by developers in their daily lives. You might have already heard of MySQL, Oracle, SQL Server, or PostgreSQL.
Let's build a newsletter 💪
Imagine a database as an Excel file.
A tab in the file represents what we call a "table".
A column in a table is called a "field".
A row in a table is called a "tuple", "entry", "row", or "record".
For example, if you want to model in a database this newsletter, the wonderful people who have subscribed to it, and
the mail sendings:
Table: chapter
identifier | title | publication_date | content |
---|---|---|---|
database-sql | The Tech Guide #5 - Databases | 2024-01-05 | Welcome to… |
frontend-backend | The Tech Guide #4 - Front-end and Back-end | 2023-09-10 | 👋, welcome… |
Table: subscription
identifier | |
---|---|
1 | foo@bar.com |
2 | hello@world.com |
Table: sent_mail
chapter_identifier | subscription_identifier | send_date |
---|---|---|
database-sql | 1 | 2024-01-05 |
database-sql | 2 | 2024-01-06 |
You get two tables that represent concrete objects: chapter and subscription.
Each line is manually or automatically assigned a unique identifier. It's called a "primary key".
The third table, sent_mail, uses these keys to relate chapters and subscriptions to each other.
It manipulates "foreign keys".
This way of organizing data explains the name "relational database".
SQL
To manipulate these tables, we use Structured Query Language.
SQL, pronounced S.Q.L and formerly known as sequel, aims to be a natural language close to English.
It very easily allows writing queries to carry out the 4 basic operations of data processing:
- Creation. Example: someone subscribes to the newsletter.
- Reading. Example: check who has already received the chapter to avoid sending it twice.
- Updating. Example: correcting a typo in the content.
- Deletion. Example: unsubscribing from the newsletter.
SELECT title, content FROM chapter WHERE identifier = “database-sql”
Easy, isn't it? 😉
We can do crazy stuff in SQL, but the basic syntax is really accessible to everyone.
It's exactly this query that is played if you display the page https://the-tech.guide/chapter/database-sql/.
The URL is analyzed to extract the chapter's identifier.
It is then injected into the SQL query allowing to construct the page.
The database serves in this case to code only one page and to automatically generate all the variants.
😱 And if someone tampers with the URL hoping to modify the underlying SQL query?
That's called SQL injection. It's the flaw exploited in 90% of website hacks.
Every use of data public to users must be secured.
Every day, robots randomly try thousands of injections on the most known sites.
NoSQL
Relational databases, despite their power, show limits on large volumes of data.
The advent of Cloud and BigData has seen the emergence of new technologies categorized as “NoSQL”.
Broadly, NoSQL refers to databases that are not relational. It’s a super broad designation encompassing a whole
range of technologies.
Fun fact: NoSQL technologies often use SQL or a derivative 🤪. They tried to make up for this poor naming with Not
Only SQL.
In a NoSQL database, data is organized completely differently, at least in the internal engine of the software.
This allows for easier distribution across different machines and thus infinitely increases computing power and
available memory.