DBMS Important Questions

What is the difference between primary key and unique constraints?
A primary key column cannot contain NULL values, whereas a column under a unique constraint can. A table can have only one primary key, but it can have multiple unique constraints.
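
The rules above can be tried out with a small sketch using Python's built-in sqlite3 module. The table and column names (emp, emp_id, email, phone) are made up for illustration. Two engine-specific assumptions apply: SQLite only enforces NOT NULL on a non-integer primary key when the table is declared WITHOUT ROWID, so the sketch uses that option, and how many NULLs a unique column may hold differs between engines (SQLite accepts several, some engines accept only one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID makes SQLite reject NULL in the primary key, as standard SQL does.
conn.execute("""
    CREATE TABLE emp (
        emp_id TEXT PRIMARY KEY,   -- a table can declare only one primary key
        email  TEXT UNIQUE,        -- first unique constraint
        phone  TEXT UNIQUE         -- second unique constraint
    ) WITHOUT ROWID
""")

# Two rows with NULL in both unique columns are accepted by SQLite.
conn.execute("INSERT INTO emp VALUES ('e1', NULL, NULL)")
conn.execute("INSERT INTO emp VALUES ('e2', NULL, NULL)")


def rejected(sql):
    """Return True if the statement fails with a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A NULL primary key is refused.
null_pk_rejected = rejected("INSERT INTO emp VALUES (NULL, 'a@x.org', '1')")
# The first row with this email is accepted; a second row with the same email is refused.
first_email_rejected = rejected("INSERT INTO emp VALUES ('e3', 'b@x.org', '2')")
dup_email_rejected = rejected("INSERT INTO emp VALUES ('e4', 'b@x.org', '3')")
row_count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
print(null_pk_rejected, first_email_rejected, dup_email_rejected, row_count)
```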

What are the differences between DDL, DML, and DCL in SQL?
The three categories are:
DDL stands for Data Definition Language. Statements such as CREATE, ALTER, DROP, TRUNCATE and RENAME fall under it.
DML stands for Data Manipulation Language. Statements such as SELECT, INSERT, DELETE and UPDATE fall under it.
DCL stands for Data Control Language. Statements such as GRANT and REVOKE fall under it.

What is the difference between the HAVING and WHERE clauses?
HAVING specifies a condition on a group or on an aggregate function used in a SELECT statement. The WHERE clause filters individual rows before grouping, while the HAVING clause filters groups after grouping. Unlike the HAVING clause, the WHERE clause cannot contain aggregate functions.
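
A minimal sketch of the order in which the two clauses apply, run through Python's sqlite3 module. The sales table and its data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("north", 5), ("south", 40), ("south", 30), ("east", 200)],
)

# WHERE drops individual rows (amount below 10) before grouping.
# HAVING then keeps only the groups whose total exceeds 90.
big_regions = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 10
    GROUP BY region
    HAVING SUM(amount) > 90
    ORDER BY region
""").fetchall()
print(big_regions)
```

Here the south group survives the WHERE filter row by row but is removed by HAVING because its total is only 70.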

What is a view in SQL? How to create a view?
A view is a virtual table based on the result set of an SQL statement. We can create it with the CREATE VIEW statement:

CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;
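
As a concrete instance of the syntax above, here is a small sketch using Python's sqlite3 module. The emp table, the sales_staff view and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("asha", "sales", 50), ("ben", "hr", 40), ("chen", "sales", 60)],
)

# The view exposes only two columns and only the sales rows; salary stays hidden.
conn.execute("""
    CREATE VIEW sales_staff AS
    SELECT name, dept
    FROM emp
    WHERE dept = 'sales'
""")
view_rows = conn.execute("SELECT name FROM sales_staff ORDER BY name").fetchall()

# A view stores no data of its own, so new base-table rows appear in it at once.
conn.execute("INSERT INTO emp VALUES ('dina', 'sales', 55)")
view_rows_after = conn.execute("SELECT name FROM sales_staff ORDER BY name").fetchall()
print(view_rows, view_rows_after)
```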

What are the uses of a view?

1. Views can represent a subset of the data contained in a table; consequently, a view can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view while being denied access to the rest of the base table.

2. Views can hide the complexity of data.

3. Depending on the SQL engine used, views can provide extra security.

What is a Trigger?
A trigger is a block of code associated with INSERT, UPDATE or DELETE operations on a table. The code is executed automatically whenever the associated operation is performed on that table. Triggers can be useful for maintaining integrity in the database.
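
The following sketch shows a trigger firing automatically, using SQLite's trigger syntax through Python's sqlite3 module. Trigger syntax differs between engines, and the account and audit_log tables are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- Runs automatically after every UPDATE of the balance column.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON account
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 100);
    UPDATE account SET balance = 70 WHERE id = 1;
""")

# Nothing called the trigger explicitly; the UPDATE alone produced the audit row.
audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
print(audit_rows)
```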

What is a stored procedure?
A stored procedure is like a function: a named set of operations that is stored in the database and compiled together. It typically bundles operations that an application uses frequently to perform common database tasks.

What is the difference between Trigger and Stored Procedure?
Unlike stored procedures, triggers cannot be called directly. They run only in response to the INSERT, UPDATE or DELETE operations they are associated with.

What is Denormalization?

Denormalization is a database optimization technique in which we add redundant data to one or more tables, typically to speed up reads by avoiding costly joins.

What is the purpose of normalization in DBMS?

Database normalization is the process of organizing the attributes of the database to reduce or eliminate data redundancy (having the same data but at different places).

What is the difference between a database schema and a database state?

The collection of information stored in a database at a particular moment in time is called database state while the overall design of the database is called the database schema.

What are the main differences between Primary key and Unique Key?

The main differences are:

The Primary key can never have a null value, while the Unique key may contain a null value.

Each table can have only one primary key, but it can have more than one unique key.

What is the use of the DROP command and what are the differences between DROP, TRUNCATE and DELETE commands?

DROP is a DDL command used to remove an existing table, database, index, or view from the database.

The major differences between the DROP, TRUNCATE and DELETE commands are:

DROP and TRUNCATE are both DDL commands: DROP removes the table itself, while TRUNCATE removes all of the rows in it.

When we use the DROP command, the table is deleted permanently, along with the privileges and indexes related to it. In many database systems this operation cannot be rolled back, so it should be used only when necessary.

With TRUNCATE, only the data stored in the table is deleted. The structure of the table is preserved, and you can re-insert data using INSERT INTO. Whether TRUNCATE can be rolled back depends on the database system: some allow it inside a transaction until the commit has been made, while others commit it implicitly.

DELETE, on the other hand, is a DML command used to delete rows from a table. It can be rolled back, but it is generally slower than TRUNCATE. Using the DELETE command with a WHERE clause, we can delete one or more specific rows from the table.
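
A minimal sketch of the DELETE and DROP behaviour, using Python's sqlite3 module. SQLite has no TRUNCATE statement, so that command is not shown, and rollback rules for DDL vary between engines. The emp table and its rows are invented for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN and ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO emp (name) VALUES (?)", [("asha",), ("ben",), ("chen",)])

conn.execute("BEGIN")
conn.execute("DELETE FROM emp WHERE name = 'ben'")  # DML: removes one specific row
count_inside_txn = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
conn.execute("ROLLBACK")  # undoes the DELETE
count_after_rollback = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

conn.execute("DROP TABLE emp")  # DDL: removes the table itself, not just its rows
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(count_inside_txn, count_after_rollback, remaining_tables)
```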

What is a functional dependency in the DBMS?

A functional dependency is a constraint that describes the relationship between attributes in a relation.

Example: If a relation ‘R1’ has 2 attributes Y and Z, then the functional dependency between them can be written as Y->Z, which states that Z is functionally dependent on Y: each value of Y is associated with exactly one value of Z.
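
The definition can be checked against sample data with a short Python sketch. The function name fd_holds and the sample rows are assumptions for illustration. Note that a functional dependency is a constraint on the schema, so sample rows can show that it is violated but cannot prove that it always holds.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # Remember the first rhs value seen for this lhs value and compare later ones to it.
        if seen.setdefault(key, value) != value:
            return False
    return True


r1 = [{"Y": 1, "Z": "a"}, {"Y": 2, "Z": "b"}, {"Y": 1, "Z": "a"}]

# Every Y value maps to a single Z value, so Y->Z is consistent with the data.
y_determines_z = fd_holds(r1, ["Y"], ["Z"])
# After adding a row, Z = 'a' appears with both Y = 1 and Y = 3, so Z->Y fails.
z_determines_y = fd_holds(r1 + [{"Y": 3, "Z": "a"}], ["Z"], ["Y"])
print(y_determines_z, z_determines_y)
```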

What is 1NF in the DBMS?

1NF is known as the First Normal Form.

This is the simplest normal form. It requires that the domain of every attribute contain only atomic (indivisible) values. Its objective is to eliminate repeating groups and multi-valued columns from the table.

What is 2NF in the DBMS?

2NF is the Second Normal Form.

A table is said to be in 2NF if it satisfies the following 2 conditions:

The table is in 1NF.

Every non-prime attribute of the table is fully functionally dependent on every candidate key, that is, no non-prime attribute depends on only a part of a composite key.

What is 3NF in the DBMS?

3NF is the Third Normal Form.

A table is said to be in 3NF if it satisfies the following 2 conditions:

The table is in 2NF.

Every non-prime attribute of the table is non-transitively dependent on every key of the table.

What is BCNF in the DBMS?

BCNF is the Boyce-Codd Normal Form, which is stricter than 3NF.

A table is said to be in BCNF if it satisfies the following 2 conditions:

The table is in 3NF.

For every non-trivial functional dependency X->Y that holds, X is a super key of the table.
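
The super key condition can be tested mechanically by computing attribute closures. The Python sketch below is one simple way to do it; the helper names closure and bcnf_violations and the sample relation R(student, course, teacher) are assumptions for illustration. The sample relation is in 3NF but not in BCNF.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)               # ignore trivial dependencies
        and closure(lhs, fds) != set(all_attrs)   # left side is not a super key
    ]


attrs = {"student", "course", "teacher"}
fds = [
    (("student", "course"), ("teacher",)),  # a student takes a course from one teacher
    (("teacher",), ("course",)),            # each teacher teaches only one course
]
violations = bcnf_violations(attrs, fds)
print(violations)
```

The dependency teacher->course is reported because teacher alone does not determine student, so teacher is not a super key.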

How is pattern matching done in SQL?

Pattern matching in SQL is done with the LIKE operator. The '%' wildcard matches zero or more characters, and the '_' wildcard matches exactly one character.

Example:

SELECT * FROM Emp WHERE name LIKE 'b%';

SELECT * FROM Emp WHERE name LIKE 'hans_';
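
The two queries can be run against sample data with Python's sqlite3 module, as in the sketch below. The names in the Emp table are invented. One engine-specific detail: SQLite's LIKE ignores case for ASCII letters by default, which other engines may not do, so the sample data is all lowercase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (name TEXT)")
conn.executemany(
    "INSERT INTO Emp VALUES (?)",
    [("bob",), ("bella",), ("hans",), ("hansa",), ("alice",)],
)

# 'b%' matches any name that starts with b, whatever follows.
starts_with_b = [
    row[0]
    for row in conn.execute("SELECT name FROM Emp WHERE name LIKE 'b%' ORDER BY name")
]
# 'hans_' needs exactly one character after hans, so 'hans' itself does not match.
hans_plus_one = [
    row[0]
    for row in conn.execute("SELECT name FROM Emp WHERE name LIKE 'hans_' ORDER BY name")
]
print(starts_with_b, hans_plus_one)
```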

What are the different types of joins in SQL?

There are 4 commonly used types of SQL Joins:

Inner Join: This returns only the rows that have matching values in both tables.

Left Join: This returns all the rows from the table on the left side of the join and only the matching rows from the table on the right side. Where there is no match, the right-side columns hold null values.

Right Join: This returns all the rows from the table on the right side of the join and only the matching rows from the table on the left side. Where there is no match, the left-side columns hold null values.

Full Join: This returns all the rows from both tables. Where a row has no match in the other table, the columns of the other table hold null values.
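
The sketch below runs an inner join and a left join with Python's sqlite3 module on two invented tables, emp and dept. Older SQLite versions do not support RIGHT JOIN or FULL JOIN, so the right join is shown in its equivalent form, a left join with the two tables swapped, and the full join is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, dept_id INTEGER);
    CREATE TABLE dept (id INTEGER, title TEXT);
    INSERT INTO emp VALUES ('asha', 1), ('ben', 2), ('chen', NULL);
    INSERT INTO dept VALUES (1, 'sales'), (2, 'hr'), (3, 'legal');
""")

# Inner join: only employees that have a matching department.
inner = conn.execute("""
    SELECT e.name, d.title FROM emp e
    INNER JOIN dept d ON e.dept_id = d.id ORDER BY e.name
""").fetchall()

# Left join: every employee; the department is NULL (None) where nothing matches.
left = conn.execute("""
    SELECT e.name, d.title FROM emp e
    LEFT JOIN dept d ON e.dept_id = d.id ORDER BY e.name
""").fetchall()

# Same rows as "emp RIGHT JOIN dept": every department, matched or not.
right = conn.execute("""
    SELECT e.name, d.title FROM dept d
    LEFT JOIN emp e ON e.dept_id = d.id ORDER BY d.id
""").fetchall()
print(inner, left, right)
```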

What is conceptual design in DBMS?

Conceptual design is the first stage in the database design process. The goal at this stage is to design a database that is independent of database software and physical details. The output of this process is a conceptual data model that describes the main data entities, attributes, relationships, and constraints of a given problem domain.

What is the main goal of RAID technology?

RAID stands for Redundant Array of Inexpensive (or sometimes “Independent”) Disks.

RAID is a method of combining several hard disk drives into one logical unit (two or more disks grouped together to appear as a single device to the host system). RAID technology was developed to address the fault-tolerance and performance limitations of conventional disk storage. It can offer fault tolerance and higher throughput levels than a single hard drive or group of independent hard drives. While arrays were once considered complex and relatively specialized storage solutions, today they are easy to use and essential for a broad spectrum of client/server applications.

Explain the concept of a database schema.

A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the entities relate to one another. A schema is designed at the time of database design and is not expected to change frequently. It includes tables, views, indexes, relationships, stored procedures, and more.

What is sharding?

Sharding is a database architecture technique used to improve the performance, scalability, and availability of databases, particularly in distributed systems. The core idea behind sharding is to divide (or "shard") a large dataset into smaller, more manageable pieces, called shards, which can be distributed across multiple database servers.

Key Concepts of Sharding:

Shards:
- A shard is a horizontal partition of data in a database. Each shard contains a subset of the data, often based on some partitioning logic like user ID, geographic location, or another data attribute.
- Each shard operates independently, handling a subset of the overall dataset.
Shard Key:
- The shard key is the attribute or combination of attributes used to determine which shard a particular piece of data belongs to.
- For example, in a user database, the user ID might be used as the shard key to distribute users across different shards.
Horizontal Partitioning:
- Sharding is a form of horizontal partitioning, where rows of a table are divided into different shards based on the shard key.
- Unlike vertical partitioning, where columns are split, horizontal partitioning focuses on splitting rows.
Distributed Architecture:
- Sharding allows the database to be distributed across multiple servers, enabling the system to handle more data and more traffic than a single server could manage.

Benefits of Sharding:

Scalability:
- Sharding allows the database to scale horizontally, meaning you can add more servers (shards) to handle increased load or data growth without requiring a complete overhaul of the system.
Improved Performance:
- Since each shard contains only a portion of the total data, a query that targets a single shard has less data to search, which reduces response times. The load is also spread across servers instead of concentrating on one.
Fault Tolerance:
- If one shard or server fails, only a portion of the data is affected, and the rest of the system can continue to operate. This improves the overall availability and reliability of the system.
Easier Management:
- Smaller datasets in individual shards are easier to manage, back up, and restore than a single monolithic database.

Challenges of Sharding:

Complexity:
- Implementing sharding adds complexity to the system, particularly in terms of query routing, data consistency, and transaction management across shards.
Rebalancing:
- As data grows or usage patterns change, you may need to rebalance the shards, which can be complex and time-consuming.
Cross-Shard Queries:
- Queries that need to access data from multiple shards can be challenging to optimize and may require additional overhead to join data across shards.
Data Consistency:
- Ensuring data consistency across shards can be more difficult, especially in distributed systems where network partitions or server failures may occur.

Example of Sharding:

Consider a social media application with millions of users. Instead of storing all user data in a single database, the application can shard the data by user ID. For instance:

Users with IDs 1-1,000,000 are stored in Shard A.
Users with IDs 1,000,001-2,000,000 are stored in Shard B.
Users with IDs 2,000,001-3,000,000 are stored in Shard C.

This allows the system to scale by adding more shards as the number of users grows, improving performance and reliability.
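
The range-based scheme in this example can be sketched as a small routing function in Python. The names SHARD_UPPER_BOUNDS, SHARD_NAMES and shard_for are hypothetical, and a real system would also need the rebalancing and cross-shard handling described earlier.

```python
import bisect

# Inclusive upper bound of each shard's user-ID range, in ascending order.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]
SHARD_NAMES = ["Shard A", "Shard B", "Shard C"]


def shard_for(user_id):
    """Return the name of the shard that stores the given user ID (the shard key)."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    # Index of the first range whose upper bound is >= user_id.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if index == len(SHARD_NAMES):
        raise ValueError("no shard covers this user ID yet")
    return SHARD_NAMES[index]


print(shard_for(42), shard_for(1_000_001), shard_for(3_000_000))
```

Adding a shard for the next million users would mean appending one bound and one name to the two lists.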

Real-World Use Cases:

MongoDB and Cassandra: NoSQL databases like MongoDB and Cassandra use sharding to handle large-scale data in distributed environments.
Twitter: Uses sharding to distribute user data and tweets across multiple servers, enabling the platform to handle billions of tweets and user interactions efficiently.

Sharding is a powerful technique for scaling large databases, but it requires careful planning and management to handle the associated complexity.

What is the difference between intension and extension?

Intension is the schema or structure of the database, defining what data can be stored and how it is organized.
Extension is the actual data or instance currently stored in the database, representing the content at a specific point in time.

In other words:

Intension = Schema: Defines the potential structure and types of data.
Extension = Instance: Represents the actual data that exists at any given moment.

The distinction between intension and extension helps in understanding the difference between the design of a database and its actual usage over time.
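
The distinction can be observed directly in a small sketch with Python's sqlite3 module: the intension is read from the table definition and the extension from the rows. The student table and the helper names intension and extension are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")


def intension():
    """Schema: the (column name, declared type) pairs of the table."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def extension():
    """Instance: the rows stored in the table right now."""
    return conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()


schema_before, state_before = intension(), extension()
conn.execute("INSERT INTO student VALUES (1, 'asha')")
schema_after, state_after = intension(), extension()

# Inserting a row changes the extension but leaves the intension untouched.
print(schema_before == schema_after, state_before, state_after)
```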

What are shared and exclusive locks?

In the context of database management systems (DBMS), shared locks and exclusive locks are mechanisms used in concurrency control to ensure that transactions are executed in a way that maintains data integrity and consistency. These locks are essential for managing how multiple transactions can access the same data concurrently.

1. Shared Lock (S-Lock)

A shared lock allows multiple transactions to read a particular data item simultaneously, but it prevents any transaction from modifying the data item.

Purpose: Shared locks are used when a transaction only needs to read data and does not intend to modify it.
Concurrent Access: Multiple transactions can hold a shared lock on the same data item at the same time, enabling concurrent read operations.
Restrictions: While a shared lock is held on a data item, no transaction can acquire an exclusive lock on that data item, thus preventing any write operations.
Example:
- Suppose two transactions, T1 and T2, both want to read a particular record R. Both transactions can acquire a shared lock on R simultaneously, allowing them to read the data without interfering with each other.
- However, if another transaction, T3, wants to write to record R, it must wait until both T1 and T2 release their shared locks.

2. Exclusive Lock (X-Lock)

An exclusive lock is a lock that a transaction acquires when it intends to both read and modify a particular data item.

Purpose: Exclusive locks are used when a transaction needs to write (or update/delete) data.
Exclusive Access: Only one transaction can hold an exclusive lock on a particular data item at any time, ensuring that no other transaction can read or write to that data item while it is locked.
Restrictions: While an exclusive lock is held on a data item, no other transaction can acquire either a shared or an exclusive lock on that data item.
Example:
- If a transaction T4 wants to update record R, it must acquire an exclusive lock on R. While T4 holds this lock, no other transaction can read or modify R until T4 releases the exclusive lock.

Key Differences Between Shared Lock and Exclusive Lock:

Purpose:
- Shared Lock: Allows concurrent reads but no writes.
- Exclusive Lock: Allows a single transaction to read and write the data, preventing any other transaction from accessing the data.
Concurrency:
- Shared Lock: Multiple transactions can hold shared locks on the same data item concurrently.
- Exclusive Lock: Only one transaction can hold an exclusive lock on a data item at a time.
Interaction:
- A transaction holding a shared lock on a data item can coexist with other shared locks on that same item.
- A transaction holding an exclusive lock on a data item cannot coexist with any other locks on that item (neither shared nor exclusive).
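
The compatibility rules above can be sketched as a tiny lock table in Python. This is a simplified illustration, not how any particular DBMS implements locking: the class name LockTable is hypothetical, a refused request returns False instead of waiting, and lock upgrades and re-entrant requests are not handled.

```python
class LockTable:
    """Tracks which transactions hold which lock mode on each data item."""

    def __init__(self):
        self._locks = {}  # item -> (mode, set of transaction ids)

    def acquire(self, txn, item, mode):
        """Try to grant mode 'S' or 'X' on item; return True if granted."""
        held = self._locks.get(item)
        if held is None:
            self._locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # shared locks can coexist
            return True
        return False  # any combination involving an exclusive lock conflicts

    def release(self, txn, item):
        """Release txn's lock on item, freeing the item when no holder is left."""
        held = self._locks.get(item)
        if held is None:
            return
        _, holders = held
        holders.discard(txn)
        if not holders:
            del self._locks[item]


locks = LockTable()
t1_read = locks.acquire("T1", "R", "S")   # granted
t2_read = locks.acquire("T2", "R", "S")   # granted alongside T1
t3_write = locks.acquire("T3", "R", "X")  # refused while readers hold R
locks.release("T1", "R")
locks.release("T2", "R")
t3_retry = locks.acquire("T3", "R", "X")  # granted once both readers are gone
t1_blocked = locks.acquire("T1", "R", "S")  # refused while T3 holds the X-lock
print(t1_read, t2_read, t3_write, t3_retry, t1_blocked)
```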

Use in Two-Phase Locking (2PL):

In the Two-Phase Locking (2PL) protocol, transactions acquire locks in two phases:

Growing Phase: The transaction may acquire as many locks (shared or exclusive) as it needs, but it may not release any.
Shrinking Phase: Once the transaction releases its first lock, it may release further locks but cannot acquire any new ones.
During the growing phase, the transaction acquires a shared lock for each item it only reads and an exclusive lock for each item it intends to modify.
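
The two-phase rule itself can be sketched in a few lines of Python. The class TwoPhaseTransaction is a hypothetical illustration that only tracks the phase of one transaction; it does not check conflicts with other transactions or distinguish shared from exclusive locks.

```python
class TwoPhaseTransaction:
    """Enforces the 2PL rule: no lock may be acquired after the first release."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False  # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after an unlock")
        self.held.add(item)

    def unlock(self, item):
        self.held.remove(item)
        self.shrinking = True  # the growing phase is over for good


txn = TwoPhaseTransaction("T1")
txn.lock("A")      # growing phase
txn.lock("B")      # still growing
txn.unlock("A")    # shrinking phase begins
try:
    txn.lock("C")  # not allowed under 2PL
    violated = False
except RuntimeError:
    violated = True
print(violated, txn.held)
```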

Summary:

Shared Lock: Allows multiple transactions to read the same data concurrently without allowing any of them to write to it.
Exclusive Lock: Ensures that only one transaction can read or modify a data item, preventing other transactions from accessing the data until the lock is released.

These locking mechanisms are crucial for maintaining data consistency and ensuring that transactions do not interfere with each other in a way that could lead to data anomalies.