Database Servers

----------------

Databases primarily store and retrieve data. There are lots of different databases available to choose from. Most databases have a lot in common and, at the same, time they also have a lot of differences. Here is a cool link describing many linux SQL databases and tools.

Here are some common database terms:

  • database - a collection of tables, views, triggers, and store procedures.
  • table - this is a collection of data such as a addresses, customers, or details on a piece of inventory. A table has one or more columns that describes the data that is stored in the table. A table also contains one or more records, rows, or tuples (the actual data stored in the table). A record or row is a collection of columns for a single address, customer, or piece of inventory. For example: Addresses is a table, it contains columns for the customer id, street address, city, state, zip code, and phone number. The Addresses table would contain many addresses like the following three rows:

      CustomerIDStreet AddressCityStateZipPhone
      1323 First StDenverCO80200555-1212
      2423 Second StDenverCO80201555-2323
      3523 Third StDenverCO80202555-3434
  • column - this describes is a single piece of information in a table such a name, phone number, email address, etc.
  • record or row - this is the actual data stored in the table. This is a collection of columns for a single item (or, in our example above, an address for each of three customers 1, 2, and 3 are all rows) in the table. For example, the addresses table would contain individual rows for Seva Software and any other company you do business with. Each address contains basically the same information (the same columns) for each company such as the customerid, street address, city, state, zip, and phone number but the information will be distinct for each company.
  • tuple - basically this refers to a single row or record in a table.
  • views - this a logically grouping of data (columns) from one or more tables. It looks and acts just like a table but does not contain any actual data, it pulls data from tables. A view can retrieve a sub-set of columns and/or rows from a single table and can pull data and rows from multiple tables.
  • updatable views - this allows you actually update data in table by using a view (which does not actually store any data).
  • query - a query is the same thing as a select statement. A query is very similar to a view in that it selects data from one or more tables but it can't be used in place of a view or table. In other words, views can be referenced in other views and queries but queries can't be used by other views.
  • SQL - the Structured Query Language is the primary language for talking to databases. See sevasoftware.com/languages.html#SQL for more details.
  • select - this retrieves and formats data from one or more tables or views.
  • insert - this inserts a new row into a single table.
  • update - this updates one or more rows in a table. Some databases will allow you update multiple rows in the same query.
  • delete - this deletes one more rows from a table.
  • row locking - when inserting, updating, and deleting rows in a table with more than one user connected to the database, some form of locking needs to take place to keep two users from attempting to update the same record at the same time. Without row locking, one user would loose their changes. There are several type of locks:
    • database level locks - the whole database is locked putting the database in single user mode. Some databases offer much better performance in this mode because other locking types can be ignored.
    • table level locks - this allows you lock an entire table. This is not very useful because it blocks all other users from updating (and sometimes reading) rows in the table while you own the lock.
    • page level locks - this occurs when an insert, update, or delete is occurring on a record that is stored in a page with other records. Sometimes the database server will lock the whole page rather than an individual row. This locking mechanism is typically more efficient for most databases than row level locking but it locks more rows with each update, insert, and delete..
    • row level locks - this is most useful and most costly type of lock. It locks the one row you are trying update, insert, or delete. Other users can still select, insert, update, and delete in the table as long they don't try to update the same record you are updating. This is costly because it is typically expensive for the system to track which individual rows are being updated. This is the best locking scheme when many users are trying to update the same table at the same time.
  • concurrent access - when more than user is accessing the database or a table at the same time.
  • parent/child relationships - this describes two tables that share one or more columns in common with each other. This is best described with an example using two tables:
    customers table: 
      CustomerIDCustomer NameType of Customer
      1Seva SoftwareConsulting company
      2Joes plumbingPlumbing
      3Johns ElectricElectrical
    addresses table with potentially more than one address for each company:
      CustomerIDCityStateZipPhone
      1DenverCO80200555-1212
      1AuroraCO80201555-2323
      2AuroraCO80201555-3454
    In this situation the customer is the parent table and addresses is the child table. The parent is typically the table that best describes the data used as the key shared between the parent and child tables. This child is typically extra data being tracked about the parent record.
  • foreign keys - this allows you define a relationship between a parent and child tables. Basically, it automatically installs triggers to automatically update, delete, insert the child table when rows are inserted, updated, and delete in the parent table.
  • inner joins - this involves two tables and retrieves rows only when a matching row existing in both tables. For example: select * from customers inner join addresses on customers.customerid = address.customerid; retrieves all rows in customers with a matching address. Customer 1 is retrieved twice, once for each address, customer 2 is retrieved once, and customer 3 is ignored because it has no address.
  • left outer joins - this also involves two tables and retrieves every row from table one any matching rows from table two. For example: select * from customers left outer join addresses on customers.customerid = address.customerid; retrieves all rows in customers and their matching address. Customer 1 is retrieved twice, once for each address, customer 2 is retrieved once, and customer 3 is returned with empty address data.
  • right outer joins - this also involves two tables and retrieves every row from table two any matching rows from table one. For example: select * from customers right outer join addresses on customers.customerid = address.customerid; retrieves all rows in customers and their matching address. Customer 1 is retrieved twice, once for each address, customer 2 is retrieved once, and customer 3 is not returned because it does not exit in addresses.
  • triggers - triggers are attached to tables and are used for managing the integrity of the data. Triggers can be used to verify that the data being inserted or updated is accurate and complete. Triggers can be used to update other pieces of information based on certain changes. There are 6 kinds of triggers:
    • before insert - this is called just before an insert occurs. This is useful for verifying that the data in the update is complete and accurate. Many databases will let you raise an error to reject the insert.
    • after insert - this is called just after an insert occurs. This is useful for updating any other data in the database. At this point the database has accepted the new record.
    • before update - this is called just before an update occurs. This is useful for verifying that the data in the update is complete and accurate. This type of trigger typically has access to the data currently in the table (the old row) and the newly updated data (the new row). Many databases will let you raise an error to reject the update.
    • after update - this is called just after an update occurs. This is useful for updating any other data in the database. At this point the database has accepted the update. This type of trigger typically has access to the data currently in the table (the old row) and the newly updated data (the new row).
    • before delete - this is called just before a delete occurs. This is useful for verifying that it is okay to delete the record. Many databases will let you raise an error to reject the delete.
    • after delete - this is called just after a delete occurs. This is useful for updating any other data in the database.
  • stored procedures - many database systems have processes used to track the integrity of the data and to periodically update the content of data. You can write these processes or procedures in external programs or save and execute them as store procedures in the database. If you put the code in an external process and two external processes need to call the same procedure, you may need to write the procedure twice, once for each external process. If you store the procedure in the database, each eternal process only need to execute the store procedure. Stored procedures are useful for keeping all your data related processes grouped together in the database. Often times stored procedures will operate on the database faster then most external processes.
  • on-line backups - this allows you safely dump the entire database into external files while the database is active and in use for backup up the data in your database. Taking the database off-line for backups may not be practical for all situations.
  • transaction - a transaction is a group of insert, update, and delete statements applied to one or more tables. Often times, we need all of the transactions in the group to complete as a whole. If any part of a transaction fails then we want to be able to rollback or undo all of the transaction. A half completed transaction usually mean half updated data and this is very undesirable. As a result, we begin a transaction, process inserts, updates, and deletes, and, if all goes well, we commit the transaction to the database. The commit applies all the data changes to the database. If anything goes wrong and/or one of the insert, updates, or deletes fails, we can rollback the transaction. After a rollback, the database will not have any record of any of the inserts, updates, or delete we just tried to do. This gives us a change to correct the failure and try the transaction again until it completes successfully.
  • transaction logging - this allows you to track every insert, update, and delete to the system and determine who applied the insert, update, and delete. Transaction logs are valuable for restoring and recovering from database failures. Transaction logs can also be used to update other copies of the database where real time updates are not needed.
  • replication - typically, replication applies inserts, updates, and deletes to more than one database server in real time. This is valuable if you need two active copies of a database. Two active copies of a database is valuable when one server is not fast enough to meet the needs of your users or you need a real time back up or your database. Most database solutions are adequately covered with regular backups and proper transaction logs. Some more intensive database solutions could benefit from have two (or more) complete copies of the database.

Here are some great questions to ask when researching if a database is right for you:

  • How many records or rows can a table hold?
  • Do I need stored procedures and are they supported? What languages can be used for create stored procedures (PL/SQL, Perl, Ruby, C, etc)?
  • Do I need triggers are they supported?
  • How much database tuning is needed to keep the database healthy? Sometimes, this can depend on how much data and how many inserts, updates, and deletes are done.
  • How is file space or disk space to a table?
  • What platforms (operating systems) do I need to run the database server on?
  • How many users will be connected at the same time?
  • Do I need a replicated database?
  • Would you like some help determining the right database for you? Please take a look at the Outsourcing, Consulting, and Database Design sections of the services web page to see how I can help.

----------------

ArunaDB

ArunaDB is a new database server written in Ruby. It is an interesting blend of a relational database and an objected oriented database attempting to take full advantage of the power of the Ruby programming language. It supports views, triggers, stored procedures, online backups and many other features. It is free and open source. It does NOT yet support ANSI SQL. It is a fairly new product and may lack some of the features available in some of the other products listed here.

----------------

PostgreSQL

PostgreSQL is full a featured database server. It runs on Windows NT, linux, BSD, many other flavors of unix. It provides triggers, stored procedure, updatable views, ANSI SQL support with an ODBC driver similar to Oracle and SQL Server. The greatest thing about PostgreSQL is that it is free and open source. It is a solid choice for small to medium sized companies even though it may not be as mature as and is not as widely used as Oracle or SQL Server. It's an especially good choice for those on a tight budget who need a feature rich database server.

I created a utility that allows you export a Microsoft Access 97 database to PostgreSQL, please visit sevasoftware.com/access for details.

I also created a web page describing how I got a PostgreSQL server up and running on Windows NT please visit sevasoftware.com/PostgreSQL_running_on_NT.html for details.

Here is a nice link offering a lot of information for PostgreSQL: http://home.wanadoo.nl/techlinq/postgresql.html

----------------

Oracle

Oracle is the most popular choice for storing large amounts of information. It is one of the more mature databases available and is used by a lot of very large organizations. Oracle is also one of the more expensive solutions. Oracle runs on many platforms including Microsoft Windows, linux, and many other larger and faster computers (when compared to Intel PCs) such as Hewlett Packard hardware. Oracle makes it fairly easy to migrate from one operating system to another. Basically, this means that, Oracle allows you start out on inexpensive Intel PC's and easily upgrade to larger faster hardware when you need more power. Oracle Forms, Power Builder, and some times Visual Basic are popular front ends (user interfaces) for Oracle.

----------------

Microsoft SQL Server

SQL Server is primarily a back end allowing you to store and ret rive lots of data. Visual Basic and Power Builder are typically used to create forms and reports allowing users to access the data in SQL Server. SQL is expensive, although, it is cheaper than Oracle. It runs only on Microsoft Windows. It is a popular choice among organizations that are primarily Microsoft Window's based and a good choice for small to medium sized businesses.

----------------

Microsoft Access

Microsoft Access provides both a back end for storing and retrieving data and a nice GUI front end for accessing your data. It has a very good GUI front end and can process complicated SQL quickly. It is easy to create and store information and just as easy to create forms and reports allowing to access the information. It's great for small database solutions (1-20 users). It lacks views and transactions and it can be a little slow with lots of concurrent users, you can upgrade the back end to Microsoft's SQL Server to eliminate these limitations. Access get's more expensive with each new version and each end user must have Access installed on their computer. It runs only on Microsoft Windows.

I created a utility that allows you export a Microsoft Access 97 database to PostgreSQL, please visit sevasoftware.com/access for details.

----------------

MySQL

MySQL provides both a back end for storing and retrieving data and a GUI front end for accessing your data. MySQL is open source, inexpensive, and it's been around a while. It runs on lots of different operating systems. It's very popular for storing data for web sites. Until recently, MySQL was lacking some important features such transactions and triggers.