×
Databases Exercises

Databases

Standard and Higher Level

A database is a systematically organised or structured repository of indexed information (usually as a group of linked data files) that allows easy retrieval, updating, analysis, and output of data. This data could be in the form of graphics, reports, scripts, tables, text, etc., representing almost every kind of information. Most computer applications (including antivirus software, spreadsheets, word-processors) have databases at their core.

Examples of databases

Common uses of databases include the following:
  • Business Databases - holding information on products, customers, suppliers and more. By analysing the data collected through the business and stored in the database, businesses can customise and refine their business processes accordingly.
  • Transport Databases - Flights, trains, buses, traffic data. Accurate real-time data can be used to help people make bookings online all over the world.
  • Government Databases - Holding data on the population of a country, such as tax details, medical history, educational records, and criminal data. Some suggest that governments know too much about the population and their right to privacy is being challenged.
  • Online Databases - CMS (Content Management Systems) form the backbone of many websites such as blogs and news sites. Everytime there is new content, it is added to the database, and the website renders that new content without any additional work to the site being required.
  • Search Engines - Search engines like Google and Bing store records of websites and use technology called 'spiders' to 'crawl' them and check for relevant keywords and content to help index the search results more efficiently. They also keep track of what searches users have made over time and tailor the search results for them as well. Social media is essentially a big database of content, storing information on users, likes, comments, preferences, videos viewed etc...
  • Database Software

    Database Management Systems (DBMS) are used to create and edit Databases. It is also known as Relational Database Management Systems (RDBMS) when it can create relationships between data stored.

    Database Structure

    When it comes to organising and storing data in a database, there are several components:

  • Fields - An individual piece of data about one particular entity. For example, the phone number of a customer.
  • Records - A collection of fields that all relate to one thing. For example, the full details of a customer (Name, Address, Contact Number, Email etc...)
  • Tables - A collection of records that are all related to the the same thing. For example, a collection of records about customers, would be in the Customers Table.
  • Every table in a database should have a field called the PRIMARY KEY. A primary key acts as a unique identifier for the record. For example, there might be 30 copies of the same book in a library with the same ISBN, so in a database there would be a need for a field called 'Book_ID' to uniquely identify each individual copy of the book. That way, it is clear which of the books is loaned out or in stock at any one time.

    The same field in a related table to help create relationship is called a FOREIGN KEY, but we'll cover that later.

    Each field in a table must be allocated a data type, as the image above shows, in that example, AutoNumber, Number and Text have been used to define what type of data the field will hold. In Microsoft Access there are certain possibilities that can be assigned, including those shown, but in Filemaker Pro the options are slightly different. The consistent options however are Text, Number, and Date. (Pro tip - fields that contain telephone numbers should be stored as text, because there are no mathematical calculations carried out with a phone number.)

    Flat File Databases

    A flat file database is a database stored as an ordinary unstructured file called a "flat file". It has no structure for indexing and there are usually no structural relationships between the records. Essentially, all of the data is held in one 'table'. Often stored as a plain text file with one record per line.
    The main issue with flat file databases is that you end up with a lot of redundant data, which is data that is unnecessarily repeated. For example, if we stored information on books, we would have to write the author's name every single time one copy of the book was added to the database; It's the same information over and over.
    Redundant data causes the following issues:

  • Data integrity is compromised. If data is entered incorrectly in one record, it would likely be copied for all new records.
  • Another data integrity issue is if an author's name needed altering, the user would have to go through and manually change every single instance of their name in all records. This would be time consuming, and most likely would lead to errors of some records being missed.
  • With huge amounts of unnecessarily repeated data, the storage space needed on a disk would increase by a lot.
  • Relational Databases and Normalisation

    To avoid the issues associated with flat file databases, one can create a a relational database where tables are created about entities, then joined together via key fields. The process of removing all redundant data in a database is called Normalisation.
    The image below shows a separation of entities into tables and then the relationships between the key fields and related field in the other table.
    It shows ONE author can have MANY books, ONE book can be loaned MANY times, and ONE student can have MANY loans. As you might have guessed, these are all one-to-many relationships.
    In the example shown, the Primary Key fields are AuthorID, BookID, LoanID, and StudentID (the key icon is shown next to them). In the tables where they are linked and the field is repeated in order to create the relationships, this field is called the Foreign Key

    The main point of relationships is to eradicate redundant data and by organising the database structure in this way it means for example that if one were to update the Author's name in the Authors table, then that data would automatically be update in all other places that it appeared in the database (in queries, or reports etc...)

    2 key points to remember abour relationships - 1) They link two fields with the same name and data type. 2) They occur between a primary key and a non-primary key.

    Queries

    A query is a request for data or information from a database table or combination of tables. For example, for the database in the relationships example above, a query could be:

    • All students who currently have a book loaned out.
    • Which authors have the most books loaned out?
    • Which book price range is the most popular?
    • Which books do students who are studying courses longer than 3 years loan out?
    • All books by authors who have names beginning with R.
    Queries allow users to extract some meaning from data in a database. Remember, data is just a raw stream of facts; Information is that data processed into something meaningful and useful. They can use simple statements such as the ones shown above or use Boolean operators (e.g. AND, OR and NOT) to combine query requests.
    Parameter queries work in the same way as a regular query but allows the search criteria to be defined as the search takes place. In other words, typing the required text into a search box when the query is run, like typing in a user's surname to find all records related to them.

    Structured Query Language

    SQL (Structured Query Language) is a standardized programming language used for managing relational databases and performing various operations on the data in them. Initially created in the 1970s, SQL is regularly used by database administrators, as well as by developers writing data integration scripts and data analysts looking to set up and run analytical queries.
    Here are some examples of SQL:

    • SELECT * FROM Books WHERE Price = '9.99' - This would return all books from the database whose price is 9.99.
    • SELECT * FROM Students WHERE NOT Course studied = 'Science' - This would return all students who do not study science.
    • INSERT INTO Books VALUES '978-3-16-148410-0', '3', '1/6/2014', '14.99' - This would add a new record to the books table with the values shown in order (the BookID field would be generated automatically).

    Forms

    Forms are used for entering data into a database via a graphical user interface. Users don't need to see the underlying data structures, they can just use a customised well designed easy to use form, with text fields, buttons for navigation and performing operations on records such as saving, or deleting.
    As well as entering data, forms are a good way to display one record at a time on a screen.
    Forms can be customised with images and colour schemes to fit with a company theme.
    Another feature of forms is to add validation and verification to each field. To ensure that the data going into the database is in an acceptable format (e.g. phone numbers no longer than 8 digits.)

    Reports

    A way to present data stored in a database that looks professional is in the format of a report. Much the same as forms, you can customised the look and feel with fonts, colours, images etc.. Reports can combine data from various related tables and can be presented in a way that looks better on screen, or in a format that is suitable for printing. They can also include consolidated fields such as totals, averages etc...

    Data Integrity

    For a database to be useful and effective, the main component, data, should be accurate, relevant and up-to-date. Confidence in databases will determine how successful they are as if they are inaccurate, people will not want to use them. The phrase Garbage In Garbage Out applies to computing in many fields, but especially with databases. If data is not entered correctly, then queries and reports will also be inaccurate.
    Data in a database might be out of date, entered wrongly, corrupted, changed accidentally, changed on purpose, incomplete or even missing completely. All of which would result in a database being useless.

    Validation

    Validation is ensuring that content in a database is entered in an acceptable correct format. There are numerous checks that can take place to validate data:

    • Range check - Checking that data falls between a certain range. E.g. between 0 and 100 for a percentage field.
    • Length check - Checking that data is a certain length. E.g. Credit card number should be 16 digits.
    • Presence check - making sure that data has been entered.
    • Consistency check - comparing fields to ensure that they don't contradict each other. E.g. checking that a graduation date happens after an enrollment date.
    • Check digit - A check using mathematics for numeric fields. E.g. The final digit of a Universal Product Code (bar code) is a check digit.
    • Value list - Only allow a preset number of possible values. E.g. Birth month for a DOB field.

    Verification

    Verification is ensuring that content in a database is actually the correct data. One method of ensuring this, is by double data entry (like when choosing a password to open an online account). Another method is to visually compare data entered to the manual documents that the data comes from. A final method is speaking directly to the data subject (the person who the data is about) to check the data is accurate.

    Privacy

    Databases can prove very useful to businesses looking to improve their services by understanding how they are performing, how customers feel and what decisions they should make in future regarding the direction of the business. With the volume of data collected every day, there have been many concerns raised about the use of data, how it is collected, and whether subjects are aware it's being collected and how it is being used.
    Data matching is the building up of a profile of a user by comparing data from various different databases. Governments often use data matching to work out who is committing fraud or cheating government systems. For example, someone claiming to need government assistance as they claim they can't work because of medical reasons could be found out if their medical records have no history of such medical ailments.
    Data mining is the process of identifying trends and patterns in data through analysing millions of records quickly; Something that would take a human being a long time to do. Businesses such as Amazon use data mining to help recommend products for customers by analysing their browsing history on the site as well as past purchases combined with numerous other factors such as geographic location or age. Data mining is used in health to analyse which areas of a country are most likely to need certain medicines and treatments. Data is analysed to identify trends in illnesses and what time of year they occur so the logistics and distribution of the drugs can be the most efficient.

    Data protection looks at protecting data privacy and includes the following ideas: Consent - people should give permission (or not) companies or institutions to store data on them. Most companies have a long terms and conditions that a user must tick before using the service. Purpose - the purpose of the data collection should be clear, and the data collected should not be used for any purpose other than that stated. Distribution - this looks at the whether your data can be passed on to third parties or not. E.g. a hospital selling medical records to a drugs company for research. Rights of access - People should have access to data stored on them and the right to edit the data if it is inaccurate. Length of storage - it should be clear when data is collected, for how long it is intended to be held, and deleted after that time. Security - The measures in place to ensure that data held on an individual is not open to unauthorised access.

    Security

    Preventing unauthorised access in an important element of storing data. Databases are constantly targeted by hackers trying to gain unauthorised access to data. Even the theft of the actual physical servers has been done in the past.
    Encryption is one method of helping to prevent the interception of data when being sent over a network. Database auditing also helps in this respect as any activity on the database such as adding, editing or deleting records can be stored in a log. This can help to discover unauthorised access with potentially malicious intent.

    Other Resources

    Paper 1

    Slightly different examinations for SL and HL, based on answering 20 mark questions on a selection of the course topics.

    Paper 3

    For HL students only, answering questions on the pre-seen case study. This year is 'On the road to driverless cars'.

    Extended Essay

    Details and recommendations for students considering writing their Extended Essay in ITGS.

    Command Terms

    How to approach examination questions that use each of the prescribed command terms for ITGS.