CPSC 310: Database Systems /
CPSC 603: Database Systems and Applications
Fall 2007
Project: Personal Database Application (PDA)
Source: Arthur Keller, CMPS 180, University of California Santa Cruz,
Winter 2002, http://www.cse.ucsc.edu/classes/cmps180/Winter02/.
Adapted for TAMU CS Department.
[Overview]
[Logistics]
[Part 1]
[Part 2]
[Part 3]
[Part 4]
[Part 5]
Your CPSC 310/603 programming project will be to build a substantial
database application for a real-world domain of your choosing. You
will design a schema on paper. Then, you will create an actual
database using MySQL. You will populate the database, write
interactive queries and modifications on the database, and
develop programs that manipulate the database.
Part 1: Choose a project and design an E/R diagram. Due Tue, 9/18/07.
Part 2: Relational design. Due Tue, 10/02/07.
Part 3: Create database in MySQL. Due Tue, 10/23/07. (*new due date*)
Part 4: Queries, updates, and indexes. Due Thu, 11/08/07.
(*even newer due date*)
Part 5: Embedded SQL. Due Tue, 12/4/07. (*newest due date*)
The project is worth 35% of your course grade, broken down like this:
Parts 1, 2 and 3 are worth 5% each. Parts 4 and 5 are worth 10% each.
Back to beginning
- Every student will complete this project individually.
- Turning in project work:
- All parts of the project will be turned in electronically using
the turnin program in the CS Department intranet (CSNet).
Instructions are
here.
No hand-written or hand-drawn material will be accepted.
Every submission must be turned in with this cover sheet, which should be signed and turned in by hand in class.
- ONLY use the file names given below for each part.
- For each part of the project, bundle all the files into one
archive (.tar or .zip file).
- Submission file format: (1) for diagrams, TA can only open .pdf, .doc, .JPG or .ppt files; (2) .SQL (SQL script files) and plain text files (i.e. .txt files;) will be preferred for all other submissions. Please don't use other file formats.
- MySQL:
- We will use the DBMS MySQL for the project. You should start
getting familiar with MySQL via their
web page.
A good place to start is the
tutorial.
- An account for using MySQL on the CS department machines has
already been made for you. Your account name is your cs user name.
The initial password is your UIN. (Note: if you already had a
MySQL account, your password has also been reset, but your data
should be untouched.) Instructions for changing your MySQL password
are here.
If you have problems with your MySQL account, please contact
the CS Helpdesk.
Back to beginning
Due: Tuesday, Sep 18, 2007, 12:45 PM.
What to turn in:
- (1 pt) Written description of database application (File Name: description.txt)
- (4 pts) E/R diagram (File Name: ERdiagram.doc or ERdiagram.ppt or ERdiagram.pdf)
- NOTE: In order for students to understand database better after this project,
the database schema should be complex enough. The course project should contain at least FIVE
nontrivial entity sets and each entity set should have at least THREE nontrivial attributes.
Your first step is to identify the domain you would like to manage
with your database, and to construct an Entity-Relationship diagram
for the database. We suggest that you pick an application that you
will enjoy working with, since you'll be stuck with it for the whole
semester! It's especially nice if you pick an application where you
can populate your database using real, as opposed to fabricated,
data. As the project progresses, you'll end up creating two actual
databases, a small one (10's of entities/relationships) and a large
one (1000's or 10,000's of entities/relationships). Many students find
that it only makes sense for their small database to contain realistic
data, while the large one is made up of synthetic (computer-generated
fake) data. If you have an application where you can get a large
amount of real data, all the better, but it's not necessary.
Try to pick an application with a schema that is relatively
substantial, but not too enormous. For example, your E/R design should
have in the range of six or so entity sets, and a similar number of
relationship sets. This is a ballpark figure only - shooting for
somewhere between 5 and 10 is fine - you'll sense if your design is
too simple or too complex. You should certainly include different
kinds of relationships (e.g., many-one, many-many) and different kinds
of data (strings, integers, etc.), but your application need not
necessarily require advanced features such as weak entity sets, "is-a"
relationships, or roles.
- Write a short description (no more than one page) of the database application you propose to work with throughout the course. Your description should be brief, relatively complete and clear. If
there are any unique or particularly difficult aspects of your
proposed application, please point them out. This part is worth 1 point and your description will be
graded on suitability and conciseness.
- Specify an E/R diagram for your proposed database. Don't forget to
underline key attributes for entity sets and include arrowheads
indicating the multiplicity of relationship sets. If there are weak
entity sets or "is-a" relationships, make sure to notate them
appropriately. This part is worth 4 points. Points will be subtracted for
poor or wrong design, wrong shapes or arrows, missing or wrong key attributes, wrong relationships, not complex enough, etc.
If you're having trouble thinking of an application, take a look at
any Web shopping site. They all have a similar theme: products,
customers, orders, shopping baskets, etc., and typically make for an
interesting and appropriately sized application. If you're still
having trouble, or if you're unsure whether your proposed application
is appropriate, please feel free to consult with the TA. In
fact, to alleviate problems, we are encouraging all students to visit
office hours to consult with the TA on your chosen
application and E/R design.
Coming up with a good design now will
pay off greatly as the project progresses!!!
MAKE SURE TO SAVE A COPY OF YOUR E/R DIAGRAM - YOU WILL NEED
IT FOR SUBSEQUENT PROJECT PARTS.
Back to beginning
Due: Tuesday, Oct. 2, 2007, 12:45 PM.
What to turn in:
- (1 pt) E/R diagram for your database (File Name: ERdiagram.doc or ERdiagram.ppt or ERdiagram.pdf)
NOTE: You new E/R diagram MUST incorporate any feedback
you received from the TA or Dr. Welch.
Otherwise, 0 points will be given for this part.
- (4 pts) Schemas for all relations in your database, with keys capitalized
(File Name: schema.txt)
- if you changed your E/R diagram, explain which parts in the E/R diagram
are changed and the reasons
- for each relation, list the completely nontrivial functional
dependencies
In this second part of the project, you will produce a relational
schema from the entity- relationship diagram you came up with in
Part 1.
- Please attach a copy of your E/R diagram from Part 1. If you
would like to make changes to your original E/R diagram at this point
(due to staff feedback or any other reason), you may do so. The new
E/R design will be used as a basis for grading
part 2.
- Using the method for translating an E/R diagram to relations,
produce a set of relations for your database design. As usual, please
be sure to capitalize key attributes in your relations.
- For each relation in your schema, specify a set of completely
nontrivial functional dependencies for the relation. Any functional
dependencies that actually hold in the real- world scenario that
you're modeling should be specified, or should follow from the
specified dependencies. Don't worry if you find that some of your
relations have no nontrivial functional dependencies.
- Is each relation in your schema in Boyce-Codd Normal Form (BCNF)
with respect to the functional dependencies you specified? If not,
decompose the relation into smaller relations so that each relation is
in BCNF. Be sure to capitalize key attributes in your new
relations. Don't worry if you don't have any BCNF violations - many
PDAs will not have any.
- Are there any nontrivial multivalued dependencies that hold on
any of the relations in your schema? (You needn't consider MVD's that
are also functional dependencies.) If so, specify the multivalued
dependencies, then decompose the relations into smaller ones so that
each one is in Fourth Normal Form (4NF). Be sure to underline key
attributes in your new relations. Don't worry if you don't have any
4NF violations - most PDAs will not have any.
- Now that you've decomposed your relations as far as possible, are
there any relations that could be combined without introducing
redundancy (i.e., without creating BCNF or 4NF violations)? If so,
combine them.
- Is there anything you still don't like about the schema (e.g.,
attribute names, relation structure, etc.)? If so, modify the
relational schema to something you prefer. You will be working with
this schema quite a bit, so it's worth spending some time now to make
sure you're happy with it.
ONCE AGAIN, SAVE A COPY OF YOUR FINAL RELATIONAL SCHEMA (ITEM 7 ABOVE),
AS IT WILL BE NEEDED FOR SUBSEQUENT PROJECT PARTS.
Back to beginning
Due: Tuesday, Oct. 23, 2007, 12:45 PM. (*new due date*)
What to turn in:
- (2 pts) File 1: mySQLcommands.sql
- SQL commands for creating each table in you PDA
- SQL commands for inserting 5 records into each table
- SQL commands for showing the content of each table (SELECT * FROM table_name)
- (1 pt) File 2: mylog.txt
- In this file, collect all screen output results while the above SQL commands are executed such that we can
see how your all SQL commands were executed
- (2 pts) File 2: myBigDB.txt
- In this file, explain how you obtained the data for your
big database. If you got it from somewhere, give your source.
If you created synthetic data, include the source code of your
program. The point is to convince us that you have actually
done item 4 below.
In this part of the project, you will create a relational schema for
your PDA in the MySQL database system, and you will populate the
tables in your database with initial data sets.
- Familiarize yourself with the MySQL relational DBMS by reading the
on-line documentation (referenced above under logistics),
logging into MySQL, trying some of the examples in the document, and
experimenting with the various commands. You don't need to turn
anything in for this part.
- Create relations for your PDA based on your final relational
schema from Part 2. Use the
CREATE TABLE command to specify each
relation, its attributes and attribute types.
If you have an attribute that
represents a date and/or time, you may want to look at
this page.
-
Turn in a script log
showing a MySQL session in which your relations are created
successfully, as described next.
For each relation in your PDA, create an execution script file
and a few (approximately 5-10) records of
"realistic" data. Then execute the script file from the mysql command
line using the \T option. Turn in a listing showing the contents of
the files you created, the successful loading of the data into MySQL,
and the execution of "SELECT *" commands to show the contents of each
relation.
- Write a program in any programming language you like that creates
large files of records for each of your PDA relations. If you have
available real data for your PDA, then your program will need to
transform the data into files of records conforming to your PDA schema
and to MySQL's load format. The rest of you will need to write a
program to fabricate data: your program will generate either random or
nonrandom (e.g., sequential) records conforming to your schema. Note
that it is both fine and expected for your data values, strings
especially, to be meaningless gibberish. The point of generating
large amounts of data is so that you can experiment with a database of
realistic size, rather than the small "toy" databases often used in
classes. The data you generate and load should be on the order of:
- At least two relations with thousands of tuples
- At least one relation with hundreds of tuples
If your application naturally includes relations that are expected to
be relatively small (e.g., schools within a university), then it is
fine to use some small relations, but please ensure that you have
relations of the sizes prescribed above as well. When writing a
program to fabricate data, there are two important points to keep in
mind:
- Make sure not to generate duplicate values for key attributes.
- Your PDA almost certainly includes relations that are expected
to join with each other. For example, you may have a Student relation
with attribute courseNum that's expected to join with attribute number
in relation Course. When generating data, be sure to generate values
that actually do join - otherwise all of your interesting queries will
have empty results! There are a couple of ways to properly generate
joining values. One way is to generate records for multiple relations
(e.g., Course and Student) at the same time. Another way is to
generate the records for one relation first, and then use the joining
values for the other relation. For example, you could generate records
for relation Course first, then use the Course.number values when
creating values for Student.courseNum.
Back to beginning
Due: Thursday, Nov. 8, 2007, 12:45 PM (*even newer due date*)
What to turn in:
- (6 pts) File 1: queryUpdateIndex.sql -- put all SQL commands in this file.
- At least ten SQL data retrieval (select) commands
- three general queries
- two "join" queries (related with three tables)
- one "union" query
- one "group by" query
- one "order by" query
- one "DISTINCT" query
- one "Aggregate" query
- At least six SQL data modification commands
- two update commands
- two deletion commands
- two insertion commands
- Create at least three useful indexes for your PDA
- (1 pt) File 2: queryUpdateIndexLog.txt
- a log file showing a MySQL session in which your SQL commands
are run successfully on both your small and large databases
- (3 pts) File 3: benchmark.txt
- Run all above SQL commands with and without indexes, record running time, and compare the differenece between
with and without indexes.
In this part of the project, you will issue SQL queries and updates
against your PDA database, and you will experiment with the use of
indexes. Since you will be modifying your data as part of this
assignment, we strongly suggest that you adopt a routine for getting
repeated "fresh" starts with MySQL.
You need to save and keep all SQL commands such as creating/dropping database,
creating tables, inserting real data and/or synthetic data, querying tuples,
updating tuples, etc., in script files.
If needed, you just can run these script files again in order to re-setup
your database.
(a) Queries and Updates
Please note:
- For this assignment you will be invoking your SQL commands
interactively through mysql. Of course you should certainly build a
script file, rather than typing in the queries each time you run them.
- Please write "interesting" queries, in particular,
- three general queries with at least two conditions
(For example, SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 > 10 and
t2.name = `Texas`),
- two "join" queries (related with three tables),
- one "union" query,
- one "group by" query,
- one "order by" query,
- one "DISTINCT" query, and
- one "Aggregate" query).
You should try to use most or all of the SQL constructs discussed in class and in the textbook.
You will not receive full credit if your queries and modifications are all extremely
simple.
- We suggest that you experiment with your SQL commands on your
small hand-created database before running them on the large database
for which you generated data. Initial debugging is much easier when
you're operating on small amounts of data. Once you're confident that
your commands are working, try them on your large database. We do
expect that the commands you turn in work properly on both your small
and large databases.
- If you discover that most or all of your "interesting" queries
return an empty answer on your large database, check whether you
followed the instructions in Part 3 for generating data values
that join properly. You may need to modify your data generator.
- Turn in a copy of all of your SQL commands, along with a script
illustrating their execution on both your small and large
databases. Your script should be sufficient to convince us that your
commands run successfully, but you can and should truncate query
results after a few lines. Please do not turn in query results that
are hundreds of lines long.
(b) Indexes
In Part 4 (a),
you may have discovered that some queries run very slowly
over your large database. As discussed in class, an important
technique for improving the performance of queries is to create
indexes. An index on an attribute A of relation R allows the database
to quickly find all tuples in R with a given value for attribute A
(which is useful when evaluating selection or join conditions
involving attribute A). An index can be created on any attribute of
any relation, or on several attributes combined.
See this
page for more information on how to create an index in MySQL.
- Create at least three useful indexes for your PDA. Run your queries
from part 4(a) on your large database with the indexes and without the
indexes. Turn in a script showing your commands to create indexes, and
showing the relative times of query execution with and without
indexes. Here too, please truncate any large query results.
- You can use an execution script file that creates your indexes to ease
your testing. Use the \T command (Set outfile [to_outfile]. Append
everything into given outfile) to call your queries script file both
before and after you create the indexes. Surround the calls to
execute the queries files with lines that execute time functions in
MySQL; so you can measure elapsed time. You can then easily calculate
elapsed time manually and add it to your script log as a comment. You
can note your added comments by enclosing them with /*...*/. For an
added challenge you can try to get your execution script file to
calculate the elapsed time and display it for you.
Please note:
- MySQL automatically creates indexes for attributes declared as primary
keys.
- Actual timings will be affected by external factors such as system
load. However, for some of your queries, with appropriate indexes you
should see a consistent dramatic difference between the execution
times with indexes and the times without. If others of your queries do
not show performance improvement even when relevant indexes are
created, please include a short note suggesting why this might be the
case. The plans described by the EXPLAIN should be helpful in
understanding why things take as they do.
Back to beginning
Due: Tuesday, Dec. 4, 12:45 PM (*newest due date*)
What to turn in
- (1 pt) File 1: explanation.txt -- written explanation of how your program works. This is not line-by-line
commenting, but a high level global idea of what is going on in your
program.
- (4 pts) commented source code for your program. You will be
graded on your programming style (good design, modularity, good variable
names, documentation, etc.) as well as correctness.
Five-minute DEMO (5 pts)
You will show the following things to TA in a lab on the second floor in HRBB building.
- Start PuTTY (a free SSH Client, already on computers in open labs), login to your computer science account, connect to database.cs.tamu.edu, show what is in your database before your application is run. Delete
all tables from the database your application will interact with. Here PuTTY is used to monitor whether your application can save data into, retrieve data from and modify the database correctly.
- Run your embedded SQL based application and do the following operations:
connecting to a database, populating your database, querying, inserting, deleting and updating. You will be asked to use PuTTY to show the states of your database before and after your operations while your application is running.
- If only part of your application works, you will receive
appropriate partial credit.
In this part of the project, you will
interact with your PDA database from an external program. Your task
is to build a moderately user-friendly interactive application program
front end to your PDA using the Java programming language. Your
program should consist of a continuous loop in which:
- A list of at least five alternative options is offered to the user.
(An additional alternative should be quit.)
- The user selects an alternative.
- The system prompts the user for appropriate input values.
- The system accesses the database to perform the appropriate queries
and/or modifications.
- Data or an appropriate acknowledgment is returned to the user.
You should include both queries and modifications among your
options.
The user should not be aware of the database schema and is
not to simply enter SQL commands.
Both input and output should be in a format more convenient and
pleasing than raw interactive SQL.
As in Part 4, please include some
"interesting" queries or modifications, i.e., operations that require
some of the more complex SQL constructs such as subqueries,
aggregates, set operators, etc.
As a general example, if your PDA is a campus applicant database, then
your interface might include in its menu a number of useful queries on
the database, with some queries performing
statistical analysis requiring multiple levels of grouping, and other
queries being simpler.
Your application code should interact with the database using the JDBC
call-level interface for Java programs.
You can refer to following links.
We are not expecting anything
particularly fancy in terms of the interface itself. For example, in
Java a menu printed via print function is fine. Also, handling of SQL
errors can be quite simple. You can write a routine that just prints
the error message from MySQL, or model your error handler after a
sample program.
Back to beginning