DAG Taxonomy

I have decided to create a taxonomy database library based on direct acyclic graph(DAG), actually it is based on polytree. Like the tree taxonomy, except allow one node to have more than one direct parent node.
Like this structure, that can't be constructed from tree based taxonomy.

A hash table storing nodes. Indices are integers.
The node stores three sets of integers, the indices of direct parent nodes, the indices of direct child nodes, and the items.
The items are integers only, each item refers to the item wish to be classified. Those items can be thought as ids. When the database return items, those items can be used in a SQL query to request the content of items.

These are specs of the database, and all the possible operations:

  1. Return all items directly under a node.
  2. Return all items under a node or it's child nodes.
  3. Return relationship from node A to node B.(parent, child or neither)
  4. Return all items under a complex query. For example, items that are classified as part of node A but not node B
  5. Insertion of new node
  6. deletion of nodes
  7. Find the smallest graph that contains some specific nodes as it's child node. Useful when there are things having the same name but different meaning. So with this process, it can return stuff like Graph(Graph Theory) and Graph(Plotting Functions)
  8. Move a node from one place to another.(supply old parent and new parents, or supply a entirely new node)
  9. Save/Load the database to/from a file
  10. Normalization process of the database:
    1. If an node has an direct item that is also an direct item of any of it's child node. Remove the item from node.
    2. If an node has a direct parent node P1 and P2, if P1 is a child node of P2, then remove P2 from the direct parent node
    3. If an node has a direct child node C1 and C2, if C1 is a parent node of C2, then remove C2 from the direct parent node
    4. If nodes refer to nodes that doesn't exist, remove that reference
    5. Each time, insertion, alter, and deletion of nodes and items are operated, normalization operation starts, so the database is always normalized

In the end the database looks like this:

There will be a separated database handling converting integers ids into it's respectable names.

Slowly progressing. I only need to finish it by the time I start building a math problem database. C++ is hard.


Post new comment

The content of this field is kept private and will not be shown publicly.
If you have a Gravatar account, used to display your avatar.
  • Allowed HTML tags: <img> <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <span> <fn>
  • Lines and paragraphs break automatically.
  • Textual smileys will be replaced with graphical ones.
  • You can enable syntax highlighting of source code with the following tags: <code>, <blockcode>. Beside the tag style "<foo>" it is also possible to use "[foo]".
  • Use <fn>...</fn> to insert automatically numbered footnotes.

More information about formatting options

What is 32 + 5?
To combat spam, please solve the math question above.
Honey Pot that kill bots