
Activity Summary

If you’re accessing this activity directly, did you know there are nine other activities in this series up on our website? Visit our artificial intelligence page here to see a breakdown of the activities and our recommended order to complete them in! Also, these activities introduce AI concepts and terminology. If you find yourself unfamiliar with any of the words in this activity, the landing page also has a glossary of AI terms. Happy space-station-fixing!

You and your group mates are astronauts and scientists aboard the Actua Orbital Station. Unfortunately, your station just got bombarded by magnetic rays and your electronics have begun to shut down! The only one who can save you is the station’s AI, DANN. DANN stands for Dedicated Actua Neural Network, and it’s gone a little loopy. Brush up on your technical skills, learn about AI, and save yourself and your crewmates!

Now that we’ve learned the basics of AI in “Introduction: What is AI?”, we can begin to fix DANN, and we can start with our scanner! The Actua Orbital Station has a large scanner to monitor and track space objects, both near and far. It looks like after the damage from the magnetic rays, the space object classifier was reset. As one of your repairs aboard the station, Mission Control has asked that you propose and evaluate a decision tree so that you can bring the space object classifier back online. Once we do that, we can move on to studying an experiment in “Regression Analysis: Making Predictions using Data”.

In this activity, participants will create and evaluate decision trees. Decision trees are an approach to sorting objects or data into different types by asking questions about them. Participants will develop different questions for a decision tree by looking at an example dataset. They will then test how well their decision tree works by seeing if it can correctly label a testing dataset.

Activity Procedure

Opening hook: Twenty Questions

Twenty Questions is a game where you try to guess an object or animal by asking a series of questions. Each question that you ask is used to narrow down what the object or animal might be. With a partner or in small groups:

  1. Designate one group member as the “answerer”. This person is responsible for answering the questions that are asked and can only answer “yes” or “no”. 
  2. The answerer must secretly choose an item from the list below.
    Airplane

    Bear

    Cell phone

    Desk

    Eagle

    The Moon

    Pigeon

    School bus

    Space shuttle

    Truck

    Turtle

    Rock

  3. Everyone else in the group must take turns asking questions to try to figure out what the chosen item is. Only questions that can be answered with “yes” or “no” can be asked.
  4. Have someone record each question that is asked.
  5. You can ask up to 20 questions.
  6. If you successfully guess the item, choose a new answerer and play again (go back to step 2). Make sure that everyone has a chance to be the answerer before continuing.
  7. Once you’re done playing, take a look at the questions that have been asked:
    • Do any of the questions apply to more than one list item?
    • How can you use the questions you have to divide the list items into two groups? What about three groups? Four groups? 

 

Activity 1: Making decisions with trees

Imagine you have a pile of different objects or things that have all been mixed together. Decision trees are a way to un-mix things (e.g. animals, objects) and divide them into groups. These groups are sometimes called labels or classes. You can do this by asking questions about features, the details that can be used to identify objects. The goal of a decision tree is to make the different types of objects in your pile as separate as possible.
To learn about decision trees, let’s work through an example together. We’ll be trying to divide up the celestial objects below.

Structure (feature)   Orbit (feature)   Label                  Name (extra)
Solid                 Planet            Moon                   Europa
Solid                 Planet            Moon                   Triton
Gaseous               Sun               Planet (gas giant)     Jupiter
Gaseous               Sun               Planet (gas giant)     Uranus
Solid                 Sun               Planet (terrestrial)   Venus
Solid                 Sun               Planet (terrestrial)   Earth

 

  1. Look at the table above. How many different types of objects are there? (Hint: look at the label column)
    • There are three different types of objects: moons, and two types of planets (gas giants and solid/terrestrial).
  2. We need to use the information in the table to come up with questions that identify each type of object. Look at the “structure” feature and take a moment to formulate the first question that you think we should ask to start sorting our data.
    • The question should be something like, “Is the structure of the object gaseous?”
  3. We can also start drawing the beginning of our decision tree:
    • Draw the top of the tree. This is called the “root node”. Write the first question beside it.
    • Draw two lines coming out of the root node. These are called “branches” and there should be one for each possible answer to the node (“yes” and “no”). Label the branches.
    • Look at each item in the table above and sort each item using the first question (i.e. ask the question and put the item in either the “yes” group or the “no” group based on the answer).
    • Once you’re done sorting the items based on the first question, look at the groups that have been made. Do any of the groups have only one type of object in them?
  4. You should have a group that only contains gas giants. This is good, because it takes care of one of the types of objects that we started with.
  5. You may notice that there’s a bit of extra text included: “Classify as ‘planet (gas giant)’ 100%”. What do you think this text means? What do you think the 100% means?
    • This marks a decision to stop asking any more questions, since we think the items are as separate as possible. We instead choose to “classify” them, i.e. put a label on them.
    • This 100% represents how confident we are in our decision. Since there are only gas giants in this category, we can have full confidence in our decision.
  6. The other group still has more than one type of object in it. We can try posing more questions to separate it further. What do you think we should ask for the next question? (Hint: look at the table and think about which features we’ve already asked about and what remains to be asked)
    • The question should be something like, “Does the object orbit around the Sun?”
  7. Follow the same steps as above to continue drawing your tree.
    • Look at each item that was in your second group and sort each item using your second question.
    • Once you’re done sorting the items based on the second question, look at the groups that have been made. Do any of the groups have only one type of object in them? If so, add decisions to them like we did for the gas giants.

The example above is notable because, after just two questions, we have a strategy for separating our objects into the three classes we defined, with no class mixed in with any other (which is why we can have 100% confidence in each decision). This is not always the case: it’s possible to have data that you can’t fully separate.
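If it helps to see the same logic written out, here is a minimal Python sketch of the finished example tree. The dictionary representation of each object, and the key names “structure” and “orbit”, are assumptions made for this illustration; the activity itself only needs pen and paper.

```python
# A minimal sketch of the example decision tree from Activity 1.
# Objects are represented as plain dictionaries; the key names
# ("structure", "orbit") are assumptions made for this illustration.

def classify_celestial_object(obj):
    """Classify an object as a gas giant, terrestrial planet, or moon."""
    # Root node: "Is the structure of the object gaseous?"
    if obj["structure"] == "gaseous":
        # Leaf: only gas giants ended up in this group (100% confidence).
        return "planet (gas giant)"
    # Second node: "Does the object orbit around the Sun?"
    if obj["orbit"] == "sun":
        # Leaf: only terrestrial planets in this group (100% confidence).
        return "planet (terrestrial)"
    # Leaf: everything left orbits a planet, so it is a moon (100% confidence).
    return "moon"

# The six objects from the table above.
examples = [
    {"name": "Europa",  "structure": "solid",   "orbit": "planet"},
    {"name": "Triton",  "structure": "solid",   "orbit": "planet"},
    {"name": "Jupiter", "structure": "gaseous", "orbit": "sun"},
    {"name": "Uranus",  "structure": "gaseous", "orbit": "sun"},
    {"name": "Venus",   "structure": "solid",   "orbit": "sun"},
    {"name": "Earth",   "structure": "solid",   "orbit": "sun"},
]

for obj in examples:
    print(obj["name"], "->", classify_celestial_object(obj))
```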

What if we included a new object? Add Juno (structure: solid, orbit: Sun), an asteroid, to our objects and try to separate it out. How does this change our tree? Since Juno answers both of our questions the same way as the terrestrial planets, it ends up in the same leaf as Venus and Earth, so that leaf no longer contains a single type of object.

Notice how our confidence in our classification decision has changed:

  • Since only 2 out of 3 objects in this group are planets, we only have a 66% chance of correctly classifying an object in this category as a terrestrial planet.
  • Since 1 out of 3 objects in this group is an asteroid, there’s a 33% chance that an object in this category should be classified as an asteroid (a short calculation sketch follows below).
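To check these percentages yourself, remember that a leaf’s confidence for each label is simply the fraction of objects at that leaf carrying that label. Here is a small Python sketch of that calculation; the label strings are taken from the Juno example above.

```python
from collections import Counter

def leaf_confidences(labels_at_leaf):
    """Return each label's share of the objects that reached a leaf node."""
    counts = Counter(labels_at_leaf)
    total = len(labels_at_leaf)
    return {label: count / total for label, count in counts.items()}

# The leaf from the Juno example: two terrestrial planets and one asteroid.
print(leaf_confidences(["planet (terrestrial)", "planet (terrestrial)", "asteroid"]))
# -> roughly {'planet (terrestrial)': 0.67, 'asteroid': 0.33}
```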

 

Activity 2: Classifying Space Objects

Examining the space objects dataset

Now let’s take a look at the space objects dataset that we will be working with. This dataset has been generated to resemble the data that the station will use. The space objects dataset contains five labelled types of celestial objects (asteroid, comet, junk, meteoroid, satellite), and has three features for each object:

  • material composition, either organic (containing water or other organic compounds) or inorganic (metal and rocks)
  • size, measured as the approximate diameter of the object (in metres)
  • distance, as either local (inside or near the Earth’s orbit) or far (well outside Earth’s orbit)

There are two subsets within this dataset:

  1. a training dataset, which you will use to build your decision tree
  2. a testing dataset, which you can use to test your decision tree

You will use the training dataset to create the questions that you will use to separate the data, then test to see how accurately your tree performs using your testing dataset. Both sets are included as appendices.
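If you find it helpful to picture the data in code, one option (an assumption of ours, not part of the appendices) is to treat each row as a small record with three features and a label. The values in the sketch below are invented placeholders rather than real rows from the dataset.

```python
# One possible way to represent a single row of the space objects dataset.
# The feature names and the set of labels come from the activity text;
# the values in this example row are invented placeholders, not real
# entries from the training or testing appendices.

example_row = {
    "composition": "inorganic",  # "organic" or "inorganic"
    "size": 0.5,                 # approximate diameter of the object, in metres
    "distance": "local",         # "local" or "far"
    "label": "junk",             # asteroid, comet, junk, meteoroid, or satellite
}
```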

Creating your decision tree

The goal of a decision tree is to separate data into distinct groups (as much as possible; perfect separation may not be possible) based on known, comparable features. This separation is achieved by asking questions about the data that split it into smaller groups. The space objects dataset has three features that we can use.

Two of these features, “material composition” and “distance”, are of a type that we dealt with in the previous example: they are categorical data. This means that they can only take a specific, limited range of values. The third feature, “size”, is continuous data, meaning that it can be any number. For each of the categories of space objects in this activity, though, size will be within a certain range. With this in mind, { individually / in small groups / as a large group}, complete the tasks below.
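Written as code, the difference between the two kinds of questions becomes clear: a categorical question checks for one of a limited set of values, while a continuous question compares a number against a threshold. The sketch below assumes the dictionary representation used earlier, and the 10-metre threshold is a made-up placeholder; you will want to choose your own cut-offs from the size ranges in the training data.

```python
# Two kinds of decision tree questions, written as small functions that
# answer "yes" (True) or "no" (False) for one object at a time.

def is_organic(obj):
    """Categorical question: checks for one of a limited set of values."""
    return obj["composition"] == "organic"

def is_smaller_than(obj, threshold_m=10.0):
    """Continuous question: compares the size feature against a threshold.
    The 10-metre default is only a placeholder; pick your own threshold
    from the minimum and maximum sizes you see in the training dataset."""
    return obj["size"] < threshold_m
```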

Task 1: Categorical features
  1. Develop questions based on the categorical features. If you get stuck, think back to how questions were used in the example (e.g. “Is the structure of the object solid?”) and adapt those questions to the features of the space object dataset.
  2. Use the questions that you developed to draw a decision tree. This means choosing which questions to ask and in what order you want to ask them.
  3. Apply your decision tree to the training dataset, noting the composition of the groups at each step along the way and the probabilities for the final classification decisions (as the fraction/percentage of each object type in the group).
Task 2: Categorical and continuous features
  1. Now, develop questions based on the continuous data feature. If you get stuck, here’s a hint for how to deal with continuous data: think about the range of sizes (minimum and maximum) for each object and how you might be able to use this to separate them. Are some objects obviously bigger or smaller than others?
  2. Create a new decision tree that combines both types of features.
  3. Apply your decision tree to the training dataset, noting the composition of the groups at each step along the way and the probabilities for the final classification decisions (as the fraction/percentage of each object type in the group).

Activity 3: Evaluating a tree

After you’ve developed your decision tree, you can use the testing dataset to see how well it works. To do this:

  1. For each object in the testing dataset, apply your decision tree to see where it would be classified. Keep track of these outcomes, since you will need them in the next step.
  2. Once you have sorted each row of the testing dataset into the leaf nodes of your tree, score the outcome for each leaf by multiplying the number of objects by their classification confidence (the probability of being right). For example:
    • If you have 5 asteroids at a leaf node that has a 100% confidence for the decision “Classify as asteroid”,
      5 × 100% = 5 × 1 = 5.
    • If you have 3 comets and 2 meteoroids at a leaf node that has a 60% confidence for the decision “Classify as comet” and a 40% confidence for the decision “Classify as meteoroid”,
      (3 × 60%) + (2 × 40%) = (3 × 0.6) + (2 × 0.4) = 2.6.
  3. Add up the scores across all of your leaf nodes. This number should not exceed the number of objects in the testing dataset. The highest possible value, in this case, should be 10, but that assumes that the data is perfectly sorted, which may not be the case.
  4. If you have time, try this testing process on both of your trees and compare your results. 
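If you would like to double-check your arithmetic, the scoring rule above can also be written as a few lines of Python. The leaf compositions and confidences below simply repeat the two worked examples; your own numbers will come from your tree and the testing dataset.

```python
# Score one leaf by multiplying the number of test objects of each type
# by the confidence the leaf assigns to that type, then summing.

def leaf_score(object_counts, confidences):
    """object_counts: {label: number of test objects of that type at this leaf}
    confidences: {label: this leaf's confidence for that label}"""
    return sum(count * confidences.get(label, 0.0)
               for label, count in object_counts.items())

# Worked example 1: 5 asteroids at a leaf with 100% confidence for "asteroid".
print(leaf_score({"asteroid": 5}, {"asteroid": 1.0}))                  # 5.0

# Worked example 2: 3 comets and 2 meteoroids at a 60%/40% leaf.
print(leaf_score({"comet": 3, "meteoroid": 2},
                 {"comet": 0.6, "meteoroid": 0.4}))                    # about 2.6

# The tree's total score is the sum of leaf_score(...) over all of its leaves.
```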

 

Appendix A: Background Information

Decision trees

A decision tree is structured as an upside-down tree. The top of a decision tree is called the “root” or “root node”. Questions are asked at nodes, so this is where the first question is asked. From the root, there are two branches, each representing an answer to the question, and at the end of each branch is another node. When there are no more questions to ask at a node, no branches are added and that node is called a “leaf”. While decision tree questions can have multiple answers, most often they are binary, i.e. there are only two possible answers that the data can fit into.
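For anyone who wants to connect this structure to code, here is a minimal Python sketch of a binary decision tree node. The class and field names are our own illustration rather than part of the activity materials: an internal node holds a question and two branches, while a leaf holds only a label.

```python
# A minimal sketch of the structure described above: an internal node holds
# a question and a branch for each answer; a leaf holds only a label.

class Node:
    def __init__(self, question=None, yes_branch=None, no_branch=None, label=None):
        self.question = question      # function taking an object, returning True/False
        self.yes_branch = yes_branch  # node to follow when the answer is "yes"
        self.no_branch = no_branch    # node to follow when the answer is "no"
        self.label = label            # classification, used only at leaf nodes

    def classify(self, obj):
        if self.question is None:     # leaf: no more questions to ask
            return self.label
        next_node = self.yes_branch if self.question(obj) else self.no_branch
        return next_node.classify(obj)

# The root node from the Activity 1 example, with a leaf on each branch.
root = Node(
    question=lambda obj: obj["structure"] == "gaseous",
    yes_branch=Node(label="planet (gas giant)"),
    no_branch=Node(label="not a gas giant (ask another question)"),
)
print(root.classify({"structure": "gaseous"}))  # planet (gas giant)
```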

Reflection & Debrief


Having tested your decision tree, compare your scores and trees in groups or as a class. A higher score indicates a tree that should be more effective at sorting data. Consider the following questions:

  • What approaches were used to decide on which questions should be asked?
  • How did you determine the order to ask your questions in?
  • How many layers/levels did your tree have? Can you see any connection between the number of layers/levels and a tree’s testing score?
  • Did you ever repeat a question?

Considering both the example and the task dataset,

  • What impact does training data have on the structure and design of a decision tree?
  • What impact do the data features have on the structure and design of a decision tree?
  • How can you prove that your decision tree is as effective as possible?

Decision trees are just one example of a strategy that can be used to classify or sort data. Data, however, isn’t always available in easily readable formats like tables and spreadsheets. In subsequent activities, you will look at other ways that classification can occur, including using machine vision and listening to recognize specific images and sounds.

Extensions & Modifications

Extensions

Expand the example (planets and moons in the solar system) dataset
  • The example was done with only a small subset of the possible objects and the categories in our solar system. Instead of using the generated dataset, have students work together to build a new set of objects.
Optimizing the decision tree
  • The scoring method provided in Activity 3 gives a general sense of how well a tree performs, and it should let you compare different trees to each other. Comparing the trees made by different groups, which trees performed the best?
  • Is there a relationship between the number of layers (height) of a tree and how well it performed?

Modifications

Simplify the dataset
  • The dataset for the activity is designed so that it can’t be perfectly separated (“junk” and “satellite” have overlapping size ranges and are both “local” and “inorg”). If you think that this would be confusing, you can remove one of those categories or instruct students to create a “junk or satellite” category (e.g. “human-made objects”).
Generate new or different data
  • The space objects dataset is generated using a small program written in Python, found here. There is a configuration section at the top where you can adjust the parameters used to generate a dataset.
  • If you are a Google Sheets user, you can copy and paste the output from the data generator into a Sheets document and use the “Data > Split text to columns” menu option to have the data formatted appropriately in your spreadsheet.
