As data scientists, we work in concert with other members of an organization with the goal of making better decisions. This often involves finding trends and anomalies in historical data to guide future action. But in some cases, the best aid to decision-making is less about finding “the answer” in the data and more about developing a deeper understanding of the underlying problem. In this post we will focus on another tool that is often overlooked: interactive simulation by means of agent based modeling.
Agent based modeling involves describing individual agents that interact with each other within an environment, and then observing how their behaviors combine to produce macro-level system behaviors.
Agents can be modeled at whatever level seems natural to our understanding of the system: individual humans, client cohorts, departments, competing firms, computer programs or similar entities can all be agents. The environment may contain constraints or resources that influence and/or are influenced by the agents’ behaviors. The agents may interact directly or indirectly via the environment.
This conceptual framework allows you to model knowledge or assumptions at the smallest scale, where they can often be reasoned about more intuitively, rather than at the aggregate level. It also allows you to construct models that may not faithfully represent the system as it currently exists (e.g., more or fewer agents, different boundary conditions), but let you reason about the system and explore its complexities and possible outcomes. [1]
Agent based modeling is one of many possible frameworks for interactive simulation, but its particular benefits map well to business analysis and development at young, growing companies:
Let’s start with a classic example that shows complex macro-level behavior emerging from a simple set of agents and some basic interactions between them. As we will discuss more later, since the focus is on helping users understand the system dynamics, we put a lot of weight on trying to make the user interaction simple, enjoyable and enlightening.
In this simulation, each circle represents an agent with a location and a direction. The agents are initialized randomly. At each timestep, each agent slightly adjusts its direction based on the directions of the agents around it, following the algorithm below.
(Note that the environment wraps around, so circles disappearing on one side reappear on the other.)
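    # direction modification at each timestep (translated to Python for readability)
    def update_direction(agent, other_agents):
        for o in other_agents:
            # only other agents of the same color influence this agent
            if (agent['id'] != o['id']) and (agent['color'] == o['color']):
                # Euclidean distance between the two agents
                dist = ((agent['x'] - o['x'])**2 + (agent['y'] - o['y'])**2)**0.5
                # influence falls off with the square of the distance
                weight = 0.00005 / (dist**2)
                delta_angle = o['angle'] - agent['angle']
                # clamp the turn contributed by each neighbor to [-5, 5]
                agent['angle'] += max(-5, min(5, delta_angle * weight))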
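The update rule above changes only an agent's direction. A minimal sketch of the remaining step, moving each agent along its heading and wrapping it around the edges of the environment, might look like the following. The environment size, the `speed` parameter, the helper names `move` and `wrapped_distance`, and the assumption that `angle` is measured in degrees are all illustrative and are not taken from the original simulation.

```python
import math

# Hypothetical environment size; the original simulation's dimensions are not given.
WIDTH, HEIGHT = 100.0, 100.0

def move(agent, speed=1.0):
    """Advance the agent along its heading and wrap at the edges.

    Assumes agent['angle'] is in degrees (an assumption for this sketch).
    """
    theta = math.radians(agent['angle'])
    # The modulo makes an agent leaving one side reappear on the opposite side.
    agent['x'] = (agent['x'] + speed * math.cos(theta)) % WIDTH
    agent['y'] = (agent['y'] + speed * math.sin(theta)) % HEIGHT
    return agent

def wrapped_distance(a, b):
    """Shortest distance between two agents when the edges wrap around."""
    dx = abs(a['x'] - b['x'])
    dy = abs(a['y'] - b['y'])
    dx = min(dx, WIDTH - dx)
    dy = min(dy, HEIGHT - dy)
    return (dx**2 + dy**2)**0.5
```

Whether neighbors should be measured with a wrapped distance like this, rather than the plain Euclidean distance, is a design choice: the wrapped version lets agents near opposite edges influence each other.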
This next example highlights the ability of agent based models to show links between significant macro-level patterns and the often subtle preferences that guide micro-level behaviors of individuals. The nature of these links may not be intuitive; interactive simulation helps to clarify our understanding of the system dynamics. This example also provides a good opportunity to discuss the coupling of interactive simulations with the results of offline analysis with the same model.
The following simulation is an implementation of Schelling’s segregation model from 1971, which was originally carried out manually with a checkerboard. This model illustrates how urban segregation can happen even with fairly limited preference for people to live beside similar people.
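    import random

    # decision made by each agent at each timestep
    # (threshold is a model parameter defined outside this function)
    def to_stay_or_move(agent, direct_neighbors, open_spaces):
        # number of direct neighbors sharing this agent's color
        n = sum([d['color'] == agent['color'] for d in direct_neighbors])
        ratio_neighbors_same_as_me = n / float(len(direct_neighbors))
        if ratio_neighbors_same_as_me < threshold:
            # too few similar neighbors: move to one randomly chosen open space
            agent['location'] = random.sample(open_spaces, 1)[0]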
As a complement to these interactive simulations, we can also provide decision makers with outputs more akin to the traditional analytical outputs. The models and their outputs can be made subject to many of the same interrogation techniques applied to other models used in data science, including Monte Carlo analysis and various sorts of sensitivity analysis and optimization methods.
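As a rough illustration of the Monte Carlo idea, the sketch below re-implements a small Schelling-style model and averages one outcome over several seeded runs for each value of the threshold. The grid size, the share of empty cells, the number of steps, the 8-cell wrapping neighborhood and the `similarity` summary statistic are all assumptions chosen for brevity; they are not the settings behind the interactive simulation. The code is written for Python 3.

```python
import random

def neighbors(grid, size, x, y):
    """Colors of the occupied cells among the 8 surrounding cells (edges wrap)."""
    cells = [grid[(x + dx) % size, (y + dy) % size]
             for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    return [c for c in cells if c is not None]

def similarity(grid, size):
    """Mean share of same-color neighbors over all agents that have neighbors."""
    shares = []
    for (x, y), color in grid.items():
        if color is None:
            continue
        near = neighbors(grid, size, x, y)
        if near:
            shares.append(sum(c == color for c in near) / len(near))
    return sum(shares) / len(shares)

def run_schelling(threshold, size=12, empty_share=0.2, steps=20, seed=None):
    """Run one simulation and return the final mean similarity."""
    rng = random.Random(seed)
    n_empty = int(size * size * empty_share)
    n_agents = size * size - n_empty
    colors = (['red'] * (n_agents // 2)
              + ['blue'] * (n_agents - n_agents // 2)
              + [None] * n_empty)
    rng.shuffle(colors)
    positions = [(x, y) for x in range(size) for y in range(size)]
    grid = dict(zip(positions, colors))
    for _ in range(steps):
        # visit every cell once per step, in random order
        for pos in rng.sample(positions, len(positions)):
            color = grid[pos]
            if color is None:
                continue
            near = neighbors(grid, size, *pos)
            if near and sum(c == color for c in near) / len(near) < threshold:
                # unhappy agent: move to a randomly chosen empty cell
                open_spaces = [p for p in positions if grid[p] is None]
                target = rng.choice(open_spaces)
                grid[target], grid[pos] = color, None
    return similarity(grid, size)

def monte_carlo(thresholds, runs=10):
    """Mean final similarity per threshold, averaged over seeded runs."""
    results = {}
    for t in thresholds:
        outcomes = [run_schelling(t, seed=s) for s in range(runs)]
        results[t] = sum(outcomes) / len(outcomes)
    return results

if __name__ == '__main__':
    for t, mean_sim in monte_carlo([0.0, 0.3, 0.5]).items():
        print('threshold=%.1f  mean similarity=%.2f' % (t, mean_sim))
```

Sweeping one parameter while averaging over seeds is the simplest form of sensitivity analysis; the same loop could wrap any model that exposes a single "run and summarize" function.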
In addition to the various interrogation techniques that can be used with the models, agent based modeling also overlaps with other existing data science workflows in that agent based models can (and often should) use very detailed boundary conditions, initial state values and parameter values derived from (or inferred from) historical data.
As such, we feel that agent based modeling fits within our remit as data scientists. But how does a data scientist (or data science group) get started with it and integrate it into their toolkit?
There is a plethora of existing tools for agent based modeling. Among the most popular are NetLogo, Repast, MASON and AnyLogic, the first three of which are free. Depending on your modeling needs, one of these tools may be suitable. You may also want to consider tools for “multi-agent modeling” from other fields such as robotics (e.g., Ptolemy II).
But for the data science community, it would be nice to have something in R or Python so that we could more easily integrate it with our existing workflows. For this post we will focus on Python based libraries as I’m biased towards Python :). The mesa project is worth considering (particularly if you have already made the switch to Python 3). You may also want to look at the more general simulation library SimPy.
Building in Python
Many large agent based models are custom-built to suit the problem and platform, without the use of existing tools or libraries. And even if you do choose one of the existing tools above for your project(s), it is still a worthwhile exercise to build some models from scratch in Python. The concept of agent based modeling maps very well to object-oriented programming: types of agents are classes, individual agents are objects which can have states (object variables) and response methods (object methods).
You can find a detailed tutorial on using Python to build a Schelling segregation model here. Below we present a more minimal agent based model for demonstration.
    # agent definitions
    class PingPongAgent():
        def __init__(self, sound):
            self.sound = sound

        def respond(self, msg):
            # an agent only reacts to the 'hit_it' message
            if msg == 'hit_it':
                return self.sound
            return None

    # model definition
    class PingPongModel():
        def __init__(self):
            self.current_turn = 0
            self.agents = []
            for agent_sound in ['ping', 'pong']:
                self.agents.append(PingPongAgent(agent_sound))

        def simulate_timestep(self):
            # ask the agent whose turn it is to respond, then advance the turn
            output = self.agents[self.current_turn].respond('hit_it')
            self.current_turn += 1
            if self.current_turn >= len(self.agents):
                self.current_turn = 0
            return output

    >>> model1 = PingPongModel()
    >>> for i in range(5):
    ...     print(model1.simulate_timestep())
    ...
    ping
    pong
    ping
    pong
    ping
A further refinement on this simple example is to use the multiprocessing package from the Python standard library, so that each agent runs in its own process. This is an essential step in moving toward distributed computing for models with hundreds, thousands or even millions of agents.
    import multiprocessing
    import random

    random.seed(0)

    # agent definitions
    class PingPongAgent(multiprocessing.Process):
        def __init__(self, msg_pipe, sound):
            super(PingPongAgent, self).__init__()
            self.msg_pipe = msg_pipe
            self.sound = sound

        def run(self):
            # runs in the child process: wait for messages until told to stop
            running = True
            while running:
                msg = self.msg_pipe.recv()
                if msg == 'stop':
                    running = False
                if msg == 'hit_it':
                    self.msg_pipe.send(self.sound)

    # model definition
    class PingPongModel():
        def __init__(self):
            self.current_turn = 0
            self.agents = []
            self.msg_pipes_to_agents = []
            for agent_sound in ['ping', 'pong', 'punt', 'pass', 'play']:
                # one pipe per agent: the model keeps one end, the agent the other
                parent_conn, child_conn = multiprocessing.Pipe()
                self.msg_pipes_to_agents.append(parent_conn)
                p = PingPongAgent(child_conn, agent_sound)
                self.agents.append(p)
                p.start()

        def simulate_timestep(self):
            # pick a random subset of agents to act in this timestep
            this_turn = random.sample(range(len(self.agents)),
                                      int(random.uniform(0, len(self.agents))))
            for i in this_turn:
                self.msg_pipes_to_agents[i].send('hit_it')
            output = []
            for i in this_turn:
                output.append(self.msg_pipes_to_agents[i].recv())
            return output

        def terminate_simulation(self):
            for i, a in enumerate(self.agents):
                self.msg_pipes_to_agents[i].send('stop')
                self.msg_pipes_to_agents[i].close()
                a.join()

    >>> model1 = PingPongModel()
    >>> for i in range(5):
    ...     print(model1.simulate_timestep())
    ...
    ['pass', 'pong', 'ping', 'play']
    ['pass', 'pong']
    ['punt', 'pass']
    ['pong', 'pass']
    ['pong', 'pass', 'punt']
    >>> model1.terminate_simulation()
Watch out for the overhead imposed by process proliferation: you will likely want to group agents within processes as the number of agents increases. Further challenges await if you move to a distributed environment. For speed, you may also want to rewrite some core components in a lower-level compiled language, use numba, or break from agent based orthodoxy in some places by using numpy arrays instead of sets of agent objects. You can probably see a general path from here; at the very least, noodling through some code along these lines will leave you better informed as you choose from the tools noted above.
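One way to group agents within processes is sketched below: each process hosts a batch of agents and answers for all of them in a single message. The names `AgentGroup`, `chunk` and `run_demo` are hypothetical helpers introduced for this illustration, and the agents are reduced to their sounds to keep the example short. It is a sketch of the idea under those assumptions, not a tuned implementation.

```python
import multiprocessing

class AgentGroup(multiprocessing.Process):
    """One process hosting several simple agents (here, just their sounds)."""
    def __init__(self, msg_pipe, sounds):
        super(AgentGroup, self).__init__()
        self.msg_pipe = msg_pipe
        self.sounds = sounds

    def respond(self, msg):
        """Collect the responses of every agent in this group."""
        if msg == 'hit_it':
            return list(self.sounds)
        return []

    def run(self):
        # runs in the child process: answer messages until told to stop
        while True:
            msg = self.msg_pipe.recv()
            if msg == 'stop':
                break
            self.msg_pipe.send(self.respond(msg))

def chunk(items, n_groups):
    """Split items into n_groups nearly equal, contiguous batches."""
    size, extra = divmod(len(items), n_groups)
    batches, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches

def run_demo(sounds, n_groups=2):
    """Start one process per batch, run one timestep, then shut down."""
    pipes, groups = [], []
    for batch in chunk(sounds, n_groups):
        parent_conn, child_conn = multiprocessing.Pipe()
        group = AgentGroup(child_conn, batch)
        group.start()
        pipes.append(parent_conn)
        groups.append(group)
    for pipe in pipes:
        pipe.send('hit_it')
    # one message per process carries the responses of its whole batch
    output = [sound for pipe in pipes for sound in pipe.recv()]
    for pipe, group in zip(pipes, groups):
        pipe.send('stop')
        group.join()
        pipe.close()
    return output
```

To try it, call `run_demo(['ping', 'pong', 'punt', 'pass', 'play'])` from a script, placing the call under an `if __name__ == '__main__':` guard so that it also works on platforms that start child processes by re-importing the main module. With five agents and two groups, only two processes and two pipes are needed instead of five.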
The goal is to improve decision makers’ understanding of the dynamics of a problem. Easily accessible, beautiful and intuitive model interfaces go a long way in this direction.
The NetLogo interface has been a major part of that tool’s popularity. (The interactive simulation interfaces shown above are heavily influenced by it.) Laying out the interface of simple sliders, toggles, buttons and real-time output displays is an essential part of constructing any NetLogo model. NetLogo Web allows you to play with NetLogo models directly in the browser. So if you are willing to learn the Logo language, find the interface suitable, and your models are not too large, then this may be a good complete solution for you.
Agent based modeling is often overlooked as a potential tool in the data science community, yet it complements the other techniques and languages that data scientists already use. Through interactive simulation it can help decision makers better understand the problem or system under consideration, which ultimately extends the ways in which data science can help us make better decisions.
[1] Perhaps its greatest potential as an analytical modeling tool (independent of interactive simulation) is in risk assessment, as noted in this Economist article on the use of agent based modeling for financial system analysis.