Introducing Svelto ECS 2.5

Svelto ECS so far…

As I wrote in my previous articles, Svelto.ECS wasn’t born just from the needs of a large team, but also as a result of years of reasoning about software engineering applied to game development (*). That’s why the main Svelto.ECS goals are currently different from the ones set by Unity ECS. They may converge one day, but at the moment they are different enough to justify the ongoing development of Svelto.ECS. The biggest difference is that Svelto.ECS hasn’t been built just for performance. Performance gain is just one of the benefits of using Svelto.ECS, as ECS in general is a great way to write cache-friendly code. The main reasons why Svelto.ECS has been written, however, revolve around the shift of paradigm away from Object Oriented Programming, the consequent improvement of code design and maintainability, and the approachability for junior programmers, who won’t need to worry too much about the architecture and can focus on solving problems thanks to the rigid directions that the framework gives.

I haven’t used Unity ECS so far, although I have seen some examples, so I can’t give much feedback about it at this moment. What I have learned from Svelto.ECS is the importance of defining the concept of Entity at code level, to keep coders from abusing the pattern and falling into the trap of writing spaghetti code again. Let’s be clear, there is a very fine line between writing sensible ECS code and writing plain C code with globally accessible data. The anchor that should prevent coders from losing their way is to see a System as a single-responsibility behavior of a specific set of entities and not as a container of generic logic using globally accessible data.

All that said, let’s see what is new in Svelto.ECS 2.5.


Introducing new struct-friendly features in Svelto.ECS 2.5

Until now Svelto.ECS pushed the user to use objects over structs for normal use when high performance wasn’t required. Svelto.ECS 2.5 inverts this trend, making the use of structs actually the standard and the use of classes the exception. Let’s see how this is achieved:

As you know, an Entity is built through an EntityDescriptor and an optional set of Implementors. An EntityDescriptor can generate the so-called “EntityViews” and “EntityStructs”. In my examples I didn’t use many EntityStructs, but with the newest version of Svelto I started to rewrite a lot of EntityViews as EntityStructs, as you can see in the new Survival Example Code.

The Implementor is a very interesting concept in Svelto.ECS. Implementors are meant to act as a bridge between the underlying platform (like Unity) and the application engines. This is actually how testable engines, which don’t have any dependencies on the underlying platform, can be written with Svelto. However we used to use implementors to define Entity Components that were intrinsically independent of the platform too. With Svelto.ECS 2.5 Entity Components should preferably be defined as EntityStructs when they don’t need to act as a bridge with the underlying platform. EntityStructs are obviously mandatory to write cache-friendly code, but even when this kind of performance gain is not required, EntityStructs are still beneficial over EntityViews and Implementors because they make it possible to write allocation-free code.

A good implementation detail is that the relation between Entity Components and Implementors is not 1:1. One single implementor object can be used to implement many entity component interfaces. The interface is the abstraction of the underlying platform functionalities that are actually implemented by the implementor. However EntityViews are classes and every time an Entity is built, a set of objects is allocated on the heap. So why not declare EntityViews as structs as well? Previously it wasn’t possible because EntityView references were extensively used in Engines code, stored inside user data structures. With Svelto.ECS 2.5 the use of custom data structures that hold EntityView references is instead highly discouraged. For this reason, Svelto.ECS 2.5 introduces the new concept of EntityViewStruct, which is a mix between the two worlds. When EntityViewStructs are used, only the implementors (in Unity mainly as Monobehaviours) are allocated on the heap.

Let’s recap everything:

  1. the use of EntityViews is still supported, but fundamentally obsolete.
  2. when Implementors are needed, EntityViewStructs should be used instead.
  3. EntityViewStructs should be used only when engines need to be abstracted from the underlying platform functionalities, although there are other exceptions. The Engines shouldn’t need to know the implementation details behind the change of entity data (e.g. the Survival demo code uses this design to wrap the Unity RigidBody functionalities, so that the Entity Components can return its parameters without exposing the RigidBody interface).
  4. EntityStructs should be used for all the other cases. When Monobehaviors don’t need to be used, entity components should always be seen as pure structs. EntityStructs come with the bonus to write way less boilerplate code (neither component interfaces nor implementors are needed).
  5. EntityViewStructs and EntityViews can hold only component interfaces.
  6. Component interfaces must declare only setters and getters of primitive or value types. The only exceptions are DispatchOnSet<T> and DispatchOnChange<T>.
  7. EntityStructs can hold only primitives and value types (no collections allowed either).
  8. Custom data structures defined in engines with the purpose to hold EntityView references should never be used unless strictly necessary.
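
To make the recap a bit more concrete, here is a minimal, framework-free sketch of the three shapes involved. All the type names and fields below are hypothetical stand-ins that I made up for illustration (the marker interfaces that Svelto.ECS requires are omitted), so take it as a picture of the idea and not as the actual Svelto.ECS API.

```csharp
using System;

// Component interfaces: only getters/setters of value types.
public interface IPositionComponent { float x { get; set; } }
public interface ISpeedComponent { float speed { get; } }

// One implementor object can back several component interfaces.
// In Unity this would typically be a Monobehaviour; a plain class is
// used here so that the sketch compiles without the engine.
public sealed class ToyRigidBodyImplementor : IPositionComponent, ISpeedComponent
{
    public float x { get; set; }
    public float speed { get { return 2.5f; } }
}

// Hypothetical EntityViewStruct: a struct that holds only component interfaces.
// Building it allocates nothing; only the implementor lives on the heap.
public struct ToyPlayerMovementViewStruct
{
    public IPositionComponent position;
    public ISpeedComponent movement;
}

// Hypothetical EntityStruct: pure value-type data, no interfaces, no implementors.
public struct ToyPlayerInputDataStruct
{
    public float horizontal;
    public float vertical;
}

public static class ViewStructDemo
{
    // Moves the entity by one step through the view and returns the new position.
    public static float MoveOneStep()
    {
        var implementor = new ToyRigidBodyImplementor();
        var view = new ToyPlayerMovementViewStruct
        {
            position = implementor, // same object,
            movement = implementor  // two different interfaces
        };

        view.position.x += view.movement.speed;
        return implementor.x;
    }
}
```

The engine code only ever sees the two interfaces, so the implementor can be swapped for a test double without touching the engine.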

The main problem with using EntityStructs is that they don’t allow the same fine abstraction that EntityViewStructs do. This is because EntityViewStructs, being defined by the Engines, allow the same component data to be reinterpreted from the point of view of different engines, promoting abstraction and modularity. EntityStructs instead are defined by the Entity and the Engine can only use them without semantic reinterpretation. That’s why EntityStructs should be as modular as possible, although the definition of an EntityStruct could have more practical repercussions depending on whether a SoA or AoS implementation is preferred. Being able to give different meanings to the same data is the only reason why EntityViewStructs could still be preferred to EntityStructs when implementors are not abstractions of the underlying platform. For example, an engine can have read-only access to the same data that another engine can write into. EntityStructs instead are always mutable, as making them immutable would defeat the purpose of improving performance.

Extending the Svelto graph, the EntityStructs are placed on the left side:

If there wasn’t any dependency with the underlying platform, it would be possible to write a game with EntityStructs only, however I am still not sure how this would impact the overall design.

Currently this is my thinking:

If there is no dependency on the underlying platform (e.g. callbacks from Monobehaviours or some Unity related stuff), use EntityStructs as long as they are very abstracted (HealthEntityStruct is a good example). I don’t think there would be much space for very specialized EntityStructs. Implementors could make sense only as Monobehaviours and would store the essential features needed from the platform. EntityViewStructs should always be used in place of EntityViews when implementors are needed. The main question that I have to answer is: would the absolute use of EntityStructs over EntityViewStructs make the design worse? This is something I will analyse in the future with our projects. Currently the nice thing about the EntityView(Structs) is that they describe the specialization of Entities in terms of behaviors running on the entity more than in terms of data. Normally, using ECS design, entities are specialized by adding more data, however with the Svelto EntityDescriptor and EntityViews it is possible to read what behaviors will run on the Entity, for example:

Since EntityView(Struct)s come with their relative Engines, an entity descriptor shows which behaviors (engines) will run on the entity. Using only EntityStructs I would lose this semantic, going back to seeing Entities as a set of data only.


In-depth view of the new functionalities

I understand this is a lot to take in. It shouldn’t be if you know how to use Svelto already, although you will need to change your current way of using it. If you are new to it, the best thing is to follow the survival example, as the code is simpler than the theory. Let’s start from our MainContext.

-EntityStructs-

Besides cleaning up some code, the most important difference is how I create the Player Entity. The Player Entity now has only EntityViewStructs defined with Implementors as Monobehaviours, and EntityStructs for everything that doesn’t need to act as a bridge (precisely: HealthEntityStruct and PlayerInputDataStruct).

with:

and as example of EntityStructs:

You may notice how creating an Entity still looks quite readable! Especially if you don’t like the Svelto.ECS boilerplate (minimal, I have to say), you are going to love EntityStructs, as they are compact and they don’t need to define interfaces or implementors that must come with them.

I put a lot of effort into making structs user friendly in Svelto.ECS 2.5. Unfortunately I couldn’t achieve my final goals, as the interfaces I had in mind need the new C# 7.0 by ref features, so at the moment some functionalities around the EntityStructs are not final, but still OK (or at least the best they can be at the moment, I guess).

Let’s start from the initialization of the structs. This interface won’t actually change as I am actually happy with it. More often than not, you will want to initialize the value of your structs with initial values. This was quite badly managed by Svelto 2.0, as I wanted to find an allocation free, pass by reference and easy to understand solution.

I found the solution with the EntityStructInitializer struct returned by any IEntityFactory BuildEntity method. After you call BuildEntity, you will use it like this:

in this way the Player Entity will generate a HealthEntityStruct with the starting values you desire.
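
The idea behind the initializer can be sketched without the framework. In the toy below, ToyEntityFactory and ToyInitializer are hypothetical stand-ins that I wrote only for this illustration, and the HealthEntityStruct field is assumed; the real EntityStructInitializer and IEntityFactory signatures may well differ. The point is only that BuildEntity hands back a small struct (no heap allocation) that knows where the new entity’s data lives and lets you overwrite the default value.

```csharp
using System;
using System.Collections.Generic;

// Assumed layout, for illustration only.
public struct HealthEntityStruct { public int currentHealth; }

// Hypothetical stand-in for the entity factory plus its database.
public sealed class ToyEntityFactory
{
    readonly Dictionary<int, HealthEntityStruct> _health =
        new Dictionary<int, HealthEntityStruct>();

    // Creates the entity with default data and returns an initializer for it.
    public ToyInitializer BuildEntity(int entityID)
    {
        _health[entityID] = default(HealthEntityStruct);
        return new ToyInitializer(this, entityID);
    }

    internal void Set(int entityID, HealthEntityStruct value) { _health[entityID] = value; }

    public HealthEntityStruct Get(int entityID) { return _health[entityID]; }
}

// A struct, so returning it from BuildEntity does not allocate.
public struct ToyInitializer
{
    readonly ToyEntityFactory _factory;
    readonly int _entityID;

    internal ToyInitializer(ToyEntityFactory factory, int entityID)
    {
        _factory = factory;
        _entityID = entityID;
    }

    // Overwrites the default value stored for the entity that was just built.
    public void Init(HealthEntityStruct initialValue)
    {
        _factory.Set(_entityID, initialValue);
    }
}
```

With this toy, `factory.BuildEntity(1).Init(new HealthEntityStruct { currentHealth = 100 })` leaves the entity with 100 health in the factory’s storage.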

Alternatively you may create an Engine that inherits from SingleEntityEngine<> or MultiEntitiesEngine<,>. They both implement the Add/Remove methods with a parameter passed by reference. These methods support EntityStructs, and the modifications made to the struct will be saved directly in the database.

To summarize:

  • EntityViewStructs are like the good ol’ EntityViews. They hold Component interfaces which must still be filled with reflection (albeit cached and fast). They must implement the IEntityView interface. However, unlike EntityViews, they don’t allocate any object on the heap when created.
  • EntityStructs, which were already present in Svelto, must implement the IEntityStruct interface. They don’t allocate, they are cache friendly and they don’t use reflection. They are the top for performance. Thanks to the new Svelto ECS features, I would cautiously suggest to use them whenever you can. However keep in mind that they could make your code design worse and this is something I want to test in future on large projects.

-Allocation 0 code-

Now that you can use EntityStructs and EntityViewStructs, you are able to achieve zero allocation code while building entities. This would make Entity Pooling unnecessary.

-Handling Entity Structs and EntityViewStructs inside engines-

This is the part that is still a bit awkward due to the lack of the new C# 7.0 features. Essentially EntityStructs should always be managed inside an engine through an array. This is the way to be able to write cache friendly code. To be honest, having an array of structs is not enough: the cache can be easily broken if you don’t know how much memory can fit in the L1 cache and how much memory can be read in a cycle. One can say that this much advertised feature is still for advanced users, who know what they are doing and probably don’t need the Unity ECS implementation to achieve it. In more practical terms, EntityStructs can also be used where cache-friendly code is not strictly necessary. For this reason, I wanted to support the querying of EntityStructs by entity id too. The only problem is that, without the by ref return of C# 7, the only way I had to solve the problem was to use something like:

It’s much less nice than using the by ref return, but still reasonable. This is error prone for junior coders though, as by mistake the lambda could capture external variables, resulting in an allocation. This method will be deprecated once C# 7 is the standard on Unity (probably still a long way to go).
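
To show what I mean by a callback that edits the struct in place, here is a self-contained sketch. ToyEntityStore, the ActionRef delegates and the overload that forwards an extra value by reference are my own assumptions for this illustration; only the ExecuteOnEntity name comes from Svelto.ECS, and the real signatures may differ. The second overload is there so that the lambda doesn’t have to capture anything: a lambda that captures nothing can be cached by the compiler, while one that captures a local variable needs a closure object.

```csharp
using System;

// Delegates that receive the stored struct by reference.
public delegate void ActionRef<T>(ref T target);
public delegate void ActionRef<T, W>(ref T target, ref W value);

// Assumed layout, for illustration only.
public struct HealthEntityStruct { public int currentHealth; }

// Hypothetical stand-in for the entity database.
public sealed class ToyEntityStore<T> where T : struct
{
    readonly T[] _entities;

    public ToyEntityStore(int capacity) { _entities = new T[capacity]; }

    // The callback works on the array slot itself, not on a copy.
    public void ExecuteOnEntity(int index, ActionRef<T> action)
    {
        action(ref _entities[index]);
    }

    // Same, but forwards an extra value so the lambda does not need to capture it.
    public void ExecuteOnEntity<W>(int index, ref W value, ActionRef<T, W> action)
    {
        action(ref _entities[index], ref value);
    }

    public T Read(int index) { return _entities[index]; }
}

public static class ExecuteOnEntityDemo
{
    public static int ApplyDamage(int startingHealth, int damage)
    {
        var store = new ToyEntityStore<HealthEntityStruct>(4);

        // Neither lambda captures a variable: everything arrives as a parameter.
        store.ExecuteOnEntity(0, ref startingHealth,
            (ref HealthEntityStruct health, ref int start) => { health.currentHealth = start; });
        store.ExecuteOnEntity(0, ref damage,
            (ref HealthEntityStruct health, ref int amount) => { health.currentHealth -= amount; });

        return store.Read(0).currentHealth;
    }
}
```

Writing `health => health.currentHealth -= damage` instead would read `damage` from the enclosing scope, which is exactly the accidental capture mentioned above.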

Alternatively the array of entity structs can be used directly, but even in this case junior coders must know what they are doing:

In both cases the goal is to be able to modify the value back inside the centralized database of Entities. For example, in the second case, if instead of using enemies[i] I had stored it in a local variable, the enemies[i] value inside the database wouldn’t change, as I would be working on a copy of it. The C# 7 return by ref would let me do something like:

and let me use enemy directly, not as a copy of the structure, but as a pointer to it.
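
The difference between working on a copy and working on a reference is easy to see in plain C# 7. The Enemy struct and the helper names below are made up for this sketch and have nothing to do with the Svelto.ECS API; only the language features (ref locals and ref returns) are real.

```csharp
using System;

public struct Enemy { public int health; }

public static class RefDemo
{
    // Works on a copy: the array element is left untouched.
    public static int DamageCopy(Enemy[] enemies, int i, int damage)
    {
        Enemy enemy = enemies[i]; // copies the struct
        enemy.health -= damage;   // modifies the copy only
        return enemies[i].health;
    }

    // C# 7 ref local: 'enemy' is an alias of the array element.
    public static int DamageInPlace(Enemy[] enemies, int i, int damage)
    {
        ref Enemy enemy = ref enemies[i];
        enemy.health -= damage;   // modifies the element inside the array
        return enemies[i].health;
    }

    // C# 7 ref return: a query that hands back the element itself.
    public static ref Enemy Find(Enemy[] enemies, int i)
    {
        return ref enemies[i];
    }
}
```

With a ref return, a query by id could give back the struct stored in the database, and the caller could change it with a plain assignment such as `RefDemo.Find(enemies, 0).health = 1;`.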

-the EGID and Entity Groups-

Svelto.ECS 2.5 boasts a new optimized centralized data structure to hold the data of all the entities. With one data structure only it’s able to hold EntityViews, EntityViewStructs and EntityStructs without causing any boxing allocation. The same data structure is able to query a set of entity data of a given type as an array or return a single one indexed by EGID.

EGID stands for EntityGlobalID and it’s a combination of the EntityID and the GroupID where the entity is defined.

In Svelto 2.5, entities are ALWAYS built inside a group. If the group is not defined, a standard exclusive group will be used. Groups help the database query system to be more flexible. It’s possible to give any meaning to a group, for example a group could be used to create an entity pooling system, where one group is used for the “disabled” entities. Another example is to define a group as “all the entities carried by another entity in game”. There are no limitations except:

  • EntityID must be unique inside a group. The same entity ID can be used across several groups.
  • An entity can be present inside only one group at a time, HOWEVER nothing keeps you from creating two different entities, with the same ID, in two different groups to be shared separately. This is actually an example where the use of implementors “could” be smart. While I said that implementors should always be used as a bridge between the platform and the engines, another possible use of implementors is to share the same data between different entities. In fact, several entities could be created using the same implementors. Without this trick, it would be hard to manage the same entity data in two different groups. It’s also possible to swap a specific entity between groups.

Lastly, it’s important to remember that the EGID of an entity does not necessarily stay the same over time. Changing the group would in fact change the value of the EGID.
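
One way to picture an EGID is as a single 64-bit value with the group ID in the high 32 bits and the entity ID in the low 32 bits. The ToyEGID below is a hypothetical sketch written under that assumption; I am not claiming it matches the internal layout of the real EGID type.

```csharp
using System;

// Hypothetical packing of (entityID, groupID) into one 64-bit value.
public struct ToyEGID
{
    readonly long _gid;

    public ToyEGID(int entityID, int groupID)
    {
        // group in the high 32 bits, entity in the low 32 bits
        _gid = ((long)groupID << 32) | (uint)entityID;
    }

    public int entityID { get { return unchecked((int)_gid); } } // low 32 bits
    public int groupID { get { return (int)(_gid >> 32); } }      // high 32 bits

    public bool Equals(ToyEGID other) { return _gid == other._gid; }
}
```

Two entities with the same entity ID in different groups get different ToyEGID values, and moving an entity to another group necessarily produces a new ToyEGID, which is why an EGID shouldn’t be cached across a group swap.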

-Use Entity Groups for pooling-

In the new survival code, I implemented an Entity Pooling system for the enemy spawning, let’s have a look:

When an enemy dies, I don’t destroy the GameObject and remove the entity from the database anymore. Instead I swap its group on death using the method:

I take the dead enemy ID and use the SwapEntityGroup to move it from the Standard Group to a special group ID I use to keep track of the disabled enemies per enemy type.

Then when I have to reuse the Entity I will take its EntityViewStructs from the disabled pool. The EntityStructs must be reset as well and that’s what I do inside the ExecuteOnEntity methods. Note that all the implementors now are only Monobehaviours and they are not garbage collected, so that fetched EntityViewStructs will still point to the implementors of the GameObject linked to the Entity.
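
The pooling flow can be sketched with a toy grouped database. Everything below (ToyGroupedDb, the group constants, the EnemyEntityStruct layout, the Spawn and Kill helpers) is hypothetical and only borrows the SwapEntityGroup name from the text; the real example also keeps one disabled group per enemy type and reuses the GameObject, which this sketch leaves out.

```csharp
using System;
using System.Collections.Generic;

// Assumed layout, for illustration only.
public struct EnemyEntityStruct { public int health; }

// Hypothetical database: entity data indexed by group first, then by entity ID.
public sealed class ToyGroupedDb<T> where T : struct
{
    readonly Dictionary<int, Dictionary<int, T>> _groups =
        new Dictionary<int, Dictionary<int, T>>();

    Dictionary<int, T> Group(int groupID)
    {
        Dictionary<int, T> group;
        if (_groups.TryGetValue(groupID, out group) == false)
            _groups[groupID] = group = new Dictionary<int, T>();
        return group;
    }

    public void Add(int entityID, int groupID, T value) { Group(groupID).Add(entityID, value); }
    public void Set(int entityID, int groupID, T value) { Group(groupID)[entityID] = value; }
    public T Get(int entityID, int groupID) { return Group(groupID)[entityID]; }
    public int Count(int groupID) { return Group(groupID).Count; }

    // Moves the entity data from one group to another without destroying it.
    public void SwapEntityGroup(int entityID, int fromGroupID, int toGroupID)
    {
        var from = Group(fromGroupID);
        T value = from[entityID];
        from.Remove(entityID);
        Group(toGroupID).Add(entityID, value);
    }

    // Returns the ID of any entity found in the group, if there is one.
    public bool TryTakeAny(int groupID, out int entityID)
    {
        foreach (var pair in Group(groupID)) { entityID = pair.Key; return true; }
        entityID = -1;
        return false;
    }
}

public static class PoolingDemo
{
    public const int ActiveGroup = 0, DisabledGroup = 1;

    // On death the entity is parked in the disabled group instead of being removed.
    public static void Kill(ToyGroupedDb<EnemyEntityStruct> db, int entityID)
    {
        db.SwapEntityGroup(entityID, ActiveGroup, DisabledGroup);
    }

    // Reuses a disabled entity when possible, otherwise builds a new one.
    public static int Spawn(ToyGroupedDb<EnemyEntityStruct> db, ref int nextID)
    {
        int entityID;
        if (db.TryTakeAny(DisabledGroup, out entityID))
        {
            db.SwapEntityGroup(entityID, DisabledGroup, ActiveGroup);
            db.Set(entityID, ActiveGroup, new EnemyEntityStruct { health = 100 }); // reset the data
        }
        else
        {
            entityID = nextID++;
            db.Add(entityID, ActiveGroup, new EnemyEntityStruct { health = 100 });
        }
        return entityID;
    }
}
```

Spawning, killing and spawning again with this toy hands back the same entity ID with its health reset, and nothing is created or destroyed in between.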

-Other important differences-

  • Previously it was possible to retrieve the complete list of EntityViews regardless of the group they belonged to. This is not possible now.
  • Previously the RemoveEntity function didn’t need to specify the group where the Entity lay. This is now needed.

A review of the communication functionalities in Svelto.ECS

The main forms of communication in Svelto are confirmed to be the Sequencer and the DispatchOnSet and DispatchOnChange methods. Currently DispatchOnSet and DispatchOnChange can only be used through component interfaces, but once C# 7 is available in Unity, maybe I will convert them into structs and let them be usable inside EntityStructs as well. DispatchOnSet and DispatchOnChange must still be seen as Data Binding systems and not as Events.

I also improved the Sequencer code a bit, so that a declaration is now even more explanatory. Remember that the main benefit of the Sequencer is to give the user the possibility to understand the flow of communication from its declaration, without being forced to check the code and reconstruct the communication flow from it.

It would be awesome if someone were so cool as to write a Unity Editor Tool to fetch all the defined Sequencers in the Game assembly and visualize them with a node graph system. I will never have the time nor the interest to do it, but that’s why Svelto is open source, so that other people can help if they wish 😉

Note: Svelto.ECS (together with the companion Svelto.Tasks) is currently compatible with .NET Standard 2.0, .NET 3.5, .NET 4.5 and Unity IL2CPP/UWP.

(*) The evolution of my reasoning can be found in these articles that I strongly suggest to read:

Learning Svelto.ECS by Example: The Unity Survival Example

Introduction

Lately I have been discussing Svelto.ECS extensively with several, more or less experienced, programmers. I gathered a lot of feedback and took a lot of notes that I will be using as a starting point for my next articles, where I will talk more about the theory and good practices. Just to give a little spoiler, I realized that the biggest obstacle that new coders face when starting to use Svelto.ECS is the shift of programming paradigm. It’s astonishing how much I have to write to explain the novel concepts introduced by Svelto.ECS compared to the small amount of code written to develop the framework. In fact, while the framework itself is very simple (and lightweight), learning how to move from the class inheritance heavy object oriented design, or even the naive Unity components based design, to the “new” modular and decoupled design that Svelto.ECS forces you to use is what usually discourages people from adopting the framework.

Since the framework is extensively used at Freejam, I also noticed that it’s thanks to my continuous availability to explain the fundamental concepts that my colleagues have less of a hard time getting into the flow. Although Svelto.ECS tries to be as rigid as possible, bad habits die hard, so users tend to abuse the little flexibility left to adapt the framework to the “old” paradigms they are comfortable with. This can result in a catastrophe due to misunderstandings or reinterpretation of the concepts that are behind the logic of the framework. That’s why I am committed to writing as many articles as possible, especially because I know that the ECS paradigm is the best solution I have found so far to write efficient and maintainable code for large projects that are refactored and reshaped multiple times over the span of years. Robocraft as well as Cardlife are the living proof of what I am trying to demonstrate.

I am not going to talk much about the theories behind the framework in this article, but I want to remind you what took me down the path of ditching the use of an IoC Container and starting to use the ECS framework exclusively: an IoC container is a very dangerous tool if used without Inversion of Control in mind. As you should have read in my previous articles, I differentiate between Inversion of Creation Control and Inversion of Flow Control. Inversion of Flow Control is basically the Hollywood principle “Don’t call us, we will call you”. This means that injected dependencies should never be used directly through public methods; in doing so you would just use an IoC container as a substitute for any other form of global injection, like singletons. However, once an IoC container is used following the IoC principle, it mainly ends up in repeatedly using the Template Pattern to inject managers used only to register the entities to manage. In a real Inversion of Flow Control context, managers are always in charge of handling entities. Does it sound like what the ECS pattern is about? It does indeed. From this reasoning I took the ECS pattern and evolved it into a framework so rigid that using it can be considered like adopting a new coding paradigm.

 

The Survival Example

Let’s start by downloading the project from https://github.com/sebas77/Svelto.ECS.Examples.Survival

Remember this? I rewrote it for you!

Open the scene Level01 and open the project in your IDE. Everything starts from the MainContext.cs file.

The Composition Root and the EnginesRoot

The class Main is the Application Composition Root. A Composition Root is where dependencies are created and injected (I talk a lot about this in my articles). A composition root belongs to the context, but a context can have more than one composition root. For example a factory is a composition root. Furthermore an application can have more than one context but this is an advanced scenario and not part of this example.

Before we start digging into the code, let’s introduce the first terms of the Svelto.ECS domain language. ECS is an acronym for Entity Component System. The ECS infrastructure has been analyzed abundantly in several articles written by many authors, but while the basic concepts are in common, the implementations differ a lot. Above all, there isn’t a standard way to solve the few problems arising from using ECS oriented code. That’s where I put most of my effort, but this is something I will talk about later or in the next articles. At the heart of the theory there are the concepts of Entity, Components (of the entities) and Systems. While I understand why the word System has been used historically, I initially found it not intuitive for the purpose, so Engine is a synonym of System and you may use the two interchangeably according to your preferences.

The EnginesRoot class is the core of Svelto.ECS. With it, it is possible to register the engines and build all the entities of the game. It doesn’t make much sense to create engines dynamically, so all the engines should be added to the EnginesRoot instance from the same composition root where it has been created. For similar reasons, an EnginesRoot instance must never be injected and engines can’t be removed once added.

We need at least one composition root to be able to create and inject dependencies wherever needed. Yes, it’s possible to have even more than one EnginesRoot per application, but this too is not part of this article, which I will try to keep as simple as possible. This is what a composition root with engines creation and dependencies injection looks like:

This code is part of the survival example, which is now well commented and follows almost all the good practice rules that I suggest using, including keeping the engines logic platform independent and testable. The comments will help you to understand most of it, but a project of this size may be already too much to swallow if you are new to Svelto. For this reason, let’s proceed as we would have done if we had started from scratch:

Entities

The first step after creating the empty Composition Root and an instance of the EnginesRoot class would be to identify the entities you want to work with first. Let’s logically start from the Player Entity. The Svelto.ECS entity must not be confused with the Unity GameObject. If you have had the chance to read other ECS related articles, you will have seen that in many of those, entities are often described as indices. This is probably the worst way possible to introduce the concept. While this is true for Svelto.ECS too, it’s well hidden. As a matter of fact I want the Svelto.ECS user to visualize, describe and identify every single entity in terms of Game Design Domain language. An entity in code must be an entity described in the game design document. Any other form of entity definition will result in a contrived way to adapt your old paradigms to the Svelto.ECS needs. Follow this fundamental rule and you won’t be wrong in most of the cases. An entity class doesn’t exist per se in code, but you still must define it in a non-abstract way.

Engines

The next step is to think about what behaviours to give to this Entity. Every behaviour is always modeled inside an Engine; there is no way to add logic in any other class inside a Svelto.ECS application. For this purpose we can start from the player character movement and define the PlayerMovementEngine class. The name of the engine must be very specific, as the more specific it is, the higher the chance that the Engine will follow the Single Responsibility Rule. Naming classes properly in Svelto.ECS is of fundamental importance. It’s not just about communicating your intentions clearly, it’s actually more about making you think about your intentions.

For the same reason it is just as important to put your engine inside a very specific namespace. If you use namespaces according to the folder structure, adapt to the Svelto.ECS convention. Using very specialized namespaces helps a lot to identify code design errors when entities are used inside incompatible namespaces. For example, you wouldn’t expect any enemy entity to be used inside a player namespace, unless you want to break the good rules related to modularity and decoupling of objects. The idea is that objects of a specific namespace can be used only inside that namespace or a parent namespace. While with Svelto.ECS it is much harder to turn your code into a fully fledged spaghetti bowl, where dependencies are injected everywhere and randomly, this rule will help you to take your code to an even better level, where dependencies are correctly abstracted between classes.

In Svelto.ECS abstraction is pushed on several fronts, but ECS intrinsically promotes abstraction of the data from the logic that must handle the data. Entities are defined by their data, not their behaviours. Engines instead are the place where to put the shared behaviours of the same entities, so that engines can always operate on a set of entities.

Svelto.ECS and the ECS paradigm in general allow coders to achieve one of the holy grails of clean programming, that is the perfect encapsulation of logic. Engines must not have public functions. The only public functions that need to exist are the ones needed to implement framework interfaces. This naturally leads to forgetting about dependency injection and avoiding the awkward code deriving from the use of dependency injection without inversion of control. Engines must NEVER be injected in any other engine or any other type of class. If you are thinking of injecting an engine, you are making a fundamental code design mistake.

Compared to Unity Monobehaviours, engines already show the first huge benefit, which is the possibility to access all the entity states of a given type from the same code scope. This means that the code can easily use the state of all the entities directly from the same place where the shared entity logic is going to run. Furthermore separate engines can handle the same entities, so that an engine can change an entity state while another engine can read it, effectively putting the two engines in communication through the same entity data. An example of this can be seen with the engines PlayerGunShootingEngine and PlayerGunShootingFxsEngine. In this case the two engines are in the same namespace, so they can share the same entity data. PlayerGunShootingEngine determines if a player target (an enemy) has been damaged and writes the lastTargetPosition of the IGunAttributesComponent (which is a component of the PlayerGunEntity). The PlayerGunShootingFxsEngine handles the graphic effects of the gun and reads the position of the currently targeted player target. This is an example of communication between engines through data polling. Later in this article I will show how to let engines communicate with each other through data pushing (or data binding). It’s quite logical that engines should (and must) never hold states.

Engines are not supposed to know how to interact with other engines. External communication happens through abstraction and Svelto.ECS solves communication between engines in three different official ways, but I will talk about this later. The best engines are the ones that do not even need to trigger any form of external communication. These engines reflect a well encapsulated behaviour and usually work through a logic loop. Loops are always modeled with a Svelto.Tasks task inside Svelto.ECS applications. Since the player movement must be updated every physics tick, it would be natural to create a task that is executed every physics update. Svelto.Tasks allows running every kind of IEnumerator on several types of schedulers. In this case we decide to create a task on the PhysicScheduler that updates the player position:

Svelto.Tasks tasks can be executed directly or through ITaskRoutine objects. I won’t talk much about Svelto.Tasks here as I wrote other articles about it. The reason I decided to use a task routine instead of running the IEnumerator implementation directly is quite arbitrary. I wanted to show that it is possible to start the loop when the player entity is added in the engine and stop it when it is removed. However in order to do so, I need to know when the entity is added and removed.

Svelto.ECS introduces the Add and Remove callbacks to know when specific entities are added or removed. This is something unique to Svelto.ECS, but it should be used wisely. I have often seen these callbacks being abused, as in many cases it is enough to query entities. Even holding an entity reference as an engine field must be seen as an exception more than a rule.

Only when these callbacks need to be exploited must the engine inherit either from SingleEntityViewEngine<EntityView> or from MultiEntitiesViewEngine<EntityView1,…,EntityViewN>. Again, the use of those should be rare and they are by no means intended to communicate what entities the engine is going to handle.

Engines more commonly implement the IQueryingEntityViewEngine interface instead. This allows them to access the entity database and retrieve data from it. Remember you can always query any entity from inside an engine, but the moment you query an entity that is not compatible with the namespace where the engine lies, you know you are already doing something wrong. Engines should never assume that the entities are available and they must work on a set of entities. The fact that the game will always have only one player shouldn’t be assumed like I am doing in the code example. A very common approach to querying entities is found in the EnemyMovementEngine:

In this case the engine main loop runs immediately on the predefined scheduler. Tick().Run() shows the shortest way to run an IEnumerator with Svelto.Tasks. The IEnumerator will keep on yielding to the next frame until at least one Enemy Target is found. Since we know that there will always be just one target (another not so nice assumption), I pick up the first one available. While the Enemy Target can be just one (although there could have been more!), the enemies are many and the engine takes care of the movement logic for all of them. In this case I am cheating, as I am actually using the Unity Nav Mesh System, so all I have to do is set the NavMesh destination. To be honest with you, I have never used the Unity NavMesh code, so I am not even sure how it works exactly; this code has just been inherited from the original Survival demo.
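
The shape of such a loop can be sketched in plain C#, without Svelto.Tasks or Unity. The EnemyData struct, the list of target positions and the EnemyMovementSketch class are all assumptions made up for this illustration; in the real engine the data comes from the entity database and the iterator is started with Tick().Run() instead of being stepped by hand.

```csharp
using System.Collections;
using System.Collections.Generic;

// Assumed layout, for illustration only: where the enemy should walk to.
public struct EnemyData { public float destination; }

public static class EnemyMovementSketch
{
    // Each MoveNext() call stands for one frame.
    public static IEnumerator Tick(List<float> targetPositions, EnemyData[] enemies)
    {
        while (true)
        {
            // keep yielding to the next frame until at least one target exists
            while (targetPositions.Count == 0)
                yield return null;

            // only one target is expected, so take the first one available
            float targetPosition = targetPositions[0];

            // one engine, many enemies: the same logic runs on the whole set
            for (int i = 0; i < enemies.Length; i++)
                enemies[i].destination = targetPosition; // written in place, not on a copy

            yield return null; // wait for the next frame
        }
    }
}
```

Stepping the iterator while the target list is empty does nothing; as soon as a target is added, the next step sets the destination of every enemy.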

Note that the component never exposes the Unity navmesh dependency directly. Entity Components, as I will say later, should always expose value types. In this case this rule also keeps the code testable, as the value type field navMeshDestination could later be implemented by a stub that doesn’t use the Unity Nav Mesh at all.
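To make the querying pattern and the value type rule more concrete, here is a minimal sketch in plain C#, with no Svelto.ECS, Svelto.Tasks or Unity dependency. All the types in it (Position, ITargetPositionComponent, IEnemyMovementComponent, the stubs and EnemyMovementSketch) are hypothetical stand-ins made up for the illustration, and the two lists simply take the place of the entity database queries. It is not the code of the Survival example.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical value type standing in for a Unity Vector3.
public struct Position
{
    public float x, y, z;
    public Position(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

// Components expose only value types; these names are illustrative.
public interface ITargetPositionComponent { Position position { get; } }
public interface IEnemyMovementComponent { Position navMeshDestination { set; } }

// Plain stubs: no Unity NavMesh is needed to exercise the logic.
public sealed class TargetStub : ITargetPositionComponent
{
    public Position position { get; set; }
}

public sealed class EnemyStub : IEnemyMovementComponent
{
    public Position navMeshDestination { get; set; }
}

public static class EnemyMovementSketch
{
    // Each step of the returned iterator corresponds to one frame.
    public static IEnumerator Tick(IList<ITargetPositionComponent> targets,
                                   IList<IEnemyMovementComponent> enemies)
    {
        while (true)
        {
            // Keep yielding to the next frame until at least one target exists.
            while (targets.Count == 0) yield return null;

            // Same questionable assumption as in the article: take the first target.
            Position destination = targets[0].position;
            for (int i = 0; i < enemies.Count; i++)
                enemies[i].navMeshDestination = destination;

            yield return null;
        }
    }
}
```

Stepping the iterator by hand with MoveNext() is enough to test the logic, which is the whole point of keeping the navmesh behind a value type property.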

To conclude the paragraph related to engines, note that there isn’t such a thing as a too small engine. Hence, don’t be afraid to write an engine even for just a few lines of code; after all, you can’t put logic anywhere else and you want your engines to follow the Single Responsibility Principle.

EntityViews

So far we introduced the concept of Engine and an abstracted definition of Entity; let’s now define what an EntityView is. I have to admit, of the 5 concepts on which Svelto.ECS is built, the EntityView is probably the most confusing. It was previously called Node, a name taken from the Ash ECS framework, but I realized that node meant nothing. EntityView may be confusing as well, since programmers usually associate views with the concept coming from the Model View Controller pattern; however, in Svelto.ECS it is called View because an EntityView is how the Engine views an Entity. I like to describe it in this way, as it seems the most natural, but I could have also called it EntityMap, as the EntityView maps the entity components that the engine must access. This scheme of the Svelto.ECS concepts should help a bit:

I suggested starting to work on the Engine first, thus we are on the right side of this scheme. Every Engine comes with its own set of EntityViews. An Engine can reuse namespace compatible EntityViews, but it’s more common for an Engine to define its own entity views. The Engine doesn’t care if a Player Entity definition actually exists; it dictates the fact that it needs a PlayerEntityView to work. The writing of the code is driven by the Engine’s needs: you shouldn’t create the entity and its fields before knowing how to use those fields. In a more complex scenario, the name of the EntityView could have been even more specific. For example, if we had to write complex engines to handle the logic of the player and the rendering of the graphics of the player (or the animation and so on), we could have had a PlayerPhysicEngine with a PlayerPhysicEntityView, then a PlayerGraphicEngine with a PlayerGraphicEntityView or even a PlayerAnimationEngine with a PlayerAnimationEntityView. Even more specific names could be used, like PlayerPhysicMovementEngine or PlayerPhysicJumpEngine (and so on).

EntityViews are classes that hold only Entity Components. Entity Components in Svelto.ECS are always interfaces that must be implemented, but the Engine and the EntityView don’t need to know the implementation. For this reason, without even implementing the component yet, I can just start to type the logic in the engine as if it were in place:

Even if PlayerEntityView is still an empty class, I start to use the fields I need. Since an EntityView can hold only components, the field must be a component interface.

Assuming you use an IDE that supports refactoring, the IDE will immediately warn you that the field you are trying to access doesn’t actually exist. This is where the refactoring tools can help you speed up the writing of the code. For example, using JetBrains Rider (but it’s the same with Visual Studio) you can create the field automatically like this:

This would add the component field in the EntityView like:

Since the IPlayerInputComponent interface doesn’t exist yet, I name it and use it on the spot. Then I use the refactoring tool again:

This will create an empty interface, so that now the code would look like:

The inputComponent field now exists, but its interface is empty, so the input field is not defined yet, even though I know I need it.

Yes, that’s right, I would use the refactoring tool again:

so that the IPlayerInputComponent interface is filled with the right properties. As long as I don’t run the code, I can build it without needing to implement the IPlayerInputComponent Entity Component interface yet. Honestly, once you get into this flow, you will notice how fast coding with Svelto.ECS can be using the IDE refactoring tools.
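As a rough idea of where these refactoring steps end up, here is a minimal, framework-free sketch. The names PlayerEntityView, inputComponent, IPlayerInputComponent and input come from the text above; everything else (the InputVector type, PlayerMovementSketch, PlayerInputStub) is an assumption made for the illustration, and the framework base type that a real EntityView needs is deliberately left out.

```csharp
// Hypothetical value type: the article does not state the type of the input field.
public struct InputVector
{
    public float x, z;
    public InputVector(float x, float z) { this.x = x; this.z = z; }
}

// The interface produced by the last refactoring step.
public interface IPlayerInputComponent
{
    InputVector input { get; set; }
}

// Simplified EntityView: it holds only component interfaces.
public class PlayerEntityView
{
    public IPlayerInputComponent inputComponent;
}

// Engine-side logic, written against the interface only.
public static class PlayerMovementSketch
{
    public static bool IsMoving(PlayerEntityView view)
    {
        InputVector input = view.inputComponent.input;
        return input.x != 0f || input.z != 0f;
    }
}

// An implementor (or a test stub) can be supplied later without touching the logic.
public sealed class PlayerInputStub : IPlayerInputComponent
{
    public InputVector input { get; set; }
}
```

Note that PlayerMovementSketch compiles before PlayerInputStub exists, which is the workflow described above: the engine dictates the shape of the component and the implementation comes last.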

EntityViews also help to keep the code modular. If engines were able to access all the entity components, you wouldn’t be pushed to think about what the engine really needs to do. You could fall into the trap of adding several responsibilities to the same engine, as you could easily access all the properties, even if they are not closely related to that engine’s behaviour.

Components

We understood that engines model behaviours for a set of entity data and that engines do not use entities directly, but use the entity components through entity views. We understood that an EntityView is a class that can hold ONLY public entity components. I also hinted that entity components are always interfaces, so let’s give a better definition now:

Entities are a set of data and entity components are the way to access that data. In case you haven’t noticed it yet, defining entity components as interfaces is another pretty unique feature of Svelto.ECS. Commonly, components in other frameworks are objects. Using interfaces instead helps dramatically to keep the code abstracted. If you follow the Interface Segregation Principle, writing small component interfaces, even with just one property each, you will notice that at a given point you will start to reuse component interfaces inside different entities. In our example ITransformComponent is reused inside many entity views. Using components as interfaces also allows them to be implemented by the same objects, which in many cases simplifies the communication between engines that see the same entity through different entity views (or the same one if possible).

Therefore, in Svelto.ECS, an entity component is always an interface and this interface is only used through the field of an EntityView inside an Engine. The Entity Component interface is then implemented with a so called Implementor. We are now starting to define the Entity itself and we are on the left side of the scheme above.

Components should always hold value types and the fields are always getter and setter properties. Exceptions may be made only to write setters and getters as methods to exploit the ref keyword when this optimization is needed. This doesn’t really mean that the code is data oriented, but it allows you to create testable code, as the Engine logic shouldn’t deal with references to external dependencies. Moreover, it prevents people from cheating and using public functions (which would include logic!) of random objects. The only reason one could feel the need to use references inside Entity Component interfaces is to deal with third party dependencies, like Unity objects. However, the Survival example shows how to deal with this case too, leaving the engine code testable without the need to be concerned with Unity dependencies.

EntityDescriptors

This is where the Entity Descriptors come in to put everything together and hopefully let everything click into place. We know that Engines can access Entity data through the Entity Components held by the Entity Views. We know that Engines are classes, that EntityViews are classes that hold only Entity Components and that Entity Components are interfaces. While I have given an abstracted definition of entity, we haven’t seen any class that actually represents an entity. This is in line with the concept of entities being just IDs inside a modern ECS framework. However, without a proper definition of Entity, coders would be led to identify Entities with EntityViews, which would be catastrophically wrong. EntityViews are the way several Engines can view the same Entity, but they are not the entities. The Entity itself should always be seen as a set of data defined through the entity components, but even this representation is weak. An EntityDescriptor gives coders the chance to name their entities properly, independently of the engines that are going to handle them. Therefore, in the case of the Player Entity, we would need a PlayerEntityDescriptor. This class will then be used to build the entity, and while what it really does is something totally different, the fact that the user is able to write BuildEntity<PlayerEntityDescriptor>() helps immensely to visualize the entities to build and to communicate the intentions to other coders.

However, what an EntityDescriptor really does is build a list of EntityViews!!! In the very early stages of the framework development I was letting the coders build this list of EntityViews manually, which was leading to very ugly code, as they couldn’t visualize anymore what an entity actually was.

This is what the PlayerEntityDescriptor looks like:

The EntityDescriptors (and the Implementors) are the only classes that can use identifiers from multiple namespaces. In this case the PlayerEntityDescriptor defines the list of EntityViews to instantiate and inject into the engines when the PlayerEntity is built.

EntityDescriptorHolder

The EntityDescriptorHolder is an extension for Unity and should be used only in specific cases. The most common one is to create a sort of polymorphism by storing the information of the entity to build on the Unity GameObject. In this way the same code can be used to build multiple types of entities. For example, in Robocraft, we use a single cube factory that builds all the cubes of the machine. The kind of cube to build is stored in the prefab of the cube itself. However, this is fine only as long as the implementors are the same between the cubes or are found on the gameobject as monobehaviours. Building entities explicitly is always preferred, so use EntityDescriptorHolders only when you have understood Svelto.ECS properly, otherwise there is the risk of abusing them. This function from the example shows how to use the class:

Note that with this example I am already using the less preferred, non-generic BuildEntity function. I will talk about it in a bit. The Implementors in this case are always monobehaviours in the gameobject. This is not a good practice either. I actually should remove this code from the example, but I left it to show you this other case. Implementors, as we will see next, should be Monobehaviours only when strictly needed!

Implementors

Before building our entity, let’s define the last concept in Svelto.ECS, that is the Implementor. As we know now, Entity Components are always interfaces and in C# interfaces must be implemented. The objects that implement those interfaces are called Implementors. Implementors have several important characteristics:

  • They decouple the number of objects to build from the number of entity components needed to define the entity data.
  • They allow data to be shared between different components: since components expose data through properties, different component properties can return the same implementor field.
  • They make it very easy to create stubs of the entity component interfaces. This is crucial for keeping the engine code testable.
  • They act as a bridge between Svelto.ECS Engines and third party platforms. This characteristic is of fundamental importance. If you need Unity to communicate with the engines you don’t need to use awkward workarounds, simply create an implementor as a Monobehaviour. In this way you can use Unity callbacks, like OnTriggerEnter/OnTriggerExit, inside the implementor and change data according to the Unity callback. No logic should be put inside these callbacks, except for setting entity component data. Here is an example:
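The characteristics listed above can be sketched without Unity as follows. This is only an illustration under stated assumptions: the three component interfaces and the EnemyImplementorSketch class are hypothetical names, and the two OnTrigger...Sketch methods merely stand in for the Unity OnTriggerEnter/OnTriggerExit callbacks that a Monobehaviour implementor would receive.

```csharp
// Small component interfaces (illustrative names, not from the Survival example).
public interface IHealthComponent { int currentHealth { get; set; } }
public interface IDestroyComponent { bool isDead { get; } }
public interface ITriggerComponent { bool targetInRange { get; } }

// One implementor object backs three components, and two of the
// component properties are derived from the same private field.
public sealed class EnemyImplementorSketch : IHealthComponent, IDestroyComponent, ITriggerComponent
{
    int _health = 100;

    public int currentHealth { get { return _health; } set { _health = value; } }
    public bool isDead { get { return _health <= 0; } }
    public bool targetInRange { get; private set; }

    // Stand-ins for Unity's OnTriggerEnter/OnTriggerExit callbacks:
    // they only set component data and contain no game logic.
    public void OnTriggerEnterSketch() { targetInRange = true; }
    public void OnTriggerExitSketch() { targetInRange = false; }
}
```

An engine would only ever see the interfaces through its EntityView, so the same object can later be replaced by a stub or by a Monobehaviour without any change on the engine side.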

Remember, the granularity of your EntityViews, entity components and implementors is completely discretionary and up to you. The more granular they are, the more likely they are to be reusable.

Build Entities

Let’s say we have created our Engines, added them to the EnginesRoot and created their EntityViews, which use Entity Components as interfaces to be implemented by one or more Implementors. It is now time to build our first entity. An Entity is always built through the Entity Factory instance generated by the EnginesRoot through the function GenerateEntityFactory. Unlike the EnginesRoot instance, an IEntityFactory instance can be injected and passed around. Entities can be built inside the Composition Root or dynamically inside game factories, so for the latter case passing the IEntityFactory by parameter is necessary.

The IEntityFactory comes with several similar-looking functions. For the purposes of this article I will skip explaining the PreallocateEntitySlots<T> and the BuildMetaEntity<T> functions to focus on the most commonly used BuildEntity<T> and BuildEntityInGroup<T>.

BuildEntityInGroup<T> should actually always be preferred, but I didn’t need it for the Survival example, so let’s see how the normal BuildEntity<T> is used inside the EnemySpawner code:

Don’t forget to read all the comments in the example; they help to clarify the Svelto.ECS concepts even more. Due to the simplicity of the example, I am actually not using BuildEntityInGroup<T>, which is instead commonly used in more sophisticated products. In Robocraft every engine that handles the logic of the functional cubes handles the logic of ALL the functional cubes of that specific type in game. However, it is often necessary to know which vehicle the cubes belong to, so using a group per machine helps to split the cubes of the same type per machine, where the machine ID is the group ID. This allows us to implement fancy things like running one Svelto.Tasks task per machine inside the same engine, which could even run in parallel using multi-threading.

This piece of code highlights one crucial issue, which I may talk more about in the next articles… from the comment (in case you haven’t read it):

Never create implementors as Monobehaviours just to hold data. Data should always be retrieved through a service layer regardless of the data source. The benefits are numerous, including the fact that changing the data source would require changing only the service code. In this simple example I am not using a Service Layer, but you can see the point. Also note that I am loading the data only once per application run, outside the main loop. You can always exploit this trick when you know that the data you need to use will never change.

Initially I was reading the data directly from the monobehaviour like a good lazy coder would have done. This forced me to create an implementor as monobehaviour just to read serialized data. It could be considered OK as long as we don’t want to abstract the data source, however serializing the information into a json file and reading it from a service request is much better than reading this kind of data from an entity component.

Every entity needs a unique ID. This ID must be unique regardless of the descriptor type and the group it belongs to. I took this decision recently, so if I say otherwise in other articles, please let me know and I will fix it.

Communication in Svelto.ECS

One problem whose solution has never been standardized by any ECS implementation is the communication between systems. This is another area I put a lot of thought into, and Svelto.ECS solves it in two novel ways. The third way is the use of the standard observer/observable pattern, which is acceptable in very specific and uncommon cases.

DispatchOnSet/DispatchOnChange

We previously saw how to let engines communicate through entity component data polling. DispatchOnSet<T> and DispatchOnChange<T> are the only references (not value type data) that can be returned by entity component properties, but the generic parameter T must be a value type. The names sound like event dispatchers, but they must instead be seen as data pushing methods, as opposed to data polling, a bit like data binding works. That is, sometimes data polling is awkward: we don’t want to poll a variable every frame when we know that the data changes rarely. DispatchOnSet<T> and DispatchOnChange<T> cannot be triggered without setting the data, which forces you to see them as a data binding mechanism instead of a simple event. There is also no trigger function to call; instead, the value of the data held by these classes must be set or changed. There aren’t great examples in the Survival code, but you can see how the targetHit boolean of the IGunHitTargetComponent works. The difference between DispatchOnSet<T> and DispatchOnChange<T> is that the latter triggers the event only when the data actually changes, while the former triggers it every time the data is set.
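The difference between the two behaviours can be illustrated with a small stand-alone sketch. The classes NotifyOnSet<T> and NotifyOnChange<T> below are hypothetical stand-ins written for this illustration; they are not the Svelto.ECS DispatchOnSet<T> and DispatchOnChange<T> classes and make no claim about their actual API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in, NOT the Svelto.ECS class: notifies on every assignment.
public class NotifyOnSet<T> where T : struct
{
    protected T _value;
    public event Action<T> Notify;

    public virtual T Value
    {
        get { return _value; }
        set { _value = value; Raise(value); }
    }

    // Events can only be invoked from the declaring class, hence this helper.
    protected void Raise(T newValue)
    {
        Action<T> handler = Notify;
        if (handler != null) handler(newValue);
    }
}

// Notifies only when the assigned value differs from the stored one.
public sealed class NotifyOnChange<T> : NotifyOnSet<T> where T : struct
{
    public override T Value
    {
        get { return _value; }
        set
        {
            if (EqualityComparer<T>.Default.Equals(_value, value)) return;
            _value = value;
            Raise(value);
        }
    }
}
```

In both cases there is no trigger function: the only way to notify the listeners is to assign the data, which is what makes the mechanism feel closer to data binding than to a generic event.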

The Sequencer

Perfect engines are totally encapsulated and you can write the logic of such an engine as a sequence of instructions using Svelto Tasks and IEnumerators. However, this is not always possible, as sometimes engines need to signal events to other engines. This is usually performed through entity data, especially using DispatchOnSet<T> and DispatchOnChange<T>; however, as in the case of entities being damaged in the example, sometimes a series of independent and uncoupled engines act on the same data. Other times you want the sequence to be authoritative on the order of the engines to call, like in the example where I want the death to always happen last. Not only is a Sequence very easy to use in this case, but it’s also very handy! Refactoring the sequence is super simple. So use IEnumerator Svelto Tasks for “vertical” engines and sequences for “horizontal” logic between engines.

Observer/Observable

I left the option to use this pattern especially for cases when legacy or non-Svelto.ECS code must communicate with Svelto.ECS engines. In all the other cases it must be used with extreme caution, as there is a high chance of abusing the pattern, since it looks familiar to coders new to Svelto.ECS, and Sequencers are usually a better choice.

Svelto.ECS and Unity

Svelto.ECS (as well as Svelto.Tasks) is designed to be platform agnostic. However, I mainly use it with Unity, so Unity extensions are provided. The EntityDescriptorHolder is an example. Using implementors as Monobehaviours lets you exploit most of the Unity callbacks, but on other platforms the reasoning could be very similar. All that said, Svelto.ECS gives you the chance to abstract from Unity and you should use Unity classes as little as possible, especially Monobehaviours. It’s also important to keep in mind that the creation of GameObject(s) is decoupled from the creation of Entities; the only thing they have in common is the fact that gameobject monobehaviours can be implementors. You can see from the example that enemy gameobjects and their ECS entities are built independently.

Logic inside utility classes

You can create static utility classes to share code. That is not a problem as long as the static classes do not hold any state (they must be just a set of static functions).
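For instance, a stateless utility class could look like the following sketch. DamageUtility and ApplyDamage are hypothetical names chosen for the illustration and do not come from the Survival example.

```csharp
// Hypothetical stateless utility: no fields, only static functions.
public static class DamageUtility
{
    // Pure function: the result depends only on the arguments.
    public static int ApplyDamage(int currentHealth, int damage)
    {
        int remaining = currentHealth - damage;
        return remaining < 0 ? 0 : remaining;
    }
}
```

Any engine can call such a function on the values it reads from its own entity views, and the result is written back through the components, so no state ever lives in the utility class.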

Closing notes: EntityStruct

I left the EntityStruct concept out of this article. As you may know, Unity is pushing their ECS framework as a mere optimization tool. As I am trying to explain in these articles, this is a mistake, not only because the ECS paradigm is actually a great way to write maintainable code, but also because, in real life, the cases where the extreme optimizations resulting from writing cache friendly code are useful are very limited. However, EntityStructs are cool and useful for multithreaded code as well, but I will need to write better examples to show their power.

 

If this article is not enough, you can also read the old ones:

Svelto ECS is now production ready

If you are new to the ECS design and you wonder why it could be useful, you should read my previous articles:

 

That’s all folks! FEEDBACK ME!

Learning Svelto Tasks by example: The million points on multiple CPU cores example [now with Unity Jobs System comparison]

[11/03/2018] : added Unity Jobs System version and updated timings. Please check at the end of the article.

With my previous article on Svelto.Tasks and multi-threaded cache friendly code, I failed to show visually the power of Svelto.Tasks because I didn’t know how to upload a huge amount of data to the GPU without stalling the main thread. Honestly, I stopped being a graphics programmer right before DX 11 was introduced, so my knowledge of modern pipelines is limited. I also thought that finding a good solution would have required writing some kind of low level workaround to overcome the Unity API limitations, but actually Unity already had what I needed; it just took some investigation and help to find it :).

To my great astonishment, using the compute buffers API it is possible to upload a huge amount of data every frame without affecting the CPU too much. I still have to understand how this works and which DX 11 function ComputeBuffer.SetData maps onto, so if you know, please leave a comment, as I need to understand it, although it is not my priority for this current demo. As a matter of fact, it was enough for me to know that uploading the vertices transformed on the CPU would not affect the final performance in a significant way.

I immediately threw together some lines of code to show off how simple and efficient it is to work with multiple threads with Svelto.Tasks and, after combining it with the new IL2CPP feature of Unity 2018, I achieved the incredible number of 1 million particles transformed on the CPU at >30fps!! If someone had told me that this was possible, to be honest, I would have had my doubts. However, let me be clear: this kind of demo is pretty lame, because it doesn’t make much sense to do this kind of operation on the CPU. This is exactly what GPUs are for. It’s more a show off than something practical, but the library, obviously, can be used for better cases.

I was also lucky enough to find a good and, most of all, simple demo on github, called MillionPoints, which does what I needed, that is transforming 1 Million points on the GPU using Compute Shaders and the Graphics.DrawMeshInstancedIndirect function. All I had to do was convert the simple compute shader code to pure C# and make it run on the CPU.

While I still don’t consider myself a multi-threaded code expert, because one never stops learning and I haven’t had the chance to use multi-threading in sophisticated algorithms yet, I dare say that I am starting to have a good understanding of all the problems involved, and consequently I designed Svelto.Tasks to be very simple to use in a multi-threaded environment, exactly like the new Unity C# job system does. Since at Freejam we hire a lot of junior programmers, I had the chance to see first hand all the problems that can arise from giving advanced tools to inexperienced hands. That’s why I had to design something very straightforward to use and, most of the time, worry free. This is what I hope to have achieved with Svelto.Tasks, improving it constantly over the years.

There are two key elements that make Svelto.Tasks powerful: the runners (or schedulers) and the continuation. Svelto.Tasks is designed to run any kind of IEnumerator (often called a task) on any of the defined runners. Svelto.Tasks already ships with a lot of Unity related Runners, but two are platform agnostic: the MultithreadRunner and the SyncRunner.

The concept of continuation (similar to await/async) is even more powerful. It allows you to start a task running on a specific scheduler from another task running on another scheduler and continue from there once the new task is finished. This is much simpler to code than to explain. Although there are similarities, .NET Tasks and Svelto.Tasks are obviously designed differently, as the latter is designed around all the problems intrinsic to game development production, including performance.
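The idea of continuation can be sketched with nothing but plain .NET. The code below does not use Svelto.Tasks at all: Task.Run stands in for "another runner", RunOnCurrentThread is a trivial hypothetical stand-in for a runner that steps an IEnumerator, and all the names are made up for the illustration.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuationSketch
{
    // An iterator task meant to be stepped by a "main thread" loop.
    public static IEnumerator MainTask(List<string> log)
    {
        log.Add("main: before");

        // Offload the work to the thread pool (standing in for another runner).
        Task<int> offloaded = Task.Run(() => SumTo(1000));

        // Yield until the offloaded work has finished.
        while (!offloaded.IsCompleted) yield return null;

        // Execution continues here, on the thread that steps the iterator.
        log.Add("main: after, result " + offloaded.Result);
    }

    static int SumTo(int n)
    {
        int sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    // Trivial stand-in for a runner: steps the iterator until it completes.
    public static void RunOnCurrentThread(IEnumerator task)
    {
        while (task.MoveNext()) Thread.Yield();
    }
}
```

The calling task reads top to bottom as if it were synchronous, while the heavy part runs elsewhere. In this sketch the waiting is done by polling, which is just the simplest way to show the flow.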

Enough talk now, so let’s dive into the details. Open the scene main.unity and click on the MillionPoints GameObject. Be sure that the MillionPointsCPU Monobehaviour is enabled and the GPU one is disabled. Run it, but don’t expect great performance in the editor, as there is a huge difference between editor and clients in this case. I will show all the differences in a bit. BTW I am currently using Unity 2018.1b3.

The goal is to perform the following cache friendly CPU instructions inside the ParticleCPUKernel MoveNext() method inner loop on 1 Million points every frame using tasks running on multiple cores:

I skip all the compute buffer initialization stuff because it is not relevant for this article and I start directly from the function StartSveltoCPUWork(). First of all, I decided to split the job into 16 threads. As you know, when it’s time for CPU intensive work, increasing the number of threads beyond the number of available cores gives only a logarithmic-like gain, so even 16 is already probably too much on my 8 core machine. Since the operations to apply to the vertices are quite straightforward, we can just have each thread operate on a specific segment of the vertices array, and this is what the PrepareParallelTasks method does.
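The segment splitting described above can be sketched in plain C# as follows. This is not the PrepareParallelTasks method of the demo: SegmentSketch, GetSegment and ProcessInParallel are hypothetical names, and raw threads are used only to keep the example free of dependencies.

```csharp
using System;
using System.Threading;

public static class SegmentSketch
{
    // Returns the [start, end) range of the array handled by a given worker.
    public static void GetSegment(int totalCount, int workers, int workerIndex,
                                  out int start, out int end)
    {
        int perWorker = totalCount / workers;
        start = workerIndex * perWorker;
        // The last worker also takes the remainder when the split is not exact.
        end = workerIndex == workers - 1 ? totalCount : start + perWorker;
    }

    // Applies an operation to every element, using one thread per segment.
    public static void ProcessInParallel(float[] data, int workers, Func<float, float> op)
    {
        var threads = new Thread[workers];
        for (int w = 0; w < workers; w++)
        {
            // Declared inside the loop, so every thread captures its own copy.
            int start, end;
            GetSegment(data.Length, workers, w, out start, out end);
            threads[w] = new Thread(() =>
            {
                for (int i = start; i < end; i++) data[i] = op(data[i]);
            });
            threads[w].Start();
        }
        foreach (Thread t in threads) t.Join();
    }
}
```

With 1,000,000 elements and 16 workers each segment holds 62,500 elements, and since the segments never overlap no locking is needed while the threads write to the array.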

The MultiThreadedParallelTaskCollection has a similar interface to the simpler ParallelTaskCollection, but it is able to run N tasks on M threads using M ParallelTaskCollections. You may have figured out how this works already. It basically creates M threads and runs one ParallelTaskCollection on each. The N tasks are split among the M ParallelTaskCollections, so it is the M ParallelTaskCollections, and not the N tasks, that truly execute in parallel. When N coincides with M, all the tasks run in parallel, like in the case of this example. In this case we initialize 16 ParallelTaskCollections running 1 task each. This task applies the ParticlesCPUKernel method instructions to 1,000,000 / 16 particles.

Remember, the MultiThreadedParallelTaskCollection is just an IEnumerator that must be run like the other tasks in Svelto.Tasks, so it doesn’t just start on its own. For the purpose of this article, I show two different ways to start the collection execution. All the following code assumes that you know well how to work with the yield instruction.

Let’s start from enabling the #Test2 define as it is the simplest scenario. In this case, everything is driven by a single loop running on the main thread.

As I said, this is the simplest flow to understand. The intention is to wait for the other threads to finish, that’s why I run the multithreaded parallel collection execution on the SyncRunner that is designed to stall the current thread. Once the execution is completed, the rest of the code will run on the main thread.

This is probably not the best solution, because the other threads could start computing the particle operations for the next frame between the end of this function and its next execution. Let’s see this visually:

With the way we are running the loop, the multithreaded collection starts during the Update phase, as the main thread loop is running on the Update Runner. While it’s running it stalls the main thread, so nothing else can be executed; then it finishes and the process continues. The red vertical line can give an idea of where the threads start and finish running. However, why should the update phase wait for the threads to finish their operations when those could have run outside the Update phase, in parallel with it?

We could invert the way we trigger the multi-threaded operations so that we compute the next values in between updates instead of inside the update phase.

There is more than one way to achieve this. Let’s see an example enabling the define #Test1

In this case we have two loops. One runs on the main thread, as we have started the MainThreadOperations task using the standard UpdateRunner, and the other runs on another thread, as OperationsRunningOnOtherThreads is started on the standard MultiThreadedRunner before the main loop starts. Note that the standard threaded runner is used just to check when the _multiParallelTask completes, as the _multiParallelTasks uses its own set of threads to run.

At this point _waitForSignal and _otherwaitForSignal are used to signal when the operations on each thread are completed. I hope this is intuitive. The main thread first waits for the other threads to finish; when this happens the draw mesh is issued and the main thread signals the other thread to start the operations for the next frame. Since the other thread can finish running before the next update, it will yield its execution to other threads until it is signalled to compute the next values.
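The handshake between the two loops can be sketched with standard .NET primitives. This is only an illustration of the signalling idea: it uses two AutoResetEvent objects instead of the Svelto.Tasks signals of the demo, and HandshakeSketch, RunFrames and the fake "frame * 10" computation are assumptions made for the example.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class HandshakeSketch
{
    // Runs a fixed number of frames and returns what the "main" loop consumed.
    public static List<int> RunFrames(int frames)
    {
        var computed = new AutoResetEvent(false); // worker -> main: values are ready
        var consumed = new AutoResetEvent(false); // main -> worker: start the next frame
        int latest = 0;
        var drawn = new List<int>();

        var worker = new Thread(() =>
        {
            for (int frame = 1; frame <= frames; frame++)
            {
                latest = frame * 10; // stand-in for the particle computation
                computed.Set();      // tell the main loop the values are ready
                consumed.WaitOne();  // sleep until asked to compute the next frame
            }
        });
        worker.Start();

        for (int frame = 1; frame <= frames; frame++)
        {
            computed.WaitOne();  // wait for the worker to finish this frame
            drawn.Add(latest);   // stand-in for issuing the draw call
            consumed.Set();      // let the worker start on the next frame
        }

        worker.Join();
        return drawn;
    }
}
```

Calling RunFrames(3) consumes the values 10, 20 and 30 in order: each side only proceeds after the other has signalled, so the shared value is never read while it is being written.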

The _breakit part is a bit awkward at the moment. It’s necessary to be sure that the threads will stop when the code is executed in the editor, as stopping the execution in the editor won’t kill the running threads like it normally happens with a standalone application (note, it has been eliminated with the latest updates).

The last scenario is an alternative possibility:

So what’s the deal here? Again two loops, although this time the one running on the main thread just keeps on issuing the draw mesh with the last updated particles. In the multithreaded loop, instead, continuation is exploited to set the freshly computed particles to the compute buffer. Since Unity will throw an exception if this is done on other threads, we have to run that task on the main thread and wait for it to finish.

The frame rate for this case is much higher, but don’t be fooled: the reason is that the main thread is not stalled this time. The particles will still be updated at more or less the same rate as in the other cases.

Before testing the performance, let me spend a few words on the cache friendly code created to compute the particle values. As you may notice, I use a lot of ref and out. This is because it’s very important to avoid copying structs when not necessary, as the copies can hit the stack hard. This is also why C# 7 introduced ref returns and ref locals (extended in C# 7.2 with in parameters and ref readonly), so that I can write simpler code to avoid copies of structs on the stack. You should always pass your Vector[N] by ref or out.
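As a small illustration of the ref and out advice, here is a sketch that does not depend on Unity. Vec3 is a hypothetical stand-in for a Unity vector struct, and RefSketch with its methods is made up for the example; it is not the kernel code of the demo.

```csharp
// Hypothetical stand-in for a Unity Vector3.
public struct Vec3
{
    public float x, y, z;
    public Vec3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
}

public static class RefSketch
{
    // By value: both arguments and the returned struct are copied.
    public static Vec3 AddByValue(Vec3 a, Vec3 b)
    {
        return new Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
    }

    // By reference: the inputs are not copied and the result is written
    // directly into the caller's variable.
    public static void Add(ref Vec3 a, ref Vec3 b, out Vec3 result)
    {
        result.x = a.x + b.x;
        result.y = a.y + b.y;
        result.z = a.z + b.z;
    }

    // Modifies the target in place through the reference.
    public static void AddInPlace(ref Vec3 target, ref Vec3 delta)
    {
        target.x += delta.x;
        target.y += delta.y;
        target.z += delta.z;
    }

    // Array elements can be passed by ref, so no element is ever copied.
    public static void Translate(Vec3[] points, ref Vec3 offset)
    {
        for (int i = 0; i < points.Length; i++)
            AddInPlace(ref points[i], ref offset);
    }
}
```

The by-value and by-reference versions compute the same result; the difference is only in how many struct copies are made per call, which adds up quickly inside a loop over a million elements.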

Let’s do some benchmarking now, using the #Test1 case:

Mono/.Net 4.6 ~20fps
IL2CPP ~48fps
UWP ~23fps
UWP .Net Native ~59fps(*)

The last test was just an experiment. I knew that UWP .Net Core code can be compiled to native code through the .Net Native toolchain, therefore I obviously had to compare it against IL2CPP. The result makes me think that a future integration in Unity for standalone platforms, if possible, would be beneficial (update: now available in Unity 2018). (*) I wasn’t able to reproduce the UWP .Net Native timings, so I may have done something wrong then. They are like the IL2CPP timings in my new tests.

And finally the project can be downloaded from github as usual: https://github.com/sebas77/Svelto.Tasks.MillionPoints

P.S.: If you notice that the Svelto.Tasks code can perform better, please tell me. I am sure there are some areas to improve, as I continuously work on it.

Update: Svelto.Tasks and Unity Jobs System comparison

Obviously I was curious to see how the Unity Jobs System compares with Svelto.Tasks. Both systems have been designed to write multi-threaded code worry free, but Svelto.Tasks relies only on the power of C#, while Unity Jobs is allegedly mainly written in native code, exploiting the internal job workers of the Unity Engine.

There isn’t any difference between threads in C# and threads in C++. Threads are handled by the operating system anyway, therefore in both languages what is implemented is just a wrapper of the underlying system. For this reason, I had some concerns about the Unity Jobs System, as we have noticed in the past how marshalling could affect the performance of C# code.

However I can confirm that, once compiled, Unity Jobs runs at the same speed as Svelto.Tasks, with very similar results. The IL2CPP version still needs some optimizations, as Svelto.Tasks is actually faster there. As IL2CPP is not affected by the marshalling issue, it’s very likely that the Unity Jobs work in IL2CPP is not completed.

Let’s start from the standard mono version first. I have updated the code in github and the Unity Jobs System version is under the folder UnityJobsKernel.

I have compiled both the “Naive” version (#define TEST2) and the Signal based version (#define TEST1) of the Svelto tasks. Both of them execute at the same speed as the Unity Jobs version. It’s important to note that the Unity Jobs version maps almost exactly to the “Naive” version of the Svelto solution.

Now you may wonder: why is the Naive version not slower than the Signal based version if it’s so naive? I will come to that later, as it actually behaves exactly as I assumed, meaning that in a real world scenario the “advanced” version could be faster than both the Naive version and the Unity Jobs System version.

Let’s see the results of the 1 Million Points simulation (Unity Jobs is not available in UWP yet, thus I couldn’t test the performance there). This time I am measuring milliseconds ranges (the lower the faster)

Platform     Svelto Tasks   Svelto Tasks Naive   Unity Jobs System
Mono         56-59          57-59                56-57
IL2CPP       24-26          24-25                29-30
UWP          45-49          45-49                n/a
UWP Native   23-24          23-24                n/a

Pretty close, right? The difference in IL2CPP could be significant, but Unity will very likely improve it. I am not sure what happened with the UWP .Net Native toolchain profiling. Somehow I am not able to reproduce the timings of the first profiling. Maybe I did something wrong then? I am not going to investigate, as the platform is not a priority for me, and in any case it would mean that IL2CPP is as good as the .Net Native toolchain.

I will not dig into the Unity Jobs details. I don’t have the time and it’s out of the scope of this article. However I will explain why it maps to the naive version of the Svelto.Tasks solution. As explained in the first part of this article, the naive version stalls the main thread until the offloaded operations are completed. Working on Svelto.Tasks I learned that the concept of main thread is honestly obsolete. I would love to work with an engine where the main thread is not a thing; after all, that’s also the mentality behind DX 12 and Vulkan. Even in the Svelto.Tasks Vanilla Example the application runs in its own thread, which is not the “main thread”.

However, code following the “naive” approach is not optimal: it must rely exclusively on the number of CPU cores and their power, more than on the fact that the code is multi-threaded. This is what I explained above when I showed the Unity frame rendering. We don’t just want to trigger a “burst” of operations; we should actually be able to run code in parallel with the rest of the Unity pipeline.

As shown, the “Advanced” and “Naive” updates have similar results, but what would happen if the main thread executed other heavy operations outside the Job update? Let’s see what happens if we add a Thread.Sleep(10) in the main update, simulating a main thread that takes 10 extra milliseconds for other operations:

Platform   Svelto Tasks   Unity Jobs System
Mono       57-62          68-70
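The difference between stalling the main thread and overlapping with it can be sketched with plain .NET tasks. This is only an illustration under my own assumptions: FrameSketch and HeavyJob are hypothetical names, and neither Svelto.Tasks nor the Unity Jobs System is used here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FrameSketch
{
    // Hypothetical stand-in for the particle update that gets offloaded.
    static long HeavyJob(int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++) sum += i % 7;
        return sum;
    }

    // "Naive": schedule, stall until the job is done, then do the other work.
    public static long NaiveFrame(int iterations, int mainThreadMs)
    {
        Task<long> job = Task.Run(() => HeavyJob(iterations));
        long result = job.Result;       // the main thread waits here
        Thread.Sleep(mainThreadMs);     // other main-thread work, run afterwards
        return result;
    }

    // "Overlapped": schedule, do the other work while the job runs, then sync.
    public static long OverlappedFrame(int iterations, int mainThreadMs)
    {
        Task<long> job = Task.Run(() => HeavyJob(iterations));
        Thread.Sleep(mainThreadMs);     // runs in parallel with the job
        return job.Result;              // sync point at the end of the frame
    }
}
```

In the naive frame the two costs add up, while in the overlapped frame the total tends towards the longer of the two. Wrapping each call in a System.Diagnostics.Stopwatch is an easy way to see this on your own machine.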

Errata Corrige:

Actually, there is a way to run the just scheduled jobs without needing to call the Complete() method. This can be done through the static ScheduleBatchedJobs() method of the JobHandle class. I changed the Unity Jobs System kernel in this way:

and now the new timings are more in line with Svelto.Tasks:

Platform   Svelto Tasks   Unity Jobs System
Mono       57-62          59-62

This is closer to a real life scenario than all the Unity Jobs demos shown so far. I understand that Unity wants to keep things simple. However I don’t think things are so hard with Svelto, so personally I will keep on using Svelto (for this and many other reasons). In the future I will integrate Unity Jobs because of the upcoming “burst” technology, but it must perform faster than IL2CPP, otherwise there is no real point in using it.

As usual, don’t quote me without testing yourself! I write these articles only when I have time, which is usually late at night, so it’s better to always double check by running my example, which you can find on github 😉

Learning Svelto.ECS by example – The Vanilla Example

It’s not simple to learn a new framework, and even less so to shift coding paradigm. This is why Svelto.ECS has been written with simplicity in mind. Even so, I realised that I ought to write more tutorials to clarify some simple concepts that are not straightforward to grasp at first. That’s why I thought to explain, step by step, what I have done with the examples bundled on github.

A word before starting: I would really appreciate it if you sent me your feedback and, once you have a good understanding of the framework, shared your knowledge with others. I am also willing to review your code when possible, if you share it publicly, for example on github. Feel free to update the Svelto.ECS wiki page and send pull requests as well. You may also think that it would be better to study the Unity ECS framework; in this case I would say that, being an open source C# framework built on top of the Unity engine, it will probably do the same things Svelto.ECS does, only in a different way. At the moment I recognize that Svelto.ECS has some unique features that cannot be found in other currently available frameworks, so it’s your choice to take advantage of them and compare them. Svelto.ECS has been engineered to be both fast and convenient to use, but also rigid enough to dictate, when possible, the right way to design your code, relieving the coder from the responsibility of taking the right decisions to write maintainable code that is easy to refactor, without penalizing performance.

All that said, let’s start from the simplest possible project! This project doesn’t even use Unity, so you can just run it with Visual Studio or compatible IDE/compiler. Just for fun I decided to use the .Net Core and .Net Standard frameworks (yes both!!) but the code will compile also on the classic .Net framework. Get it from https://github.com/sebas77/Svelto.ECS.Vanilla.Example

It’s a bit longer than I would have liked, because I eventually decided to cram in most of the features available. Still, if you remove the comments, the real lines of code aren’t many!

Let’s have a look at the lot, hoping you will not get lost (you shouldn’t)

Well, well, still there? I put the whole code in one file only to demonstrate that the boilerplate to write is minimal; remember that I am showing virtually all the features available in the framework. Now, instead of repeating what I wrote in the code comments, I will try to explain the concepts behind it. Please be sure to have understood the previous code before you continue reading.

When I develop with Svelto.ECS (always now :)), the very first task I perform is to identify conceptually the entities I want to manage. This first step is of crucial importance and I will explain why in the next articles about pitfalls and best practices. You must strive to identify an Entity inside the game design domain, using a game design language. For example, good entities are: MainCharacterEntity, CoinGUIWidgetEntity, MinimapEntity, TileEntity (if you have tiles that perform any logic) and so on. In the example, I identified the entities SimpleEntity and SimpleEntityStruct. Those are really bad names, first because they are conceptually abstract and they must not be, second because the fact that an Entity will be built in a group doesn’t actually identify a new entity, as the same entity can be built either grouped or not. Those names serve the purpose of the simplest example I could write, so forgive me for breaking the rules.

Let’s now pick the entity we want to start working with, for example the SimpleEntity. The next step is to add behaviours to this entity, and in order to do so an Engine must be created (if you have never read any article of mine before, please note that I chose the word Engine over System as I find it more meaningful). For the sake of the example I called it BehaviourForSimpleEntityEngine, but you have to name your engines according to the behaviour you want to implement. For example, good names for Engines are: WingsPhysicEngine, WheelsRenderingEngine, MachineInputEngine and so on…

In a classic ECS framework, Systems usually handle entities directly. Instead I decided to use a different approach that I first saw in the Actionscript ECS framework called Ash, introducing the concept of EntityView (previously named Node in Svelto.ECS 1.0). It’s very simple: an EntityView is how the Engine must see the Entity. It’s basically a filter that prevents the Engine from accessing all the Entity data. In this way you can focus the engine responsibility, since global data access would promote less modularity. If the Lazy Coders(*) could access ALL the entity components, nothing would keep them from adding more logic in the engine that is not really the responsibility of that engine. EntityViews promote encapsulation, modularization, single responsibility and design rigidity.

In our case the BehaviourEntityClassEngine can see and handle the SimpleEntity entities through the BehaviourEntityViewForSimpleEntity views (yeah, it’s pretty hard to name abstract concepts). Entity views can be created as classes for flexibility and as structs for speed. I will talk more about EntityViews in other articles, as their properties must be explained properly, but I don’t want to add too much information right now.

Now, be careful about this: an Engine should usually just implement the IQueryingEntityViewEngine interface, as most of the time it is enough to query the entities you want to handle through the EntityViewsDB. However Svelto.ECS can also notify an engine when entities are added to and removed from it, through the Add and Remove callbacks. If you want to enable them, you can use the SingleEntityViewEngine and MultiEntityViewsEngine pre-made classes, which help to reduce boilerplate code. You will notice that the MultiEntityViewsEngine doesn’t accept more than four EntityViews. This is because I noticed empirically that an engine handling more than four different entities is very likely to have more responsibilities than it should and must be refactored. Also note that engines should never contain state and, if you design your entity views properly, they never need to. A perfect engine is totally encapsulated and its methods are both private and static (pure encapsulated logic). Advanced scenarios will be explained in other articles.
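The idea of a view acting as a filter can be sketched in plain C#. Every name below (IPositionComponent, IHealthComponent, CharacterData, MovementView, MovementEngine) is hypothetical and invented for this illustration; this is not the Svelto.ECS API, only the concept reduced to its bare minimum.

```csharp
using System;

// All names below are hypothetical; this is not the Svelto.ECS API.
public interface IPositionComponent { float X { get; set; } }
public interface IHealthComponent   { int Health { get; set; } }

// The full entity data: one object implementing every component.
public sealed class CharacterData : IPositionComponent, IHealthComponent
{
    public float X { get; set; }
    public int Health { get; set; }
}

// The view a movement engine receives: only the position is exposed.
public sealed class MovementView
{
    public readonly IPositionComponent position;
    public MovementView(IPositionComponent position) { this.position = position; }
}

public static class MovementEngine
{
    // The engine works through the view, which does not expose Health,
    // so health logic has no natural place to creep in here.
    public static void Step(MovementView view, float delta)
    {
        view.position.X += delta;
    }
}
```

An engine written against MovementView simply has no field through which to reach the health data, which is the kind of rigidity described above.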

EntityViews group Entity components. When EntityViews are classes, they group components as interfaces. In Svelto.ECS components are always seen as interfaces; this is another unique feature and I will explain its benefits in other articles. In our simple example, the SimpleEntity is composed of just one component, the ISimpleComponent. An entity can be composed of several components and an EntityView can reference more than one entity component but, for the time being, let’s keep things simple. When Entity components are structs, they actually coincide with the components themselves. Entity as struct is an advanced concept and I don’t need to explain it now, as you would need it only in very specific scenarios or just for fun.

OK, that’s it: the code now compiles, as you have all the elements needed. However it will not run, since those components are not implemented… well, that’s right, I didn’t explain how to actually build the entities. Building entities comes last in the order of things to do.

Often you will read that in modern ECS frameworks entities are just IDs. This concept can be extremely confusing for a person who comes from an object oriented background, so ignore it for the moment. In Svelto.ECS you have a really convenient way to identify an Entity, and this is the EntityDescriptor. In the example I avoided all the boilerplate needed to create an EntityDescriptor by using the GenericEntityDescriptor and the MixedEntityDescriptor pre-made classes. Use them whenever possible.

The EntityDescriptor links Entity Components to the relative EntityViews and/or EntityStructs. If EntityViews as classes are used, we just need a way to implement those components as interfaces. Finally, this is what the Implementors are for! Implementor is a very powerful concept, but I don’t need to explain it in detail now. After all, from the example, it should be simple to understand how to use them.

OK This should be enough to give you an introduction to the framework! Is something not clear? Please leave a comment here, I will try to help!

In summary:

  • Always define your Entities conceptually first
  • Start to write Engines and identify the EntityViews that these engines need. EntityViews define how the Engine must see the entities it must handle, so EntityViews are named after the Engine and the Entity and must be created together with the Engine and in the Engine namespace.
  • Define the Entity Components the engine must access while you code the Engine behaviours; use a refactoring tool to easily generate the interfaces you need (I will explain some tricks in other articles)
  • Create the EntityDescriptors that link the Entities to the EntityViews
  • Finally implement the Entity Components as interfaces through Implementors and build your entities. The EntityView instances will be automatically generated and inserted in the EntityViewsDB by the framework.

(*) Lazy coders are the best, but if they lack discipline they could lead to catastrophes 🙂

if you are new to the ECS design and you wonder why it could be useful, you should read my previous articles:

Check also the other ECS articles:

Learning Svelto.ECS by Example: The Survival Example

and for reference:

Svelto ECS is now production ready

Svelto.ECS 2.0 is production ready

This is an introductory post to Svelto.ECS 2.0. Two more posts will follow, one explaining the examples line by line and another explaining all the Svelto.ECS concepts in a simpler fashion than before. Therefore this article is written for those who already know Svelto.ECS.

Svelto.ECS 2.0 is (at the time of writing) available in alpha stage (pick the alpha branch to have a look now) and, compared to Svelto.ECS 1.0, introduces new features, new optimizations and unluckily some breaking changes (the main reason why I decided to bump the major version).

I have also updated the Svelto.ECS.Examples code (available on the alpha branch at the time of writing) and added a new “vanilla” example, which I consider the minimum amount of code needed to show almost all the framework features available without using Unity. The new example is, in fact, targeting .Net Core and .Net Standard.

I will quickly go through the list of new features:

  • New terminologies have been introduced. If you read the previous articles or used Svelto.ECS 1.0, you must know that the term “Node” has been abolished as it was meaningless in the framework context and has been renamed to “EntityView“. I will explain why it makes more sense with the next articles.
  • The GenericEntityDescriptor has been extensively used in all the examples to show how to reduce the boilerplate code with minimum effort.
  • EntityDescriptors are now pure static classes and must not be instantiated. The EntityDescriptor must conceptually identify your entity. The future articles will explain why it’s a big design mistake to identify Entities as EntityViews instead. An Entity is now built with the new signature entityFactory.BuildEntity<SimpleEntityDescriptor>(entityID, implementors). In any case you should rarely need to inherit from EntityDescriptor, as most of the time you must inherit from GenericEntityDescriptor and MixedEntityDescriptor.
  • EntityDescriptors can now refer, using the same signature, to both class based EntityViews and struct based Entities, so that entityFactory.BuildEntity<SimpleStructEntityDescriptor>(entityID); will actually build struct based entities. Previously the struct based entities were built in a more awkward fashion. While I deprecated the previous method to build struct entities, changes may be introduced later to support thread safe code, as currently the Svelto.ECS data structures are not thread safe (custom structures or strategies must be adopted to support thread safe engines). In the next articles I will explain when and how to use EntityViews and Entities as structs. EntityViews should be used for flexibility and Entities as structs for speed. An EntityDescriptor must inherit from the new MixedEntityDescriptor generic class to build entities made out of Entity structs and/or EntityViews.
  • Another very important feature is the possibility to create entities inside buckets (groups). This makes it possible to query a set of entities from a specific group. This is a feature that was absolutely needed in our products (and very likely in yours too). For example we can now easily query all the wings for each machine in Robocraft. This means that we could potentially create a Svelto.Tasks taskroutine for each machine inside the WingsPhysicEngine and stop a specific task if all the wings of a specific machine are shot off. Previously there was no way to split wings per machine unless custom data structures were used inside the engines. The new function has the signature: entityFactory.BuildEntityInGroup<SimpleStructEntityDescriptor>(entityID, groupID) and can be used both for struct Entities and class EntityViews.
  • Entities can also be moved between groups. This was a last minute idea and I need to experiment more with it.
  • EnginesRoots must never be hard referenced. Previously I made two mistakes: first hard referencing it inside the SubmissionNodeScheduler (now called SubmisisonEntityViewScheduler) and second letting it be passed around through the IEntityFactory interface. Now an IEntityFactory wrapper must be generated from the EngineRoot.
  • An IRemoveComponent cannot be implemented with a custom implementor any more. Just use it and don’t worry about passing a RemoveImplementor among the other implementors (it will be ignored, as it is created by the framework now).
  • The framework is now more rigid and outputs more warnings in case of misuse.
  • All the previous framework engines, except for the SingleEntityViewEngine and MultiEntityViewsEngine, have been deprecated. An engine can inherit from either of them and/or implement IQueryingEntityViewEngine (which is an IEngine too now)
  • I am adding comments all over the code, but I will add more while I write the new articles to come.
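The grouping and moving features listed above can be pictured with a tiny store keyed by group. This is only a conceptual sketch under my own assumptions: GroupedStore and its methods are hypothetical names, and Svelto.ECS keeps its own internal data structures, which are not reproduced here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical storage, only to illustrate grouping; not the Svelto.ECS API.
public sealed class GroupedStore<T>
{
    readonly Dictionary<int, Dictionary<int, T>> _groups =
        new Dictionary<int, Dictionary<int, T>>();

    // Store an entity under (groupID, entityID).
    public void AddToGroup(int groupID, int entityID, T entity)
    {
        Dictionary<int, T> group;
        if (!_groups.TryGetValue(groupID, out group))
        {
            group = new Dictionary<int, T>();
            _groups[groupID] = group;
        }
        group.Add(entityID, entity);
    }

    // Return every entity of one group (empty if the group is unknown).
    public IEnumerable<T> QueryGroup(int groupID)
    {
        Dictionary<int, T> group;
        return _groups.TryGetValue(groupID, out group)
            ? (IEnumerable<T>)group.Values
            : new T[0];
    }

    // Move an entity from one group to another; false if it was not found.
    public bool Move(int entityID, int fromGroup, int toGroup)
    {
        Dictionary<int, T> source;
        T entity;
        if (!_groups.TryGetValue(fromGroup, out source) ||
            !source.TryGetValue(entityID, out entity))
            return false;
        source.Remove(entityID);
        AddToGroup(toGroup, entityID, entity);
        return true;
    }
}
```

With a layout like this, "all the wings of machine 7" is a single QueryGroup(7) call, and parking an entity in a dedicated "disabled" group is just a Move.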

then optimizations:

  • Building an entity can be up to 3.3 times faster than in Svelto.ECS 1.0. However remember that entities should be built sporadically or never during the execution of the gameplay. Entities should always be prebuilt and enabled/disabled when needed. This is even more effective than pooling them, so design them properly. Below is a table showing how long it takes to build 256k entities as classes with Unity 2017.3 (building entities as structs is an order of magnitude faster).
    Version   Platform           Time (ms)
    1.0       .net 2.0           903
    2.0       .net 2.0           344
    2.0       .net 4.6           260(!)
    2.0       .net Core UWP      203
    2.0       .net Core IL2CPP   288

    If you wonder why IL2CPP is a bit slower, it is because I cannot dynamically create code that avoids the Reflection functions; however the C++ implementation of reflection is much faster, so there is not much trouble there.

  • Thanks to the new feature to move entities between groups, you can create a group to store the disabled entities, so that you can retrieve a disabled entity yourself to be reused later.
  • Several optimizations here and there, especially to reduce allocations.

and breaking changes (at least the ones I took note of):

  • The BuildEntity signature is totally different and will need to be explained in another article.
  • IRemoveComponent was breaking a (not yet enforced) rule of Svelto.ECS, which is that a component cannot hold references to other classes (except Unity ones at the moment). This is because a component must be seen as a data container and cannot be used to call external functions. Thus, the method to remove an entity has changed to _entityFunctions.RemoveEntity<SimpleEntityDescriptor>(entityView.ID);
  • Mass Renames (hope I didn’t get any wrong, I will fix later in case):
    • rename IQueryableNodeEngine into IQueryingEntityViewEngine
    • rename IEngineNodeDB into IEngineEntityViewDB
    •  rename NodeWithID into EntityView
    •  .QueryNode< to .QueryEntityView<
    • .QueryNodes< to .QueryEntityViews<
    • .TryQueryNode( to .TryQueryEntityView(
    • all the nodesDB must be renamed to entityViewsDB
    •  MultiNodesEngine to MultiEntityViewsEngine
    • SingleNodeEngine to SingleEntityViewEngine
    • UnitySumbmissionNodeScheduler is UnitySumbmissionEntityViewScheduler
    • INodeBuilder to IEntityViewBuilder (you should use the GenericEntityDescriptor/MixedEntityDescriptor though)
    • NodeBuilder to EntityViewBuilder  (you should use the GenericEntityDescriptor/MixedEntityDescriptor though)
  •  ICallBackOnAddEngine doesn’t exist anymore and it has been merged into the IQueryingEntityViewEngine interface. This means that the Ready function must be always implemented in these cases.

N.B.: As long as Svelto.ECS 2.0 stays in alpha, do not use it in a production environment. It still needs to be heavily tested and I will need to write some unit tests too. However you are invited to start experimenting with it and leave some feedback.

How Svelto.ECS + Svelto.Task help writing Data Oriented, Cache Friendly, Multi-Threaded code

Note: this article assumes that the reader knows how to use Svelto.ECS and Svelto.Tasks although the findings are interesting regardless.

Introduction

New exciting features are coming to the Svelto.ECS and Svelto.Tasks libraries. As I am currently focused on optimizing Robocraft for Xbox One, I added several functionalities that can help make our game faster. Therefore I decided to write a simple example to show some of them. I have to be honest, it’s very hard to think of a simple example that makes sense. Every game has its own needs and what makes sense for one may not be that meaningful for others. However everything boils down to the fact that I added features to exploit data locality (CPU cache) and to easily write multi-threaded code. I have already discussed how to use the Svelto.Tasks multi-threaded ITaskRoutine, so I hope I can now show how to use it with Svelto.ECS. Spoiler alert: this article alone won’t be sufficient to show you the potential of the library, as being more thorough would make this post too long and complex, therefore I encourage you to experiment on your own or wait for my next articles (which come every six months or so :P). This article also assumes that you know the basic concepts behind Svelto.ECS and Svelto.Tasks.

Initially I wanted to write an example that would mimic the boids demo shown at the Unite roadmap talk, which you can briefly see in this Unity blog post. I soon decided to stop going down that route, because it’s obvious that Joachim took advantage of the new thread safe transform class or even wrote a new rendering pipeline on purpose for this demo, as the standard Transform class and even the GameObject based pipeline would be the biggest bottleneck, impossible to parallelize (Note: I eventually found a workaround to this problem to show visually the power of the multi core Svelto.Tasks approach). Since I believe that massive parallelism makes more sense on the GPU and that multi-threading should be exploited on the CPU in a different way, I decided not to invest more time in it. However, as I had some interesting results, I will use what is left of this useless example to show you what potential gain in milliseconds you may get using Svelto.ECS and Svelto.Tasks. I will eventually discuss how this approach can potentially be used in a real life scenario too.

Svelto.ECS and Svelto.Tasks have a symbiotic relationship. Svelto.ECS lets you naturally encapsulate logic running on several entities and treat each engine flow as a separate “fiber“. Svelto.Tasks lets you run these independent “fibers” asynchronously, even on other threads.

GPUs work in a totally different fashion than CPUs, and operating systems take on the burden of handling how processes share the few cores usually available. While on a GPU we talk about hundreds of cores, on a CPU we usually have only 2-12 cores that have to run thousands of threads. However each core can run only one thread at a time and it’s thanks to the OS scheduling system that all these threads can actually share the CPU power. The more threads run, the harder it is for the OS to decide which threads to run and when. That’s why massive parallelism doesn’t make much sense on the CPU. At full capacity, you can’t physically run more threads than cores, thus multi-threading on the CPU is not meant for intensive operations.

The Example

You can now proceed to download the new Svelto.ECS cache example and open the scene under the Svelto-ECS-Parallelism-Example folder. Since I removed everything from the original demo I was building, I can focus on the code instructions only, and that’s why this is the simplest example I have made so far, as it has only one Engine and one Node. The demonstration will show four different levels of optimization, using different techniques, and how it is possible to make the same instructions run several times faster. In order to show the optimization steps, I decided to use some ugly defines (sorry, I can’t really spend too much time on these exercises), therefore we need to stop the execution of the demo in the editor and change the define symbols every time we want to proceed to the next step. Alternatively you can build a separate executable for each combination of defines, so you won’t have to worry about the editor overhead during the profiling. The first set of defines will be: FIRST_TIER_EXAMPLE;PROFILER.

Let’s open and analyse the code. As usual you can find our standard Context and relative composition root (MainContextParallel and ParallelCompositionRoot). Inside the context initialization, a BoidEngine is created (I didn’t bother renaming the classes) as well as 256,000 entities. Seems a lot, but we are going to run very simple operations on them, nothing to do with a real flock algorithm. Actually the instructions that run are basically random and totally meaningless. Feel free to change them as you wish.

The operations that must be executed are written inside the BoidEnumerator class. While the code example is ugly, I wrote it to be allocation free to be as fast as possible. A preallocated IEnumerator can be reused for each frame. This enumerator doesn’t yield at all, running synchronously, and operates on a set of entities that must be defined beforehand. The set of operations to run, as already said, are totally meaningless and written just to show how much time very simple operations can take to run on a large set of entities.
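As a rough picture of what such an enumerator can look like, here is a minimal sketch. BatchEnumerator is a hypothetical class written for this illustration, not the BoidEnumerator from the repository: it is allocated once, works on a fixed range and never yields.

```csharp
using System;
using System.Collections;

// Hypothetical names; a sketch of an allocation-free, reusable enumerator.
public sealed class BatchEnumerator : IEnumerator
{
    readonly float[] _values;
    readonly int _start, _end;

    // The range of entities to process is defined beforehand.
    public BatchEnumerator(float[] values, int start, int end)
    {
        _values = values; _start = start; _end = end;
    }

    public object Current { get { return null; } }

    // Processes the whole range in one call and never yields:
    // returning false tells the caller the work is already complete.
    public bool MoveNext()
    {
        for (int i = _start; i < _end; i++)
            _values[i] += 1f;
        return false;
    }

    // Nothing to rewind: the range is fixed at construction time.
    public void Reset() { }
}
```

Because the instance holds no per-run state, the same object can be handed to the scheduler every frame without producing garbage.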

The main loop logic, enclosed in the Update enumerator, will run until the execution is stopped. It’s running inside a standard Update Scheduler because I am assuming that the result of the operations must be available every frame. In the way it’s designed, the main thread cannot be faster than the execution of the operations even if they run on other threads. This is very easily achievable with Svelto.Tasks.

The Naive Code (Editor 113ms, client 58ms)

So, assuming you didn’t get too confused by the things happening under the hood, running the example will show you that the operations take several milliseconds to execute (use the stats window if you use the editor). In my case, on my i7 CPU, running the FIRST_TIER_EXAMPLE operations takes around 113ms.

This is how the code looks at the moment:

Let’s have a look at the MSIL code generated:

This set of operations runs 4 times for each entity. As I stated before, they are meaningless and random; it’s just a set of common and simple operations to run.

The Callless Code (Editor 57ms, Client 23ms)

Let’s now switch to SECOND_TIER_EXAMPLE, replacing FIRST_TIER_EXAMPLE in the editor defines, and let the code recompile. Exactly the same operations, but with different instructions, now take around 65ms (client 24ms)… What changed? I simply instructed the compiler not to use the Vector3 methods, running the operations directly on the components.

The code looks like:

It appears to me that the main difference is the number of call/callvirt instructions executed, each of which obviously involves several extra operations. Every call involves saving the current registers and some variables on the stack, and callvirt involves an extra pointer dereference to get the address of the function to call. More importantly, we avoid the copies of the Vector3 struct caused by the method results being passed through the stack.

Let’s quickly switch to THIRD_TIER_EXAMPLE. The code now runs at around 57ms (client 23ms, almost no difference, interesting), but the code didn’t change at all. What changed then? I simply exploited the FasterList method that returns the array used by the list. This removed another callvirt, as the operator[] of a collection like FasterList (the same goes for the standard List) is actually a method call. An array is instead known directly by the compiler as a contiguous portion of allocated memory, therefore the compiler knows how to treat it efficiently.
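Since the kind of change involved is easier to see than to describe, here is a minimal sketch of the two styles. Float3 and Integrator are hypothetical names (Float3 stands in for Vector3), and the arithmetic is made up for the illustration rather than taken from the example project.

```csharp
using System;

// Hypothetical vector with overloaded operators, similar in spirit to Vector3.
public struct Float3
{
    public float x, y, z;
    public Float3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    // Each operator is a method call that takes and returns copies.
    public static Float3 operator +(Float3 a, Float3 b)
    {
        return new Float3(a.x + b.x, a.y + b.y, a.z + b.z);
    }

    public static Float3 operator *(Float3 a, float s)
    {
        return new Float3(a.x * s, a.y * s, a.z * s);
    }
}

public static class Integrator
{
    // Method-based version: two operator calls per element.
    public static void StepWithOperators(Float3[] pos, Float3[] vel, float dt)
    {
        for (int i = 0; i < pos.Length; i++)
            pos[i] = pos[i] + vel[i] * dt;
    }

    // Call-free version: the same arithmetic written on the fields directly.
    public static void StepOnComponents(Float3[] pos, Float3[] vel, float dt)
    {
        for (int i = 0; i < pos.Length; i++)
        {
            pos[i].x += vel[i].x * dt;
            pos[i].y += vel[i].y * dt;
            pos[i].z += vel[i].z * dt;
        }
    }
}
```

Both methods compute the same values. How much the second one gains depends on the runtime, since a JIT that inlines the operators can close most of the gap, so the timings above should be re-measured on your own target.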
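The list-versus-array difference can be sketched as follows. SimpleList and GetBuffer are hypothetical names invented for this illustration; the real FasterList is not reproduced here and its method may have a different name and signature.

```csharp
using System;

// Hypothetical minimal list; FasterList itself is not reproduced here.
public sealed class SimpleList<T>
{
    T[] _buffer = new T[4];
    int _count;

    public int Count { get { return _count; } }

    public void Add(T item)
    {
        if (_count == _buffer.Length)
            Array.Resize(ref _buffer, _buffer.Length * 2);
        _buffer[_count++] = item;
    }

    // The indexer is a method call on every access.
    public T this[int index] { get { return _buffer[index]; } }

    // Hands out the backing array; only the first 'count' items are valid.
    public T[] GetBuffer(out int count)
    {
        count = _count;
        return _buffer;
    }
}

public static class Summer
{
    // One indexer call (plus one Count call) per iteration.
    public static int SumViaIndexer(SimpleList<int> list)
    {
        int sum = 0;
        for (int i = 0; i < list.Count; i++) sum += list[i];
        return sum;
    }

    // One call up front, then plain array accesses inside the loop.
    public static int SumViaArray(SimpleList<int> list)
    {
        int count;
        int[] items = list.GetBuffer(out count);
        int sum = 0;
        for (int i = 0; i < count; i++) sum += items[i];
        return sum;
    }
}
```

The price of exposing the buffer is that the caller must respect count, because the array is usually larger than the number of valid items.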

The Cache Friendly Code (Editor 24ms, Client 16ms)

Have you heard how ECS design can easily enable data oriented code? This is what the FOURTH_TIER_EXAMPLE define is about. The code now runs at around 24ms, another big jump, but again it seems that the instructions didn’t change much. Let’s see them:

It’s true that the code is now call free, but would this justify the big jump? The answer is no. Just removing the last call left would have saved only around 10ms, while the procedure is now more than 30ms faster. The big saving must be found in the new EntityViewStruct feature that FOURTH_TIER_EXAMPLE enables. Svelto.ECS now allows building EntityViews as classes and as structs. When they are built as structs, you can exploit the full power of writing cache friendly data oriented code.

If you don’t know how data is laid out in memory, well, stop here and study how data structures work. If you know C++ you have an advantage, because you know how pointers work; otherwise, you must know that you can write contiguous data in memory only with arrays of value types, including structs, assuming that these structs contain only value types! Svelto.ECS now allows storing EntityViews both as classes and as structs, but when structs are used, the EntityViews are always stored as a simple array of structs, so that you can achieve the fastest result, depending on how you design your structs (Structure of Arrays or Array of Structures, it’s up to you!)

To put it simply, when you use references, the memory area pointed to by a reference is very likely nowhere near where the reference itself is stored. This means that the CPU cannot exploit the cache to read contiguous data. Arrays of value types (which can be structs containing value types) are instead always stored contiguously.
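The two layouts mentioned above can be sketched like this. Particle, ParticleColumns and Layouts are hypothetical names with illustrative fields; they are not taken from the example project.

```csharp
using System;

// Hypothetical layouts; the field names are illustrative only.

// Array of Structures: each element keeps its fields side by side.
public struct Particle
{
    public float x, y, z;
    public float mass;
}

// Structure of Arrays: each field gets its own contiguous array.
public sealed class ParticleColumns
{
    public readonly float[] x, y, z, mass;

    public ParticleColumns(int count)
    {
        x = new float[count]; y = new float[count];
        z = new float[count]; mass = new float[count];
    }
}

public static class Layouts
{
    // AoS: walks one array; the memory fetched for x also carries y, z and mass.
    public static float SumX(Particle[] particles)
    {
        float sum = 0f;
        for (int i = 0; i < particles.Length; i++) sum += particles[i].x;
        return sum;
    }

    // SoA: walks only the x column, so everything fetched is used by this loop.
    public static float SumX(ParticleColumns columns)
    {
        float sum = 0f;
        for (int i = 0; i < columns.x.Length; i++) sum += columns.x[i];
        return sum;
    }
}
```

Which layout wins depends on the access pattern: AoS suits loops that use most fields of each element together, while SoA suits loops that touch one or two fields across many elements.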

Cache is what makes the procedure much faster. Nowadays CPUs can read a continuous block of data up to 64 bytes in one single memory access from the L1 cache (which takes around 4 cycles of clock). The L1 cache is efficiently filled by the CPU on data access, caching 32KB of consecutive data. As long as the data you need to access is in those 32KB of cache, the instructions will run much faster. Missing data from the cache will initialize a very slow RAM memory access, which can be hundreds of cycles slow! There are many articles around talking about modern CPU architecture and how data oriented code takes full advantage of it, so I won’t spend more time, but I will link some resources once I have the time, so check the end of the article for new edit every now and then.

Caching is weird though. For example, it could come natural to store the whole EntityViewStruct in a local variable. While this seems a harmless instruction, in a data-oriented context it can be a disaster! How does removing the copy of the struct to a local variable make the code faster? Well, this is where things get very tricky. The reason is that the contiguous amount of data you need to read must not be too big and must be readable in one memory access (32 to 64 bytes, depending on the CPU).

Since our BoidNode struct is quite small, caching the variable here would actually have made the execution just slightly slower. However, if you make the BoidNode struct larger (try adding another four Vector3 fields and caching the whole structure in a local variable), you will wreck the processor and the execution will become much slower! (edit: this problem could be solved with the new ref feature of C# 7 which, unluckily, is still not supported by Unity)

Accessing the single component directly, instead, avoids the struct-wide read caused by the copy to a local variable, and since x, y and z are read and cached at once, these instructions will run at the same speed regardless of the size of the struct. Alternatively you can cache just the position Vector3 in a local variable, which won’t make it faster, but it will still run fast regardless of the size of the struct.
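The three access patterns discussed here can be put side by side in a small sketch. `Vec3`, `BigBoid` and `AccessSketch` are hypothetical stand-ins for Vector3 and an enlarged BoidNode, and the `ref` local in the last method assumes a compiler that supports C# 7. How much the copy really costs depends on the JIT, so treat this as an illustration of the shape of the code rather than as a benchmark.

```csharp
// Hypothetical stand-in for Vector3: 12 bytes.
public struct Vec3
{
    public float x, y, z;
}

// Hypothetical enlarged boid: position plus four extra vectors, 60 bytes in total.
public struct BigBoid
{
    public Vec3 position;
    public Vec3 a, b, c, d;
}

public static class AccessSketch
{
    // Copies the whole 60-byte struct to a local before reading 12 bytes of it.
    public static float SumWithCopy(BigBoid[] boids)
    {
        float sum = 0f;
        for (int i = 0; i < boids.Length; i++)
        {
            BigBoid boid = boids[i];
            sum += boid.position.x + boid.position.y + boid.position.z;
        }
        return sum;
    }

    // Reads only the position fields, straight from the array element.
    public static float SumDirect(BigBoid[] boids)
    {
        float sum = 0f;
        for (int i = 0; i < boids.Length; i++)
            sum += boids[i].position.x + boids[i].position.y + boids[i].position.z;
        return sum;
    }

    // C# 7 ref local: an alias to the array element, no copy is made.
    public static float SumWithRef(BigBoid[] boids)
    {
        float sum = 0f;
        for (int i = 0; i < boids.Length; i++)
        {
            ref BigBoid boid = ref boids[i];
            sum += boid.position.x + boid.position.y + boid.position.z;
        }
        return sum;
    }
}
```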

The Multi Threaded Cache Friendly Code (Editor 8ms, Client 3ms)

To conclude, let’s keep the FOURTH_TIER_EXAMPLE define, but add a new one, called TURBO_EXAMPLE. The code now runs at around 8ms. This is because the new MultiThreadedParallelTaskCollection Svelto.Tasks feature is now enabled. The operations, instead of running on one thread, are split and run on 8 threads. As you have already figured out, 8 threads doesn’t mean 8 times faster (unless you actually have 8 cores :)) and this is the reality. Splitting the operations over multiple threads doesn’t only give sub-linear gains, but also diminishing returns: the more threads you add, the smaller the gain, until increasing the number of threads won’t make any difference. This is due to what I was explaining earlier. Multi-threading is no magic. Physically your CPU cannot run more threads than its cores, and this is true for all the threads of all the processes running at the same time on your operating system. That’s why CPU multi-threading makes more sense for asynchronous operations that can run over time or operations that involve waiting for external sources (sockets, disks and so on), so that the thread can pause until the data is received, leaving room for other threads to run in the meantime. Memory access is also a big bottleneck, especially when cache misses are involved (the CPU may even decide to switch to other threads while waiting for the memory to be read; this is what Hyperthreading actually does).

This is how I use the MultiThreadedParallelTaskCollection in the example:

This collection allows you to add enumerators to be executed later on multiple threads. The number of threads to activate is passed to the constructor. It’s interesting to note that the number of enumerators to run is independent of the number of threads, although in this case they are mapped 1 to 1. The MultiThreadedParallelTaskCollection, being an IEnumerator, can be scheduled on whatever runner, but the sets of tasks will always run on their own, pre-activated, threads.
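The general idea of mapping one slice of the work to each thread can be sketched with plain .NET threads. This is not the Svelto.Tasks API: `SplitSketch` and `RunSplit` are hypothetical names, and for simplicity the sketch blocks the caller until every slice has finished.

```csharp
using System;
using System.Threading;

public static class SplitSketch
{
    // Splits the range [0, count) into threadCount contiguous slices,
    // runs body(start, end) for each slice on its own thread and
    // waits for all of them to finish. threadCount must be at least 1.
    public static void RunSplit(int count, int threadCount, Action<int, int> body)
    {
        var threads = new Thread[threadCount];
        int slice = (count + threadCount - 1) / threadCount; // ceiling division

        for (int t = 0; t < threadCount; t++)
        {
            // Declared inside the loop so each lambda captures its own pair.
            int start = Math.Min(t * slice, count);
            int end = Math.Min(start + slice, count);
            threads[t] = new Thread(() => body(start, end));
            threads[t].Start();
        }

        foreach (Thread thread in threads)
            thread.Join();
    }
}
```

Each slice touches its own part of the data, so the threads need no locking as long as the body only writes inside the `[start, end)` range it is given.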

The way I am using threading in this example is not the way it should be used in real life. First of all, I am actually blocking the main thread to wait for the other threads to finish, so that I can measure how long the threads take to finish the operations. In a real life scenario, the code shouldn’t be written to wait for the threads to finish. For example, handling AI could run independently of the frame rate. I am thinking about several ways to manage synchronization so that it will be possible not just to exploit continuation between threads, but even to run tasks on different threads at the same time and synchronize them. WaitForSignalEnumerator is an example of what I have in mind, more will come.

All right, we are at the end of the article now. I need to repeat myself here: this article doesn’t really show what it is possible to do with Svelto.ECS and Svelto.Tasks in their entirety; this is just a glimpse of the possibilities opened by this infrastructure. Also, the purpose of this article wasn’t to show the level of complexity that it is possible to handle easily with the Svelto framework, but just to show you how important it is to know how to optimize our code. The most important optimizations are first the algorithmic ones, then the data structure related ones and finally the ones at instruction level. Multi-threading is not about optimization, but about being able to actually exploit the power of the CPU used. I also tried to highlight that CPU threading is not about massive parallelism and that GPUs should be used for this purpose instead.

The Importance of the Compiler (Client 1.5ms!!)

I started to wonder: what if I had the chance to use a good C++ compiler to see if it could do a better job with auto-vectorization? After all, we are aware of the fact that the JIT compiler can’t do miracles in this sense. Since IL2CPP is not available for the Windows platform, I compiled the same code for UWP, using the IL2CPP option. I guess the results are pretty clear: IL2CPP produces an executable that is twice as fast as the C# version. Since there isn’t any call to native code, it can’t be a benefit due to the lack of marshalling. I haven’t had the time to verify yet what the real difference is, but auto-vectorization may be the reason.

Conclusions

While this article is implicitly (and strongly) advising you to follow the Entity-Component-System pattern to centralize and modularize the logic of your game entities, and gives another example of Svelto.Tasks usage to easily exploit multithreading, its main goal is to show you why it is necessary to pay attention to how the code is written even if we talk about C# and Unity. While the effectiveness of these extreme optimizations largely depends on the context, understanding how the compiler generates the final code is necessary to avoid degrading the performance of large projects.

Having a light knowledge of modern CPU architectures (like I do) helps to understand how to better exploit the CPU cache and optimize memory accesses. Multi-threading really needs more separate articles, but I can now state with certainty that exploiting the full power of the CPU with Unity is more than feasible, although the biggest bottleneck will remain the rendering pipeline itself.

To be honest, most of the Unity optimization issues are related to the marshalling that the Unity framework performs when it is time to call the Unity native functions. Unluckily these functions are many and are called very often, which made them the biggest issue we found during our optimizations. GameObject hierarchy complexity, calling Transform functions multiple times and similar problems can really kill the performance of your product. I may talk about these findings in a separate article in the future, but many have already been discussed by other developers. The real goal to pursue is to find the right balance between what you can do in C# and what must be left to Unity. The ECS pattern can help a lot with this, avoiding the use of MonoBehaviour functionalities when they are not strictly necessary.

For example, let’s try to instantiate 256k GameObjects (yes, it’s possible) and add to each a MonoBehaviour that simply runs the fastest version of the test. On my PC, it runs at 105ms, and even if profiling with Unity doesn’t give the best accuracy, it seems that half of this time is spent on marshalling (very likely the time spent switching between C# and C++ for each update; my current assumption is that MonoBehaviours are stored as native objects and they need to switch to managed code for each single Update called, which is just crazy!).

Final Results:

Test | Editor | Client
ECS Naive Code | 113ms | 58ms
ECS No Calls Involved | 57ms | 23ms
ECS Structs only | 24ms | 16ms
ECS Multithreading (8 threads) | 8ms | 3ms
ECS Multithreading UWP/IL2CPP (8 threads) | n/a | 1.5ms
Classic GameObject/Update approach (no calls inside) | 105ms | 45ms
Classic GameObject/Update approach (no calls inside) UWP | n/a | 22ms

TL;DR:

  • I crammed too many points in this article 🙂
  • Optimize your algorithms first, you don’t want any quadratic algorithm in your code. Logarithmic is the way to go.
  • Optimize your data structures next, you must know how your data structures work in memory to use them efficiently. There is a difference between an array, a list and a linked list!
  • Optimize your instructions last, paying attention to memory accesses and caching (if you want to squeeze those ms out!). Coding to exploit the CPU cache is as important as using multi-threading. Of course not all the Engines need to be so optimized, so it depends on the situation. I will link some good references when I have the time, as there is much more to discuss about this point.
  • When using Unity, be very careful to minimize the usage of wrappers to native functions. These are many and can often be easily identified in Visual Studio by navigating to the declaration of the function.
  • Mind that Unity callbacks (Update, LateUpdate etc.) are often marshalled from native code, making them slow to use. The Unity hierarchical profiler tells you half the truth, showing the time taken inside the Update, but not between Updates (usually shown on the “others” graph). The Timeline view will show “gaps” between Update calls, which is the time used for marshalling them.
  • Multithreading is tricky but Svelto.Tasks can massively help.
  • The ECS pattern allows you to write code that exploits temporal and spatial locality (as you can manage all your entities in one central place). Try Svelto.ECS to see what you can do 😉
  • The ECS pattern will also help you to minimize the use of Unity functionalities, which most of the time involve costly marshalling operations to switch to the native code. Most of them are not really justified and are due to old legacy code that has never been updated in 10 years!
  • I understand that it is hard to relate to an example that iterates over 256k entities. However, moving logic from MonoBehaviours to ECS engines saved dozens of milliseconds in the execution of the real-life code of our projects.
  • Why don’t they finish IL2CPP for standalone platforms? At least for Windows only!
  • Don’t use the editor to profile 🙂

Please leave your thoughts here. I will probably expand this article with my new findings, especially the ones related to the cache optimizations.

For more Multi-threading shenanigans check my new article: Learning Svelto Tasks by example: The Million Points example 

External resources:

Svelto TaskRunner – Run Serial and Parallel Asynchronous Tasks in Unity3D

Note: this is an ongoing article and is updated with the new features introduced over time.

In this article I will introduce a better way to run coroutines using the Svelto.Tasks TaskRunner. I will also show how to run coroutines between threads, easily and safely. You can finally exploit the power of your processors, even if you don’t know much about multithreading. If you use Unity, you will be surprised by how simple it is to pass results computed from the multithreaded coroutines to the main thread coroutines.

What we got already: Unity and StartCoroutine

If you are a Unity developer, chances are you already know how StartCoroutine works and how it exploits the powerful C# yield keyword to time slice complex routines. A coroutine is quite a handy and clean way to execute procedures over time or, better, asynchronous tasks.

Lately Unity improved the support of Coroutines and new fancy things are now possible to achieve. For example, it was already possible to run tasks in serial doing something like:

And apparently* it was also possible to exploit a basic form of continuation starting a coroutine from another coroutine:

*apparently because I never tried this on unity 4 and I wasn’t aware at that time that it was possible.

However lately it’s also possible to return an IEnumerator directly from another IEnumerator without running a new coroutine, which is actually almost 3 times faster, in terms of overhead, than the previous method:
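What returning an IEnumerator from another IEnumerator amounts to can be sketched without Unity at all. The runner below is a hypothetical illustration (it is neither Unity’s scheduler nor the Svelto TaskRunner): it keeps a stack of enumerators, so a yielded child runs in place and the parent resumes when the child ends, without a second coroutine ever being started.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class NestedRunnerSketch
{
    // Advances the chain by one tick. Returns false when everything is done.
    public static bool Step(Stack<IEnumerator> stack)
    {
        while (stack.Count > 0)
        {
            IEnumerator top = stack.Peek();
            if (!top.MoveNext())
            {
                stack.Pop();       // child finished: resume its parent
                continue;
            }

            var child = top.Current as IEnumerator;
            if (child != null)
            {
                stack.Push(child); // an enumerator was yielded: run it in place
                continue;
            }

            return true;           // an ordinary yield: wait for the next tick
        }
        return false;
    }

    // Runs the root to completion and returns how many ticks it took.
    public static int RunToEnd(IEnumerator root)
    {
        var stack = new Stack<IEnumerator>();
        stack.Push(root);
        int ticks = 0;
        while (Step(stack))
            ticks++;
        return ticks;
    }

    public static IEnumerator Child(List<string> log)
    {
        log.Add("child 1");
        yield return null;
        log.Add("child 2");
    }

    public static IEnumerator Parent(List<string> log)
    {
        log.Add("parent start");
        yield return Child(log);   // continuation: no second coroutine is started
        log.Add("parent end");
    }
}
```

Running `RunToEnd(Parent(log))` logs the parent start, both child steps and the parent end in that order, which is the serial continuation described above.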

Running parallel routines is also possible, however there is no elegant way to exploit continuation when multiple StartCoroutine calls happen at once. Basically there is no simple way to know when all the coroutines are completed.
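One way to get that completion signal, sketched here in plain C# under the assumption that something ticks the enumerator once per frame, is to wrap the routines in a single enumerator that ends only when all of them have ended. `ParallelSketch`, `WhenAll` and `Count` are hypothetical names, and the sketch ignores whatever values the inner routines yield.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class ParallelSketch
{
    // Steps every routine once per tick and finishes only when
    // all of them have finished.
    public static IEnumerator WhenAll(params IEnumerator[] routines)
    {
        var running = new List<IEnumerator>(routines);
        while (running.Count > 0)
        {
            // Iterate backwards so finished routines can be removed in place.
            for (int i = running.Count - 1; i >= 0; i--)
                if (!running[i].MoveNext())
                    running.RemoveAt(i);

            if (running.Count > 0)
                yield return null; // something is still running: wait a tick
        }
    }

    // Demo routine: waits the given number of ticks, then records its id.
    public static IEnumerator Count(int steps, List<int> done, int id)
    {
        for (int i = 0; i < steps; i++)
            yield return null;
        done.Add(id);
    }
}
```

Because the wrapper is itself an IEnumerator, whatever yields it continues only after every inner routine has completed, with no callbacks involved.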

I should add that Unity tried to extend the functionality of the Coroutines introducing new concepts like the CustomYieldInstruction, however it fails to create a tool that can be used to solve more complex problems in a simple and elegant way, problems like, but not limited to, running several sets of parallel and serial asynchronous tasks.

Introducing Svelto.Tasks

Being limited by what Unity can achieve, a couple of years ago I started to work on my TaskRunner library and spent the last few months evolving it into something more powerful and interesting. The set of use cases that the TaskRunner can now solve elegantly is quite broad, but before showing a subset of them as examples, I will list the main reasons why TaskRunner should be used in place of StartCoroutine:

  • you can use the TaskRunner everywhere, you don’t need to be in a MonoBehaviour. The whole Svelto framework focuses on shifting the Unity programming paradigm from the use of the MonoBehaviour class to more modern and flexible patterns.
  • you can use the TaskRunner to run Serial and Parallel tasks, exploiting continuation without needing to use callbacks.
  • you can pause, resume and stop whatever set of tasks running.
  • you can catch exceptions from whatever set of tasks running.
  • you can pass parameters to whatever set of tasks running.
  • you can exploit continuation between threads (!).
  • Whatever the number of tasks you are running is, the TaskRunner will always run just one Unity coroutine (with some exceptions).
  • you can run tasks on different schedulers (including schedulers on different threads!).
  • you can transform whatever asynchronous operation into a task, thanks to the ITask interface.

A subset of the use cases that the TaskRunner is capable of handling is what I am going to show you soon, and I am sure you will be surprised by some of them :). TaskRunner can be used in multiple ways and, while the performance doesn’t change much between methods, you will fully exploit the power of the library only by knowing when to use what. Let’s start:

The simplest way to use the TaskRunner is to use the function Run, passing whatever IEnumerator in it.

This simply does what it says on the tin. It’s very similar to the StartCoroutine function, but it can be called from everywhere. Just pay attention to the fact that TaskRunner uses the non-generic IEnumerator underneath, so using a generic IEnumerator, with a value type as parameter, will always result in boxing (as shown in the example above).
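The boxing itself is a property of the C# interfaces and can be shown in isolation. In this minimal sketch (`BoxingSketch` is a hypothetical name) the same iterator is consumed once through `IEnumerator<int>` and once through the non-generic `IEnumerator`, whose `Current` property is typed as `object`.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingSketch
{
    public static IEnumerator<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // Generic access: Current is an int, no allocation per element.
    public static int SumGeneric(IEnumerator<int> e)
    {
        int sum = 0;
        while (e.MoveNext())
            sum += e.Current;
        return sum;
    }

    // Non-generic access: Current is an object, so every int is boxed
    // on the way out and unboxed again by the cast.
    public static int SumNonGeneric(IEnumerator e)
    {
        int sum = 0;
        while (e.MoveNext())
            sum += (int)e.Current;
        return sum;
    }
}
```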

TaskRunner can be also used to run every combination of serial and parallel tasks in this way:

But why not exploit the IEnumerator continuation? It’s more elegant than using callbacks and we don’t need to use a SerialTaskCollection explicitly (with no loss of performance). We won’t even need to use two ParallelTasks:

If you feel fancy, you can also use the extension methods provided:

Svelto.Tasks and Unity compatibility

You are used to yielding special objects like WWW, WaitForSeconds or WaitForEndOfFrame. Those objects are not enumerators and they work because Unity is able to recognize them and run special functions accordingly. For example, when you return WWW, Unity will run a background thread to execute the http request. If WWW is not able to reach the Unity framework, it will never be able to run properly. For this reason, the MainThreadRunner is actually compatible with all the Unity functions. You can yield them, however there are limitations: you cannot yield them, as they are, from a ParallelTaskCollection. If you do, the ParallelTaskCollection will stop executing and will wait for Unity to return the result, effectively losing the benefits of the process. Whenever you return a Unity special async function from inside a ParallelTaskCollection, you’ll need to wrap it inside an IEnumerator if you want to take advantage of the parallel execution. This is the reason why WWWEnumerator, WaitForSecondsEnumerator and AsyncOperationEnumerator exist.

TaskRoutines and Promises

When C# coders think about asynchronous tasks, they think about the .NET Task Library. The Task Library is an example of an easy to use tool that can be used to solve very complex problems. The main reason why the Task Library is so flexible is that it’s Promises compliant. The Promises design has proven effective in many libraries and can be implemented in several ways, but in every case what makes promises powerful is the idea of implementing continuation without using messy events all over the code.

In Svelto.Tasks, the promises pattern is implemented through the ITaskRoutine interface. Let’s see how it works: an ITaskRoutine is a coroutine already prepared and ready to start at your command. To create a new ITaskRoutine simply run this function:

Since an allocation actually happens, it’s best to preallocate and prepare a routine during the initialization phase and run it during the execution phase. A task routine can also be reused, changing all the parameters before running it again. Running an empty ITaskRoutine will result in an exception being thrown, so we need to prepare it first. You can do something like:

In this case I used the function SetEnumeratorProvider instead of SetEnumerator. In this way the Task Runner is able to recreate the enumerator in case you want to start the same function multiple times. Let’s see what we can do:

We can Start the routine like this using

We can Pause the routine using

We can Resume the routine using

We can Stop the routine using (it’s getting tedious)

We can Restart the routine using

You can check the ExampleTaskRoutine example out to see how it works.

Let’s see how ITaskRoutines are compliant with Promises. As we have seen, we can pipe serial and/or parallel tasks and exploit continuation. We can get the result from the previous enumerator as well, using the Current property. We can pass parameters through the enumerator function itself. The only feature we haven’t seen yet is how to handle failures, which is obviously possible too.

For the failure case I used an approach similar to the .NET Task library. You can stop a routine from inside a routine either by yielding Break.It or by throwing an exception. All the exceptions, including the ones thrown on purpose, will interrupt the execution of the current coroutine chain. Let’s see how to handle both cases with some, not so practical, examples.

In the example above we can see several new concepts. First of all, it shows how to use the Start() method providing what to execute when the ITaskRoutine is stopped or if an exception is thrown from inside the routine. It shows how to yield Break.It to emulate the Race function of the promises pattern. Break.It is not like returning yield break, it will actually break the whole coroutine from where the current enumerator has been generated. At last it shows how to yield an array of IEnumerator as syntactic sugar in place of the normal Parallel Task generation. Just to be precise, OnStop will NOT be called when the task routine completes, it will be called only when ITaskRoutine Stop() or yield Break.It are used.

Update:

Break.It will now break the currently running task collection. This means that if you yield Break.It inside a ParallelTaskCollection or SerialTaskCollection it will break the current collection only and not the whole ITaskRoutine. In this case Stop() won’t be called, but the TaskCollection completes. This is how to use Break.It in a real life scenario:

 

Now let’s talk about something quite interesting: the schedulers. So far we have seen our tasks always running on the standard scheduler, which is the Unity main thread scheduler. However, you are able to define your own scheduler and you can run the task whenever you want! For example, you may want to run tasks during the LateUpdate or the PhysicUpdate. In this case you may implement your own IRunner scheduler or even inherit from MonoRunner and run the StartCoroutineInternal as a callback inside the MonoBehaviour that will drive the LateUpdate or PhysicUpdate. Using a different scheduler than the default one is pretty straightforward:

 

Multithread and Continuation between threads

But what if I tell you that you can also run tasks on other threads? Yep, that’s right, your scheduler can run on another thread as well and, in fact, one multithreaded scheduler is already available. However, you may wonder, what would be the practical way to use a multithreaded scheduler? Well, let’s spend some time on it, since what I came up with is actually quite intriguing. Caution, we are now stepping into the twilight zone.

First of all, all the features mentioned so far work on whatever scheduler you put them on. This is fundamental in the design, however some limitations may be present due to Unity’s non-thread-safe nature. For example, the MultiThreadRunner won’t be able to detect special Unity coroutines, like WWW, AsyncOperation or YieldInstruction, which is obvious, since they cannot run on anything other than the main thread. You may wonder what the point of using a MultiThreadRunner is, if in the end it cannot be used with Unity functions. The answer is continuation! With continuation you can achieve pretty sweet effects.

Let’s see an example, enabling the PerformanceMT GameObject from the scene included in the library code. It compares the same code running on a normal StartCoroutine (SpawnObjects) and on another thread (SpawnObjectsMT). Enable only the MonoBehaviour you want to test to compare the performance. What’s happening? Both MBs spawn 150 spheres that will move along random directions. In both cases, a slow coroutine runs. The coroutine’s goal is to compute the largest prime number smaller than a given random value between 0 and 1000. The result will be used to compute the current sphere color, which will be updated as soon as it’s ready. The following is the multithreaded version:

Well, I hope it’s clear to you at a glance. First we run CalculateAndShowNumber on the multiThreadScheduler. We use the same MultiThreadRunner instance for all the game objects, because otherwise we would spawn a new thread for each sphere and we don’t want that. One extra thread is more than enough (I will spend a few words on it in a bit).

FindPrimeNumber is supposed to be a slow function, which it is. As a matter of fact, if you run the single threaded version (enabling the SpawnObjects MonoBehaviour instead of SpawnObjectsMT) you will notice that the frame rate is fairly low. In fact, the GPU must wait for the CPU to compute the prime number.
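For reference, a deliberately naive way to compute the largest prime smaller than a given value could look like the sketch below. This is not the FindPrimeNumber of the example project, just a hypothetical stand-in (`PrimeSketch`, `LargestPrimeBelow`) for the kind of CPU-bound work being moved off the main thread.

```csharp
public static class PrimeSketch
{
    // Trial division: slow on purpose, good enough for an illustration.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Largest prime strictly smaller than limit, or -1 if there is none.
    public static int LargestPrimeBelow(int limit)
    {
        for (int candidate = limit - 1; candidate >= 2; candidate--)
            if (IsPrime(candidate))
                return candidate;
        return -1;
    }
}
```

With the range used in the example, `LargestPrimeBelow(1000)` returns 997.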

The multithreaded version runs the main enumerator on another thread, but how can the color be set, since it’s impossible to use the Renderer component from anything other than the main thread? This is where a bit of magic happens. Returning the enumerator from a task running on another scheduler will actually continue its execution on that scheduler. You may think that at this point the thread will wait for the enumerator running on the main thread to continue. This is partially true since, unlike with a Thread.Join(), the thread is not actually stalled: it will continue yielding, so if other tasks are running on the same thread, they will still be processed. At the end of the main thread enumerator, the path will return to the other thread and continue from there. Quite fascinating, I’d say, also because you can expect a great difference in performance.

So, we have seen some advanced applications of the TaskRunner using different threads, but since Unity will soon support C# 6, you could wonder why you should use the Svelto TaskRunner instead of the Task.Net library. Well, they serve two different purposes. The Task.Net library has not been designed for applications that could run heavy routines on threads. Task.Net and the await/async keywords heavily exploit the ThreadPool to use as many threads as possible, on the condition that most of the time these threads are very short-lived. Basically it’s designed to serve applications that run hundreds of short-lived and light asynchronous tasks. This is usually true when we talk about server applications.

For games, instead, what we generally need are a few threads on which to run heavy operations that can go in parallel with the main thread, and this is what the TaskRunner has been designed for. You can also be sure that all the routines running on the same MultiThreadRunner scheduler instance won’t incur any concurrency issues. In fact, you may see every MultiThreadRunner as a Fiber (if you feel brave, you can also check this very interesting video). It’s also worth noticing that the MultiThreadRunner will keep the thread alive as long as it is actually used, effectively letting the ThreadPool (used underneath) manage the threads properly.

Other stuff…

To complete this article, I will spend a few words on two other minor features. As previously mentioned, the TaskRunner will identify IAbstractTask implementations as well. Usually you will need to implement the ITask interface for this to be useful. The ITask is meant to transform whatever class into a task that can be executed through the task runner. For example, it could be used to run web services, whose results will be yielded on the main thread.