A little chat with: Gabriele Galatolo, Back-end developer and Data Scientist
The Data Science field often requires multifaceted skills that are hard to define (and sometimes to explain) briefly. Today we try to clarify them with Gabriele, who, at Kode, thinks about the solutions and products to develop right from the structuring of their engine: back-end and middleware.

Gabriele Galatolo, Back-end developer and Data Scientist
Back-end Developer & Data Scientist: a double definition. Would you like to try to explain what your job consists of?
It is always complicated to describe my job concisely. I long ago chose to say that I am a developer, so everyone immediately pictures the guy behind the computer writing code. Which I do: I work on the back-end and middleware to make things possible. Except that my reality is far from the usual idea of development (related to the services we all use every day).
To complicate matters, I do not only deal with software development in the data field (which has its own relevant specificities); I also approach projects from a particular perspective. My point of view shifts the focus from extracting, from the data, the information our customer wants to understand, to thinking more broadly about how to ensure that this extraction works. I have always been focused on the engine, which is something one generally knows little or nothing about.
Infrastructural (or middleware) development, however, is what makes things possible. It is what guarantees a certain level of performance, or what makes a solution scalable.
Working at the infrastructural or middleware level means taking charge of all this information without necessarily knowing its semantics. Data science projects follow a well-defined pipeline (data extraction, preprocessing, modelling, visualisation), and we strive to optimise each of these blocks. For example, in data storage (as with other components of Princess*), when I deal with the way the various types of databases store and expose information, I can ignore the content of the data I am working on. I mean, I can ignore the semantics, at least when I am not dealing with the knowledge-extraction and client-services part of projects. That part, on the other hand, is key when, for example, I am in charge of computer vision or neural-network-based projects.
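The idea of middleware that handles the pipeline without knowing the data's semantics can be sketched in a few lines. This is only an illustration under assumptions of ours (the function names and stages are invented, not Kode's actual code): a runner that chains opaque stages and never looks inside the payload.

```python
from typing import Any, Callable, List

def run_pipeline(stages: List[Callable[[Any], Any]], payload: Any) -> Any:
    """Apply each stage in order; the runner never inspects `payload`,
    mirroring middleware that ignores the data's semantics."""
    for stage in stages:
        payload = stage(payload)
    return payload

# Example stages (purely illustrative): the runner doesn't care that
# these happen to extract, preprocess, and model a list of numbers.
extract = lambda _: [3, 1, 2]
preprocess = lambda xs: sorted(xs)
model = lambda xs: sum(xs) / len(xs)

result = run_pipeline([extract, preprocess, model], None)
print(result)  # 2.0
```

The point of the sketch is the separation of concerns: the stages know the semantics, the runner only knows the pipeline's shape.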
So, in addition to a double definition, you really have a double role. Does this imply a different outlook in your approach to projects?
Probably yes. When I'm doing the data science part, I can't help but ask myself how to optimise the solution. How to structure its middleware without blowing it all up? How to keep the analysis from lasting a hundred years? What is the limit beyond which not to push? These are points that not everyone in development thinks about.
Perhaps this insight comes to me from afar. I come from the world of computer science, and the idea that computer science is about finding simple, fast, working solutions to complex problems goes far back in time. We always had computers at home; I only played games with them, but my father did not. I used to see him studying manuals and finding ways to do new things, so I started trying too. At a certain point, however, I changed: from someone who fussed around to someone who approaches the subject differently. I should also thank my first computer science teacher at high school. She practically never let us use the computer for the entire first year, because computer science is not a programming language but a way to solve automation problems efficiently.
Computer science tackles complex problems and its solutions can be replicated in many areas… Do you know queuing theory? We can apply it to supermarkets as well as to data management. I worked for years on the middleware of a high-frequency trading platform, where speed of data extraction was crucial: if the answers did not arrive within a millisecond, they opened an issue.
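The queuing-theory point generalises nicely: the same formulas describe a supermarket checkout and a data service under load. A minimal sketch, using the standard M/M/1 single-server model (the rates below are invented examples, not figures from the interview):

```python
# Classic M/M/1 queue formulas: one server, Poisson arrivals, exponential
# service times. lam = arrival rate, mu = service rate (same units);
# the system is stable only when lam < mu.

def mm1_metrics(lam: float, mu: float) -> dict:
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = lam / mu            # utilisation of the server
    L = rho / (1 - rho)       # mean number of jobs in the system
    W = 1 / (mu - lam)        # mean time in system (Little's law: L = lam * W)
    Wq = rho / (mu - lam)     # mean time spent waiting before service
    return {"utilisation": rho, "jobs_in_system": L,
            "time_in_system": W, "wait_in_queue": Wq}

# A hypothetical service receiving 900 req/s with capacity 1000 req/s:
m = mm1_metrics(900.0, 1000.0)
print(m["time_in_system"])  # 0.01 s
```

The same call works whether lam and mu count shoppers per minute or queries per second, which is exactly the portability of solutions the interview describes.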
I brought this experience here to Kode, especially the focus on the performance of what we develop. We don't actually have time limits on the order of milliseconds, but the things we do are intensive, both because of the amount of data to process and because of the heavy frameworks we use. Take an example from the manufacturing field: at the end of a batch, we have to analyse the production that has just finished. This means we have a few minutes, since we must show results before the following batch begins.
To get back to the original question: I never really take off one hat or the other. As a matter of fact, my view of projects is always a mix of both points of view.

Can you give us some examples of infrastructure development activities in the Data Science world?
There are activities very specific to our field, such as data storage; but on the infrastructure side we also work on the development of frameworks (such as Princess*). These are mainly tools to reduce repetitive development work or to simplify and optimise integrations.
Another tool we are working on, for example, is an Execution Assistant, which allows serverless functions to be developed, relieving the developer of the responsibility of taking care of deployment.
It's a movement, which I think will become more and more popular, aiming to let developers not worry about how and where their software, or the functionality they develop, will run.
This approach applies well to Data Science because, by working in a pipeline with well-structured, unpackable phases, it is possible to say to the developer of the data-analysis, machine-learning and modelling part: "Choose freely the languages and methods you consider most suitable, because how it will run, with what resources, and the other technical aspects are not your concern; someone else will take care of it"; or rather: something else.
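The serverless idea described above can be sketched with a simple registration pattern. To be clear, the real Execution Assistant is proprietary and its API is not shown here; every name below is a hypothetical illustration of the general technique: the developer writes a plain function, and a separate runtime decides where and with what resources it runs.

```python
from typing import Any, Callable, Dict

# Hypothetical function registry: the "runtime" side of the sketch.
_REGISTRY: Dict[str, Callable] = {}

def serverless(fn: Callable) -> Callable:
    """Decorator: register a function so a separate runtime can deploy
    and invoke it. The author never writes deployment code."""
    _REGISTRY[fn.__name__] = fn
    return fn

@serverless
def summarise_batch(data):
    # The data scientist writes only the analysis logic...
    return {"n_samples": len(data), "mean": sum(data) / len(data)}

def invoke(name: str, *args: Any) -> Any:
    """Stand-in for the runtime: in a real system this dispatch could go
    to a container, a job queue, or a cloud function instead."""
    return _REGISTRY[name](*args)

print(invoke("summarise_batch", [1, 2, 3]))  # {'n_samples': 3, 'mean': 2.0}
```

The decorator is the whole contract: everything after registration (resources, placement, scaling) is the runtime's concern, which is the "something else will take care of it" of the quote.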
You mentioned Princess a couple of times. Can you tell us what it is? Did you develop it?
Yes, actually I’m the daddy. And the mum too… In short, Princess is Kode's proprietary framework that holds, like a real toolbox, the development tools for the Data Scientist. But Princess is another story; we'll tell it next time.