WhiteBox Machine Learning

We craft AI systems for both enterprises and startups.

Values

Transparency

We do not design black boxes to charge you expensive maintenance fees. We will make sure that you understand our solutions and get the know-how you need to maintain and improve them yourself, with or without us.

Simplicity

Simple is better. AI is full of over-engineered and obscure solutions that slow down the adoption of artificial intelligence. Keeping things simple is hard and requires real knowledge, forethought, research and planning.

Quality

We will never compromise on quality. A single carefully made solution that successfully adds value is better than a thousand botched, half-made systems.

Approach

1. AI planning and strategy

Get the right assessment before jumping into the AI race. Avoid the common mistakes that cause an AI strategy to derail and underdeliver. Let us help you get things right from the beginning, from Data Engineering to Machine Learning deployment.

2. Data Engineering

Lots of data? We are used to working with Big Data workloads using frameworks like Hadoop and Spark, and we have successfully built and deployed AI at scale.

3. AI model design

We are excellent modelers with business experience in many sectors and real-world use cases, using a broad range of technologies, from classic Machine Learning to Deep Learning. Best of all, our principles stay crystal clear regardless of the technology.

4. AI deployment

Most companies fail when moving their models from prototype to production. Did your company spend a fortune hiring a horde of PhDs, only to get a bunch of serialized R models with no actual value in return? We can help you.

Portfolio

Wind Power Forecasting

Energy

Description
Prediction of future power generation of renewable energy plants based on meteorological information (three-dimensional grids).
Results
Improvement of 10 percentage points in the normalized forecasting error versus previous models. Millions of dollars in savings for our client when bidding in the US electricity market.
Technology
Data ingestion with Spark (Scala) over Hortonworks infrastructure. Modeling with scikit-learn and LightGBM. Orchestration with Apache Airflow.

Predictive Maintenance

Energy

Description
Calculation of fatigue damage in the structural components of wind turbines, and forecasting of their time to failure.
Results
Development of a model that emulates the results of commercial enterprise software, saving ~30K€ / year in software licenses and greatly increasing know-how about wind turbine fatigue.
Technology
Processing and analysis of fatigue time series with pandas. Predictive model based on linear methods. High performance computing with NumPy and Fortran 90.

Mobile Network User Experience

Telecommunications

Description
Development of a Customer Experience Management (CEM) Framework. Network quality data ingestion from 3G / 4G antennas. Creation of a model to predict the effect of customer experience on churn and complaints.
Results
Complete mobile network monitoring and the ability to identify low-performing antennas that harm user experience, down to the individual level of aggregation.
Technology
Data ingestion with Python and Impala over Cloudera infrastructure. Modeling with scikit-learn. Visualizations with Matplotlib and seaborn. Dashboards with Microsoft Power BI.

User Complaints Forecasting

Telecommunications

Description
Predictive model of the probability that a customer opens an issue over a billing disagreement, based on their consumption patterns and individual profile.
Results
Classification model with ~0.85 AUC on highly imbalanced data. Automation of part of the customer service process.
Technology
Creation of a data mart using PySpark. Modeling using Spark MLlib. Visualizations with Matplotlib, Plotly and seaborn.
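
As a rough illustration of why AUC, rather than accuracy, is the metric quoted for imbalanced data, the sketch below computes AUC directly from its pairwise definition in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical; they are not taken from the client project, which used Spark MLlib.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 8 customers without a complaint, 2 with one.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9, 0.8, 0.95]
constant_scores = [0.0] * len(labels)  # always predicts "no complaint"

model_auc = roc_auc(labels, model_scores)
baseline_auc = roc_auc(labels, constant_scores)
print(model_auc, baseline_auc)
```

In this toy set, the constant predictor is right for 8 of the 10 customers, yet its AUC is only 0.5, because it cannot rank a complaining customer above a non-complaining one. That is why a ranking metric is more informative than accuracy when one class is rare.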

Mobile Operators Benchmarking

Telecommunications

Description
Large-scale ingestion of data collected by cars (antenna performance, packet loss, network parameters, etc.) to calculate the KPIs used in the benchmarking.
Results
Network performance KPIs for benchmarking the mobile operators in each country.
Technology
Spark with Java for the ETL, Impala and Hive for data analysis, and Tableau for data visualization.

Data Pipelines Optimization

Healthcare & Pharma

Description
The client had serious performance problems in its Spark processes due to the huge amount of data to process (on the order of terabytes).
Results
Complete optimization of the data pipeline, cutting computing times from the order of days to a few minutes.
Technology
Spark with Scala for data processing. Flume and Sqoop for data ingestion. Storage in HDFS, accessible through the Hive SQL engine. All running on MapR technology.

Analytical Engine for Researchers

Healthcare & Pharma

Description
Project supported by the H2020 program. Creation of an analytical engine capable of scraping data from different sources and relating them to one another. Search engine based on Natural Language Processing.
Results
Application that translates natural-language searches into exportable tables and visualizations built from heterogeneous data.
Technology
NLP model built with NLTK. Web scraping with Selenium, Beautiful Soup and Requests. Web application based on Django. Visualizations with Bokeh.

Footfall Analytics

Retail

Description
People detection and counting, both in store and on public streets, plus heat maps to determine which supermarket shelf areas attract the most traffic.
Results
Identification of the best possible store location (pedestrian analysis) and the best product layout (store heat maps).
Technology
OpenCV for image processing and TensorFlow for detection models. All constrained by the requirements of the Google Edge TPU hardware.

Classification of Customer Issues

Retail

Description
Classification of customer issues in online stores. Data processing using NLP and categorization into 18 different types.
Results
Automation of part of the customer support process. Only those issues that the algorithm is unable to classify with confidence are handled manually.
Technology
NLP model built with NLTK. Integration with the other systems through a REST API built with Flask.
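
The idea of handling manually only the issues the algorithm cannot classify with confidence can be sketched as a simple threshold rule. This is a minimal illustration, not the production code: the function name `route_issue`, the 0.8 threshold and the category names are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def route_issue(
    probabilities: Dict[str, float], threshold: float = 0.8
) -> Tuple[Optional[str], bool]:
    """Return (label, needs_manual_review) for one customer issue.

    `probabilities` maps each category to the classifier's predicted
    probability. The threshold value is an assumed example, not a tuned one.
    """
    if not probabilities:
        raise ValueError("at least one category probability is required")
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return label, False  # confident enough: handled automatically
    return None, True  # uncertain: sent to a human agent


# Hypothetical category names, for illustration only.
confident = route_issue({"billing": 0.91, "shipping": 0.06, "returns": 0.03})
uncertain = route_issue({"billing": 0.45, "shipping": 0.40, "returns": 0.15})
print(confident, uncertain)
```

Raising the threshold sends more issues to manual handling and lowers the risk of a wrong automatic classification; lowering it does the opposite.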

Moderator Guru

Startups

Description
Moderator Guru is a text moderation service, based on Machine Learning and Natural Language Processing, that detects and classifies offensive text messages.
Results
API endpoint capable of processing thousands of requests per minute. Currently working on a WordPress plugin.
URL: https://moderator-guru.com
Technology
NLP model focused on speed. Django for backend and vanilla frontend.

Renfe Guru

Startups

Description
Web scraping of renfe.com to extract high-speed train data, in order to take advantage of pricing changes and save money on train journeys.
Results
Work in progress. Planned: an alert system for ticket price changes and a Machine Learning model to predict when these changes are most likely to happen.
Technology
Working on the most suitable algorithm for the train data. Django for backend and vanilla frontend.

DataTau

Startups

Description
DataTau is the reference news board for Data Scientists and Data Engineers, inspired by the popular Hacker News. When the site went down for a month, we decided to recover and open source it, rewriting the application code from scratch and providing hosting.
Results
It reached the Hacker News front page on its launch day and currently serves the hottest Data Science news to the world.
URL: https://datatau.net
Technology
Django for the backend and a vanilla frontend with a lot of Jinja.

Team

Pedro Muñoz

🔬 Data Scientist | 🏢 Big Data Architect | 🤖 Machine Learning Engineer

David Adrián Cañones

🔬 Data Scientist | 🤖 Machine Learning Engineer | 💹 MBA

Contact Us

Phone Number

+34 691 101 949
+34 637 539 220

Email Address

info@whiteboxml.com