Consulting Projects Portfolio

Wind Power Forecasting

Energy

Description
Forecasting of the future power generation of renewable energy plants from meteorological data (three-dimensional grids).
Results
A 10-percentage-point improvement in normalized forecasting error over previous models. Millions of dollars in savings for our client when bidding in the US electricity market.
Technology
Data ingestion with Spark (Scala) over Hortonworks infrastructure. Modeling with scikit-learn and LightGBM. Orchestration with Apache Airflow.
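
As an illustration of the metric quoted in the results above, a normalized forecasting error can be computed as the mean absolute error divided by the plant's installed capacity. This is only a minimal sketch: the exact normalization used in the project is not stated, and the function name `normalized_mae` and the sample values are assumptions.

```python
def normalized_mae(actual_mw, forecast_mw, capacity_mw):
    """Mean absolute error expressed as a fraction of installed capacity.

    Assumption: normalizing by capacity; other normalizations exist.
    """
    if len(actual_mw) != len(forecast_mw):
        raise ValueError("actual and forecast series must have the same length")
    if not actual_mw or capacity_mw <= 0:
        raise ValueError("need a non-empty series and a positive capacity")
    abs_errors = [abs(a - f) for a, f in zip(actual_mw, forecast_mw)]
    return sum(abs_errors) / len(abs_errors) / capacity_mw


# Hypothetical hourly values (MW) for a hypothetical 50 MW plant.
actual = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 37.0]
nmae = normalized_mae(actual, forecast, capacity_mw=50.0)
print(f"Normalized MAE: {nmae:.1%}")
```

On this scale, an improvement measured in percentage points is the plain difference between two such values, not a relative change.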

Predictive Maintenance

Energy

Description
Calculation of fatigue damage in structural components of wind turbines and forecasting of their time to failure.
Results
Development of a model that emulates the results of commercial enterprise software, saving ~€30K per year in software licenses and substantially increasing know-how about wind turbine fatigue.
Technology
Treatment and study of fatigue time series with pandas. Predictive model based on linear methods. High performance computing with NumPy and Fortran 90.
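
One common way to accumulate fatigue damage, shown here purely as an illustration, is the Palmgren-Miner linear rule combined with a power-law S-N curve. The source does not state which method the project used; the curve parameters, the function names and the load spectrum below are all hypothetical.

```python
def cycles_to_failure(stress_range, s_ref, n_ref, m):
    """Power-law S-N curve: N = n_ref * (s_ref / stress_range) ** m."""
    return n_ref * (s_ref / stress_range) ** m


def miner_damage(spectrum, s_ref, n_ref, m):
    """Sum n_i / N_i over (stress_range, cycle_count) pairs.

    Under the Palmgren-Miner rule, failure is predicted when the sum reaches 1.
    """
    return sum(n / cycles_to_failure(s, s_ref, n_ref, m) for s, n in spectrum)


# Hypothetical yearly load spectrum: (stress range in MPa, number of cycles).
spectrum = [(100.0, 1.0e5), (50.0, 8.0e5)]
damage_per_year = miner_damage(spectrum, s_ref=100.0, n_ref=2.0e6, m=3.0)
years_to_failure = 1.0 / damage_per_year
print(f"Damage per year: {damage_per_year:.3f}, years to failure: {years_to_failure:.1f}")
```

In practice the cycle counts would come from the measured fatigue time series (for instance through a cycle-counting step), which this sketch takes as given.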

Mobile Network User Experience

Telecommunications

Description
Development of a Customer Experience Management (CEM) framework. Ingestion of network quality data from 3G / 4G antennas. Creation of a model to predict the effect of customer experience on churn and complaints.
Results
Complete mobile network monitoring, with the ability to identify low-performing antennas and their negative impact on user experience at the individual level of aggregation.
Technology
Data ingestion with Python and Impala over Cloudera infrastructure. Modeling with scikit-learn. Visualizations with Matplotlib and seaborn. Dashboards with Microsoft Power BI.

User Complaints Forecasting

Telecommunications

Description
Predictive model for the probability that a customer opens an issue over a billing disagreement, based on their consumption patterns and individual profile.
Results
Classification model with ~0.85 AUC on highly imbalanced data. Automation of part of the customer service process.
Technology
Creation of a data mart using PySpark. Modeling using Spark MLlib. Visualizations with Matplotlib, Plotly and seaborn.
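
AUC is often reported for imbalanced data because it measures how well positives rank above negatives and does not change with the class proportions. A minimal pure-Python sketch of the pairwise computation follows; the function name `roc_auc` and the toy labels and scores are assumptions, and a real pipeline would rely on the evaluator of its modeling library.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the data size, so suitable
    only for small illustrative samples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy imbalanced sample: 2 customers who opened an issue out of 10.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.60, 0.90]
auc = roc_auc(labels, scores)
print(f"AUC: {auc:.4f}")
```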

Mobile Operators Benchmarking

Telecommunications

Description
Massive ingestion of data collected by cars (antenna performance, packet loss, network parameters, etc.) to calculate the KPIs used in the benchmarking.
Results
Network performance KPIs for benchmarking the mobile operators in each country.
Technology
Spark with Java for the ETL, Impala and Hive for data analysis, and Tableau for data visualization.

Data Pipelines Optimization

Healthcare & Pharma

Description
The client had serious performance problems in its Spark processes due to the volume of data to process (on the order of terabytes).
Results
Complete optimization of the data pipeline, reducing computing times from the order of days to a few minutes.
Technology
Spark with Scala for data processing. Flume and Sqoop for data ingestion. Storage in HDFS, accessible through the Hive SQL engine. All running on MapR technology.

Analytical Engine for Researchers

Healthcare & Pharma

Description
Project supported by the H2020 program. Creation of an analytical engine capable of scraping data from different sources and relating them to each other. Search engine based on Natural Language Processing.
Results
Application that translates natural-language searches into exportable tables and visualizations built from heterogeneous data.
Technology
NLP model built with NLTK. Web scraping with Selenium, Beautiful Soup and Requests. Web application based on Django. Visualizations with Bokeh.

Footfall Analytics

Retail

Description
People detection and counting, both in store and on public roads, plus heat maps to determine which supermarket shelf areas attract the most traffic.
Results
Identification of the best possible store location (pedestrian analysis) and the best product layout (store heat maps).
Technology
OpenCV for image processing and TensorFlow for detection models. All constrained to the requirements of the Google Edge TPU hardware.

Classification of Customer Issues

Retail

Description
Classification of customer issues in online stores. Data processing using NLP and categorization into 18 different types.
Results
Automation of part of the customer support process. Only those issues that the algorithm is unable to classify with confidence are handled manually.
Technology
NLP model built with NLTK. Integration with the rest of the systems through a REST API built with Flask.
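
The routing rule described in the results can be sketched as a confidence threshold on the classifier's output. This is an illustration only: the function `route_issue`, the 0.8 threshold and the category names are hypothetical and not taken from the project.

```python
def route_issue(class_probabilities, threshold=0.8):
    """Decide whether an issue is handled automatically or manually.

    class_probabilities maps each category to the model's probability.
    Returns (category, "auto") when the top probability reaches the
    threshold, otherwise (None, "manual").
    """
    if not class_probabilities:
        raise ValueError("class_probabilities must not be empty")
    category, confidence = max(class_probabilities.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return category, "auto"
    return None, "manual"


# Hypothetical model outputs for two incoming issues.
confident = route_issue({"delivery_delay": 0.91, "refund": 0.06, "other": 0.03})
uncertain = route_issue({"delivery_delay": 0.45, "refund": 0.40, "other": 0.15})
print(confident)
print(uncertain)
```

The threshold trades automation rate against misclassification risk: raising it sends more issues to the support team but makes the automatic decisions more reliable.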

Keyword CPC Forecasting

Digital Marketing

Description
Prediction of the CPC (Cost Per Click) of Google Ads keywords that a prior classification algorithm has flagged as likely to receive impressions.
Results
Optimal starting CPC for each keyword, so that it receives impressions at the minimum possible cost for the company.
Technology
NLP model using Word2Vec-style word embeddings (fastText) with two supervised models: one for the classification task (predicting which keywords will receive impressions) and another that learns from the historical CPC data.

Online Ads CTR Forecasting

Digital Marketing

Description
CTR (Click-Through Rate) forecasting for online ads using cutting-edge NLP algorithms. An ad optimization framework was also developed to suggest texts that maximize CTR.
Results
R² of 0.85 when predicting the CTR of an online ad with arbitrary text.
Technology
Developed using word embeddings based on fastText language models. PCP (regression) model developed with LightGBM. Model served in production using MLflow. Orchestration with Apache Airflow. Azure cloud infrastructure.
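
For reference, the R² figure quoted in the results is the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below shows the computation on made-up numbers; the function name `r_squared` and the CTR values are assumptions, not project data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted CTRs for four ads.
actual_ctr = [0.01, 0.02, 0.03, 0.04]
predicted_ctr = [0.012, 0.018, 0.033, 0.037]
r2 = r_squared(actual_ctr, predicted_ctr)
print(f"R²: {r2:.3f}")
```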

Social Networks Analysis

Market Research

Description
Developed a framework to process and analyze Twitter and Facebook user data (with the users' explicit permission) and to identify specific topics (politics, sports, etc.) as well as sentiment polarity.
Results
This framework allowed our client to improve the quality of their market research surveys by identifying users with expertise in a specific field based on their activity on social networks.
Technology
NLP modeling using fastText. User data processing using Twitter and Facebook APIs. Orchestration with Apache Airflow.

Mobile App User Behavior Analytics

Mobility

Description
Developed a framework to analyze user behavior in the Android and iOS apps of a popular urban mobility company. All analyses were carried out using massive data from app user events.
Results
Identified major flaws in the user signup process, which led to a redesign of the signup flow and an improvement in the registration rate from 25% to 50%. Developed an unsupervised learning algorithm able to locate users' home and work locations based on app events.
Technology
Big Data processing with Apache Spark. In-memory calculations using pandas and Modin. Machine Learning algorithms developed with scikit-learn. AWS cloud infrastructure.

Products

The Moderator Guru

The Moderator Guru is a text moderation service, based on Machine Learning and Natural Language Processing, that detects and classifies offensive text messages.

It helps you block offensive or inappropriate messages before they reach your audience, and it identifies toxic users and trolls. It uses lightweight but robust NLP technology that spots and classifies abusive messages, saving your moderation team time and money.

El Tren Barato

El Tren Barato is a search engine for high-speed trains in Spain, based on web scraping, with an alert system and a price forecasting tool.

It gives you access to the best deals, along with alerts for your selected trips, so that you can get the best available price for the trains of your choice.