Igor Muniz Soares

Brazil · United States · +55 62 999772996 · igor.muniz.ims@gmail.com
Machine Learning Engineer · Data Scientist · Master of Computer Engineering · Kaggle Competition Expert

With over 6 years of experience in machine learning, I excel at extracting valuable insights from diverse datasets. My strong skills in Python, statistics, and deep learning enable me to tackle complex problems effectively. Alongside my professional work, I nurture a passion for puzzles and actively participate in Kaggle competitions in my free time. I have also had the privilege of teaching machine learning, deep learning, and data science to hundreds of students, contributing to the growth and development of future professionals in the field. I am currently interested in generative AI, focusing my studies on Large Language Models, vector databases, and LangChain.


Experience

Lead Machine Learning Engineer

Exponential Ventures - Austin, Texas, US

Exponential Ventures is an American startup that solves today's biggest problems with exponential technologies such as AI, HMI, robotics, and quantum computing. As a Machine Learning Engineer, I'm responsible for defining our products’ data architecture and AI models. I also lead the data science team, defining tasks, managing the environment, and supporting team members.

I've worked or I'm working on the following projects:

  • Development of all financial rules and ML models for a fintech based on the ROSCA model.
  • Advanced the state-of-the-art of Genetic Engineering Attribution, placing 10th in Prediction Track and 2nd in Innovation Track (out of 1,211 teams) in the Genetic Engineering Attribution Challenge.
  • Improved the autoML pipelines for one of the internal products.
  • Recommendation System for e-commerce products using computer vision and NLP.
  • Computer vision project for product classification and brand and model identification.
  • Computer vision project for image classification and segmentation of leaves.

August 2020 - Present

Applied NLP Researcher

CeIA - Center of Excellence in Artificial Intelligence

CeIA is a scientific laboratory financed by private companies operating in several segments such as energy, retail, delivery, and health, among others. I worked there part-time as a machine learning engineering consultant, researching and developing natural language processing models.

Projects developed by me there:

  • Pipeline development for building, training, deploying and automatically inferring machine learning models in a chatbot platform.
  • Development of a QnA system
  • Development of parse models and embeddings generators to correct address strings.
  • Development of document retrieval and question-answering system using Large Language Models, Langchain, Transformers, Hugging Face and Vector Database (Milvus)

March 2021 - June 2023

Deep Learning Professor

FASAM - South American College

Responsible for deep learning classes in the graduate course in Big Data and Machine Learning that takes place on weekends.

January 2021 - July 2022

Data Scientist

Indra Company

Responsible for analyzing data, discovering patterns and proposing solutions to business problems in many different areas, creating custom reports for clients and automating processes for data prediction or classification.

Projects developed by me there:

  • Research, development, training, and improvement of existing OCR models for Portuguese.
  • Classification and segmentation of different cloud types from satellite images.
  • Adaptation and training of a deep learning model for a question answering system in Portuguese. Model based on BERT and Google QANet. This system is applied in a chatbot capable of answering questions based on unstructured texts.
  • Model development for converting natural language into SQL, allowing users to perform database queries.
  • Training of models to classify the intents present in phrases for chatbot flow control.
  • System to suggest work inspections based on risk of accidents. Exploratory analysis of historical accident data and construction of a predictive model based on factors determined by the customer.
  • Big data analysis for cross-checking information from Telefônica Brasil in order to find patterns in customers with internet peer disconnection issues
  • Data analysis and development of machine learning model to classify possible frauds in government advance database
February 2018 - August 2020

Deep Learning Researcher

Federal University of Goiás

Research on artificial intelligence techniques for person detection, pose estimation, and activity classification in images. During this time, I developed the work "A complete bottom-up approach to recognizing human activities in images through the estimated pose with convolutional networks".

February 2017 - July 2019

IoT Intern

MPT Engenharia

Software development in C/C++ and MATLAB, and PCB prototyping with ATMEGA microcontrollers.

March 2015 - July 2016

Education

Federal University of Goiás

Master of Science

Computer Engineering

Intelligent Systems

March 2017 - September 2019

Federal University of Goiás

Bachelor of Science

Computer Engineering

March 2011 - December 2016

Skills

Programming Languages & Tools

  • Python
  • C/C++
  • SQL
  • TensorFlow
  • Keras
  • PyTorch
  • Pandas
  • scikit-learn
  • Hugging Face
  • GCP
  • AWS
  • Spark
  • OpenCV
  • spaCy
  • NLTK
  • Hadoop Platform
  • FastAPI
  • Docker

Knowledge Areas

  • Software Engineering
  • DevOps/MLOps
  • Math/Statistics
  • Research
  • Machine Learning / Deep Learning
  • Computer Vision
  • Natural Language Processing
  • Cloud Computing
  • Data Visualization/Presentation
  • Linux/Unix

Non-Technical Skills

  • Business acumen
  • Teamwork
  • Leadership
  • Problem solving

Projects

Some of the projects I worked on, which gave me experience in the following topics:

Natural Language Processing

Question Answering System

Adaptation and training of a deep learning model for a question answering system in Portuguese. Model based on BERT and Google QANet. This system is applied in a chatbot capable of answering questions based on unstructured texts.

Intention Classification System

Training of models to classify the intents present in phrases for chatbot flow control.

SeqTag: Sequence Tagging for Portuguese datasets

Development of a deep learning model for token classification and sequence tagging in Portuguese texts.

Code:

Product categorization from title classification

Project developed for the Mercado Libre Data Challenge to classify product titles written in Portuguese and Spanish.

Code:

Natural Language to SQL

Model development for converting natural language into SQL, allowing users to perform database queries.

Computer Vision

Activity Recognition based on pose estimation

This work proposes a single end-to-end model able to detect people, estimate their pose, and recognize each person's activity from that pose. The experiments show that the model has reached the state of the art in the tasks of person detection and pose estimation on the MSCOCO 2017 dataset, and can recognize walking, running, sitting, and standing activities with an F1 score of 0.7344.

Code:

Optical Character Recognition (OCR)

Training and improvement of existing OCR models for Portuguese.

Understanding Clouds from Satellite Images

Classification and segmentation of different cloud types from satellite images. 22nd place solution

Code:

AI and Computer Vision for Medicine

Development of deep learning models for identification of pulmonary diseases and intracranial hemorrhage in X-rays

Tabular Data

Internet Disconnection Cause Identification

Big data analysis for cross-checking information from Telefônica Brasil in order to find patterns in customers with internet peer disconnection issues

Fraud Identification

Data analysis and development of machine learning model to classify possible frauds in government advance database

Classification of intramuscular signals

Development of an artificial neural network to classify arm movements from the collected intramuscular signals. This feature is part of building a myoelectric prosthesis for people with amputated arms.

Code:

Publications

Improving lab-of-origin prediction of genetically engineered plasmids via deep metric learning
Soares, I.M., Camargo, F.H.F., Marques, A. et al.
arXiv
Nature Computational Science - 2022

Deep metric learning improves lab of origin prediction of genetically engineered plasmids
IM Soares, FHF Camargo, A Marques, OM Crook
arXiv
Nature Computational Science - 2022 (Preprint)

Ranking labs-of-origin for genetically engineered DNA using Metric Learning
I. Muniz, F. H. F. Camargo, A. Marques
arXiv - 2021

An end-to-end approach for recognizing human activity in images using pose estimation
I. Muniz, C. Vinhal, G. da Cruz Jr.
arXiv - 2019

Awards

  • 1st Place in the 2nd Workshop on Artificial Intelligence - Federal University of Goiás 2019
  • 2nd Place in the "Porto Seguro Data Challenge" - Kaggle 2021
  • 2nd Place in the "Genetic Engineering Attribution Challenge Innovation Track" - DrivenData 2021
  • 2nd Place in the Hackathon "Dev for a Change" - Indra Brasil 2019
  • 9th Place in the Latin America Mercado Libre Data Challenge - https://ml-challenge.mercadolibre.com/ 2019
  • 10th Place in the "Genetic Engineering Attribution Challenge Prediction Track" - DrivenData 2021
  • 22nd Place (Silver Medal) against 1538 teams in the Understanding Clouds from Satellite Images Challenge - Kaggle 2019
  • 29th Place (Silver Medal) against 1305 teams in the SIIM-FISABIO-RSNA COVID-19 Detection - Kaggle 2021
  • 30th Place (Silver Medal) against 1620 teams in the Santa's Workshop Tour 2019 - Kaggle 2019/2020
  • 77th Place (Bronze Medal) against 1233 teams in the TensorFlow 2.0 Question Answering - Kaggle 2020
  • 78th Place (Bronze Medal) against 1275 teams in the VinBigData Chest X-ray Abnormalities Detection - Kaggle 2021