1. Machine/Deep Learning tools
    1. Common issues
      1. Unbalanced Dataset
        1. Solution : Oversampling
    2. Techniques
      1. Clustering
      2. Data Labeling
      3. Reinforcement Learning
        1. TensorTrade
      4. Text few shot learning
      5. Text summarization
      6. Transformers
      7. BERT
        1. Blogs
        2. BERTweet: A pre-trained language model for English Tweets
        3. COVID-Twitter-BERT
      8. GPT2
        1. aitextgen – Train a GPT-2 Text-Generating Model w/ GPU
        2. Related Blogs
      9. Question Answering
      10. Reformers
    3. Benchmarks
      1. Text
        1. XGLUE: Expanding cross-lingual understanding and generation with tasks from real-world scenarios
    4. Useful Libs
      1. Wrapper
        1. Vision
        2. Text
          1. fast.ai Code-First Intro to Natural Language Processing
      2. Text
        1. NLTK
        2. SpaCy
        3. Transformers (HuggingFace)
          1. Related blogs
        4. Simple Transformers (based on HuggingFace)
        5. cdQA: Unsupervised QA
        6. Facebook UnsupervisedQA
      3. Others
        1. Facebook MMF
    5. Hands-on
      1. NLP
        1. Structured Data
      2. AutoML
        1. AutoKeras
        2. OVHcloud autoML
  2. Deep Learning use cases
    1. Nothing to Image
      1. Generative
        1. Face
          1. Disentangled Image Generation Through Structured Noise Injection
    2. Image to Anything
      1. Image to Image
        1. Avatarify : Deepfake for Zoom
        2. Inpainting
          1. High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling
          2. EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
          3. Progressive Image Inpainting with Full-Resolution Residual Network
        3. Super resolution
          1. PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models
          2. Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining
        4. Image-to-Image translation
          1. DeepFaceDrawing: Deep Generation of Face Images from Sketches
          2. UGATIT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)
          3. Selfie to Anime
        5. Segmentation
          1. Poly-YOLO: higher speed, more precise detection and instance segmentation for YOLOv3
          2. Attention-Guided Hierarchical Structure Aggregation for Image Matting
          3. Foreground-aware Semantic Representations for Image Harmonization
          4. Single-Stage Semantic Segmentation from Image Labels (CVPR 2020)
      2. Others
        1. Background Matting: The World is Your Green Screen
        2. 3D Photography using Context-aware Layered Depth Inpainting (CVPR 2020)
        3. Project an image centroid to another image using OpenCV
      3. Image to Text
        1. CompGuessWhat?!: a Multi-Task Evaluation Framework for Grounded Language Learning
        2. YOLOv4: Optimal Speed and Accuracy of Object Detection
        3. Image Captioning with PyTorch
        4. ResNeSt: Split-Attention Networks
        5. Hands-on guide to sign language classification using CNN
      4. Image to Sound/Speech
      5. Image to Video
    3. Text to Anything
      1. Text to Image
        1. Network Fusion for Content Creation with Conditional INNs (CVPR 2020)
      2. Text to Text
        1. Bilingual Translation
        2. T5 finetuning
        3. Training Electra
        4. Text Translation
        5. Text Generation
          1. Lyrics Generation
          2. Next Word Prediction
        6. Code to Code
          1. Unsupervised Translation of Programming Languages
      3. Text to Sound/Speech
        1. Pitchtron: Towards audiobook generation from ordinary people’s voices
        2. Transformers TTS
      4. Text to Video
    4. Sound/Speech to Anything
      1. Sound/Speech to Image
        1. Audio to Image Conversion
      2. Sound/Speech to Text
        1. Speech Command Recognition
      3. Sound/Speech to Sound/Speech
        1. Speaker-independent-emotional-voice-conversion-based-on-conditional-VAW-GAN-and-CWT
      4. Sound/Speech to Video
    5. Video to Anything
      1. Video to Video
        1. Segmentation
          1. MSeg : A Composite Dataset for Multi-domain Semantic Segmentation
          2. Motion Supervised co-part Segmentation
      2. Video to Image
      3. Video to Text
      4. Video to Sound/Speech
      5. Video to Video
  3. Inference
    1. Python serving
    2. Fastai
    3. HuggingFace
    4. Hummingbird
  4. Tools
    1. Terminal
      1. Rich
    2. Python
      1. PyAudio FFT
      2. Process Mining : alpha-miner
      3. Image Feature extractor
  5. Cool projects
    1. Web based Training
    2. How to evaluate Longformer on TriviaQA using NLP
    3. Data Visualization
  6. Hardware
    1. GPU
      1. Nvidia
        1. Ampere
  7. MOOC
    1. Fast.ai
    2. Benchmark
      1. NLP

Machine/Deep Learning tools

Common issues

Unbalanced Dataset

Solution : Oversampling
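
As a rough illustration of the idea, here is a minimal sketch of random oversampling using only the Python standard library. The function name `random_oversample` and the toy data are assumptions made for this example; dedicated libraries offer more refined strategies.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest one.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy unbalanced dataset: five samples of class 0, one of class 1.
samples = ["a", "b", "c", "d", "e", "f"]
labels = [0, 0, 0, 0, 0, 1]

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))  # both classes now have 5 samples
```

Oversampling should only be applied to the training split, otherwise duplicated samples can leak into the evaluation data.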

Techniques

Clustering

Data Labeling

Reinforcement Learning

TensorTrade

Text few shot learning

Text summarization

Great tutorial series here

Transformers

@misc{vaswani2017attention,
    title={Attention Is All You Need},
    author={Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
    year={2017},
    eprint={1706.03762},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

BERT

Blogs

BERTweet: A pre-trained language model for English Tweets

COVID-Twitter-BERT

@article{muller2020covid,
  title={COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter},
  author={M{\"u}ller, Martin and Salath{\'e}, Marcel and Kummervold, Per E},
  journal={arXiv preprint arXiv:2005.07503},
  year={2020}
}

GPT2

aitextgen – Train a GPT-2 Text-Generating Model w/ GPU

Related Blogs

Question Answering

Reformers

Benchmarks

Text

XGLUE: Expanding cross-lingual understanding and generation with tasks from real-world scenarios

Useful Libs

Wrapper

Vision

Text

fast.ai Code-First Intro to Natural Language Processing

Here is the associated tutorial series:

Text

NLTK

I consider the tutorials by @SentDex, founder of pythonprogramming.net (https://www.youtube.com/channel/sentdex), to be the best for NLTK.

SpaCy

A good spaCy tutorial series on YouTube is here:

spaCy channel:

Transformers (HuggingFace)

Simple Transformers (based on HuggingFace)

Simple Transformers is a wrapper on top of HuggingFace’s Transformers library that makes it easy to set up and use. Here is an example of binary classification:

from simpletransformers.classification import ClassificationModel
import pandas as pd
import logging


logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Train and evaluation data must be Pandas DataFrames with two columns:
# the text (type str) and the label (type int).
train_data = [['Example sentence belonging to class 1', 1], ['Example sentence belonging to class 0', 0]]
train_df = pd.DataFrame(train_data, columns=["text", "labels"])

eval_data = [['Example eval sentence belonging to class 1', 1], ['Example eval sentence belonging to class 0', 0]]
eval_df = pd.DataFrame(eval_data, columns=["text", "labels"])

# Create a ClassificationModel
model = ClassificationModel('roberta', 'roberta-base') # You can set class weights by using the optional weight argument

# Train the model
model.train_model(train_df)

# Evaluate the model
result, model_outputs, wrong_predictions = model.eval_model(eval_df)

cdQA: Unsupervised QA

Facebook UnsupervisedQA

Others

Facebook MMF

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

@inproceedings{singh2018pythia,
  title={Pythia-a platform for vision \& language research},
  author={Singh, Amanpreet and Goswami, Vedanuj and Natarajan, Vivek and Jiang, Yu and Chen, Xinlei and Shah, Meet and Rohrbach, Marcus and Batra, Dhruv and Parikh, Devi},
  booktitle={SysML Workshop, NeurIPS},
  volume={2018},
  year={2018}
}

Hands-on

NLP

| Tool | Binary Classification | Multi-Label Classification | Question Answering | Tokenization | Generation | Named Entity Recognition |
|-|-|-|-|-|-|-|

Structured Data

AutoML

AutoKeras

from tensorflow.keras.datasets import mnist

import autokeras as ak

# Prepare the dataset.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28)
print(y_train.shape)  # (60000,)
print(y_train[:3])  # [5 0 4]

# Initialize the ImageClassifier.
clf = ak.ImageClassifier(max_trials=3)
# Search for the best model.
clf.fit(x_train, y_train, epochs=10)
# Evaluate on the testing data.
print('Accuracy: {accuracy}'.format(
    accuracy=clf.evaluate(x_test, y_test)))

@inproceedings{jin2019auto,
  title={Auto-Keras: An Efficient Neural Architecture Search System},
  author={Jin, Haifeng and Song, Qingquan and Hu, Xia},
  booktitle={Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={1946--1956},
  year={2019},
  organization={ACM}
}

OVHcloud autoML

Demo Demo2 Demo3

Deep Learning use cases

Nothing to Image

Generative

Face

Disentangled Image Generation Through Structured Noise Injection

https://github.com/yalharbi/StructuredNoiseInjection/raw/master/example_fakes_alllocal.png

@misc{alharbi2020disentangled,
    title={Disentangled Image Generation Through Structured Noise Injection},
    author={Yazeed Alharbi and Peter Wonka},
    year={2020},
    eprint={2004.12411},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Image to Anything

Image to Image

Avatarify : Deepfake for Zoom

Inpainting

High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling

https://s1.ax1x.com/2020/03/18/8wQG5T.jpg

EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning

https://user-images.githubusercontent.com/1743048/50673917-aac15080-0faf-11e9-9100-ef10864087c8.png

@inproceedings{nazeri2019edgeconnect,
  title={EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning},
  author={Nazeri, Kamyar and Ng, Eric and Joseph, Tony and Qureshi, Faisal and Ebrahimi, Mehran},
  journal={arXiv preprint},
  year={2019},
}

Progressive Image Inpainting with Full-Resolution Residual Network

Before After

@misc{guo2019progressive,
    title={Progressive Image Inpainting with Full-Resolution Residual Network},
    author={Zongyu Guo and Zhibo Chen and Tao Yu and Jiale Chen and Sen Liu},
    year={2019},
    eprint={1907.10478},
    archivePrefix={arXiv},
    primaryClass={eess.IV}
}

Super resolution

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

http://pulse.cs.duke.edu/assets/094.jpeg

@InProceedings{PULSE_CVPR_2020, 
author = {Menon, Sachit and Damian, Alex and Hu, McCourt and Ravi, Nikhil and Rudin, Cynthia}, 
title = {PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models}, 
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 
month = {June}, 
year = {2020} 
}

Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining

https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention/raw/master/Figs/Visual_3.png

@inproceedings{Mei2020image,
  title={Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining},
  author={Mei, Yiqun and Fan, Yuchen and Zhou, Yuqian and Huang, Lichao and Huang, Thomas S and Shi, Humphrey},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

@InProceedings{Lim_2017_CVPR_Workshops,
  author = {Lim, Bee and Son, Sanghyun and Kim, Heewon and Nah, Seungjun and Lee, Kyoung Mu},
  title = {Enhanced Deep Residual Networks for Single Image Super-Resolution},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month = {July},
  year = {2017}
}

Image-to-Image translation

DeepFaceDrawing: Deep Generation of Face Images from Sketches

http://geometrylearning.com/DeepFaceDrawing/imgs/teaser.jpg

UGATIT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)

https://github.com/taki0112/UGATIT/blob/master/assets/teaser.png?raw=true

@inproceedings{
Kim2020U-GAT-IT:,
title={U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation},
author={Junho Kim and Minjae Kim and Hyeonwoo Kang and Kwang Hee Lee},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=BJlZ5ySKPH}
}

Selfie to Anime

https://github.com/jqueguiner/databuzzword/blob/master/images/A578852A-9A4D-4D90-88E0-A4D81C7D41B3.jpeg

@misc{kim2019ugatit,
    title={U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation},
    author={Junho Kim and Minjae Kim and Hyeonwoo Kang and Kwanghee Lee},
    year={2019},
    eprint={1907.10830},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Segmentation

Poly-YOLO: higher speed, more precise detection and instance segmentation for YOLOv3

https://gitlab.com/irafm-ai/poly-yolo/-/raw/master/poly-yolo-titlepage-image.jpg?inline=false

@misc{hurtik2020polyyolo,
    title={Poly-YOLO: higher speed, more precise detection and instance segmentation for YOLOv3},
    author={Petr Hurtik and Vojtech Molek and Jan Hula and Marek Vajgl and Pavel Vlasanek and Tomas Nejezchleba},
    year={2020},
    eprint={2005.13243},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Attention-Guided Hierarchical Structure Aggregation for Image Matting

https://wukaoliu.github.io/HAttMatting/figures/visualization.png

@InProceedings{Qiao_2020_CVPR,
    author = {Qiao, Yu and Liu, Yuhao and Yang, Xin and Zhou, Dongsheng and Xu, Mingliang and Zhang, Qiang and Wei, Xiaopeng},
    title = {Attention-Guided Hierarchical Structure Aggregation for Image Matting},
    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2020}
}

Foreground-aware Semantic Representations for Image Harmonization

https://github.com/saic-vul/image_harmonization/raw/master/images/ih_teaser.jpg

@article{sofiiuk2020harmonization,
  title={Foreground-aware Semantic Representations for Image Harmonization},
  author={Sofiiuk, Konstantin and Popenova, Polina and Konushin, Anton},
  journal={arXiv preprint arXiv:2006.00809},
  year={2020}
}

Single-Stage Semantic Segmentation from Image Labels (CVPR 2020)

https://github.com/visinf/1-stage-wseg/blob/master/figures/results.gif?raw=true

@inproceedings{Araslanov:2020:WSEG,
  title     = {Single-Stage Semantic Segmentation from Image Labels},
  author    = {Araslanov, Nikita and Roth, Stefan},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
}

Others

Background Matting: The World is Your Green Screen

@InProceedings{BMSengupta20,
  title={Background Matting: The World is Your Green Screen},
  author = {Soumyadip Sengupta and Vivek Jayaram and Brian Curless and Steve Seitz and Ira Kemelmacher-Shlizerman},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

3D Photography using Context-aware Layered Depth Inpainting (CVPR 2020)

@inproceedings{Shih3DP20,
  author = {Shih, Meng-Li and Su, Shih-Yang and Kopf, Johannes and Huang, Jia-Bin},
  title = {3D Photography using Context-aware Layered Depth Inpainting},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2020}
}

Project an image centroid to another image using OpenCV

https://github.com/cyrildiagne/screenpoint/blob/master/example/match_debug.png?raw=true

Image to Text

CompGuessWhat?!: a Multi-Task Evaluation Framework for Grounded Language Learning

@inproceedings{suglia2020compguesswhat,
  title={CompGuessWhat?!: a Multi-task Evaluation Framework for Grounded Language Learning},
  author={Suglia, Alessandro and Konstas, Ioannis and Vanzo, Andrea and Bastianelli, Emanuele and Elliott, Desmond and Frank, Stella and Lemon, Oliver},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year={2020}
}

https://compguesswhat.github.io/paper/

YOLOv4: Optimal Speed and Accuracy of Object Detection

@misc{bochkovskiy2020yolov4,
    title={YOLOv4: Optimal Speed and Accuracy of Object Detection},
    author={Alexey Bochkovskiy and Chien-Yao Wang and Hong-Yuan Mark Liao},
    year={2020},
    eprint={2004.10934},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Image Captioning with PyTorch

https://raw.githubusercontent.com/jayeshsaita/image_captioning_pytorch/master/data/sample_output/output_6774537791.jpg

ResNeSt: Split-Attention Networks

@article{zhang2020resnest,
title={ResNeSt: Split-Attention Networks},
author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Zhang, Zhi and Lin, Haibin and Sun, Yue and He, Tong and Muller, Jonas and Manmatha, R. and Li, Mu and Smola, Alexander},
journal={arXiv preprint arXiv:2004.08955},
year={2020}
}

Hands-on guide to sign language classification using CNN

Image to Sound/Speech

Image to Video

Text to Anything

Text to Image

Network Fusion for Content Creation with Conditional INNs (CVPR 2020)

Text to Text

Bilingual Translation

T5 finetuning

Training Electra

  • [Pre-train ELECTRA from Scratch for Spanish](https://chriskhanhtran.github.io/posts/electra-spanish/)

Text Translation

Text Generation

Lyrics Generation

Next Word Prediction

UI

Code to Code

Unsupervised Translation of Programming Languages

@misc{lachaux2020unsupervised,
    title={Unsupervised Translation of Programming Languages},
    author={Marie-Anne Lachaux and Baptiste Roziere and Lowik Chanussot and Guillaume Lample},
    year={2020},
    eprint={2006.03511},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Text to Sound/Speech

Pitchtron: Towards audiobook generation from ordinary people’s voices

Transformers TTS

Text to Video

Sound/Speech to Anything

Sound/Speech to Image

Audio to Image Conversion

Sound/Speech to Text

Speech Command Recognition

Sound/Speech to Sound/Speech

Speaker-independent-emotional-voice-conversion-based-on-conditional-VAW-GAN-and-CWT

@unknown{unknown,
author = {Zhou, Kun and Sisman, Berrak and Zhang, Mingyang and Li, Haizhou},
year = {2020},
month = {05},
pages = {},
title = {Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion},
doi = {10.13140/RG.2.2.20921.60006}
}

Sound/Speech to Video

Video to Anything

Video to Video

Segmentation

MSeg : A Composite Dataset for Multi-domain Semantic Segmentation

https://user-images.githubusercontent.com/62491525/83893958-abb75e00-a71e-11ea-978c-ab4080b4e718.gif

@InProceedings{MSeg_2020_CVPR,
author = {Lambert, John and Zhuang, Liu and Sener, Ozan and Hays, James and Koltun, Vladlen},
title = {MSeg: A Composite Dataset for Multi-domain Semantic Segmentation},
booktitle = {Computer Vision and Pattern Recognition (CVPR)},
year = {2020}
}

Motion Supervised co-part Segmentation

https://github.com/AliaksandrSiarohin/motion-cosegmentation/blob/master/sup-mat/beard-line.gif?raw=true

@article{Siarohin_2020_motion,
  title={Motion Supervised co-part Segmentation},
  author={Siarohin, Aliaksandr and Roy, Subhankar and Lathuilière, Stéphane and Tulyakov, Sergey and Ricci, Elisa and Sebe, Nicu},
  journal={arXiv preprint},
  year={2020}
}

Video to Image

Video to Text

Video to Sound/Speech

Video to Video

Inference

Python serving

Fastai

How to deploy Fastai on Ubuntu

HuggingFace

@article{sanh2020movement,
    title={Movement Pruning: Adaptive Sparsity by Fine-Tuning},
    author={Victor Sanh and Thomas Wolf and Alexander M. Rush},
    year={2020},
    eprint={2005.07683},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Hummingbird

Hummingbird is a Python library that compiles trained ML models into tensor computations for faster inference. Supported models include scikit-learn decision trees and random forests, LightGBM, and XGBoost.
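
To give an intuition for what "compiling a tree into tensor computation" can mean, here is a minimal pure-Python sketch. It does not use Hummingbird and is not its implementation. The toy tree, the matrices `A` to `E`, and the helper names are all assumptions made for illustration, loosely following the matrix-multiplication formulation described by Hummingbird's authors: the branching logic of a decision tree is replaced by a few matrix products and element-wise comparisons.

```python
# Toy decision tree (hypothetical, for illustration only):
#   node 0: x[0] < 0.5 ? go to node 1 : leaf 2
#   node 1: x[1] < 0.3 ? leaf 0       : leaf 1
# Leaves 0, 1, 2 predict classes 0, 1, 1.


def matvec(vec, mat):
    """Row vector times matrix, both given as plain Python lists."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


# A[f][n] = 1 if internal node n tests feature f.
A = [[1, 0],
     [0, 1]]
# B[n] = threshold of internal node n.
B = [0.5, 0.3]
# C[n][l] = 1 if leaf l is in the left subtree of node n,
#          -1 if it is in the right subtree, 0 if it is not below node n.
C = [[1, 1, -1],
     [1, -1, 0]]
# D[l] = number of "go left" decisions on the path from the root to leaf l.
D = [2, 1, 0]
# E[l][c] = 1 if leaf l predicts class c.
E = [[1, 0],
     [0, 1],
     [0, 1]]


def predict_tensor(x):
    """Evaluate the tree with matrix products instead of if/else."""
    tests = [int(v < b) for v, b in zip(matvec(x, A), B)]      # node outcomes
    leaf = [int(s == d) for s, d in zip(matvec(tests, C), D)]  # one-hot leaf
    scores = matvec(leaf, E)                                   # class scores
    return scores.index(max(scores))


def predict_branching(x):
    """Reference implementation using ordinary control flow."""
    if x[0] < 0.5:
        return 0 if x[1] < 0.3 else 1
    return 1


points = [[0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9]]
for point in points:
    print(point, predict_tensor(point), predict_branching(point))
```

Because every step is a dense matrix operation, the same computation can be batched over many inputs and run on tensor runtimes, which is where the speed-up described above comes from.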

Tools

Terminal

Rich

Rich is a Python library for rich text and beautiful formatting in the terminal.

https://github.com/willmcgugan/rich/raw/master/imgs/features.png?raw=true

Python

PyAudio FFT

https://raw.githubusercontent.com/tr1pzz/Realtime_PyAudio_FFT/master/assets/teaser.gif

Process Mining : alpha-miner

Image Feature extractor

Cool projects

Web based Training

How to evaluate Longformer on TriviaQA using NLP

Data Visualization

Hardware

GPU

Nvidia

Ampere

MOOC

Fast.ai

Benchmark

NLP