Curriculum Vitae

Education

M.S. (by research) in Computer Science

CVIT, IIIT Hyderabad (July’16 - Present)

  • Thesis : Unconstrained Arabic & Urdu Text Recognition using Deep CNN-RNN Hybrid Networks
  • Advisor : Prof. C.V. Jawahar
  • Major : Computer Vision and Machine Learning

B.Tech. (honors) in Computer Science

IIIT Hyderabad (July’12 - April’16)

  • CGPA : 8.43 (out of 10)
  • Courses : ECE449 Artificial Neural Networks - CSE441 Database Systems - CSE565 Cloud Computing - CSE577 Machine Learning - CSE578 Computer Vision - CSE481 Optimization Methods - CSE478 Digital Image Processing - CSE471 S.M. in AI - CSE371 Artificial Intelligence - IEC239 Digital Signal Analysis - ICS211 Algorithms - IMA201 Calculus & Complex Numbers - and about 40 more.

Publications

Mohit Jain, Minesh Mathew and C.V. Jawahar, Unconstrained Scene Text and Video Text Recognition for Arabic Script, 1st International Workshop on Arabic Script Analysis and Recognition (ASAR 2017), Nancy, France, 2017. [BEST PAPER AWARD]

Mohit Jain, Minesh Mathew and C.V. Jawahar, Unconstrained OCR for Urdu using Deep CNN-RNN Hybrid Networks, 4th Asian Conference on Pattern Recognition (ACPR 2017), Nanjing, China, 2017. [STUDENT TRAVEL AWARD]

Minesh Mathew, Mohit Jain and C.V. Jawahar, Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam, 6th International Workshop on Multilingual OCR (MOCR 2017), Kyoto, Japan, 2017.

Experience

Lead Backend & ML Developer

StartupFlux, Noida, India. (Oct’16 - Apr’17)

  • StartupFlux provides smart business analytics for startups and investors using Deep Learning and Machine Learning techniques. My responsibilities included leading the back-end and ML operations, and mentoring and managing the team of developers and interns.

Web Administrator

IIIT Hyderabad, India. (April’15 - Present)

  • Maintain the various online portals and databases in use at IIIT Hyderabad.

Research Intern

Virginia Tech, Blacksburg, U.S.A. (April’15 - March’16)

  • Worked on making Convolutional Neural Networks robust against adversarial perturbations under the guidance of Prof. Dhruv Batra (VT) and Prof. C.V. Jawahar (IIIT-H).

Summer Of Code

CloudCV, Blacksburg, U.S.A. (April’15 - Dec’15)

  • Worked on building cloud-based servers capable of running computation-intensive machine learning tasks behind a user-friendly GUI.

Undergraduate Teaching Assistant

IIIT Hyderabad, India.

  • Taught students of the respective courses through tutorial classes and graded coursework and exams.
  • TA for Computer Networks : Spring’15
  • TA for IT Workshop - I : Fall’14

Projects

Unconstrained Scene Text and Video Text Recognition for Arabic Script (Project Page)

People Involved : Mohit Jain, Minesh Mathew and C.V. Jawahar

  • Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform the previous state of the art on two publicly available video text datasets, ALIF and AcTiV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantities of annotated data. We overcome this by synthesizing millions of Arabic text images from a large vocabulary of Arabic words and phrases.

Unconstrained OCR for Urdu using Deep CNN-RNN Hybrid Networks (Project Page)

People Involved : Mohit Jain, Minesh Mathew and C.V. Jawahar

  • Building robust text recognition systems for languages with cursive scripts like Urdu has always been challenging. The intricacies of the script and the scarcity of annotated data make the task harder still. We demonstrate the effectiveness of an end-to-end trainable hybrid CNN-RNN architecture in recognizing Urdu text from printed documents, typically known as Urdu OCR. The proposed solution is not bound to any language-specific lexicon; the model follows a segmentation-free, sequence-to-sequence transcription approach. The network transcribes a sequence of convolutional features from an input image to a sequence of target labels.