Hi, I am Rijul, currently pursuing my Master's in Data Science at the University of Southern California. I am a huge fan of making computers learn complex tasks. My interests range from machine learning and Natural Language Processing to building knowledge bases through information retrieval and entity resolution.
I would be happy to connect with you to talk about Machine Learning, Deep Learning, Natural Language Processing, Knowledge Graphs or AI in general.
Table Linker: Entity Linking System for linking tables to Wikidata
1. Developed an algorithm for candidate generation for entities in a table, improving recall from 0.905 to 0.986
2. Trained and tuned a pairwise contrastive-loss neural network for candidate ranking
3. Achieved a top-1 score of 0.902 on T2DV2 tables and 0.87 on SemTab Round 4 tables
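The pairwise contrastive (margin ranking) objective used for candidate ranking can be sketched in plain Python; the actual network, features, and margin value are not from the project, so treat this as an illustrative assumption:

```python
# Minimal sketch of a pairwise margin loss for candidate ranking.
# Each training pair is (score of correct candidate, score of an
# incorrect candidate); margin = 1.0 is a hypothetical choice.

def pairwise_margin_loss(score_pos, score_neg, margin=1.0):
    """Penalize cases where the correct candidate does not outscore
    the incorrect one by at least `margin`."""
    return max(0.0, margin - (score_pos - score_neg))

def batch_loss(pairs, margin=1.0):
    """Average pairwise loss over a batch of (pos, neg) score pairs."""
    return sum(pairwise_margin_loss(p, n, margin) for p, n in pairs) / len(pairs)
```

Minimizing this loss pushes correct candidates to rank above incorrect ones by a fixed margin, which is the standard intuition behind pairwise ranking objectives.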
Harmonize:
1. Worked on creating a linked data source for drugs using Wikidata and RxNorm
2. Developed a pipeline to generate RDF triples and load them into Blazegraph; containerized the application using Docker
3. Indexed entities of the drug subgraph using Elasticsearch and used it for string search based on exact and fuzzy matching
4. Indexed 92 million data items from Wikidata and Wikipedia using Elasticsearch
5. Contributed to open source projects: KGTK and table-linker
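The combined exact-plus-fuzzy string search described above can be sketched as an Elasticsearch query body; the index and field names here ("label") are hypothetical placeholders, not the project's real schema:

```python
# Sketch of an Elasticsearch query combining exact and fuzzy matching.
# Field names and the boost value are hypothetical illustrations.

def build_search_query(text):
    return {
        "query": {
            "bool": {
                "should": [
                    # exact match against the keyword-typed label, boosted
                    {"term": {"label.keyword": {"value": text, "boost": 2.0}}},
                    # fuzzy match tolerating small spelling variations
                    {"match": {"label": {"query": text, "fuzziness": "AUTO"}}},
                ]
            }
        }
    }
```

A `bool`/`should` clause lets exact hits score highest while fuzzy hits still surface near-misses, which is a common pattern for entity label search.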
FAIRification of Data:
1. Cleaned dirty transaction data by developing a Linked Master Data Management system on Wikibase infrastructure.
2. Optimized reconciliation using OpenRefine, achieving a 50% speedup.
3. Achieved an overall precision of 85% when linking parsed entities to Q-nodes in Wikidata using Wikidata-Wikifier.
4. Streamlined the reconciliation process for data curators by integrating the Wikidata-Wikifier backend with the OpenRefine UI.
5. Devised a pipeline to validate the shape of new entities added to Wikibase, allowing manual curation.
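The shape-validation step in item 5 can be sketched as a simple required-property check; the property list below is a hypothetical shape, and a real deployment would typically express shapes in ShEx or a similar schema language:

```python
# Minimal sketch of validating the "shape" of a new entity before it is
# loaded into Wikibase. REQUIRED is a hypothetical shape definition.

REQUIRED = {"label", "instance_of"}

def validate_shape(entity):
    """Return the set of missing required properties (empty set = valid)."""
    return REQUIRED - set(entity)

def needs_manual_curation(entity):
    """Entities failing the shape check are routed to manual curation."""
    return bool(validate_shape(entity))
```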
Business Open Knowledge Network:
1. Developed a broad crawler that crawled 100,000 companies' webpages, extracting information about mergers and acquisitions.
2. Achieved a recall of 69% on extracting the names of target and acquirer companies.
3. Extracted customer relations covering 20 years (2000–2020) from the Capital IQ database using Snorkel, achieving an F1 score of 68.1%.
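Snorkel-based extraction works by writing labeling functions that vote on candidates; a hedged sketch of one such function follows, where the label values and keyword cues are hypothetical illustrations rather than the project's actual rules:

```python
# Sketch of a Snorkel-style labeling function for a customer relation.
# Labels and keyword cues are hypothetical, not the real project rules.

CUSTOMER, ABSTAIN = 1, -1
CUES = ("customer of", "supplies", "supplier to")

def lf_customer_keywords(sentence):
    """Vote CUSTOMER if a customer-relation cue appears, else abstain."""
    s = sentence.lower()
    return CUSTOMER if any(c in s for c in CUES) else ABSTAIN
```

In Snorkel, many weak functions like this are combined by a label model into probabilistic training labels.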
Worked on a project to predict wheat yield in 5 states of the United States.
1. Data Collection: Collected tabular data for counties in 5 US states; the independent variables were mostly weather parameters and the dependent variable was yield.
2. Data Exploration: Explored the data using Python libraries (pandas, matplotlib).
3. Feature Engineering: Engineered new features based on insights from the exploration step, using statistical techniques.
4. Model Building: Built and compared machine learning models (Random Forest Regressor, Gradient Boosting Regressor, and an Artificial Neural Network); Gradient Boosting performed best.
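The statistical feature engineering in step 3 can be illustrated with a small sketch; the choice of a rolling mean over weather values, and the window size, are hypothetical examples rather than the project's actual features:

```python
# Sketch of engineering a statistical feature from a weather series:
# a rolling mean over a fixed window. The window size is hypothetical.
from statistics import mean

def rolling_mean(values, window=3):
    """Rolling mean; shorter prefixes use whatever history exists."""
    return [mean(values[max(0, i - window + 1): i + 1])
            for i in range(len(values))]
```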
1. Built a Knowledge Graph of video games and their system requirements.
2. Designed a scoring function for entity linking between games extracted from different data sources.
3. Also built a recommender that suggests games based not only on what the user likes but also on what their system can actually run, using fastText word vectors.
4. The KG also links to the online platform where the game is available at the cheapest price.
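A scoring function for matching game titles across sources can be sketched with token-level Jaccard similarity; the project's actual function is not shown in this document, and the 0.8 threshold here is a hypothetical choice:

```python
# Sketch of a title-matching score for entity linking between sources.
# Token-level Jaccard similarity; the 0.8 threshold is hypothetical.

def title_score(a, b):
    """Jaccard similarity over lowercased title tokens, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def same_game(a, b, threshold=0.8):
    """Link two records when their title similarity clears the threshold."""
    return title_score(a, b) >= threshold
```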
1. This is a self-learning project to implement different Seq2Seq architectures for machine translation.
2. The machine translation task was converting text from German to English.
3. So far, I have experimented with a vanilla Seq2Seq model using LSTMs, implemented "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation", and used attention for the task.
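The attention mechanism mentioned above can be sketched as dot-product attention over encoder states; real Seq2Seq attention adds learned projection weights, which are omitted in this toy illustration:

```python
# Minimal sketch of dot-product attention over encoder hidden states.
# Real models add learned projections; this is a parameter-free toy.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys):
    """Return attention weights over `keys` and the weighted context vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [sum(w * key[d] for w, key in zip(weights, keys))
               for d in range(len(keys[0]))]
    return weights, context
```

At each decoding step the decoder state plays the role of `query`, so the context vector focuses on the most relevant source positions.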
1. This was a Kaggle competition, CareerCon 2019. It involved predicting the surface on which a robot is standing.
2. The link for the competition is: https://www.kaggle.com/c/career-con-2019/overview.
This work involves extracting the educational institutions where an actor completed their education. The corpus for this relation extraction task is crawled biographies of cast members from IMDB.
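A pattern-based extractor for this relation can be sketched with a regular expression; the cue phrases and pattern below are illustrative assumptions, not the project's actual extraction rules:

```python
# Sketch of a pattern-based extractor for (actor, educated_at) pairs
# from biography text. The cue phrases are hypothetical illustrations.
import re

# Capture a run of capitalized words following an education cue phrase.
PATTERN = re.compile(
    r"(?:graduated from|studied at|attended) (?:the )?((?:[A-Z][A-Za-z]*\s?)+)")

def extract_institutions(text):
    return [m.strip() for m in PATTERN.findall(text)]
```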
This work crawls information about movies from the IMDB site. The crawler is written in Python using Scrapy.
This task involves entity resolution of movies between the IMDB and AFI databases. It creates a node for each movie in IMDB and adds properties for the same movie from the AFI database.
It also involves creating RDF triples for the resulting dataset.
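Serializing the merged records as RDF triples can be sketched in N-Triples form; the namespace and property URIs below are hypothetical placeholders, not the project's actual vocabulary:

```python
# Sketch of serializing a merged movie record as N-Triples lines.
# The namespace and property URIs are hypothetical placeholders.
NS = "http://example.org/movie/"

def to_ntriples(movie_id, props):
    """One N-Triples line per (property, literal value) pair."""
    subject = f"<{NS}{movie_id}>"
    return [f'{subject} <{NS}{p}> "{v}" .' for p, v in sorted(props.items())]
```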
This was a Kaggle competition. Although I could not participate while it was running, I worked through the problem later. It is a multi-class classification problem in which we predict the surface on which a robot is standing.
This blog aims to explain everything about convolutional neural networks. Have fun reading it, and let me know if you have any comments.