Hi, I am Rijul, currently pursuing my Master's in Data Science at the University of Southern California. I am a huge fan of making computers learn complex tasks. My interests range from machine learning and Natural Language Processing to building knowledge bases through information retrieval and entity resolution.
I would be happy to connect with you to talk about Machine Learning, Deep Learning, Natural Language Processing, Knowledge Graphs or AI in general.
Table Linker: Entity Linking System for linking tables to Wikidata
1. Developed an algorithm for candidate generation for entities in a table, improving recall from 0.905 to 0.986
2. Trained and tuned a pairwise contrastive-loss neural network for candidate ranking
3. Achieved a top-1 score of 0.902 on T2DV2 tables and 0.87 on SemTab Round 4 tables
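The pairwise contrastive (margin ranking) objective used for candidate ranking can be sketched in plain Python; the actual network, features, and margin value are not from the project, so treat this as an illustrative assumption:

```python
# Minimal sketch of a pairwise margin loss for candidate ranking.
# Each training pair is (score of correct candidate, score of an
# incorrect candidate); margin = 1.0 is a hypothetical choice.

def pairwise_margin_loss(score_pos, score_neg, margin=1.0):
    """Penalize cases where the correct candidate does not outscore
    the incorrect one by at least `margin`."""
    return max(0.0, margin - (score_pos - score_neg))

def batch_loss(pairs, margin=1.0):
    """Average pairwise loss over a batch of (pos, neg) score pairs."""
    return sum(pairwise_margin_loss(p, n, margin) for p, n in pairs) / len(pairs)
```

Minimizing this loss pushes correct candidates to rank above incorrect ones by a fixed margin, which is the standard intuition behind pairwise ranking objectives.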
Harmonize:
1. Worked on creating a linked data source for drugs using Wikidata and RxNorm
2. Developed a pipeline to generate RDF triples and load them into Blazegraph; containerized the application using Docker
3. Indexed entities of the drug subgraph using Elasticsearch and used it for string search based on exact and fuzzy matching
4. Indexed 92 million data items from Wikidata and Wikipedia using Elasticsearch
5. Contributed to open source projects: KGTK and table-linker
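The combined exact-plus-fuzzy string search described above can be sketched as an Elasticsearch query body; the index and field names here ("label") are hypothetical placeholders, not the project's real schema:

```python
# Sketch of an Elasticsearch query combining exact and fuzzy matching.
# Field names and the boost value are hypothetical illustrations.

def build_search_query(text):
    return {
        "query": {
            "bool": {
                "should": [
                    # exact match against the keyword-typed label, boosted
                    {"term": {"label.keyword": {"value": text, "boost": 2.0}}},
                    # fuzzy match tolerating small spelling variations
                    {"match": {"label": {"query": text, "fuzziness": "AUTO"}}},
                ]
            }
        }
    }
```

A `bool`/`should` clause lets exact hits score highest while fuzzy hits still surface near-misses, which is a common pattern for entity label search.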
FAIRification of Data:
1. Cleaned dirty transaction data by developing a Linked Master Data Management system on Wikibase infrastructure.
2. Optimized reconciliation using OpenRefine, achieving a 50% speedup.
3. Achieved an overall precision of 85% when linking parsed entities to Q-nodes in Wikidata using Wikidata-Wikifier.
4. Streamlined the reconciliation process for data curators by integrating the Wikidata-Wikifier backend with the OpenRefine UI.
5. Devised a pipeline to validate the shape of new entities added to Wikibase, allowing manual curation.
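The shape-validation step in item 5 can be sketched as a simple required-property check; the property list below is a hypothetical shape, and a real deployment would typically express shapes in ShEx or a similar schema language:

```python
# Minimal sketch of validating the "shape" of a new entity before it is
# loaded into Wikibase. REQUIRED is a hypothetical shape definition.

REQUIRED = {"label", "instance_of"}

def validate_shape(entity):
    """Return the set of missing required properties (empty set = valid)."""
    return REQUIRED - set(entity)

def needs_manual_curation(entity):
    """Entities failing the shape check are routed to manual curation."""
    return bool(validate_shape(entity))
```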
Business Open Knowledge Network:
1. Developed a broad crawler that crawled 100,000 companies' webpages, extracting information about mergers and acquisitions.
2. Achieved a recall of 69% on extracting the names of target and acquirer companies.
3. Extracted customer relations covering 20 years (2000–2020) from the Capital IQ database using Snorkel, achieving an F1 score of 68.1%.
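Snorkel-based extraction works by writing labeling functions that vote on candidates; a hedged sketch of one such function follows, where the label values and keyword cues are hypothetical illustrations rather than the project's actual rules:

```python
# Sketch of a Snorkel-style labeling function for a customer relation.
# Labels and keyword cues are hypothetical, not the real project rules.

CUSTOMER, ABSTAIN = 1, -1
CUES = ("customer of", "supplies", "supplier to")

def lf_customer_keywords(sentence):
    """Vote CUSTOMER if a customer-relation cue appears, else abstain."""
    s = sentence.lower()
    return CUSTOMER if any(c in s for c in CUES) else ABSTAIN
```

In Snorkel, many weak functions like this are combined by a label model into probabilistic training labels.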
Worked on a project to predict wheat yield in 5 states of the United States.
1. Data Collection: Collected tabular data for counties in 5 US states; the independent variables were mostly weather parameters and the dependent variable was yield.
2. Data Exploration: Explored the data using Python libraries (pandas, matplotlib).
3. Feature Engineering: Engineered new features based on insights from the exploration step, using statistical techniques.
4. Model Building: Built and compared machine learning models (Random Forest Regressor, Gradient Boosting Regressor, and an Artificial Neural Network); Gradient Boosting performed best.
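The statistical feature engineering in step 3 can be illustrated with a small sketch; the choice of a rolling mean over weather values, and the window size, are hypothetical examples rather than the project's actual features:

```python
# Sketch of engineering a statistical feature from a weather series:
# a rolling mean over a fixed window. The window size is hypothetical.
from statistics import mean

def rolling_mean(values, window=3):
    """Rolling mean; shorter prefixes use whatever history exists."""
    return [mean(values[max(0, i - window + 1): i + 1])
            for i in range(len(values))]
```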
1. Built a Knowledge Graph of video games and their system requirements.
2. Designed a scoring function for entity linking between games extracted from different data sources.
3. Also built a recommender that suggests games based not only on what the user likes but also on what their system can actually run, using fastText word vectors.
4. The KG also links to the online platform where the game is available at the cheapest price.
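A scoring function for matching game titles across sources can be sketched with token-level Jaccard similarity; the project's actual function is not shown in this document, and the 0.8 threshold here is a hypothetical choice:

```python
# Sketch of a title-matching score for entity linking between sources.
# Token-level Jaccard similarity; the 0.8 threshold is hypothetical.

def title_score(a, b):
    """Jaccard similarity over lowercased title tokens, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def same_game(a, b, threshold=0.8):
    """Link two records when their title similarity clears the threshold."""
    return title_score(a, b) >= threshold
```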
1. This is a self-learning project to implement different Seq2Seq architectures for machine translation.
2. The machine translation task was converting text from German to English.
3. So far, I have experimented with a vanilla Seq2Seq model using LSTMs, implemented "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation", and used attention for the task.
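The attention mechanism mentioned above can be sketched as dot-product attention over encoder states; real Seq2Seq attention adds learned projection weights, which are omitted in this toy illustration:

```python
# Minimal sketch of dot-product attention over encoder hidden states.
# Real models add learned projections; this is a parameter-free toy.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys):
    """Return attention weights over `keys` and the weighted context vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [sum(w * key[d] for w, key in zip(weights, keys))
               for d in range(len(keys[0]))]
    return weights, context
```

At each decoding step the decoder state plays the role of `query`, so the context vector focuses on the most relevant source positions.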
1. This was a Kaggle competition, CareerCon 2019. It involved predicting the surface on which a robot is standing.
2. The link for the competition is: https://www.kaggle.com/c/career-con-2019/overview.
This work involves extracting the educational institutions where an actor completed their education. The corpus for this relation extraction task is crawled biographies of cast members from IMDB.
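A pattern-based extractor for this relation can be sketched with a regular expression; the cue phrases and pattern below are illustrative assumptions, not the project's actual extraction rules:

```python
# Sketch of a pattern-based extractor for (actor, educated_at) pairs
# from biography text. The cue phrases are hypothetical illustrations.
import re

# Capture a run of capitalized words following an education cue phrase.
PATTERN = re.compile(
    r"(?:graduated from|studied at|attended) (?:the )?((?:[A-Z][A-Za-z]*\s?)+)")

def extract_institutions(text):
    return [m.strip() for m in PATTERN.findall(text)]
```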
This work crawls information about movies from the IMDB site. The crawler is written in Python using Scrapy.
This task involves entity resolution of movies between the IMDB and AFI databases. It creates a node for each movie in IMDB and adds properties for the same movie from the AFI database.
It also involves creating RDF triples for the resulting dataset.
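Serializing the merged records as RDF triples can be sketched in N-Triples form; the namespace and property URIs below are hypothetical placeholders, not the project's actual vocabulary:

```python
# Sketch of serializing a merged movie record as N-Triples lines.
# The namespace and property URIs are hypothetical placeholders.
NS = "http://example.org/movie/"

def to_ntriples(movie_id, props):
    """One N-Triples line per (property, literal value) pair."""
    subject = f"<{NS}{movie_id}>"
    return [f'{subject} <{NS}{p}> "{v}" .' for p, v in sorted(props.items())]
```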
This was a Kaggle competition. Although I could not participate while it was running, I worked through the problem later. It is a multi-class classification problem in which we predict the surface on which a robot is standing.
This blog aims to explain everything about convolutional neural networks. Have fun reading it, and let me know if you have any comments.