Wilson Yip

Data Engineer & Scientist bridging the gap between complex mathematical modeling and robust data infrastructure. Leverages a background in Mathematics to optimize NLP algorithms and build high-throughput asynchronous scrapers. Expert in Python, dbt, and Airflow, with a focus on creating scalable, secure, and observable data environments that ensure data integrity.

Experience

London
Oct, 2023 - Present

Data Engineering & Infrastructure

  • Architected and maintained ELT pipelines using Airflow, dbt, GCS, and BigQuery.
  • Developed schema-detection tooling to trigger full refreshes upon schema changes in upstream sources.
  • Optimized storage by implementing BigQuery-GCS External Tables, eliminating data redundancy and enabling near real-time access.
  • Reduced query costs by implementing Hive-partitioned directory structures for external storage (see the sketch after this list).
  • Deployed CI/CD pipelines to automate testing on pull requests, reducing production errors by 90%.
  • Engineered custom dbt materializations for BigQuery Functions to provide functionality ahead of native dbt-core support.
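
A minimal sketch of the Hive-partitioned external-table setup referenced above, using the google-cloud-bigquery client; the bucket, project, and dataset names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical GCS prefix; data lives under dt=YYYY-MM-DD/ subdirectories.
SOURCE_PREFIX = "gs://example-lake/events"

# External data config pointing BigQuery at Parquet files in GCS,
# so the data is queryable in place with no duplicate copy in BigQuery.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = [f"{SOURCE_PREFIX}/*"]

# Hive-style partitioning lets the engine prune partitions and cut scan costs.
hive_partitioning = bigquery.HivePartitioningOptions()
hive_partitioning.mode = "AUTO"
hive_partitioning.source_uri_prefix = SOURCE_PREFIX
external_config.hive_partitioning = hive_partitioning

table = bigquery.Table("example-project.analytics.events_external")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)
```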

Data Observability & Cost Optimization

  • Engineered a cost-governance framework by aggregating metadata from the dbt manifest.json, BigQuery INFORMATION_SCHEMA, GCP Audit Logs, and GCS Inventory Reports (see the sketch after this list).
  • Developed centralized observability tables to monitor tables, jobs, and GCS blobs, with automated reporting in Looker Studio.
  • Reduced BigQuery expenditure by 80% through strategic partitioning, incremental modeling, query tuning, and storage billing optimization.
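
A sketch of how such a framework can join dbt metadata to BigQuery job costs; the manifest path, region qualifier, and 30-day window are illustrative assumptions:

```python
import json
from google.cloud import bigquery

client = bigquery.Client()

# Map each dbt model to the relation it builds (manifest path is illustrative).
with open("target/manifest.json") as f:
    manifest = json.load(f)
models = {
    f'{n["database"]}.{n["schema"]}.{n["name"]}': n["unique_id"]
    for n in manifest["nodes"].values()
    if n["resource_type"] == "model"
}

# Aggregate billed bytes per destination table from the jobs metadata view.
query = """
    SELECT
      CONCAT(destination_table.project_id, '.',
             destination_table.dataset_id, '.',
             destination_table.table_id) AS table_ref,
      SUM(total_bytes_billed) AS bytes_billed
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
    GROUP BY table_ref
"""
for row in client.query(query).result():
    if row.table_ref in models:
        print(models[row.table_ref], row.bytes_billed)
```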

Cloud Infrastructure & Security

  • Provisioned and managed GCP infrastructure using Terraform and Docker.
  • Deployed Cloud Functions as webhooks for event-driven architecture.
  • Implemented granular security protocols, including column-level access control and dataset-specific permissions.
  • Containerized Airflow instances for scalable deployment to cloud services.

Machine Learning & Analytics

  • Engineered features and conducted EDA using PySpark and Elasticsearch, processing large-scale datasets to improve model training quality.
  • Developed and deployed ML models to predict YouTube audience demographics, serving predictions via a high-performance FastAPI backend.
  • Optimized NLP matching algorithms by introducing soft-cosine similarity, resulting in a 5–10% increase in top-performer identification.
  • Built asynchronous URL scrapers to resolve millions of shortened links, reducing execution time by 90% through concurrent processing (sketched after this list).
  • Architected and maintained PostgreSQL databases, collaborating with stakeholders to design schemas for complex business requirements.
  • Orchestrated ETL pipelines using Airflow to ingest and transform agency performance and operational data.
  • Implemented system observability by performing log analysis with Grafana Loki and building performance dashboards in Grafana.
  • Accelerated internal workflows via rapid application development, automating document generation using Google APIs, the Slack API, and Elasticsearch.
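
A minimal sketch of the concurrent link-resolution pattern referenced above, assuming aiohttp; the semaphore size, timeout, and example URL are placeholders:

```python
import asyncio
import aiohttp

async def resolve(session: aiohttp.ClientSession,
                  sem: asyncio.Semaphore, url: str) -> tuple[str, str]:
    # HEAD keeps payloads tiny; following redirects means the response URL
    # is the fully resolved destination of the shortened link.
    async with sem, session.head(url, allow_redirects=True) as resp:
        return url, str(resp.url)

async def resolve_all(urls: list[str],
                      concurrency: int = 100) -> list[tuple[str, str]]:
    sem = asyncio.Semaphore(concurrency)  # cap concurrent connections
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        return await asyncio.gather(*(resolve(session, sem, u) for u in urls))

if __name__ == "__main__":
    print(asyncio.run(resolve_all(["https://bit.ly/example"])))
```
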
Various Universities in Hong Kong
Research Assistant (Data Scientist)
Hong Kong
Sept, 2017 - Jan, 2022

  • Performed statistical analysis and deployed machine learning models, including A/B testing, PCA, Poisson regression, k-means and hierarchical clustering, and LDA topic modelling, across varied datasets.
  • Developed and maintained R Shiny dashboards to visualise analysis results.

Education

Society of Actuaries
Probability (P) Exam
Hong Kong
Mar, 2017
University of Hong Kong
Bachelor of Science
Hong Kong
Sept, 2014 - Jul, 2017

Major: Mathematics/Physics
Minor: Computational and Financial Mathematics

Skills
Highlights
Python
Proficient in OOP and design patterns (Factory, Singleton). Built non-blocking systems with Asyncio. Managed complex state with Dataclasses. Engineered recursive schema-inference engines for JSON-to-BigQuery mapping and implemented high-throughput streaming via the BigQuery Storage Write API with dynamic schema handling.
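
An illustrative sketch of the recursive schema-inference idea above; the sample record and type mappings are simplified assumptions:

```python
from google.cloud import bigquery

def infer_field(name: str, value) -> bigquery.SchemaField:
    """Recursively map a JSON value to a BigQuery SchemaField."""
    if isinstance(value, list):
        # REPEATED field: infer the element type from the first item.
        inner = infer_field(name, value[0])
        return bigquery.SchemaField(name, inner.field_type,
                                    mode="REPEATED", fields=inner.fields)
    if isinstance(value, dict):
        sub = [infer_field(k, v) for k, v in value.items()]
        return bigquery.SchemaField(name, "RECORD", fields=sub)
    if isinstance(value, bool):  # bool before int: bool subclasses int
        return bigquery.SchemaField(name, "BOOLEAN")
    if isinstance(value, int):
        return bigquery.SchemaField(name, "INTEGER")
    if isinstance(value, float):
        return bigquery.SchemaField(name, "FLOAT")
    return bigquery.SchemaField(name, "STRING")

record = {"user": {"id": 1, "tags": ["a", "b"]}, "score": 0.9}
schema = [infer_field(k, v) for k, v in record.items()]
```
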
GCP
Managed IAM roles and permissions for accessing various GCP services such as BigQuery, GCS, Pub/Sub, and Secret Manager. Utilised Cloud Functions and Pub/Sub to stream data from various sources into BigQuery without data loss. Hosted a dbt-core Docker instance on Cloud Run to perform CI checks upon pull requests. Utilised Artifact Registry to store custom Docker images and monitored BigQuery Audit Logs.
Docker
Wrote Dockerfiles to containerise the full Airflow stack: the underlying database, the webserver, Celery workers, and Flower monitoring. Built custom images to host dbt-core for CI checks.
Terraform
Utilised Terraform to provision and manage GCP resources such as IAM roles, Pub/Sub topics, and BigQuery Policy Tags.
GitHub Actions
Set up CI/CD pipelines to automate testing and deployment of data pipelines. Automated the deployment of Airflow images upon merging to the main branch, and the running of dbt tests and models upon pull requests to ensure data quality and integrity before merging.
AWS
Developed an automated lifecycle management system within Airflow that triggers AWS Auto Scaling via Python (Boto3) upon pipeline completion, effectively achieving zero idle-compute costs. Also familiar with Lambda and Fargate for serverless and containerised workloads.
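
A minimal sketch of the scale-to-zero callback described above, assuming Boto3; the Auto Scaling group name is hypothetical:

```python
import boto3

def scale_down(asg_name: str) -> None:
    """Release idle workers once the pipeline finishes."""
    client = boto3.client("autoscaling")
    client.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=0,     # drop to zero idle instances
        HonorCooldown=False,   # scale immediately, ignoring cooldown timers
    )

# e.g. invoked from an Airflow task's on_success_callback
scale_down("etl-worker-asg")
```
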
Rust
Developed a CLI utility in Rust to automate GCP authentication. Utilised ‘reqwest’ for asynchronous HTTP handling and ‘serde’ for type-safe JSON parsing to manage service account keys and generate OAuth2 bearer tokens for API interactions.
Data Processing
R
Leveraged libraries such as tidyverse, plyr, and dplyr for data manipulation. Performed various statistical analyses such as regression, hypothesis testing, and time-series analysis. Used ggplot2 and plotly for data visualisation. Developed R Shiny applications for interactive data exploration and reporting.
Airflow
Built custom operators and DAGs with factory classes. Utilised Dataclasses to define DAG and task configurations; with dynamic imports, these configurations are serialisable, so they can be stored in a database and visualised in a dashboard. Worked with DAG parameters to offer flexibility from the UI. Implemented pre- and post-execute hooks to handle common tasks such as checking data types between source and destination tables.
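
A minimal sketch of the dataclass-driven DAG-factory pattern described above (Airflow 2.x syntax); the config fields and task names are illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime
from airflow import DAG
from airflow.operators.empty import EmptyOperator

@dataclass
class DagConfig:
    """Serialisable DAG configuration (illustrative fields)."""
    dag_id: str
    schedule: str
    tasks: list[str] = field(default_factory=list)

def build_dag(cfg: DagConfig) -> DAG:
    with DAG(dag_id=cfg.dag_id, schedule=cfg.schedule,
             start_date=datetime(2024, 1, 1), catchup=False) as dag:
        prev = None
        for task_id in cfg.tasks:
            task = EmptyOperator(task_id=task_id)
            if prev:
                prev >> task  # chain tasks sequentially
            prev = task
    return dag

# Configs could equally be deserialised from a database.
cfg = DagConfig("daily_ingest", "@daily", ["extract", "load", "transform"])
globals()[cfg.dag_id] = build_dag(cfg)  # register with the scheduler
```
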
BigQuery
Maintained the data warehouse with dbt. Utilised partitioned and clustered tables to optimise query performance and cost. Implemented row- and column-level security to restrict data access based on user roles. Set up Analytics Hub to securely share datasets across organisations. Connected BigQuery to GCS with External Tables to prevent data duplication, partitioning the data Hive-style.
dbt
Utilised different materialisations, including custom ones, to optimise performance and cost. Created custom macros to standardise commonly used SQL snippets across multiple models. Implemented tests to ensure data quality and integrity, including uniqueness, referential integrity, and custom business-logic tests.
Spark
Handled hundreds of millions of records with PySpark. Optimised query performance by implementing broadcast joins. Leveraged SparkSQL for complex analytical views along with custom MapReduce functions.
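
A short PySpark sketch of the broadcast-join pattern described above; the input paths, column names, and rollup are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-demo").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
events = spark.read.parquet("gs://example-lake/events")
channels = spark.read.parquet("gs://example-lake/channels")

# Broadcasting the small side ships it to every executor,
# avoiding a full shuffle of the large table.
joined = events.join(broadcast(channels), on="channel_id", how="left")
joined.createOrReplaceTempView("enriched_events")

# SparkSQL over the joined view for an analytical rollup.
spark.sql("""
    SELECT channel_name, COUNT(*) AS n_events
    FROM enriched_events
    GROUP BY channel_name
""").show()
```
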
Looker Studio
Built interactive dashboards for data observability. Identified build and storage costs for each table and dataset in BigQuery, and monitored all query jobs as well as blobs in GCS.
Grafana
Implemented observability by querying Prometheus metrics and Loki logs using PromQL and LogQL.
PostgreSQL
Optimised production PostgreSQL through B-Tree/GIN indexing. Engineered automated ETL pipelines to synchronise relational data from Postgres to BigQuery.
Elasticsearch
Architected complex search queries and implemented soft-cosine similarity via Painless scripting, replacing the plain dot product with a bilinear form (xAyᵀ) over a term-correlation matrix A to enhance matching performance.
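
A NumPy sketch of the soft-cosine idea, outside Elasticsearch; the correlation matrix A and vectors are toy values:

```python
import numpy as np

def soft_cosine(x: np.ndarray, y: np.ndarray, A: np.ndarray) -> float:
    """Cosine similarity where the term-correlation matrix A turns the
    dot product into a bilinear form x A yᵀ."""
    num = x @ A @ y
    den = np.sqrt(x @ A @ x) * np.sqrt(y @ A @ y)
    return float(num / den)

# Toy example: terms 0 and 1 are partially correlated.
A = np.array([[1.0, 0.6, 0.0],
              [0.6, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
x = np.array([1.0, 0.0, 0.0])  # document containing only term 0
y = np.array([0.0, 1.0, 0.0])  # document containing only term 1
print(soft_cosine(x, y, A))    # 0.6, where plain cosine would give 0
```
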
TensorFlow
Leveraged a strong mathematical background to engineer custom neural network architectures. Developed and tuned LSTM models for time-series forecasting, specifically targeting stock price prediction patterns.
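
A minimal Keras sketch of such an LSTM forecaster; the window size, layer widths, and training call are illustrative:

```python
import tensorflow as tf

WINDOW, FEATURES = 30, 1  # 30 past closes predicting the next one

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, FEATURES)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),  # next-step price regression
])
model.compile(optimizer="adam", loss="mse")

# x_train: (batch, 30, 1) windows of prices; y_train: (batch, 1) next price.
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)
```
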
Miscellaneous
Apps Script
Utilised it as a free hosting layer to serve dbt model scripts (compiled and Jinja-templated) to other BigQuery users. Also used it to automate Google Sheets and send automated reports via email.
Linux Bash
System administration using Bash scripting. Utilised GNU commands (grep, sed, awk) for log parsing, filesystem management (chmod/chown), and remote container diagnostics via Docker/SSH. Integrated cloud CLIs (gcloud, aws-cli) into CI/CD pipelines for automated infrastructure scaling.
Administrative
Markdown
Everyday documentation, including this resume. Utilised LaTeX for mathematical equations, Mermaid for diagrams, and Pandoc for conversion from Markdown to HTML with CSS and Lua filters.
LaTeX
Academic writing and typesetting. Utilised packages such as amsmath, biblatex, geometry, hyperref, graphicx, xcolor, tikz, and pgfplots.
Languages
English: Fluent
Cantonese: Native
Mandarin: Fluent