Wilson Kam Kai Yip
Conscientious and hard-working; enjoys learning new technology quickly and is
able to solve business problems with theoretical mathematics. Fluent in both
Python and R.
- Build and maintain ELT data pipelines from different platforms (such
as Salesforce, Stripe, Aircall and JIRA) into BigQuery with Airflow
and DBT.
- Design abstract DAGs and operators to govern the general ELT process.
- Design an abstract HttpHook to handle different HTTP errors, including a retry mechanism for when the rate limit is exceeded.
- Set up layers of tables and perform feature engineering to transform raw data into meaningful fields.
- Set up Google Cloud Functions as webhooks using Terraform.
- Migrate from the legacy streaming write API to the new BigQuery Storage Write API.
- Quality and cost control on data warehouse
- Write generic DBT tests that run daily to monitor data quality and byte usage on BigQuery.
- Set up CI jobs for DBT that run before PRs are merged to identify potential bugs.
- Reduced the monthly BigQuery bill by 80% by optimising queries, model materialisation and the storage billing model.
- Review pull requests from data analysts on GitHub.
Tailify Software
- EDA and feature engineering with PySpark, ElasticSearch and other databases to provide quality analyses and input for models.
- Design and implement machine learning models to predict YouTube channels’ audience demographics. Design and implement a FastAPI service for the model.
- Monitor system performance by querying Grafana Loki and performing log analysis. Visualise the results on Grafana and Looker Studio.
- Design database architecture, collaborating with multiple stakeholders to understand requirements. Build and maintain a PostgreSQL database.
- Run ETL on agency performance and operational data with Airflow.
- Apply mathematical knowledge to real-world problems: introduce soft cosine similarity to improve the existing cosine similarity model, matching 5-10% more top performers.
- Write async functions to follow tens of millions of URLs (including link shorteners) to identify the real domains, reducing execution time by 90% compared to non-async functions.
- Rapid application development: automate client requests to generate documents, with logging and notification on completion, using Google APIs, Slack API, ElasticSearch and PostgreSQL.
Universities in Hong Kong
Data Scientist
Hong Kong
- Perform statistical analysis and deploy machine learning models, including A/B testing, PCA, Poisson regression, k-means, hierarchical clustering and LDA topic modelling, to analyse different types of data.
- Perform ETL using R and Selenium to scrape data from social media platforms and forums.
- Develop and maintain a Shiny dashboard to visualise analysis results.
University of Hong Kong
Bachelor of Science
Major: Mathematics/Physics
Minor: Computational and Financial Mathematics