An introduction to GitHub Actions, with an example that writes a LinkedIn post whenever a new blog post is merged to the main branch.
In modern software development, an engineer's job does not end when a product is built. Considerable time is spent on testing and deploying it, whether the product is a website, a programming library, or anything else. These tasks are usually repetitive and tedious because products must be maintained and updated: the same testing and deployment process has to be rerun throughout the product's life cycle.
The same problem applies to data scientists and machine learning engineers, whose models must also be tested and deployed (and updated, tested, and deployed again and again). The concepts of continuous integration and continuous delivery emerged to automate these repetitive tasks and save our precious time.
This article illustrates these concepts through an example: writing a LinkedIn post whenever a new blog post is published on this blog. We will first briefly go through what GitHub Actions is, then talk about how to write a post on LinkedIn through its API. Finally, we will create a workflow that checks whether there is a new blog post and writes a LinkedIn post if there is.
GitHub Actions is a platform for continuous integration / continuous delivery (CI/CD). One can write workflows to automate build, test, and deployment pipelines. Each workflow is triggered by one or more events and can be run on different runners. We describe these concepts in more detail below.
Each workflow must be defined in a YAML file inside the .github/workflows directory of a repository, like the one below. We will go through each section of the file.
# Workflow Name
name: Release Process

on:
  # Events
  push:                              # One event
    branches:
      - main
  workflow_run:                      # Another event
    workflows: [pages-build-deployment]
    types:
      - completed

jobs:
  # Job
  generate-release:                  # Job id
    name: Create GitHub Release     # Job name
    runs-on: ubuntu-latest           # Runner
    steps:
      - name: Checkout Repository    # Step 1
        uses: actions/checkout@v2    # Action
      - name: Run release code       # Step 2
        run: |
          cd /target/directory
          ./run-release-code

  # Another Job
  another-job:                       # Job id
    name: Another Job                # Job name
    needs: [generate-release]        # Requires the job above to complete successfully
    runs-on: ubuntu-latest           # Runner
    steps:
      - name: Checkout Repository    # Step 1
        uses: actions/checkout@v2    # Action
      - name: do other stuffs        # Step 2
        run: echo $CUSTOM_VAR
        env:
          CUSTOM_VAR: "${{ secrets.CUSTOM_VAR }}"  # Secret value
The entire YAML file above is a workflow. There can be multiple workflows, stored in different YAML files inside the .github/workflows directory. Each workflow can be triggered by one or more events, manually, or on a defined schedule, and each workflow contains one or more jobs.
An event is an activity within the repository, for example a push or a pull request. It can also be the completion of another workflow, or a schedule defined in cron syntax.
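For instance, a hypothetical workflow that should run every day at 06:00 UTC (this schedule is an illustration, not part of the workflow in this article) could declare its trigger like this:

```yaml
# Hypothetical scheduled trigger: a five-field cron expression,
# evaluated in UTC by GitHub Actions
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
```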
The above workflow will be triggered whenever one of the two specified events occurs: a push to the main branch, or the completion of the pages-build-deployment workflow.
A job is a series of steps executed on the same runner. Each step is either a shell script or an action. The steps are executed in order and can depend on each other. By default, jobs run on different runners and concurrently. One can declare dependencies between jobs with the needs key; the example above shows an implementation.
One can also specify a strategy matrix to repeat the same job under different conditions. For example, the following job will be executed 6 times, once for each of the combinations:
{node-version: 10, os: ubuntu-22.04}
{node-version: 10, os: ubuntu-20.04}
{node-version: 12, os: ubuntu-22.04}
{node-version: 12, os: ubuntu-20.04}
{node-version: 14, os: ubuntu-22.04}
{node-version: 14, os: ubuntu-20.04}
jobs:
  example_matrix:
    strategy:
      matrix:
        os: [ubuntu-22.04, ubuntu-20.04]
        version: [10, 12, 14]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.version }}
Actions are custom applications for GitHub Actions that perform complex but repetitive tasks. You can write an action from scratch or use an existing action available from the GitHub Marketplace in your workflow.
A runner is an OS on a virtual machine or container that executes a specific job. GitHub provides Ubuntu Linux, Microsoft Windows, and macOS runners to run the workflows. One can also host their own machine as a runner.
For each step or job, one can specify an env section to define environment variables. But if we are dealing with credentials, this is not a good choice. Instead, go to the repository's Settings, and under Security, click Secrets and variables, then Actions. On that page, one can define secrets for the repository and access them within the env section of a workflow, as shown in the example above.
Contexts are a way to access information about workflow runs, variables, runner environments, jobs, and steps, for example the name of the working branch or the working directory of GitHub Actions. The keyword secrets in the section above is also a context. See more from this page.
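As an illustration (a sketch, not a step from this article's workflow), the following step prints a few commonly used context values:

```yaml
# Hypothetical step showing a few common contexts
- name: Show context values
  run: |
    echo "Branch ref: ${{ github.ref }}"
    echo "Repository: ${{ github.repository }}"
    echo "Runner OS:  ${{ runner.os }}"
    echo "Workspace:  ${{ github.workspace }}"
```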
LinkedIn offers various API products that let consumers do various things. One of them is to write posts on behalf of users (see this documentation). To do that, we need to create an application, authorise it, and obtain an access token.
The process is similar to my previous blog post about OAuth2 for Google APIs. I will briefly describe it here.
We will first create a company on LinkedIn and then the application.
Now that we have the client_id, client_secret, and redirect_uri ready, we can authenticate ourselves and authorise the application. The following script generates a URL to log in to your LinkedIn account, and then produces the access_token.
import os
from urllib.parse import urlencode, urlparse
from http.server import HTTPServer, BaseHTTPRequestHandler
import json
import requests
import webbrowser

client_id = os.getenv("CLIENT_ID")
client_secret = os.getenv("CLIENT_SECRET")
redirect_uri = os.getenv("REDIRECT_URI")
scope = "r_liteprofile w_member_social openid profile email"


def parse_query(path):
    parsed_url = urlparse(path)
    query = parsed_url.query.split("&")
    query = [x.split("=") for x in query]
    query = {x[0]: x[1] for x in query}
    return query


def auth_code(code, client_id, client_secret, redirect_uri):
    params = {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret
    }
    headers = {
        "content-type": "application/x-www-form-urlencoded",
        "content-length": "0"
    }
    url = "https://www.linkedin.com/oauth/v2/accessToken"
    response = requests.post(url, params=params, headers=headers)
    response.raise_for_status()
    content = response.json()
    return content


class NeuralHTTP(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path
        query = parse_query(path)

        code = query.get("code")
        if code:
            status_code = 200
            content = auth_code(
                code=query.get("code"),
                client_id=client_id,
                client_secret=client_secret,
                redirect_uri=redirect_uri
            )
            print(json.dumps(content, indent=4))
        else:
            status_code = 400
            content = {
                "error": "code not found"
            }

        self.send_response(status_code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(bytes(json.dumps(content, indent=4), "utf-8"))

    def log_message(self, format, *args):
        """Silence log messages. Can be ignored."""
        return


if __name__ == "__main__":
    with HTTPServer(("127.0.0.1", 8088), NeuralHTTP) as server:
        auth_url = "https://www.linkedin.com/oauth/v2/authorization"
        params = {
            "client_id": client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "scope": scope,
        }
        url = f"{auth_url}?{urlencode(params)}"
        webbrowser.open(url)
        server.handle_request()
# {
# "access_token": "...",
# "expires_in": 5183999,
# "scope": "email,openid,profile,r_liteprofile,w_member_social",
# "token_type": "Bearer",
# "id_token": "..."
# }
To write a post on LinkedIn, we first need to identify the author's user_id. A GET request to https://api.linkedin.com/v2/userinfo with the access_token obtained above is needed.
import os
import requests
import json

url = "https://api.linkedin.com/v2/userinfo"
token = os.getenv("LINKEDIN_ACCESS_TOKEN")

headers = {"Authorization": f"Bearer {token}"}

response = requests.get(url, headers=headers)
response.raise_for_status()
content = response.json()
print(json.dumps(content, indent=4))
{
"sub": "....",
"email_verified": true,
"name": "Wilson Yip",
"locale": {
"country": "US",
"language": "en"
},
"given_name": "Wilson",
"family_name": "Yip",
"email": "wilsonyip@elitemail.org",
"picture": "https://media.licdn.com/dms/image/C4E03AQGo1BKbUYmyBA/profile-displayphoto-shrink_100_100/0/1646639382257?e=1696464000&v=beta&t=6lhHrDK3vx6GOC01wIKkfVYAmCiSWoZtc8XpE0JoUmM"
}
The user_id is stored in the sub field.
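As a minimal sketch (using a shortened, made-up userinfo response rather than a real one), extracting the author id from the payload looks like this:

```python
import json

# Shortened, made-up /v2/userinfo response for illustration only
raw = '{"sub": "abc123XYZ", "name": "Wilson Yip", "email": "user@example.org"}'

userinfo = json.loads(raw)
user_id = userinfo["sub"]  # this value fills "urn:li:person:{user_id}" below
print(user_id)  # abc123XYZ
```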
We will be calling the Share on LinkedIn endpoint to write a post on LinkedIn, along with the specific request body that attaches an article to the post. The following script shows an example.
import os
import requests


def build_post_body(
    user_id,
    post_content,
    media_title,
    media_description,
    article_url
):
    body = {
        "author": f"urn:li:person:{user_id}",
        "lifecycleState": "PUBLISHED",
        "specificContent": {
            "com.linkedin.ugc.ShareContent": {
                "shareCommentary": {
                    "text": post_content
                },
                "shareMediaCategory": "ARTICLE",
                "media": [
                    {
                        "status": "READY",
                        "description": {
                            "text": media_description
                        },
                        "originalUrl": article_url,
                        "title": {
                            "text": media_title
                        }
                    }
                ]
            }
        },
        "visibility": {
            "com.linkedin.ugc.MemberNetworkVisibility": "PUBLIC"
        }
    }
    return body


if __name__ == "__main__":
    linkedin_user_id = os.getenv("LINKEDIN_USER_ID")  # user_id
    linkedin_token = os.getenv("LINKEDIN_TOKEN")      # access_token
    linkedin_post_endpoint = "https://api.linkedin.com/v2/ugcPosts"

    headers = {
        "X-Restli-Protocol-Version": "2.0.0",
        "Authorization": "Bearer " + linkedin_token
    }

    body = build_post_body(
        user_id=linkedin_user_id,
        post_content="Content of the LinkedIn post",
        media_title="The title of the article",
        media_description="The description of the article",
        article_url="https://www.link-to-article.com/article"
    )

    response = requests.post(
        url=linkedin_post_endpoint,
        json=body,
        headers=headers
    )
A workflow is created to write a post on LinkedIn whenever a new article is merged to the main branch of the repository. The workflow is triggered every time the pages-build-deployment workflow, which builds the website, completes. Yet there is a problem: we need to keep track of which articles have already been posted to LinkedIn in order to tell which article is new.
For simplicity, I have created a Google Sheet to store the article paths and the corresponding LinkedIn post_id. If an article's path does not appear in the table, it is the new article and will trigger the scripts below.
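The core of that check is simply comparing the article paths on the site against the paths already logged in the sheet. A minimal sketch of the idea, with made-up paths (the real implementation appears in the script further below):

```python
# Made-up data for illustration: articles published on the site
# versus articles already logged in the Google Sheet
page_posts = [
    {"path": "posts/github-actions", "date": "2023-08-01"},
    {"path": "posts/oauth2-google", "date": "2023-07-01"},
]
linkedin_posts = [
    {"path": "posts/oauth2-google", "post_id": "urn:li:share:1"},
]

# Any page post whose path is not in the log is considered new
logged_paths = {x["path"] for x in linkedin_posts}
new_posts = [x for x in page_posts if x["path"] not in logged_paths]
print([x["path"] for x in new_posts])  # ['posts/github-actions']
```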
The workflow itself is quite simple: it just runs a Python file. The Python file checks whether there is any new article path, writes a LinkedIn post if there is one, and updates the log file.
name: create-linkedin-post

on:
  workflow_run:
    workflows: [pages-build-deployment]
    types:
      - completed

jobs:
  on-success:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Install python dependencies
        run: pip install pyyaml
      - name: Wait for some seconds
        run: sleep 30
      - name: Create Linkedin Post
        run: python ./tools/cd/linkedin_post.py
        env:
          LINKEDIN_USER_ID: ${{ secrets.LINKEDIN_USER_ID }}
          LINKEDIN_TOKEN: ${{ secrets.LINKEDIN_TOKEN }}
          GCP_CLIENT_EMAIL: ${{ secrets.GCP_CLIENT_EMAIL }}
          GCP_PRIVATE_KEY_ID: ${{ secrets.GCP_PRIVATE_KEY_ID }}
          GCP_PRIVATE_KEY: ${{ secrets.GCP_PRIVATE_KEY }}
          LINKEDIN_POSTS_LOG_SSID: ${{ secrets.LINKEDIN_POSTS_LOG_SSID }}
          LINKEDIN_POSTS_LOG_RANGE: ${{ secrets.LINKEDIN_POSTS_LOG_RANGE }}

  on-failure:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    steps:
      - run: echo "Fail to write LinkedIn Post."
The Python file run by the workflow is shown below.
#!/usr/bin/python
import os
import requests
import json
from time import time
import jwt
import yaml


def build_post_body(
    user_id,
    post_content,
    media_title,
    media_description,
    article_url
):
    preview_url = f"{article_url}/img/preview.png".replace(
        "//img/", "/img/"
    )
    body = {
        "author": f"urn:li:person:{user_id}",
        "lifecycleState": "PUBLISHED",
        "specificContent": {
            "com.linkedin.ugc.ShareContent": {
                "shareCommentary": {
                    "text": post_content
                },
                "shareMediaCategory": "ARTICLE",
                "media": [
                    {
                        "status": "READY",
                        "description": {
                            "text": media_description
                        },
                        "originalUrl": article_url,
                        "title": {
                            "text": media_title
                        },
                        "thumbnails": [
                            {
                                "url": preview_url
                            }
                        ]
                    }
                ]
            }
        },
        "visibility": {
            "com.linkedin.ugc.MemberNetworkVisibility": "PUBLIC"
        }
    }
    return body


def find_latest_missing_post(page_posts, linkedin_posts):
    page_post_paths = [x.get("path") for x in page_posts]
    linkedin_post_paths = [x.get("path") for x in linkedin_posts]
    missing_idx = [
        i for i, x in enumerate(page_post_paths) if x not in linkedin_post_paths
    ]

    if missing_idx:
        missing_paths = [page_post_paths[i] for i in missing_idx]
        missing_post_dates = [page_posts[i].get("date") for i in missing_idx]
        latest_missing_post = missing_paths[missing_post_dates.index(max(missing_post_dates))]
        latest_missing_post = page_posts[page_post_paths.index(latest_missing_post)]
    else:
        latest_missing_post = None

    return latest_missing_post


def read_rmd_yml(path):
    with open(path, "r") as f:
        rmd_yml = f.readlines()

    yml_idx = [i for i, x in enumerate(rmd_yml) if x == "---\n"]
    return yaml.safe_load("".join(rmd_yml[(yml_idx[0] + 1):(yml_idx[1])]))


def auth_gapi_token(client_email, private_key_id, private_key):
    payload: dict = {
        "iss": client_email,
        "scope": "https://www.googleapis.com/auth/drive",
        "aud": "https://oauth2.googleapis.com/token",
        "iat": int(time()),
        "exp": int(time() + 3599)
    }
    headers: dict[str, str] = {'kid': private_key_id}

    signed_jwt: bytes = jwt.encode(
        payload=payload,
        key=private_key.replace("\\n", "\n"),
        algorithm="RS256",
        headers=headers
    )

    body: dict = {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": signed_jwt
    }
    response: requests.Response = requests.request(
        "POST", "https://oauth2.googleapis.com/token", json=body
    )
    response.raise_for_status()

    content = response.json()
    return content.get('access_token')


def read_gsheet(ssid, ranges, token):
    url = f"https://sheets.googleapis.com/v4/spreadsheets/{ssid}/values/{ranges}"
    headers = {
        "Authorization": f"Bearer {token}"
    }
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()


def append_gsheet(ssid, ranges, data, token):
    url = f"https://sheets.googleapis.com/v4/spreadsheets/{ssid}/values/{ranges}:append"

    body = {
        "range": ranges,
        "majorDimension": "ROWS",
        "values": data
    }
    headers = {
        "Authorization": f"Bearer {token}"
    }
    response = requests.post(url, params={"valueInputOption": "RAW"}, headers=headers, json=body)
    response.raise_for_status()


def create_linkedin_post(post):
    linkedin_user_id = os.getenv("LINKEDIN_USER_ID")
    linkedin_token = os.getenv("LINKEDIN_TOKEN")
    linkedin_post_endpoint = "https://api.linkedin.com/v2/ugcPosts"

    rmd_file = os.listdir(f"./_{post['path']}")
    rmd_file = list(filter(lambda x: ".rmd" in x.lower(), rmd_file))[0]

    rmd_yml = read_rmd_yml(f"./_{post['path']}/{rmd_file}")
    post_note = "The post was created by Github Actions.\nhttps://github.com/wilsonkkyip/wilsonkkyip.github.io"
    abstract = rmd_yml["abstract"] + f"\n\n{post_note}"

    body = build_post_body(
        user_id=linkedin_user_id,
        post_content=abstract,
        media_title=rmd_yml["title"],
        media_description=rmd_yml["description"],
        article_url=f"https://wilsonkkyip.github.io/{post['path']}"
    )

    headers = {
        "X-Restli-Protocol-Version": "2.0.0",
        "Authorization": "Bearer " + linkedin_token
    }

    response = requests.post(
        url=linkedin_post_endpoint,
        json=body,
        headers=headers
    )
    response.raise_for_status()
    content = response.json()

    return content


def main():
    gcp_client_email = os.getenv("GCP_CLIENT_EMAIL")
    gcp_private_key_id = os.getenv("GCP_PRIVATE_KEY_ID")
    gcp_private_key = os.getenv("GCP_PRIVATE_KEY")

    log_ssid = os.getenv("LINKEDIN_POSTS_LOG_SSID")
    log_range = os.getenv("LINKEDIN_POSTS_LOG_RANGE")

    gcp_token = auth_gapi_token(
        gcp_client_email, gcp_private_key_id, gcp_private_key
    )

    logs = read_gsheet(log_ssid, log_range, gcp_token)
    linkedin_posts = [
        {logs["values"][0][0]: x[0], logs["values"][0][1]: x[1]} for x in logs["values"][1:]
    ]

    with open("./posts/posts.json", "r") as file:
        page_posts = json.loads(file.read())

    missing_post = find_latest_missing_post(page_posts, linkedin_posts)

    if missing_post:
        response = create_linkedin_post(missing_post)
        appending_data = [[missing_post["path"], response.get("id")]]
        append_gsheet(log_ssid, log_range, appending_data, gcp_token)


if __name__ == "__main__":
    main()