An introduction to GitHub Actions, with an example that writes a LinkedIn post whenever a new blog post is merged into the main branch.
In modern software development, an engineer’s job does not end when a product is developed. A lot of time is spent on testing and deploying the product, whether it is a website, a programming library or anything else. These tasks are usually repetitive and boring because products need to be maintained and updated, so the same testing and deployment process has to be rerun throughout the product's life cycle.
Data scientists and machine learning engineers face the same problem: the models they develop also have to be tested and deployed (and then updated, tested and deployed again and again). Continuous integration and continuous delivery emerged to automate these repetitive tasks and save our precious time.
This article describes these concepts through an example: writing a LinkedIn post whenever a new blog post is created on this blog. We will first briefly go through what GitHub Actions is, then discuss how to write a post on LinkedIn through its API. Finally, we will create a workflow that checks whether there is a new blog post and writes a LinkedIn post if there is.
GitHub Actions is a platform for continuous integration / continuous delivery (CI/CD). One can write workflows to automate build, test and deployment pipelines. Each workflow is triggered by one or more events, and its jobs are executed by runners. These concepts are described in more detail below.
Each workflow must be defined in a YAML file inside the .github/workflows directory of a repository, like the one below. We will go through each section of the file.
# Workflow Name
name: Release Process
on:
  # Events
  push:                                   # One event
    branches:
      - main
  workflow_run:                           # Another event
    workflows: [pages-build-deployment]
    types: 
      - completed
jobs:
  # Job
  generate-release:                 # Job id
    name: Create GitHub Release     # Job name
    runs-on: ubuntu-latest          # Runner
    steps:
    - name: Checkout Repository     # Step1
      uses: actions/checkout@v2     # Actions
      
    - name: Run release code        # Step2
      run: |
        cd /target/directory
        ./run-release-code
  
  # Another Job
  another-job:                      # Job id
    name: Another Job               # Job name
    needs: [generate-release]       # Requires the job to complete successfully
    runs-on: ubuntu-latest          # Runner
    steps:
    - name: Checkout Repository     # Step1
      uses: actions/checkout@v2     # Actions
      
    - name: Do other stuff          # Step2
      run: echo $CUSTOM_VAR
      env: 
        CUSTOM_VAR: "${{ secrets.CUSTOM_VAR }}" # Secret value

The entire YAML file above is a workflow. A repository can have multiple workflows, each defined in its own YAML file inside the .github/workflows directory. A workflow can be triggered by one or more events, manually, or on a defined schedule, and it contains one or more jobs.
An event is a specific activity in the repository that triggers a workflow. For example, an event can be a pull request or a push. It can also be the completion of another workflow, or a schedule defined with cron syntax.
The above workflow will be triggered whenever either of the two specified events occurs: a push to the main branch, or the completion of the workflow named pages-build-deployment.

A job is a series of steps that are executed on the same runner. Each step is either a shell script or an action. The steps are executed in order and depend on each other; because they share a runner, one step can use files produced by a previous step. By default, jobs have no dependencies and run concurrently, each on its own runner. A dependency between jobs can be declared with the needs key, as another-job does in the example above: it only starts after generate-release completes successfully.
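To make the needs semantics concrete, here is a minimal Python sketch (Python 3.9+ for graphlib) that groups jobs into "waves" from their needs lists: jobs in the same wave do not depend on each other and could run concurrently on separate runners. The jobs dictionary simply mirrors the example workflow, and the function name is hypothetical; this illustrates the ordering idea, not how GitHub itself schedules jobs.

```python
from graphlib import TopologicalSorter

# Job ids from the example workflow, each mapped to the job ids listed under
# its `needs` key (an empty list when the job has no `needs`).
jobs = {
    "generate-release": [],
    "another-job": ["generate-release"],
}


def execution_waves(job_needs):
    """Group jobs into waves; jobs within one wave have no mutual dependency."""
    sorter = TopologicalSorter(job_needs)
    sorter.prepare()  # raises graphlib.CycleError if `needs` forms a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every job whose needs are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(jobs))  # [['generate-release'], ['another-job']]
```

Adding a hypothetical third job without needs would place it in the first wave, next to generate-release.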
One can also specify a strategy matrix to run the same job under different conditions. For example, the following job will be executed 6 times, once for each of these combinations:
{version: 10, os: ubuntu-22.04}, {version: 10, os: ubuntu-20.04}, {version: 12, os: ubuntu-22.04}, {version: 12, os: ubuntu-20.04}, {version: 14, os: ubuntu-22.04}, {version: 14, os: ubuntu-20.04}

jobs:
  example_matrix:
    strategy:
      matrix:
        os: [ubuntu-22.04, ubuntu-20.04]
        version: [10, 12, 14]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.version }}

Actions are custom applications for the GitHub Actions platform that perform complex but frequently repeated tasks, such as actions/checkout and actions/setup-node in the examples above. You can write an action from scratch or use an existing one from the GitHub Marketplace in your workflow.
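The strategy matrix described earlier is essentially a Cartesian product of its keys. The sketch below enumerates the same six combinations with itertools.product. It is a simplification: it ignores the include and exclude keys a real matrix may also use, the function name is hypothetical, and the order of the combinations may differ from the order in which GitHub creates the jobs.

```python
from itertools import product

# The matrix from the example job: each key maps to the list of its values.
matrix = {
    "os": ["ubuntu-22.04", "ubuntu-20.04"],
    "version": [10, 12, 14],
}


def expand_matrix(matrix):
    """Return one dict per job: every combination of the matrix values."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*matrix.values())]


combos = expand_matrix(matrix)
print(len(combos))  # 6
for combo in combos:
    print(combo)  # e.g. {'os': 'ubuntu-22.04', 'version': 10}
```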
A runner is a server that executes the jobs of a workflow when it is triggered; each runner runs one job at a time. GitHub provides Ubuntu Linux, Microsoft Windows and macOS runners to run workflows. One can also host one's own machine as a self-hosted runner.
For each job or step, one can add an env section to define environment variables. However, this is not a good place for credentials, because the workflow file is committed to the repository in plain text. Instead, go to the repository's Settings, click Secrets and variables under Security, then click Actions. On that page, one can define secrets for the repository and then expose them through the env section of a workflow, as CUSTOM_VAR is in the example above.
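A secret that has not been defined does not make the workflow fail: an expression such as ${{ secrets.NAME }} then evaluates to an empty string, and os.getenv returns that empty string (or None if the variable is not mapped in env at all). The scripts later in this article would then fail with a less obvious error, such as a TypeError from "Bearer " + None or a 401 response from the API. A small helper, require_env (a hypothetical name, not part of any library), can fail fast instead:

```python
import os


def require_env(*names):
    """Return the values of the given environment variables, failing fast
    with one clear message if any of them is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variable(s): " + ", ".join(missing)
            + ". Check the repository secrets and the step's env section."
        )
    return [os.environ[name] for name in names]


# Usage inside a step whose env section maps the secrets, e.g.:
# linkedin_user_id, linkedin_token = require_env("LINKEDIN_USER_ID", "LINKEDIN_TOKEN")
```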
Contexts are a way to access information about workflow runs, variables, runner environments, jobs and steps, for example the name of the working branch or the working directory of the runner. The secrets keyword used in the previous section is also a context. See GitHub's documentation on contexts for more.
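Contexts such as github.event are evaluated by GitHub inside ${{ }} expressions, for example github.event.workflow_run.conclusion in the workflow later in this article. A script running in a step can read the same event data directly: GitHub writes the full webhook payload of the triggering event to a JSON file whose path is stored in the GITHUB_EVENT_PATH environment variable. The function below is a minimal sketch for the workflow_run event; its name is my own.

```python
import json
import os


def workflow_run_conclusion():
    """Return the conclusion ('success', 'failure', ...) of the workflow run
    that triggered this run, or None if it was not a workflow_run event."""
    event_path = os.environ.get("GITHUB_EVENT_PATH")
    if not event_path:
        return None  # not running inside GitHub Actions
    with open(event_path, encoding="utf-8") as f:
        payload = json.load(f)
    # Events other than workflow_run have no "workflow_run" key.
    return payload.get("workflow_run", {}).get("conclusion")


if __name__ == "__main__":
    print(workflow_run_conclusion())
```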
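LinkedIn offers various API products. One of them lets an application write posts on behalf of its users (see LinkedIn's Share on LinkedIn documentation). To do that, we need to create a LinkedIn application, obtain an access token through OAuth2, find the author's user_id, and finally call the posting endpoint.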
The process is similar to the one in my previous blog post about OAuth2 for Google APIs, so I will only briefly describe it here.
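We first create a company page on LinkedIn, then create an application associated with that page. The application provides the client_id and client_secret, and its settings are where the redirect_uri is registered.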
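With the client_id, client_secret and redirect_uri ready, we can authenticate ourselves and authorise the application. The following script opens a URL in the browser to log in to your LinkedIn account. After you approve, LinkedIn redirects to redirect_uri with an authorisation code, which the script's local server (listening on 127.0.0.1:8088, so redirect_uri must point there) exchanges for an access_token.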
import os
from urllib.parse import urlencode, urlparse, parse_qs
from http.server import HTTPServer, BaseHTTPRequestHandler
import json
import requests
import webbrowser
# Credentials of the LinkedIn application, read from environment variables.
client_id = os.getenv("CLIENT_ID")
client_secret = os.getenv("CLIENT_SECRET")
redirect_uri = os.getenv("REDIRECT_URI")    # must point to the local server below, e.g. http://127.0.0.1:8088
scope = "r_liteprofile w_member_social openid profile email"
def parse_query(path):
    """Parse the query string of a request path into a dict (first value per key).
    parse_qs also URL-decodes values and copes with an empty query string."""
    query = parse_qs(urlparse(path).query)
    return {key: values[0] for key, values in query.items()}
def auth_code(code, client_id, client_secret, redirect_uri):
    """Exchange the authorisation code for an access token."""
    data = {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret
    }
    url = "https://www.linkedin.com/oauth/v2/accessToken"
    # The parameters go in a form-encoded request body; requests sets the
    # Content-Type and Content-Length headers for `data` automatically.
    response = requests.post(url, data=data)
    response.raise_for_status()
    return response.json()
class NeuralHTTP(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path
        query = parse_query(path)
        code = query.get("code")
        if code:
            status_code = 200
            content = auth_code(
                code=query.get("code"),
                client_id=client_id,
                client_secret=client_secret,
                redirect_uri=redirect_uri
            )
            print(json.dumps(content, indent=4))
        else:
            status_code = 400
            content = {
                "error": "code not found"
            }
        self.send_response(status_code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(bytes(json.dumps(content, indent=4), "utf-8"))
    
    def log_message(self, format, *args):
        """Silence log message. Can be ignored."""
        return
if __name__ == "__main__":
    with HTTPServer(("127.0.0.1", 8088), NeuralHTTP) as server:
        auth_url = "https://www.linkedin.com/oauth/v2/authorization"
        params = {
            "client_id": client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "scope": scope,
        }
        url = f"{auth_url}?{urlencode(params)}"
        webbrowser.open(url)
        server.handle_request()    # serve exactly one request: the redirect

# {
#     "access_token": "...",
#     "expires_in": 5183999,
#     "scope": "email,openid,profile,r_liteprofile,w_member_social",
#     "token_type": "Bearer",
#     "id_token": "..."
# }

To write a post on LinkedIn, we first need to identify the author's user_id. This requires a GET request to https://api.linkedin.com/v2/userinfo, authorised with the access_token obtained above.
import os
import requests 
import json
url = "https://api.linkedin.com/v2/userinfo"
token = os.getenv("LINKEDIN_ACCESS_TOKEN")
headers = {"Authorization": f"Bearer {token}"}
response = requests.get(url, headers=headers)
response.raise_for_status()
content = response.json()
print(json.dumps(content, indent=4))

{
    "sub": "....",
    "email_verified": true,
    "name": "Wilson Yip",
    "locale": {
        "country": "US",
        "language": "en"
    },
    "given_name": "Wilson",
    "family_name": "Yip",
    "email": "wilsonyip@elitemail.org",
    "picture": "https://media.licdn.com/dms/image/C4E03AQGo1BKbUYmyBA/profile-displayphoto-shrink_100_100/0/1646639382257?e=1696464000&v=beta&t=6lhHrDK3vx6GOC01wIKkfVYAmCiSWoZtc8XpE0JoUmM"
}

The user_id is the value of the sub field.
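The post body used in the next step expects the author as a URN of the form urn:li:person:{user_id}. A minimal sketch of that mapping, using a made-up userinfo response (the sub value and name below are placeholders, not real data, and author_urn is a hypothetical helper):

```python
import json


def author_urn(userinfo):
    """Build the author URN for the post body from the userinfo response,
    whose `sub` field holds the member's user_id."""
    user_id = userinfo.get("sub")
    if not user_id:
        raise ValueError("userinfo response has no 'sub' field")
    return f"urn:li:person:{user_id}"


# A made-up response containing only the fields needed here.
sample = json.loads('{"sub": "abc123XYZ", "name": "Jane Doe"}')
print(author_urn(sample))  # urn:li:person:abc123XYZ
```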
We will call the Share on LinkedIn endpoint (https://api.linkedin.com/v2/ugcPosts) with a request body that attaches an article to the post. The following script shows an example.
import os
import requests 
def build_post_body(
    user_id, 
    post_content, 
    media_title, 
    media_description, 
    article_url
):
    body = {
        "author": f"urn:li:person:{user_id}",
        "lifecycleState": "PUBLISHED",
        "specificContent": {
            "com.linkedin.ugc.ShareContent": {
                "shareCommentary": {
                    "text": post_content
                },
                "shareMediaCategory": "ARTICLE",
                "media": [
                    {
                        "status": "READY",
                        "description": {
                            "text": media_description
                        },
                        "originalUrl": article_url,
                        "title": {
                            "text": media_title
                        }
                    }
                ]
            }
        },
        "visibility": {
            "com.linkedin.ugc.MemberNetworkVisibility": "PUBLIC"
        }
    }
    return body
if __name__ == "__main__":
    linkedin_user_id = os.getenv("LINKEDIN_USER_ID")    # user_id 
    linkedin_token = os.getenv("LINKEDIN_TOKEN")        # access_token
    linkedin_post_endpoint = "https://api.linkedin.com/v2/ugcPosts"
    headers = {
        "X-Restli-Protocol-Version": "2.0.0",
        "Authorization": "Bearer " + linkedin_token 
    }
    body = build_post_body(
        user_id=linkedin_user_id,
        post_content="Content of the LinkedIn post",
        media_title="The title of the article",
        media_description="The description of the article",
        article_url="https://www.link-to-article.com/article"
    )
    response = requests.post(
        url=linkedin_post_endpoint,
        json=body,
        headers=headers
    )
    response.raise_for_status()    # fail loudly if the post was rejected
    print(response.json())

Next, we create a workflow that writes a post on LinkedIn whenever a new article is merged into the main branch of the repository. The workflow is triggered each time the pages-build-deployment workflow, which builds the website, completes. There is one problem, however:
We need to keep track of which articles have already been posted to LinkedIn in order to determine which article is new.
For simplicity, I created a Google Sheet that stores the article paths and the corresponding LinkedIn post ids. An article whose path does not appear in the sheet is considered new, and the most recent such article triggers the rest of the script.
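The detection rule above boils down to a set difference followed by picking the latest date. A minimal sketch, assuming each entry of the site's post list has path and date fields and that dates are ISO 8601 strings such as 2023-08-01 (which sort chronologically as plain strings); the function name and example paths are hypothetical:

```python
def find_new_article(page_posts, logged_paths):
    """Return the most recent article whose path is not yet in the log,
    or None when every article has already been posted."""
    logged = set(logged_paths)
    missing = [post for post in page_posts if post["path"] not in logged]
    if not missing:
        return None
    # Assumes ISO 8601 date strings, so string comparison is chronological.
    return max(missing, key=lambda post: post["date"])


posts = [
    {"path": "posts/2023-06-01-oauth2", "date": "2023-06-01"},
    {"path": "posts/2023-08-01-github-actions", "date": "2023-08-01"},
]
print(find_new_article(posts, ["posts/2023-06-01-oauth2"]))
# {'path': 'posts/2023-08-01-github-actions', 'date': '2023-08-01'}
```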
The workflow itself is quite simple: it just runs a Python script. The script checks whether there is a new article path, writes a LinkedIn post if there is one, and appends the new path and post id to the log sheet.
name: create-linkedin-post
on:
  workflow_run:
    workflows: [pages-build-deployment]
    types: 
      - completed
jobs:
  on-success:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      
      - name: Install python dependencies
        run: pip install pyyaml "pyjwt[crypto]" requests
      
      - name: Wait for some seconds
        run: sleep 30
      
      - name: Create Linkedin Post
        run: python ./tools/cd/linkedin_post.py
        env: 
          LINKEDIN_USER_ID: ${{ secrets.LINKEDIN_USER_ID }}
          LINKEDIN_TOKEN: ${{ secrets.LINKEDIN_TOKEN }}
          GCP_CLIENT_EMAIL: ${{ secrets.GCP_CLIENT_EMAIL }}
          GCP_PRIVATE_KEY_ID: ${{ secrets.GCP_PRIVATE_KEY_ID }}
          GCP_PRIVATE_KEY: ${{ secrets.GCP_PRIVATE_KEY }}
          LINKEDIN_POSTS_LOG_SSID: ${{ secrets.LINKEDIN_POSTS_LOG_SSID }}
          LINKEDIN_POSTS_LOG_RANGE: ${{ secrets.LINKEDIN_POSTS_LOG_RANGE }}
  on-failure:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    steps: 
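      - run: echo "pages-build-deployment failed, so no LinkedIn post was written."

The Python script ./tools/cd/linkedin_post.py run by the workflow is shown below.

#!/usr/bin/env python3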
import os
import requests 
import json 
from time import time
import jwt
import yaml
def build_post_body(
    user_id, 
    post_content, 
    media_title, 
    media_description, 
    article_url
):
    # Thumbnail URL; replace() avoids a double slash when article_url ends with "/".
    preview_url = f"{article_url}/img/preview.png".replace(
        "//img/", "/img/"
    )
    body = {
        "author": f"urn:li:person:{user_id}",
        "lifecycleState": "PUBLISHED",
        "specificContent": {
            "com.linkedin.ugc.ShareContent": {
                "shareCommentary": {
                    "text": post_content
                },
                "shareMediaCategory": "ARTICLE",
                "media": [
                    {
                        "status": "READY",
                        "description": {
                            "text": media_description
                        },
                        "originalUrl": article_url,
                        "title": {
                            "text": media_title
                        },
                        "thumbnails": [
                            {
                                "url": preview_url
                            }
                        ]
                    }
                ]
            }
        },
        "visibility": {
            "com.linkedin.ugc.MemberNetworkVisibility": "PUBLIC"
        }
    }
    return body
def find_latest_missing_post(page_posts, linkedin_posts):
    page_post_paths = [x.get("path") for x in page_posts]
    linkedin_post_paths = [x.get("path") for x in linkedin_posts]
    missing_idx = [
        i for i, x in enumerate(page_post_paths) if x not in linkedin_post_paths
    ]
    
    if missing_idx:
        missing_paths = [page_post_paths[i] for i in missing_idx]
        missing_post_dates = [page_posts[i].get("date") for i in missing_idx]
        latest_missing_post = missing_paths[missing_post_dates.index(max(missing_post_dates))]
        latest_missing_post = page_posts[page_post_paths.index(latest_missing_post)]
    else:
        latest_missing_post = None
    return latest_missing_post
def read_rmd_yml(path):
    with open(path, "r") as f:
        rmd_yml = f.readlines()
    
    # Indices of the "---" lines delimiting the YAML front matter (tolerates CRLF).
    yml_idx = [i for i, x in enumerate(rmd_yml) if x.strip() == "---"]
    return yaml.safe_load("".join(rmd_yml[(yml_idx[0]+1):(yml_idx[1])]))
def auth_gapi_token(client_email, private_key_id, private_key):
    payload: dict = {
        "iss": client_email,
        "scope": "https://www.googleapis.com/auth/drive",
        "aud": "https://oauth2.googleapis.com/token",
        "iat": int(time()),
        "exp": int(time() + 3599)
    }
    headers: dict[str, str] = {'kid': private_key_id}
    # PyJWT >= 2.0 returns the token as a str (1.x returned bytes).
    signed_jwt: str = jwt.encode(
        payload=payload,
        key=private_key.replace("\\n", "\n"),
        algorithm="RS256",
        headers=headers
    )
    body: dict = {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": signed_jwt
    }
    response: requests.Response = requests.request(
        "POST", "https://oauth2.googleapis.com/token", json=body
    )
    response.raise_for_status()
    content = response.json()
    return content.get('access_token')
def read_gsheet(ssid, ranges, token):
    url = f"https://sheets.googleapis.com/v4/spreadsheets/{ssid}/values/{ranges}"
    headers = {
        "Authorization": f"Bearer {token}"
    }
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()
def append_gsheet(ssid, ranges, data, token):
    url = f"https://sheets.googleapis.com/v4/spreadsheets/{ssid}/values/{ranges}:append"
    body = {
        "range": ranges,
        "majorDimension": "ROWS",
        "values": data
    }
    headers = {
        "Authorization": f"Bearer {token}"
    }
    response = requests.post(url, params={"valueInputOption": "RAW"}, headers=headers, json=body)
    response.raise_for_status()
def create_linkedin_post(post):
    linkedin_user_id = os.getenv("LINKEDIN_USER_ID")
    linkedin_token = os.getenv("LINKEDIN_TOKEN")
    linkedin_post_endpoint = "https://api.linkedin.com/v2/ugcPosts"
    rmd_file = os.listdir(f"./_{post['path']}")
    rmd_file = list(filter(lambda x: ".rmd" in x.lower(), rmd_file))[0]
    rmd_yml = read_rmd_yml(f"./_{post['path']}/{rmd_file}")
    post_note = "The post was created by Github Actions.\nhttps://github.com/wilsonkkyip/wilsonkkyip.github.io"
    abstract = rmd_yml["abstract"] + f"\n\n{post_note}"
    body = build_post_body(
        user_id=linkedin_user_id,
        post_content=abstract,
        media_title=rmd_yml["title"],
        media_description=rmd_yml["description"],
        article_url=f"https://wilsonkkyip.github.io/{post['path']}"
    )
    headers = {
        "X-Restli-Protocol-Version": "2.0.0",
        "Authorization": "Bearer " + linkedin_token 
    }
    response = requests.post(
        url=linkedin_post_endpoint, 
        json=body, 
        headers=headers
    )
    response.raise_for_status()
    content = response.json()
    return content
def main():
    gcp_client_email = os.getenv("GCP_CLIENT_EMAIL")
    gcp_private_key_id = os.getenv("GCP_PRIVATE_KEY_ID")
    gcp_private_key = os.getenv("GCP_PRIVATE_KEY")
    log_ssid = os.getenv("LINKEDIN_POSTS_LOG_SSID")
    log_range = os.getenv("LINKEDIN_POSTS_LOG_RANGE")
    gcp_token = auth_gapi_token(
        gcp_client_email, gcp_private_key_id, gcp_private_key
    )
    logs = read_gsheet(log_ssid, log_range, gcp_token)
    linkedin_posts = [
        {logs["values"][0][0]: x[0], logs["values"][0][1]: x[1]} for x in logs["values"][1:]
    ]
    with open("./posts/posts.json", "r") as file:
        page_posts = json.loads(file.read())
    missing_post = find_latest_missing_post(page_posts, linkedin_posts)
    if missing_post:
        response = create_linkedin_post(missing_post)
        appending_data = [[missing_post["path"], response.get("id")]]
        append_gsheet(log_ssid, log_range, appending_data, gcp_token)
if __name__ == "__main__": 
    main()
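For readers unfamiliar with the service-account flow behind auth_gapi_token: the assertion sent to https://oauth2.googleapis.com/token is a JWT, i.e. three base64url segments, header.payload.signature. The stdlib-only sketch below rebuilds the first two segments from the same claims to show what gets signed. It deliberately stops before the signature, which requires RS256 signing with the private key (done by jwt.encode in the script), and its JSON serialisation may differ byte for byte from PyJWT's. The function names are hypothetical.

```python
import base64
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def unsigned_jwt(client_email, private_key_id, now=None):
    """Build the header.payload part of the service-account assertion.
    The signature segment is NOT produced here."""
    now = int(time.time() if now is None else now)
    header = {"alg": "RS256", "typ": "JWT", "kid": private_key_id}
    payload = {
        "iss": client_email,
        "scope": "https://www.googleapis.com/auth/drive",
        "aud": "https://oauth2.googleapis.com/token",
        "iat": now,
        "exp": now + 3599,  # Google rejects assertions valid for more than 1 hour
    }
    segments = [json.dumps(part, separators=(",", ":")).encode() for part in (header, payload)]
    return ".".join(b64url(segment) for segment in segments)


if __name__ == "__main__":
    print(unsigned_jwt("bot@example.iam.gserviceaccount.com", "key123"))
```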