An introduction to GitHub Actions, with an example that writes a LinkedIn post whenever a new blog post is merged to the main branch.
In modern software development, an engineer's job does not end when a product is built. Considerable time is spent on testing and deploying it, whether the product is a website, a programming library, or anything else. These tasks are usually repetitive and tedious because products must be maintained and updated: the same testing and deployment process has to be rerun throughout the product's life cycle.
The same problem applies to data scientists and machine learning engineers, whose models must also be tested and deployed (and updated, tested, and deployed again and again). The concepts of continuous integration and continuous delivery emerged to automate these repetitive tasks and save our precious time.
This article illustrates these concepts through an example: writing a LinkedIn post whenever a new blog post is published on this blog. We will first briefly go through what GitHub Actions is, then talk about how to write a post on LinkedIn through its API. Finally, we will create a workflow that checks whether there is a new blog post and writes a LinkedIn post if there is.
GitHub Actions is a platform for continuous integration / continuous delivery (CI/CD). One can write workflows to automate build, test, and deployment pipelines. Each workflow is triggered by one or more events and can be run on different runners. We describe these concepts in more detail below.
Each workflow must be defined in a YAML file inside the .github/workflows directory of a repository, like the one below. We will go through each section of the file.
# Workflow Name
name: Release Process

on:
  # Events
  push:                              # One event
    branches:
      - main
  workflow_run:                      # Another event
    workflows: [pages-build-deployment]
    types:
      - completed

jobs:
  # Job
  generate-release:                  # Job id
    name: Create GitHub Release     # Job name
    runs-on: ubuntu-latest           # Runner
    steps:
      - name: Checkout Repository    # Step 1
        uses: actions/checkout@v2    # Action
      - name: Run release code       # Step 2
        run: |
          cd /target/directory
          ./run-release-code

  # Another Job
  another-job:                       # Job id
    name: Another Job                # Job name
    needs: [generate-release]        # Requires the job above to complete successfully
    runs-on: ubuntu-latest           # Runner
    steps:
      - name: Checkout Repository    # Step 1
        uses: actions/checkout@v2    # Action
      - name: do other stuffs        # Step 2
        run: echo $CUSTOM_VAR
        env:
          CUSTOM_VAR: "${{ secrets.CUSTOM_VAR }}"  # Secret value
The entire YAML file above is a workflow. There can be multiple workflows, stored in different YAML files inside the .github/workflows directory. Each workflow can be triggered by one or more events, manually, or on a defined schedule, and each workflow contains one or more jobs.
An event is an activity within the repository, for example a push or a pull request. It can also be the completion of another workflow, or a schedule defined in cron syntax.
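For instance, a hypothetical workflow that should run every day at 06:00 UTC (this schedule is an illustration, not part of the workflow in this article) could declare its trigger like this:

```yaml
# Hypothetical scheduled trigger: a five-field cron expression,
# evaluated in UTC by GitHub Actions
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
```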
The above workflow will be triggered whenever one of the two specified events occurs: a push to the main branch, or the completion of the pages-build-deployment workflow.
A job is a series of steps executed on the same runner. Each step is either a shell script or an action. The steps are executed in order and can depend on each other. By default, jobs run on different runners and concurrently. One can declare dependencies between jobs with the needs key; the example above shows an implementation.
One can also specify a strategy matrix to repeat the same job under different conditions. For example, the following job will be executed 6 times, once for each of the combinations:
{node-version: 10, os: ubuntu-22.04}
{node-version: 10, os: ubuntu-20.04}
{node-version: 12, os: ubuntu-22.04}
{node-version: 12, os: ubuntu-20.04}
{node-version: 14, os: ubuntu-22.04}
{node-version: 14, os: ubuntu-20.04}
jobs:
  example_matrix:
    strategy:
      matrix:
        os: [ubuntu-22.04, ubuntu-20.04]
        version: [10, 12, 14]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.version }}
Actions are custom applications for GitHub Actions that perform complex but repetitive tasks. You can write an action from scratch or use an existing action available from the GitHub Marketplace in your workflow.
A runner is an OS on a virtual machine or container that executes a specific job. GitHub provides Ubuntu Linux, Microsoft Windows, and macOS runners to run the workflows. One can also host their own machine as a runner.
For each step or job, one can specify an env section to define environment variables. But if we are dealing with credentials, this is not a good choice. Instead, go to the repository's Settings, and under Security, click Secrets and variables, then Actions. On that page, one can define secrets for the repository and access them within the env section of a workflow, as shown in the example above.
Contexts are a way to access information about workflow runs, variables, runner environments, jobs, and steps, for example the name of the working branch or the working directory of GitHub Actions. The keyword secrets in the section above is also a context. See more from this page.
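As an illustration (a sketch, not a step from this article's workflow), the following step prints a few commonly used context values:

```yaml
# Hypothetical step showing a few common contexts
- name: Show context values
  run: |
    echo "Branch ref: ${{ github.ref }}"
    echo "Repository: ${{ github.repository }}"
    echo "Runner OS:  ${{ runner.os }}"
    echo "Workspace:  ${{ github.workspace }}"
```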
LinkedIn offers various API products that let consumers do various things. One of them is to write posts on behalf of users (see this documentation). To do that, we need to create an application, authorise it, and obtain an access token.
The process is similar to my previous blog post about OAuth2 for Google APIs. I will briefly describe it here.
We will first create a company on LinkedIn and then the application.
Now that we have the client_id, client_secret, and redirect_uri ready, we can authenticate ourselves and authorise the application. The following script generates a URL to log in to your LinkedIn account, and then produces the access_token.
import os
from urllib.parse import urlencode, urlparse
from http.server import HTTPServer, BaseHTTPRequestHandler
import json
import requests
import webbrowser

client_id = os.getenv("CLIENT_ID")
client_secret = os.getenv("CLIENT_SECRET")
redirect_uri = os.getenv("REDIRECT_URI")
scope = "r_liteprofile w_member_social openid profile email"


def parse_query(path):
    parsed_url = urlparse(path)
    query = parsed_url.query.split("&")
    query = [x.split("=") for x in query]
    query = {x[0]: x[1] for x in query}
    return query


def auth_code(code, client_id, client_secret, redirect_uri):
    params = {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret
    }
    headers = {
        "content-type": "application/x-www-form-urlencoded",
        "content-length": "0"
    }
    url = "https://www.linkedin.com/oauth/v2/accessToken"
    response = requests.post(url, params=params, headers=headers)
    response.raise_for_status()
    content = response.json()
    return content


class NeuralHTTP(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path
        query = parse_query(path)

        code = query.get("code")
        if code:
            status_code = 200
            content = auth_code(
                code=query.get("code"),
                client_id=client_id,
                client_secret=client_secret,
                redirect_uri=redirect_uri
            )
            print(json.dumps(content, indent=4))
        else:
            status_code = 400
            content = {
                "error": "code not found"
            }

        self.send_response(status_code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(bytes(json.dumps(content, indent=4), "utf-8"))

    def log_message(self, format, *args):
        """Silence log messages. Can be ignored."""
        return


if __name__ == "__main__":
    with HTTPServer(("127.0.0.1", 8088), NeuralHTTP) as server:
        auth_url = "https://www.linkedin.com/oauth/v2/authorization"
        params = {
            "client_id": client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "scope": scope,
        }
        url = f"{auth_url}?{urlencode(params)}"
        webbrowser.open(url)
        server.handle_request()
# {
# "access_token": "...",
# "expires_in": 5183999,
# "scope": "email,openid,profile,r_liteprofile,w_member_social",
# "token_type": "Bearer",
# "id_token": "..."
# }
To write a post on LinkedIn, we first need to identify the author's user_id. A GET request to https://api.linkedin.com/v2/userinfo with the access_token obtained above is needed.
import os
import requests
import json

url = "https://api.linkedin.com/v2/userinfo"
token = os.getenv("LINKEDIN_ACCESS_TOKEN")

headers = {"Authorization": f"Bearer {token}"}

response = requests.get(url, headers=headers)
response.raise_for_status()
content = response.json()
print(json.dumps(content, indent=4))
{
"sub": "....",
"email_verified": true,
"name": "Wilson Yip",
"locale": {
"country": "US",
"language": "en"
},
"given_name": "Wilson",
"family_name": "Yip",
"email": "wilsonyip@elitemail.org",
"picture": "https://media.licdn.com/dms/image/C4E03AQGo1BKbUYmyBA/profile-displayphoto-shrink_100_100/0/1646639382257?e=1696464000&v=beta&t=6lhHrDK3vx6GOC01wIKkfVYAmCiSWoZtc8XpE0JoUmM"
}
The user_id is stored in the sub field.
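As a minimal sketch (using a shortened, made-up userinfo response rather than a real one), extracting the author id from the payload looks like this:

```python
import json

# Shortened, made-up /v2/userinfo response for illustration only
raw = '{"sub": "abc123XYZ", "name": "Wilson Yip", "email": "user@example.org"}'

userinfo = json.loads(raw)
user_id = userinfo["sub"]  # this value fills "urn:li:person:{user_id}" below
print(user_id)  # abc123XYZ
```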
We will be calling the Share on LinkedIn endpoint to write a post on LinkedIn, along with the specific request body that attaches an article to the post. The following script shows an example.
import os
import requests


def build_post_body(
    user_id,
    post_content,
    media_title,
    media_description,
    article_url
):
    body = {
        "author": f"urn:li:person:{user_id}",
        "lifecycleState": "PUBLISHED",
        "specificContent": {
            "com.linkedin.ugc.ShareContent": {
                "shareCommentary": {
                    "text": post_content
                },
                "shareMediaCategory": "ARTICLE",
                "media": [
                    {
                        "status": "READY",
                        "description": {
                            "text": media_description
                        },
                        "originalUrl": article_url,
                        "title": {
                            "text": media_title
                        }
                    }
                ]
            }
        },
        "visibility": {
            "com.linkedin.ugc.MemberNetworkVisibility": "PUBLIC"
        }
    }
    return body


if __name__ == "__main__":
    linkedin_user_id = os.getenv("LINKEDIN_USER_ID")  # user_id
    linkedin_token = os.getenv("LINKEDIN_TOKEN")      # access_token
    linkedin_post_endpoint = "https://api.linkedin.com/v2/ugcPosts"

    headers = {
        "X-Restli-Protocol-Version": "2.0.0",
        "Authorization": "Bearer " + linkedin_token
    }

    body = build_post_body(
        user_id=linkedin_user_id,
        post_content="Content of the LinkedIn post",
        media_title="The title of the article",
        media_description="The description of the article",
        article_url="https://www.link-to-article.com/article"
    )

    response = requests.post(
        url=linkedin_post_endpoint,
        json=body,
        headers=headers
    )
A workflow is created to write a post on LinkedIn whenever a new article is merged to the main branch of the repository. The workflow is triggered every time the pages-build-deployment workflow, which builds the website, completes. Yet there is a problem: we need to keep track of which articles have already been posted to LinkedIn in order to tell which article is new.
For simplicity, I have created a Google Sheet to store the article paths and the corresponding LinkedIn post_id. If an article's path does not appear in the table, it is the new article and will trigger the scripts below.
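The core of that check is simply comparing the article paths on the site against the paths already logged in the sheet. A minimal sketch of the idea, with made-up paths (the real implementation appears in the script further below):

```python
# Made-up data for illustration: articles published on the site
# versus articles already logged in the Google Sheet
page_posts = [
    {"path": "posts/github-actions", "date": "2023-08-01"},
    {"path": "posts/oauth2-google", "date": "2023-07-01"},
]
linkedin_posts = [
    {"path": "posts/oauth2-google", "post_id": "urn:li:share:1"},
]

# Any page post whose path is not in the log is considered new
logged_paths = {x["path"] for x in linkedin_posts}
new_posts = [x for x in page_posts if x["path"] not in logged_paths]
print([x["path"] for x in new_posts])  # ['posts/github-actions']
```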
The workflow itself is quite simple: it just runs a Python file. The Python file checks whether there is any new article path, writes a LinkedIn post if there is one, and updates the log file.
name: create-linkedin-post

on:
  workflow_run:
    workflows: [pages-build-deployment]
    types:
      - completed

jobs:
  on-success:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Install python dependencies
        run: pip install pyyaml
      - name: Wait for some seconds
        run: sleep 30
      - name: Create Linkedin Post
        run: python ./tools/cd/linkedin_post.py
        env:
          LINKEDIN_USER_ID: ${{ secrets.LINKEDIN_USER_ID }}
          LINKEDIN_TOKEN: ${{ secrets.LINKEDIN_TOKEN }}
          GCP_CLIENT_EMAIL: ${{ secrets.GCP_CLIENT_EMAIL }}
          GCP_PRIVATE_KEY_ID: ${{ secrets.GCP_PRIVATE_KEY_ID }}
          GCP_PRIVATE_KEY: ${{ secrets.GCP_PRIVATE_KEY }}
          LINKEDIN_POSTS_LOG_SSID: ${{ secrets.LINKEDIN_POSTS_LOG_SSID }}
          LINKEDIN_POSTS_LOG_RANGE: ${{ secrets.LINKEDIN_POSTS_LOG_RANGE }}

  on-failure:
    runs-on: ubuntu-latest
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    steps:
      - run: echo "Fail to write LinkedIn Post."
The Python file run by the workflow is shown below.
#!/usr/bin/python
import os
import requests
import json
from time import time
import jwt
import yaml


def build_post_body(
    user_id,
    post_content,
    media_title,
    media_description,
    article_url
):
    preview_url = f"{article_url}/img/preview.png".replace(
        "//img/", "/img/"
    )
    body = {
        "author": f"urn:li:person:{user_id}",
        "lifecycleState": "PUBLISHED",
        "specificContent": {
            "com.linkedin.ugc.ShareContent": {
                "shareCommentary": {
                    "text": post_content
                },
                "shareMediaCategory": "ARTICLE",
                "media": [
                    {
                        "status": "READY",
                        "description": {
                            "text": media_description
                        },
                        "originalUrl": article_url,
                        "title": {
                            "text": media_title
                        },
                        "thumbnails": [
                            {
                                "url": preview_url
                            }
                        ]
                    }
                ]
            }
        },
        "visibility": {
            "com.linkedin.ugc.MemberNetworkVisibility": "PUBLIC"
        }
    }
    return body


def find_latest_missing_post(page_posts, linkedin_posts):
    page_post_paths = [x.get("path") for x in page_posts]
    linkedin_post_paths = [x.get("path") for x in linkedin_posts]
    missing_idx = [
        i for i, x in enumerate(page_post_paths) if x not in linkedin_post_paths
    ]

    if missing_idx:
        missing_paths = [page_post_paths[i] for i in missing_idx]
        missing_post_dates = [page_posts[i].get("date") for i in missing_idx]
        latest_missing_post = missing_paths[missing_post_dates.index(max(missing_post_dates))]
        latest_missing_post = page_posts[page_post_paths.index(latest_missing_post)]
    else:
        latest_missing_post = None

    return latest_missing_post


def read_rmd_yml(path):
    with open(path, "r") as f:
        rmd_yml = f.readlines()

    yml_idx = [i for i, x in enumerate(rmd_yml) if x == "---\n"]
    return yaml.safe_load("".join(rmd_yml[(yml_idx[0] + 1):(yml_idx[1])]))


def auth_gapi_token(client_email, private_key_id, private_key):
    payload: dict = {
        "iss": client_email,
        "scope": "https://www.googleapis.com/auth/drive",
        "aud": "https://oauth2.googleapis.com/token",
        "iat": int(time()),
        "exp": int(time() + 3599)
    }
    headers: dict[str, str] = {'kid': private_key_id}

    signed_jwt: bytes = jwt.encode(
        payload=payload,
        key=private_key.replace("\\n", "\n"),
        algorithm="RS256",
        headers=headers
    )

    body: dict = {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": signed_jwt
    }
    response: requests.Response = requests.request(
        "POST", "https://oauth2.googleapis.com/token", json=body
    )
    response.raise_for_status()

    content = response.json()
    return content.get('access_token')


def read_gsheet(ssid, ranges, token):
    url = f"https://sheets.googleapis.com/v4/spreadsheets/{ssid}/values/{ranges}"
    headers = {
        "Authorization": f"Bearer {token}"
    }
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()


def append_gsheet(ssid, ranges, data, token):
    url = f"https://sheets.googleapis.com/v4/spreadsheets/{ssid}/values/{ranges}:append"

    body = {
        "range": ranges,
        "majorDimension": "ROWS",
        "values": data
    }
    headers = {
        "Authorization": f"Bearer {token}"
    }
    response = requests.post(url, params={"valueInputOption": "RAW"}, headers=headers, json=body)
    response.raise_for_status()


def create_linkedin_post(post):
    linkedin_user_id = os.getenv("LINKEDIN_USER_ID")
    linkedin_token = os.getenv("LINKEDIN_TOKEN")
    linkedin_post_endpoint = "https://api.linkedin.com/v2/ugcPosts"

    rmd_file = os.listdir(f"./_{post['path']}")
    rmd_file = list(filter(lambda x: ".rmd" in x.lower(), rmd_file))[0]

    rmd_yml = read_rmd_yml(f"./_{post['path']}/{rmd_file}")
    post_note = "The post was created by Github Actions.\nhttps://github.com/wilsonkkyip/wilsonkkyip.github.io"
    abstract = rmd_yml["abstract"] + f"\n\n{post_note}"

    body = build_post_body(
        user_id=linkedin_user_id,
        post_content=abstract,
        media_title=rmd_yml["title"],
        media_description=rmd_yml["description"],
        article_url=f"https://wilsonkkyip.github.io/{post['path']}"
    )

    headers = {
        "X-Restli-Protocol-Version": "2.0.0",
        "Authorization": "Bearer " + linkedin_token
    }

    response = requests.post(
        url=linkedin_post_endpoint,
        json=body,
        headers=headers
    )
    response.raise_for_status()
    content = response.json()

    return content


def main():
    gcp_client_email = os.getenv("GCP_CLIENT_EMAIL")
    gcp_private_key_id = os.getenv("GCP_PRIVATE_KEY_ID")
    gcp_private_key = os.getenv("GCP_PRIVATE_KEY")

    log_ssid = os.getenv("LINKEDIN_POSTS_LOG_SSID")
    log_range = os.getenv("LINKEDIN_POSTS_LOG_RANGE")

    gcp_token = auth_gapi_token(
        gcp_client_email, gcp_private_key_id, gcp_private_key
    )

    logs = read_gsheet(log_ssid, log_range, gcp_token)
    linkedin_posts = [
        {logs["values"][0][0]: x[0], logs["values"][0][1]: x[1]} for x in logs["values"][1:]
    ]

    with open("./posts/posts.json", "r") as file:
        page_posts = json.loads(file.read())

    missing_post = find_latest_missing_post(page_posts, linkedin_posts)

    if missing_post:
        response = create_linkedin_post(missing_post)
        appending_data = [[missing_post["path"], response.get("id")]]
        append_gsheet(log_ssid, log_range, appending_data, gcp_token)


if __name__ == "__main__":
    main()