
Hosting a Static Website Using Cloudflare and Backblaze B2 (part 2)

Thursday, June 23, 2022
Reading time 9 minutes

This is the second part of a series of posts detailing the steps I've followed to publish a static website using Cloudflare and B2, along with GitLab runners, which are responsible for building and uploading the site files. You can find the other parts of this series in the following links:

Now that the bucket where we'll deposit our files has been created, and we have everything necessary to upload to it securely (that is, the bucket name, the S3 endpoint, and our keyId and applicationKey), the next step is to make the pipeline upload the build result to our bucket in addition to generating it. That's what we'll focus on in this post. All we need to do is define another job in our "pipeline" (a "pipeline" is what GitLab calls the group of jobs that run for a project). This job should run right after our main job, which for now only generates the HTML files of our site, finishes. The new job will take the artifacts the first job made available and upload the files to our S3 bucket. To upload these files, we'll need a Docker image that includes the aws-cli tool, the official client for S3 and S3-compatible services such as B2. Finally, we'll also need to configure our pipeline with some environment variables holding the credentials we already have. And we'll do all of this securely, so that our keys (especially the application key) are never exposed in the process.

Note: In this series of posts I've used GitLab as the platform to build, and now upload, the website files. But it's not the only way to do this. It's perfectly possible to install hugo and aws-cli on a machine (both have versions for Windows and Linux), use aws-cli's configure command to enter your Backblaze application key, and run everything by hand.
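For reference, this is roughly what that manual workflow looks like from a terminal. It's a minimal sketch, assuming hugo and aws-cli are already installed, and it reuses the example bucket and endpoint from this series, so substitute your own values:

# Store the B2 key pair; "aws configure" prompts for the access key
# (your keyId) and the secret key (your applicationKey); region and
# output format can be left at their defaults.
aws configure

# Build the site; hugo writes the generated files to ./public
hugo

# Upload everything in ./public through B2's S3-compatible API
aws --endpoint-url https://s3.us-west-001.backblazeb2.com s3 cp public s3://manuelcortez-net --recursive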

Step 1. Configure Environment Variables in GitLab

We assume we’re working with GitLab, and we’re using either GitLab.com or our own software instance, but in both cases with a Docker runner available. The first thing we have to do, before defining our job to upload files, is configure the environment variables that Aws-cli will use to log in to BackBlaze B2. This is safer than entering secret credentials in any file. To do this, follow these steps:

  1. While on the home page of the project hosting the website, locate and activate the "Settings" link in the project menu.
  2. Once inside the settings, again in the project navigation menu, inside the "Settings" section (which will now be shown as a list item containing another list), locate and activate the link called "CI/CD".
  3. Navigate until you find the heading called "Variables". A table will be shown with all the environment variables available to jobs in your project. At the end of the table, activate the button called "Add variable". This opens a modal dialog where you can add a new variable. Each variable has the following fields to fill in:
    • Key: The name that will be used to reference the variable.
    • Value: The variable's content.
    • Type: Leave this at the default, "Variable".
    • Environment: Leave this at the default, "All".
    • Protect variable: If checked, the variable can only be used in pipelines on protected branches. I usually uncheck this.
    • Mask variable: If checked, the variable's content won't be shown in job logs. This is useful to prevent GitLab from displaying our application key later, for example.
  4. Once you understand each field, we'll proceed to add four variables. Two will hold our application key (remember the key has two parts, keyId and applicationKey), one will hold our bucket name, and one extra variable will indicate the S3 signature version used to sign requests. The following steps detail what to add and how to name each variable. By near-universal convention, environment variable names are written in uppercase.
  5. Add the variable corresponding to our keyId. In key, call it "AWS_ACCESS_KEY_ID"; in value, paste the keyId, and check the box to mask the variable.
  6. Add another variable for our applicationKey. In key, call it "AWS_SECRET_ACCESS_KEY", and in value, paste the applicationKey from your document. Mask this variable as well.
  7. Add the signature version variable. In key, call it "AWS_SIGNATURE_VERSION", and in value simply write the number 4. There's no need to mask this variable.
  8. Finally, add the variable indicating the name of the bucket we'll upload the files to. In my example I used the key "S3_BUCKET", and in value I placed "manuelcortez-net". There's no need to mask this variable either.
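As an alternative to clicking through the web interface, the same four variables can be created with GitLab's project variables API. The following is only an illustrative sketch, assuming you have a personal access token with the api scope; the project ID, token, and key value are placeholders you'd replace with your own:

# Create one masked variable via the API; repeat for each of the four.
curl --request POST --header "PRIVATE-TOKEN: <your token>" \
  "https://gitlab.com/api/v4/projects/<project id>/variables" \
  --form "key=AWS_ACCESS_KEY_ID" \
  --form "value=<your keyId>" \
  --form "masked=true"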

Step 2. Write the New Job

Back in our repository, but this time editing the source code, we're going to extend our .gitlab-ci.yml file, which is where we define our pipeline's jobs, to add the new job that will upload the files hugo generates. The new job is simply called "upload_site", uses the official amazon/aws-cli Docker image, and runs a couple of commands:

variables:
  GIT_SUBMODULE_STRATEGY: recursive

stages:
  - build
  - upload

# Builds the site with hugo and keeps the generated "public"
# directory as an artifact for the next stage.
make_site:
  tags:
    - docker
  stage: build
  only:
    - master
  interruptible: true
  image: registry.gitlab.com/pages/hugo:latest
  script:
    - 'hugo'
  artifacts:
    paths:
      - public
    expire_in: 1 day

# Takes the "public" artifact and copies it to the B2 bucket over the
# S3-compatible API, using the variables configured in step 1.
upload_site:
  image:
    name: amazon/aws-cli
    # Clear the image's entrypoint so GitLab can run the script commands.
    entrypoint: [""]
  tags:
    - docker
  only:
    - master
  interruptible: true
  stage: upload
  script:
    - aws --version
    - aws --endpoint-url https://s3.us-west-001.backblazeb2.com s3 cp public s3://$S3_BUCKET --recursive
  • In this example, it's important to change the S3 endpoint that appears on the last line of the file to the one you saved in your document. Likewise, if you used a different key for your bucket's variable, update it here.
  • If your repository's main branch isn't called "master", change the name in the "only:" section of each job so it points to your main branch.

In my example, I had to change an important option in the site configuration to get aws-cli to upload some files correctly. By default, hugo generates files named after the tags and categories used on the website. Since my website is in Spanish, some of those names include characters such as accented letters, so Hugo would generate files with the full names of categories and tags, for example "Tecnología.html", "Programación.html", and so on. When trying to upload this kind of file, a conflict arose because the character encoding of the hugo container differed from the one used in the aws-cli image, and files with special characters caused the entire job to be marked as failed. The solution was to make hugo replace special characters in filenames with their English-alphabet counterparts, just as most modern blog and CMS systems do. This is achieved by editing the Hugo site configuration and adding the parameter "RemovePathAccents = true" anywhere in the configuration, which stops hugo from putting special characters in filenames and keeps the upload from running into trouble.
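For reference, this is what the change looks like in a TOML site configuration. A minimal excerpt; the setting goes at the top level of the file:

# config.toml (excerpt)
# Replace accented characters in generated file and directory names.
RemovePathAccents = true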

Once you're done with this file, when you push your next commit, the jobs that build your site and upload it to the Backblaze B2 bucket you've indicated should start. If something goes wrong at this point, you'll probably receive an email from the GitLab instance, from which you can review the failed job. It's important to fix any errors in this section before continuing with the next step, which is to verify everything in B2.

Step 3. Verifying the Bucket in B2

Once again, it’s necessary to enter our B2 account to verify that everything is working well and the files are where they should be, but more importantly, to get the URL address from where we can access our website. Subsequently, we’ll use Cloudflare to map that URL that B2 gives us to our own domain.

But let's go step by step. The first and most important thing is to verify that the bucket actually contains the files GitLab should have uploaded. To check this, go to the Backblaze overview page and, from the services menu, open the link called "Browse Files". On the "Browse Files" page, a list will be shown with the names of all the buckets in the B2 account. Each bucket name is a link that you can follow to see the files it contains. In my example, when accessing manuelcortez-net, I can see all the files and directories present in that bucket.
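If you prefer to check from a terminal, the same aws-cli invocation used in the pipeline can list the bucket's contents. A quick sketch, again using the example endpoint and bucket from this series:

# List every object the pipeline uploaded to the bucket
aws --endpoint-url https://s3.us-west-001.backblazeb2.com s3 ls s3://manuelcortez-net/ --recursive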

Now, once the files have been located, we need to find out our query URL. Backblaze B2 calls these addresses "friendly URLs". A friendly URL is the public address from which a directory or file within the B2 system can be accessed. By obtaining the friendly URL of one of the files in the bucket, we can derive from it the base address of our website, which is what we'll point Cloudflare at.

To do this, it’s only necessary to locate some file from our newly explored bucket. It’s important that it be a file and not a directory, since directories wouldn’t show their details the same way as files do. To show a file’s details, in B2’s web file browser, it’s simply necessary to search for the file and once found, press the enter key. If everything has worked correctly, a modal dialog will be shown with many important details about the file. Of all those details, the only thing that matters to copy is the file’s friendly URL. B2 calls this a “Friendly URL”. It’s usually quite a long URL that leads directly to a file. In my example, I used my website’s index.html file, and the friendly URL that the system gave me is the following: https://f001.backblazeb2.com/file/manuelcortez-net/index.html. If I used this URL and pasted it in the browser’s address bar, I could see the file without problems. It’s important to save this URL in a place we can use later, since this address will depend on Cloudflare finding our website, once the third part of this series of articles is completed.

Conclusion

Having reached this point, where a site can be generated completely automatically thanks to GitLab runners and sent straight to a bucket in B2, the only thing left is to give the site a URL that's easy to remember. That's what we'll do in part 3 of this series, using Cloudflare and a Worker.

Categories: Notes, Tutorials

Tags: S3, B2, Static website, GitLab, CI/CD