I intend for this article to be a somewhat thorough overview. It won’t be short, but I’ll cut the fluff. Keep the following things in mind:
- If you’re looking for a simple script to solve your issue, this isn’t the article that will give it to you.
- This is intended for some very basic content pulled from Drupal. It doesn’t cover every possible aspect.
- It’s written by a primarily Windows user. I wrote my script for converting the exported database data in PowerShell, which isn’t (easily) available on all operating systems.
A History
SomeShinyObject has been through many iterations. It started off as a WordPress site. Then it moved to Concrete5. Then, for several years, I settled on Drupal as my backend CMS. In the beginning, I really enjoyed learning Drupal. It was complicated and challenging, but the community was great and there was a lot of information about how to do what I wanted to do with it: start a blog with the possibility of some extensibility in the future. After about a year, though, I got busy, and finding time to maintain a complicated CMS became exhausting. I settled on just using this site as a blog and kept Drupal as the backend.
I stumbled through using Drupal for several years, ignoring the default warnings that Drupal Core was out of date. I skipped some pretty big security updates and I absolutely refused to clean up modules for fear that something would break. Creating content became hectic, and I eventually started to have security concerns when I saw some random accounts popping up in my users list that I didn’t put there. After changing my admin password, I began to search for solutions.
That’s when I found Hugo, a static-site generator. You create content in Markdown files with metadata at the top in something called “front matter”. I was quickly impressed with its granular configurations. It also accepts JSON as the configuration structure, something I am pretty familiar with.
My content was already written in Markdown and transformed via a Drupal plugin. This made my life ten times easier when bringing over my articles. Don’t worry if you have never heard of Markdown and/or your content is written in straight HTML. HTML content run through a Markdown parser gets left as HTML. Another factor that saved me was that in the beginning of my Drupal site creation, I offloaded my commenting to the Disqus service rather than a local setup. This saved me from having to factor that into my SQL query for articles.
Overall Process
While I had an idea of where to start, I had to identify the granular steps as I walked through the process. The major steps were:
- Identify the content to migrate
- Write a SQL query
- Export the data
- Configure Hugo
- Write a script
- Migrate AdSense
- Migrate Disqus commenting
- Test locally
- Push to production
Identify the content to migrate
Identifying content for my site was fairly simple. All I had was a basic blog and a couple of standalone pages. The data contained within wasn’t that advanced. The two hardest parts were finding linked images and the tagging for the blog posts. For each post, I wanted to track the original posting date and the last modification date. I collected the node ID to insert into my Hugo front matter, and I kept the original slug so that people who link to pages on my site from their own won’t end up with a broken link. Through a bit of Google-fu and trial and error, I was able to assemble the exact query I needed to capture all of this data. Below is a table of everything I wanted to capture for an individual post:
| Field | Description |
| --- | --- |
| Content | The text content of the blog post. |
| Description | Each post had a description to display when it was listed on the pager. I don't use it now, but I wanted to keep it just in case. |
| Created Date | The date it was created. |
| Modified Date | The date the post was last modified, if at all. |
| Node ID | The original node ID of the post. |
| Slug | The original slug to be used as the direct URL. |
| Tags | Any tags I created for each post. |
| Title | The title of the post. |
Write a SQL query
It had been a while since I had written any SQL, so this task took some time. Luckily, there is a lot of information out there about querying Drupal. First, I needed to get all tags as related to their posts, because they are not stored together in Drupal’s relational database. I was working through phpMyAdmin and I don’t believe it supports creating temporary tables, so I just made a normal table and stored the query results in it.
-- SQL query to retrieve tags
CREATE TABLE IF NOT EXISTS tag_table AS
    (SELECT n.nid,
            GROUP_CONCAT(t.name) AS names
     FROM taxonomy_index n
         JOIN taxonomy_term_data t
             ON ( n.tid = t.tid )
     WHERE t.vid = 1
     GROUP BY n.nid);
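To see what the tag query above produces, here is a minimal sketch in Python (used only because it runs on any operating system). It assumes an in-memory SQLite database as a stand-in for Drupal's MySQL database, and the sample rows are made up. SQLite's `group_concat` is assumed to behave like MySQL's `GROUP_CONCAT` for this simple case; the function name `collect_tags` is hypothetical.

```python
import sqlite3

def collect_tags(index_rows, term_rows, vocabulary_id=1):
    """Return {nid: "tag1,tag2"}, grouping tag names per node like the tag query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE taxonomy_index (nid INTEGER, tid INTEGER)")
    conn.execute("CREATE TABLE taxonomy_term_data (tid INTEGER, vid INTEGER, name TEXT)")
    conn.executemany("INSERT INTO taxonomy_index VALUES (?, ?)", index_rows)
    conn.executemany("INSERT INTO taxonomy_term_data VALUES (?, ?, ?)", term_rows)
    rows = conn.execute(
        """
        SELECT n.nid, group_concat(t.name) AS names
        FROM taxonomy_index n
        JOIN taxonomy_term_data t ON n.tid = t.tid
        WHERE t.vid = ?
        GROUP BY n.nid
        """,
        (vocabulary_id,),
    ).fetchall()
    conn.close()
    return dict(rows)

# Made-up sample data: node 10 has two tags, node 11 has one.
# Term 3 belongs to another vocabulary (vid 2) and is filtered out.
index_rows = [(10, 1), (10, 2), (11, 2), (11, 3)]
term_rows = [(1, 1, "powershell"), (2, 1, "hugo"), (3, 2, "other-vocabulary")]
tags_by_node = collect_tags(index_rows, term_rows)
```

Each node ends up with a single comma-delimited string of tag names, which is the shape the later export step relies on.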
After that, I needed a query to retrieve all the fields listed in the table. I came up with one that included all the fields plus the tags that I needed.
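-- SQL query to retrieve every field for each node, plus its tags and URL alias
SELECT n.nid,
       n.title,
       r.body_value,
       r.body_summary,
       n.type,
       n.changed,
       n.created,
       t.names,
       b.alias
FROM node n
    LEFT JOIN field_data_body r
        ON n.nid = r.entity_id
        AND n.vid = r.revision_id
    -- INNER JOIN skips nodes that have no tags; use LEFT JOIN to keep them
    INNER JOIN tag_table t
        ON n.nid = t.nid
    INNER JOIN url_alias AS b
        ON CONCAT('node/', n.nid) = b.source;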
Export the data
phpMyAdmin has some pretty excellent options for exporting data. I intended to write my script in PowerShell, and since it gets along with JSON pretty well, I decided to export to JSON. I ended up with a JSON structure like this:
[{
    "nid": ,            // The node ID
    "title": "",
    "body_value": "",   // Corresponds to Content
    "body_summary": "", // Corresponds to Description
    "type": "",         // Distinguishes between a blog post and a standalone page
    "changed": ,        // Epoch formatted date for modified
    "created": ,        // Epoch formatted date
    "names": "",        // Corresponds to tags, comma-delimited
    "alias": ""         // Corresponds to slug
}]
With the information now exported, I could begin configuring Hugo and then write the script to move all my data over.
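Before converting anything, it can help to sanity-check the export. The following is a minimal Python sketch, assuming the export is a JSON array shaped like the structure above. The sample rows, the `"blog"` and `"page"` type values, and the function name `summarize_export` are all made up for illustration.

```python
import json
from collections import defaultdict

def summarize_export(raw_json):
    """Group exported rows by node type and split the comma-delimited tags."""
    by_type = defaultdict(list)
    for row in json.loads(raw_json):
        names = row.get("names") or ""  # tolerate a missing or null tag list
        tags = [tag.strip() for tag in names.split(",") if tag.strip()]
        by_type[row["type"]].append(
            {"nid": row["nid"], "alias": row["alias"], "tags": tags}
        )
    return dict(by_type)

# Made-up sample in the shape of the export above.
sample = json.dumps([
    {"nid": 1, "title": "First", "body_value": "...", "body_summary": "",
     "type": "blog", "changed": 1500000000, "created": 1400000000,
     "names": "hugo,drupal", "alias": "posts/first"},
    {"nid": 2, "title": "About", "body_value": "...", "body_summary": "",
     "type": "page", "changed": 1500000000, "created": 1400000000,
     "names": None, "alias": "about"},
])
summary = summarize_export(sample)
```

Grouping by `type` makes it easy to confirm that every blog post and standalone page made it into the export before any files are written.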
Configure Hugo
This isn’t going to be a lesson on how to configure Hugo. Hugo’s documentation is excellent and very easy to follow. I will, however, go over what I set up to make this conversion process easier.
Front Matter and Archetypes
All Hugo posts contain individual configurations called “Front Matter” that contain metadata about the post. “Archetypes” are self-made front matter configurations (templates) for your posts which contain all these fields when you first create the post in Hugo. I set up an archetype for blog posts that I can use whenever I need to create a new post. My posts archetype ended up looking like this (I used JSON instead of YAML or TOML because I find it more comfortable):
{
    "title": "",
    "description": "",
    "tags": [],
    "date": "",
    "publishDate": "",
    "modified": "",
    "categories": ["blog post"],
    "slug": "",
    "draft": false,
    "id": ""
}
Theme
Hugo has a pretty large variety of available themes, so I went with one that was minimalist and easy to modify if I wanted to. I chose Hucore by GitHub user mgjohansen. It uses Bulma as its CSS framework. The theme included highlight.js, but my content was already written for SyntaxHighlighter, which I used on my Drupal site, so I switched to that and removed the references to highlight.js.
Write a script
As a primarily Windows user, PowerShell is my go-to scripting language. I wanted a way to automate converting my export data into Hugo posts, so I opened up PowerShell ISE and began writing the code to do so. It took several tries and there were a few gotchas that I had to learn, but overall it worked out pretty well for me. The following is what I came up with, and I will highlight the parts that need further explanation.
# Convert seconds since the UNIX epoch (how Drupal stores dates) to a DateTime
Function ConvertFrom-EpochTime {
    Param(
        [Int64]$SecondsFromEpoch
    )
    $DateParams = @{
        Year        = 1970
        Month       = 1
        Day         = 1
        Hour        = 0
        Minute      = 0
        Second      = 0
        Millisecond = 0
    }
    $Origin = Get-Date @DateParams
    return $Origin.AddSeconds($SecondsFromEpoch)
}

# Map one exported row onto the fields used by the post template
Function New-HugoBlogPost {
    Param([Parameter(Mandatory=$True)]$Data)
    Begin {
        $BaseObject = New-Object -TypeName PSObject -Property @{
            "title"       = ""
            "description" = ""
            "tags"        = @()
            "created"     = ""
            "publishDate" = ""
            "modified"    = ""
            "slug"        = ""
            "id"          = 0
            "body"        = ""
        }
    }
    Process {
        $CreateDate = ConvertFrom-EpochTime -SecondsFromEpoch $Data.created
        $ChangeDate = ConvertFrom-EpochTime -SecondsFromEpoch $Data.changed
        # Strip the backslash from escaped apostrophes in the export
        $BaseObject.title = $Data.title.Replace("\'", "'")
        $BaseObject.description = $Data.body_summary.Replace("\'", "'")
        $BaseObject.body = $Data.body_value.Replace("\'", "'")
        $BaseObject.created = $CreateDate.ToString("yyyy-MM-dd")
        $BaseObject.publishDate = $CreateDate.ToString("yyyy-MM-dd")
        $BaseObject.modified = $ChangeDate.ToString("yyyy-MM-dd")
        $BaseObject.slug = $Data.alias -replace "posts/", ""
        # Turn "a,b" into "a","b" so it drops straight into the JSON array
        $BaseObject.tags = If ($Data.names) {
            (($Data.names.Split(",")) | ForEach-Object {"`"$_`""}) -join ","
        } Else {
            ""
        }
        $BaseObject.id = $Data.nid
        return $BaseObject
    }
}

$RawJson = Get-Content -Path "path\to\export.json" -Raw
$JSON = ConvertFrom-Json -InputObject $RawJson

# JSON front matter followed by the post body; {name} placeholders are filled in below
$postTemplate = @"
{
    "title": "{title}",
    "description": "{description}",
    "tags": [{tags}],
    "date": "{created}",
    "publishDate": "{publishDate}",
    "modified": "{modified}",
    "categories": ["blog post"],
    "slug": "{slug}",
    "draft": false,
    "id": {id}
}
{body}
"@

# Every placeholder in the template, matching the property names on $BaseObject
$Fields = @(
    "title",
    "description",
    "tags",
    "created",
    "publishDate",
    "modified",
    "slug",
    "id",
    "body"
)

$JSON | ForEach-Object {
    $BlogData = New-HugoBlogPost -Data $_
    $Post = $postTemplate.Clone()
    $Fields | ForEach-Object {
        $Post = $Post.Replace("{$_}", $BlogData.$_)
    }
    $FileName = "{0}.md" -f $BlogData.slug
    # Write UTF-8 without a byte order mark so Hugo can read the front matter
    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    [System.IO.File]::WriteAllLines("C:\root\dir\of\hugosite\content\posts\$FileName", $Post, $Utf8NoBomEncoding)
    # My archetype was called posts, so the Markdown file goes in the content\posts\ directory
}
- The ConvertFrom-EpochTime function: PowerShell didn’t have a built-in cmdlet to convert an epoch time, so I had to write one. Drupal stores creation and modification dates as the number of seconds since the UNIX epoch. Turning them into a date that both PowerShell and Hugo like required me to write this function.
- The three .Replace("\'", "'") lines: In my export, apostrophes came through escaped as \'. JSON doesn’t require apostrophes to be escaped, so make sure to replace them, otherwise the front matter and content end up rendered incorrectly.
- The UTF8Encoding and WriteAllLines lines at the end of the loop: I was having issues when writing the contents of $Post directly to a file with the Out-File cmdlet. The front matter was not working correctly and all the pages ended up either not rendering or rendering completely incorrectly. I don’t remember the exact error, but I stumbled upon this discussion, which let me know that I had to use the correct type of encoding when writing the file, otherwise Hugo won’t like it. These lines write the file with the correct encoding (UTF-8 without a BOM).
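For readers without PowerShell, the same three ideas (epoch conversion, apostrophe clean-up, and writing UTF-8 without a BOM) can be sketched in Python. This is an illustrative sketch, not the script used for the migration: the function names and the sample row are hypothetical. It also builds the front matter with `json.dumps` instead of a string template, a deliberate swap so that quotes in titles are escaped automatically.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def epoch_to_date(seconds):
    """Seconds since the UNIX epoch -> 'YYYY-MM-DD' (UTC)."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).strftime("%Y-%m-%d")

def unescape(value):
    """Strip the backslash from escaped apostrophes; treat None as empty."""
    return (value or "").replace("\\'", "'")

def build_post(row):
    """Return (file_name, text) for one exported row: JSON front matter, then the body."""
    slug = row["alias"].replace("posts/", "")
    front_matter = {
        "title": unescape(row["title"]),
        "description": unescape(row["body_summary"]),
        "tags": [tag for tag in (row.get("names") or "").split(",") if tag],
        "date": epoch_to_date(row["created"]),
        "publishDate": epoch_to_date(row["created"]),
        "modified": epoch_to_date(row["changed"]),
        "categories": ["blog post"],
        "slug": slug,
        "draft": False,
        "id": int(row["nid"]),
    }
    # json.dumps escapes quotes and backslashes, so a title cannot break the front matter.
    header = json.dumps(front_matter, indent=4, ensure_ascii=False)
    return slug + ".md", header + "\n\n" + unescape(row["body_value"]) + "\n"

def write_post(directory, row):
    """Write the post as UTF-8 without a byte order mark and return its path."""
    file_name, text = build_post(row)
    path = os.path.join(directory, file_name)
    # The "utf-8" codec never writes a BOM ("utf-8-sig" would).
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(text)
    return path

# Made-up row in the shape of the export; a temp directory stands in for content/posts.
row = {"nid": "7", "title": "It\\'s \"alive\"", "body_summary": "", "body_value": "Body text",
       "type": "blog", "created": "0", "changed": "86400",
       "names": "hugo,drupal", "alias": "posts/its-alive"}
path = write_post(tempfile.mkdtemp(), row)
```

Pointing `write_post` at the site's `content/posts` directory instead of a temporary one would produce one Markdown file per exported row.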
Migrate AdSense
This site runs Google AdSense. In order to continue using AdSense, I had to incorporate the markup/JavaScript provided by Google into some of the partials included in the theme. Partials are usually stored in the theme’s layouts/partials directory or in the Hugo site’s top-level layouts/partials directory.
If you run AdSense, be careful when setting this up. Google has changed its policy of limiting pages to three ads, but it still has guidelines on what is appropriate. Putting your ads in partials means you have to think about how many ads may end up on one page at a time. Tread carefully!
Migrate Disqus commenting
As mentioned previously, I offloaded commenting to Disqus. It’s an extremely useful service and its ease of use really aided this migration. Disqus matches a site’s discussions by the URL used. If the URL of a page changes, or a new page is created with Disqus support added, a new discussion is opened. When I migrated to Hugo, a trailing / was placed on the ends of my URLs. This in turn began to create new discussions. Luckily, Disqus’ help site was able to assist me with using their URL mapper migration tool. All it involved was creating a quick CSV of the “from URL” to the “to URL” and uploading it to their tool. After that, my Disqus discussions were back to normal.
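The mapping file can be generated rather than typed by hand. Below is a minimal Python sketch that assumes a two-column CSV (old URL, then new URL with the trailing slash). Check the exact layout against Disqus' own help page before uploading. The domain `example.com`, the aliases, and the function name `build_url_map` are placeholders.

```python
import csv
import io

def build_url_map(base_url, aliases):
    """Return CSV text mapping each old URL (no trailing slash) to the new one (with it)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for alias in aliases:
        old_url = base_url.rstrip("/") + "/" + alias.strip("/")
        writer.writerow([old_url, old_url + "/"])
    return buffer.getvalue()

# example.com and the aliases are placeholders, not the real site.
csv_text = build_url_map("https://example.com", ["posts/first-post", "about"])
```

The aliases could come straight from the `alias` field of the exported JSON, so every migrated page gets a row.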
Test locally
Hugo comes with a built-in web server. To run it, open whatever shell you use, change to your Hugo site’s root directory, and run hugo server. By default it listens on port 1313. With this, you can test your configuration, theme, and content. Drafts you are currently writing are rendered only if you add the -D (--buildDrafts) flag.
Disqus users, please note that Disqus discussions will be opened for localhost URLs if you don't take steps to avoid it in your Disqus partial. I avoided this by adding the following check inside the JavaScript function that loads the Disqus discussion.
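// Place this at the top of the function that loads the Disqus embed;
// a bare return outside a function is a syntax error.
if (window.location.hostname === "localhost") {
    return;
}
// Disqus code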
Push to production
For source control, I use git. I wanted to incorporate git into my deployments to simplify things and to also keep track of any changes I may make. The following is pretty close to what I did to ensure minimal downtime for the migration.
- Made one final commit to the git repo.
- Followed instructions on my hosting site on how to set up my service to act as a remote repository.
- Pushed my final changes to my hosting site.
- Staged a local (server-side) git repo by cloning the repo to the non-accessible portion of the site.
- Took one final backup of my SQL database via phpMyAdmin and saved the file in a couple of places.
- Moved all the Drupal files out of my public HTML folder.
- Copied my Hugo site’s public directory to my site’s public directory.
- Made a final backup of my Drupal files that I had removed from the public directory.
- Cleaned up any remaining remnants of past Drupal management on the server side.
- Tested the new production deployment.
The process was fairly straightforward. Removing the final remnants of Drupal felt pretty good. I’m really glad to be using a system that will be much easier to maintain and write content for. Hopefully, with this change, I will be able to write more blog posts instead of worrying about screwing something up.
Conclusion
I know this was not too in-depth, but it gives the basic idea of how to go about converting from Drupal to Hugo. My content was limited, and you may have more to move if you are thinking of trying it. Feel free to share your thoughts, or link to your own Drupal-to-Hugo conversion, in the Disqus comments. If you need clarification on any aspect, don’t hesitate to ask there either.