New Blog Site and Engine Part II

2016-08-18 18:03:00 +0000 - Written by Carl Burks

Part of migrating to the new blog involved converting the XML export from Google's Blogger. The first step was to download that export from the Blogger site.

After downloading the file, I worked through the steps of converting its entries into files I could use with my new blog engine:

  1. Download the Blogger images to a local folder.
  2. Convert links to Blogger's images into the new URLs.
  3. Convert the XML nodes into HTML files.
  4. Review the HTML files for portions that need manual changes.
  5. Provide a YAML file with the metadata for building each page.
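For steps 1 and 2, every remote image needs a local file name that is unique and keeps its extension. The sketch below shows one way to do that; `New-LocalImageName` is a hypothetical helper (not the `SanitizeFile` function used in the script that follows), and it assumes the extension can be read from the path portion of the URL.

```powershell
# Hypothetical helper, not the SanitizeFile function used in the script:
# builds a unique local file name for a remote image and keeps its extension.
function New-LocalImageName {
    param([Parameter(Mandatory)][string]$Url)

    # Read the extension from the URL path only, so a query string such as
    # "?imgmax=800" does not end up in the file name.
    $path = ([System.Uri]$Url).AbsolutePath
    $ext = [System.IO.Path]::GetExtension($path)   # ".JPG" here, "" if there is none

    # A GUID keeps two images that share a file name from colliding.
    return [guid]::NewGuid().ToString() + $ext
}

New-LocalImageName "http://example.com/images/photo.JPG?imgmax=800"
```

Using `[System.Uri]` rather than splitting the whole URL on "." means dots in the host name or query string cannot be mistaken for the extension.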

To download the linked images, I wrote a PowerShell script:


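# Load the Blogger export (UTF-8) as XML.
$x = [xml](Get-Content .\oldblog.xml -Raw -Encoding UTF8)
$files = @()   # one @{old=<remote url>; new=<local file>} entry per download
$seen = @{}    # urls already downloaded, so each one is fetched only once

# Look for image sources and link targets in every entry's HTML content.
@([regex]"<img\s+[^>]*?src=(`"|')([^`"']+)\1", [regex]"<a\s+[^>]*?href=(`"|')([^`"']+)\1") | ForEach-Object {
    $regex = $_

    $x.feed.entry | ForEach-Object {
        $regex.Matches($_.content.'#text') | ForEach-Object {
            # Group 2 holds the url itself, whichever quote style was used.
            $url = $_.Groups[2].Value
            if ($url.StartsWith("http") -and -not $seen.ContainsKey($url)) {
                $seen[$url] = $true
                $ext = $url.Split(".")[-1]
                # SanitizeFile is a helper function that is not included in this post.
                $newfile = SanitizeFile ([guid]::NewGuid().ToString() + ".$ext")
                Invoke-WebRequest -Uri $url -OutFile $newfile
                $files += @{ old = $url; new = $newfile }
            }
        }
    }
}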
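The link replacement depends on the URL mapping collected by the download script, and that mapping only lives in memory. A minimal sketch for saving it to disk so the next step can run in a later session; the file name `image-map.csv` is hypothetical, and the sample `$files` array stands in for the real one.

```powershell
# Sample data standing in for the $files array the download script builds:
# one hashtable per image, with the remote url in "old" and local file in "new".
$files = @(
    @{ old = "http://example.com/a.jpg"; new = "1.jpg" },
    @{ old = "http://example.com/b.png"; new = "2.png" }
)

$mappingFile = Join-Path $PWD.Path "image-map.csv"   # hypothetical file name

# Save: turn each hashtable into an object so Export-Csv writes "old" and "new" columns.
$files | ForEach-Object { New-Object -TypeName PSObject -Property $_ } |
    Export-Csv -Path $mappingFile -NoTypeInformation

# Load, e.g. in a later session: Import-Csv returns objects whose .old and .new
# properties work with the same $_.old / $_.new replacement loop.
$files = @(Import-Csv -Path $mappingFile)
$files | ForEach-Object { "$($_.old) -> $($_.new)" }
```

Wrapping `Import-Csv` in `@()` keeps `$files` an array even when only one image was downloaded.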

Switching the links out is as simple as the following script, which also writes each post's HTML content and YAML metadata:


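# Expects $files (the url mapping built by the download script) in this session.
$x = [xml](Get-Content .\oldblog.xml -Raw -Encoding UTF8)
$counter = 0

$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
# .NET methods such as WriteAllText resolve relative paths against the process
# working directory, not the PowerShell location, so build an absolute path.
$outRoot = Join-Path $PWD.ProviderPath "oldblog"
New-Item -ItemType Directory -Path $outRoot -Force | Out-Null

$x.feed.entry | ForEach-Object {
    $content = $_.content.'#text'
    $counter += 1
    # Skip the first 57 entries of this particular export, plus entries with little content.
    if ($counter -gt 57 -and $content.Length -gt 100) {

        # Keep only user labels; Blogger's internal category terms are URLs.
        $categories = @($_.category | ForEach-Object {
            if ($_.term -notlike "http*") {
                $_.term
            }
        })

        $author = $_.author.name
        $date = $_.published
        # Escape backslashes and quotes so the title stays a valid YAML double-quoted string.
        $title = "$($_.title.'#text')".Replace('\', '\\').Replace('"', '\"')

        $yamlcontent = "---`r`n"
        $yamlcontent += "render: pandoctemplate`r`n"
        $yamlcontent += "blogEntry: true`r`n"
        $yamlcontent += "date: `"$date`"`r`n"
        $yamlcontent += "author: $author`r`n"
        $yamlcontent += "showInMenu: true`r`n"
        $yamlcontent += "title: `"$title`"`r`n"
        $yamlcontent += "changefreq: never`r`n"
        $yamlcontent += "baseformat: html`r`n"
        if ($categories.Count -gt 0) {
            $yamlcontent += "keywords:`r`n"
            $categories | ForEach-Object {
                $yamlcontent += "    - $_`r`n"
            }
        }
        $yamlcontent += "...`r`n"

        $postDir = Join-Path $outRoot "oldBlog_$counter"
        New-Item -ItemType Directory -Path $postDir -Force | Out-Null

        # Point links at the downloaded copies. Longer urls go first so a url that
        # is a prefix of another one cannot break the longer match.
        $files | Sort-Object { $_.old.Length } -Descending | ForEach-Object {
            $content = $content.Replace($_.old, "/" + $_.new)
        }

        [System.IO.File]::WriteAllText((Join-Path $postDir "config.yaml"), $yamlcontent, $Utf8NoBomEncoding)
        [System.IO.File]::WriteAllText((Join-Path $postDir "content.html"), $content, $Utf8NoBomEncoding)
    }
}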
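The script above picks posts by position, skipping the first 57 entries of this particular export. A possibly more robust alternative is to select entries by the Blogger "kind" category that the keyword filter already discards. This is a sketch on a small made-up feed; the exact term URI for posts is an assumption and should be checked against your own export.

```powershell
# Assumed category term that marks blog posts in a Blogger export; verify it
# against the <category> elements in your own oldblog.xml.
$postKind = "http://schemas.google.com/blogger/2008/kind#post"

# Tiny made-up feed with a settings entry, a post and a comment.
$sample = [xml]@"
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><category term="http://schemas.google.com/blogger/2008/kind#settings"/></entry>
  <entry><category term="http://schemas.google.com/blogger/2008/kind#post"/><category term="PowerShell"/></entry>
  <entry><category term="http://schemas.google.com/blogger/2008/kind#comment"/></entry>
</feed>
"@

# An entry is a post if any of its category terms equals the post kind.
$posts = @($sample.feed.entry | Where-Object { $_.category.term -contains $postKind })
$posts.Count
```

In the real script, the same `Where-Object` filter could sit in front of the `ForEach-Object` loop in place of the `$counter -gt 57` test.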

Then it is on to the manual cleanup step.
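Step 4 of the list, reviewing the HTML for manual changes, can start from a search for references that still point at Blogger. A minimal sketch, assuming the `oldblog\oldBlog_N\content.html` layout the script writes; the host name patterns are guesses, and the temporary sample folder exists only for illustration.

```powershell
# Host patterns that point back at Blogger; these are assumptions, so extend
# them with whatever domains actually show up in your export.
$pattern = 'blogspot\.com|blogger\.com|googleusercontent\.com'

# Sample converted post in a temporary folder, for illustration only.
$root = Join-Path ([System.IO.Path]::GetTempPath()) "oldblog-check"
$postDir = Join-Path $root "oldBlog_58"
New-Item -ItemType Directory -Path $postDir -Force | Out-Null
Set-Content -Path (Join-Path $postDir "content.html") -Value '<p>See <a href="http://example.blogspot.com/2015/01/post.html">this post</a>.</p>'

# Search every content.html under the output root and report file, line and text.
$hits = @(Get-ChildItem -Path $root -Recurse -Filter content.html |
    Select-String -Pattern $pattern)
$hits | ForEach-Object { "{0}:{1}: {2}" -f $_.Path, $_.LineNumber, $_.Line.Trim() }
```

Pointing `$root` at the real `oldblog` folder turns the output into a checklist of posts that still need hand editing.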