Carl Burks is a software developer for a global financial institution. With over ten years of experience in technology and software development for financial organizations and over twenty years of software experience, Carl Burks provides articles, musings, and insight into technology issues, software development, and other selected topics.

New Blog Site and Engine Part II

2016-08-18T11:03:00.002-07:00

Authors:
Carl Burks

Part of migrating to the new blog involved converting the XML export from Google's Blogger application. The first step was to obtain that export from the Blogger site.

After downloading the file, I began converting its entries into files I could use with my new blog engine:

  1. Download the Blogger images to a local source.
  2. Convert links to Blogger's images to the new URLs.
  3. Convert the XML nodes to HTML files.
  4. Check the HTML files for portions that need manual changes.
  5. Provide a YAML file with the metadata for building pages.

To download the linked files I wrote a PowerShell script:


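# Load the Blogger export as XML.
$x = [xml](Get-Content oldblog.xml)
$files = @()

# Match the src of <img> tags and the href of <a> tags, in either quote style.
@([regex]"<img[^>]*?src=(`"|')([^`"']+)\1", [regex]"<a[^>]*?href=(`"|')([^`"']+)\1") | %{
    $regex = $_
    $x.feed.entry | %{
        $regex.Matches($_.content.'#text') | %{
            # Group 2 holds the URL between the quotes.
            $url = $_.Groups[2].Value
            if ($url.StartsWith("http")) {
                # Use the text after the last dot as the file extension.
                $pieces = $url.Split(".")
                $ext = $pieces[$pieces.Length - 1]
                $g = [guid]::NewGuid().ToString()
                # SanitizeFile is a helper defined elsewhere (not shown here).
                $newfile = SanitizeFile ($g + ".$ext")
                Invoke-WebRequest -OutFile $newfile $url
                # Remember the old URL and its local file name for the link swap.
                $files += @{old = $url; new = $newfile}
            }
        }
    }
}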
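
The download script relies on a SanitizeFile helper that is not shown, and it takes the extension from whatever follows the last dot in the URL. As a rough sketch only, the stand-ins below assume the helper replaces characters that are invalid in file names. Get-UrlExtension is a hypothetical addition that ignores any query string. Neither is necessarily what the original script used.

```powershell
# Hypothetical stand-in for the SanitizeFile helper called by the download script.
function SanitizeFile([string]$Name) {
    # Characters the current platform does not allow in a file name.
    $invalid = [System.IO.Path]::GetInvalidFileNameChars()
    # Replace each invalid character with an underscore.
    $chars = $Name.ToCharArray() | ForEach-Object {
        if ($invalid -contains $_) { '_' } else { $_ }
    }
    -join $chars
}

# Hypothetical helper: take the extension from the URL path only,
# so a query string such as ?z=1 does not end up in the file name.
function Get-UrlExtension([string]$Url) {
    [System.IO.Path]::GetExtension(([uri]$Url).AbsolutePath)
}
```

Note that Get-UrlExtension returns the extension with its leading dot, so a caller would append it directly to the GUID rather than adding another dot.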

Switching the links out, along with writing the YAML metadata and HTML content for each entry, is as simple as:


$x = [xml](Get-Content .\oldblog.xml)
$counter = 0

# Write UTF-8 without a byte order mark.
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding($False)
mkdir ".\oldblog\" | Out-Null
$x.feed.entry | %{
    $content = $_.content.'#text'
    $counter += 1
    # Skip the first 57 entries and any entry with 100 characters or less of content.
    if ($counter -gt 57 -and $content.Length -gt 100) {

        # Keep plain category names; skip terms that are URLs.
        # @() keeps the result an array even when there is only one category.
        $categories = @($_.category | %{
            if ($_.term -notlike "http*") {
                $_.term
            }
        })

        $author = $_.author.name
        $date = $_.published
        $title = $_.title.'#text'

        # Build the YAML metadata for the page.
        $yamlcontent = "---`r`n"
        $yamlcontent += "render: pandoctemplate`r`n"
        $yamlcontent += "blogEntry: true`r`n"
        $yamlcontent += "date: `"$date`"`r`n"
        $yamlcontent += "author: $author`r`n"
        $yamlcontent += "showInMenu: true`r`n"
        $yamlcontent += "title: `"$title`"`r`n"
        $yamlcontent += "changefreq: never`r`n"
        $yamlcontent += "baseformat: html`r`n"
        if ($categories.Length -gt 0) {
            $yamlcontent += "keywords:`r`n"
            $categories | %{
                $category = $_
                $yamlcontent += "    - $category`r`n"
            }
        }
        $yamlcontent += "...`r`n"
        mkdir ".\oldblog\oldBlog_$counter\" | Out-Null
        # Point each downloaded link at its local copy ($files comes from the download script).
        $files | %{
            $content = $content.Replace($_.old, ("images/" + $_.new))
        }
        # The absolute paths assume the script is run from C:\Users\crb02\Desktop\Blog.
        [System.IO.File]::WriteAllText("C:\Users\crb02\Desktop\Blog\oldblog\oldBlog_$counter\config.yaml", $yamlcontent, $Utf8NoBomEncoding)
        [System.IO.File]::WriteAllText("C:\Users\crb02\Desktop\Blog\oldblog\oldBlog_$counter\content.html", $content, $Utf8NoBomEncoding)
    }
}
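
One thing to watch in the script above: the title is written inside double quotes in config.yaml, so a title that itself contains a double quote or a backslash would produce invalid YAML. A minimal sketch of one way to guard against that, assuming a hypothetical ConvertTo-YamlString helper that is not part of the original script:

```powershell
# Hypothetical helper: wrap a value as a YAML double-quoted scalar.
function ConvertTo-YamlString([string]$Value) {
    # Escape backslashes first, then double quotes, as YAML double-quoted scalars require.
    $escaped = $Value.Replace('\', '\\').Replace('"', '\"')
    '"' + $escaped + '"'
}

# Possible use when building the metadata:
# $yamlcontent += "title: $(ConvertTo-YamlString $title)`r`n"
```

Escaping the backslash before the quote matters; doing it the other way round would double the backslashes that the quote escaping had just added.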

Then it is on to a manual cleanup step.
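
To decide where manual cleanup is needed, it can help to list the links that still point at a remote site after the swap. The following is only a sketch: Get-RemainingLinks is a hypothetical helper, and it assumes the converted files sit under .\oldblog as content.html, as in the script above.

```powershell
# Hypothetical helper: return src/href values that still start with http.
function Get-RemainingLinks([string]$Html) {
    [regex]::Matches($Html, "(?:src|href)=[`"']([^`"']+)[`"']") |
        ForEach-Object { $_.Groups[1].Value } |
        Where-Object { $_ -like 'http*' }
}

# Report the remaining remote links for each converted entry, if the folder exists.
if (Test-Path .\oldblog) {
    Get-ChildItem .\oldblog -Recurse -Filter content.html | ForEach-Object {
        $links = Get-RemainingLinks (Get-Content $_.FullName -Raw)
        if ($links) {
            "$($_.FullName):"
            $links | ForEach-Object { "    $_" }
        }
    }
}
```

Ordinary outbound links will show up in this list too, so it is a starting point for review rather than a list of errors.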