For the next PDF backup, try CloudConvert to see whether it will download all 700 pages from the sitemap to PDF files (1 month for $10).
The objectives of the DTHS website and IT systems are to:
- assist in administration of the Society
- maintain records of the Society, its activities and the objects in the museum collection
- publish as much of the collection online as possible through the website
(We also intend to upload all text and images of our object collection into the Victorian Government online database, Victorian Collections.)
DTHS Website
This website is built using the Google service "Blogger". Use of this service has many advantages over other approaches:
- Blogger requires very little technical expertise to set up and operate
- Blogger does not require the purchase and annual registration charge of a domain name.
- Blogger does not require annual payment for "web hosting", i.e. storing text, images and video online
- Blogger does not require site security management against online attacks such as denial of service, because Google manages these.
- Blogger has no limit to the number of pages or images uploaded.
- Blogger is automatically promoted in Google search by Google, resulting in excellent "searchability".
Website Blogger Limitations:
- Page design is limited to available templates/themes (free and purchasable), which can be altered by people familiar with XML, HTML and CSS.
- No undo, i.e. editing mistakes or content deletions have no easy fallback other than the most recent full site backup.
Website Blogger Management:
- one person has full control of the site. Others can be involved by duplicating a page in a separate blog, editing it there and copying it back.
- pages are added on new topics as often as possible
- customised Google search is used on the search page as it allows more complex searches than the built-in Blogger search panel
- Blogger is easily backed up (periodic XML download from Blogger, and periodic download of full text and imagery from every page using the Sejda service)
Website Content Policies:
- Original unedited full text of all sources is retained. New revisions are added as separate sections of a page or in a new page.
- Sources are identified at end of every section - hopefully with a link to an original source, or to an online version in original format (usually as pdf)
Blogger Backup
- XML format
- for use to re-upload into Blogger, or to analyse in other software. Does not contain any image files or other media (i.e. just code, not pictures, for all pages, posts, widgets, customisation etc.).
- Settings - Manage Blog - Backup Content - Download file in xml format e.g. blog-08-01-2021.xml
- PDF Format
- This backup ensures that as information is moved onto the website, a copy is searchable within the Research Resources folder on all our desktop computers.
- It also insures against an online disaster that deletes our website.
- Contains embedded images at standard resolution (i.e. NOT full resolution)
- Requires multiple software products to produce.
- Recreate the Blogger SiteMap page
- Check that Google Chrome Extension "Link Klipper" is installed and operating.
- Extract URLs to all pages in our website
- Browse to DTHS Website SiteMap
- Activate the extension and download links to a csv file
- Open the csv file and remove the quote marks
- Download all post URLs as separate PDF files
- Buy 1 week access for $8 from Sejda (accepts URL lists and downloads them as PDFs)
- Go to the HTML to PDF Conversion page
- Check that "More Options" are set to
- portrait
- A4
- 20mm margin
- Show URL and date on PDF.
- Paste up to 100 URLs at a time (the limit per batch)
- Click the "Convert HTML to PDF" button
- Leave the webpage open while it processes the task OR THE JOB WILL BE DELETED. Processing of each batch may take 10 minutes!
- When finished, a download link to a compressed zip file containing all pages will appear.
- Download and unzip the file to see each page as a PDF
- After completing download of all site pages/posts, delete the contents of the existing folder "Research Resources/DTHS Website Backup"
- Copy the new backup files into the folder.
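The step of opening the csv file and removing the quote marks could also be scripted. The following is a minimal Python sketch, not part of our current procedure. It assumes the exported csv holds the links in ordinary comma-separated cells and that every wanted link starts with http:// or https://. The function name clean_links is made up for illustration.

```python
import csv
import io


def clean_links(csv_text: str) -> list[str]:
    """Return the web addresses found in exported csv text.

    Quote marks are removed, duplicates are dropped, and the
    original order of the links is kept.
    """
    seen = set()
    links = []
    # csv.reader strips the surrounding double quotes from quoted cells.
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            # Remove stray spaces and any leftover quote characters.
            url = cell.strip().strip("\"'")
            # Assumption: only cells that look like web addresses are wanted.
            if url.startswith(("http://", "https://")) and url not in seen:
                seen.add(url)
                links.append(url)
    return links
```

The returned list can be joined with line breaks and saved as a plain text file, ready to paste into the conversion page.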
DTHS Database of Museum Collection
- As at May 2026, the inventory of all objects is about 3/4 complete.
- Museum photograph and document information is being continually entered into a database called Inmagic DB/TextWorks. Photographs are being scanned as they are entered.
- Objects are being described and photographed
- Hopefully some time in 2023, we will be able to get assistance from Victorian Collections to bulk upload our records of photographs, objects and documents into their online database.
- Online editing of the Victorian Collections records will then replace the use of Inmagic DB/TextWorks.
Online document collection:
- Microsoft Not-For-Profit Program provides 1 TB of online storage, so all files are stored online for partial sharing as well as backup.
- In this way we can provide direct access to all files (all PDF originals, all photos, etc.) where necessary.
Hardware
- 3 desktop computers
- 1 notebook computer
- 2 multifunction scanner/printer/photocopier
- 1 desktop scanner
Software
- Inmagic DB/TextWorks (database)
- Microsoft Office 365 (complimentary software from Microsoft)
SiteMap listing titles of all pages within the blogger website
- Add page that will automatically create a "sitemap" i.e. list of all pages in the site.
- 2 main functions:
- Assist people and website "spider" "crawlers" to find the site contents
- Make it easy to periodically download the site contents
- Code obtained from How to List the Post Titles of all POSTS from your Blogger/Blogspot Blog [Tutorial] (addictofblogging.blogspot.com)
- Collect code from http://addictofblogging.blogspot.com/2015/09/List-Post-Titles-of-all-Posts-from-your-Blogger-Blogspot-BLOG.html
- Create page in blogger: Paste code
- Working successfully for more than 450 page titles (July 2021)
- Have found NO WAY to list alphabetically
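A possible workaround for the alphabetical listing is to sort the titles outside Blogger. The Python sketch below has not been tried on our site. It assumes the blog's public JSON feed follows the usual Blogger layout, where each entry holds its title under title.$t and its reader-facing address in the link marked rel="alternate". The function name sorted_titles is made up for illustration.

```python
import json


def sorted_titles(feed_json: str) -> list[tuple[str, str]]:
    """Return (title, address) pairs from feed text, sorted A to Z by title."""
    feed = json.loads(feed_json)
    pairs = []
    for entry in feed.get("feed", {}).get("entry", []):
        title = entry.get("title", {}).get("$t", "").strip()
        # Assumption: the page address is the link with rel="alternate".
        address = next(
            (link.get("href", "") for link in entry.get("link", [])
             if link.get("rel") == "alternate"),
            "",
        )
        if title and address:
            pairs.append((title, address))
    # casefold() makes the sort ignore upper and lower case.
    return sorted(pairs, key=lambda pair: pair[0].casefold())
```

The feed text would need to be downloaded first, for example from an address of the form https://dt-hs.blogspot.com/feeds/posts/summary?alt=json (an assumed address; a large blog may return its posts in several parts). The sorted pairs could then be pasted into a page as a list of links.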
Anchor points within a blogger story/post/webpage
Allows links to be made to specific locations in very long pages.
At the location within the story place an "anchor" point id e.g.
<a id="4ElvisSt">4 Elvis Street</a>
You can now jump to this location by adding #4ElvisSt to the page link address e.g.
https://dt-hs.blogspot.com/2021/06/donview.html#4ElvisSt
The code for the link should now look like
<a href="https://dt-hs.blogspot.com/2021/06/donview.html#4ElvisSt">4 Elvis Street</a>
Calendar sidebar widget code (not in use)
2020: the widget was in the right-hand sidebar block, but the block was not wide enough to show titles, so it was misleading.
It was moved onto a full page.
Copy the following link code to subscribe to this calendar on your phone or computer: https://calendar.google.com/calendar/ical/doncastertemplestowehs%40gmail.com/public/basic.ics
Method 1: Use Blogger to backup code (i.e. no images) in XML format. Suggest backup every 6 months.
Settings - Backup Content
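After downloading the XML file it may be worth checking that it opens cleanly before filing it away. The Python sketch below is only a suggestion. It assumes the backup is an Atom feed, with each saved item held in an entry element; the function name count_entries is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds (assumed format of the Blogger backup).
ATOM = "{http://www.w3.org/2005/Atom}"


def count_entries(xml_text: str) -> int:
    """Return the number of entry elements directly under the feed.

    Raises xml.etree.ElementTree.ParseError if the text is not
    well-formed XML, which would suggest a damaged download.
    """
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{ATOM}entry"))
```

The count is unlikely to match the number of pages in the SiteMap exactly, because a backup may also hold items such as comments and settings. A count of zero, or an error, would be a sign to download the backup again.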
Method 2: Use a page downloader to save PDF copies of all pages (includes images)
Clear the downloads folder of the computer in readiness for downloading the site web pages.
Install a browser extension link extractor like Link Graber
Go to the Website SiteMap
Hold down the z key on the keyboard and the left mouse button while you drag a selection rectangle around the whole list of site pages.
Release the button and the z key.
Open a text file editor (e.g. TextEdit or Notepad).
Paste the links into the editor.
Go to Sejda (URL to PDF conversion service)
Pay for 1 week access ($8 as at June 2022)
Go to the HTML to PDF page.
Copy 100 links.
Paste the 100 links into the text box.
Scroll down to the bottom and click "Options", then check the options are correct.
Click "Convert HTML to PDF".
After about 5 minutes, the task will complete and the compressed file can be downloaded.
Extract the files to see a PDF created from each web page, including images and links.
Repeat the process for the next 100 links until all pages have been downloaded.
When finished, delete the files from the previous DTHS Website Page Backup, and paste the new backup into the folder.
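Counting out 100 links at a time by hand is easy to get wrong, so the splitting could be done with a short script. This is a minimal Python sketch, assuming the links are already held as a list of addresses. The function name split_into_batches is made up for illustration, and the size of 100 is the per-batch limit described above.

```python
def split_into_batches(links: list[str], size: int = 100) -> list[list[str]]:
    """Split the full list of links into consecutive groups of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Step through the list in jumps of `size`, slicing out one group each time.
    return [links[start:start + size] for start in range(0, len(links), size)]
```

Each group can be joined with line breaks and pasted into the text box in turn. For about 700 pages this gives 7 groups.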
This process not only safeguards the information in the website, but also enables all website content to be included in the offline "Research Resources", so that a single offline desktop search covers all DTHS files whether they are online or not.
Errors
If you get a timeout message from Sejda, remove the affected page from the download list and repeat the download.
Print the problem page manually.
Split the page content across 2 pages to reduce the size of the webpage before the next download.










