Archive for June, 2011
Generating HTML pages from Latex
Posted by cheshirekow in LaTeX on June 29, 2011
While latex is pretty much “not designed” for web content, it is very useful to generate a web version of a latex document. The purpose of latex is clearly typesetting layouts on a pre-defined page, but when you want to share the information with others, it’s generally a lot easier for them to go to a webpage than it is to download and open a PDF. In addition, it’s generally easier to view a webpage than a PDF because the content is continuous, and one can scroll around and click hyperlinks in a way that is far more fluid than in a PDF.
Now that MathML and SVG are becoming more supported by web browsers, there is a strong case for sharing mathy documents on the web in addition to paper documents (or PDFs, which are only slightly more readable than paper).
To this end, I’ve been evaluating various Latex to HTML converters. I’ve tried the following on Linux (Ubuntu):
By far my favorite is LaTeXML. It generates crisp, simple pages using MathML and CSS, making it easy to customize the style. It doesn’t support a whole lot of packages that I generally would like to use (like algorithm2e), but then again none of them do. Also, the arXiv project is working on a branch of LaTeXML, so there is promise that it will grow quickly to support a lot of the best packages.
Document Setup
My current approach to generating both PDFs and HTMLs from latex source is to use separate top-level documents for both. The directory structure looks something like this:
document
|- document_html.tex
|- document_pdf.tex
|- document.tex
|- preamble_common.tex
|- preamble_html.tex
|- preamble_pdf.tex
\- references.bib
The two versions of document_[output].tex are the top-level files. They look like this:
%document_html.tex
\documentclass[10pt]{article}
\input{preamble_common}
\input{preamble_html}

\begin{document}
\input{document}
\end{document}
The PDF version is the same, but it uses preamble_pdf as an input. Note that in latex you cannot nest \include directives, but you can nest \input directives. Also, \include inserts a page break, so there is no reason to use it here. Rather, document.tex may \include its chapters as tex files or the like.
Makefile
To ease the process of generating the different types, I’m using a makefile.
# The following definitions are the specifics of this project
PDF_OUTPUT  := document.pdf
HTML_OUTPUT := document.html

PDF_MAIN    := document_pdf.tex
HTML_MAIN   := document_html.tex

COMMON_TEX  := document.tex \
               preamble_common.tex

PDF_TEX     := $(COMMON_TEX) \
               document_pdf.tex \
               preamble_pdf.tex

HTML_TEX    := $(COMMON_TEX) \
               document_html.tex \
               preamble_html.tex

BIB         := references.bib

# these variables are the dependencies for the outputs
PDF_SRC     := $(PDF_TEX) $(BIB)
HTML_SRC    := $(HTML_TEX) $(BIB)

# none of these targets name real files
.PHONY: all pdf html clean

# the 'all' target will make both the pdf and html outputs
all: pdf html

# the 'pdf' target will make the pdf output
pdf: $(PDF_OUTPUT)

# the 'html' target will make the html output
html: $(HTML_OUTPUT)

# the pdf output depends on the pdf tex files and the bibliography
# we use a shell script to optionally run pdflatex multiple times until the
# output does not suggest that we rerun latex
$(PDF_OUTPUT): $(PDF_SRC)
	@echo "Running pdflatex on $(PDF_MAIN)"
	@pdflatex $(basename $(PDF_MAIN)) > $(basename $(PDF_MAIN))_0.log
	@echo "Running bibtex"
	@-bibtex $(basename $(PDF_MAIN)) > bibtex_pdf.log
	@echo "Checking for rerun suggestion"
	@for ITER in 1 2 3 4; do \
		STABELIZED=`cat $(basename $(PDF_MAIN)).log | grep "Rerun"`; \
		if [ -z "$$STABELIZED" ]; then \
			echo "Document stabilized after $$ITER iterations"; \
			break; \
		fi; \
		echo "Document not stabilized, rerunning pdflatex"; \
		pdflatex $(basename $(PDF_MAIN)) > $(basename $(PDF_MAIN))_$$ITER.log; \
	done
	@echo "Copying pdf to target file"
	@cp $(basename $(PDF_MAIN)).pdf $(PDF_OUTPUT)

# the html output depends on the html tex files and the bibliography
# we have to process all of the bibliography files separately into xml files,
# and then include them all in the call to the postprocessor
$(HTML_OUTPUT): $(HTML_SRC)
	@echo "Running latexml on $(HTML_MAIN)"
	@latexml $(HTML_MAIN) --dest=$(basename $(HTML_OUTPUT)).xml > $(basename $(HTML_MAIN)).log 2>&1
	@BIBSTRING=""; \
	for BIBFILE in $(BIB); do \
		echo "Running latexml on $$BIBFILE"; \
		XMLFILE=`basename "$$BIBFILE" .bib`.xml; \
		LOGFILE=`basename "$$BIBFILE" .bib`_html.log; \
		latexml $$BIBFILE --dest=$$XMLFILE > $$LOGFILE 2>&1; \
		BIBSTRING="$$BIBSTRING --bibliography=$$XMLFILE"; \
	done; \
	echo $$BIBSTRING > bibstring.txt
	@echo "postprocessing with `cat bibstring.txt`"
	@latexmlpost $(basename $(HTML_OUTPUT)).xml `cat bibstring.txt` --dest=$(HTML_OUTPUT) --css=navbar-left.css

# the 2>/dev/null redirects stderr to the null device so that we don't get error
# messages in the console when rm has nothing to remove
clean:
	@-rm -v *.log 2>/dev/null
	@-rm -v *.out 2>/dev/null
	@-rm -v *.aux 2>/dev/null
	@-rm -v *.xml 2>/dev/null
	@-rm -v *.pdf 2>/dev/null
	@-rm -v *.html 2>/dev/null
	@-rm -v bibstring.txt 2>/dev/null
Some notes on the makefile. I execute bibtex ignoring errors (the dash symbol before ‘bibtex’) because bibtex will exit with an error if it doesn’t find any citations, or if there is no bibliography. Each iteration of pdflatex is output to a logfile named “document_pdf_<i>.log” where “<i>” is the iteration number. The output of pdflatex and bibtex is suppressed by dumping it to the logfile (I find the verbosity useless to have in the console).
The shell script in the PDF recipe iterates up to four times. The first thing it does is grep the log of the most recent pdflatex run, looking for the line where latex recommends that we “Rerun” latex. If it finds such a line, it sets the shell variable STABELIZED to that string; otherwise the variable gets the empty string. Then we test to see if the string is empty. If it’s empty, we’re done, so we break out of the loop. If it’s not, we rerun pdflatex.
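The same rerun logic can be pulled out of the makefile into a pair of shell functions, which makes it easier to try on its own. This is only a sketch: needs_rerun and run_until_stable are names I made up for illustration, and the command to run is passed in as arguments rather than hard-coded to pdflatex.

```shell
# Succeeds (exit status 0) if the log file still contains a "Rerun" hint.
needs_rerun() {
    grep -q "Rerun" "$1"
}

# run_until_stable LOGFILE MAX_RERUNS COMMAND [ARGS...]
# Runs COMMAND once, then reruns it (at most MAX_RERUNS times) while
# LOGFILE still contains "Rerun". Prints the total number of runs.
run_until_stable() {
    local logfile="$1"
    local max_reruns="$2"
    shift 2
    local runs=1
    "$@" > /dev/null
    while [ "$runs" -le "$max_reruns" ] && needs_rerun "$logfile"; do
        "$@" > /dev/null
        runs=$((runs + 1))
    done
    echo "$runs"
}

# Example (assumes pdflatex writes its log to document_pdf.log):
#   run_until_stable document_pdf.log 4 pdflatex document_pdf
```

Passing the command as arguments keeps the loop independent of pdflatex, so the same function would work for any tool that asks for another pass through its log file.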
The shell script in the HTML recipe iterates over each of the (potentially multiple, potentially zero) bibliography files, processing each of them with latexml. It then appends the string “--bibliography=<filename>.xml” to the BIBSTRING shell variable. The last thing it does is echo the contents of that shell variable to the file “bibstring.txt”. This is so that subsequent commands run by make can find it: make runs each recipe line in its own shell, so the variable itself would not survive to the next line.
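For reference, here is the option-building step on its own, as a small shell function. The name bib_args is hypothetical; the only thing taken from the makefile is the --bibliography=<name>.xml form that gets handed to latexmlpost.

```shell
# Print one --bibliography=<name>.xml option per .bib file given as an
# argument, separated by spaces. Prints an empty line for no arguments.
bib_args() {
    local args=""
    local bibfile
    for bibfile in "$@"; do
        args="$args --bibliography=$(basename "$bibfile" .bib).xml"
    done
    # drop the leading space left over from the first append
    echo "${args# }"
}

# Example:
#   latexmlpost document.xml $(bib_args references.bib) --dest=document.html
```

Outside of make there is no need for the bibstring.txt detour, since the function's output can be substituted directly into the latexmlpost command line.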
Personal Dynamic DNS in Ubuntu
Posted by cheshirekow in Uncategorized on June 13, 2011
I finally got around to purchasing a personal server and one of the first things I did was set up a private DNS server for cheshirekow.com. As it turns out, setting it up to be dynamic is quite easy. In this post I’ll go through the steps I took to get it up and running.
I won’t bother with all the fun stuff about how dynamic DNS works or how to properly configure everything, but instead I’ll just post my configuration files for posterity.
More detailed information on configuring bind can be found in the Ubuntu Server Guide. A good article on nsupdate and dynamic updates to bind can be found on Jeff Garzik’s linux pages. I found the information I needed on Network Manager hooks from sysadmin’s journey.
Why Dynamic DNS?
Mostly because I’m lazy. I have a work laptop, a personal desktop, a netbook, an android tablet, and an android phone. I’m constantly scp’ing files from one to another, and I really hate having to write out the ip address specifically all the time. Since I own the domain cheshirekow.com, I figured it would be really slick to be able to address all of my machines as subdomains. For instance, I could label them as “laptop.cheshirekow.com”, “desktop.cheshirekow.com”, “netbook.cheshirekow.com”, “tablet.cheshirekow.com”, and “phone.cheshirekow.com”. If these dns entries are automatically updated when each of these devices connects to a wifi access point using DHCP, then I can even get files from one machine to another without even being physically near them.
named.conf.local
Following the Ubuntu guide, I edited /etc/bind/named.conf.local to look like the following:
//
// Do any local configuration here
//

// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/etc/bind/zones.rfc1918";

zone "cheshirekow.com" {
        type master;
        file "/var/lib/bind/db.cheshirekow.com";
        allow-transfer { aaa.bbb.ccc.ddd; };
        allow-update { key "user.cheshirekow.com."; };
};
Note that the file is in /var/lib/bind/db.cheshirekow.com, not in /etc/bind/db.cheshirekow.com like a lot of tutorials will tell you. This is because Ubuntu prevents bind from writing to files in /etc/bind. You can either change the apparmor profile for bind or just do as I do and put the file where it’s supposed to go, in /var/lib/bind/ (there’s a note in the bind apparmor profile about this). Putting it in /etc/bind is fine if the DNS entries are all static, but if there are dynamic entries then bind will try to create a .jnl file in the same directory as the db.xxx file. Since bind can’t write to /etc/bind, we need to put the db file somewhere else.
Also, note that aaa.bbb.ccc.ddd is the IP address of my secondary name server for cheshirekow.com. I’m using afraid.org to host my secondary DNS.
The allow-update line allows whoever holds the key named user.cheshirekow.com. (think of it as the user user@cheshirekow.com) to update the DNS entries (the dynamic part). Generating the key comes later. Note that “user” is a placeholder here; I don’t use the literal “user” in my real configuration.
/var/lib/bind/db.cheshirekow.com
The next thing was to create the db.cheshirekow.com file, which looks like this:
$ORIGIN .
$TTL 604800     ; 1 week
cheshirekow.com         IN SOA  ns1.cheshirekow.com. cheshirekow.gmail.com. (
                                9          ; serial
                                604800     ; refresh (1 week)
                                86400      ; retry (1 day)
                                2419200    ; expire (4 weeks)
                                604800     ; minimum (1 week)
                                )
                        NS      ns1.cheshirekow.com.
                        A       aaa.bbb.ccc.ddd
                        AAAA    ::1
$ORIGIN cheshirekow.com.
ns1                     A       aaa.bbb.ccc.ddd
www                     A       eee.fff.ggg.hhh
Note that aaa.bbb.ccc.ddd is the IP address of the name server itself and eee.fff.ggg.hhh is the IP address of my web server (where you are currently reading this). Also note that my email address is cheshirekow@gmail.com but is written in this file as cheshirekow.gmail.com. (with the @ replaced by a dot and a trailing dot added).
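The email rewriting is mechanical, so here is a tiny shell sketch of it. The function name soa_rname is made up for this example, and it only handles the simple case: an address whose local part itself contains a dot would need that dot escaped with a backslash, which this sketch does not do.

```shell
# Convert user@domain into the user.domain. form used in the SOA record.
# Does not handle dots in the part before the @.
soa_rname() {
    local email="$1"
    local user="${email%%@*}"     # everything before the first @
    local domain="${email#*@}"    # everything after the first @
    echo "${user}.${domain}."
}

# Example:
#   soa_rname cheshirekow@gmail.com
```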
You can (should?) also set up reverse dns entries for all these things but I did not as the server is actually sitting in a different physical domain. In other words I don’t own a network of ip-addresses so there’s no reason to expect my server to be queried for reverse dns lookups.
Create Keys
The next thing we need to do is set up a key that we can use to do dynamic updates. This can be done on a separate machine from the name server… it doesn’t matter.
user@ns1:~$ mkdir .bind
user@ns1:~$ cd .bind
user@ns1:~/.bind$ dnssec-keygen -a HMAC-MD5 -b 512 -n USER user.cheshirekow.com.
Note that “USER” is a literal string, not a placeholder for something that you create. Also note that “user.cheshirekow.com” is the name of this key, and corresponds to the email address “user@cheshirekow.com”.
This command creates two files, a .key file and a .private file:
user@ns1:~/.bind$ ls -l
total 8
-rw------- 1 user user 127 2011-06-10 16:51 Kuser.cheshirekow.com.+157+56713.key
-rw------- 1 user user 229 2011-06-10 16:51 Kuser.cheshirekow.com.+157+56713.private
Install Keys
Now we create a file to store these keys. I put them in /etc/bind/keys.local:
key "user.cheshirekow.com." {
        algorithm HMAC-MD5;
        secret "2345A/bkd7GDcu9orjzblkj2r37ajglk489DLHD/m987addzjDCadsh8 bbIUOY809glkashDEmPj5alIUoiEeA==";
};
Note that this is not a real key, but random gibberish I pounded out on the keyboard. In reality, this key is copied directly from Kuser.cheshirekow.com.+157+56713.key.
I then added an include for this file to /etc/bind/named.conf so that it looks like this:
// This is the primary configuration file for the BIND DNS server named.
//
// Please read /usr/share/doc/bind9/README.Debian.gz for information on the
// structure of BIND configuration files in Debian, *BEFORE* you customize
// this configuration file.
//
// If you are just adding zones, please do that in /etc/bind/named.conf.local

include "/etc/bind/named.conf.options";
include "/etc/bind/named.conf.local";
include "/etc/bind/named.conf.default-zones";
include "/etc/bind/keys.local";
Restart bind
That’s it for the bind setup, so restart bind:
user@ns1:~$ sudo /etc/init.d/bind9 restart
Client Update Script
I then created the following update script in /etc/NetworkManager/dispatcher.d/99updatedns. This script is called as a hook from Network Manager every time an interface goes up or down. It receives two parameters. The first is the name of the interface (e.g. eth0 or wlan0) and the second is the status (e.g. up or down).
#!/bin/bash

INTERFACE=$1
STATUS=$2
DIRECTORY="/home/user/Codes/shell/dyndns"

if [ "$STATUS" = "up" ]; then
    IPADDRESS=`ifconfig $INTERFACE | grep inet | grep -v inet6 | cut -d ":" -f 2 | cut -d " " -f 1`
    cp $DIRECTORY/nsupdate_src.txt /tmp/nsupdate.txt
    sed -i "s/IPADDRESS/$IPADDRESS/" /tmp/nsupdate.txt
    nsupdate -k /home/user/.bind/Kuser.cheshirekow.com.+157+56713.private -v /tmp/nsupdate.txt
fi
Note that this script requires the file nsupdate_src.txt, which is here:
server ns1.cheshirekow.com
zone cheshirekow.com
update delete netbook.cheshirekow.com. A
update add netbook.cheshirekow.com. 86400 A IPADDRESS
show
send
The script extracts the IP address from the output of ifconfig for the correct interface, copies the file to /tmp/, replaces IPADDRESS with the actual address of the machine, and then calls nsupdate using the private key and the file. This script is saved as /etc/NetworkManager/dispatcher.d/99updatedns, owned by root, and flagged executable. Note that this script accesses the key for my specific user, which is fine in my case because my netbook is a single-user machine. If the machine has multiple users, you may want to store the key and text file in /root or something.
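The parsing and substitution steps described above can be tried without touching the network if they are split into two small functions that read from stdin. This is a sketch only: extract_ipv4 and render_nsupdate are hypothetical names, and the parser assumes the older ifconfig output format with an “inet addr:a.b.c.d” line, which is what the pipeline in the script relies on.

```shell
# Read ifconfig-style output on stdin and print the first IPv4 address.
# Assumes lines of the form "inet addr:192.168.1.5  Bcast:...".
extract_ipv4() {
    grep inet | grep -v inet6 | head -n 1 | cut -d ":" -f 2 | cut -d " " -f 1
}

# Read an nsupdate template on stdin and replace the IPADDRESS
# placeholder with the address given as the first argument.
render_nsupdate() {
    sed "s/IPADDRESS/$1/"
}

# Example:
#   IP=$(ifconfig wlan0 | extract_ipv4)
#   render_nsupdate "$IP" < nsupdate_src.txt > /tmp/nsupdate.txt
```

If ifconfig on your system prints a different format, the address will come out empty or wrong, so it is worth checking the output of extract_ipv4 before calling nsupdate.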
Result
The result of this process is that netbook.cheshirekow.com always points to the IP address of my netbook, as long as it is connected to a wifi access point. Whenever the netbook (re)connects to an access point, Network Manager calls the script, and the DNS entry on ns1.cheshirekow.com is updated.
(Update) Better Script
I changed the update script a little bit. Since I use a wired connection on my laptop most of the time, I don’t want the IP address of the wireless connection to supersede that of the wired connection when the wired connection is active.
#!/bin/bash

INTERFACE=$1
STATUS=$2
DIRECTORY="/home/user/Codes/shell/dyndns"

echo "network interface change hook:"
echo "----------------------------";

#first, check to see if eth0 is up and running
ETH0STR=`ifconfig eth0 | grep inet | grep -v inet6`
if [ -z "$ETH0STR" ]
then
    echo "eth0 has no address (probably is down or disconnected)"
    echo "checking interface $INTERFACE whose change launched this script"
    if [ "$STATUS" = "up" ]
    then
        IPADDRESS=`ifconfig $INTERFACE | grep inet | grep -v inet6 | cut -d ":" -f 2 | cut -d " " -f 1`
        if [ -z "$IPADDRESS" ]
        then
            echo "$INTERFACE has no address, aborting (str = $IPADDRESS)"
        else
            echo "$INTERFACE has address $IPADDRESS"
            cp $DIRECTORY/nsupdate_src.txt /tmp/nsupdate.txt
            sed -i "s/IPADDRESS/$IPADDRESS/" /tmp/nsupdate.txt
            nsupdate -k /home/user/.bind/Kuser.cheshirekow.com.+157+56713.private -v /tmp/nsupdate.txt
        fi
    else
        echo "Status is not 'up', aborting"
    fi
else
    #eth0 is up, so its address wins no matter which interface changed
    IPADDRESS=`echo $ETH0STR | cut -d ":" -f 2 | cut -d " " -f 1`
    echo "eth0 has address $IPADDRESS, ignoring changed interface $INTERFACE"
    cp $DIRECTORY/nsupdate_src.txt /tmp/nsupdate.txt
    sed -i "s/IPADDRESS/$IPADDRESS/" /tmp/nsupdate.txt
    nsupdate -k /home/user/.bind/Kuser.cheshirekow.com.+157+56713.private -v /tmp/nsupdate.txt
fi
Edit:
For some reason, whenever I edit db.cheshirekow.com by hand, bind refuses to restart correctly. When I make such an edit, I have to delete the file /var/lib/bind/db.cheshirekow.com.jnl and restart.
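A possible explanation, offered as an assumption rather than a diagnosis of this particular setup: bind records dynamic updates in the .jnl journal, and a hand-edited zone file no longer matches that journal. The rndc tool has freeze and thaw subcommands meant for editing a dynamic zone by hand, which may avoid having to delete the journal. The wrapper below is a sketch; edit_dynamic_zone and the DRY_RUN switch are names invented for this example.

```shell
# edit_dynamic_zone ZONE ZONEFILE
# Freeze the zone, open the zone file in an editor, then thaw the zone.
# With DRY_RUN=1 the commands are printed instead of being executed.
edit_dynamic_zone() {
    local zone="$1"
    local zonefile="$2"
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    # sync the journal into the zone file and suspend dynamic updates
    $run sudo rndc freeze "$zone"
    # edit by hand; remember to increment the serial number
    $run sudo "${EDITOR:-vi}" "$zonefile"
    # reload the zone and allow dynamic updates again
    $run sudo rndc thaw "$zone"
}

# Example:
#   edit_dynamic_zone cheshirekow.com /var/lib/bind/db.cheshirekow.com
```

Running it once with DRY_RUN=1 shows the three commands without changing anything on the server.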
Inkbook Introduction
Posted by cheshirekow in Inkbook, Uncategorized on June 7, 2011
Inkbook is a new project I’ve started to replace Xournal for my needs. What I really want is a tightly integrated, full-featured inking experience for Ubuntu.
What’s wrong with xournal?
Xournal is great. I use it all the time. However, there are a lot of really simple features I would like it to have. I took a look at the code, and it’s pretty hard to understand. The lack of good documentation means it’s not worth my time. There’s no sense in committing a ton of time trying to learn the code base, just to find out that an apparently simple feature is impossible to implement without restructuring the whole thing. So, I’m just restructuring the whole thing :).
I’ll start by going through all the things that I don’t like about Xournal.
Memory Usage
One of the biggest problems I have with Xournal is its memory usage. A typical 10 page Xournal document consumes around 300MB of RAM, and takes about 60 seconds to open. This is a big nuisance to me. I suspect the cause is that Xournal stores the whole document in memory.
Bitmaps
A lot of times I really want to paste some snippet into my notes. There is a Xournal patch for using bitmaps, and it’s not terrible, but the images render fuzzy and it’s difficult to scale and place them in the document. I usually end up exporting the whole thing to PDF for later reference. I’ve written a script which can copy parts of the screen to the clipboard (like the Adobe Reader snapshot tool), so I’d really like to be able to drop a bunch of images into a notebook and draw around them, write on them, etc.
Layers
I think that layers are a really useful tool, but it’s hard to use them in xournal. First of all, you have to select them from a drop down list at the bottom of the screen, not a list box. You can’t reorder them. And if you move to a lower layer, all the layers above it disappear.
Pen Options
Only three line widths and no fast-access colorwheel.
Shapes
Can only draw shapes by having the recognizer interpret them. Why not have shape tools that allow you to drop the shape and then resize, move around?
No lasso tool
Rectangular selection just doesn’t cut it for me. Especially when I have potato shaped drawings that I want to move around, without moving the text around it.
Inkbook
What I really want is a digital notebook. Inkbook aims to be just that. Inkbook is really a merger of features that I like from both Xournal and Inkscape, and an attempt to fix some of the problems I have with both. Here is a list of the features I’m currently focusing on.
- very large documents
- ability to organize notebooks (like folders)
- ability to link individual pages to multiple notebooks
- multiple layers per page
- multiple page sizes
- continuous range of brush sizes
- continuous color picking
- bitmap cut & paste
- grouping of paths
- objects (shapes)
- collaboration (openbook module?)
Very large documents and Organization
I want to be able to have several dozen pages in a document, which basically means that the entire document can’t be stored in memory. Therefore, I’m attempting to store the data in a sqlite database. This also addresses the desire to have better organizational facilities. I’m implementing separate database objects for notebooks, pages, layers, objects, and paths.
A notebook is an ordered list of notebooks and an ordered list of pages (i.e. a folder). A page is an ordered list of layers. A layer is an ordered list of objects. An object is an ordered list of objects, images, or paths. A path is an ordered list of drawing primitives (most likely a one-to-one mapping to the cairo API).
Organization and View
For organizing notebooks, I plan to have a tree view (i.e. directory tree). I’ll have a thumbnail page view which shows the current page and those near it, and allows for scrolling through the whole notebook. This will be a custom widget which renders each of the pages via its thumbnail image. I’ll have a list view to organize layers on the page. The list view will also list complex objects so they can be easily selected and edited (but it won’t display any information about hand-drawn paths, as there will be a large number of these). The main view will display a viewport of the page.
Current Progress
I’ve got a proof-of-concept running with the sqlite database file backend and working views for the notebook organization and layers. I’ve got a proof-of-concept for the thumbnail view but it needs more work. It’s written in C++ and meant to be very easy to understand and extend. I’m using Gtkmm3 (unstable) because it’s GTK, but it’s C++, and it has cairo as the native API. Here’s a screenshot: