Which Tablet Should I Choose?

October 2012 brought a rash of new tablet offerings from Apple, Google and their partners, as well as Amazon. But which tablet should you choose? If you have a high speed (Not dialup) internet connection both at home and at work, then you may make very little use of cellular internet access, so choose one of the tablets below based largely on the price point.

  • Nexus 7, 7″, 16GB: $199
  • Fire HD, 7″, 32GB without offers: $264*
  • iPod Touch, 4″, 32GB: $299
  • iPad 2, 9.7″, 16GB: $399*
  • iPad mini, 7.9″, 64GB: $529
  • iPad (‘4’), 9.7″ Retina, 64GB: $699
  • iPad (‘4’), 9.7″ Retina, 3G, 64GB: $829

*Best Choice

If you don’t have WiFi access both at home and at work, then read my recommendations for devices with cellular access. For laptops, read here.

Moving Our Polycom’s Weather Page from Google’s API to Weather Underground

After the recent demise of the unofficial Google Weather API documented here, I have posted a rewrite as a GitHub gist that uses Weather Underground’s service. The code can be found here.

The Perl script I have provided can be run from cron to generate an updated weather file four to six times per hour. That file should live on one of your, probably local, servers so that it can be polled by your phones every 15 minutes or so. The script produces a weather page in XHTML that can be displayed as the default page of your phones. Weather alerts will replace the normal temperature and forecast whenever your location has an active alert.
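As an illustration, a crontab entry along these lines would regenerate the page every 15 minutes; the script name, output path and log file shown here are placeholders that will depend on where you put the gist’s script and how your web server is laid out.

# Regenerate the Polycom weather page every 15 minutes
# (script and output paths are examples only)
*/15 * * * * /usr/local/bin/wunderground_weather.pl > /var/www/html/weather.xhtml 2>> /var/log/weather_cron.log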

You will need to register with Weather Underground for an API key, but at 3 requests per run and 4 runs per hour that is only 288 requests per day (432 at 6 runs per hour), well under the 750 requests per day permitted with a free API key. If you need more than one location (For multiple offices) then you may need to use a paid API key.

Removing Spam User Posts

The Problem
Faced with a deluge of unwanted user posts across a set of blogs, the perils of leaving new user registration set to create ‘Contributors’ instead of ‘Subscribers’ became far more apparent. The number of posts involved was over 15,000 on two of the blogs.

The posts, unfortunately, were not from avid on-topic fans but of the endless spam variety, with strangely unreadable English. It seems that the majority of these promotional posters were using automated bots that modified the text so that it would not carry a signature familiar to anti-spam tools. Several adjectives in each post were swapped for thesaurus-style substitutes, often producing very clumsy sentences: for example, the ‘Different Minimum Tax’ instead of the ‘Alternative Minimum Tax’. Not to mention that nearly all the posts were off topic. I’m sure that the brands involved were paying for these posts a few cents at a time as an SEO tactic.

The Tools
WordPress unfortunately does not provide effective tools for managing (deleting) these thousands of posts. After some experimentation with ineffective plugins, the solution I arrived at was:

Setup

  • Switch Settings/General to create new users as Subscribers
  • Install the ‘amr users’ and ‘User Spam Remover’ plugins.
  • Go to ‘Users’ / ‘User Spam Remover’.
  • Press ‘Remove spam/unused accounts now’
  • Repeat pressing until fewer than 1,000 users are removed.
  • Go to the bottom of the menu, ‘User Lists’/’General Settings’
  • Select a ‘Default rows per page’ size of 350 (See note below)
  • Under ‘User Lists’/’Caching’ deselect ‘Do NOT re-cache on user update’

Removing the bulk of Posts
To remove the spam posts we are actually going to remove the users that created them. This prevents further use of the accounts and also deletes all of the associated posts and links. But how do you identify those users without reading all those thousands of posts? What I have found is that, across a series of months, 95% of the users arrive in a single week or ten-day period, with tens of users being created on those spammer-arrival days and only a few per week the rest of the time. So I resorted to deleting all users created in those problem weeks. The ‘amr users’ plugin above provides the ability to sort users by creation date.

  • Select ‘Users’/’Users: Details’
  • Select the ‘Registration Date’ sort order once
  • Find where complete pages of spam users start (Probably page 2)
  • Select the whole page with the tick box at the upper left of the table
  • Scan down to make sure that all the entries are part of the spam sequence
  • Select the Bulk Operation Delete
  • Hit confirm on the delete button (This can take a few seconds)

All of the associated posts and links will also be deleted. Once you have done this for the bulk of the users you may want to reduce the default page size and do a more detailed scrub. Go back to the now reduced list of posts and look for remaining spam posts. If tags are used, select all posts carrying the offending tag. Open the users page in another browser tab, search for the associated user, hover over the row to display the additional options and delete the user.

For users posting under a display name, note the email address on the post and search the user list for that address manually. Do this as the last step so that you aren’t wasting time searching through a long list.
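If you have shell access to the server, the same bulk removal can also be sketched with WP-CLI rather than the plugins. Treat the following as an illustration of the approach rather than a tested recipe: the date range is a placeholder for your own spammer-arrival week, and, as with the plugin, deleting a user without reassigning also removes their posts and links, so back up the database first.

# See how many users registered on each day, to spot the spammer-arrival spikes
wp user list --field=user_registered | cut -d' ' -f1 | sort | uniq -c

# Delete every user registered during a problem period (dates are placeholders)
wp user list --fields=ID,user_registered --format=csv |
  awk -F, '$2 >= "2012-06-01" && $2 < "2012-06-11" {print $1}' |
  xargs -n1 -I{} wp user delete {} --yes

Because this runs on the server rather than through the admin pages, it also sidesteps the request-size and timeout limits described in the note below.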

The Future
The next question is: now that you have these tools, could you manage the blog while still allowing user registration to create contributing users? I think that depends on how much time you are spending writing the blog. Deleting spam, if you leave the default role as Contributor, will take a couple of hours per month of effort; if that is too much time for you then leave the default as ‘Subscriber’. I have left it as ‘Subscriber’ on all but one of my blogs.

Note
Regarding the number of rows per page: the number of rows in a page determines the largest number of items you can delete at a time. 400 items will send too large a request to the server and immediately fail, hence the choice of 350; however, if the server is slower or otherwise constrained you may need to use a lower number. If many users have a very large number of posts the deletion will be slower. If you are using the S3Cache plugin there is a two minute timeout imposed. In some circumstances I have found that I need to reduce the page size to 150 in order to get deletions to complete. The failed/timed-out deletions do not appear to cause any problem other than the error messages you see, but you will have to repeat the deletion, so moving to a lower page size in order to achieve dependability is worthwhile.

Identifying your Linux version

To identify the Linux installation that your hosting service is using, try “uname -o”. If you need more details try:

  • uname -a
  • cat /proc/version
  • cat /etc/issue
  • uname -r
  • cat /etc/*version
  • cat /etc/*release
  • cat /etc/*config

To determine whether it is a 64-bit or 32-bit system, look for a 32 or 64 in the results.

The following strings indicate 64bit:

  • x86_64
  • ia64

The following strings indicate 32bit:

  • i386
  • i486
  • i686

Under Ubuntu

/usr/bin/lsb_release -a

will provide the version string.

The command “uname -m” should also indicate the machine type.

If in doubt another indicator is to look at an executable such as “file /sbin/init” where the result may indicate something like “ELF 32-bit LSB executable”.
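If you find yourself doing this often, the checks above can be rolled together into a small script; this is just a convenience sketch using the same commands.

#!/bin/bash
# Show kernel, release and machine type, plus any release files present
uname -o -r -m
cat /etc/*release 2>/dev/null

# Classify the machine type reported by uname -m
case "$(uname -m)" in
  x86_64|ia64)    echo "64-bit kernel" ;;
  i386|i486|i686) echo "32-bit kernel" ;;
  *)              echo "Unrecognised machine type: $(uname -m)" ;;
esac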

The performance difference between the two is typically 4% to 10%, and rarely more than a modest 20%, so stability and availability of drivers are probably more important considerations.

Installing s3cmd to Access Amazon S3 Cloud File Storage

A convenient way to access Amazon S3 storage is with s3cmd. It is similar to an FTP client and rsync combined, allowing you to script and automate copying of files into S3 storage. You might do this to mirror your static files in S3, from where they are served through CloudFront as a CDN for faster web access, or as a method of backing up to or distributing software from S3.

To install the command you need to be on a UNIX machine. It can be a dedicated host, virtual host, in the cloud or simply your MacBook.

Check to see the latest version at http://sourceforge.net/projects/s3tools/files/

cd
mkdir Downloads
cd Downloads

# Fetch and unpack the release (check the SourceForge page above for the latest version)
wget http://sourceforge.net/projects/s3tools/files/s3cmd/1.0.0/s3cmd-1.0.0.zip
unzip s3*zip

cd s3*

# Install; this may fail without root access (see below)
python setup.py install

This will give an error if it is not able to place the resulting command into the root-owned system directories. If that is the case, the build still generates a local copy, which you can copy into a personal bin directory under your account and use from there. To copy the command to a personal bin directory:

mkdir ~/bin
cp s3cmd ~/bin
cp -R S3 ~/bin

You should now be able to execute the command, and next need to configure it with access to your Amazon S3 account. Get the keys from https://aws-portal.amazon.com/gp/aws/developer/account/index.html

cd
s3cmd --configure (That is two hyphens)

If s3cmd isn’t found, add the following to your ~/.profile so that your PATH includes ~/bin when searching for commands.

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi
then type:
. ~/.profile
s3cmd --configure (That is two hyphens)

If you run into problems with prerequisites (Software that needs to be on your machine before you run the process above) then please leave a comment. For example, you may need to install Xcode (The developer package in the App Store) on the Mac if you find that you do not have Python or wget installed.

The command can be used, for example, to copy a file or synchronize a directory. In the examples below the buckets you have defined (Either from the command line or in the Amazon AWS console) are listed, and a set of JavaScript files is placed in an S3 bucket.

s3cmd ls

s3cmd put --acl-public --guess-mime-type mymasterdirectory/*js s3://somebucket
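To keep a whole directory mirrored rather than copying files one by one, something along these lines should also work; the directory and bucket names are placeholders, and adding --dry-run first will show what would be transferred without actually doing it.

# Mirror a local directory into a bucket, removing files that no longer exist locally
s3cmd sync --acl-public --delete-removed mymasterdirectory/ s3://somebucket/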

Using Amazon Web Services

I am writing a series of posts related to different aspects of Amazon Web Services (AWS) as I increasingly use AWS for new projects. Different projects need different levels of scalability, so different services make sense. In particular, using geo-location to serve the dynamic elements of a website from the nearest datacenter is not important for a local business. And while load time matters for most websites, and a content delivery network (CDN) can host static files such as images near the user of the site, the additional couple of hours of effort may not be justified for a site with minimal content and little traffic.

Some serious complexity arises for sites which have authentication in the form of a login and which need to operate across multiple regions, but in many cases this can be handled in a less-than-perfect manner to reduce the complexity and cost of the implementation. The final stage, segmenting data from different users so as to reduce the volume of data being synchronized across databases, is necessary only where the frequently required user-related data exceeds 5 GB and can no longer be cached in memory. Once the system as a whole needs more than 100 GB of database data, it may also be necessary to segment the data over multiple databases in order to keep them manageable.

The series of posts will include:

  • Use of online storage (S3)
  • Relational Databases (RDS)
  • Authentication
  • Database Synchronization
  • Geo Location by DNS
  • Geo Location with Edge Servers
  • Data Segmentation with Edge Servers
  • Workflow management with SQS

There will also be some pre-requisites such as the installation and configuration of tool sets which also warrant separate posts.

Network services v Cloud Services

Having spent a good part of yesterday trying to get a file share working from a new server to a variety of Mac and PC machines around the home office, I can’t help reflecting on what a complete waste of time it was. The elusive goal was the same functionality that is offered by Dropbox.

Instead of using that, I had to format a drive, plugging it into different machines to do so, and deal with the fact that OS X file systems were not supported and that OS X 10.5 can read but not write NTFS. At the end of the process I was left with stubborn permission errors from the Mac which needed resolution through changes to the SMB server, which had a hard-wired smb.conf.

Another elusive goal was having media server services operate from the device, which is of course of no use for any of the equipment in a primarily Mac-based home. The services which failed to materialize, after about ten hours of research and a significant hard cost in equipment, were no more functional than Dropbox and YouTube. In fact the throughput of the local server, at 3 MB/second, was less than that of the FiOS line to online services.

Once again one is reminded why the shift to the cloud is happening. It saves time and money and actually works better than integrating solutions oneself.

The Chrome Browser as an OS

Having spent several days using Chrome 8.0 (Available for both Windows and Mac), it is clear how the new Chrome application store and Chrome apps could replace the overlapping-window environment that we have been playing with since the ’80s.

The applications appear as windows which are tabs within the browser. The technically sophisticated know that different technologies are being used compared with native applications, but for the end user it is the user experience that is so improved. They now have tabs: tabs which don’t obscure each other and which can be managed and grouped. If you want to see a natural extension of how this can work, use the new beta of Firefox 4.0, which allows you to group tabs into a desktop-like layout.

The user experience is so much improved that even users who are unaware of where their files are actually stored, and where they are cached, will prefer it and choose to shift towards this environment. The new CR-48, which demos the Chrome Operating System (OS), is just the first of many devices using Chrome OS, which assumes that the Chrome browser is the sole user interface and supports cloud-based tools on what is again a thin client with data cached from the cloud.

Maybe this time around the thin client will succeed, as the setup of the server end is a one-time process within the cloud. I think that in tablet-type environments this outcome will appear natural, and users will flock towards low-cost tablet solutions providing a Chrome/Chrome OS environment.

Creating a Bash Shell Progress Indicator

While writing a script which iterated over a matrix, taking over a minute for each ‘column’ of the matrix, I decided it would be much more meaningful to see the progress as a percentage. The following snippet of code does this for you.

The line:

echo -n "$((${NUMBER}*100/${MAXPROG})) %     "

calculates the percentage NUMBER/MAXPROG, which in this case was a number indicating progress across an ordered and evenly spaced set of numbers. Your application will of course vary, but NUMBER should be something that indicates how far along the process is, whereas MAXPROG is the value that represents the maximum amount of progress. How you calculate NUMBER will depend on your application; the accuracy and ease of calculating the progress is of course your real task.

The echo -n means that the percentage is printed without moving the cursor to the next line. The series of spaces after the percentage symbol, and the double quotes which cause those spaces to be printed, are there to cope with a situation where you are resetting the percentage or where NUMBER doesn’t follow a strict numerical sequence, i.e. having shown 15% progress you suddenly estimate that it was actually only 5%, perhaps because you increased MAXPROG.

In the script below I calculated MAXPROG as the number of items in the list. Your mechanism will vary. Using the value from the list as the number will only produce a true percentage when the list is a sequence of incrementing integers starting at 1, i.e. counting. You will have to calculate NUMBER in a way that is appropriate for your application, and operate the loop in whatever way makes sense for your situation.

The echo -n R | tr 'R' '\r' is really not elegant; it is just a portable way of printing a carriage return. The carriage return code was used on early electrical printers as an instruction to move the print head back to the left and start again at the first column. Here it moves the cursor back to the left and, again thanks to the -n flag, does not move the cursor to a new line. As a result, the next time around the loop the percentage is printed over the top of the previous one.

The final echo statement moves the cursor to a new line so that your prompt is not printed over the top of the final 100%.

#!/bin/bash

# Example list; in practice this is whatever you are iterating over
LIST="1 2 3 4 5"

# The maximum progress value: here, the number of items in the list
MAXPROG=$(echo ${LIST} | wc -w)

for NUMBER in ${LIST};
do
 # Print the percentage followed by padding spaces, without a newline
 echo -n "$((${NUMBER}*100/${MAXPROG})) %     "
 # Portable carriage return: move the cursor back to the start of the line
 echo -n R | tr 'R' '\r'
 sleep 2   # stand-in for the real work of each iteration
done

echo
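As an aside, the carriage return handling can be done a little more neatly with printf, which interprets \r itself and pads the number for you; this is just an alternative sketch of the same idea, not part of the original script.

#!/bin/bash

LIST="1 2 3 4 5"
MAXPROG=$(echo ${LIST} | wc -w)

for NUMBER in ${LIST}
do
  # %3d pads the percentage to three characters; \r returns to the start of the line
  printf '%3d %%\r' $((NUMBER*100/MAXPROG))
  sleep 2
done

echo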