Thursday, December 1, 2011

Google Verbatim Search

Recently, I've found that the quality of search results from Google and the lack of the '+' operator have driven me to try out all sorts of new ways to find technical info on the web. I've tried duckduckgo and Bing, but they are actually quite a bit worse. I've found that Google's verbatim search is quite handy but is a pain to get to. The alternative is to enclose all your important keywords in double-quotes.

Since I use Firefox the most, I decided to put in a custom search provider that performs a verbatim search on Google. After comparing the URLs for two searches, one with verbatim turned on and one without, I found that you need to append "tbs=li:1" to the querystring to enable verbatim search.
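For example, a verbatim query ends up looking something like this (the query itself is made up; the important part is the tbs parameter at the end):

http://www.google.com/search?q=%22content+query+web+part%22&tbs=li:1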

I fired up Notepad++ and edited the "google.xml" file in the "searchplugins" folder of the Firefox install directory (C:\Program Files (x86)\Mozilla Firefox\searchplugins on my comp). I made the following addition to the XML:
<Param name="aq" value="t"/>
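<!-- the new parameter: tbs=li:1 is what turns on verbatim search -->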
<Param name="tbs" value="li:1"/>
<!-- Dynamic parameters -->
<Param name="rls" value="{moz:distributionID}:{moz:locale}:{moz:official}"/>
<MozParam name="client" condition="defaultEngine" trueValue="firefox-a" falseValue="firefox"/>
I had to save the file as "google-verbatim.xml" and change the name of the plugin to avoid a conflict, because apparently saving over the existing file does not make it refresh in the browser, even after a restart. After restarting the browser, I was able to choose the new search provider using the dropdown in the search bar.

Now all the searches from the search bar are verbatim. It's been helping a lot!


Tuesday, August 2, 2011

Faking a file upload to Sharepoint

I am working on a project that requires me to do some heavy lifting with javascript in Sharepoint.  The project involves using a custom forms web part (which mimics the old Microsoft CMS forms functionality) to create a page where a user can browse a list of PDFs and opt to receive an email with a formatted set of selected PDF articles (links to them, anyway, with a title and description).

The problem with this web part is that you need to edit the page, paste in the XML, and then hit a button in the form to submit before you publish the page.  Hitting this button can't be simulated programmatically (I'm going to call it the Evil button from now on).  Further, sending out the custom email doesn't rely on adding an item to a list: you need to upload an XSLT file that parses the form submission.

Since this "application" is supposed to eventually be presented at trade-shows (as well as openly on the web), there is the distinct possibility that there might be some javascript injection.  The forms web part is pretty rigid in what it allows, but how would we transfer the PDF titles and abstracts over to the XSLT file?  It would be simple to do if we are editing the XML and XSLT manually each time as we could just code it right in, but for a better experience, I am opting to let the client use a cutom list to hold all the PDF information.  This is then pulled into the page via a content query web part (I am using this in place of SPServices now for certain situations: javascript being disabled and/or a need for caching).

The problem now is getting the PDF details into the XML and XSLT in such a way that we are not relying on them being transferred through the form itself, as the form can be manipulated via javascript to send out arbitrary links or text in emails (worse, coming from the trusted company domain!).

The best solution I came up with was to make a compromise when editing the form.  After changes are made to the custom list, someone has to edit the form page and hit some buttons to auto-generate the XML and XSL through javascript, and then hit the Evil button to save the form definition.  They then have to copy the XSLT to a local file and upload it to the right place.  The PDF title, abstract and link are all contained in the inaccessible XSLT file and only the ID is displayed on the form, which allows us to determine which PDF was chosen on submission - just what we want.

On digging a bit further, I found that you are able to fake a file upload to MOSS via the copy web service.  The web service has a loophole: while copying, you are able to change the metadata of what is being copied, which includes the contents of the file.  Using SPServices, I was able to inject the XSLT contents into a fake file copy (pretending to copy a file in from a different location), so that the destination file contained my XSLT afterwards.  The thing about the copy web service is that it completely ignores the initial file being copied if you specify the file contents yourself, as a Base-64 stream.

So converting the XSLT to Base64 and injecting it into the XSLT file allowed me to bypass the need for someone to manually download the XSLT and upload a file to MOSS, which could have opened the door to a great many things-that-could-and-probably-would-go-wrong.  The injection still retains my logged-in user information in the last update, so the XSLT file shows that I modified it and also indicates the time of the modification.  The copy also creates a new version of the doc when it finds a duplicate, so we are able to keep the filename and just copy over it repeatedly.
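To give an idea of what the injection looks like, here is a minimal sketch of the call, assuming the SPServices CopyIntoItems operation described above; the destination URL, the generateXslt helper, and the status element are illustrative, not the actual project code.

// Minimal sketch: "copy" the XSLT file onto itself, but supply the new
// contents as a Base-64 stream so the copy web service writes them instead.
var xsltMarkup = generateXslt();                      // hypothetical: builds the XSLT string from the custom list
var destinationUrl = "http://server/site/Forms/results.xslt";  // hypothetical target file in MOSS

$().SPServices({
  operation: "CopyIntoItems",
  SourceUrl: destinationUrl,                // the source is ignored once Stream is supplied
  DestinationUrls: [destinationUrl],
  Stream: window.btoa(xsltMarkup),          // Base-64 encode the contents (older IE needs a helper for this)
  async: true,
  completefunc: function (xData, Status) {
    $("#uploadStatus").text("XSLT uploaded: " + Status);   // update a status area without reloading the page
  }
});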

Now the whole process has been distilled to editing the page, pushing the Evil button (all the parsing is done on page load to read in the latest values from the custom list and generate the XML and XSLT), and then pushing the "upload XSLT" button.  The Evil button causes the page to reload, but the upload button doesn't - it just updates a status when done.  The user then publishes the page and the form is ready for the end-user to use.

This whole process took a week to engineer.  I have learned a few very useful things this time - the file copy web service is quite useful, forms architecture is much better handled as a custom list item addition triggering an email workflow, and almost anything is possible in middle-tier development!


Tuesday, June 28, 2011

What I've come to understand about NoSQL

I've been following the NoSQL movement for a while now, and have a few favorites that I want to succeed, but I am still waiting for all of them to mature more.  I want to provide an overview of my understanding of what the movement is about and how I approach using these databases, since they are not general-purpose.

The first thing I've understood is that although the name suggests "no" SQL or, as some may say, "not only" SQL, NoSQL is not really related to SQL at all, apart from both being able to hold the same data.  Whereas SQL is all about relationships and normalizing data, NoSQL basically goes in the opposite direction, supporting at times completely denormalized data.

In order to understand this about NoSQL, I had to first come to grips with why it was needed.  Until an organization or developer reaches a stage where they have to perform incredibly complex queries spanning many tables, with various unions, joins and optimizations, there really is no need to deviate from the excellent querying abilities of a relational database.  MySQL is massively scalable (as are most commercial databases).  However, there comes a time when you really need to start 'sharding' your data, and this starts off the process of denormalization for performance.

For instance, take the typical 'user' table.  In a normal site, you create the user table to hold all the user and login data, and then you join this to other tables to get at user-specific information.  Once you cross a certain massive number of users, this model creates bottlenecks with querying, with the same tables being hit many more times than necessary with expensive join operations.  You also realize that it doesn't make sense to keep the login info stored alongside the rest of the user data, since that is contributing to the problem.  You can deal with this either by sharding the rows of the user table (keeping chunks of rows separate from each other so that queries for specific groups of users go to different physical locations), or by creating separate tables for each user and letting them join to other tables specific to them.

What you are starting to see are the seams in the process.  The relational database was not meant to function like this.  The real problem comes from synchronizing the information across your server farm(s).  The user data needs to have fall-backs and backup machines, but the data in them needs to be consistent.  Sharding your data helps with balancing the load on your database server, but adds overhead in lookup times.  Also, your sharded data needs to be spread out well, and therefore needs consistency checks.

This is the whole problem with big data - beyond a threshold, the model is not able to maintain reliability, consistency and availability.  One or more of those needs to be sacrificed, and with relational models, that is availability.  If your queries take too long, though, you are likely to lose your users completely.  How do you go about providing a good balance of all three when you have a massive number of users?  Note that we are only talking about users because that is the example we started with.  It can be anything: sales orders, logs, interaction data - basically anything whose data expands dramatically as your user base grows larger.

I'm going to discuss two of my current interests with regards to this problem of big data.  The first one is a graph database (neo4j).  In order to present my best use-case for a graph database, I'm going to discuss my pet project that I've been working on for 2 years now, on and off.  It is still nowhere near a state where it can perform anything, but I am slowly realizing what parts I need to make it work.  The problem has always been with data structures.

The problem at its core is organization of information.  At the very root of this is the concept of a personal database.  Most people would never think that they need one, and they definitely would not want to bother with programming one in order to use it.

Let's take a small example: let's say that we want the user to be able to store their workout information.  There are a lot of gym apps out there, so we are safe in discussing this one.  When deciding on what to store, you would consider things like sets, reps, weight, time, etc. but then you start to wonder if you should maybe consider muscle groups, and meta-info on each exercise (maybe a short video showing someone performing the exercise with correct technique).  In order to capture all this information, you decide to use a relational database.  You create tables for 'exercise', 'muscle', 'user', 'workout_template', 'actual_workout', etc.  You then create all the associative tables to properly normalize all the information.

Let us try to query the information I need to determine whether I satisfied my workout requirements today (right after I finish a workout).  I would need to touch every single one of these tables with the exception of the muscle-related tables.  Worse, I might need to run union queries to account for all the different templates and set/rep combinations.  It all depends on how detailed you want your results to be.  Let us assume here that, in my usual style, I want the kitchen-sink approach.

The relational database would work here, but it is not ideal.  This seems to be the trend with the whole NoSQL movement: a relational database would fit, but it is not ideal.  In this example, there are too many relationships among the data.  It would be much better to have all of this data denormalized to help the queries rather than the inserts/updates.  I am of course looking at the problem from the point of view of someone who has been using the app for a long time and has a lot of data to comb through (let us say there is about 2 years' worth of workout data, and the user worked out almost every week, five days a week with five different workouts, and had some seasonal phases of going heavy/cutting).  Querying this beast would be very difficult with SQL.  What if I added notes to every workout or gauged the difficulty of each one?  The problem lies with the fact that we are segregating the information by type and not by intent.  Sets are sets and indexed as sets; reps likewise.  When you query by intent, but store by type, you are not taking advantage of the platform you are using.

Let us consider the graph approach.  The graph database assumes that everything is connected - you just have to fill in the specifics.  It would have no problem with querying something this complex, since it stores data by intent.  Data is just stored, but the relationships are indexed, and this makes combing through them for intent very fast.  I should elaborate a bit on querying by 'type' and by 'intent'.  In the former, we are mainly querying information as it is stored, i.e. in the relational database, querying all the 'set' information to see how many sets I did per week.  By 'intent'-based querying, I mean querying across multiple tables to derive information that is not inherent in the way it is stored.  For example, I might want to know how balanced my workout is for my body and watch it over time.  Perhaps I discovered last year that I was working out my chest a lot more than my back, decided to change that, and now want to compare how I am doing vs. how I was doing last year.  This would be very difficult to do efficiently with the relational model, but might be easier on the graph database, if your front-end was able to handle the data properly (in the case of neo4j, the java/android program).

There is also a case for CouchDB here, as all the workout data can be stored as one unit of JSON.  Since I don't expect to go editing my workouts all the time (unless I am cheating), the append-only structure makes using Couch more advantageous due to low lookup times.  I can also query the changes feed for timeline-based queries.  This approach cuts down on the amount of implicit relationships I have in the data.  I probably really only need a few "large tables" worth of different types of JSON.  For example, I might have a "workout_template" and "workout_actual" to store all my workout information.  Any analysis would then make use of the front-end to handle some of the leg work.  In the case of the query for my muscle-group balancing, it would be trivial to write a map-reduce query (three, actually: one synchronous and the others asynchronous) and then manipulate the results in javascript (or just display them in a tabular format).
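To make that concrete, here is a rough sketch of the kind of map-reduce view I have in mind; the document fields (type, date, exercises, muscleGroup, sets) are made-up names for illustration, not a finished schema.

// Map: emit one row per exercise, keyed by muscle group and month (assuming ISO dates)
function (doc) {
  if (doc.type === "workout_actual") {
    doc.exercises.forEach(function (exercise) {
      emit([exercise.muscleGroup, doc.date.substr(0, 7)], exercise.sets.length);
    });
  }
}
// Reduce: the built-in "_sum" then gives total sets per muscle group per month,
// which is enough to chart chest vs. back balance over time in javascript.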

Most of all, I feel that NoSQL solutions are completely separate from relational ones.  However, mixing the two might or might not make sense for your data.  The overhead of connecting to different datasources might be a lot less than the overhead of stuffing all of your data into one kind of data store and trying to query it in ways that are inefficient.


Thursday, March 17, 2011

Change of direction!

I've basically been using this blog as some kind of appeal to potential employers, talking about what I think makes me a good resource to work with, but I think the space is better used to keep track of actual code that I write and problems that I solve.  You should probably not expect me to solve P vs. NP or anything like that (whatever happened to Vinay Deolalikar's proof, anyway?).

There are actually a few more reasons for this, a major one being that I will be migrating this blog itself to a custom one that I want to build using either couchbase or neo4j (haven't decided as yet).  I am leaning towards neo4j to try and develop some tag-based traversal/retrieval.  Another reason is that I often find myself wanting to go back and look up how I solved a particular problem, but not being able to remember it.  It would help to be able to track these items for posterity.  It might even help out others who are searching for a solution.

Today's very small problem is one relating to Sharepoint SOAP calls to a list that you want sorted by a custom date field (mine was called "Visible Date", which is internally represented as "Visible_x0020_Date"; a space is converted to "_x0020_" in Sharepoint).  Sorting in descending order requires the CAML query to be as follows:
<Query>
    <OrderBy>
        <FieldRef Name=\"Visible_x0020_Date\" Ascending=\"False\" Type=\"DateTime\" />
    </OrderBy>
</Query>

You need to specify the whole thing in the Query tag.  You also need to capitalize "Name", strangely enough.  Lesson of the day: watch out for capitalization in SOAP calls!
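For context, here is roughly how that query gets passed along in javascript, assuming an SPServices-style GetListItems call; the list name and row limit are made up for the example.

// Sketch: ask the Lists web service for items sorted by the custom date field
$().SPServices({
  operation: "GetListItems",
  listName: "News",                                   // hypothetical list
  CAMLQuery: "<Query><OrderBy><FieldRef Name=\"Visible_x0020_Date\" Ascending=\"False\" Type=\"DateTime\" /></OrderBy></Query>",
  CAMLRowLimit: 10,
  async: true,
  completefunc: function (xData, Status) {
    // each returned row comes back as a z:row element; ows_Title holds the item title
    $(xData.responseXML).find("z\\:row, row").each(function () {
      console.log($(this).attr("ows_Title"));
    });
  }
});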


Friday, February 4, 2011

Recap

It's a bit late to recap the year of 2010, but since I mentally make milestones from birthday-to-birthday instead of arbitrary Dec 31-to-Dec 31, I suppose I am actually in the middle of the year I am going to recap.  The next one will in fact be in May 2012 :)

So, what did I do in 2010?  I studied a handful of books:
  • XMPP: The definitive guide
  • C# in Depth
  • F# For Scientists
  • Expert F#
  • Beginning Blackberry Development
I experimented with these in various ways.  I made a small Blackberry app, which I had grand visions for, but I abandoned it when I realized that RIM was going to dump the O/S in favor of a new one.  At the time, I also wanted to get into Android development, which meant that although I was still going to work in Java, I would be working on a better platform (one that can actually use generics, among other things).

I worked through quite a few Project Euler problems, and then grew a bit weary of F# in general.  I just didn't have any interesting projects that would make use of it.  I would still be very open to anything that came up, but overall, I think that I won't be pursuing functional programming right now.  I suppose this will change next year, after I have experimented with other things that I will write about towards the end of this post.

C# in Depth was quite a good book to read.  I was often confused about the different language versions, and the book not only points out what changed, it actually takes you through different techniques and examples, showing you what they looked like in each version.  Version 4 came out after I read this, and it is actually quite powerful.  I suppose you don't really need F# if you have C#, but there are some really good things about F# that make programming in it much more pleasant: computation expressions, active patterns, even the syntax itself looks very pleasant (if you don't mind whitespace instead of braces).

XMPP was the most interesting thing I delved into last year.  I had dabbled in some Jabber implementation very early in my career, but didn't pursue it at all.  When I read through the book and went through the docs online, I saw that it was actually being used everywhere I looked.  Later in the year, I found out even Facebook chat was XMPP-based (as are Google talk and Wave [RIP]).  I was quite excited by it and ran an ejabberd server for a bit while I tried to figure out some server components to add in.  It was difficult to start programming for XMPP though, with a lot of big players in the market.  Since I was trying to use F# in those days, I wanted to use C# libraries, which are all closed-source and needed licensing.

Setting up XMPP is not easy.  The servers are fine, but adding in TLS and trying to program against it seems to be very difficult to get into.  I was able to get TLS working on ejabberd, and was able to chat using Trillian with Google talk accounts.  However, I didn't enjoy wrestling with all the different libraries to try and extend either servers or clients.  I later discovered that there are better implementations, but they haven't been around as long, so I decided to wait it out.  I am talking about telehash (which is gaining a lot of traction these days), and also STOMP.  Anyway, I am now looking into node.js for similar functionality.

With respect to work, I mainly spent the time working with Sharepoint, and have reached a stage where I think I am content with moving on.  There isn't much left to work on, and I think I will not move on to Sharepoint 2010 as I had planned.  I successfully managed to implement a "minimal master page" and an HTML5 boilerplate-style reset, which is actually a lot harder than it would seem.  I had to get rid of Windows on my server though, to start experimenting with node.js.  The reset was actually quite stimulating, and if Silverlight wasn't all over Sharepoint 2010, I think Microsoft would have had a lot more success with it.  If I had continued with my experimentation, I would have sought to completely replace Silverlight with HTML and javascript, making it a completely HTML5 (now just HTML) solution with a minimal master page (FTW).

I was approached to work on a Drupal site for a large company here, but after a bit of soul searching, I've come to the conclusion that I am not content with CMS development.  I've done this kind of thing for 5 years now, and I am discovering a lot of new stuff that excites me more.  I'm going to let the CMS work plateau for a bit as I explore the following technologies:
  • Neo4j - a graph database, which allows you to query large amounts of relationship data.  You can think of it a bit like LinkedIn, which is their showcase example.  Think of the query that lets you display the connections within 5 hops of you in some kind of field like "programmer".  Doing this in a relational database wouldn't perform very well with a large data set.  I am quite excited to be working on a test neo4j database, trying to work out how to solve these kinds of problems, because they are actually quite commonplace.
  • CouchDB - this is a document-based noSQL solution, which is great for storing and retrieving data that doesn't have a set form.  What I mean by this is that whereas you would hard-wire relationships between entities in a relational database, in CouchDB you generally "shard" them into "databases" and then code the relationships in your application itself.  This way, you are able to store a piece of information, along with all of its related info, in one place, and slice it up as you want using code (in this case, map-reduce in javascript).  The data is all stored as JSON, and you can attach arbitrary binary data as well (though retrieving these attachments is slow).  I see it as being great for storing things like blog posts, comments, etc., but it has quite a few applications.  I will be using it a fair bit this year, I think.
  • Node.js - This basically allows you to handle a lot of simultaneous client connections on the server, where you run javascript.  It can accomplish most things that XMPP would be able to.  Chat, games, presence info, etc. can all be programmed in, and it can connect to CouchDB for added javascript goodness.
  • GWT - I am quite happy with how GWT is performing.  I have a unique need for it, as I am working towards an app for educational institutions, which depends on Google Apps for Education, the Google APIs, and Neo4j, and needs a web front-end.  GWT fit the bill nicely.
  • HTML(5), javascript, CSS3 - The cornerstones of what I do will not be neglected this year.  I think out of the three, javascript is going to be in everything I work on, but I will still keep my eye out as IE9 pops out and the browser wars continue.  Let's hope that IE6 dies a quick death this year.  I am also going to be taking a closer look at user interfaces this year.  A lot of the problems I'm working on actually translate into how well you can get the user to enter what they want.  The interface solves a lot of challenges that you can't possibly account for in the back-end: instead of second-guessing, make it so the user can tell you what they want, using a toolset that is as broad as it is abstract, without alienating the user, and with the shallowest learning curve possible.  It's actually a huge problem, but I'm hoping to come up with some solutions that can be used as a base in future work.
I suppose that covers the gist of what I am interested in these days.  I am always on the lookout for interesting projects, but I think I have my hands full with my own ones right now.  The best part of that is that the things I am working on are translatable to things others are also looking for, so I'm hoping this pays off down the road.


Friday, January 7, 2011

Progression

I've been struggling with one aspect of Sharepoint for over 4 years now, and have implemented many different solutions to deal with it, and over time I have come to understand the problem at a fundamental level.  The problem is quite easily defined, and one would be forgiven for thinking that a solution would be equally simple: how do you present a custom-styled view of dynamic data in a Sharepoint page?

The problem of course takes on certain attributes as you go deeper, and there are a few caveats.  For one, let us assume that we cannot simply create a new web part, nor can we extend an existing web part.  We are only able to use the Sharepoint front-end (which is not uncommon).  Also, we need to be able to use a WYSIWYG tool to put in the actual content.  So if we are displaying a stylized news feed of the top X stories, the news items are added into the news list with a WYSIWYG tool.  It's not so important if you think of the presentation as a table-view of titles and dates, but it becomes critical when you think of a carousel-view, with rich imagery and stylized text for each item (with a link to read the full article).

When I first started out, I did my research and correctly arrived at the conclusion that the content query web part was what I was looking for.  However, I came to realize that this is a very difficult web part to figure out.  For one thing, it is not clear how you go about customizing the XSL, which I later found was coming in from three different files.  When I realized that I needed to edit the ItemStyle.xsl file, which is shared across the whole site collection, I gave up and decided that this approach was not the right one.  It was not clear to me then that you could point the web part at your own versions of these files (and at the time I would not have had the XSL knowledge required to do it anyway).

I then explored the possibility of using the XML reader web part, and this remained my method of choice for a couple of projects.  I was able to retrieve the RSS feed for the lists inside of this web part and parse the XML to render the items that I wanted.

The problem with using the RSS feed is that it is, well, an RSS feed.  For some inexplicable reason, the RSS feeds that MOSS generates don't actually expose the list columns as XML at all.  Below is a snippet of one such feed, which is pulling items from a list containing some custom fields:

Original list - Note the column names

RSS - Try to find the column names in the "XML"
What you can see above is that MOSS doesn't actually present the column data as XML - it creates a "description" tag and then just dumps all the information into that tag.  What's worse is that there is no reliable way to differentiate the column headings from the content at all.

There is an additional problem with using the RSS method - there is no authentication.  MOSS requires that the RSS have anonymous access, which means that if you are just using it to shuttle dynamic data between pages on the site, your content is visible to anyone with the URL.  In most cases this wouldn't be a problem, but it is definitely a security hole in terms of content.

It became apparent that another solution was needed.  At this point, I revisited the content query web part, still convinced that it was the best solution.  Eventually, I came across a few very illuminating articles which outlined exactly what I was trying to accomplish.
I had a huge epiphany when I realized that it was possible to redirect the web part to custom versions of the 3 XSL files that it uses.  However, this is not actually accessible through the MOSS interface.  You have to export the content query web part to your computer and open it in Notepad++ or an IDE.  After editing the web part, you have to upload it back to the page using the MOSS interface.  There are fields in the XML allowing you to specify alternate locations for the 3 files (MainXslLink, ItemXslLink, HeaderXslLink).  HeaderXslLink is not that important; the other two, however, are vital.

The problem with the web part, in general terms, is that it is similar to the RSS feed: the data is not presented as structured XML; there is just a dump of all the content fields into a "description" tag.  It's inexplicable as a design choice.  But there are ways around it, as I found.  Firstly, I had to remove the table layout completely.  To do this, one has to manually edit the XSL in the file that MainXslLink points to and alter the grouping mechanism.  Basically, strip out everything to do with creating table, tr, and td tags, which is actually quite involved (most of it is in the OuterTemplate.Body template).  Then, you have to alter the OuterTemplate template to specify your outer div and optional ul.

The web part file that you edit also has other fields, which allow you to select which fields from the list you want displayed (CommonViewFields) and to rename them (DataColumnRenames).  CommonViewFields is tricky to use - you need the exact column name, but you have to remember that spaces are converted to "_x0020_", and other special characters are converted similarly.  You also need to know what the data type is, although you are able to specify Text and get back a textual representation for some fields.  There is a convenient list here.

This brings us to ItemStyle.xsl, which is the real beast.  It controls what happens for each record that comes up in the list.  If you chose to wrap everything in a ul above, you can generate an li here for each item, or you can use a div instead.

I found out the hard way that you have to remember to escape all sensitive characters like &.  Your XSL also has to be perfect, or the web part will simply die and not tell you why - there is no way to get at the parser's errors.  To get around this somewhat, I used Visual Studio, which is good for debugging XML and XSL.

After all this, I was able to pull out the data as HTML that could be styled through CSS.  Another benefit was that this web part can be cached, saving us some time on page load.  At this point, I channeled my inner GW Bush: Mission Accomplished!

I was able to use this approach for a while for projects, modifying it as I went to learn how to do things like parsing dates into custom formats, varying actions depending on the value of the field, etc.  My XSLT came a long way over the course of this period.

Side note: I use XSL and XSLT almost interchangeably, as most people do, since there is a lot of ambiguity on the terminology.  Please disregard this if you have a different preference.  I use it mainly to refer to the (platform-independent) technique of transforming XML to another format (in this case, HTML).

Then one day I received an oddball request to combine the content from two separate lists into one styled content area.  For this project, we had one main site containing three lists, which stored HTML content that was shared amongst the 50 or so sub-sites.  The content was broken up into sections, which defined how it appeared on the page.  Each of the 50 or so sub-sites had their own versions of these lists, where they were able to append to the various sections of the page content.  The global content always appeared first, and the global and local content shared the same section headings.

I was using two content query web parts to pull in the two pieces of content (there are no filter web parts available, so I had to generate all these manually and upload to the page).  However, how would one combine the two lists' content into one contiguous piece of HTML?  I had to resort to javascript.  We needed to use javascript anyway, to allow show-hide type of functionality.

So I used jquery and javascript to read in the content and store it in a huge array keyed by a hash of each section heading.  I needed a separate array to store the positioning information of each section, but it finally worked.
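A rough sketch of the idea is below; the selectors and the crude heading "hash" are invented for illustration, since the real markup came out of the two content query web parts.

// Collect the rendered sections from both web parts, keyed by section heading
var sections = {};       // heading key -> array of HTML chunks (global content first, local after)
var sectionOrder = [];   // remembers the order in which headings first appeared

$(".cqwp-output .content-section").each(function () {    // hypothetical classes
  var heading = $.trim($(this).find("h2").first().text());
  var key = heading.toLowerCase();                        // crude stand-in for the hashed heading
  if (!sections[key]) {
    sections[key] = [];
    sectionOrder.push(key);
  }
  sections[key].push($(this).html());
});

// Later, sectionOrder and sections are used to write out one contiguous
// block of HTML per heading, with the show-hide behaviour attached.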

Here is the dirty secret: up until this time, I had not actually made full use of the custom XSL files.  I had only succeeded in removing the table layout, but I had not thought about how to make everything easier to parse via javascript (there had just not been any need).  As a result, on this project I was using some really heavy javascript to parse all this, and it meant that the HTML had to come in perfectly (which is difficult to ask for when the content is maintained by the client via an unreliable WYSIWYG tool).  During the course of this project, I realized that I could use CommonViewFields in my rewired web part to put in attributes and tags that let me manipulate the fields via javascript directly.

I was now able to use the content query web part to its full potential by presenting the HTML in a manner that would let me manipulate the content via javascript as I saw fit.  However, I was still waiting for the content to fully render as HTML and for the document to be in a ready state before turning the rendered HTML into the markup I actually wanted, which involved a lot of DOM manipulation.

There is also one fundamental flaw with the web part.  It is, for some reason, unable to pull in values from a "Lookup" field if you have set it to allow multiple values.  This is important because you are able to set up one list for managing, say, "Categories" for a blog, and then use a field of type "Lookup" in another list (or blog) to specify as many categories as you want.  The content query web part will not retrieve any of them, which I found out after a lot of searching.  It is a problem with the web part itself and it has still not been fixed.

It is a big obstacle when you are trying to allow the client to edit their site as they want, because in order to maintain the categories, they need to go into the list settings and mess around in there (which always leads to bad things).  The downside of replacing this lookup with a "Choice" type field and putting in the choices manually is that the choices are not linked via ID, so if you change or delete a choice, it will not automatically change in the existing list items.  If you changed a linked list's item, though, the change would propagate if you were using a "Lookup" type field.

This is when I stumbled on a technique I had previously ignored as being inadequate.  A mad genius of a programmer who had come before me onto a project had used SOAP to query a custom list directly and retrieve its data.  At the time, I had failed to understand how it was possible, and I found SOAP in general to be too cumbersome to use, especially when its 'interface' was being implemented from scratch in javascript.  I eventually decided to re-visit the script, stumbled on the MSDN articles outlining how to query lists directly, and it opened my eyes.

After trying unsuccessfully to come up with a nice way to encapsulate all this, I stumbled onto a script by another genius.  This is what I am using now.  All the mucking around with XSL has been thrown out of the window, as I can now just use javascript (asynchronously, at that) to retrieve the dynamic content whilst serving up the other static HTML content.  It really is something to behold after struggling for 3+ years in search of a solution.  I only use it to query custom lists for items, but the script is actually able to do pretty much anything you could do on a page, including pushing items into lists.  You don't really need to ask the user to submit a form in the traditional sense at all: you can just use javascript to send their form info to the correct list, and save yourself the trouble of manipulating MOSS so that it doesn't show the user the full list-view, or of having to manipulate the form-view pages at all.
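As a sketch of what pushing an item looks like, assuming an SPServices-style UpdateListItems call (the list and field names here are invented for the example):

// Sketch: add a "form submission" straight into a custom list from javascript
$().SPServices({
  operation: "UpdateListItems",
  listName: "Enquiries",                               // hypothetical list
  batchCmd: "New",
  valuepairs: [
    ["Title", $("#nameField").val()],                  // hypothetical form fields
    ["Email", $("#emailField").val()],
    ["Comments", $("#commentsField").val()]
  ],
  async: true,
  completefunc: function (xData, Status) {
    $("#formStatus").text("Thanks - your details were submitted.");
  }
});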

Thus ends a 4 year struggle to find a way to present dynamic data to the user from within a MOSS page.  Today, I discovered that I can use this script in a pure HTML page that I manually upload to any "Pages" folder in MOSS to present content entered through a WYSIWYG editor in a custom list, bypassing MOSS-generated master pages and other markup completely.  With that, I think that I have explored this problem to its fullest, and have just discovered how far I am willing to go to solve something that I think should be solvable.


Monday, December 13, 2010

Tools of the trade

This is a continuation of the series on what happens behind the scenes during the development process when I work on a site or project. Today, I will outline some productivity shortcuts that I use constantly, which make an hour of my time these days worth a few hours of my time when I first started out. I'm going to start with smaller things and work up to the major ones.

  • First up is a small program called SlickRun. It is a launcher for Windows. It's highly customizable and very handy - it also comes with something called QuickJot, a tiny notepad that lets you put in arbitrary text and stores it for you (it is persistent, in that it will re-appear even if your comp crashes). You call up SlickRun with Win+Q and QuickJot with Win+J. These two shortcuts are now almost like reflex reactions.
    Why are they important? Well, pressing Win+Q then "ps" pops up Photoshop, "wiki" and then a query opens up a new tab with search results in Wikipedia, "ie" pops up IE; the list goes on. It's very handy.
    Win+J is also a time saver. It lets me hold something in memory while copy-pasting, like license keys, unformatted HTML, etc. and is also my place to store ideas that I have that I can think about later.
  • In the same vein as the above is a small utility called AutoHotKey. It allows you to program "macros" for Windows. It's hard to describe, but you can make it do just about anything. I've currently got it only doing basic text-replaces for things I type so often that I would get RSI typing them out each time (URLs, my email addresses, the basic HTML template page markup, etc.). It's very powerful though, and I could technically replace SlickRun with it if it weren't for QuickJot.
  • In order to keep track of projects and to estimate my time, I use a program called ToDoList. It is very easy to use, and in all my searching, I haven't found a better tool.  I am consistently getting very good results for my estimations these days, as I can look back on previous projects and use the estimated/actual times from them to help me.
  • Coming to actual coding, I've got my IDE of choice - Notepad++. It has some excellent features and plugins by default, which save me a lot of time: the ZenCoding plugin, escaping " to \" and back, upper/lower case conversion, HTML Tidy, block and line duplication, multi-caret editing (which allows you to type the same thing in multiple places in the file simultaneously), search-replace in multiple files, conversion of tabs to spaces and back, splitting lines by a clipboard character, and so many other little things that can all be assigned keyboard shortcuts. Some very common scenarios are below:
    • Inline CSS to block

      Say I had this CSS:

      #elem { background:transparent url(/someurl) no-repeat 0 0; color:#FFF; text-decoration:none; display:block; padding:3px 0 5px; margin:0 5px 0; }

      Now I want to re-write it in block format. All I have to do is copy ";" to the clipboard, highlight the line, hit my shortcut (Ctrl+Alt+1) and it splits it all into separate lines that I can then re-indent as a block with the tab key.  You might think to do this with a search-and-replace, but this is a lot quicker.  For search-and-replace scenarios, I use The Regex Coach, which I'll discuss later on.
    • ZenCoding
      This is possibly the best plug-in for an IDE that was ever made.  It is completely unbelievable that you can type the following:
      table#someID>(thead>tr>th*4)+(tbody>tr*2>td*4)
      and get back this:
      <table id="someID">
          <thead>
              <tr>
                  <th></th>
                  <th></th>
                  <th></th>
                  <th></th>
              </tr>
          </thead>
          <tbody>
              <tr>
                  <td></td>
                  <td></td>
                  <td></td>
                  <td></td>
              </tr>
              <tr>
                  <td></td>
                  <td></td>
                  <td></td>
                  <td></td>
              </tr>
          </tbody>
      </table>  
    • HTML Tidy

      This plugin allows you to take some formatted text, say from a Word document, and convert it to HTML using configurable settings, saving you hours of typing and formatting. Of course, you need to do some work configuring it to be useful (it's not by default), but once you have it down, you can enjoy hours of saved effort.
  • The Regex Coach is a program that I use almost constantly. You can say that I might be a little like the character in XKCD who proclaims, "Stand back! I know Regular Expressions!"

    I don't think I can name a single project where I have been given copy to put on a page and did not use this program to reformat the content into HTML (usually to shove into HTML Tidy).  It's blazing fast too.  For instance, take the example of my previous blog post, where I needed to list out the RSS feeds that I use.  The program only exported the feeds as "OPML", which was not useful to me.  I used the Regex Coach to pull out the important bits (URL and feed name) and used the replace function to generate some nice HTML anchor links, as shown in the image below (the four text fields are my search expression, the original string that I needed to format, my replacement expression, and finally the HTML markup I could use in my blog post).  A javascript sketch of that kind of replacement appears after this list.


    There is the added advantage that using this program has taught me a lot about regular expressions in general, and I am now confident about a lot of concepts like look-aheads, greedy * and +, grouping, escaping, and which abilities are available in the languages I use. I also know a lot more about the inner workings of the engines, and can plan for optimization. I would not say I am an advanced user of regular expressions, but I am a lot further along than when I first started out.
  • Firebug/IE Developer toolbar/ Webkit Inspector
    These tools are miles apart in terms of their capabilities, but together, they let you debug just about any problem you have with front-end development.  My general rule-of-thumb is: use Firebug first; if there is a problem with it or to debug $(document).ready, use IE.  Use Webkit inspector last.

    I don't particularly like the Webkit inspector as an alternative to Firebug.  I find it doesn't scroll the correct element into view, and you can't edit the properties like you can in Firebug.  In fact, with Firebug, you can edit all the HTML/CSS on the page and see your changes live, while using the console to make changes to the scripts or to do further modifications via jquery.

    What Firebug lacks, the Web Developer toolbar add-on can fill in.  You can peek at all the javascript on a page, all the CSS, manipulate forms and elements, and outline specific groups of elements in a page (broken images or table cells, for instance).

    I find myself using IE's dev toolbar a lot to debug javascript, as it throws a bigger tantrum for errors.  Some errors just don't show up in Firebug, especially when they are in the $(document).ready function.  Of course, as with all other things in javascript, the error messages themselves are completely useless, but at least you know that an error occurred!
  • Javascript beautifier / Closure compiler

    These two tools go hand-in-hand.  The Closure Compiler minifies javascript for you, and the js beautifier lets you read minified javascript and trace out what others have done.  This is useful in certain instances, like when using third-party tools in inherited projects, where you don't really have access to the code but need to debug a problem.  It has helped me before to look at problems with Sharepoint's weird functions for dynamic menus (especially the nefarious left-hand nav), Omniture code (don't get me started), and to catch a glimpse into how certain sites accomplished some javascript that I thought was interesting (20 things comes to mind).
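Here is the javascript sketch of the OPML-to-anchor replacement promised above; the feed title, URL, and attribute order are made up, and in reality the work was done interactively in the Regex Coach rather than in code.

// One line of a hypothetical OPML export
var opmlLine = '<outline title="Some Feed" xmlUrl="http://example.com/rss.xml" />';

// Capture the title and feed URL, then rewrite the line as an anchor tag
var anchor = opmlLine.replace(
  /<outline[^>]*title="([^"]*)"[^>]*xmlUrl="([^"]*)"[^>]*\/>/,
  '<a href="$2">$1</a>'
);
// anchor is now: <a href="http://example.com/rss.xml">Some Feed</a>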
The combination of these and other tools lets me confidently declare that the present me is orders of magnitude more efficient than the me from 5 years ago who was just starting out professionally coding websites. This doesn't even take into account the efficiencies in actual code - just the tools that I have learned to use.
