Made of Everything You're Not

Because there's too much info for my brain.

Posts Tagged ‘program design’

The Lesson of Adobe Reader

Posted in Programming, Rant on February 25th, 2011 by Eric Lamb – Be the first to comment

One of my favorite parts about programming is the design part of a project. No, not the pretty, sparkly kind of design; I know I suck ass at that and it’s just a bad idea for me to even attempt. Instead, I like the part where the program itself gets designed; the part where the order of things gets worked out and you, the programmer, get to be creative and connect A to B to C on all the parts that were left out of the scope. Sure, you know you need to add, for example, a form to a webpage, but unless someone told you how it should be written it’s entirely up to you how to do it; that’s the good stuff.

There are all sorts of places people go to sharpen their program design skills, which, let’s not kid ourselves here, is a way subjective but difficult specialty. Personal favorites of mine are sites like Gamasutra, which has a great postmortem section, and The Daily WTF; books about projects, like Dreaming in Code and Showstopper!; and, to look at how other programs function and what other people like or dislike about them, Reddit (of all places).

One of the more insightful posts I’ve seen on Reddit was about Adobe Reader. The question of why Adobe Reader is constantly updated gets asked there a lot but, what with the hive mind and all coming to the proper conclusions, it’s almost always answered with a link to the “correct” answer:

Ok.. here is a comment from somebory who knows his shit:

the adobe reader you have isnt a simple PDF reader.

TL;DR: Adobe Reader is a huge system and reading PDFs is one of its many functions. If all you care is reading PFDs only then you should ditch it and get Sumatra or Foxit.

Long version:

lets follow the rabbit..

There is a reason its not called “Adobe PDF reader” but “Acrobat reader” or “Adobe reader”. It is a monster of a system.
reading PDFs is one of many functions.
For a project i had to read into adobe acrobat and heck its a real monster: it has

a complete mail server, document lifecycle [sic] management system, DRM client, full fledged document tracking system, form capabilities, statistics for your docs (imagine sending a survey and tracking the collected data), video AND audio playing capabilities (yes you can embed audio and video in pdf) as well as capabilites [sic] for other formats (such as displaying CAD(!) data in its own 3Dviewer).

all in all the full acrobat SDK is like 500 MB and its manual a couple tousand [sic] pages long.

merely displaying PDFs is one function out of like 100.
To you as the consumer its the bait… but the full fledged system behind it is what Adobe sells to its corporate consumers.

they basically say: “You want a full fledged content tracking system? we got it… and the best part is all your customers have the clients already installed! in form of the acrobat reader”.

Its like a monster sleeping in every computer.

see this link. Its the function comparisson [sic] of the acrobat family..

and here comes the scoop: all functions you see are supported by acrobat reader… but you cant use them. They are there so you can provide them to the guys who paid for “pro extended”.

Basically the pro extended package can create all that shit and all drones using acrobat reader will support the functionality. wheter [sic] they want it or not.

And here is the screamer: being a normal guy you will most likely never need all that crap. You know what does it mean when i say ” document tracking system”? its just a fancy word for the dream of every adverstiser [sic]: Corporate customers can track how succesful [sic] their newsletter, advertising and customer Polls are.

Yup.. they can track how efficient their spam is. And all you sheeples who over the years keep complaining “omg i just want to read pdfs why is the install file soo big” never cared to actually read what is included.

My advice: if all you care for is reading PDFs (and im sure 99% of Acrobat reader users are in this group) install Foxit or Sumatra.

That’s just poor design right there; forcing a large percentage of your users to suffer a poor experience for the benefit of a smaller portion of users is just flat out dumb and one of those decisions I’m fairly certain couldn’t be made by a team that’s dogfooding their project. It’s just basic math; make life easier on the majority.


Portability Is A Good Goal

Posted in Brain Dump, Code, Programming on July 27th, 2010 by Eric Lamb – 2 Comments

For most web developers that I’ve met and worked with, at least, the concept of “hard coding” variable values, especially environment variables, is a definite “I will kill you and your first born if you do this” offense. Through a combination of painful moments, especially in the push to live phase, we all learned just how fucked up hard coding could make a day.

I’m telling you, it’s a special kind of pain when you’re frantically trying to fix a site you broke through poor planning and execution.

So, we do the most logical thing and abstract out all the system variables into a single point; either a config file or a database usually. For some reason we then go about our task feeling proud that we’ve stopped hard coding, oblivious to the fact that all we’ve done is just minimized the amount of hard coding. And that’s not enough.

According to Wikipedia, hard coding

…refers to the software development practice of embedding input or configuration data directly into the source code of a program or other executable object, or fixed formatting of the data, instead of obtaining that data from external sources or generating data or formatting in the program itself with the given input.

Now, while that explanation is appropriate for good old fashioned native development, to be sure, I don’t think it’s applicable to web development because for most sites the database is as much a part of the application as the code is (especially when doing maintenance work). By which I mean that, in my experience, storing environment values inside a database isn’t a good idea unless there’s no other way (sometimes a project requires the rules be broken).

Anywho, for most of us who don’t have the natural, innate, knowledge, learning not to hard code was a tough lesson because when we first started developing web sites it was natural to connect the idea of the web site with the code and server it was running on. Hell, I personally remember being shocked to find out it was actually bad to develop a site on the live/production server; just didn’t make sense at the time (stupid, I know). In hindsight it was an obviously silly and short sighted mindset to adopt but changing that was probably the most important choice I’ve made to improve the quality of my projects.

I was reminded of this with painful clarity when a whole slew of issues came up from a client I’m working with. During the course of transitioning dozens of their legacy sites to a new server, some of which hadn’t been updated since the projects were completed some years ago by coders long since forgotten, quite a few started having weird, and not a little insidious, bugs in the new environment. Looking deeper into the issues revealed a nasty amount of hard coding in not only the custom projects, which I would expect actually, but also in various third party commercial and open source projects that were used for the base of the sites.

Here’s an example of what I’m talking about in terms of your everyday configuration file hard coding along with an example of what I’ve learned to do:

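<?php
//bad: absolute paths and host name baked into the config
$path = '/var/www/mysite.com/html/';
$url = 'http://www.mysite.com';
$cache = '/var/www/mysite.com/cache/';
 
//good: derive the same values from the environment at runtime
$path = rtrim($_SERVER['DOCUMENT_ROOT'], '/').'/';
$url = 'http://'.$_SERVER['HTTP_HOST'];
//the cache is a filesystem path that sits next to the document root, not a URL
$cache = dirname($_SERVER['DOCUMENT_ROOT']).'/cache/';
?>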

Thanks to the $_SERVER variable in PHP (though most languages have some way to get that info), you shouldn’t ever have to hard code the paths to pretty much anything within your site. Note, though, that when executing PHP through CLI scripts or using the exec() function all bets are off (though there are ways to get around that too, like using variations on __FILE__ and dirname()). And, yes, there are circumstances that demand hard coding, I know, but those cases are few and far between and usually come with people capable of making those changes.
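As a rough sketch of the CLI workaround, something like the following falls back to a directory derived from the file itself when DOCUMENT_ROOT is missing or empty. The base_path() helper is a made-up name for illustration, and it assumes the calling file sits in the site's root directory; __DIR__ is just shorthand for dirname(__FILE__).

```php
<?php
/**
 * Return the site's base directory with a trailing slash.
 * Prefers DOCUMENT_ROOT and falls back to $fallbackDir when it is
 * missing or empty, which is typically the case under the CLI.
 * (Hypothetical helper, not part of PHP itself.)
 */
function base_path(array $server, string $fallbackDir): string
{
    $root = $server['DOCUMENT_ROOT'] ?? '';
    if ($root === '') {
        $root = $fallbackDir;
    }

    // Normalize so callers can always append a relative path directly.
    return rtrim($root, '/\\') . '/';
}

// Assumption: this file lives in the document root; wrap __DIR__ in
// dirname() as many times as needed if it lives deeper in the tree.
$path = base_path($_SERVER, __DIR__);
```

The same script then gets a usable $path whether it runs under the web server or from a cron job.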

It was the third party programs that really annoyed me though. I find it a little easier to accept an individual inexperienced coder’s exuberance in coming up with a base solution at zero hour. I’ve been there; an issue comes up and the quickest, and least painful, solution is to just throw the path in place with a personal promise to come back later and make it elegant. Then… well, life takes over and the promise is forgotten. Happens all the time.

On the other hand though, when dealing with third party projects, both open source and commercial, this type of hard coding, well, that just bugs the crap out of me. It seems like such an obvious design decision yet Expression Engine, Zen Cart and WordPress (for example) all hard code environment variables into the configuration files.

This is especially irritating to me because it’s been my experience that most websites move to a different server at one time or another, so it’s a given that configuration is going to be changed at some point. Keeping the pain of moving the site to a minimum rates a higher priority to me. And, unless I’m missing something, it seems that there’s very little difference between having your installation/configuration script write $_SERVER['DOCUMENT_ROOT'] versus “/var/www/html” to a configuration file.

Something like (as a base example with no sanitization):

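<?php
//base example with no sanitization; $_POST['path'] comes from the install form
$submitted = isset($_POST['path']) ? $_POST['path'] : '';

if($submitted == $_SERVER['DOCUMENT_ROOT'])
{
    //store the lookup itself, as PHP source, so the config survives a server move
    $path = '$_SERVER[\'DOCUMENT_ROOT\']';
}
else
{
    //a custom location was requested; store it as a quoted string literal
    $path = var_export($submitted, true);
}
 
//write it to the config file
?>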

As I mentioned above there are definitely times when $_SERVER['DOCUMENT_ROOT'] isn’t appropriate per the requirements or spec but for most projects that I’ve worked with replacing hard paths with the variable has been effective 99% of the time.
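To take the installer idea one small step further, here is a minimal sketch of building the actual line that would land in the config file. The config_path_line() name and the $path variable in the generated line are placeholders for illustration, not something any of the projects mentioned above actually ship.

```php
<?php
/**
 * Build the line an installer could write to a config file.
 * If the submitted path matches the document root, emit the $_SERVER
 * lookup itself so the config keeps working after a move; otherwise
 * emit the submitted path as a literal. (Hypothetical helper.)
 */
function config_path_line(string $submittedPath, string $documentRoot): string
{
    // Ignore a trailing slash so "/var/www/html/" and "/var/www/html" match.
    $same = rtrim($submittedPath, '/') === rtrim($documentRoot, '/');

    if ($same) {
        return '$path = $_SERVER[\'DOCUMENT_ROOT\'];';
    }

    // var_export() quotes and escapes the value as a valid PHP string literal.
    return '$path = ' . var_export($submittedPath, true) . ';';
}

echo config_path_line('/var/www/html/', '/var/www/html'), "\n";
echo config_path_line('/opt/app', '/var/www/html'), "\n";
```

The first call produces the portable lookup and the second falls back to a hard coded literal, which is the only case where a move would need a config edit.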


Sometimes, Poor Design Works

Posted in Brain Dump, Programming on June 4th, 2009 by Eric Lamb – 3 Comments

As programmers, we can get obsessed with the small things. I personally have no problem spending hours trying to optimize a section of code, not for performance or to improve user experience or anything like that, but because there’s just something funny about the algorithm.

The professional in us tries, sometimes futilely, to keep this behavior in check. Sometimes, though, you just have to do “just” one more thing to really make it perfect. This, more often than not, balloons up into a weeklong chore where, in the end, you dig yourself out of the hole you made with zero progress on the initial problem.

This is the way of the code monkey.

Which is why it’s a little surprising to see how HTTP cookies are implemented. According to the official RFC (HTTP State Management Mechanism):

The user agent makes a series of requests on the origin server, after each of which it receives a new cookie. All the cookies have the same Path attribute and (default) domain. Because the request URLs all have /acme as a prefix, and that matches the Path attribute, each request contains all the cookies received so far.

Think about that for a second. On every request the cookies are sent to the server. ON EVERY REQUEST.

Try this: install a Firefox plugin called Live HTTP Headers (if you don’t already have it installed). Start the plugin and refresh this page.

You should notice that every request for ANYTHING (images, js, css) sends the cookies up to the server.

I don’t know about you, but I only have one, or maybe two, points in an application that evaluate cookies, so sending them on every request is a little… wasteful.

I don’t know for sure if there’s more work on the server side to deal with the cookies (though I would imagine the HTTP server has to do something to make the cookies available to a scripting language) but what I focus on is the bandwidth.

Now, I know it’s the 21st century and bandwidth is now fast and cheap. Yay us.

But consider the state of the Internet and bandwidth in 1997 when the specification was first drafted. Most people were using 28.8 and 33.6 kbps modems to browse the Internet and 56k was still a year away.

According to the spec:

Practical user agent implementations have limits on the number and size of cookies that they can store. In general, user agents’ cookie support should have no fixed limits. They should strive to store as many frequently-used cookies as possible. Furthermore, general-use user agents should provide each of the following minimum capabilities individually, although not necessarily simultaneously:

* at least 300 cookies

* at least 4096 bytes per cookie (as measured by the size of the characters that comprise the cookie non-terminal in the syntax description of the Set-Cookie header)

* at least 20 cookies per unique host or domain name

User agents created for specific purposes or for limited-capacity devices should provide at least 20 cookies of 4096 bytes, to ensure that the user can interact with a session-based origin server.

So, unless I’m crazy, the above makes it acceptable for a site to use 20 cookies, each with a maximum size of 4096 bytes (4 KB). That adds up to a possible 81,920 bytes (80 KB) of cookie data being sent on EVERY REQUEST.

This basically means that an image that weighed in at a cool 2 KB comes out to need 82 KB of bandwidth to transfer: 80 KB of cookies going up and 2 KB of image coming down. Do the math on a full site with, say, 20 images, 1 HTML file, 1 CSS file and 3 JS files and it really starts to add up; that’s 25 requests, or roughly 2 MB of cookie data for a single page view. So at a time when bandwidth was scarce, cookie usage was a good way to screw up the user experience if you didn’t pay attention.
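For a rough sense of scale, the arithmetic can be sketched in a few lines of PHP. The cookie_overhead() helper is a made-up name, and the 33.6 kbps link with no protocol overhead or compression is a simplifying assumption rather than a measurement.

```php
<?php
/**
 * Worst-case cookie overhead for one page view.
 * Returns bytes per request, bytes per page, and the seconds needed
 * to push that many bytes over a link of the given speed.
 * (Hypothetical helper for illustration.)
 */
function cookie_overhead(int $cookies, int $bytesPerCookie, int $requests, int $bitsPerSecond): array
{
    $perRequest = $cookies * $bytesPerCookie;
    $perPage = $perRequest * $requests;

    return [
        'bytes_per_request' => $perRequest,
        'bytes_per_page' => $perPage,
        'upload_seconds' => ($perPage * 8) / $bitsPerSecond,
    ];
}

// 20 cookies of 4096 bytes; 20 images + 1 HTML + 1 CSS + 3 JS = 25 requests.
// Assumption: a 33.6 kbps link, ignoring protocol overhead and compression.
$overhead = cookie_overhead(20, 4096, 25, 33600);

printf(
    "%d bytes per request, %d bytes per page, about %d seconds of upload\n",
    $overhead['bytes_per_request'],
    $overhead['bytes_per_page'],
    round($overhead['upload_seconds'])
);
```

Under those assumptions the worst case works out to about eight minutes spent doing nothing but sending cookies for one page view.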

Granted, you’d have to be an idiot to write a program that used 20 cookies with each containing 4 KB of data, but we’re programmers. Most of us are stupid; some are really stupid.

It’s easy to point at cookies and laugh.

The thing we need to keep in mind, though, is that this isn’t that big a problem anymore. Sure, once, it may have been a possible issue. Technology has mostly solved it though. Faster bandwidth, along with faster computers, will eventually make issues like this just disappear.

Which is kind of the point; something I try to keep in mind.


How to Exploit an Online Poll

Posted in Brain Dump, Programming on April 15th, 2009 by Eric Lamb – Be the first to comment

UPDATE April 20th 2009: Once again Jeff Atwood has read my mind and posted his own version of how to conduct an online poll. As usual he provides an interesting take on the subject, and it’s definitely worth a read.

Not that many people talk about it but there’s a dark side to freelancing. It’s amazing how many people out there are looking for a hired gun to do something illicit to somebody else. I’ve been asked to do everything from installing cracked software to hacking websites and everything in between.

I’m sure the proper thing to do when approached for this type of work would be to politely decline, but freelancing can get a little boring, so I’ve always heard the client out in the hope of finding something interesting.

One such request that I’m guilty of was to exploit an online poll.

(Cymbal Crash!!!)

Basically, the client company was up for some award, where voting was done online, and I was tasked with making sure they won. I made sure they did, and in the process I learned a few things about how NOT to create an online poll.

First, a disclaimer. blah, blah, blah. don’t sue me. blah, blah, blah. Use at your own risk. blah, blah, blah.

Anyway, to do any sort of an exploit the first thing you need to do is recon. You need to find out as much about the target as possible. In this case we’re talking about a poll on a website so this usually entails visiting the site and absorbing everything.

You want to read the terms of service and any sort of rules for the poll to find out what’s allowed and what isn’t. Not so you can follow the rules but because knowing what the site allows will tell you what they’re protecting against. For example, if the terms of service says something like “one entry per IP” you know they’re tracking IP addresses of registrants.

It’s also important to take a look at the actual HTML code of the form so you know how it’s built. Nine times out of ten the poll’s going to be a radio group. You also want to grab the form action URL as well as any hidden form values.

You’re going to need to know the HTML of the target site like you wrote it.

There’s also the need to cover your tracks. Most, if not all, online polls will record as much information about each transaction as they can. It’s actually easier to record everything, like IP address, referer and the date than it is not to.

What I did, which may not be the best idea I admit, was to create a text file of proxies in order to mask the IP address of the server the script was on. I then set the script to choose a random proxy on every “submit”, send a different, random referrer and a random user agent, and wait a random amount of time between transactions.

All this in the hopes of throwing off a manual inspection of the database. A little extreme but if anyone actually looks at the data they shouldn’t see too obvious of a pattern.

Below is an example script that should help highlight the above.

<?php
//the amount of votes to cast in this batch
$amount_of_votes_to_cast = 10000;
 
//the minimum amount of time, in seconds, to wait in between runs.
$wait_time = 60; 
 
//URL to visit (don't forget the "http://"!)
$submit_url = "http://CHANGE_ME"; 
 
//name and value of the form field
$submit_vars["vote2"] = "1";
 
$proxy_file = '/path/to/proxy/file';
$agent_file = '/path/to/agent/file';
 
$proxy_hosts = file($proxy_file);
$agents = file($agent_file);
 
include "snoopy/Snoopy.class.php";
$snoopy = new Snoopy;
 
for($i=0;$i<$amount_of_votes_to_cast;$i++){
 
	$snoopy->agent = $agents[array_rand($agents,1)];
	//NOTE: $referrers is never defined in this listing; it needs to be an array of referrer strings
	$snoopy->referer = $referrers[array_rand($referrers,1)];
	$snoopy->proxy_host = $proxy_hosts[array_rand($proxy_hosts,1)]; 
	$snoopy->rawheaders["Pragma"] = "no-cache";
 
	if($snoopy->submit($submit_url,$submit_vars))
	{
		//each() was removed in PHP 8; foreach does the same job
		foreach($snoopy->headers as $key => $val) {
			echo $key.": ".$val."<br>\n";
		}
		echo htmlspecialchars($snoopy->results);
	} else {
		echo "error fetching document: ".$snoopy->error."\n";
		sleep($wait_time);
	}
	sleep(rand(0,$wait_time));
}
?>

So, it’s pretty easy to do something like this, but it raises the question: what do you do if you’re building a poll? How should you protect yourself?

Unfortunately, it’s all in the data. It’s important to actually look at your data and verify the integrity.

Look for patterns.

Take solace in the knowledge that if a programmer did exploit your script it’s a safe bet that he’s pretty lazy (we all, pretty much, are). If it looks like your poll was taken primarily by Linux users between 10AM on a Sunday and 2AM on Monday, with all submissions posted within a minute or less of each other, you might want to flag those votes.

Basically, there’s a finite number of proxy servers, user agents and referrers your average programmer will be able to compile. In my example, I only used 150 proxy servers with 20 user agents; even a basic look at the data would have revealed some anomalies. It was pretty half-assed on my part.
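If you’re on the poll owner’s side of this, that basic look at the data could start with something like the sketch below. It assumes each recorded vote is an array with 'ip', 'agent' and 'time' (a Unix timestamp) keys; those field names, the summarize_votes() helper and the 60 second window are all made up for illustration, so adjust them to whatever your poll actually stores.

```php
<?php
/**
 * Summarize a batch of poll votes so anomalies stand out.
 * Each vote is assumed to look like:
 *   ['ip' => string, 'agent' => string, 'time' => int]
 * (Hypothetical structure and helper name.)
 */
function summarize_votes(array $votes, int $quickWindowSeconds = 60): array
{
    // Sort by submission time so gaps between consecutive votes can be measured.
    usort($votes, function ($a, $b) {
        return $a['time'] <=> $b['time'];
    });

    $perIp = [];
    $perAgent = [];
    $quickFollowUps = 0;
    $previousTime = null;

    foreach ($votes as $vote) {
        $perIp[$vote['ip']] = ($perIp[$vote['ip']] ?? 0) + 1;
        $perAgent[$vote['agent']] = ($perAgent[$vote['agent']] ?? 0) + 1;

        // Count votes that arrived within the window of the vote before them.
        if ($previousTime !== null && $vote['time'] - $previousTime <= $quickWindowSeconds) {
            $quickFollowUps++;
        }
        $previousTime = $vote['time'];
    }

    return [
        'total' => count($votes),
        'distinct_ips' => count($perIp),
        'distinct_agents' => count($perAgent),
        'max_votes_from_one_ip' => $perIp ? max($perIp) : 0,
        'quick_follow_ups' => $quickFollowUps,
    ];
}
```

A poll where nearly every vote is a quick follow-up, or where thousands of votes come from a couple hundred addresses and a handful of user agents, is worth a closer manual look before anyone gets handed an award.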

But the target in the above story didn’t bother doing any due diligence and I was rewarded with a nice paycheck and a box full of clothes from the client for just a couple hours of work.

Suckers.

Copyright © 2008 - 2012 Eric Lamb - All rights reserved