Dr. katz's closet - Of Man, Women, Life and Computer Science: October 2008

Wednesday, October 29, 2008

Victoly times two!

Now I just found out that it's only necessary to alter the "score" in the setup.py file of the syslog plugin. Peeeeeh. What you don't learn from nosetests, eh? :P

nosetests who? Or : how winPDB p0wnz us all...

Victoly!!!
Hell yes!
It took almost three weeks, but in the end it was worth it.
You see, nosetests has a small problem with plugins. How did I get to that conclusion?
Well, I found out that nosetests is actually not only an executable but also a library that can be invoked from Python code (!), and I found an awesome Python debugger for Windows: winPDB, a simple debugger with the power of Python. You can imagine what comes out of that.

So I made a small Python script (nosetests.py) that imports the nose module:

import nose

# run() takes its arguments from sys.argv, i.e. this script's command line
nose.run()

Complicated, huh?
This small snippet executes nose with the command line that I passed to the script.
So once this was done, I could run it in winPDB with a commandline like this:

winpdb nosetests.py --sysloghost=localhost --with-nosexunit --xml-report-folder=c:\test test_mytest.py

This opens up the winpdb debugger. After some tracing, I found what I was looking for:
in the run method of the TextTestRunner class, the plugins are finalized with the following call: self.config.plugins.finalize(result), on the second-to-last line of the method. The call is dispatched to the finalization methods of the various plugins. The plugins we are using are handled by the standard plugin manager, PluginManager. In short, its finalize method forwards the call to a simple method, which runs the finalizations of the plugins one after another until one of them returns a value.
Our homegrown syslog plugin handed back the result it had received, which of course was not None. Needless to say, the nosexunit plugin was always executed after the syslog plugin. So the plugin manager ran the syslog plugin's finalization first, got a value back, and never executed the nosexunit finalization, thus the output was not produced. Forcing the syslog plugin's finalization method to return None resolved the problem. I wonder if this is a bug?
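To make the "until one of them returns a value" behaviour a bit more tangible, here is a minimal sketch of such a chain. It is NOT nose's actual code: FakePlugin and finalize_all are made-up names, and the only thing carried over from the debugging session is the idea that the first non-None return value stops the chain.

```python
class FakePlugin:
    """Stand-in for a plugin (hypothetical, not nose's real Plugin class)."""

    def __init__(self, name, return_value):
        self.name = name
        self.return_value = return_value
        self.finalized = False

    def finalize(self, result):
        self.finalized = True
        return self.return_value


def finalize_all(plugins, result):
    """Call finalize() on each plugin in order, stopping at the first non-None return."""
    for plugin in plugins:
        value = plugin.finalize(result)
        if value is not None:
            return value
    return None


# The syslog plugin hands the result back: the chain stops before the XML plugin.
syslog_bad = FakePlugin("syslog", return_value="result")
xunit_skipped = FakePlugin("nosexunit", return_value=None)
finalize_all([syslog_bad, xunit_skipped], "result")

# The syslog plugin returns None: the chain goes on and the XML plugin is finalized.
syslog_ok = FakePlugin("syslog", return_value=None)
xunit_run = FakePlugin("nosexunit", return_value=None)
finalize_all([syslog_ok, xunit_run], "result")
```

In the first run xunit_skipped.finalized stays False, in the second xunit_run.finalized becomes True, which matches what I saw: no XML report until the syslog plugin stopped returning a value.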

Now, I don't know the following things:

How nosetests handles plugin loading / unloading with priority, if it handles finalization with priority at all
Whether it was a good idea to force the result to "None" in our syslog plugin's finalization method.

Considering the fact that the method in this case is executed as a final one, probably point 2) is not very relevant. But how does nosetests actually handle plugin priorities? Seems like it assigns to each of them a score, which is then used to decide which one gets loaded first and so on.
Look what I found crawling through google :

- Added score property to plugins to allow plugins to execute in a
defined order (higher score execute first).

This comes from the changelog of version 0.10.0a1. So it seems the problem is not totally new (damnit!), and our search is not over.
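Going only by that changelog line ("higher score execute first"), the ordering could be pictured like this. It is a sketch under assumptions: ScoredPlugin and execution_order are hypothetical names, the score values are invented, and this is not how nose implements it internally.

```python
from dataclasses import dataclass


@dataclass
class ScoredPlugin:
    """Hypothetical plugin record: just a name and a score."""
    name: str
    score: int


def execution_order(plugins):
    """Return the plugins sorted so that the highest score comes first."""
    return sorted(plugins, key=lambda plugin: plugin.score, reverse=True)


# Invented scores: give the XML reporter a higher score than the syslog plugin.
plugins = [ScoredPlugin("syslog", 100), ScoredPlugin("nosexunit", 500)]
order = [plugin.name for plugin in execution_order(plugins)]
```

With these made-up numbers the nosexunit entry ends up in front of the syslog one, so a plugin's place in the chain can be changed by changing its score.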

Monday, October 27, 2008

nosetests - neverending story

nosetests, nosetests, nosetests!

This is getting hilarious... BUT I want to understand why it's not working. Damnit!
On top of that, it turned out to actually WORK every now and then (for example, if Ant calls it). Why would this be? I don't know. I need rest, so I am pushing this back in the closet for now.

Friday, October 24, 2008

nosetests part 2 - the final solution

Ok, some more freetime for me today! Yay!
Now I actually have time to investigate some more about the nosetests - python vs CruiseControl log output feature.
Let's see, where did we leave off? Hm, while trying to get some properly xml formatted output for CruiseControl, we discovered that if the syslog plugin was used as well, the output would not be produced (for some strange reason). The problem arises from the fact that we use syslog to "log" internal test activity (for example, when a test starts and ends, to log exceptions etc).

Possible reasons for that include:

messages targeted at the xml logging get redirected to syslog
syslog interferes with and breaks the xml logging

Let's check the first one.

Xml formatted output should be produced if the switch "--with-nosexunit" is present on the command line.
So let's perform the two test cases in here:

nosetests --with-nosexunit --xml-report-folder=c:\temp test_mytest.py,

which produces a TEST-test_mytest.xml file in c:\temp and

nosetests --sysloghost=localhost --with-nosexunit --xml-report-folder=c:\temp test_mytest.py

which doesn't.

Now we have two effects in here, which push the problem into some kind of "limbo" (no, not mambo). On one hand, we know for sure that the xml formatted output is not produced by nosexunit. The nosexunit library simply captures the stdout / stderr of the script it's running and outputs it according to the command line arguments it has received, so this (probably) means that no input is fed to nosexunit (which is why, in turn, the output is not produced). On the other hand, we also know that the expected xml formatted output created by nosexunit is not present in our syslog target. This excludes the (least likely) case in which the output is produced by nosexunit and ends up (for some strange and impossible reason) on the syslog host.

This means that it gets lost somewhere in between. In between where? Well, nosexunit is a plugin as well. Could it be that the output to be formatted gets lost between the two plugins? To be able to answer this question, a few more tests need to be executed.

The syslog plugin which we use internally is home-made (obviously) and worked fine up to a few weeks ago. Well, "it worked fine" means "it never showed any problems". What does the plugin do? It adds a new handler to the logging instance, by doing the following:

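import logging
import logging.handlers

# Dedicated logger for the internal test activity messages
self.logger = logging.getLogger("nosetest")
self.logger.setLevel(logging.DEBUG)

# Send every record to the syslog daemon on port 514
sysLogHandler = logging.handlers.SysLogHandler((options.sysloghost, 514))
formatter = logging.Formatter("%(name)-30s: %(message)s")
sysLogHandler.setFormatter(formatter)
self.logger.addHandler(sysLogHandler)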

Obviously this piece of code interferes somehow with the nosexunit library.
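To see what that formatter actually emits without needing a syslog daemon, the same logger setup can be pointed at an in-memory stream. This is just a sketch: the StringIO stream and the "test started" message are my own stand-ins, only the logger name and the format string come from the plugin.

```python
import io
import logging

# In-memory stream instead of a syslog daemon (assumption made for this demo)
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)-30s: %(message)s"))

logger = logging.getLogger("nosetest")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo record away from the root logger
logger.addHandler(handler)

logger.debug("test started")
line = stream.getvalue()
```

The logger name is left-aligned and padded to 30 characters, followed by a colon and the message. Note that addHandler only adds a destination; by itself it does not take records away from any other handler.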

Tomorrow? Maybe. Nah. Who cares! Of how Apache and throughput Speed changed our lives

Well, yes, I admit it, I definitely haven't been posting for four days now!

So today I had to deal with Apache and PHP. Nice. Installed and configured. Easy. No big problem. Just the fact that the eAccelerator had a version mismatch.

But the real problem is : How do you test for throughput performance?
Well, you take a VERY big file, send it with the application that is under test and record how long it takes. It's that easy. Ok, this is really not hard at all. BUT. What if you need to test the maximum throughput? How do you know you got it? You don't have a clue about how the application handles the streaming: buffered - unbuffered / paged - unpaged / UDP / TCP; and also, you don't know how much the network influences the transfer.

So all you can do is go by hypotheses, in the end. UNLESS you can have a look at the source code. Bleh :P.
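The "big file and a stopwatch" part can be sketched in a few lines. Everything here is an assumption for illustration: measure_throughput is a made-up helper, the 64 KB chunk size is arbitrary, and the copy goes from one in-memory buffer to another, so it measures nothing about a real network or a real application.

```python
import io
import time


def measure_throughput(source, sink, chunk_size=64 * 1024):
    """Copy source to sink and return (bytes_copied, seconds, bytes_per_second)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


# 8 MB of dummy data standing in for the VERY big file
payload = io.BytesIO(b"x" * (8 * 1024 * 1024))
total, elapsed, rate = measure_throughput(payload, io.BytesIO())
```

Running it with different chunk sizes already shows the catch described above: the number you get depends on how the data is fed, so one measurement says little about the maximum.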

Friday, October 17, 2008

Homework: Outlook, Indexes, PHP and Image rescaling!

I decided to add to my blog, erm, closet's title the following text :
"Of Man, Women, Life and Computer Science".

Ok, so I dropped by at my parents' place to say "hi" - haven't seen them in a while now, living far away doesn't make things much easier but that's really not the point here - This is a NERD blog, let's keep it on that level.
So, as I said, I dropped by at my parents', and discovered they had a problem with Outlook Express 6 in the office (it's a very small office, just the two of them working in there with one PC). Apparently, mails that had been sent refused to leave the "Outgoing mail" folder for the "Sent mail" one. After a short test, I realized that the mails actually were sent out, but not moved to the "Sent mail" folder.
I enabled POP3 / SMTP logging (hey btw, didn't really know there was such a nifty functionality in there!) by ticking the "Mail" option on the "Maintenance" page of the "Options" menu, but of course everything was working there, because the mailing itself wasn't affected! So I started thinking about what could be wrong with Outlook. I googled for the same problem and it turned out that there is actually a bug in the .dbx file management that corrupts your email databases (the .dbx files). The page stated that those databases can easily show problems if, for example, Outlook (or the whole of Windows) crashes unexpectedly. It stated that compressing the folders would clean up deleted emails (apparently Outlook doesn't do that straight away), as well as fix the problems that arose. Also, it stated that Microsoft supports files up to 2 GB. Well, my parents had a 2 GB and something archive file for the "Sent mail" folder. Was that the problem? Of course it was!
Double-checking it by dragging the mail that was still in the "Outgoing mail" folder and dropping it in the "Sent mail" one was quick and confirmed the theory.

Now, after this was fixed, I spent some time with my father and he suggested I could fix up the company's homepage (literally "it sucks"). I promised to do so, and already came up with some ideas. Let me introduce you to it :
They are running a merchandising brokerage agency, oriented to ski-schools and sports in general (mostly winter-related, like skiing for example).
They would like to have a homepage where they could show examples of what they managed to produce. Of course, they would like to be able to add more as time passes by, and remove some if there are too many (the current hosting company are true thieves, they pay 80 € a year for 200 MB!! But that is another story).
The way to present the images would be contextualized, meaning that you would have a list of contexts to choose from, and depending on which context you had chosen you would get a set of images of products. Of course, one product could be posted in more than one group (since different categories of customers may be interested in the same kind of products or in products with special attributes), so it should be possible for them to "link" images to a specific group.

In short, the rationale is that you could set up one group for each category of customers, meaning a set of pictures of relevant examples with the specific attributes that customers from that category are most sensitive to.

Of course, I would like to do all of this in AJAX, and do it *now*. I don't want to learn how to install / use (surely helpful) frameworks like Drupal / Joomla / blablabla, this project is too small to make it that complicated. I can make my own library. Yes I know, I am a masochist.

So as a first thing I came up with a small draft of the functionality, to get an idea of what needs to be done. You know, use-case oriented design and such. The basic functionality groups are the following

category management functionalities
picture management functionalities
grouping management functionalities

The first one groups the adding / editing / deleting categories, the second relates to uploading / downscaling / removing pictures, while the third one would group simple category / picture pairing logic functions like adding pictures to a group, or removing.

What do you think?

The next step is trying to get the logic that lies behind the functionalities described above "factorized", that is reduced to the minimal logic that is shared by all functionalities and that covers all the functionalities required by our use-cases. In geometry the same concept is referred to as a "basis", meaning a minimal set of vectors which can produce all other vectors (all of which are linearly dependent on it, that is), in other words a set of vectors which are linearly independent of each other. So what we are looking for now, following the "geometry" point of view, are the "basic logical units" which all the others (required by the functionalities) are made from.
Of course, this is not as easy as it sounds. And surely can't be achieved in one single step. Let's try. A first (very) high - level grouping is the following :

a) Working with lists - meaning adding items, removing them (in all three of the cases)
b) Editing contents - this could actually fit in point "a" as well, but considering that what we edit here are sets of attributes, each of which mainly derives from a specific nature (categories, pictures, groups in our case), it may be that some of them require some additional / specialized (or original) unshared functionality. Well, to be honest there are two ways of handling this. The first way (the way we are going to do it) says that editing a list's item is not a behaviour of the list itself but of the item of the list. So for example you could have a list with a lot of items, each of which would have a different set of attributes to be edited (for example a list with an item apple, an item pie, an item car, and so on). If on one hand this list is generic and very flexible, on the other hand it may turn out to be uncontrollable, and thus unmanage-able (did I write that correctly?).
The other way is actually to make the list item-specific by incorporating the item's editing into the list itself, and in this case you would have a list which could handle only one single type of item (the ones whose content is editable by the list itself). Now this kind of list is not flexible at all, but at least you know what you are working with.

c) Associating list items to groups - or, creating indexes, or, even better, creating a new list! out of other lists.
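The first way from point "b" (the item edits itself, the list only adds and removes) could look roughly like this. Take it as a sketch: Item, Category, Picture, ItemList and the attribute names are all invented for the illustration and are not part of any existing library.

```python
class Item:
    """Base item: knows which of its own attributes may be edited."""
    fields = ()

    def __init__(self, **values):
        for field in self.fields:
            setattr(self, field, values.get(field))

    def edit(self, **changes):
        """Editing is a behaviour of the item, not of the list holding it."""
        for field, value in changes.items():
            if field not in self.fields:
                raise KeyError(field)
            setattr(self, field, value)


class Category(Item):
    fields = ("name",)


class Picture(Item):
    fields = ("filename", "width", "height")


class ItemList:
    """Generic list: it only adds and removes, whatever the item type."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return item

    def remove(self, item):
        self.items.remove(item)


pictures = ItemList()
photo = pictures.add(Picture(filename="jacket.jpg", width=800, height=600))
photo.edit(width=400, height=300)  # the picture edits itself
```

The list stays generic (it would happily hold an apple, a pie and a car), and the only control left is each item's own fields tuple, which is exactly the flexibility versus manageability trade-off described in point "b".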

In my opinion lists are indexes. So we will call lists indexes from now on and forever.

Following this point of view, it turns out that the functionality we are requiring is based actually on three conceptual indexes :

a Picture index
a Category index
a Group index

Let's assume we say a picture has to belong to a certain category - I don't see the point in having a picture in the system otherwise anyway. This makes things even easier. Why? Because in this way, indexes 2 and 3 melt together, and we get rid of one of them. Yeah! The result is now:

a Picture index
a Group / Category index

So what we have now are two indexes. How are they related to each other - how do they work together? Well, we know for sure that we need to manage pictures. And we just said pictures are required to belong to at least one group / category. And one group / category can have more than one picture. So we end up with a two-level index, where on top is the Group / Category index and on the bottom we have the picture indexes of the single groups / categories. So far, so good. The rest, tomorrow :).
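A first guess at that two-level index, just to fix the idea: the top level maps each group / category to its own picture index on the bottom level. TwoLevelIndex and its method names are hypothetical, and plain dicts and sets stand in for whatever storage ends up being used.

```python
class TwoLevelIndex:
    """Top level: group / category name. Bottom level: the pictures of that group."""

    def __init__(self):
        self.groups = {}  # group name -> set of picture names

    def add_picture(self, picture, group):
        """A picture always enters the system through a group."""
        self.groups.setdefault(group, set()).add(picture)

    def groups_of(self, picture):
        return sorted(name for name, pics in self.groups.items() if picture in pics)

    def remove_from_group(self, picture, group):
        """Refuse to remove a picture from its last remaining group."""
        if self.groups_of(picture) == [group]:
            raise ValueError("a picture must belong to at least one group")
        self.groups[group].discard(picture)


index = TwoLevelIndex()
index.add_picture("jacket.jpg", "ski schools")
index.add_picture("jacket.jpg", "sports clubs")  # same picture, second group
index.add_picture("cap.jpg", "ski schools")
index.remove_from_group("jacket.jpg", "sports clubs")
```

The same picture can be "linked" to several groups, and the at-least-one-group rule is enforced where a link is removed.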

Tuesday, October 14, 2008

Of Man, Snakes and Cosmologic Balls

Or, in the correct form, "how to get nosetests to do what you want".

Yes well, yesterday that was like rushed. Programs changed on the fly and I had to switch to system testing because of the release that is planned for today.
BUT! Now I am back on nosetests, did you miss me? I really hope you did not!

So, where did we leave off? Oh yes, we found out that nosetests and nosexunit are actually correctly installed and recognized.

Just to recap, the problem relates to nosetests' output in xml format (which is supplied via the nosexunit plugin).
For instance, a command line which produces that kind of behaviour looks like this :

nosetests --sysloghost=localhost --with-nosexunit --xml-report-folder=report testunit

where

localhost is the ip address (or name) of the syslog daemon host
testunit is the unit in which nosetests will search for test* routines
report is the output folder for the nosexunit reports

So now the questions are as follows:

how many files does nosexunit produce
in what xml structure

At the moment I can't answer either of those, because nosexunit doesn't actually give me any output. At first this seemed due to nosetests "tracing" the output only for test suites, not for single test cases. But clearly this is not the case: we have other test cases which work flawlessly and produce an excellent XML-formatted output thanks to nosexunit.
So what's wrong? Hard to tell. Luckily NoseXUnit comes with full source code, and I have some spare time left. So let's have a look.

Monday, October 13, 2008

Of Python, Nosetests and Saturn's cyclone, and discovery-based testing

Wow.
Space blows me away. Saturn and Neptune especially. Mysterious, gigantic, enormous planets. The more we get to know about them, the more we will want to know. As for myself, I would love to set foot on Mars. I mean, can you imagine the feeling?
Check this and you'll understand what I'm talking about (courtesy of NASA)!

Now, what does this have to do with Python and Nosetests?

Honestly, I doubt that the Fail-safe Shuttle's OS has any parts implemented in Python, but possibly it has been tested with it. The same I am doing today, and this is the live feed on how it's going (so far).

The problem is I can't seem to be able to get any test results output from nosetest in xml format. There may be two reasons for it :

I don't have nosexunit installed (plugin which is needed for it)
There is another reason why it's not working

Something tells me it will be the first one?
So the first step is to learn how to check for the nosexunit installation. Any clue?
How about starting at the nosexunit's homepage first. An easy easy_install nosexunit executed from a shell gives us a good clue about the status of the nosexunit package on the current system - turns out the library is there, damn it.
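Another way to check, without reinstalling anything, is to ask Python itself whether it can find the module. A small sketch, with two assumptions: it uses importlib.util.find_spec, which only exists on Python 3.4 and newer, and it assumes the plugin's importable module is called "nosexunit" like the package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# "nosexunit" as the module name is an assumption; the answer depends on the system
print(is_installed("nosexunit"))
```

This only says the module can be found, not that nosetests has picked it up as a plugin.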

So now we have to expand option two. Update follows later.

Abandoned CruiseControl

CruiseControl is used a lot in here (at the office where I work, that is).
The version of CruiseControl that I installed is the latest one : Binary distro 2.7.3.

So it came naturally that I had to deep-dive into it, mainly to understand strange or missing behaviour, for example on the logging side of the application and in configuring the dashboard (which was not easy at all).

The documentation that you can find at CruiseControl's main project page is not complete. Really, it is not. It covers mainly all of the configuration file options and grammar, and that is it.
Absolutely *no* clue about how to configure and setup a dashboard, or hints about how to customize logging, how to integrate and correctly visualize logging into legacy build chains etc etc.
Wow. It's the first time ever that I see a free software project this big with almost no documentation.

Let me tell you what happened when I tried to get the dashboard to work. It should be as simple as adding a few switches on the CC command line (in the cruisecontrol.bat file), but it's not, because quite a few basic jar modules are missing. So what did I do? I enabled CC debugging with the -debug switch and went through logs of 5 MB at a time for almost three days. Not the easiest of tasks but my time is paid anyway, so who cares. It was worth it, I managed to get all the missing jars. But I lost two days' time looking for pieces of SW that should have been there.

Well, you would say that that is it. Wait, there is a lot more to come. Another slightly unimportant feature like logging has been kept out of the scope of the main documentation.
For example, I assume you all know that CruiseControl uses Ant internally. Did you also know that CruiseControl logs all the messages posted by Ant on stderr as "WARNING", the ones posted on stdout as "INFO", and that errors that occur within the Ant build (that is, exceptions) are traced as "ERRORS"? Let's see how many of you raise your hands.
It took some time to discover that; luckily on Nabble there are plenty of extremely useful forums.
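The mapping itself (stdout to INFO, stderr to WARNING, failures to ERROR) is easy to picture with a few lines of Python. To be clear, this is not CruiseControl's or Ant's code: classify_output is a made-up helper that only imitates the behaviour described above, and "an exception" is narrowed down here to "the child process could not be started".

```python
import logging
import subprocess
import sys


def classify_output(command):
    """Run command and return (level, line) pairs following the described mapping."""
    try:
        completed = subprocess.run(command, capture_output=True, text=True)
    except OSError as error:
        # The process could not even be started: treat it as an ERROR
        return [(logging.ERROR, str(error))]
    records = []
    for line in completed.stdout.splitlines():
        records.append((logging.INFO, line))
    for line in completed.stderr.splitlines():
        records.append((logging.WARNING, line))
    return records


# A tiny child process that writes one line to stdout and one to stderr
child = [
    sys.executable,
    "-c",
    "import sys; print('compiling'); print('deprecated flag', file=sys.stderr)",
]
records = classify_output(child)
```

The point to take away: under such a mapping a perfectly harmless message turns into a "WARNING" just because the tool printed it on stderr.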

Thanks to those forums I managed to get the dashboard up and running, as well as understanding finally the way Cruisecontrol handles log files.

For example : suppose you have a legacy tool chain to build your deployables which bases its functionality on python scripts. Python scripts can be executed by Ant through an exec command, which will spawn (create in nerdish, that is) a new process for the python interpreter.
Now, considering that CC probably uses Ant to parse its own configuration as well, there are two ways to spawn a process :

Through an Ant Script
Directly from CruiseControl's configuration

The difference between the two calls is mainly in the way the call is handled : the spawned process ("task" in CruiseControl jargon) will be of a different nature to CruiseControl depending on whom the call has been delegated to, thus the logging for it will be handled differently. It may even happen that it gets totally ignored by the reporting application just because the task that had to be logged is unknown to CruiseControl (which is *always* the case by the way).

Friday, October 10, 2008

Closets & Dropped Surface Code

I just realized..This is actually what it's named after: a closet. In a closet you throw things that you don't need anymore. Or that you don't need that much right now, but you want to keep it somewhere, because you feel one day you may see a use for it.
Just imagine :
"Where did you put my hammer honey?"
"In the closet, baby".

Exactly, that's the way to go.

So like with all closets, I'm gonna throw in all those small things that I get in my hands and just need a place to lay down. But then, this is not a closet but a BLOG. You know what blog stands for, don't you? Binary LOG? I think that is the correct answer. Google it if you don't feel like it.

There are two things on my mind right now :

How do I keep my closet tidied up - as in, how do I keep it clean and organized (for those things that may evolve in future from the crap I throw in here)
How do I organize my posting / closet trashing?

The first one is pertinent if I throw in concepts that show evolution and might in fact turn out to be interesting.
The second sounds a lot like the first one but actually refers to my own action of "doing" (as in "logging") the information. How frequently? Any standard format to make it more surfable? Any idea on fast-composing it so I don't waste too much time on that thing on that particular day?

Like, for example, the concept of "bidimensional code" that burst through my mind and robbed me of my sleep some evenings ago. A topic which I suppose will label my BLOG as a n3rd blob, but honestly, who gives a shit about it? If you are reading here, that means you are a nerd already. Accept the reality and try to make something good out of it.

The meaning of "bidimensional code" becomes clear if you think of code as being defined by a sequence of operations (which it always has been), each of which can be associated with a specific graphical form (or image - who's the l00ser that said clipart?).

Well, what's so special about that?
The answer is : I don't know. But I will try to find out. Probably. If it ever happens, that is.
The only usefulness I could really come up with for bidimensional code (as in code's operations associated biunivocally to images) could be the possibility for a PC to recognize "written" code, along with the logical connection that is inherently buried within the structure that results from the composition of the single operation-images.

This is sick, I know. But what do you expect? It's a closet.

Hello World!

Well, first post ever in my closet... Pretty empty to be honest, but anyway! I hope to fill it soon with a lot of useful (or crappy perhaps) information. First of all, I thought about writing a quick introduction on what will be posted here. But then I realized this would take much more time than I am willing to give to this place. So I came up with what could be seen as a compromise:

giving the maximum of information with the minimum time effort.

There. Cool ha?
Now, without starting any long explanations on what this exactly means (and thus avoiding wasting prrresssious time), let me clarify, in here I'm going to post whenever I feel like:

something good deserves to be posted here
something interesting deserves to be posted
some big news related to my (small) world requires to be spammed
I am victim of one of those occasional "pre-sleep" brainstorms, which boost your (my in this case) imagination over the limits, provided I remember what I was brainstorming about when I wake up the day after.

So. A nice short list of crap to start with eh?

But seriously, what I have been thinking of posting in here in the next days is a (somewhat) detailed howto for handling CruiseControl, as the ones I found are extremely incomplete (and so is the official website). Where I work we base everything on CruiseControl. But that's another story now.

And already, time flies! >_<

Later!

God I hope I remember my blog's address tomorrow... ^^;
