Home
Ben FrantzDale [entries|archive|friends|userinfo]
Ben FrantzDale

[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Matlab is dead. Long live Python [Mar. 30th, 2009|09:04 pm]
[Tags|, , , , ]

Over the past year and a half I've incorporated Matlab into my daily work. It's a tremendously useful tool for experimenting with image processing and linear algebra. It's got a nice array-slicing syntax, so the second column of a matrix, M is M(:,2), and every other row in reverse order is M(end:-2:1,:). It also has some very useful easy graphing commands, so if you have an array of values sampled at a corresponding array of x and y positions, you can say surf(x,y,A) and get a nice 3D surface plot. It also has other nice language features like the ability to return a list of results from a function and assign those results separately as in [V,D]=eig(A).

However, it's got some serious serious shortcomings. First, it's closed source and expensive, so if you prototype with it, you have to then either port it to another language or kludge together scripts to call Matlab—not cool. Second, unlike any language other than assembly, values returned from functions are magic in that you can't index into them. So while the function eig(A) returns a vector of eigenvalues of A, if you want just the first one, you can do v = eig(A); v = v(1) but you can't do v=eig(A)(1). This is particularly tiresome with a language geared for matrices in which you might want to nest a series of functions indexing each as you go. That is, you couldn't do foo(bar(baz(A)(1:2:end))(:,:,0)), you'd have to assign each result to a temporary. Third, Matlab indexes from 1. Yes that's what everyone expects when they start programming, yes mathematicians index from one, but for twiddling with arrays, zero is just easier.

Finally, Matlab doesn't lend itself to creating functions or data structures. In general each function is in exactly one file, leading to terrible code reuse. You can add more functions to the end of a file but those can only (?) be used as subroutines within that file, leading to a terrible lack of code reuse. Worse, what little support Matlab has for classes is done through a special directory for the class and a separate file for each method(!)

Fundamentally, there is a need for an interpreted language capable of being run interactively but that is also a real standard programming language, not some kludgy interpreted dialect of Fortran. That something needs to have mathematical and graphing batteries included.


Fortunately, there is a collection of Python packages that have a very solid chunk of this covered: matplotlib, scipy, and ipython. Not only do they have all of the 2D graphing that Matlab has with a similar syntax, but they use the (awesome) Anti-Grain Geometry library for graphing, so you get plots way better than the aliased 1990s-style plots that Matlab dumps out.

Python has language support for array slicing and for returning tuples, but it is also a Real Programming Language in that you can write a linked list or a graph without having to hack it on top of arrays. It isn't designed with array programming in mind the way Matlab was, so the syntax for arrays is slightly more cluttered, but not by much. The big differences being that Matlab has special element-wise operators and has concise array construction notation, so a.*b in Matlab returns the array formed by multiplying a(i) by b(i).

So give it a try. Install the appropriate packages and start doing your scientific computing in an actual programming language.


ben@endash:~$ ipython -pylab
...
In [1]: x=r_[0:100:0.1] # Generate [0, 0.1, 0.2, ..., 99.9]
In [2]: plot(x, sin(x)/x) # Nice pretty plot.
Out[2]: [<matplotlib.lines.line2d object="object"
at="at" 0xa0ee8cc="0xa0ee8cc">]
In [3]:


Can Matlab rasterize images to look this nice?:
pylab_examples_polar_scatter


PS
If you want a blast from the past, Python even has a turtle library, giving it the same drawing sematics as Logo.
from turtle import Turtle
t = Turtle()
for in in (0,1,2,3):
    t.forward(100)
    t.right(90)

Not only is Python making programming as fun as it was in my hard-core Logo days, it's doing it in a language that can simultaneously be used for Logo-like simplicity and Matlab and driving high-performance computing. Go Python!
link9 comments|post comment

Programming for Modern Hardware (required reading) [Sep. 25th, 2007|08:31 am]
[Tags|, , ]

I am a big fan of Herb Sutter's work. He has great books on C++ programming techniques, for example. He recently gave a talk about programing for modern machine architectures. The very short version is that memory latency is the root of (almost) all performance issues. [blog entry][Google video][corresponding slides]

For a teaser, did you know that 99% of your CPU's die area is not for computing, but for hiding latency. Also, between a 1980 VAX and today's computers, memory latency has gotten about 150× worse as measured in instruction count.

Also, how out-of-order execution can be correct for single-threaded code, but can break otherwise race-condition-safe code(!)


Programming is going to get interesting in the next decade as the software tendency towards abstraction fights with the hardware's need for massive concurrency. (Personally, the only sane solution I see is for application programming to be done at a sufficiently high level of abstraction to encompass all of the concurrency. For example, if I want to add two one-gigabyte vectors, I should be able to say x+y (a high level of abstraction) and the addition operator should parallelize the computation (sane concurrency).)
link5 comments|post comment

Reading hideously-long argument lists [Feb. 15th, 2007|12:19 pm]
[Tags|, , , , ]

Dear Lazyweb,
In a C++ function with C linkage, are the arguments stored in a predictable way such that, e.g., I could cast the location of the first argument to a pointer to a struct containing the argument list?

Background


There is a commercial finite element package, ABAQUS, which has the nice feature of letting you programmatically define arbitrary materials. You just provide it with an object file containing the function umat_.

Unfortunately, since this interface was designed by Fortran programmers a while ago, umat_ takes 38 arguments, many of which are two-dimensional arrays. Since it is the year 2007 and some languages actually provide meaningful abstraction, I quickly made a umat_arguments class so my umat_ function just calls a C++ function, int umat(umat_arguments& in). Of course, I have to construct the umat_arguments structure which means copy-paste coding – exactly what I'm trying to avoid.

While the huge argument list is dirty, it is nice that ABAQUS has presumably put a lot more thought than I want to into exactly what a finite-element package would need to ask of a material point to be completely general.
link8 comments|post comment

Libraries & Namespaces [Aug. 22nd, 2006|06:40 pm]
[Tags|, , , ]

Dear Lazyweb,
I'm fairly new to creating libraries. Today I ran into a bug in which two libraries I linked against both had a class named Timer. When I dynamically linked, I got insidious memory-stomping behavior; when I statically linked, I got a link error. I have the source for one so I put Timer in a namespace and it worked fine. (Previously that library didn't have any namespaces in it.)

The question: Is there a way for a library to expose only some of its symbols (vis. the symbols at the interface) to avoid a namespace collision at link time? (When I link against a closed-source library, I can't see all of their symbols, can I?)

Update: Thanks for the answers. This brings me to another question:

There is something I don't understand about the process of going from source to library. Perhaps you can fill in my hand waving:
  • When I build a source file to get an object file, I think of the .o as being a direct translation of the .cpp file into machine language with the addition of a list of symbols. That is, I see an object file as essentially the compiled source (with function calls made by name) plus a list at the beginning (a symbol table) saying, e.g., “double foo(const int*) starts at offset 0x0000BEEF in this file and also, e.g., ”void bar() is required by this object but is not defined here.“ Is this an accurate description of an object file?
  • When building an executable, I see the linker as concatenating the objects together, filling out the symbol table with the offsets of all of the symbols. The linker must also be adding the appropriate header to make it an actual binary so that the program starts at main().
  • When building a static library, ar appears to just collect a bunch of object files into a single file.

Clearly somewhere in this process some symbols are exported and some are not. Where? In C, this has something to do with the extern keyword. I get the sense that in C some of what extern does happens automatically, at least with respect to classes.

Is there a good reference for this stuff?
link5 comments|post comment

Bit rot? [Jul. 3rd, 2006|11:24 am]
[Tags|, ]

Dear Lazyweb,
A colleague had a problem: A program was running in parallel for two days at which point it died. The problem was that an input file, which gets rewritten frequently, was corrupted. In particular, it was a line that gets printed in C++ like this:
theFile << someInt << "  " << anotherInt << "  " << yetAnotherInt << endl;
and so usually prints out like
123  1  456
did something different. In the middle of the file that failed, one line said
123 !1  456
with an exclamation point instead of the second space. The code looks good. The file was never touched by human hands between creation and reading (i.e., it wasn't a stray vim command) and it appears to have only been opened by one program at a time.

After scratching my head, my desperate guess is that it's a fluke. This is compounded by the fact that “!” is ASCII code 0b100001 whereas a space is 0b100000 – a one-bit error.

Wikipedia's article on bit rot says that “Bit rot is also used to describe the discredited idea that a computer's memory may occasionally be altered by cosmic rays.”

Any idea what would cause this?

Update


Thanks for the replies. The sysadmin ran memtest and says
I got almost 8000 memory errors in tests over the past 40 hours, so [the computer] may be down for a while. If the problem is the memory and not the motherboard then I will replace the memory with error correcting memory to prevent this sort of thing happening in the future.

So it looks like I was right that it was a bit flip but wrong in that it is just a good-old-fashoned hardware failure.
link12 comments|post comment

(no subject) [Jun. 1st, 2006|12:27 am]
[Tags|, , , ]

As seen in Applying UML and Patterns and then found on Wikiquote:
The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in. We're computer professionals. We cause accidents. —Nathaniel Borenstein


Borenstein is a co-author of MIME, the email standard, along with Ned Freed; Freed is Mudd class of 1982.
link1 comment|post comment

MPI Applications [May. 25th, 2006|10:48 am]
[Tags|, , ]

Dear Lazyweb,
I'm learning about MPI. One thing that the tutorials don't explain and that is difficult to track down on Google is this: When you have an MPI program, you run it like so: mpirun -np 42 a.out; if you just call ./a.out, it fails to run. Why is it done this way rather than just having the program make calls to the MPI library? Furthermore, if I wanted to have an interactive application that could send computation to a cluster, would I have to have the entire program be run with mpirun or is there some other way to make MPI calls?
link6 comments|post comment

navigation
[ viewing | most recent entries ]

Advertisement