Project 3: Webcam Paint

by mghods @ 9:22 pm 14 May 2010

Introduction

Webcam Paint is a program developed with openFrameworks in Code::Blocks for Windows. It enables the user to paint using a webcam through a simple color-detection algorithm.

How it works

To paint on the pad, the user first adds colors to the color palette. This is done by pressing the left mouse button (while the button is held, the program background changes to the color of the selected pixel) and releasing it once a desirable color is found; that color is then added to the palette. For example, the user may use his or her hand as a painting brush by clicking on the hand to add its colors to the palette, and then paint by moving the hand in front of the webcam. The user can tweak the painting mode using the provided sliders.

Interface

The Webcam Paint interface consists of four screens, a color palette, two control panels, and three buttons.

In the upper-left screen, the user can see the webcam input to the program. The upper-right screen is the pad the user paints on; it shows the current brush positions and what has been painted. The user can view and tune brush detection using the lower-left screen, which shows the black-and-white image that OpenCV uses for detecting brush blobs. Finally, the lower-right screen displays any movement of the brush blobs.

The color palette displays the colors that have been added by the user. The user can add an unlimited number of colors to the palette, but Webcam Paint only uses the last twenty-two. The user can review these colors in the small boxes between the screens and the control panels.

Additionally, there are two control panels, which give the user the ability to tweak brush-blob and blob-movement detection. The first contains sliders that set the hue, saturation, and value thresholds for detecting brush blobs, plus a slider that controls the RGB threshold for detecting movement in the webcam input. Raising the thresholds makes detection more sensitive, but high thresholds make it hard to paint accurately. The second control panel has sliders that set the minimum size of blobs to be detected as brush blobs and blob movements, and another slider that determines how many blobs are detected for each color in the palette.

Finally, there are three buttons in the lower-left portion of the interface. One is for clearing the pad. Another is for saving the painting; you can find saved paintings in the data folder of the program. The last button switches between drawing with a circular brush or a free-form brush. If you would like to see the shape of your painting tool on the pad, turn off the circular-brush button.

What is the algorithm?

To detect brush blobs, the program converts the webcam input to an HSV color image and extracts grayscale hue, saturation, and value images from it. It then detects color blobs for each added color, based on the specified thresholds. To detect brush-movement blobs, the program performs the same process on an image containing the difference between the current and previous webcam frames, created using a background-subtraction algorithm.
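The per-pixel test at the core of this is small. Below is a minimal Processing-style sketch of the HSV-threshold step only (the project itself is written in openFrameworks with OpenCV, so the structure, threshold values, and the single hard-coded palette color are illustrative, not the author's code):

import processing.video.*;

Capture cam;
color target;                                  // one palette color picked by the user
float hueTol = 10, satTol = 40, valTol = 40;   // slider-style thresholds

void setup() {
  size(640, 480);
  cam = new Capture(this, width, height);
  cam.start();
  target = color(200, 60, 60);                 // placeholder palette entry
}

void draw() {
  if (!cam.available()) return;
  cam.read();
  cam.loadPixels();
  loadPixels();
  colorMode(HSB, 255);
  for (int i = 0; i < cam.pixels.length; i++) {
    color c = cam.pixels[i];
    // A pixel belongs to the "brush" if its hue, saturation, and value are
    // all within the chosen tolerances of the palette color. (Hue wraparound
    // is ignored to keep the sketch short.)
    boolean match = abs(hue(c) - hue(target)) < hueTol
                 && abs(saturation(c) - saturation(target)) < satTol
                 && abs(brightness(c) - brightness(target)) < valTol;
    pixels[i] = match ? color(0, 0, 255) : color(0, 0, 0);   // white-on-black mask
  }
  updatePixels();
  colorMode(RGB, 255);
}

The resulting binary mask is the kind of black-and-white image shown in the lower-left screen, which OpenCV's blob detection then searches for brush positions.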

Find the program here.

Find the source here. (extract it into of_preRelease_v0061_win_cb_FAT\apps\ folder )

Project 3: Stuck Pixel

by areuter @ 11:21 pm 10 May 2010

In this project I considered what it might be like to create an experience that contracts one’s perception instead of augmenting it. Stuck Pixel is an application that runs in the background while you carry out your daily activities on the computer. However, whenever you click a pixel (anywhere) on the screen, it becomes “stuck” at the color value it held when you clicked on it. Furthermore, the pixel can no longer be clicked on. After the stuck pixels become sufficiently bothersome, the user can save out the pixels to a BMP file before exiting through the application window.
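The original is written in C# with global mouse hooks (see the reference below), but the two per-click steps, sampling the clicked pixel's color and later writing the collection out as a BMP, can be sketched roughly like this in Java. The class and method names are hypothetical, and the global hook and click-blocking parts are omitted:

import java.awt.Color;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.imageio.ImageIO;

public class StuckPixelSketch {
    private final Robot robot;
    // Maps a screen coordinate (x, y packed into a long) to the color it
    // held at the moment it was first clicked.
    private final Map<Long, Color> stuck = new HashMap<>();

    public StuckPixelSketch() throws Exception {
        robot = new Robot();
    }

    // Called by whatever global-hook mechanism reports a click at (x, y).
    public void onGlobalClick(int x, int y) {
        stuck.putIfAbsent(((long) x << 32) | (y & 0xffffffffL),
                          robot.getPixelColor(x, y));
    }

    // Writes the collected pixels onto a black canvas and saves a BMP.
    public void saveBmp(int width, int height, File out) throws Exception {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (Map.Entry<Long, Color> e : stuck.entrySet()) {
            int x = (int) (e.getKey() >> 32);
            int y = e.getKey().intValue();
            if (x >= 0 && x < width && y >= 0 && y < height) {
                img.setRGB(x, y, e.getValue().getRGB());
            }
        }
        ImageIO.write(img, "bmp", out);
    }
}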

My intention was to create a visualization of a user's computer habits while at the same time prohibiting their most repetitive actions and eventually encouraging the user to seek an alternative, which, in a way, is actually a means of augmenting their experience. In the most extreme case, the resulting pixels could provide a visual depiction of addiction… Facebook or other social media, perhaps? Here's a quick simulated result of that scenario:

Here are some images from my own experience using the application:

Although it wasn’t my intention, I thought that the resulting image was interesting because it reminds me of a constellation:

The application was written in C# using global mouse hooks and other (kind of hackish) tricks.

References:

Processing Global Mouse and Keyboard Hooks in C#
By George Mamaladze
http://www.codeproject.com/KB/cs/globalhook.aspx?msg=2808928

cell phone intervention

by Mishugana @ 5:53 pm 29 March 2010


Some explanation/stream of consciousness. It needs work, but I really wanted to post something before tonight and I'm running out of time in which I can do that. I will fix up the rest of this on Wednesday night and make it clearer and much better written. (For now, though, I feel that something is better than nothing.) So here goes:

Some very quick writing I did during class:

This is a three-part experiment/digital intervention that aims to examine the relationship our cell phones have with communication. Cell phones are designed to enhance communication and give the user power, but they often distract the user from real-world communication, and this interruption isn't helpful: even when the added virtual communication is counted, the whole doesn't equal the sum of its parts.

Step one is to make people aware of the situation without affecting the situation. If people would only observe their own environment, there is a chance they would be enlightened; by showcasing the existing world that people live in, people will become more aware of the interruptions that happen on a day-to-day basis. By connecting GSM-activated LED keychains to a light siren and having it sit in a classroom, people will become more aware of the frequency of virtual interruptions. Even vibrating/silent calls and texts will set off the siren, so this is like the opposite of the Mosquito ringtone.

Step two would be to create fake ringtone devices that only make loud ringtone sounds and have them go off in interesting spaces and at interesting times. By taking the previous step further and actually changing a situation, I can cause people to question the very nature of these interruptions and why they are necessary. By placing the ringers in places like the ceiling, rafters, vents, or sewers, and having them go off when people wait for the bus, go shopping, or sit in an HCI interview, people will be forced to confront these issues.

The third and last step of cell phone rehabilitation is putting the power of cell phone communication back into the hands of the people. By connecting a car radio FM transmitter to a cell phone headset and setting my phone to auto-answer, I can broadcast anyone who calls my number (a number that will be advertised). Anyone who tunes into the frequency, or any radios set to the frequency and connected to speakers set up in places I choose, will hear anything that people want to say, uncensored, unleashing the real power of this device to the common people.

FaceFlip

by Max Hawkins @ 9:10 pm 15 March 2010

FaceFlip from Max Hawkins on Vimeo.

Since Chatroulette is all the rage these days, I decided to freak people out on the website by flipping their faces upside down.

The project was implemented as a plugin for CamTwist, a Mac webcam augmentation application, using OpenCV and Apple's Core Image.

The source is available on github:
http://github.com/maxhawkins/FaceFlip

AI Brushes

by xiaoyuan @ 2:05 am 10 March 2010


Intelligent digital paintbrushes using the Boids AI algorithm. 10 customizable “broishes” exhibiting flocking and goal-seeking behavior.

Project 3: Ghost trails

by rcameron @ 3:39 am 6 March 2010

OSX executable (~5MB)

I began playing with modifying the camera feed using openFrameworks and went through a bunch of effects. I eventually decided to create a ghost trail similar to something envisioned in Mark Weiser's 'The Computer for the 21st Century'. A frame is captured when movement is detected through background subtraction; the new frame is then alpha-blended with the original frame, and the ghost images fade out over time through the alpha blending.
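A minimal Processing-style sketch of that capture-and-blend loop might look like the following (the project itself is openFrameworks, so the structure, motion thresholds, and fade rates here are illustrative only):

import processing.video.*;

Capture cam;
PImage prev;                                   // previous frame, for background subtraction
ArrayList<PImage> ghostFrames = new ArrayList<PImage>();
ArrayList<Float> ghostAlphas = new ArrayList<Float>();
int motionThreshold = 40;                      // per-pixel brightness change that counts as movement
int motionPixelsNeeded = 2000;                 // changed pixels required to capture a ghost

void setup() {
  size(640, 480);
  cam = new Capture(this, width, height);
  cam.start();
}

void draw() {
  if (cam.available()) cam.read();
  image(cam, 0, 0);                            // live feed underneath

  // Background subtraction against the previous frame.
  if (prev != null) {
    cam.loadPixels();
    prev.loadPixels();
    int moved = 0;
    for (int i = 0; i < cam.pixels.length; i++) {
      if (abs(brightness(cam.pixels[i]) - brightness(prev.pixels[i])) > motionThreshold) moved++;
    }
    // On sufficient movement, capture the current frame as a new ghost.
    if (moved > motionPixelsNeeded && frameCount % 5 == 0) {
      ghostFrames.add(cam.get());
      ghostAlphas.add(180.0f);
    }
  }
  if (cam.width > 0) prev = cam.get();

  // Alpha-blend every ghost over the live frame, fading it a little each draw.
  for (int i = ghostFrames.size() - 1; i >= 0; i--) {
    float a = ghostAlphas.get(i) - 2;
    if (a <= 0) { ghostFrames.remove(i); ghostAlphas.remove(i); continue; }
    ghostAlphas.set(i, a);
    tint(255, a);
    image(ghostFrames.get(i), 0, 0);
  }
  noTint();
}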

I also played around with color averaging in an attempt to generate a rotoscoped effect, which started out like this.

I also implemented Sobel edge detection on top of it and ended up with a Shepard Fairey-ish Obama poster effect.

Project 3: Fireflies

by ryun @ 11:26 am 5 March 2010

IDEA
In this project I focused on the interaction between the screen (display) and everyday objects. In ubiquitous computing, one important aspect is that the system becomes invisible (it goes behind the wall) and people do not even notice that their gestures and actions are being tracked by the computer. I wanted to make a simple interaction following this idea: the viewer uses an everyday object, and the system subtly recognizes the human behavior and does something (i.e., shows a display accordingly). I had many objects in mind as candidates for this project, such as a laser pointer, a flashlight, a soap bubble stick, a mini fan, a paint brush, and so on. I decided to use a flashlight because it was the simplest way to make this happen (due to the time limitation).

PROCESS
I spent a while figuring out how to make this happen. I was thinking about using openFrameworks, but I had no experience with C++, so I decided to use Processing again. Luckily, a Japanese developer had built a library for communication between the Wiimote and an infrared light source via Bluetooth. It was not easy to understand and use at first, but after I spent some time with it, it turned out to be a pretty good library for this project.

APPLICATION
I believe this has huge potential. For this project I wanted to show one of its possibilities: an art installation for children. For example, there could be a screen projected onto the ceiling of a children's museum, and children could use flashlights to make their own fish or bird (as an avatar) and play with it. Here, as an example, I built circles as fireflies chasing the light.
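As a rough sketch of the chasing behavior only (the real project reads the flashlight's infrared position through the Wiimote library; here the mouse stands in for that tracked point, and all constants are made up):

int NUM = 40;
PVector[] pos = new PVector[NUM];
PVector[] vel = new PVector[NUM];

void setup() {
  size(800, 600);
  noStroke();
  for (int i = 0; i < NUM; i++) {
    pos[i] = new PVector(random(width), random(height));
    vel[i] = new PVector();
  }
}

void draw() {
  background(0);
  PVector light = new PVector(mouseX, mouseY);     // stand-in for the tracked IR point
  for (int i = 0; i < NUM; i++) {
    // Steer each firefly toward the light, with a little random wander.
    PVector steer = PVector.sub(light, pos[i]);
    steer.limit(0.4f);
    vel[i].add(steer);
    vel[i].x += random(-0.3f, 0.3f);
    vel[i].y += random(-0.3f, 0.3f);
    vel[i].limit(3);
    pos[i].add(vel[i]);
    // Flicker each firefly's brightness over time.
    fill(255, 255, 150, 120 + 100 * sin(frameCount * 0.1f + i));
    ellipse(pos[i].x, pos[i].y, 8, 8);
  }
}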

CONCLUSION
In class, I received good, constructive criticism that the display is not a perfect use of the technology. I should have spent more time working on it. Perhaps a color-sucking drawing tool or multi-user interaction would have made this more interesting. I would like to use the technology I learned here and expand on it for my final project.

Source Code (Processing)

fantastic elastic type

by davidyen @ 10:03 pm 3 March 2010

My file is too large to upload so I made this documentation video:

Notes:
I used Processing, Box2D (Dan Shiffman's pBox2d / Eric Jordan's jBox2D), and Ricard Marxer's Geomerative library. The letters are pressurized soft bodies.

I want to do something more with the project someday, like a music video or something. I’ll update this post if I get there.

David

Project 3: You Control Mario

by Nara @ 11:56 am

The Idea

For my project, I was inspired by this augmented reality version of the retro game Paratrooper. My first idea was to create an “augmented reality” 2-player Pong, but I decided not to pursue that because I was worried it had been done many times before and that the Pong implementation would not be challenging enough. Then I started thinking about what else could be done with games, and during my searching I found that there were some remakes of retro games that used camera input as the controls, but often the gestures they were using were not analogous to the gestures of the character in the game. I used that as my jumping-off point and decided I wanted to do something where the player could actually “become” the character, so that when they move, the character moves, and when they jump, the character jumps, etcetera. To make sure that there were analogous movements for all of the game’s controls, the game I decided to implement was Mario.

The Implementation

I knew almost straight away that this project was best implemented in C++ and openFrameworks, both because any OpenCV implementation would likely be much faster, and because there is a much larger library of open source games available for C++. (Golan gave me permission to hack someone else’s game code for this since there realistically was no time to implement Mario from scratch.) I even found a Visual Studio project for a Mario game I wanted to try, but I basically spent all of last Saturday trying to get Visual Studio and openFrameworks to work, to no avail. So, I ended up using Java and Processing for this project, which is one of the reasons why it isn’t as successful as it could’ve been (which I’ll discuss later). The source code for the Mario implementation I used is from here.

The program has three parts: the original Mario source code (which, other than making a couple of variables public, was untouched); a Processing PApplet that sets up an OpenCV camera input and renders it to the screen when called; and a package of classes for an event listener I created myself to do some motion detection and send the right signals to the game to control the character's movements. In essence, when it detects movement in a certain direction, it tells the game that the corresponding arrow key was pressed so that the character responds.
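As a sketch of how that bridge could work: the class, method names, and dead-zone value below are hypothetical, not the actual listener. The post only says the listener reports an arrow-key press to the game; java.awt.Robot is one simple way to inject such a key event, though the real listener may call the game's key handler directly.

import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.KeyEvent;

public class MotionKeyBridge {
    private final Robot robot;

    public MotionKeyBridge() throws AWTException {
        robot = new Robot();
    }

    /**
     * dx, dy: average motion of the tracked face/body this frame, in pixels
     * (hypothetical values supplied by the OpenCV motion listener).
     */
    public void onMotion(float dx, float dy) {
        int deadZone = 10;                        // ignore small jitter
        if (dx > deadZone)  tap(KeyEvent.VK_RIGHT);
        if (dx < -deadZone) tap(KeyEvent.VK_LEFT);
        if (dy < -deadZone) tap(KeyEvent.VK_UP);  // moving up = jump
    }

    private void tap(int keyCode) {
        robot.keyPress(keyCode);
        robot.keyRelease(keyCode);
    }
}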

The Problems

First of all, the OpenCV library for Processing is pretty bad. It’s not a full implementation (it doesn’t do any real motion detection), the documentation is pretty vague and not helpful, and I even read somewhere that it has a memory leak. Just running OpenCV in a Processing applet has a slight lag. Also, I wanted to use full body tracking for the motion detection (my ultimate goal if I got it to work was to use this implementation with a port of Mario War, a multiplayer version of Mario, although I never got that far) but the body tracker was extremely buggy and would lose the signal very often, so I ended up just using the face detector, which was the least buggy.

Using a combination of the Mario game (which is implemented in a JFrame) and a PApplet together in the same window also doesn’t really work well. I read somewhere that even without OpenCV, the fastest framerate you can get when using both a JFrame and a PApplet together is about 30fps.

Because of the combination of all of these factors, even though the game technically works (it can pick up the movements and Mario will respond accordingly), there is a big lag between when the user moves, the camera detects it, the motion event listener is called to action, and Mario moves — usually at least 1-2 seconds if not longer. The consequence is that the user is forced to try to anticipate what Mario will need to do 2 seconds from now, which on a static level is not too bad, but on a level with a lot of enemies, it’s almost impossible. I still haven’t been able to make it more than 2/3 of the way through a level.

The Merits

Even though my implementation wasn’t working as well as I would’ve liked, I’m still really proud of the fact that I did get it working — I’m pretty sure the problem isn’t so much with the code as it is with the tools (Java and Processing and the OpenCV for Processing library). I know that there’s room for improvement, but I still think that the final product is a lot of fun and it certainly presents itself as an interesting critique of video games. I’m a hardcore gamer myself (PS3 and PC) but sometimes it does bother me that all I’m doing is pressing some buttons on a controller or a keyboard, so the controls are in no way analogous to what my avatar is doing. Hopefully Project Natal and the Sony Motion Controller will be a step in the right direction. I have high hopes for better virtual reality gaming in the future.

The code is pretty large — a good 20-30MB or so, I think — so I’ll post a video, though probably not until Spring Break.

Project 3: Makes You Dance and Sing

by jsinclai @ 10:46 am

I had given up on my first idea for this project when I just didn’t feel it was interesting (or appropriate) anymore. I was looking around for some motivation and looked at my older projects. I saw my Nees sketch and wondered what it would be like to throw a webcam feed on it. Of course, though, I couldn’t have the same interaction with the mouse position. I wanted this project to deal with people on their feet!

And so, this project started as an installation that responded to the audience. When there is movement in the audience, the display would shake around and go crazy. I played around with a bunch of different forms, sizes, and representations.

STIA Project 3 – Submission from Jordan Sinclair on Vimeo.

I was fairly pleased with how these worked, but felt that it still lacked some “Jordan.”

Then someone walked into my room and started singing and dancing to some music I was listening to. I got up as well, and we goofed around for a bit. The video started going crazy, and we could see ourselves having lots of fun. That's when it clicked! I want to encourage people to be in this elated, excited state. Instead of rewarding people for being sedentary, I want to reward people for being active and having fun. This ties back to my first project about Happy Hardcore, the music that I love so dearly. It's dance music! It's music that screams "Get up on your feet! Dance and sing along!"

And so I flipped the interaction paradigm around. When you’re having fun (dancing and singing) you can see yourself having fun. When you’re still, you don’t really see anything.

STIA Project 3 – Submission from Jordan Sinclair on Vimeo.

Some implementation details:
-I use frame differencing to detect movement.
-Every frame "fades" out. This creates a background that is something other than plain grey. When the background is just grey, there is usually not enough data on screen to make an interesting display; you see a few crazy blocks and that's it.
-Movement alone cannot bring the display into focus. Movement is the big motivator (more people dance than sing), so it accounts for about 70% of the "focus" (e.g., if there is "maximum" movement, the display is 70% in focus). But if you want to achieve full focus, you need to sing along as well (or at least make some noise)! A rough sketch of this focus calculation follows below.
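Here is a rough Processing sketch of that focus calculation; the capture setup, the audio library, and all scaling constants are assumptions (the original may use a different audio library and a very different rendering):

import processing.video.*;
import processing.sound.*;     // assumed audio source; the original may use Minim or ESS

Capture cam;
PImage prev;
Amplitude mic;
AudioIn in;

void setup() {
  size(640, 480);
  cam = new Capture(this, width, height);
  cam.start();
  in = new AudioIn(this, 0);
  in.start();
  mic = new Amplitude(this);
  mic.input(in);
}

void draw() {
  if (cam.available()) cam.read();

  // Movement: average frame difference, normalized to roughly 0..1.
  float motion = 0;
  if (prev != null) {
    cam.loadPixels();
    prev.loadPixels();
    float diff = 0;
    for (int i = 0; i < cam.pixels.length; i += 4) {        // sample every 4th pixel
      diff += abs(brightness(cam.pixels[i]) - brightness(prev.pixels[i]));
    }
    motion = constrain(diff / (cam.pixels.length / 4) / 60.0f, 0, 1);
  }
  if (cam.width > 0) prev = cam.get();

  // Sound: microphone level, normalized to roughly 0..1.
  float sound = constrain(mic.analyze() * 8, 0, 1);

  // Movement can only ever supply 70% of the focus; singing tops it up.
  float focus = 0.7f * motion + 0.3f * sound;

  // Stand-in rendering: the live feed fades in as focus rises.
  background(80);
  tint(255, focus * 255);
  image(cam, 0, 0);
  noTint();
}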

TODO:
-Make a full screen version and hook it up to a projector!
-Frame differencing maps linearly to "focus" values. I need to rescale this mapping around the differencing values that are most common.
-The audio detection isn't as solid as it could be. It should certainly be more inviting so that users know what the audio does. I would also like to implement some sort of "fade in/out focus offset." Currently, the audio only creates focus while you make a noise, so you lose focus in between every word you sing.
-The colors are a little dulled out. Maybe it’s just the lighting, or maybe I can do something to help.

Jon Miller – Project 3

by Jon Miller @ 10:45 am

Project 3

A zip file containing the source code/executable: link
If you wish to get it working in a playable fashion, please contact me. Thanks.

Concept
I chose the input device fairly early on – I figured, given my limited options (microphone, webcam, mouse, keyboard), that the webcam would be the most interesting for me, and new territory.

Having seen several projects where the user puts himself into awkward or interesting positions to propel the exhibit, I wanted to create something that forced the user (perhaps against his will) into some embarrassing positions. I decided that a game would be best, using a person's natural inclination to win to force them to compromise themselves in front of the rest of the class.

I wanted to create a game where a person, using their finger, hand, or entire body, would propel themselves through water by creating snakelike or swimming motions. The idea would be that there would be a viscous fluid that the onscreen representation of one’s hand would push against, and if done right, would propel the user in a direction, similar to the way a snake uses frictional forces against the ground to travel.

Execution
Tracking hands or bodies in a convenient, consistent way across variable lighting conditions and backgrounds proved too daunting a challenge for me, especially since there would be no consistent position for them. So I decided to use brightly colored stuffed fish as the game controllers: their colors were such that it was unlikely anything else in the room would have the same hue, which let me parse their shapes and locations relatively easily.

Secondly, implementing the viscous fluid for them to swim through was also too ambitious, and I settled on the on-screen fish moving around based on the physical fish's location.

In the end, it was a rush to get something that would maximize fun and increase my chances of delivering a workable product to the table by Wednesday. I changed the game to have the two fish (represented onscreen by the amorphous blobs of whatever the webcam was able to detect) shoot at each other, with sound effects and explosions to maximize fun/silliness.

I was able to (eventually) implement an algorithm that calculated the fish’s “pointiest” area, allowing the user to shoot lasers from that point – this meant that if the users pinched the fish in just the right way, they could achieve a certain degree of aiming, and by moving the fish around, they could dodge incoming lasers to a degree.
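One simple way to compute such a "pointiest" point, sketched here in Processing-style code rather than taken from the actual source, is to take the blob's contour and pick the point farthest from the centroid; the laser direction then runs from the centroid through that point:

PVector pointiest(ArrayList<PVector> contour) {
  if (contour.isEmpty()) return null;

  // Centroid of the contour points.
  PVector c = new PVector();
  for (PVector p : contour) c.add(p);
  c.div(contour.size());

  // Farthest contour point from the centroid.
  PVector best = contour.get(0);
  float bestD = 0;
  for (PVector p : contour) {
    float d = PVector.dist(p, c);
    if (d > bestD) { bestD = d; best = p; }
  }
  return best;
}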

Conclusions (post presentation)
Although it was not what I expected, the class seemed to enjoy watching the battle, and the participants were sweating with exertion, so I feel I was able to at least capture the attention of people. I liked that the user was given a certain degree of control that was novel to a computer game (if it can even be called that) – I felt this provided a gameplay mechanic and level of complexity to an otherwise simple game.

Project 3: Musical Typing

by jedmund @ 10:44 am

placeholder

Project 3 – Trace Modeler

by Karl DD @ 10:29 am

Concept

Trace Modeler is an experiment in using real-time video to create three-dimensional geometry. The silhouette of a foreground object is subtracted from the background and used as a two-dimensional slice. At user-defined intervals, new slices are captured and displaced along the depth axis. My motivation is to create new interfaces for digital fabrication. The geometry created can be exported as an STL mesh, ready for 3D printing.
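A minimal Processing-style sketch of that slicing loop is below; the project itself is openFrameworks, so this is only an illustration of the idea, with background subtraction and contour extraction reduced to a simple brightness threshold against a captured background frame, and all constants invented:

import processing.video.*;

Capture cam;
PImage bgFrame;                                  // captured reference background
ArrayList<ArrayList<PVector>> slices = new ArrayList<ArrayList<PVector>>();
int sliceInterval = 60;                          // frames between captured slices
float sliceSpacing = 20;                         // displacement along the depth axis

void setup() {
  size(640, 480, P3D);
  cam = new Capture(this, 160, 120);
  cam.start();
}

void keyPressed() {
  if (key == 'b' && cam.width > 0) bgFrame = cam.get();   // grab the empty background
}

void draw() {
  if (cam.available()) cam.read();

  // Every sliceInterval frames, capture the silhouette as a 2D point set.
  if (bgFrame != null && frameCount % sliceInterval == 0) {
    cam.loadPixels();
    bgFrame.loadPixels();
    ArrayList<PVector> slice = new ArrayList<PVector>();
    for (int y = 0; y < cam.height; y++) {
      for (int x = 0; x < cam.width; x++) {
        int i = y * cam.width + x;
        if (abs(brightness(cam.pixels[i]) - brightness(bgFrame.pixels[i])) > 40) {
          slice.add(new PVector(x, y));
        }
      }
    }
    slices.add(slice);
  }

  // Render each slice displaced along the depth axis by its capture order.
  background(0);
  translate(width / 2 - cam.width * 2, height / 2 - cam.height * 2, -200);
  stroke(255);
  for (int s = 0; s < slices.size(); s++) {
    for (PVector p : slices.get(s)) {
      point(p.x * 4, p.y * 4, -s * sliceSpacing);
    }
  }
}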


Related Work

Trace Modeler is related to slit-scan photography in that slices of information (in this case a silhouette) are used and built up over time. In particular, Tamás Waliczky & Anna Szepesi's Sculptures uses silhouettes of performers to create 3D forms.

I have also been considering approaches for how the forms created can be fabricated. One approach is to use laser-cut planar materials such as the work of John Sharp. Below is one example using radial slices to form the outline of a head.

The ‘Hand-Made’ project by Nadeem Haidary uses video frames of hands to create laser-cutter ready patterns for fabrication using slice-forms.


Trace Modeler

The videos below show how the geometry can be created using anything from shapes drawn on paper, to physical objects, to parts of the body such as hands.


Output

Geometry can be exported as an .STL file (thanks to the ofxSTL library!). This opens up a number of possibilities for working with the mesh in other software, as well as for fabrication using a 3D printer. Here is a screenshot of an exported mesh in a 3D viewer.


Reflection

Processing the geometry was a little more complicated than I thought it would be. There are still problems with the normals that need to be fixed. The mesh resolution is also based on the first slice, with all subsequent slices resampled to match that number of points.

I am interested in implementing radial displacement in the future, in a similar way to the Spatial Sketch project I worked on last year. This would allow the construction of a more diverse range of forms.

I envision this as a general purpose tool that can be used to create 3d forms in an unconventional way. With careful consideration of camera placement, lighting and the objects used I think there are some interesting things that can be done with this as a tool. I am interested in releasing it in the near future to see what people make.

Project 3: Creature Physics

by guribe @ 9:55 am

Watch the demo here.

Where the idea came from

I originally wanted this project to include physics in a playful way like in the project Crayon Physics. Later, when looking at examples of interactive projects during class, I noticed the various experiences that could be created just through drawing on a screen. I eventually decided to combine the two ideas (physics and drawing) with this project.

My work process

Programming for this project was difficult for me. I ended up looking at various examples of source code from the Processing website. After figuring out the code, I was slowly able to implement my own physics and spring simulators.
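For reference, the core step of such a spring simulator is small. This is a generic Processing-style sketch, not the author's code: it applies Hooke's law to each spring and then integrates gravity and damping with simple Euler steps.

class Particle {
  PVector pos, vel;
  Particle(float x, float y) { pos = new PVector(x, y); vel = new PVector(); }
}

class Spring {
  Particle a, b;
  float restLength;
  float k = 0.1f;                           // spring stiffness
  Spring(Particle a, Particle b, float restLength) {
    this.a = a; this.b = b; this.restLength = restLength;
  }
  void update() {
    PVector dir = PVector.sub(b.pos, a.pos);
    float stretch = dir.mag() - restLength; // positive when stretched
    dir.normalize();
    dir.mult(k * stretch);                  // Hooke's law: force proportional to stretch
    a.vel.add(dir);                         // pull a toward b...
    b.vel.sub(dir);                         // ...and b toward a
  }
}

void step(ArrayList<Particle> particles, ArrayList<Spring> springs) {
  for (Spring s : springs) s.update();
  for (Particle p : particles) {
    p.vel.add(0, 0.2f);                     // gravity
    p.vel.mult(0.98f);                      // damping
    p.pos.add(p.vel);
  }
}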

My self-critique

Although the project is engaging and fun, it still feels unfinished to me. The “scribble monsters” could be more interesting by changing their expressions or by interacting with one another. I believe I could easily take what I have now and eventually create something more engaging like a game or phone/iPad application.

Project 3: Spider Eggs(-ish)

by Michael Hill @ 4:33 am


This was an attempt at creating an interface that would "spin" the user's drawing around a defined object. I drew inspiration from the way a spider spins silk around its prey.

Download and run it here.

Successes

Overall, I would consider this project a success.  It functions more or less the way I want, with the exception of a few aesthetic problems.

Issues

1. Colors get distorted at the top and bottom of the shape. I think this has to do with how colors are randomized and limited.

2. I tried adding lighting to the object, but because it is made out of lines, the shadows don't work very well. If I were to continue work on it, I would like to try drawing shapes instead of lines. Doing so would allow me to outline the shapes with a different color, making each "stroke" more visible.

3. controlP5 fails to work on some of the sliders when the sketch is placed online. I think this is due to how it handles bindings to functions and variables.

**UPDATES**

1. The application now has instructions.

2. The program doesn't seem to work very well online. I think it has to do with how controlP5 handles binding. In light of this, I now have a Mac executable available for download: spiderNest.zip

speaker

by Cheng @ 4:30 am

speaker is an interactive gadget that sculpts wires of sound as people around it talk.

Inspirations
The idea started from a discussion about interactive fabrication with Golan and Karl, when we brainstormed what we could take from real life to inform the creation of artifacts. Later, when I saw Peter Cho's takeluma, I decided to make a machine that physically makes the shape of sound.

takeluma

For a while I considered cutting/extruding pasta into tasty sound, but food safety and dough feeding made it a bit difficult given the time I had (I still hope to do it someday!). Wire bending, at the same time, has an interesting play with the shape of the sound wave. It also offers some unexpectedness as the wire extrudes and bends into form.

I collected some manual and commercial wire-bending examples and came up with my design. A stepper drives a pair of rubber wheels that push the wire forward. A servo waits at the end and busily turns around, bending the wire to various angles. Extra supports and guides are added to keep the wire flying without tangling.

Implementation
Material List

  • servo motor
  • stepper motor
  • toy car wheel with rubber tire
  • hardboard as a mounting base, part of which is covered with a polythene sheet to reduce friction
  • rolls of metal wire
  • copper tubing as a wire guide, and piano wire as a wire bender
  • microphone, op amp, H-bridge, and Arduino board
The whole system

Wire flow

A test of "speaking" arcs

speakCurve from Cheng on Vimeo.

Sound is a rich source of data; you can pick volume, pitch, tempo, timbre (signal-to-noise ratio, emotional impact…), or any combination of them, and map them into shapes. In this prototype, I picked volume. As the user speaks into the mic, the Arduino compares the averaged volume of small sections of time. For a rising/falling value, the servo bends the wire CCW/CW respectively.

Future work

A lot of time was devoted to separating the power sources for the microphone, stepper, and servo so that they don't interfere; there are still issues with the stepper. One major problem of the system is real-time response. The default Arduino stepper control is blocking, so sound sampling is paused while the stepper turns; even one step at a time breaks the flow. I need to find another control strategy and an optimal update rate.

Beyond the engineering issues, I would also like to consider where this system will go. Would it be a real-time jewelry maker? A toy? An exhibition piece? Would it be interactive? Real-time interactive? Or just a wall of names and corresponding bent wire? Could the wire be bent into a 3D labyrinth? Could the project be scaled up to generate public sculpture? Or be a kinetic sculpture itself (a snake robot?)…

Project 3: [Chocolate, Chocolate, Add Some Milk] [Kuan-Ju]

    by kuanjuw @ 12:44 am

    Concept

There is a game called "chocolate chocolate add some milk" which is played by four to five people standing in a line.
It starts with the first player, who does some moves in the rhythm of "chocolate chocolate add some milk." After the first round, the second player duplicates the first player's move while the first player is creating a new move. The third player then duplicates the second player's move, and so on.

In this project I used a webcam to record the moves. After each round, it plays the frames that have been recorded right next to the real-time frames, and then does the video cascade for the remaining players.
For the implementation, I first captured the frames from the webcam and drew them in a number of rectangles, saving them into an image array at the same time. Finally, I cascaded the videos in sequence with a 100-frame delay for each video.
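A rough Processing sketch of that delayed cascade, using a ring buffer of frames; the frame sizes, delay, and number of columns are illustrative, not taken from the original:

import processing.video.*;

Capture cam;
int delayPerPlayer = 100;           // frames of delay between columns
int numPlayers = 4;
PImage[] buffer = new PImage[delayPerPlayer * (numPlayers - 1) + 1];
int head = 0;                       // index of the most recent frame

void setup() {
  size(640, 120);
  cam = new Capture(this, 160, 120);
  cam.start();
}

void draw() {
  if (!cam.available()) return;
  cam.read();

  // Store the newest frame in the ring buffer.
  head = (head + 1) % buffer.length;
  buffer[head] = cam.get();

  // Column 0 is live; column k shows the frame from k * delayPerPlayer frames ago.
  for (int k = 0; k < numPlayers; k++) {
    int idx = (head - k * delayPerPlayer + buffer.length * numPlayers) % buffer.length;
    PImage frame = buffer[idx];
    if (frame != null) image(frame, k * 160, 0);
  }
}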

    Untitled from kuanjuwu on Vimeo.

The project uses a webcam and a projector. Good lighting conditions are required.

I wore a black shirt and gloves to enhance the quality of the video capture.

    Augmenting with Optical Flow

    by paulshen @ 12:16 am

    http://in.somniac.me/2010/03/03/augmenting-optical-flow/

    Project 3 – The Secret Word of the Day

    by sbisker @ 9:15 pm 2 March 2010

    I’m interested in interactions that people can have with digital technology in public spaces. These ideas are not new, but digital technology has only recently reached the cost, effectiveness and social acceptability where someone can actually turn a crazy idea about a public interaction with computers into a reality.

    As soon as I heard this project was about “real-time interactions”, I got it in my head to try to recreate the “Secret Word” skit from the 1980’s kids show “Pee Wee’s Playhouse.” (Alas, I couldn’t find any good video of it.) On the show, whenever a character said “the secret word” of each episode, the rest of the cast of characters (and most of the furniture in the house, and the kids at home) would “scream real loud.” Needless to say, the characters tricked each other into using the secret word often, and kids loved it. The basic gist of my interaction would be that a microphone would be listening in on (but not recording, per privacy laws) conversations in a common space like a lab or lobby, doing nothing until someone says a very specific word. Once that word was said, the room itself would somehow go crazy – or, one might say, “scream real loud.” Perhaps the lights would blink, or the speakers would blast some announcement. Confetti would drop. The chair would start talking. Etcetera.

    Trying to build this interaction in two weeks is trivial in many regards, with lights being controlled electronically and microphones easily (and often) embedded in the world around us. However, one crucial element prevents the average Joe from having their own Pee Wee’s Playhouse – determining automatically when the secret word has been said in a casual conversation stream. (Ok, ruling out buying humanoid furniture.) For my project, I decided to see if off-the-shelf voice recognition technology could reliably detect and act on “secret words” that might show up in regular conversation.

In order to ensure that the system could be used in a public place, I needed a voice recognition system that didn't require training. I also needed something that I could learn and play with in a short amount of time. After ruling out open-source packages like CMU Sphinx, I decided to experiment with the commercial packages from Tellme, and specifically, their developer toolkit (Tellme Studio). Tellme, a subsidiary of Microsoft, provides a platform for designing and hosting telephone-based applications like phone trees, customer service hotlines and self-service information services (such as movie ticket service Fandango).

    Tellme Studio allows developers to design telephone applications by combining a mark-up development language called VoiceXML, a W3C standard for speech recognition applications, with Javascript and other traditional web languages. Once applications are designed, they can be accessed by developers for testing purposes over the public telephone network from manually assigned phone numbers. They can also be used for public-facing applications by routing calls through a PSTN-to-VoIP phone server like Asterisk directly to Tellme’s VoIP servers, but after much fiddling I found the Tellme VoIP servers to be down whenever I needed them – so for now, I thought I’d prototype my service using Skype. Fortunately, the number for testing Tellme applications is a 1-800 number, and Skype offers free 1-800 calls, so I’ve been able to test and debug my application over Skype free of charge.

    So how would one use a phone application to facilitate an interaction in public space? The “secret word” interaction really requires individuals to not have to actively engage a system by dialing in directly – and telephones are typically used as very active, person to person communication mediums. Well, with calls to Tellme free to me (and free to Tellme as well if I got VoIP working), it seemed reasonable that if I could keep a call open with Tellme for an indefinite amount of time, and used a stationary, hidden phone with a strong enough microphone, I could keep an entire room “on the phone” with Tellme 24 hours a day. And since hiding a phone isn’t practical (or acceptable) for every iteration of this work, I figured I could test my application by simply recording off my computer microphone into a Skype call with my application in a public setting (say, Golan conversing with his students during a workshop session.)

Success! It falsely detects the word "exit", but doesn't quit.

    In theory, this is a fantastic idea. In practice, it’s effective, but more than a little finicky. For one, Tellme is programmed to automatically “hang up” when it hears certain words that it deems “exit words.” In my first tests, many words in casual conversation were being interpreted as the word “exit”, quitting the application within 1-2 minutes of consistent casual conversation. Rather than try to deactivate the exit words feature entirely, I found a way to programmatically ignore exit events if the speech recognition’s confidence in the translation was below a very high threshold (but not so high that I couldn’t say the word clearly and have it still quit.) This allowed my application to stay running and translating words for a significant amount of time.

A bit of Tellme code, using Javascript to check the detected word against today's secret word

    Secondly, a true “word of the day” system would need to pick (or at least be informed of) a new word to detect and act on each day. While the Tellme example code can be tweaked to make a single word recognition system in 5 minutes, it is harder (and not well documented) how to make the system look for and detect a different word each day. The good news is, it is not difficult for a decent programmer to get the system to dynamically pick a word from an array (as my sample code does) and have it only trigger a success when that single word in the array is spoken. Moreover, this word can be retrieved over an AJAX call, so one could use a “Word of the Day” service through a site like Dictionary.com for this purpose (although I was unable to get corporate access to a dictionary API in time.) The bad news is, while VoiceXML and Tellme code can be dynamically updated at run-time with Javascript, the grammars themselves are only read in at code compile time. Or, translated from nerd speak, while one can figure out what words to SAY dynamically, one needs to prep the DETECTION with all possible words of the day ahead of time (unless more code is written to create a custom grammar file for each execution of the code). Unfortunately, the more words that are added to a grammar, the less effective it is at picking out any particular word in that grammar – so one can’t just create a grammar with the entire Oxford English dictionary, pick a single word of the day out of the dictionary and call it a day. So in my sample code, I give a starting grammar of “all possible words of the day” – the names of only three common fruits (apple, banana and coconut). I then have the code at compile time select one of those fruits at random, and then once ANY fruit is said, the name of that fruit is compared against the EXACT word of the day. However, server-side programming would be needed to scale this code to pick and detect a word of the day from a larger “pool” of possible words of the day.

    Finally, a serious barrier to using Tellme to act on secret words is the purposes for which the Tellme algorithm is optimized. There is a difference between detecting casual conversation, where words are strung together, versus detecting direct commands, such as menu prompts and the sorts of things one normally uses a telephone application for – and perhaps understandably, Tellme optimizes their system to more accurately translate short words and phrases, as opposed to loosely translating longer phrases and sentences. I experimented with a few ways of trying to coerce the system to treat the input as phrases rather than sentences, including experimenting with particular forms versus “sentence prompt” modes, but it seems to take a particularly well articulated and slow sentence for a system to truly act on all of the words in that sentence. Unfortunately, this particular roadblock is one that may be impossible to get around without direct access to the Tellme algorithm (but then again, I’ve only been at it for 2 weeks.)

    "Remember kids, whenever you hear the secret word, scream real loud."

    "Remember kids, whenever you hear the secret word, scream real loud."

    In summary – I’ve designed a phone application that begins to approximate my “Secret Word of the Day” interaction. If I am talking in a casual conversation, a Skype call dialed into my Tellme application can listen and translate my conversation in real time, interjecting with (decent but not great) accuracy with “Ahhhhhhh! You’ve said the secret word of the day!” in a strangely satisfying text-to-speech voice. Moreover, this application has the ability to change the secret word dynamically (although right now the secret word is reselected for each phone call, rather than “of the day” – changing that would be simple.) All in all, Tellme has proven itself to be a surprisingly promising platform for enabling public voice recognition interactions around casual conversation. It is flexible, highly programmable, and surprisingly effective at this task with very basic tweaking (in my informal tests, picking up on words I say in sentences about 50% of the time) despite Tellme being highly optimized for a totally different problem space.

Since VoiceXML code is pretty short, I've gone ahead and posted my code below in its entirety: folks interested in making their own phone applications with Tellme should be heartened by the inherent readability of VoiceXML and the fact that the "scary looking" parts of the markup can, by the by, be simply ignored and copy-pasted from sample code. That said, this code is derived from sample code, which is copyrighted by Tellme Networks and Microsoft, and should only be used on their service – so check yourself before you wreck yourself with this stuff. Enjoy!

<?xml version="1.0" encoding="UTF-8"?>

<!--
Solomon Bisker - The Secret Word of The Day
Derived from Tellme Studio Code Example 102
Copyright (C) 2000-2001 Tellme Networks, Inc. All Rights Reserved.
THIS CODE IS MADE AVAILABLE SOLELY ON AN "AS IS" BASIS, WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION,
WARRANTIES THAT THE CODE IS FREE OF DEFECTS, MERCHANTABLE, FIT FOR A
PARTICULAR PURPOSE OR NON-INFRINGING.
-->

<vxml version="2.0">

  <!-- Does TellMe REALLY support javascript? We'll see. -->
  <!-- <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.js"/> -->

  <var name="awotd"/>

  <script>
    var myWords = new Array("apple", "banana", "coconut");
    // gives us a random number between 0 and 2. This uniquely determines
    // our "secret word of the day"
    var randomnumber = Math.floor(Math.random()*3);
    awotd = myWords[randomnumber];
  </script>

  <!-- Shortcut to a help statement in both DTMF and Voice (for testing) -->
  <!-- document-level link fires a help event -->
  <link event="help">
    <grammar mode="dtmf" root="root_rule" tag-format="semantics/1.0" type="application/srgs+xml" version="1.0">
      <rule id="root_rule" scope="public">
        <item>2</item>
      </rule>
    </grammar>
    <grammar mode="voice" root="root_rule" tag-format="semantics/1.0" type="application/srgs+xml" version="1.0" xml:lang="en-US">
      <rule id="root_rule" scope="public">
        <item weight="0.001">help</item>
      </rule>
    </grammar>
  </link>

  <!-- The "Eject Button." document-level link quits -->
  <link event="event.exit">
    <grammar mode="voice" root="root_rule" tag-format="semantics/1.0" type="application/srgs+xml" version="1.0" xml:lang="en-US">
      <rule id="root_rule" scope="public">
        <one-of>
          <item>exit</item>
        </one-of>
      </rule>
    </grammar>
  </link>

  <catch event="event.exit">
    <if cond="application.lastresult$.confidence &lt; 0.80">
      <goto next="#choose_sat"/>
    <else/>
      <audio>Goodbye</audio>
      <exit/>
    </if>
  </catch>

  <!-- Should I take out DTMF mode? Good for testing at least. -->
  <form id="choose_sat">
    <grammar mode="dtmf" root="root_rule" tag-format="semantics/1.0" type="application/srgs+xml" version="1.0">
      <rule id="root_rule" scope="public">
        <one-of>
          <item>
            <item>1</item>
            <tag>out.sat = "sat";</tag>
          </item>
        </one-of>
      </rule>
    </grammar>

    <!-- The word of the day is either "processing" (static) or the word of the day from our array/an API -->
    <grammar mode="voice" root="root_rule" tag-format="semantics/1.0" type="application/srgs+xml" version="1.0" xml:lang="en-US">
      <rule id="root_rule" scope="public">
        <one-of>
          <!-- The dynamic word of the day -->
          <!-- WE CANNOT MAKE A DYNAMIC GRAMMAR ON PURE CLIENTSIDE
               DUE TO LIMITATIONS IN SRGS PARSING. WE MUST TRIGGER ON ALL THREE
               AND LET THE TELLME ECMASCRIPT DEAL WITH IT. -->
          <item>
            <one-of>
              <item>
                <one-of>
                  <item>
                    apple
                    <!-- loquacious -->
                  </item>
                </one-of>
              </item>
            </one-of>
            <tag>out.sat = "apple";</tag>
            <!-- <tag>out.sat = "loquacious";</tag> -->
          </item>
          <item>
            <one-of>
              <item>
                <one-of>
                  <item>banana</item>
                </one-of>
              </item>
            </one-of>
            <tag>out.sat = "banana";</tag>
          </item>
          <item>
            <one-of>
              <item>
                <one-of>
                  <item>coconut</item>
                </one-of>
              </item>
            </one-of>
            <tag>out.sat = "coconut";</tag>
          </item>
          <!-- The static word of the day (for testing) -->
          <item>
            <one-of>
              <item>
                <one-of>
                  <item>processing</item>
                </one-of>
              </item>
            </one-of>
            <tag>out.sat = "processing";</tag>
          </item>
        </one-of>
      </rule>
    </grammar>

    <!-- this form asks the user to choose a department -->
    <initial name="choose_sat_initial">
      <!-- dept is the field item variable that holds the return value from the grammar -->
      <prompt>
        <audio/>
      </prompt>
      <!-- User's utterance didn't match the grammar -->
      <nomatch>
        <!-- <audio>Huh. Didn't catch that.</audio> -->
        <reprompt/>
      </nomatch>
      <!-- User was silent -->
      <noinput>
        <!-- <audio>Quiet, eh?</audio> -->
        <reprompt/>
      </noinput>
      <!-- User said help -->
      <help>
        <audio>Say something. Now.</audio>
      </help>
    </initial>

    <field name="sat">
      <!-- User's utterance matched the grammar -->
      <filled>
        <!-- HERE ECMASCRIPT CHECKS FOR A WORD MATCH -->
        <if cond=" sat == awotd ">
          <audio>I heard you say <value expr="awotd"/></audio>
          <goto next="#sat_dept"/>
        <!-- from old code -->
        <elseif cond=" sat == 'processing' "/>
          <goto next="#shortword_dept"/>
        <!-- Wrong word in grammar was said, spit back into main loop. -->
        <else/>
          <audio>You're close!</audio>
          <goto next="#choose_sat"/>
        </if>
      </filled>
    </field>
  </form>

  <form id="sat_dept">
    <block>
      <audio>Ahhhhhhh! You've said the secret word of the day!</audio>
      <goto next="#choose_sat"/>
    </block>
  </form>

  <form id="shortword_dept">
    <block>
      <audio>That's a nice, small word!</audio>
      <goto next="#choose_sat"/>
    </block>
  </form>

</vxml>

    Project 3: “Dandelion”

    by aburridg @ 6:12 pm

    Download a zip containing an executable of my project. To run it, unzip the downloaded file and click on the executable file named “proj3”:
    For Macs
    For Windows

    Here is a video of my art project. I placed it running in the Wean 5207 Linux Cluster for about 5 minutes. The video is sped up (so it’s like a little time lapse):

    Inspiration
I knew I wanted to do something with audio at some point, since I had never worked with audio at the code level before. The idea to use a dandelion came from a dream, and because as a kid I was addicted to exploding dandelions. Another aspect I took into account was where I would ideally place this piece (if I ever continued with it and decided to primp it up). I would most likely put this in a generally quiet location: a museum, a library, a park. Hopefully it would encourage people to interact with it by being loud (since this piece is more interesting the more sound you make).

    How the Project Works with the User
You start out with an intact dandelion. I included noise waves in the background because I thought it was cool and would, if I ever did exhibit this piece someplace quiet, hopefully give my audience a clue as to how to interact with it. When you make a loud enough noise, the petals come off and float based on how loud you continue to be. If it is dead quiet, the petals stay at the bottom; otherwise they ride on a simulated "wind" that is determined by the sound levels.

Project's Shortcomings
I also realize that this project is not very visually pleasing… and it runs a little slowly if all the petals come off at once. I know this is because of Processing, and probably because I'm keeping track of too many variables at a time. Also, I know it is visually a little dull… if I had more time I would probably have the dandelions tilt more due to the forces and make the stem move as well.

    If I had a lot more time, I would make the dandelions more complicated and probably try in 3D. I would also probably try to port it to Open Frameworks. But, for a small project…Processing seemed like a good choice.

    ESS Library
For this project I used the ESS library (found here) to interpret real-time audio input from my laptop's built-in microphone.

    I also borrowed the basis of the code from Shiffman’s Flow.

    Coding Logistics
Instead of having a 2D array of vectors (so that the canvas was split up into a grid) like Shiffman, I used only a 1D array of vectors (splitting the canvas into columns). If a dandelion petal is within a column, it follows the flow vector of that column. To determine the flow vector of a column, I used the audio input: the angle and magnitude of the vector are determined by how loud the sound is on the column's corresponding audio input channel. The petals also follow some basic real-time forces (separation and gravity).
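A rough sketch of that column-based flow field: the original reads per-channel loudness through the ESS library, but here the loudness values are faked with noise(), the separation force is omitted, and all constants are invented.

int COLS = 16;
PVector[] flow = new PVector[COLS];
ArrayList<PVector> petals = new ArrayList<PVector>();
ArrayList<PVector> vels = new ArrayList<PVector>();

void setup() {
  size(640, 480);
  noStroke();
  for (int i = 0; i < COLS; i++) flow[i] = new PVector();
  for (int i = 0; i < 60; i++) {
    petals.add(new PVector(random(width), random(height)));
    vels.add(new PVector());
  }
}

void draw() {
  background(0);
  float colWidth = width / (float) COLS;

  // One flow vector per column; angle and magnitude driven by loudness.
  for (int i = 0; i < COLS; i++) {
    float loudness = noise(i * 0.3f, frameCount * 0.01f);  // stand-in for a channel level
    flow[i] = PVector.fromAngle(-HALF_PI + (loudness - 0.5f) * PI);
    flow[i].mult(loudness * 2);
  }

  // Each petal follows the flow vector of the column it is in, plus gravity.
  for (int i = 0; i < petals.size(); i++) {
    PVector p = petals.get(i), v = vels.get(i);
    int col = constrain((int) (p.x / colWidth), 0, COLS - 1);
    v.add(flow[col]);
    v.add(0, 0.05f);       // gravity
    v.mult(0.9f);          // damping
    p.add(v);
    p.x = constrain(p.x, 0, width);
    p.y = constrain(p.y, 0, height);
    fill(255);
    ellipse(p.x, p.y, 4, 4);
  }
}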
