Friday, February 24, 2012

Hard data on YouPorn

Introduction
As you might have heard (or not), YouPorn Chat had a huge information leak on February 21st 2012. One of their servers served a directory with all registration log files from the last couple of years (http://chat.youporn.com/tmp). According to their blog post, this chat server is not serviced by the YouPorn guys but by the YouPorn chat guys. Nevertheless, I assume that there will be a huge overlap between the passwords used for the chat service, for YouPorn in general, and for other accounts. The files were world-readable and could be downloaded. Some Swedish guys at flashback.org discussed what they could do with the log files. They discovered that the logs contained all registration and account creation details of all users from 2008 to 2012, and they shared it with the world. Soon thereafter Anders Nilsson published an analysis of all passwords on his blog. His analysis shows the top passwords and some statistics about the different passwords. Not surprisingly, the 10 top passwords are 123456, 123456789, 12345, 1234, password, qwert, 12345678, 1234567, 123, and 111 111. But analyzing passwords is only the first step.

When I read about the password breach on Twitter I thought that we could do more with the available data, so I got my hands on a copy of the logs. The log files show detailed information about username, password, email address, country of origin, date of birth, and user id. So let's play!

Log format
The log format is really simple and consists of logged registration attempts and server responses. A registration attempt is logged in the following format:
<user_register.php: 2010-01-11 00:00:03
POST
username=MyFunnyUsername
email=Mail@Foo
email_confirm=Mail@Foo
password=1234
password_confirm=1234
country=US
msisdn=
isyp=0
isPremiumChat=
dob=1990-08-17
sub1=1
sub2=1
is3g=
>

I guess that username, email, password, country, and dob (date of birth) are self-explanatory. The server responds either with an error or with a new user id.

A registration is unsuccessful if the email or the password does not match its confirmation field, or if the username is already taken. The error is encoded in a reply like the following:
<REPLY username =fucking whore
status =207
err_msg =202
>

A successful registration contains a correct status and a new userid:
<REPLY username =LaraDWT28SL
status = OK
user_id =3565583
>
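The original script is not shown here, so the following is only a minimal sketch of how such a parser could look. It assumes that every block starts with a line beginning with `<`, ends with a line containing only `>`, and holds one `key=value` pair per line (with optional spaces around the `=` in replies). The function name `parse_blocks` and the dictionary keys `kind`, `script`, and `timestamp` are made up for this illustration.

```python
def parse_blocks(text):
    """Yield one dict per '<' ... '>' block found in the log text."""
    block = None
    for line in text.splitlines():
        if block is None:
            if not line.startswith("<"):
                continue  # skip blank lines between blocks
            header = line[1:].strip()
            if header.startswith("REPLY"):
                # The first reply field shares the header line:
                # "<REPLY username =SomeName"
                block = {"kind": "reply"}
                line = header[len("REPLY"):]
            else:
                # "<user_register.php: 2010-01-11 00:00:03"
                script, _, timestamp = header.partition(": ")
                block = {"kind": "request", "script": script,
                         "timestamp": timestamp}
                continue
        if line.strip() == ">":
            yield block
            block = None
            continue
        # Split at the first '=' only, so values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            if block["kind"] == "reply":
                value = value.strip()  # replies pad values: "status = OK"
            block[key.strip()] = value
        # Lines without '=' (such as the bare "POST") are ignored.


sample = """<user_register.php: 2010-01-11 00:00:03
POST
username=MyFunnyUsername
password=1234
country=US
dob=1990-08-17
>

<REPLY username =LaraDWT28SL
status = OK
user_id =3565583
>
"""

for entry in parse_blocks(sample):
    print(entry)
```

Request values are kept verbatim (a password may legitimately start or end with a space), while reply values are stripped. Entries that span several lines or are otherwise malformed would be lost or truncated by this sketch.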

After downloading the files I had to parse them using some script foo. A dirty little Python script {link file} did the trick. As I wanted to do some heavy number crunching and did not want to spend days reevaluating the same data over and over again, I imported all the logs directly into a not-so-small MySQL database. The database has the following layout:
CREATE TABLE accounts (
  date DATE,
  username varchar(128),
  password varchar(128),
  email varchar(128),
  dob DATE,
  country varchar(2),
  userid INT DEFAULT -1,
  INDEX (date),
  INDEX (username),
  INDEX (password),
  INDEX (email),
  INDEX (dob),
  INDEX (country),
  INDEX (userid)
) ENGINE=InnoDB;

The complete import, i.e., parsing, formatting, MySQL import, and index generation took a couple of hours. Due to weird formatting I lost some accounts during the import, so the numbers are a lower bound to the total numbers.


Analysis

With so many raw data sets (5290696 registration attempts led to 1202040 unique user accounts) it is hard to work with text files only. So the MySQL database was a good choice to start with. One of the most interesting analyses is the password analysis. Anders already published a breakdown of the passwords on his blog and the full results on pastebin. I assume he filtered the raw data for the raw passwords. Using a database I have the advantage that I can select more detailed combinations of data. In the following analysis I will look at country-specific details, registration attempts, email addresses, and the age distribution of the YouPorn users.

By country

The top country with most registrations is the US (27%), followed by Germany (12%), the UK (9%), Italy (5%), the Philippines (4%), Canada (3%), France (3%), India (3%), Australia (2%), and Mexico (2%). The graph shows a pie chart of the 20 countries with most registrations.


If we look at registration attempts then the picture is a little different. The log files contain a total of 4,088,656 registration attempts and 1,202,039 successful registrations, so on average a user tried to register more than 3.4 times until he/she was successful. Typing captchas with one hand must be hard (pun intended).


The number of total registrations actually seems to scale by country. There is no country that has significantly more failed registration attempts than another country. India, Indonesia, and the Philippines have a slightly higher number of registration attempts than the other countries. The table shows the number of registered accounts and the number of registration attempts.
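A per-country breakdown like this can be computed with a single GROUP BY query. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL and a cut-down `accounts` table with made-up rows. It assumes that every registration attempt is stored as one row and that failed attempts keep the default `userid` of -1; that convention is my reading of the schema above, not something the logs guarantee.

```python
import sqlite3


def attempts_by_country(conn):
    """Return (country, attempts, registered, attempts per registration) rows."""
    query = """
        SELECT country,
               COUNT(*)          AS attempts,
               SUM(userid <> -1) AS registered
        FROM accounts
        GROUP BY country
        ORDER BY registered DESC, country
    """
    rows = []
    for country, attempts, registered in conn.execute(query):
        # Avoid dividing by zero for countries without any successful signup.
        ratio = attempts / registered if registered else None
        rows.append((country, attempts, registered, ratio))
    return rows


# Tiny in-memory example with invented data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (country TEXT, userid INTEGER DEFAULT -1)")
conn.executemany(
    "INSERT INTO accounts (country, userid) VALUES (?, ?)",
    [("US", 1), ("US", -1), ("US", -1), ("DE", 2), ("DE", -1), ("IN", -1)],
)

for row in attempts_by_country(conn):
    print(row)
```

The same ratio over the whole data set is the figure quoted above: 4,088,656 attempts divided by 1,202,039 successful registrations is roughly 3.40.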


Age distribution

The age distribution graph shows the fraction of total registrations per year of birth. The average porn registrant is 31.04 years old, with a tendency of getting younger. The most common age is 24 (year of birth 1988), which accounts for 7.96% of all registrations. The graph has two peaks: one around age 32 (year of birth 1980) with 5.72% of total registrants and one at age 24 (1988) with 7.96%. Above the age of 30, the older people get, the less likely they are to register.


The graph shows that there is a high rising edge around age 20 that drops off slowly. The question remains if younger registrants just enter fake birth dates or if they do not register at all. Apparently the website did not impose any age restrictions as the year of birth span starts in 1908 and goes up to 2007.


Unfortunately real work calls right now, stay tuned for more results!

Monday, January 2, 2012

28c3 - 28th Chaos Communication Congress & Berlin Sides or a tough week in Berlin

Last week we celebrated that special time of the year again. For me it was the 8th time that we went to the Chaos Communication Congress and the 3rd time that I gave a talk. This year we also had tokens for the BerlinSides, a side conference with only technical talks organized by Aluc.

We carried out the same procedure as every year: Stormbringer and I met on December 26th around 7pmish at the airport in Zürich for a beer or two. Unfortunately he was late, so I had to drink alone. No harm was done as I still had to finish the slides for my talk. The flight was really smooth and we arrived at 11pm at the hostel. Following our regular procedure we walked right to the bcc to get our badges (and the first couple of beers). A couple of things changed this year. The bcc committee no longer allowed smoking inside the venue, so the hack lounge (that was very cozy in the last couple of years with many couches, music, video installations, good wired network connectivity) was replaced by a smoking tent outside of the bcc that had like 1/100 of the style. The hacking area was never as crowded as it was in earlier years and the hackers are moving more and more towards software only. Well, nevertheless we had to try out the lounge/tent on the first day and were thrown out at around 4am. That's one other novelty: for the first time the bcc (or parts of the congress) closed during the night.

The following four days went by in a blur. I watched many interesting talks, met many interesting people, had good discussions, had a blast during my talk, and had the one-odd beer or so. I organized the interesting talks into three categories: technical talks, political / social talks, and talks that I would have liked to watch. Within each category the talks are ordered according to my subjective rating on a scale from 1 to 10.


Technical talks:

String Oriented Programming, Mathias Payer (my talk)
Mathias first gives an overview of all the different attack vectors that are currently used in exploits (code injection, return oriented programming, jump oriented programming, and format string attacks). He then discusses the available defenses on current systems (Data Execution Prevention, Address Space Layout Randomization, and ProPolice). Using a tool that emits specially crafted format strings he presents an attack that can be used to rewrite some static regions of a program (e.g., GOT, or PLT regions of the main executable) into a jump/return oriented interpreter that reuses parts of the application to execute arbitrary code.


Print Me If You Dare, Ang Cui, Jonathan Voris (8)

Ang et al. present an awesome hack how you can upload your own malware to regular HP printers. Current HP printers are connected to the network, have fairly powerful processors, and can be updated (without authentication) over the Internet. The talk includes a live demo. Great presentation!


Datamining for Hackers, Stefan Burschka (7+)
Stefan gave a great talk about the potential of datamining and how datamining can be used to exploit and analyze legacy systems. Stefan talks about traffic mining where he exclusively looks at traffic patterns and unencrypted fields in the headers (e.g., length, flags) to infer details of the encrypted connection (e.g., pauses, which party is speaking, and other details). All in all an entertaining talk with a medium level of detail and verbosity.

802.11 Packets in Packets, Travis Goodspeed (and Sergey Bratus) (7)
Travis and Sergey introduce probabilistic packet injection. If the wireless signal is congested in one way or another or if there is interference then the transmission of a packet can be incomplete. The main idea of the hack is that a part of the original (legit) packet is destroyed during the transmission while the data section of that packet contains a complete inner packet of the same protocol. If the header of the original packet is destroyed then the inner packet is parsed like a regular packet. This hack can be used to inject illegal packets into protected networks (e.g., somebody downloads a large file; some packets are transmitted incorrectly and are reinterpreted as "attack packets" due to the Trojan horse character of the packets). The idea is really nice, but I doubt that an attacker is able to race a sufficient number of times against the (very low) probability that only the header is destroyed and no other part of the data section that contains the illegal packet. After the talk I actually asked this question and Travis did not really answer it.
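To make the packet-in-packet idea concrete, here is a toy model in Python. It is not 802.11 and not the code from the talk: the three-byte `SYNC` marker, the one-byte length field, and the helper names `make_frame` and `receive` are all invented for this sketch, and checksums are left out entirely.

```python
SYNC = b"\xAA\xAA\x7E"  # hypothetical preamble plus start-of-frame marker


def make_frame(payload: bytes) -> bytes:
    """Build a toy frame: SYNC, one length byte, then the payload."""
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return SYNC + bytes([len(payload)]) + payload


def receive(stream: bytes) -> list:
    """Return the payloads a naive receiver would decode from a raw byte stream."""
    frames = []
    pos = 0
    while True:
        start = stream.find(SYNC, pos)  # the receiver locks onto the next SYNC
        if start == -1:
            break
        length_at = start + len(SYNC)
        if length_at >= len(stream):
            break  # stream ends before the length byte
        length = stream[length_at]
        body = stream[length_at + 1:length_at + 1 + length]
        if len(body) < length:
            break  # truncated frame
        frames.append(body)
        pos = length_at + 1 + length
    return frames


# The attacker controls only the payload of the outer frame, for example the
# contents of a file that the victim downloads.
inner = make_frame(b"INJECTED")
outer = make_frame(b"harmless data " + inner + b" more data")

# Intact transmission: the inner frame is just payload bytes.
print(receive(outer))

# Interference destroys the first byte of the outer SYNC: the receiver misses
# the outer frame and locks onto the inner one instead.
damaged = b"\x00" + outer[1:]
print(receive(damaged))
```

In this toy model the injection only works if the damage hits the outer header and spares every byte of the inner frame, which is exactly the low-probability event I was asking about.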


Can trains be hacked?, Stefan Katzenbeisser (7)
Interesting talk (in German) about the history of train safety (including infos on signalling, relays, and so on). Stefan includes details on "Stellwerken" as well.


The Atari 2600 Video Computer System: The Ultimate Talk, Sven Oliver ('SvOlli') Moll (6)
Interesting talk by Sven about all the Atari 2600 internals. Sven was inspired by Michael Steil's talk at 26c3 about the C64 internals (which was an awesome talk as well, go watch the recording!). Sven presents a nice introduction to all the hardware details of the Atari 2600, the development of ROM/RAM boards, and a lot of nitpicking about programming the given hardware.

x86 oddities, corkami (6)
Corkami presents nice subtleties of the x86 machine code. He shows undocumented instructions, especially how these instructions can be used in packers and malware to circumvent debuggers, emulators, and other checking techniques. Very low level talk that assumes a lot of prior knowledge of x86. Overall very interesting, unfortunately there is no recording available.


Reverse Engineering USB Devices, Drew Fisher (6)
Drew is an MSc student at UC Berkeley in Human Interaction. He talks about the USB protocol and how to reverse engineer drivers for new USB hardware.


Introducing Osmo-GMR, Sylvain Munaut (6)
Hacking satellite phones. Sylvain introduces a new feature for the Osmo software stack.



Defending mobile phones, Karsten Nohl, Luca Melette (5)
Karsten and Luca show techniques for cloning existing mobile phones given a regular call that can be eavesdropped. The cloned phones can be used to call premium numbers or to send text messages to premium services. After the motivating example they show how the attacks that Karsten and co. developed during the last couple of congresses can be mitigated using additional software, additional checks, or new algorithms. Interesting talk, but the "big hack" was missing. They gave a great overview of the available attacks but failed to bring up something new (for this year).

Rootkits in your Web application, Artur Janc (5)
A regular XSS bug is used in combination with new HTML5 features to implement persistent rootkits in web applications. The combination of persistence and XSS bugs enables rootkits that reinstall themselves even if caches are cleared. Artur also explains how these rootkits can be used to grab information and forge, e.g., banking sites.

New Ways I'm Going to Hack Your Web App, Jesse Ou, Rich (5)
Similar to Rootkits in your Web application. Featuring HTML5, XSS attacks, and other nice technologies.

Cellular protocol stacks for Internet, Harald Welte (5)
Great overview talk of all the wireless protocols used in the last 20 years. If you want to know more about GSM, UMTS, and all the other protocols, then go watch this talk to get some pointers. If you are not interested in an overview, then the talk is just a 1hr show of three letter acronyms.


Time is on my Side, Sebastian Schinzel (5)
Sebastian is a PhD student in Erlangen. He studies side-channel attacks on web pages. The talk introduces timing analysis, how to get exact timing measurements, and how to remove jitter. He talks about different TCP/IP-based approaches to measure jitter for each packet instead of per connection. If the server side is stacked (PHP over Apache) then you need domain-specific knowledge: you need to know which parts are sent by Apache and when control is passed to PHP. Using this domain-specific knowledge you can reduce the jitter inside the PHP application. The idea is pretty straightforward: do n measurements, do a statistical analysis, compare, and get the hidden data. The talk shows a timing attack on XML RSA encryption (based on PKCS#1 decryption and a pre-existing attack) and combines both techniques to break XML-encrypted messages.
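The "measure n times, filter, compare" recipe can be sketched in a few lines of Python. Everything here is simulated: `simulated_response_time` is a made-up stand-in for a real HTTP request, the 0.3 ms secret-dependent delay and the exponential jitter are arbitrary choices, and using a low percentile as the jitter filter is my assumption about one common approach, not necessarily the filter used in the talk.

```python
import random
import statistics


def low_percentile(samples, fraction=0.1):
    """Return the sample below which roughly `fraction` of all samples fall."""
    ordered = sorted(samples)
    index = int(fraction * (len(ordered) - 1))
    return ordered[index]


def simulated_response_time(secret_dependent_delay, rng):
    """Hypothetical response time in ms: base cost + leak + network jitter."""
    base = 20.0
    jitter = rng.expovariate(1 / 5.0)  # positive, long-tailed noise, mean 5 ms
    return base + secret_dependent_delay + jitter


def measure(secret_dependent_delay, n, rng):
    """Collect n simulated timings for one kind of request."""
    return [simulated_response_time(secret_dependent_delay, rng) for _ in range(n)]


rng = random.Random(1234)         # fixed seed so that runs are repeatable
fast = measure(0.0, 5000, rng)    # e.g. request that is rejected early
slow = measure(0.3, 5000, rng)    # e.g. request that takes one extra step

print("difference of means:           ",
      statistics.mean(slow) - statistics.mean(fast))
print("difference of 10th percentiles:",
      low_percentile(slow) - low_percentile(fast))
```

Jitter only ever adds delay, so the fastest responses are the cleanest ones. That is why a low percentile tends to expose a small timing difference more reliably than the mean, which is dominated by the long tail of slow responses.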


Security Log Visualization with a Correlation Engine, Chris Kubecka (5)
Solid talk about how to use correlation engines to analyze log files.



Apple vs. Google Client Platforms, Bruhns, FX of Phenoelit, greg (4)
FX and some guys bash the Apple and Google client platforms. They analyze the hardware platforms of the Google Chromebook (no good exploits found) and the iPad 1 (some possible exploits similar to red snow found; red snow uses a bug in the boot ROM that cannot be fixed by Apple). They also found some bugs in the markets of Apple and Google. Both markets are vulnerable to XSS exploits. All in all I expected more of this talk. The presentation was good but FX was overselling the bugs they found and in my opinion there was too much bashing.

Protecting Software, dosbart (3)
How to protect legacy software from piracy. So-so talk that ended in a long rant against piracy and software cracking.

X(tra|ml|slt|query|dp|mas) pwnage, Nicob (3)
Let's just use XSL bugs to inject new code into a server and let's execute it server side. Nicob includes details on how to transcode procedural-oriented code into functional-oriented code used by XSL. Talk was not that interesting as he presented too many details on how to write code instead of showing individual attacks.

Automatic Algorithm Invention with a GPU, Wes Faler (2)
Wes talks about genetic programming and GPU programming. I was not that interested in the topic of the talk and drifted off pretty soon. In addition I do not believe that genetic programming or some other automatic programming techniques will be able to evolve automatically generated code to very complex/optimized algorithms.


Political/social:

Hacker Jeopardy (9)
Jeopardy game show with hacking questions. Awesome just like every year!

„Die Koalition setzt sich aber aktiv und ernsthaft dafür ein“, maha/Martin Haase (8)
Unfortunately this talk is only available in German. Martin Haase analyzes the talks of politicians and shows the use (and misuse) of language. Interesting and funny as usual.

Fnord-Jahresrückblick, Felix von Leitner, Frank Rieger (8)
The second political talk that is only available in German. Fefe and Frank talk about what happened during the year and give a nice "fnord" review about all the political, social, and other mishaps. Funny and entertaining, although not the best Fnord show ever.

Der Staatstrojaner, 0zapfths, Constanze Kurz, Frank Rieger, Ulf Buermeyer (7)
The third political talk that is only available in German. The group reviews the Trojan horse that was developed by Germany to spy on its people. They analyze both the technical and the political side and give a great review on the development.

The coming war on general computation, Cory Doctorow (6)
Cory discusses the problems with Turing complete CPUs. They can be used to compute anything. Appliances now want to ensure that only specific functionality can be executed on these CPUs. This is hard to enforce.

The Hack will not be televised?, Caspar Clemens Mierau (6)
One of the few talks that was not recorded (due to copyright issues - torrent might be available). Caspar shows different sequences of hacks in movies. He talks about the hacks and how they are shown on the screen. I liked the movie sequences but I do not like his takeaways (e.g., women are not hackers).

"Neue Leichtigkeit", Alex Antener, Amelie Boehm, Andrin Uetz, Jonas Bischof, Ruedi Tobler, Samuel Weniger (6)
Artistic show with lots of booze.

SCADA and PLC Vulnerabilities in Correctional Facilities, Tiffany Rad, Teague Newman, John Strauchs (4)
New breakthroughs in SCADA systems... More and more people know about the vulnerabilities in SCADA and PLC systems. These systems are also used in correctional facilities. Exploits and expertise in these systems can therefore be used to break out of prisons. Tada.


What I would have liked to watch:

So your 0day exploit beats ASLR, DEP and FORTIFY? I don’t care, Erik Bosman
Erik would have presented Minemu, a minimal binary translator that executes full memory taint checking. I read the paper and the work looks solid, the discussion with Erik was also very interesting. Unfortunately the talk was canceled due to timing issues.

Thursday, May 12, 2011

Academic careers or how to become a professor.

Summary: Have a vision. Stay foolish.

Introduction
On May 11th 2011 the VMI (the organization of the scientific staff at the CS department at ETH Zurich where I'm currently the president) organized an informational event on how to start and how to structure your academic career. If you are pursuing a PhD then at one point in time you will think about continuing your academic career by either becoming a professor or by working at a research lab. Many people on the other hand will turn away from an academic career at a later point. We asked ourselves what it takes to pursue the academic career, what are pros and cons of becoming a professor vs. either working in a research lab or vs. working in the industry.
Three speakers presented their academic careers and told us about their ideas and how they structured their academic life. An open question session also allowed the audience to ask personal and detailed questions about why they made individual decisions. Andreas Krause, a new assistant professor at ETH Zurich, gave an impression of the beginnings of an academic career at a university. Christian Cachin then talked about working at a research lab at IBM Research Zurich. The third speaker was Markus Gross, full professor at ETH Zurich and head of Disney Research Zurich.
In this blog post I try to summarize the information they gave us, but I'll twist that information with my own thoughts and my own (incomplete) experience about the topic after being in the PhD program for almost 5 years.

How to evolve
It is surprising that many academic careers evolve out of similar roots and environments. Many academics were big nerds in their teens, disassembled hardware, and were fiddling around with technology all the time. This brings back memories from my own youth when I was dumpster diving for old hardware. I was also well known for disassembling any electronic hardware I got my hands on. Sometimes I blew the fuses of our house, but more often it seemed to work. At one point in time this curiosity comes across the first computer, and the average technology nerd then tries to figure out how this magical machine works. So all in all curiosity and an affinity for electronic hardware appear to be a good foundation for a future academic career. You have to keep that curiosity to figure out unknown and unresolved problems and you need to stay foolish enough to try out things nobody has tried before.

Research
Both research labs and universities have their individual advantages and disadvantages if you are interested in research. At the university you have absolute freedom in what you research and you are only bound indirectly by the funding you are able to attract. If you are not able to attract funding then you are limited in the amount of research you can carry out. On the other hand if you work in a research lab then the boss of that research lab will control in what area you will do research (so better choose your lab wisely).
Additionally one of the biggest differences between academia and research labs is that research labs are interested in patents. You are measured by the number of patents you put out, not by the number of papers that you present at academic conferences. Writing a patent will take up as much time as writing a scientific paper, including all revisions.
One drawback of academia is that research is only a (small) part of your daily life. Due to teaching lectures, mentoring students, and balancing other responsibilities, research will only be one of many activities that you need to worry about. On the other hand at a research lab you are able to work full time on your ideas (minus maybe some mentoring overhead if you are a group leader or a tech. lead).

Funding
If you want to carry out research in academia you need money first. In some universities there are some fixed positions or paid positions per assistant professor or full professor, additional positions must be financed through external funding.
On the other hand in research labs you are often bound by project budgets and you are limited how much time you spend for a specific project until it must pay off.
So both in academia and in research labs you need to worry to some extent about money to fund your research.

Lectures
In academia you need to give lectures as well. Planning lectures and creating the curriculum takes up a lot of time. If you do not like teaching then academia is definitely the wrong place for you. Both professors agreed that they really like teaching and that it is a very satisfying feeling to work with the brightest young students and teach them new things.

Building your own group
The basic idea is that good people attract other good people. If you are a good lecturer then you will attract the bright students. If you publish good results then you will get good post docs. In addition you need to approach the good students directly.

Advice to kick-start your academic career
Some advice to kick-start your academic career is to go for fellowships, e.g., Microsoft Research, IBM Research, Google, Yahoo! Key Scientific Challenges and governmental fellowships (e.g., SNF, NSF).
At conferences you should give tutorials and organize interesting workshops at top conferences. As a (PhD) student you should search for interesting (research) internship positions at interesting places. This helps you to build your network of professional mentors that you can use later on for reference letters, to meet new people, to attract funding, and fellowships.
A drawback of academia is that it is a probabilistic system. There is no guarantee that if you have x good publications and y good talks and references then you will get tenure. Always go for new ideas and be the first or one of the first. Have a vision. Stay foolish.

Time management
One of the most important factors of academic careers is time management and resource planning. No matter whether you go for a research position in a lab or at a university you need to plan your available time. Otherwise the combination of all your auxiliary tasks will use up all your available time and you will have no more time for fun things like research, teaching, or spending time with your friends and family.

Is it worth it?
From the view of both professors (one halfway into his career, the other at the beginning of his career) and from my view in front of the PhD finishing line: absolutely yes. Your mileage may vary of course.

Resources:
Academic job listings (watch out for the academic cycle in the US!)
What they didn't tell you in grad school (Paul Gray and David E. Drew), either buy it or get the PDF

Monday, April 18, 2011

40 hours in Washington D.C.

I was on a tight schedule as I only had around 30 hours in Washington D.C. to see all the major sights. First of all I stayed in the Downtown Washington D.C. Hostel, which was great because I could just walk to Union Station and all the major sights.

I arrived around 5pm at Baltimore-Washington International Airport. From there I took the shuttle to the MARC train station and rode to Union Station (for $6; Amtrak for the same distance was around $60, but would have been 15 minutes faster). After dropping my bags off at the hostel it was already getting dark so I hopped over to see the Capitol by night. On the way back I walked through Chinatown and got a first impression of the city. Washington appears to be a small-scale version of New York with smaller houses that have a more English touch (brick houses next to Union Station).

I started my day in Washington with a big breakfast in Union Station, then went to the Capitol for the free tour (takes about 90 minutes after you are in), which is absolutely worth it, just be there early! The rest of the day I meandered through the National Mall and checked out all the sights there like the Ulysses Grant Memorial, the National Gallery of Art sculpture garden, the Washington Monument (which is really impressive from close up), the World War II Memorial, the Lincoln Memorial (which I assumed would be more impressive), the Vietnam Veterans Memorial, and the White House. On my 6 hour walk I also took tons of pictures of all these sights. An interesting note is that the National Mall was more or less deserted in the morning (from 8-10am). It gets more crowded after 10am when the museums open.

To combine history/monuments with some interesting notes I went to the Smithsonian National Air and Space Museum which is just awesome. It took me around 3 hours to see all the interesting space capsules, rockets, and airplanes. The museum gives you great details about everything man-made that flies and is absolutely worth a visit if you are even a little bit of a tech geek. The second museum I visited was the National Museum of Natural History which had great exhibits showing the evolution of mankind in particular and of all mammals and animals in general.

Combining two museum visits and all the other sights in one day was a lot so I headed over to 7th street and H street NW for dinner in Chinatown where I got good Pho (some form of Asian noodle soup with beef in it) and then headed to the movies to relax. All in all it was a great day with perfect weather to visit all the outdoor attractions and I had lots of fun.

The next morning I only had a couple of hours to kill before getting the shuttle to the airport. So I headed over to the Library of Congress and checked out the making of the constitution and all the other interesting artifacts they have on display. If you are a bookworm or interested in US history then the Library is worth a visit!

I flew out from Dulles International Airport and I actually booked a Super Shuttle to get me there. I had already booked them a couple of times and it usually works out pretty well. But this time I got screwed over twice. First of all I made a mistake on my reservation, putting 506 H Street NW instead of 506 H Street NE on the form. The shuttle driver just went to the wrong part of the city and did not even bother to call me. After waiting on the street during the pickup window I waited for 10 more minutes in their phone loop until I was told that I had to make a new reservation and that the money was lost. So I made a new reservation with the correct address and waited for another 30 minutes. The second driver/shuttle did not show up either, so I got screwed twice and had to beg the hostel owner to drive me to the airport. That worked out well in the end as he is an awesome guy and tries to help wherever he can, but I will never again take Super Shuttle in Washington D.C.!

So to summarize: Washington D.C. is a great and interesting city where you can reach all the major attractions on foot. It is one of the American cities that has a lot of history and is really worth a visit. Stay in Downtown Washington for the full experience and take public transport to get around!

Tuesday, April 12, 2011

Conference remarks: ISPASS 2011

As you might know I've been presenting at the IEEE ISPASS'11 conference in Austin, Texas. The conference went from April 10th to April 12th. If you are interested you can read my ramblings about the talks below.

Conference information
(David Brooks & Rajeev Balasubramonian)
75+ Registrations
64 submissions in total, 4 PC reviews/paper, accepted via consensus, 24 accepted papers.

Keynote: The Era of Heterogeneity: Are we prepared?
(Ravi Iyer, Intel)
Shift from client/server to smart devices (tablets, smart phones, ...)
Integrate GPU, IP into CPU core for power efficiency, it's no longer just about cores but also about the accelerators that we integrate into the CPU.
Why heterogeneity? Because the workloads are heterogeneous and one single solution (general purpose core) will not work. Small cores scale and good power performance, big cores are needed for single threaded performance. The talk sounds a lot like ISCA'09 where they proposed a heterogeneous architecture with one big core and multiple small cores.
Intel's Idea: SPECS (Scalability, Programmability, Energy, Compatibility, Scheduling/Management)

Questions for cores and accelerators are:
How to mix and prioritize heterogeneous cores? Should all cores have the same ISA (e.g., SSE)? How should we structure the cache architecture?

Solution for mixed ISA: Use a co-op model and run applications on any core. If we get an unsupported opcode exception on the smaller core then the OS must move the application to the bigger core. This sounds a little like Albert Noll's VM for the Cell. What about hardware tricks for context switches?
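As a toy illustration of the co-op model, the following Python sketch lets a program start on a small core and migrates it once an unsupported opcode traps. The classes, the opcode names, and the policy of staying on the big core after the first migration are my own assumptions for this sketch; the keynote did not go into that level of detail.

```python
class UnsupportedOpcode(Exception):
    """Raised by a core that does not implement the requested opcode."""


class Core:
    def __init__(self, name, opcodes):
        self.name = name
        self.opcodes = frozenset(opcodes)

    def execute(self, opcode):
        if opcode not in self.opcodes:
            raise UnsupportedOpcode(opcode)


def run(program, small, big):
    """Run the opcodes on the small core; migrate to the big core on a trap.

    Returns a trace of (opcode, core name) pairs.
    """
    core = small
    trace = []
    for opcode in program:
        try:
            core.execute(opcode)
        except UnsupportedOpcode:
            # The "OS" reacts to the exception by moving the application to
            # the big core and retrying the faulting instruction there.
            core = big
            core.execute(opcode)
        trace.append((opcode, core.name))
    return trace


small_core = Core("small", {"load", "store", "add"})
big_core = Core("big", {"load", "store", "add", "vec_mul"})

for step in run(["load", "add", "vec_mul", "store"], small_core, big_core):
    print(step)
```

A real scheduler would also have to decide when to move the application back to a small core, which is where the question about cheap context switches comes in.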

(Great talk!)


Session 1: Best Paper Nominees (David Christie, AMD)
Characterization and Dynamic Mitigation of Intra-Application Cache Interference
(Carole-Jean Wu, Margaret Martonosi, Princeton University)
Intra-application cache interference is a challenging problem. The paper measures and characterizes the cache behavior of applications using two kinds of measurements: perfmon2 on an Intel Nehalem, and Simics/GEMS for a simulated system. The measurements show that system cache lines are usually not reused (source: mostly TLB misses), so these misses pollute the application cache lines.
Propose new cache systems that adhere to the fact that cache lines from a system context are not reused as often as cache lines from a user context.

Questions: Measurement done on 64/32b system? Are there differences due to different page placement?


A Semi-Preemptive Garbage Collector for Solid State Drives
(Junghee Lee*, Youngjae Kim, Galen M. Shipman, Sarp Oral, Jongman Kim*, Feiyi Wang, Oak Ridge National Laboratory, Georgia Institute of Technology*)
Block replacement strategies and how to cope with flash problems. Implement some form of GC for flash blocks.
Fast access speed but performance degradation due to garbage collection. New form of GC inside the SSD.


PRISM: Zooming in Persistent RAM Storage Behavior
(Ju-Young Jung, Sangyeun Cho, University of Pittsburgh)
Block-oriented FS for PRAM. Not my field.


Evaluation and Optimization of Multicore Performance Bottlenecks in Supercomputing Applications
(Jeff Diamond, Martin Burtscher, John D. McCalpin, Byoung-Do Kim, Stephen W. Keckler, James C. Browne, University of Texas Austin, Texas State University, and NVIDIA)
Moore's law of supercomputing: scale the number of cores! Motivation for this paper: inter-chip scalability. New compiler optimizations for multi-core CPUs, depending on cache-layout and coordination. Use AMD performance counters to measure HPC performance.
Most important performance options for multi-core are: L3 Miss Rates (cache contention), Off Chip Bandwidth, DRAM contention (DRAM page miss rates) (Great talk!)


Session 2: Memory Hierarchies (Suzanne Rivoire, Sonoma State University)
Minimizing Interference through Application Mapping in Multi-Level Buffer Caches
(Christina M. Patrick, Nicholas Voshell, Mahmut Kandemir, Pennsylvania State University)
Storage paper that handles a switched network with a complicated node hierarchy. The paper introduces interference predictors for the I/O route through the network and analyzes buffer cache placement. -ETOOMANYFORMULAS (for me)


Analyzing the Impact of Useless Write-Backs on the Endurance and Energy Consumption of PCM Main Memory
(Santiago Bock, Bruce R. Childers, Rami G. Melhem, Daniel Mosse, Youtao Zhang, University of Pittsburgh)
20-40% of energy consumption is due to the memory system. Use Phase Change Memory (PCM) instead of DRAM. Low static power (non-volatile), read performance comparable to DRAM, scales better than DRAM, but high energy cost for writes and limited write endurance. Observation: a write-back is useless if the data is not used again later on. Use application information from allocator, control flow analysis or stack pointer. Focus: how many useless write-backs can be avoided using these metrics? 3 different regions analyzed: heap: use malloc / free; global: control flow analysis; stack: stack pointer.
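
For the heap case the idea can be sketched in a few lines: track live allocations and drop write-backs that target freed memory. The class and function names below are my own invention and the sketch ignores the global and stack regions entirely; the paper's mechanism lives in hardware, not in Python.

```python
class HeapLiveness:
    """Tracks which heap ranges are currently allocated (hypothetical helper)."""

    def __init__(self):
        self.live = {}  # start address -> end address (exclusive)

    def malloc(self, start, size):
        self.live[start] = start + size

    def free(self, start):
        self.live.pop(start, None)

    def is_live(self, address):
        return any(s <= address < e for s, e in self.live.items())


def split_write_backs(evicted_dirty_lines, heap):
    """Separate needed write-backs from useless ones (data already freed)."""
    needed = [a for a in evicted_dirty_lines if heap.is_live(a)]
    useless = [a for a in evicted_dirty_lines if not heap.is_live(a)]
    return needed, useless
```

Every address that ends up in the `useless` list is a PCM write (energy plus wear) that could be skipped, assuming the application really never reads freed memory.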

What about DRAM, would that make sense as well? (Reducing the number of write-backs), e.g. for cache coherency in multi-cores?
His solution: application tells the HW which regions are dead / alive.


Access Pattern-Aware DRAM Performance Model for Multi-Core Systems
(Hyojin Choi, Jongbok Lee*, Wonyong Sung, Seoul National University, Hansung University*)
Latency between different banks, very low level/HW.


Characterizing Multi-threaded Applications based on Shared-Resource Contention
(Tanima Dey, Wei Wang, Jack Davidson, Mary Lou Soffa, University of Virginia)
Check/measure intra-application contention and inter-application contention for L1/L2/Front side bus.


Session 3: Tracing (Tom Wenisch, University of Michigan)
Trace-driven Simulation of Multithreaded Applications
(Alejandro Rico*, Alejandro Duran*, Felipe Cabarcas*, Alex Ramirez, Yoav Etsion*, Mateo Valero*, Barcelona Supercomputing center*, Universitat Politecnica de Catalunya)
How to simulate multi-threaded applications using traces? Capture traces for sequential code sections, capture calls to parops but do not capture the execution of parops. Interesting but not my topic.

Efficient Memory Tracing by Program Skeletonization
(Alain Ketterlin, Philippe Clauss, Universite de Strasbourg & INRIA)
We want to get the minimum amount of code to reproduce the memory layout of an application. Instrumentation is expensive but useful as a baseline. To improve from there we need to find loops in binary code, try to recognize patterns and generate access sequences to remove instrumentation. Work on machine code and find register accesses movl %eax, [%ebx, %ecx, 8]
Program skeletonization extracts what is useful to compute the memory addresses.

Do you also track direct registers (e.g., the address computation happens before)? You decouple the memory recording and the application, so recording happens with loose correlation to the application. How do you handle threads/concurrent memory accesses? Exceptions? (Great talk!)


Portable Trace Compression through Instruction Interpretation
(Svilen Kanev, Robert Cohn*, Harvard University, Intel*)
If you are reliably able to predict a byte stream you do not need to record it.
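
A minimal sketch of that principle: run the same predictor during recording and replay, and store only the mispredictions. I use a simple last-stride predictor here as a stand-in; the paper predicts by interpreting instructions, which this toy does not attempt.

```python
def predict(history):
    """Stand-in predictor: assume the next value continues the last stride."""
    if len(history) < 2:
        return 0
    return 2 * history[-1] - history[-2]


def compress(values):
    """Record only the (index, value) pairs the predictor gets wrong."""
    history, misses = [], []
    for i, value in enumerate(values):
        if predict(history) != value:
            misses.append((i, value))
        history.append(value)
    return len(values), misses


def decompress(length, misses):
    """Replay the same predictor and patch in the recorded mispredictions."""
    corrections = dict(misses)
    history = []
    for i in range(length):
        history.append(corrections.get(i, predict(history)))
    return history
```

For a strided address trace such as `[100, 104, 108, 112, 116, 200, 204, 208]` only the start of each stride has to be stored; everything else is reconstructed by the predictor.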


Reception & Poster Session
VMAD: A Virtual Machine for Advanced Dynamic Analysis of Programs
(Alexandra Jimborean, Matthieu Hermann*, Vincent Loechner, Philippe Clauss, INRIA, Universite Strasbourg*)
Interesting work on LLVM that adds different alternatives and tries reverse compilation to turn, e.g., while loops into for loops and adaptively optimize them (for C/C++ code). Interesting work, maybe forward her Olivers' work


Performance Characterization of Mobile-Class Nodes: Why Fewer Bits is Better
(Michelle McDaniel, Kim Hazelwood, University of Virginia)
For netbooks 32bit code is faster than 64bit code. What kind of GCC settings did you use? Mention Acovea; also her master's is about padding, give her a pointer to my work.


Keynote II: Integrated Modeling Challenges in Extreme-Scale Computing
(Pradip Bose, IBM)
Exa-Scale Computing is 10^18, which is 1000x peta-scale computing (10^15). What is the wall: power or reliability?
Power-wall: We need to reduce power needed in chips, dozens of cores per chip that are allowed to use 1/1000 of power. Idea: different processing modes: storage mode: turn off parallel cores; computing mode: turn off storage controllers, I/O. Reliability wall: MTTF and reliability drop with the increased number of transistors. Problem: with millions of cores/CPUs MTTF is so low that supercomputers are not even able to complete Linpack benchmarks between failures. MTTR/MTTF (mean time to repair vs. mean time to failure).
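
A back-of-the-envelope sketch of why the reliability wall bites. It assumes independent components with constant failure rates, so the system MTTF is the component MTTF divided by the number of components. The numbers in the usage note are made up, not from the talk.

```python
def system_mttf_hours(component_mttf_hours, n_components):
    """MTTF of a system that fails as soon as any one component fails.

    Assumes independent components with constant (exponential) failure
    rates, so the failure rates simply add up.
    """
    return component_mttf_hours / n_components


def availability(mttf_hours, mttr_hours):
    """Fraction of time the system is up, given MTTF and MTTR."""
    return mttf_hours / (mttf_hours + mttr_hours)
```

With a hypothetical component MTTF of one million hours and one million components, `system_mttf_hours(1_000_000, 1_000_000)` gives one hour between failures; if a repair also takes an hour, `availability(1.0, 1.0)` is only 0.5.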


Session 4: Emerging Workloads (Derek Chiou, UT Austin)
Where is the Data? Why you Cannot Debate CPU vs. GPU Performance Without the Answer
(Chris Gregg, Kim Hazelwood, University of Virginia)
GPU computation is fast but data transfer from/to the GPU is a bottleneck. GPU speedup is misleading without describing the data transfer necessities.
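
The point is easy to see with a two-line model: the honest speedup divides CPU time by kernel time plus transfer time. The timings in the usage note are invented for illustration.

```python
def speedup(cpu_seconds, kernel_seconds, transfer_seconds=0.0):
    """Speedup of a GPU run over a CPU run, optionally counting data transfer."""
    return cpu_seconds / (kernel_seconds + transfer_seconds)
```

A kernel that takes 1 s instead of 10 s on the CPU looks like a 10x win (`speedup(10.0, 1.0)`), but with 4 s spent moving data to and from the GPU it shrinks to 2x (`speedup(10.0, 1.0, 4.0)`).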

Questions: What about algorithms with dual-use approach where the CPU does not idle during kernel? What about compression? (Great talk!)


Accelerating Search and Recognition Workloads with SSE 4.2 String and Text Processing Instructions
(Guangyu Shi, Min Li, Mikko Lipasti, UW-Madison)
STTNI can be used to implement a broad set of search and recognition applications; embrace newly available instructions to speed up classical algorithms. pcmpestri: packed compare explicit length strings, return index. New instructions can be used for any data comparisons. Depending on the data structure different algorithms are needed. Easy for arrays; tree structures need some B-Tree and similar handling as strings; hash tables are more complicated, but collisions can be resolved with STTNI.
What about aligned loads, or loop unrolling for this code? Example was a single static loop that did use unaligned loads (expensive) and no manual loop unrolling. Speaker only compared to GCC, not ICC or other compilers.


A Comprehensive Analysis and Parallelization of an Image Retrieval Algorithm
(Zhenman Fang, Weihua Zhang, Haibo Chen, Binyu Zang, Fudan University)
You shall not use Comic, Sans Serif, Courier, and Serif fonts on one slide!


Performance Evaluation of Adaptivity in Transactional Memory
(Mathias Payer, Thomas R. Gross, ETH Zurich)
My talk. See: https://nebelwelt.net/research/publications/ispass11/

Transactional memory (TM) is an attractive platform for parallel programs, and several software transactional memory (STM) designs have been presented. We explore and analyze several optimization opportunities to adapt STM parameters to a running program.

This paper uses adaptSTM, a flexible STM library with a non-adaptive baseline common to current fast STM libraries to evaluate different performance options. The baseline is extended by an online evaluation system that enables the measurement of key runtime parameters like read- and write-locations, or commit- and abort-rate.  The performance data is used by a thread-local adaptation system to tune the STM configuration. The system adapts different important parameters like write-set hash-size, hash-function, and write strategy based on runtime statistics on a per-thread basis.

We discuss different self-adapting parameters, especially their performance implications and the resulting trade-offs. Measurements show that local per-thread adaptation outperforms global system-wide adaptation. We position local adaptivity as an extension to existing systems.

Using the STAMP benchmarks, we compare adaptSTM to two other STM libraries, TL2 and tinySTM. Comparing adaptSTM and the adaptation system to TL2 results in an average speedup of 43% for 8 threads and 137% for 16 threads. adaptSTM offers performance that is competitive with tinySTM for low-contention benchmarks; for high-contention benchmarks adaptSTM outperforms tinySTM.

Thread-local adaptation alone increases performance on average by 4.3% for 16 threads, and up to 10% for individual benchmarks, compared to adaptSTM without active adaptation.



Session 5: Simulation and Modeling (David Murrell, Freescale)
Scalable, accurate NoC simulation for the 1000-core era 
(Mieszko Lis, Omer Khan, MIT)
Yet another cycle accurate instruction simulator.


A Single-Specification Principle for Functional-to-Timing Simulator Interface Design 
(David A. Penry, Brigham Young University)
Designing simulators. Problem: depending on the level of information that is needed there is a huge performance difference for simulators. Idea: define high-level interface and generate low-level interfaces that offer faster simulation automatically.


WiLIS: Architectural Modeling of Wireless Systems
(Kermin Fleming, Man Cheuk Ng, Sam Gross, Arvind, MIT)
Simulator for wireless protocols implemented in hardware (FPGA) for better/more accurate analysis.


Detecting Race Conditions in Asynchronous DMA Operations with Full-System Simulation
(Michael Kistler, Daniel Brokenshire, IBM)
Using heavy-weight simulation helps in finding DMA bugs for light cache protocols like Cell that have no explicit cache management. This work can also be used for the analysis of cache protocols. (Great talk!)


Mechanistic-Empirical Processor Performance Modeling for Constructing CPI Stacks on Real Hardware
(Stijn Eyerman, Kenneth Hoste, Lieven Eeckhout, Ghent University)
Analyze different types of architectures and compare performance and different HW features.


Session 6: Power and Reliability (Bronis de Supinski, LLNL)
Power Signature Analysis of the SPECpower_ssj2008 Benchmark 
(Chunghsing Hsu, Stephen W. Poole, ORNL)
Use many available measurements and analyze the signatures to develop a better predictor for different CPU models.


Analyzing Throughput of GPGPUs Exploiting Within-Die Core-to-Core Frequency Variation 
(Jung Seob Lee, Nam Sung Kim, University of Wisconsin, Madison)
Scaling of HW down to very small structures leads to new problems and characteristics.


Universal Rules Guided Design Parameter Selection for Soft Error Resilient Processors 
(Lide Duan, Ying Zhang, Bin Li, Lu Peng, LSU)
Reduce soft errors in processors through an analysis of architectural weaknesses.


A Dynamic Energy Management in Multi-Tier Data Centers
(Seung-Hwan Lim, Bikash Sharma, Byung Chul Tak, Chita R. Das, The Pennsylvania State University)
How to save energy in data centers.


Final remarks
Jeff Diamond won the best paper award, no other remarks.

Friday, March 11, 2011

VEE conference ramblings

As you might know I've been to the VEE'2011 conference in Newport Beach/LA in the last couple of days. If you are interested in more information about the talks then you can read my notes below.


Conference details:
In total there were 84 abstracts, 64 full submissions, and 20 papers selected for presentation.
Corporate sponsors are: VMWare, Intel, Google, Microsoft Research, IBM Research.

Keynote:
Virtualization in the Age of Heterogeneous Machines, David F. Bacon
(IBM Research, known for thin locks, http://www.research.ibm.com/liquidmetal/ )

Motivation:
It's the multicore era! But what about performance? Three different models of computation exist; CPU: general purpose, GPU: wins at gflops/$ (raw power), FPGA: wins at gflops/$/watt. The drawback is that they are heterogeneous. A possible solution would be to virtualize these heterogeneous systems.
There were basically 2 original ideas in computer science: Hashing & Indirection, all else is a combination of those. Virtualization can be categorized in the indirection area. There are two forms of virtualization, namely System VM: Virtualize Environment (VMWare, QEMU - diff machines) and Language VM: Virtualize ISA (MAME, QEMU - diff architectures). The current VM model usually is the accelerator model: send stuff from CPU to GPU/FPGA for computation, get a nice chunk of data back.

What is the solution to get over this heterogeneity? Use virtualization!
David introduces LIME: Liquid Metal Programming Language, a single language with multiple backends: CPU, GPU, WSP, & FPGA. This new single language compiles down to different architectures. CPU backend must compile any code, all other backends can decide not to compile that piece of code; e.g., code that is not deeply pipelineable (with increased latency) can be rejected by the GPU compiler. Approach for FPGA uses an artifact store that has solutions for common problems. These artifacts are then stitched together to form the compiled program, otherwise the compilation overhead would be way too large. LmVM: Lime Virtual Machine is introduced as an implementation of the LIME principle. Code originally starts on the CPU and evolves (or can be forced to evolve) to other platforms.

The programming approach is as follows:
Java is a subset of Lime. A programmer starts with a Java program and extends it with different Lime features. Many new types are introduced in Lime to adhere to the hardware peculiarities in the different machines. [Insert long and lengthy discussion about language features here].

Performance is evaluated using the following scheme: write 4 benchmarks and 4 different versions of each benchmark to compare the different implementations. Baseline is a naive Java implementation. This baseline is compared to a handwritten expert implementation and the automatic Lime compilation.

Total man power needed to develop this approach: 8 man years.

Session 1: Performance Monitoring
Performance Profiling of Virtual Machines
(Jiaqing Du, Nipun Sehrawat and Willy Zwaenepoel, EPFL Lausanne)
PerfCTR incurs only low overhead and is a lot faster than binary instrumentation. The drawback is that support for virtual machines is missing. There are three different profiling modes: native profiling (os<->cpu), guest-wide (os<->cpu, without VMM, only guest is profiled), system-wide (os<->VMM<->cpu, both VMM and guest are profiled). Implement performance counters for para-virtualization (Xen), hardware assistance (KVM), and binary translation (QEMU) for both guest-wide profiling and system-wide profiling.

A challenging problem for guest-wide profiling is that the context must be saved for all context switches (e.g., client 1 to VMM, VMM to client 2). The overhead of the implemented approach is low, about 0.4% for the additional counters in all cases. Native overhead in contrast is about 0.04%, so the additional VMM increases the overhead by 10x. An analysis of the accuracy shows that the deviation increases for virtual machines but is still very low for compute-intensive benchmarks. For memory-intensive benchmarks QEMU has a much higher cache miss rate due to the binary translation overhead.

Questions: What about profiling across different VMs? (in the VMM?) Is PEBS supported?


Perfctr-Xen: A Framework for Performance Counter Virtualization
(Ruslan Nikolaev and Godmar Back, Virginia Tech Blacksburg)
Perfctr-Xen as an implementation for performance counter virtualization using the perfctr library in Xen. This removes the need for architecture-specific code inside the Xen core to support PMUs. Two new drivers needed to be implemented: Xen Host Perfctr driver, Xen Guest Perfctr driver, and Perfctr library needed to be changed as well.

Questions: What kind of changes are needed in the user-space library, and why is the Xen guest driver needed? Is PEBS supported? Ruslan did not convince me with his answer to the questions.


Dynamic Cache Contention Detection in Multi-threaded Applications
(Qin Zhao, David Koh, Syed Raza, Derek Bruening (Google), Weng-Fai Wong and Saman Amarasinghe, MIT)
The motivation of this talk is to detect cache contention in multi-threaded applications (e.g., false sharing between arrays across multiple threads). Use dynamic instrumentation to keep track of single memory locations using a bitmap and shadow memory. The ownership bitmap for each cache line stores ownership of individual cache lines for each of up to 32 threads. If a thread that accesses a cache line is not single owner then we have a potential data sharing problem. Depending on the performance counters we can detect cache contention. Implementation is on top of Umbra which uses DynamoRIO.
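
Roughly how I picture the ownership bitmap, as a sketch. I assume 64-byte cache lines and treat reads and writes the same; the class and its fields are hypothetical, and a Python dict stands in for the shadow memory that the real tool maintains through Umbra/DynamoRIO instrumentation.

```python
CACHE_LINE_BITS = 6   # assumption: 64-byte cache lines
MAX_THREADS = 32      # one bit per thread in a 32-bit bitmap


class OwnershipTracker:
    """Shadow memory that keeps one ownership bitmap per cache line."""

    def __init__(self):
        self.shadow = {}        # cache line number -> ownership bitmap
        self.contended = set()  # lines touched by more than one thread

    def access(self, thread_id, address):
        """Record an access; return True if another thread owns the line too."""
        if not 0 <= thread_id < MAX_THREADS:
            raise ValueError("thread id does not fit into the bitmap")
        line = address >> CACHE_LINE_BITS
        bit = 1 << thread_id
        owners = self.shadow.get(line, 0)
        shared = (owners & ~bit) != 0   # some other thread's bit is set
        if shared:
            self.contended.add(line)
        self.shadow[line] = owners | bit
        return shared
```

Two threads writing to neighboring array elements at 0x1008 and 0x1020 hit the same line (0x40) and get flagged, which is exactly the false-sharing pattern from the motivation.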

Questions: What tool did you use for BT? How can you know that you measure the real overhead and not some distortion through the instrumentation interface?


Session 2: Configuration
Rethink the Virtual Machine Template
(Kun Wang, Chengzhong Xu and Jia Rao)
Main objective is to reduce the startup overhead of system images down to 1sec. Problem is that the overhead of VM creation is large. Cloning copies files and other solutions are limited to the same physical machine. Idea is to concentrate on the 'substrate' of the virtual machine and only capture the smallest possible state (e.g., app/os state). This small substrate can then easily be copied to other machines and restored to full online VM images.


Dolly: Virtualization-driven Database Provisioning for the Cloud
(Emmanuel Cecchet, Rahul Singh, Upendra Sharma and Prashant Shenoy, UMASS CS)
Emmanuel used the tagline "Virtual Sex in the Cloud". This work tries to solve the problem of adding database replicas to organize the load on database backends. Problems are that the VM cannot just be cloned but a complete database backup and restore must be carried out so that the DBs can be synced across different nodes. It is hard to replicate state and, e.g., instantiate a consistent copy/replica of a database. When the replica is actually ready it misses a couple of updates that happened during the process of generating the replica. The idea of this work is to dynamically scale database backends in the cloud by generating new VM clones and using DB restore in the background. Parameters are snapshot intervals and update frequency of the database, which in turn determines the size of the replay log that must be recovered. The evaluation part contains a detailed analysis of different predictors of when to take snapshots, what the replay overhead from the snapshot to the current state is, and how much it would cost on Amazon EC2.


ReHype: Enabling VM Survival Across Hypervisor Failures (Highlight)
(Michael Le and Yuval Tamir, UCLA)
VMM is a single point of failure for VMs (due to hardware faults or faults in the virtualization software). The problem also is that system reboots (of the host) are too slow. ReHype detects failures and pauses VMs in place. VMM is then micro rebooted. Paused VMs are then integrated into the new VMM instance and unpaused. Related work 'Otherworld' reboots the Linux kernel after a failure and keeps processes (applications) in memory. ReHype recovers a failed VMM while preserving the states of VMs. Possible VMM failures are crash, hang, or silent (no crash/hang detected but VMs fail). Crash: VMM panic handler is executed, hang: VMM watchdog handler is executed. The system was evaluated using fault injections into the VMM state.

Can only recover from software failures, HW is still the same so persistent HW failures are not protected. Logs are not kept to fix the bugs later on. But in theory this system could also be used to upgrade VMMs.

Questions: What about the size of the system (LOC)?


Session 3: Recovery
Fast and Space Efficient Virtual Machine Checkpointing
(Eunbyung Park, Bernhard Egger and Jaejin Lee, Seoul University, South Korea)
Checkpointing can be used for faster VM scaling, high availability, and debugging/forensics. Checkpoint stores volatile state of the VM. A large part of the snapshot data does not need to be saved, e.g., file cache in the Linux kernel. Goal is to make checkpointing faster and to reduce these redundant pages and remove them from the snapshot. A mapping between memory pages and disk blocks is added to the VMM. Problem: how to detect dirty/written pages in memory? It is necessary to check the shadow page-table of the guest. Result: 81% reduction in stored data and 74% reduction in the checkpoint time for para-virtualized guests, 66% reduction in data and 55% reduction in checkpoint time for fully-virtualized guests.
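
The core trick, as I understood it, in a small sketch: pages that are unmodified copies of disk blocks are stored as references to the block instead of as data. The data structures here (plain dicts and a dirty set) are my simplification; the real system derives this information from the page-to-block mapping and the shadow page tables inside the VMM.

```python
def checkpoint(memory, page_to_block, dirty):
    """Build a snapshot that skips clean page-cache pages.

    memory:        page number -> page contents
    page_to_block: page number -> disk block the page was read from
    dirty:         page numbers modified since they were read from disk
    """
    snapshot = {}
    for page_no, data in memory.items():
        block = page_to_block.get(page_no)
        if block is not None and page_no not in dirty:
            snapshot[page_no] = ("disk", block)   # store a reference only
        else:
            snapshot[page_no] = ("data", data)    # store the full page
    return snapshot


def restore(snapshot, disk):
    """Rebuild memory, re-reading referenced pages from disk."""
    return {
        page_no: (disk[value] if kind == "disk" else value)
        for page_no, (kind, value) in snapshot.items()
    }
```

Only the `("data", ...)` entries cost space in the snapshot, which is where the reported reduction in stored data would come from.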


Fast Restore of Checkpointed Memory using Working Set Estimation (Highlight)
(Irene Zhang, Yury Baskakov, Alex Garthwaite and Kenneth C. Barr, VMWare)
Reduce time to restore a checkpoint from disk. Current schemes: (1) eager restore; restores all pages to memory. (2) lazy restore; restores only the CPU/device state and only restores memory in the background. If the guest accesses pages that are not yet restored then the VMM must stop the VM and restore that specific page (this can lead to thrashing). How to measure restore performance? Time-to-responsiveness measures time until VM is usable. How big is the share of the restore process of the total time? (Mutator utilization, comes from the GC community that measures GC overhead).

New feature: working-set-restore that prefetches the current working set to reduce VM performance degradation. Working set is estimated using either access-bit scanning or memory tracing. The memory tracing runs alongside the VM all the time (overhead around 0.002%) and keeps track of the working set. When the checkpoint is restored then this set of pages is restored first and the VM is started at the point the checkpoint was taken, so all the pages will be accessed again. (Of course no external I/O may be executed during the checkpointing).

Question: Do you do any linearization of the list of the pages that need to be restored? I/O allowed during lazy checkpointing? Working-set-predictor and peeking into the future?


Fast and Correct Performance Recovery of Operating Systems Using a Virtual Machine Monitor
(Kenichi Kourai, Institute of Technology, Japan)
A reboot is often the only solution to get over a fault in the system. After a reboot performance is still degraded due to many page misses. A new form of reboot with a warm page cache is proposed. The page cache is kept in memory and can be reused after a reboot. A cache consistency mechanism is added to keep track of the caching information.

Saving a couple of seconds after a reboot leads to constant overhead during the runtime due to the implementation. Does this really make sense? Reboots should be infrequent for servers. Does it really make sense to keep the page cache around?


Session 4: Migration
Evaluation of Delta Compression techniques for Efficient Live Migration of Large Virtual Machines
(Petter Svard, Benoit Hudzia, Johan Tordsson and Erik Elmroth, Umea University Sweden)
A problem with current solutions is that more pages can turn dirty than pages are transferred to the other host. At one point in time the VM is stopped and the remaining pages are transferred. This leads to a long downtime. Depending on the transfer link it makes more sense to compress, transfer, decompress than to just transfer pages because compression and decompression is faster than the transfer of the full uncompressed page. A special remark is that only pages that were already transferred are compressed. If a page was already transferred and is in the cache of the sender and turns dirty then the delta is constructed, compressed, and sent. Otherwise the plain page is sent. Petter did some live demos.
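
A sketch of the sender/receiver logic as described in the talk. The XOR delta and zlib are my assumptions for illustration; I did not note down the exact delta encoding and compressor used in the paper.

```python
import zlib

PAGE_SIZE = 4096


def encode_page(page_no, page, sender_cache):
    """Send a compressed delta if the page was sent before, else the plain page."""
    old = sender_cache.get(page_no)
    sender_cache[page_no] = page
    if old is None:
        return ("full", page)
    delta = bytes(a ^ b for a, b in zip(old, page))  # mostly zeros if few bytes changed
    return ("delta", zlib.compress(delta))


def decode_page(page_no, message, receiver_pages):
    """Apply a full page or a delta on the destination host."""
    kind, payload = message
    if kind == "full":
        page = payload
    else:
        delta = zlib.decompress(payload)
        page = bytes(a ^ b for a, b in zip(receiver_pages[page_no], delta))
    receiver_pages[page_no] = page
    return page
```

A page that is re-dirtied with only a few changed bytes yields a delta of almost all zeros, which compresses to a few dozen bytes instead of a full 4 KB transfer.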


CloudNet: Dynamic Pooling of Cloud Resources by Live WAN Migration of Virtual Machines (Highlight)
(Timothy Wood, KK Ramakrishnan, Prashant Shenoy and Jacobus van der Merwe)
Problem: cloud resources are isolated from one another and the enterprise. The interesting question is how to manage these different isolated machines and how to secure data transfers between the different machines and across multiple data-centers. Use VPNs to connect different data centers and use common migration tools.


Workload-Aware Live Storage Migration for Clouds
(Jie Zheng, T. S. Eugene Ng and Kunwadee Sripanidkulchai, Rice University)
Storage migration in a wide-area VM migration contributes the largest part of the data that needs to be transferred. No shared file storage is available, so disk image must be synchronized somehow (based on block migration).


Session 5: Security
Patch Auditing in Infrastructure as a Service Clouds (Highlight; Read paper)
(Lionel Litty and David Lie, VMWare / University of Toronto)
Apply your patches! But not everybody does it. Even automatic patch application is not a solution. Also monitoring on the OS level is not continuous or systematic, different applications have different update mechanisms. There is need for a better tool to automate the update mechanism and to monitor the vulnerable state of systems. Additional challenges are VMs that might be powered down or unavailable to the infrastructure administrator. Solution: add patch monitoring to the VMM infrastructure and report to a central tool. Use VMM to detect application updates (binary and text only) and analyze different patches. Use executable bits to detect all live executed code on host VM. Check that executed code is OK.
Patagonix (only binary code detected) -> P2 (extended executable code (bash script, python, executable) detected).


Fine-Grained User-Space Security Through Virtualization
(Mathias Payer and Thomas R. Gross, ETH Zurich Switzerland)
My talk. See my paper for details.


Session 6: Virtualization Techniques
Minimal-overhead Virtualization of a Large Scale Supercomputer
(Jack Lange, Kevin Pedretti, Peter Dinda, Patrick Bridges, Chang Bae, Philip Soltero and Alexander Merritt, University of Pittsburgh)
Palacios (OS-independent embeddable VMM) and Kitten (lightweight supercomputing OS) for HPC. Key concepts for minimal overhead virtualization are that (1) I/O is passed through, e.g., direct I/O access with no virtualization overhead; (2) virtual paging is optimized for nested and shadow paging; (3) preemption is controlled to reduce host OS noise. The VMM trusts the guest (e.g., to do DMA correctly). Bugs in the guest could bring down the complete system. Symbiotic virtualization as new approach that uses cooperation.


Virtual WiFi: Bring Virtualization from Wired to Wireless (Highlight)
(Lei Xia, Sanjay Kumar, Xue Yang, Praveen Gopalakrishnan, York Liu, Sebastian Schoenberg and Xingang Guo, Northwestern University)
New approach to virtualization that enables wifi virtualization. One physical WiFi interface is virtualized and can be used in multiple VMs. Current approach is to virtualize an ethernet device inside the GuestVM. This strips all the wifi functionality. The new approach virtualizes complete wifi functionalities in the VM. The same Intel Wifi driver is used in the GuestVM as is used in the HostVM. Each VM gets its own vMAC, HostVM distributes packets according to vMAC, all other capabilities are directly forwarded to the VMs and can be set by the VMs as well.

Questions: Promiscuous? Rate limited? Multiple vMACs supported in VM as well?


SymCall: Symbiotic Virtualization Through VMM-to-Guest Upcalls
(Jack Lange and Peter Dinda)
SemanticGap: loss of semantic information between HW and emulated guest HW, and guest OS state is unknown to the VMM. Two approaches to find out about guest: BlackBox: Monitor external guest interactions, GrayBox: reverse engineer guest state.
Symbiotic Virtualization: design both the guest OS and the VMM to minimize the semantic gap. But also offer a fallback to blackbox guest OS. SymSpy passive interface uses asynchronous communication to get information about hidden state and SymCall that uses upcalls into the guest during exit handling.

SymSpy: uses a shared memory page between the OS and the VMM to offer structured data exchange between VMM and OS
SymCall: similar to system calls. The VMM requests services from the OS. Restrictions: only 1 SymCall active at a time, SymCalls run to completion (no blocking, no context switches, no exceptions or interrupts), SymCalls cannot wait on locks (deadlocks).

SwapBypass is an optimization that pushes swapping from the guest to the VMM. SwapBypass uses a shadow copy of the page tables of the guest VM. The VM does not swap out any pages and caching/swapping only happens in the VMM but never in the guest VM to reduce I/O pollution. Page fault happens in VMM and not in host.



Session 7: Memory Management
Overdriver: Handling Memory Overload in an Oversubscribed Cloud (Highlight)
(Dan Williams, Hani Jamjoom, Yew-Huey Liu and Hakim Weatherspoon, Cornell University)
Peak loads are very rare and utilization in data centers is below 15%. But on the other hand peak loads are unpredictable and oversubscription can lead to overload. Memory oversubscription is kind of critical because overload carries a high penalty due to swapping costs. The focus of this work is to research if the performance degradation due to memory overload can be managed, reduced, or eliminated.

Analysis of different memory overloads shows that most overload is transient (96% are less than 1min), some overload is sustained (2% last longer than 10min). Two techniques used to address memory overload: VM migration (migrates VM to another machine), and network memory that sends swapped pages not to disk but to another swapping machine over the network. Network Memory may be used for transient overloads and VM migration for sustained overloads.

OverDriver uses network memory and VM migration to handle overload. OverDriver collects swap/overload statistics for each VM. Use overload profiles to decide when to switch from network memory to VM migration.
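
The decision logic boils down to something like the sketch below. The threshold value and the function name are placeholders of mine; the paper derives the switch point from overload profiles rather than from a hard-coded constant.

```python
def choose_mitigation(overload_seconds, threshold_seconds=120):
    """Pick a mitigation for a VM based on how long it has been overloaded.

    threshold_seconds is a hypothetical static cut-off between
    'transient' and 'sustained' overload.
    """
    if overload_seconds <= 0:
        return "none"
    if overload_seconds < threshold_seconds:
        return "network_memory"   # cheap, good enough for transient overload
    return "vm_migration"         # expensive, but fixes sustained overload
```

Since most overloads are transient, nearly all of them never reach the migration branch.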

Question: Decision on when to migrate is static, what about adaptive checks/analysis for migration? What other predictors could you use? (Sounds like future work)




Selective Hardware/Software Memory Virtualization
(Xiaolin Wang, Jiarui Zang, Zhenlin Wang, Yingwei Luo and Xiaoming Li, Peking University)
3 possibilities for memory virtualization: MMU para-virtualization, shadow page tables, and EPT/NPT. Idea: use dynamic switching between hardware assisted paging and shadow paging. Question: how and when to switch?



Hybrid Binary Rewriting for Memory Access Instrumentation (Highlight; Read paper)
(Amitabha Roy, Steven Hand and Tim Harris, University of Cambridge UK)

Scaling inside multi-threaded shared memory programs can be problematic (scalability, races, atomicity violations). Run existing x86 binaries and analyze synchronization primitives (locks). Dynamic binary rewriting used to analyze lock primitives.

Hard to decide statically if lock is taken or not. Either overinstrumentation or unsound. Therefore dynamic BT is needed.

Hybrid binary rewriting uses static binary rewriting and dynamic binary rewriting as a fallback. A persistent instrumentation cache (PIC) is stored between different runs of the same program, so the translated code can be reused.

HBR used for two case-studies:
1. Profiling: interested in understanding how suitable programs are for applying STM transformations
2. Speculative Lock Elision: remove locks and turn them into stm_start, stm_commit, and instrument reads and writes. STAMP used to evaluate this dynamic instrumentation. Problem is that STAMP uses private data that is accessed inside transactions and there is manual optimization for STMs that get rid of the additional read and write operations. Dynamic instrumentation instruments all reads and writes and has bad performance for these cases. Private Data Tracking uses a special tracking of private data to reduce the amount of instrumentation and reduces the overhead to reasonable numbers.

Question: Static binary rewriting: no runtime overhead (no translation overhead), but there can be artifacts/overhead through the translation process. Translation overhead for DBT is <1%. What about hierarchical transactions?


Peek into the future:
VEE 2012 will be in London, UK, general chair will be Steve Hand. VEE'12 is colocated with ASPLOS again, Saturday 3rd of March and Sunday 4th of March.

Sunday, February 20, 2011

The fbviews.org worm or how to collect user data and make money

A couple of hours ago I read a post from a friend on Facebook that said "Secret tool shows who stalks your pics". The text was followed by a shortened link (tweet, anyone?).

As I opened the link (in an incognito browser window of course) I was greeted by instructions to copy some JavaScript code into my address bar.



Hm, this smells phishy...

The JavaScript that one has to copy to the address bar creates a script element that downloads a JavaScript file from some drop-box and executes that JavaScript file. So the next thing I did was to download the script.

Well, the script was (somewhat) obfuscated as many/most function names were replaced with array accesses and all strings used in the code were placed in the array as well. The array is stored as raw-hex-values. So I decoded the values and de-obfuscated the JavaScript.
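
This kind of de-obfuscation can be automated. The sketch below shows the general idea in Python: decode the `\xNN` escapes in the string array and inline the decoded strings wherever the array is indexed. It is not the exact procedure I used and not the worm's real code; the function names, the array name `_0xa`, and the sample input are made up for illustration, and the regular expressions only handle the simple `var NAME = [...]` form.

```python
import re


def decode_hex_escapes(text):
    """Turn \\xNN escapes into the characters they encode."""
    return re.sub(r"\\x([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


def deobfuscate(source, array_name):
    """Inline the string array array_name into the places that index it."""
    # Locate the declaration: var NAME = ["...", "..."];
    decl = re.search(r"var\s+" + re.escape(array_name) + r"\s*=\s*\[(.*?)\];",
                     source, re.S)
    if decl is None:
        return source
    # Extract and decode every double-quoted string of the array.
    raw_strings = re.findall(r'"((?:[^"\\]|\\.)*)"', decl.group(1))
    strings = [decode_hex_escapes(s) for s in raw_strings]
    # Drop the declaration and replace NAME[i] with the i-th decoded string.
    body = source[:decl.start()] + source[decl.end():]

    def inline(match):
        index = int(match.group(1))
        if index < len(strings):
            return '"' + strings[index] + '"'
        return match.group(0)

    return re.sub(re.escape(array_name) + r"\[(\d+)\]", inline, body).strip()


# Made-up sample in the same style as the obfuscated script.
sample = (r'var _0xa = ["\x68\x72\x65\x66","\x6C\x6F\x63\x61\x74\x69\x6F\x6E"]; '
          r'window[_0xa[1]][_0xa[0]] = url;')
print(deobfuscate(sample, "_0xa"))  # prints: window["location"]["href"] = url;
```

After this step the remaining work is manual: renaming functions and variables to something readable.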

The JavaScript does the following:

  • Display a nice message that it is analyzing your profile (and your stalkers)
  • Posts a (spam) message to your wall (with a random message and a link to the tool)
  • Adds you to the group "Music Makes me High." (127901437283104)
  • Adds you to the group "I Hate it when I can't fall asleep because I'm thinking." (165991450116555) 
  • Adds you to the event with the number 168046893242650
    • I was unable to access this event - it might have been deleted
  • Finds 15 of your friends and posts a (spam) message to their wall (another random message)
  • Checks all your registered entries/pages for "Facebook Insights" and 
    • adds two new admin email addresses (lethaburbach890@yahoo.com and wintersaccohoqr@hotmail.com)
    • writes a (spam) message to the wall of the page
  • Redirects you to a page that shows some fake results and tries to get you to fill out some "Human Verification tests".
    • I assume that's where they make the money


So if you got tricked by this worm, try to delete all messages, leave the two groups that you were added to, and check all your pages and remove redundant admins!


What are the details from the code analysis:
  • The following functions are available:
    • _88xuhyr: decrypts an array of values and executes the resulting string as JavaScript
    • addAdmin: adds a new admin to a pageid in Facebook Insights
    • makePost: writes a post to a friend's wall
    • update: fire the Ajax request
    • loading: display an image to calm the user
  • The original code was somewhat weird: three functions (addAdmin, makePost, and update) were defined twice (looks like a copy-paste error before the obfuscation)
  • A function (_88xuhyr) is not used in the source code. It appears as if they intended to obfuscate the code even further but forgot about it.
  • makePost has unused arguments

The following messages are used to generate spam messages:
  • Wow! Seems like lots of people stalk me - http://goo.gl/lfDvG
  • New FB tool shows who stalks your profile-- http://goo.gl/NHAlt
  • Secret tool shows who stalks your pics http://tinyurl.com/48jd66w
  • Insane! Awesome tool to see who looks at your pics >> http://goo.gl/3Nt6T
  • According to http://ow.ly/3Zy2Z you're my top stalker. Creep.
  • Secret tool shows who stalks your pics - http://goo.gl/NMclq
The possible subjects are:
  • Check this out!
  • Hey, whats happening?
  • Hey! This is awesome

The final landing pages are:
  • http://goo.gl/lfDvG -> http://thefbcreeper.info/
  • http://goo.gl/NHAlt -> http://profileviewers.info/
  • http://tinyurl.com/48jd66w -> http://thefbcreeper.info/
  • http://goo.gl/3Nt6T -> http://profilechecker.info/
  • http://ow.ly/3Zy2Z -> http://valcreepers.tk/
  • http://goo.gl/NMclq -> http://profilechecker.info/

Another fun fact is that all landing pages use the same Google Analytics account (UA-21407597-1) for the domain .fbviews.org.

When the script finishes, it redirects the user to a landing page on fbviews.org (http://fbviews.org/result.php) that displays some fake results and tries to trick the user into entering some data.

The worm is a nice piece of JavaScript code that is somewhat obfuscated and tries to spread on Facebook. It would be interesting to get to the data that is stored in this Google Analytics account. According to the pages where the users are automatically added, more than 16k people have already fallen for that trap (and counting).

btw, you can download the de-obfuscated JavaScript on nebelwelt.net.