Overdetermined

Open Source

Introduction to R - Goals

Today, I want to talk briefly about my goals for this series. I also want to be very clear about what I am not going to do here. Behind the scenes, I am developing a realistic but fake dataset for us to play with in the coming weeks.

This series, "An Introduction to R", will take you through the basic steps used by pollsters all over the country when they receive polling data. These steps include importing the data, building crosstabs (weighted and unweighted), and running some basic statistical tests. Later in the series, I will show you how to format the data for printing.

This is not going to be a comprehensive review of R. For starters, several already exist:

I have read both of these texts and they are terrific. If you are following this series and want to get the most out of it, you will want to read these texts. I will make sure that I refer to relevant chapters where appropriate. At the beginning of my future posts I will also highlight the commands you will need to master in order to complete the exercises. I will also provide you with example data sets and code to help you complete the exercises, but I will not review each and every command. To do so would be A) repetitive/tedious (see above) and B) Boring as #$^%$. I would rather use our time together focusing on the problems unique to working with the kinds of data commonly found in the social sciences.

Until next time

--pluribus

Red Hat for Christmas

Open Source tools promote Freedom. Proprietary tools do not. Of course, companies that intentionally build tools that promote Freedom may also be interested in other progressive ideals. Red Hat, the company that produces the Fedora Linux distribution, is one such company.

Red Hat Cancels Party; Will Feed Needy Instead

The corporate culture at Red Hat has changed over the years. This year, Red Hat scaled back its plans for its corporate Christmas party. The savings will be donated to a charity, Feeding America. The company expects the donation will enable Feeding America to provide roughly 800,000 additional meals. That's a lot of food. Other offices are involved in coat drives and canned food drives.

With the economy in a tail-spin, this change in corporate culture is not only timely and appropriate; it shows how the values of community and collaboration created by the open-source development process can also result in a different style of corporate capitalism. I only wish stories like this got more attention in the mainstream media.

--pluribus

Introduction to R - Interface

Introduction . . . .

In my Installation article, I mentioned that R does not provide a graphical user interface. By default, this is true. On Linux, the default R interface looks like this:

There's more.

Neat free/low-cost tool for diagramming

Update: this post originally referred to Gliffy as open source, which it isn't.

Let's say that you need to build a massive diagram of a database and need to share it with people, and allow them to modify, collaborate, etc.  You can either pay a lot of money for a proprietary tool like Visio, or you can do the smart thing and use Gliffy.

That is all.

DD

Introduction to R - Installation

Things I am thankful for - 2

This is the second, and more serious, entry in my mini-holiday series "Things I am thankful for". Today I want to discuss why I am so thankful for Free software. Those of you who have been reading my stuff over the last couple of days may have noticed that I have this funny habit of capitalizing the F in free. This is not a habitual typo. It is, in fact, intentional and instructive.

In the Free Software Movement, there is a differentiation between free software and Free software. The latter is often referred to as Free/Libre Software. The former costs nothing, but comes with other, hidden costs. There are many sources of free, proprietary software. The Flash player from Adobe and iTunes are two obvious examples of free software. You can download this software for no cost, but it is still proprietary. If you'd like to read a detailed analysis of the iTunes license, read this. Here's the short version: you aren't allowed to do anything that Apple doesn't want you to do. You may not look at the source code or understand how iTunes operates. So, although iTunes does not cost anything, it comes with many restrictions. Download.com is a great place to find some of this software. I am not trying to say that this is bad software. In fact, much of it is actually very high quality. But you need to remember the hidden costs and restrictions that come with using this kind of software.

In contrast, we can choose to use Free software. This is software that is developed in an open-transparent manner. We all have the right to see the source code and participate in the development of the end product. We can re-use the good ideas, and we can create our own custom version of a product that better suits our needs. These are Freedoms that no proprietary product will ever offer you. And, I am thankful for these Freedoms.

Fortunately, there are also lots of places to find Free software. Head over to Sourceforge, one of the Internet's largest repositories for Free software. Look around. There are literally THOUSANDS of software titles to choose from. Text editors, databases, video games. Free software is about choices AND Freedom. I'm thankful for both the choices and the Freedom, but today I really want to focus on the Freedom part.

Richard Stallman, one of the early pioneers of the Free Software movement, defined the Four Freedoms of Free software. For you history/political buffs, this is a clever fork of FDR's famous Four Freedoms.  Stallman's Four Freedoms famously start on 0 and increment to 3. Very geeky. But, this is the guy who developed Emacs, the greatest text editor/operating system in the world.

  • The freedom to run the program, for any purpose (freedom 0).
  • The freedom to study how the program works, and adapt it to your needs (freedom 1). Access to the source code is a precondition for this.
  • The freedom to redistribute copies so you can help your neighbor (freedom 2).
  • The freedom to improve the program, and release your improvements to the public, so that the whole community benefits (freedom 3). Access to the source code is a precondition for this.

I want you to look at these four freedoms. I want you to look at them carefully. Now compare them to the End User License Agreements you agreed to when you installed ANY proprietary product. You and I both know you can't re-distribute copies of Microsoft Office or SPSS. We also know it is impossible to fundamentally alter the way these programs operate, even if it is just to improve the tool. SPSS is a great example. Even though it is a statistics tool, information detailing the algorithms it uses is not open to inspection. Proprietary software is developed in secret and lacks the transparency that is inherent in any healthy Free software product.

Free software is a game changer in the IT industry. This should be no less true in the political realm. I promise you that the software you will see promoted on overdetermined.net is Free software not free software. The products we promote will be Free software. The code we develop is available under a Free license too. In the coming weeks I will try to write up some information about various Free software licenses and how these may be relevant to your use of the software, but for the moment let it suffice to say that the stuff we promote is Free as in Freedom/Libre.

Of course, all this software does cost real resources to develop. It takes man-power and it takes time and resources. So, although we will promote the cost of Free software, I want us to all promise to focus on the Freedom this software offers us too. And, if your campaign or organization has a little cash left over and wants to support the continued development of Free software, consider donating to the Free Software Foundation or purchase a service contract from a company such as Canonical or Red Hat. The not-for-profits that develop this software and the companies that bundle it are vital to the continued development of the Linux ecosystem. I know the economy is in a tail-spin, but let's make sure we all have something to be thankful for.

Until next time . . . .

--pluribus

Pluribus

E Pluribus Unum
Out of Many, One
    --Official Seal of the United States

I am pleased to introduce a new ongoing column here at overdetermined.net focused on the R Project for Statistical Computing (I'll just call it R). R is an interactive programming environment, based on the same high-level "language" the commercial product S-Plus uses. Many of you are probably familiar with tools such as SPSS or Stata. Although these are nice tools, they are expensive, proprietary products. A base copy of SPSS costs nearly $7000, and that does not include an operator! The open source community can provide your campaign or organization (and the competition) with a full suite of enterprise-grade tools for $0.00. How many bumper stickers and t-shirts can you print with an extra $7000? For smaller campaigns and not-for-profits, purchasing proprietary software (of any sort) is simply not a good use of funds when there are free software packages capable of doing the same work. I look forward to introducing you to a few of these tools.

We will start slowly, and build on what we learn. Starting next week I will begin a series titled: "Introduction to R". We will learn how to install R and use it to perform very basic tasks. Future series will build on these skills. Although R is an interactive programming environment, I do not expect you to have any programming experience. When necessary, I will provide a primer on important topics or skills that are relevant to the task at hand. All you need to do is follow the examples and try out the homework which I will present as a sort of puzzler from time to time. Extra credit to those who get the right answer!

Although most of my efforts here at overdetermined.net will focus on R, my interests are wide ranging and often geeky. I will introduce you to other interesting sources for statistical data and show you how to use R (or other open-source tools) to learn something interesting about our nation. I also enjoy debunking "sophisticated" efforts to use statistics to support completely nonsensical policy positions. Mark Twain was right: statistics are often used to tell the worst sort of lies. It is my professional obligation to club those individuals over the head with a good dose of reality. And, I must warn the ideologically blindered that I can be quite non-partisan in my drubbings when provoked!

I'm glad to be here and I look forward to getting to know all of you better in the future.

Until we meet again . . .
--pluribus

Ubuntu Linux - Obama Edition

We'd be remiss if we didn't note that we're not the only ones that think open source products are the way to go for stable, usable and powerful systems. The Obama campaign used Ubuntu 8.04 in their campaign offices across the nation. (Though it seems that Barack Obama tends to favor Macs himself.)

In Defense of RealClearPolitics

Those of you following along at home may notice that this post has introduced some new categories to the list, and that's because there's no really easy way to categorise this.  Basically, not too long ago, two of the entities listed on this site as Inspirations, RealClearPolitics and FiveThirtyEight, got into a major slapfight over different methodologies, transparency and whether or not one of them was committing major fraud in an effort to drive the media.

Here's the context: Nate Silver wanted to know why RCP wasn't including the Research 2000 polls commissioned by Daily Kos, but would include the polls by the Associated Press.  Silver argued that it was because the R2K polls were showing massively favorable Democratic results, while the AP was finding better results for the Republicans.  Since the editorial position of RCP is Republican, he argued that they had a vested interest in promoting better numbers for McCain, and that he had caught them doing it.

As much as I like Nate, though, I think that he's wrong. Still, he has managed to touch on one of the most fascinating things about the internet: the way that professionals are pitted against knowledgeable, impassioned people, and how this results in different models of information dissemination.

There's more...

Cool Tools: Introduction

As an analysis site inspired by the principles of the open source movement, our job would be incomplete if we didn't provide our readers with the tools to look at the numbers on their own. What good is all this talk about analytical openness if you can't take a look at our analysis yourself? So it is in that spirit that this column will cover free and open-source tools to allow interested individuals, cash-strapped local campaigns, or really anyone with the desire to do some number-crunching on their own to participate.
