LiveOps is a small, profitable and quickly growing company that provides sophisticated telephony services with world-class reliability and scalability. While our server count is not enormous, the work those servers do is complex. The code base is large and extremely diverse.
My primary job is to lend technical support to any area in Operations that needs it, including Network Engineering, Linux System Administration, Release, DBA and others. This support comes in two primary forms: tactical, hands-on expertise in any of those areas, and intense, strategic development of custom code that provides automation, fault management, performance management and correlation.
My Linux administration experience at Wal-Mart was notable, but it was not the focal point of my work. At LiveOps, we live, breathe and think Linux. Likewise, my MySQL experience at Wal-Mart was good and useful, but the MySQL infrastructure at LiveOps contains many billions of rows of data, spread across scores of very large Linux systems. I have learned much about practical MySQL and Linux since April 2009.
When there aren't pressing operational issues at hand, my primary focus is monitoring and automation. Our existing Nagios infrastructure was aging and badly out of date. In the last 12 months, I have constructed a custom agent-based monitoring system with excellent, flexible configuration, correlation and escalation capabilities. This system is a framework that also supports many other important and interesting system management applications, including:
- A system that automatically uses git to track all changes to all files in a number of critical Linux system directories.
- High-performance custom modifications to the installed system shell for security auditing purposes.
- An easily configurable and scalable gateway software package that allows the incremental export of whatever system and network events we wish to expose to partner companies.
- An easy-to-use, web-based GUI front end to our existing change management system that allows anyone to create service requests.
- A framework that allows powerful and highly flexible time-based escalation. For a given event that does not clear, any number of escalation actions can occur on a given timeline.
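
To make the time-based escalation idea concrete, here is a minimal Python sketch. It is an illustration only, not the LiveOps code: the class names, step names, event id and timings are all hypothetical. It assumes each event carries a list of steps, each due once the event has stayed open for a given number of seconds, and that a periodic tick fires whatever is due until the event clears.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EscalationStep:
    """One action that becomes due once an event has been open `after_seconds`."""
    after_seconds: float
    name: str
    action: Callable[[str], None]


class EscalationTimeline:
    """Tracks one uncleared event and fires each step once, in time order."""

    def __init__(self, event_id: str, opened_at: float,
                 steps: List[EscalationStep]) -> None:
        self.event_id = event_id
        self.opened_at = opened_at
        self.steps = sorted(steps, key=lambda s: s.after_seconds)
        self.next_index = 0      # index of the first step not yet fired
        self.cleared = False

    def clear(self) -> None:
        """Mark the event as cleared; no further steps will fire."""
        self.cleared = True

    def tick(self, now: float) -> List[str]:
        """Fire every step that is due at time `now`; return their names."""
        fired: List[str] = []
        if self.cleared:
            return fired
        age = now - self.opened_at
        while (self.next_index < len(self.steps)
               and self.steps[self.next_index].after_seconds <= age):
            step = self.steps[self.next_index]
            step.action(self.event_id)
            fired.append(step.name)
            self.next_index += 1
        return fired


def demo() -> Tuple[List[List[str]], List[Tuple[str, str]]]:
    """Walk one hypothetical event through its timeline, then clear it."""
    sent: List[Tuple[str, str]] = []   # stand-in for real email/pager calls
    steps = [
        EscalationStep(0, "email-oncall", lambda e: sent.append(("email", e))),
        EscalationStep(300, "page-oncall", lambda e: sent.append(("page", e))),
        EscalationStep(900, "page-manager", lambda e: sent.append(("manager", e))),
    ]
    timeline = EscalationTimeline("disk-full@db01", opened_at=0.0, steps=steps)
    results = [timeline.tick(0), timeline.tick(400), timeline.tick(500)]
    timeline.clear()                    # event clears before the 900 s step
    results.append(timeline.tick(1000))
    return results, sent


if __name__ == "__main__":
    print(demo())
```

Keeping the steps as plain data means any number of actions can be attached to an event without changing the scheduling logic, which is the property described in the bullet above.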
I led (in the technical sense; I did not manage) the Network Management team
at Wal-Mart Stores, Inc. Our team's job was to enable the 70+ Network Engineers
and our Network Operations Center to manage the 150,000+ routers, switches, APs
and various other equipment that made up Wal-Mart's global network.
These devices were running in more than 15 countries all over the world.
In essence, we wrote code to automate the process of managing this vast
array of devices. No commercial Network Management tool had ever worked
well for us, so almost everything was custom written. My team had six
members.
For most of my time at Wal-Mart, the company did not allow Linux. Over the
years we had nonetheless installed a number of 'pirate' Linux boxes, and we
were responsible for all management of them, both hardware and OS. Once
Wal-Mart officially supported Linux, we moved most of our development to a
Linux-based infrastructure. These servers provided a huge performance boost
over the existing array of HP-UX and AIX systems.
We had also used MySQL for our database back end since 1997. In 2009
we had 200-300 million records in our database, replicated in real time
among 9 systems running on three different operating systems in all of our
data centers. This made our databases more redundant than anything else
inside Wal-Mart.
We interfaced with many different teams in Information Systems because of the
unique data that we collected. We were forced to write a custom layer-2
discovery engine because nothing commercial suited our needs. The data
this engine collected was widely distributed among various areas in the
division, because it was accurate and fairly timely.
As of 2009 I was serving as project manager, senior architect and lead
developer of various sub-groups of my team in a variety of development
efforts. These were some of the things I did on a day-to-day basis:
- Write code and design specifications for others to write related code.
- Oversee various projects to make sure they stay on track.
- Work issues inside and outside of our immediate team, and assist other
members of my team to do the same.
- Evaluate third-party products for their suitability in our environment.
- Meet formally, and more often informally, with other teams to understand
their needs and wants.
- Work with other Network Engineering teams to derive switch and router
configurations for a whole variety of corporate needs.
- Train and develop members of my team and adjacent teams.
I was a Computer Programmer in the 81st Medical Group based at Keesler
Medical Center, Keesler Air Force Base, Biloxi, Mississippi. I was there
from 1993 until 1997.
My title notwithstanding, I was an extreme jack of all trades. I was the
primary administrator of our Unix and AOS/VS systems at the hospital, the
primary Network Administrator and the Computer Security Manager. Day to
day, I would:
- Configure Cisco routers and various other hubs and switches.
- Administer a whole raft of different UNIX variants, from Linux
to Solaris to DG/UX to SCO to about anything else out there.
- Do a lot of very early web programming for customer projects.
- Make sure our servers were secure from physical and network attacks.
We used HP OpenView and some other commercial programs to do the basic
network management. I wrote a lot of other programs to automate everything
from installing new Windows 95 PCs to watching our Squid cache log for
unauthorized surfing.
Starting from where I am placed in an organization, and given the appropriate
resources, my basic, innate tendency is to sit down with the individuals and
teams around me to find out what they need and how their work can be automated
and made easier and more efficient. How well I do that depends on a couple of
things. I strongly believe in using as much Open Source software as possible,
so its availability makes me more productive. My management needs to trust me
to help and not harm, even if my methods seem strange. I try to live and work
as far away from "The Box" as I can. This does NOT mean that I am an
office/cube hermit. I believe in communication, up, down and sideways, and
as much of it as possible. So my management will know as much about what I'm
doing and why I want to do it as they can tolerate. In return, I readily
accept as much communication from my management as they can give, so I can
understand and figure out what needs to be done for the organization.
My basic focus is three-fold:
- Network Technologies
- Dynamic Programming Languages
- Development: of myself by listening and learning, and of others by whatever
means works best for each individual.
Work-wise, this is who I am in one sentence:
Dana is a network/programming technologies junkie who's into automation
and people development.