StatisticsModuleGuide (15 Feb 2005, DotFingerPost)

FingerPost Statistics Module

 

The FIP Statistics Module for Sports, TV Listings, Elections and Financial data

Fact Sheet

In a world where raw data is cheap and plentiful but useful, timely information is rare and therefore expensive, the FIP Statistics module is used to collate and analyse data from various sources and in varying formats so Publishers can produce meaningful information quickly and cheaply.

Where is it useful ?

  • where data arriving from one or more suppliers needs merging, often arrives at varying times, and needs to be stored for future use.
  • where historical data needs to be referenced or added to the feed
      Yesterday's Closing Price for a Shares page
      The result of the same match last season for a baseball match
  • where simple graphs covering the data help explain the point of an article.
  • where writers and researchers need access to reference material for their articles.

The FIP statistics module has been designed to :

  • Add value to incoming data by merging standing copy.
  • Track many different items that are needed to complete a section.
  • Reformat and merge several different input types.
  • Form statistics on past performance.
  • Produce completely finished pages or sections from the data.
  • Be a reference library tool for writers and researchers using on-line facilities to check facts.

It is aimed at applications such as Sports (especially soccer, baseball and other highly structured sports), Radio and TV Listings, Election results and sundry Financial data such as the myriad of stocks, shares and small tables.

The module is based around an SQL relational database (Rdbs) such as Sybase or Ingres.

There are four main sections to the Stats module :

  • Input data formatter. To process incoming data so that it is both valid and loadable to the Rdbs.
  • The Rdbs itself, for which FIP setups exist covering design, maintenance and production.
  • On-line viewer for researchers/writers.
  • Output formatter which will take data from the Rdbs and format it for :
    • Quark tags.
    • Complete EPSF callable by PageMaker, Quark etc.
    • OPI-callable PostScript page or section.
    • Traditional in-line markup for use within an existing Editorial system.
    • HTML/SGML style tags

The last section is used in conjunction with the FIP Data formatting package which includes a PostScript driver.

In fact, the Stats module integrates seamlessly with the other FIP modules which can, of course, be used to both funnel raw data in and distribute the finished products.

Output can either be textual or used as input to the FIP simple graphs module which generates simple, standard PostScript graphs.

Other Key elements of the Stats module are :

  • Rules and data validations.
      Where applicable, rules are set in the Rdbs to check the validity of the data, e.g. comparing the goalscorers with the full-time result as a consistency check.
  • Ad hoc listings and Views using SQL.
    eg. referees and yellow/red cards shown.
  • Optional graphs output.
    eg. track league positions of Liverpool and Chelsea from the beginning of the season.
  • Minimum effort for humans.
      Generally only exceptions and errors, like a goal scorer for a team that is not in the Rdbs, are flagged.
  • Transparent to the type of data on the Rdbs, whether text, pictures, graphics, sound, movies etc.
  • Uses existing FIP modules to collect and present incoming data and to distribute outgoing data.
  • Tunable and enhanceable by the client.
  • While FingerPost delivers a usable production system, the client is encouraged to enhance and personalise it.
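
The ad hoc listing of referees and cards mentioned in the list above could be produced with a query along the following lines. This is only a sketch: Python's built-in sqlite3 module stands in for Sybase or Ingres, and the table names, column names and sample rows (matches, cards, referee, colour) are hypothetical rather than the FIP schema.

```python
import sqlite3


def demo_connection():
    """Build a small in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE matches (id INTEGER PRIMARY KEY, referee TEXT);
        CREATE TABLE cards (match_id INTEGER, player TEXT, colour TEXT);
        INSERT INTO matches VALUES (1, 'Referee A'), (2, 'Referee B'), (3, 'Referee A');
        INSERT INTO cards VALUES (1, 'Player 1', 'yellow'),
                                 (1, 'Player 2', 'red'),
                                 (3, 'Player 3', 'yellow');
    """)
    return conn


def cards_by_referee(conn):
    """Return one (referee, yellow_count, red_count) row per referee."""
    sql = """
        SELECT m.referee,
               SUM(CASE WHEN c.colour = 'yellow' THEN 1 ELSE 0 END) AS yellow,
               SUM(CASE WHEN c.colour = 'red' THEN 1 ELSE 0 END) AS red
        FROM matches m
        LEFT JOIN cards c ON c.match_id = m.id
        GROUP BY m.referee
        ORDER BY m.referee
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for referee, yellow, red in cards_by_referee(demo_connection()):
        print(f"{referee}: {yellow} yellow, {red} red")
```

The LEFT JOIN keeps referees who showed no cards in the listing with zero counts, which is usually what a researcher would expect from such a view.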

The heart of the system is the database or Rdbs. One of the bigger packages, such as Sybase, is advised because the large amount of data arriving over hours, days or longer needs a heavy-duty indexing, tracking and retrieval system.

With the bigger Rdbs packages, a single copy can support several different and independent statistics systems - several sports, TV and radio - with password control protecting against misuse.

In More Detail ...

Sports

The Stats module is most useful for those sports that are heavily statistically based already - soccer, baseball for example.

It is based on teams which can be in one or more leagues and one or more cup competitions which run over a season. Comparable results from the previous season can be referenced and reused.

The level of detail is set by the system administrator - and by the data available. A newspaper with a strong local team may want to track every detail of that club down to individual player appearances. But only the final scores may be relevant for a semi-professional side in a remote minor league.

FingerPost supply a set of rules to clean up the incoming data in case of errors - scorers for teams that are not playing, for example.

A good example of the use of the database is the Saturday afternoon results section of a Sunday paper.

Beforehand the system administrator puts together a list of matches to be tracked.

Usually team lists, goals scored, attendances, sendings-off, half-time and full-time results are all sent as each occurs during the afternoon. The job of the Statistics module is to collate and verify this data, for example by checking whether the total of goalscorers equals the half-time score for each match.

During the afternoon, any inconsistencies or errors that need checking are flagged for attention.
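
The kind of consistency check described above might look like the following minimal sketch. The MatchReport structure, its field names and the wording of the messages are assumptions made for illustration. The real rules live in the Rdbs, and this sketch ignores complications such as own goals.

```python
from dataclasses import dataclass, field


@dataclass
class MatchReport:
    """Hypothetical record of what has been reported for one match so far."""
    home: str
    away: str
    home_goals: int
    away_goals: int
    scorers: list = field(default_factory=list)  # (team, player) tuples


def check_match(report, known_teams):
    """Return a list of problems to flag; an empty list means consistent."""
    problems = []
    for team in (report.home, report.away):
        if team not in known_teams:
            problems.append(f"unknown team: {team}")

    # Count the scorers credited to each side of this match.
    counts = {report.home: 0, report.away: 0}
    for team, player in report.scorers:
        if team in counts:
            counts[team] += 1
        else:
            problems.append(f"scorer {player} listed for {team}, which is not playing")

    reported = (report.home_goals, report.away_goals)
    credited = (counts[report.home], counts[report.away])
    if credited != reported:
        problems.append(
            f"{report.home} v {report.away}: score {reported[0]}-{reported[1]} "
            f"but scorers give {credited[0]}-{credited[1]}"
        )
    return problems
```

Only reports that return a non-empty list would need human attention, in keeping with the aim of flagging exceptions rather than every item.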

At 4.50 pm, a completion list is produced for manual verification stating what is missing. If all is correct, the section can be released.
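
A completion list of this sort can be sketched as a comparison between the matches being tracked and the items received so far. The item names in REQUIRED_ITEMS and the shape of the `received` mapping are assumptions for illustration, not the module's actual data model.

```python
# Hypothetical set of items that every tracked match needs before release.
REQUIRED_ITEMS = ("team_list", "half_time", "full_time", "attendance")


def completion_list(tracked_matches, received):
    """Map each tracked match to the required items still missing.

    tracked_matches: match identifiers chosen beforehand by the administrator.
    received: dict of match identifier -> set of item names already loaded.
    Matches with nothing missing are left out of the result.
    """
    missing = {}
    for match in tracked_matches:
        have = received.get(match, set())
        gaps = [item for item in REQUIRED_ITEMS if item not in have]
        if gaps:
            missing[match] = gaps
    return missing


def can_release(tracked_matches, received):
    """The section is complete when no tracked match has gaps."""
    return not completion_list(tracked_matches, received)
```

An empty completion list corresponds to the case where the section can be released without further checking.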

Producing the results pages quickly and efficiently in this manner can cut Sunday paper deadlines significantly when compared to traditional means.

Radio and TV Listings

This is a problem of formatting the same data, or very similar data, for different regional editions or different publications. Data needs to be updated by many small late or partial changes from data suppliers, and combined with critics' text generated internally.

There is also a need to add uniqueness to a very standard product - every paper runs some sort of listing section and nowadays they need to add standing data or editorial text and pix to make it more interesting to the reader.

So the Statistics module has to assemble the sections from input data from several sources or files, including late changes from the suppliers.
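
One way to picture this assembly is as a base schedule to which late or partial changes are applied in arrival order, with internally written text attached afterwards. The sketch below assumes a schedule keyed by (channel, start time) and change records with hypothetical field names; it is not the module's actual format.

```python
KEY_FIELDS = ("channel", "start", "cancelled")


def apply_changes(schedule, changes):
    """Return a new schedule with late changes applied in arrival order.

    schedule: dict of (channel, start) -> dict of programme fields.
    changes: list of dicts, each naming a channel and start time plus
             either the fields to overwrite or cancelled=True.
    """
    merged = {slot: dict(entry) for slot, entry in schedule.items()}
    for change in changes:
        slot = (change["channel"], change["start"])
        if change.get("cancelled"):
            merged.pop(slot, None)
            continue
        fields = {k: v for k, v in change.items() if k not in KEY_FIELDS}
        # A partial change only overwrites the fields it mentions.
        merged.setdefault(slot, {}).update(fields)
    return merged


def add_critic_text(schedule, reviews):
    """Attach internally written reviews, matched on programme title."""
    result = {}
    for slot, entry in schedule.items():
        entry = dict(entry)
        if entry.get("title") in reviews:
            entry["review"] = reviews[entry["title"]]
        result[slot] = entry
    return result
```

Because both functions return new dictionaries, the same supplier data can be reused unchanged for each regional edition or publication.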

Financial Sections

The financial sections of most news publications rely heavily on historical data - position last week, last close, highs, lows.

However most papers do NOT need to recreate the complete raw data of their main feeds - shares, bonds/unit trusts - as their suppliers can normally give a reasonably sophisticated feed containing the requirements of that publication - Yields, P/E ratios etc.

So for the larger tables, the Statistics module would generally be used to produce standard graphs and non-standard listings and analysis such as year-on-year changes.

However many of the smaller tables are sourced directly from the generators - banks, traders etc - and in many cases are either incomplete, in the wrong format or need extra information added.

For example, a table labelled Commodities may be sourced from several exchanges and traders each supplying their data in a different format.

The aim of the Statistics module here would be to clean up the varying input data streams and deliver a single consistent table.
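
As a rough sketch of that cleanup, each supplier format can be given its own small parser that produces rows in one common shape, after which the rows are merged into a single table. Both input formats below are invented for illustration, as are the function names and the rule that a later feed overrides an earlier one.

```python
import csv
import io


def parse_exchange_csv(text):
    """Hypothetical supplier A: 'name,price,unit' comma-separated lines."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        name, price, unit = record
        rows.append({"name": name.strip().title(),
                     "price": float(price),
                     "unit": unit.strip()})
    return rows


def parse_trader_pipe(text):
    """Hypothetical supplier B: 'NAME|unit|price', prices with thousands commas."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, unit, price = (part.strip() for part in line.split("|"))
        rows.append({"name": name.title(),
                     "price": float(price.replace(",", "")),
                     "unit": unit})
    return rows


def build_table(*feeds):
    """Merge feeds into one table sorted by name; later feeds win on clashes."""
    table = {}
    for feed in feeds:
        for row in feed:
            table[row["name"]] = row
    return [table[name] for name in sorted(table)]
```

Adding a new supplier then means writing one more parser, leaving the merged table and its output formatting untouched.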

Election Results

There are two big problems with running 'election specials' in any news publication :

  • The need for as many results as possible by the latest edition time.
  • There are always too many or too few results to place in the page area allotted by Editorial.

To get over these problems, the Statistics module can be used to do one or both of the following :

  • Trigger complete sections on predefined shapes. A series of shapes of increasing size is set up beforehand. When enough results arrive to fill a particular shape, it is output automatically.
  • Run with a fixed space but add extra elements that have been previously added - editorial text, pictures, graphs, biographies of candidates - to fill the space. These extra items would be graded from a 'Must Have' level (like the result itself) to a 'Least Useful' level (like a photo of the outgoing member).
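
Both approaches in the list above can be sketched in a few lines. The shape capacities, the numeric grading (1 for 'Must Have', larger numbers for less useful items) and the size units are all assumptions for illustration. The greedy fill shown is just one simple way of choosing extras, not necessarily how the module does it.

```python
def largest_filled_shape(shapes, results_in):
    """Pick the biggest predefined shape that the results so far can fill.

    shapes: list of (name, capacity) pairs, capacity in number of results.
    Returns the shape name, or None if even the smallest is not yet full.
    """
    filled = [shape for shape in shapes if shape[1] <= results_in]
    if not filled:
        return None
    return max(filled, key=lambda shape: shape[1])[0]


def fill_fixed_space(space, items):
    """Choose items for a fixed space, most important grades first.

    items: list of (label, grade, size); grade 1 is 'Must Have'.
    An item is skipped if it no longer fits, and smaller, less
    important items may then still be used to fill the gap.
    """
    chosen, used = [], 0
    for label, grade, size in sorted(items, key=lambda item: item[1]):
        if used + size <= space:
            chosen.append(label)
            used += size
    return chosen


if __name__ == "__main__":
    shapes = [("quarter page", 10), ("half page", 25), ("full page", 60)]
    print(largest_filled_shape(shapes, 40))
```

In the first function a new section would be output each time the returned shape changes as more results arrive.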

With all the relevant data, the Statistics module can be used for purposes other than producing sections on the night: editorial reference on the biographies; sundry calculations such as 'track current swings with % and probable outcome'; creation of CD-ROMs etc.

By merging predefined shapes with editorial holes, it should be possible to get from the arrival of the result which triggers the section to the plate on the press in only a few minutes.

This is in addition to the normal cleanup of the input data and the formatting of the output with the Data Formatting module.


Topic revision: r1 - 15 Feb 2005 - 18:37:12 - DotFingerPost?
 