In a world where raw data is cheap and plentiful but useful, timely information is rare and therefore expensive, the FIP Statistics module is used to collate and analyse data from various sources and in varying formats so that publishers can produce meaningful information quickly and cheaply.
Where is it useful?
It is aimed at applications such as sports (especially soccer, baseball and other highly structured sports), radio and TV listings, election results and sundry financial data such as the myriad stocks, shares and small tables.
The module is based around a relational database system (Rdbs) running SQL, such as Sybase or Ingres.
There are four main sections to the Stats module.
The last section is used in conjunction with the FIP Data Formatting package, which includes a PostScript driver.
In fact, the Stats module integrates seamlessly with the other FIP modules which can, of course, be used to both funnel raw data in and distribute the finished products.
Output can either be textual or be used as input to the FIP simple graphs module, which generates simple, standard PostScript graphs.
Other key elements of the Stats module are:
The heart of the system is the database or Rdbs. One of the bigger packages, such as Sybase, is advised because the large amount of data arriving over hours, days or longer needs a heavy-duty indexing, tracking and retrieval system.
With one of the bigger Rdbs, a single copy can support several different and independent statistics systems - several sports, TV and radio - with password control protecting against misuse.
The sports system is based on teams, which can be in one or more leagues and one or more cup competitions that run over a season. Comparable results from the previous season can be referenced and reused.
The level of detail is set by the system administrator - and by the data available. A newspaper with a strong local team may want to track every detail of that club, down to individual player appearances, but only the final scores may be relevant for a semi-professional side in a remote minor league.
FingerPost supply a set of rules to clean up the incoming data in case of errors - scorers for teams that are not playing, for example.
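As an illustration only, a cleanup rule of the kind described (rejecting scorers credited to teams that are not playing) might look like the sketch below. The `GoalEvent` record, the `split_valid_goals` function and the team names are hypothetical; the actual FingerPost rules and their storage in the Rdbs are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalEvent:
    """One incoming 'goal scored' message (hypothetical layout)."""
    team: str
    scorer: str
    minute: int


def split_valid_goals(fixtures, events):
    """Separate goal events into accepted and rejected lists.

    An event is rejected when its team does not appear in any
    of the fixtures being tracked.
    """
    playing = {team for fixture in fixtures for team in fixture}
    accepted = [e for e in events if e.team in playing]
    rejected = [e for e in events if e.team not in playing]
    return accepted, rejected


# Illustrative data only.
fixtures = [("Rovers", "United"), ("City", "Athletic")]
events = [
    GoalEvent("Rovers", "Smith", 12),
    GoalEvent("Wanderers", "Jones", 30),  # Wanderers are not playing
]
accepted, rejected = split_valid_goals(fixtures, events)
```

Rejected events are kept rather than discarded so that they can be checked by hand.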
A good example of where the database is useful is the Saturday afternoon results section of a Sunday paper.
Beforehand the system administrator puts together a list of matches to be tracked.
Usually team lists, goals scored, attendances, sendings-off, and half-time and full-time results are all sent as each occurs during the afternoon. The job of the Statistics module is to collate and verify this data, checking for example whether the total number of goalscorers equals the half-time score for each match.
During the afternoon, any inconsistencies or errors that need checking are flagged for attention.
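The goalscorer check can be sketched as follows. This is a minimal illustration, assuming each goal record carries a `half` field (1 or 2) and each match record holds a reported half-time score. The dictionary layout and the function name `half_time_problems` are assumptions, not the module's own data model.

```python
def half_time_problems(match):
    """Compare first-half goalscorers with the reported half-time score.

    Returns a list of messages, one per side whose scorer count
    disagrees with the half-time score; an empty list means no flags.
    """
    problems = []
    for side in ("home", "away"):
        reported = match["half_time"][side]
        counted = sum(
            1 for goal in match["goals"]
            if goal["side"] == side and goal["half"] == 1
        )
        if counted != reported:
            problems.append(
                f"{match[side]}: {counted} first-half scorer(s) listed, "
                f"half-time score says {reported}"
            )
    return problems


# Illustrative record: the half-time score says 2-0 but only one
# first-half home scorer has arrived so far.
match = {
    "home": "Rovers",
    "away": "United",
    "half_time": {"home": 2, "away": 0},
    "goals": [
        {"side": "home", "scorer": "Smith", "half": 1},
        {"side": "away", "scorer": "Brown", "half": 2},
    ],
}
flags = half_time_problems(match)
```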
At 4.50 pm, a completion list is produced for manual verification stating what is missing. If all is correct, the section can be released.
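A completion list of this kind amounts to comparing what has arrived against what is expected for each tracked match. The sketch below assumes a fixed set of required items per match; the item names in `REQUIRED` and the function `completion_list` are hypothetical and would in practice depend on the level of detail chosen by the system administrator.

```python
# Items expected for every tracked match (assumed for illustration).
REQUIRED = ("team_lists", "half_time", "full_time", "attendance")


def completion_list(tracked, received):
    """Return {match: [missing items]} for matches with gaps.

    `tracked` is the list of matches set up beforehand;
    `received` maps each match to the items that have arrived.
    """
    missing = {}
    for match in tracked:
        got = received.get(match, {})
        gaps = [item for item in REQUIRED if got.get(item) is None]
        if gaps:
            missing[match] = gaps
    return missing


# Illustrative data only.
tracked = ["Rovers v United", "City v Athletic"]
received = {
    "Rovers v United": {
        "team_lists": "...",
        "half_time": "1-0",
        "full_time": "2-1",
        "attendance": 12034,
    },
    "City v Athletic": {"team_lists": "...", "half_time": "0-0"},
}
outstanding = completion_list(tracked, received)
```

An empty result would correspond to the case where the section can be released.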
Producing the results pages quickly and efficiently in this manner can cut Sunday paper deadlines significantly when compared to traditional means.
Radio and TV listings are a problem of formatting the same data, or very similar data, for different regional editions or different publications. Data needs to be updated by many small late or partial changes from data suppliers, and combined with critics' text generated internally.
There is also a need to add uniqueness to a very standard product - every paper runs some sort of listing section and nowadays they need to add standing data or editorial text and pix to make it more interesting to the reader.
So the Statistics module has to assemble the sections from input data from several sources or files, including late changes from the suppliers.
The financial sections of most news publications rely heavily on historical data - position last week, last close, highs, lows.
However, most papers do not need to recreate the complete raw data of their main feeds - shares, bonds/unit trusts - as their suppliers can normally give a reasonably sophisticated feed containing the requirements of that publication - yields, P/E ratios etc.
So for the larger tables, the Statistics module would generally be used to produce standard graphs and non-standard listings and analysis such as year-on-year changes.
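A year-on-year change is simply the percentage difference between this year's figure and the same figure a year earlier. A minimal sketch, with a hypothetical function name and made-up figures:

```python
def year_on_year(current, previous):
    """Percentage change from `previous` to `current`.

    Returns None when the earlier figure is zero, since the
    percentage change is undefined in that case.
    """
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


# Illustrative figures only.
rise = year_on_year(150.0, 100.0)   # price up from 100 to 150
fall = year_on_year(75.0, 100.0)    # price down from 100 to 75
```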
However, many of the smaller tables are sourced directly from the generators - banks, traders etc - and in many cases are either incomplete, in the wrong format or need extra information added.
For example, a table labelled Commodities may be sourced from several exchanges and traders, each supplying their data in a different format.
The aim of the Statistics module here would be to clean up the varying input data streams and deliver a single consistent table.
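To illustrate the idea, the sketch below merges two feeds into one table by giving each supplier its own small parser that produces a common row layout. Both input formats, the parser names and the sample lines are invented for the example; real supplier formats will differ.

```python
def from_exchange_a(line):
    """Parse a hypothetical comma-separated line, e.g. 'gold,1850.50'."""
    name, price = line.split(",")
    return {"commodity": name.strip().title(), "price": float(price)}


def from_trader_b(line):
    """Parse a hypothetical pipe-separated line, e.g. 'COPPER|8350.00|USD'."""
    name, price, _currency = line.split("|")
    return {"commodity": name.strip().title(), "price": float(price)}


def build_table(feeds):
    """Combine (parser, lines) pairs into one table sorted by commodity."""
    rows = [parser(line) for parser, lines in feeds for line in lines]
    return sorted(rows, key=lambda row: row["commodity"])


# Illustrative input only.
table = build_table([
    (from_exchange_a, ["gold,1850.50"]),
    (from_trader_b, ["COPPER|8350.00|USD"]),
])
```

Keeping one parser per supplier means a change in one feed's format touches only that parser, not the table-building step.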
There are two big problems with running 'election specials' in any news publication:
To get over these problems, the Statistics module can be used to do one or both of the following:
With all the relevant data, the Statistics module can be used for purposes other than producing sections on the night: editorial reference on the biographies; sundry calculations such as 'track current swings with % and probable outcome'; creation of CD-ROMs etc.
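As one example of such a calculation, the sketch below computes a swing between two parties. It assumes the conventional two-party definition (the average of one party's gain and the other's loss in percentage vote share). The text does not say which definition of swing is intended, so treat both the formula choice and the function name as assumptions.

```python
def two_party_swing(a_before, a_now, b_before, b_now):
    """Swing from party B to party A, in percentage points.

    All arguments are vote shares in percent. A positive result
    is a swing towards A, a negative result a swing towards B.
    """
    return ((a_now - a_before) + (b_before - b_now)) / 2


# Illustrative shares: A rises from 40% to 45%, B falls from 50% to 44%.
swing_to_a = two_party_swing(40.0, 45.0, 50.0, 44.0)
```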
By merging predefined shapes with editorial holes, a time of only a few minutes from the arrival of the result that triggers the section to getting the plate on the press should be possible.
This is in addition to the normal cleanup of the input data and formatting of the output with the Data Formatting module.