public class RssStreamProviderTask extends Object implements Runnable
Runnable task that queues rss feed data. RssStreamProviderTask reads the content of an rss feed and queues the articles from the feed in the form of an ObjectNode wrapped in a StreamsDatum.
The task can filter articles by published date. If the task cannot parse the date of an article, or the article does not contain a published date, the task will by default still attempt to queue the article.
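The filtering rule described above can be sketched as follows. This is a minimal illustration, not the library's code: the class name PublishedDateFilterSketch and the method shouldQueue are hypothetical, the date is assumed to arrive as ISO-8601 text, and java.time.Instant is used in place of org.joda.time.DateTime only so that the example has no external dependencies.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

/** Hypothetical stand-in for the published-date filter; not the library's implementation. */
class PublishedDateFilterSketch {

    /**
     * Decides whether an article should be queued.
     *
     * @param rawPublished   published date as ISO-8601 text (assumed format), or null if absent
     * @param publishedSince cutoff instant; null means no date filter is configured
     */
    static boolean shouldQueue(String rawPublished, Instant publishedSince) {
        if (publishedSince == null) {
            return true; // no date filter: queue everything
        }
        if (rawPublished == null) {
            return true; // article has no published date: queue by default
        }
        try {
            // Only articles published after the cutoff pass the filter.
            return Instant.parse(rawPublished).isAfter(publishedSince);
        } catch (DateTimeParseException e) {
            return true; // date cannot be parsed: queue by default
        }
    }

    public static void main(String[] args) {
        Instant since = Instant.parse("2018-01-01T00:00:00Z");
        System.out.println(shouldQueue("2018-06-01T12:00:00Z", since));
        System.out.println(shouldQueue("2017-06-01T12:00:00Z", since));
        System.out.println(shouldQueue(null, since));
        System.out.println(shouldQueue("not a date", since));
    }
}
```

Queueing on a missing or unparseable date favors completeness over strict filtering, so a consumer that needs a hard cutoff would have to re-check the date downstream.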
A task can be run in perpetual mode, which stores the article urls in a static variable. The next time an RssStreamProviderTask is run, it will not queue data that was seen the previous time the rss feed was read. This is an attempt to keep an RssStreamProvider from outputting multiple copies of an article.
** Warning! **
It is still possible to output multiple copies of the same article. If multiple task executions for the same rss feed overlap in execution time, it is possible that the static variable of previously seen articles will not have been updated in time.
Modifier and Type | Field and Description |
---|---|
protected Map<String,Set<String>> | PREVIOUSLY_SEEN - Map that contains the Set of previously seen articles by an rss feed. |
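The perpetual-mode bookkeeping and the duplicate window mentioned in the warning can be sketched like this. The class PreviouslySeenSketch and the method selectUnseen are hypothetical, and the choice of ConcurrentHashMap and of replacing the whole set after each read are assumptions made for illustration; the source only states that previously seen articles are kept per feed in a static map.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of per-feed de-duplication; not the library's implementation. */
class PreviouslySeenSketch {

    // Feed url -> article urls seen the previous time that feed was read.
    static final Map<String, Set<String>> PREVIOUSLY_SEEN = new ConcurrentHashMap<>();

    /** Returns the article urls not seen on the previous read, then records this read. */
    static Set<String> selectUnseen(String feedUrl, List<String> articleUrls) {
        Set<String> previous =
                PREVIOUSLY_SEEN.getOrDefault(feedUrl, Collections.<String>emptySet());
        Set<String> unseen = new LinkedHashSet<>();
        for (String url : articleUrls) {
            if (!previous.contains(url)) {
                unseen.add(url);
            }
        }
        // The map is updated only after the read completes. A second execution that
        // started before this point still sees the old set and can queue duplicates.
        PREVIOUSLY_SEEN.put(feedUrl, new LinkedHashSet<>(articleUrls));
        return unseen;
    }

    public static void main(String[] args) {
        String feed = "http://example.com/feed"; // placeholder url
        System.out.println(selectUnseen(feed, Arrays.asList("a", "b")));
        System.out.println(selectUnseen(feed, Arrays.asList("b", "c")));
    }
}
```

A thread-safe map protects the map itself, but it does not close the gap between reading the old set and writing the new one, which is why overlapping executions for the same feed can still emit the same article twice.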
Constructor and Description |
---|
RssStreamProviderTask(BlockingQueue<StreamsDatum> queue, String rssFeed) - Non-perpetual mode, no date filter, time out of 10 sec |
RssStreamProviderTask(BlockingQueue<StreamsDatum> queue, String rssFeed, org.joda.time.DateTime publishedSince) - Non-perpetual mode, time out of 10 sec |
RssStreamProviderTask(BlockingQueue<StreamsDatum> queue, String rssFeed, org.joda.time.DateTime publishedSince, int timeOut, boolean perpetual) - RssStreamProviderTask that reads an rss feed url and queues the resulting articles as StreamsDatums with the documents being object nodes. |
RssStreamProviderTask(BlockingQueue<StreamsDatum> queue, String rssFeed, int timeOut) - Non-perpetual mode, no date filter. |
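The relationship between the four constructors can be illustrated with a stand-in class. This sketch assumes that the shorter constructors delegate to the five-argument one, and that the 10 second default is expressed as 10000 milliseconds; neither is confirmed by this page. FeedTaskSketch is hypothetical, uses BlockingQueue<String> instead of BlockingQueue<StreamsDatum>, and uses java.time.Instant instead of org.joda.time.DateTime to stay dependency-free.

```java
import java.time.Instant;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in showing constructor defaults; not the library's class. */
class FeedTaskSketch {

    // Assumption: "time out of 10 sec" expressed in milliseconds.
    static final int DEFAULT_TIME_OUT_MS = 10_000;

    final BlockingQueue<String> queue;
    final String rssFeed;
    final Instant publishedSince; // null means no date filter
    final int timeOut;
    final boolean perpetual;

    /** Non-perpetual mode, no date filter, default timeout. */
    FeedTaskSketch(BlockingQueue<String> queue, String rssFeed) {
        this(queue, rssFeed, null, DEFAULT_TIME_OUT_MS, false);
    }

    /** Non-perpetual mode, no date filter. */
    FeedTaskSketch(BlockingQueue<String> queue, String rssFeed, int timeOut) {
        this(queue, rssFeed, null, timeOut, false);
    }

    /** Non-perpetual mode, default timeout. */
    FeedTaskSketch(BlockingQueue<String> queue, String rssFeed, Instant publishedSince) {
        this(queue, rssFeed, publishedSince, DEFAULT_TIME_OUT_MS, false);
    }

    /** Full form: every option is explicit. */
    FeedTaskSketch(BlockingQueue<String> queue, String rssFeed, Instant publishedSince,
                   int timeOut, boolean perpetual) {
        this.queue = queue;
        this.rssFeed = rssFeed;
        this.publishedSince = publishedSince;
        this.timeOut = timeOut;
        this.perpetual = perpetual;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        FeedTaskSketch task = new FeedTaskSketch(queue, "http://example.com/feed");
        System.out.println(task.timeOut + " ms, perpetual=" + task.perpetual);
    }
}
```

Only the five-argument form can enable perpetual mode, which matches the summary above: every shorter constructor is described as non-perpetual.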
Modifier and Type | Method and Description |
---|---|
String | getRssFeed() - The rss feed url that this task is responsible for reading. |
protected Set<String> | queueFeedEntries(URL feedUrl) - Reads the url and queues the data |
void | run() |
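Because the task is a Runnable that writes to a BlockingQueue, it is typically handed to an executor while another party drains the queue. The following is only a sketch of that producer pattern: QueueingTaskSketch and runAndDrain are hypothetical names, the list of strings stands in for parsed feed entries, and no network access or feed parsing is performed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical Runnable that queues pre-parsed articles; not the library's task. */
class QueueingTaskSketch implements Runnable {

    private final BlockingQueue<String> queue;
    private final List<String> articles; // stand-in for entries read from a feed

    QueueingTaskSketch(BlockingQueue<String> queue, List<String> articles) {
        this.queue = queue;
        this.articles = articles;
    }

    @Override
    public void run() {
        for (String article : articles) {
            try {
                queue.put(article); // blocks if a bounded queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                return;
            }
        }
    }

    /** Runs one task on a single-thread executor and returns what it queued. */
    static List<String> runAndDrain(List<String> articles) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(new QueueingTaskSketch(queue, articles));
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the task", e);
        }
        List<String> drained = new ArrayList<>();
        queue.drainTo(drained);
        return drained;
    }

    public static void main(String[] args) {
        System.out.println(runAndDrain(Arrays.asList("article-1", "article-2")));
    }
}
```

Running one task per feed at a time, as the single-thread executor does here, also avoids the overlapping executions that the warning in the class description cautions against.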
public RssStreamProviderTask(BlockingQueue<StreamsDatum> queue, String rssFeed)
queue - queue
rssFeed - rssFeed
See also: org.apache.streams.rss.provider.RssStreamProviderTask#RssStreamProviderTask(java.util.concurrent.BlockingQueue, String, org.joda.time.DateTime, int, boolean)
public RssStreamProviderTask(BlockingQueue<StreamsDatum> queue, String rssFeed, int timeOut)
queue - queue
rssFeed - rssFeed
timeOut - timeOut
See also: org.apache.streams.rss.provider.RssStreamProviderTask#RssStreamProviderTask(java.util.concurrent.BlockingQueue, String, org.joda.time.DateTime, int, boolean)
public RssStreamProviderTask(BlockingQueue<StreamsDatum> queue, String rssFeed, org.joda.time.DateTime publishedSince)
queue - queue
rssFeed - rssFeed
publishedSince - publishedSince
See also: org.apache.streams.rss.provider.RssStreamProviderTask#RssStreamProviderTask(java.util.concurrent.BlockingQueue, String, org.joda.time.DateTime, int, boolean)
public RssStreamProviderTask(BlockingQueue<StreamsDatum> queue, String rssFeed, org.joda.time.DateTime publishedSince, int timeOut, boolean perpetual)
queue - Queue to push data to
rssFeed - url of rss feed to read
publishedSince - DateTime to filter articles by, will queue articles with published times after this
timeOut - url connection timeout in milliseconds
perpetual - true, if you want to run in perpetual mode. NOT RECOMMENDED
public String getRssFeed()
protected Set<String> queueFeedEntries(URL feedUrl) throws IOException, com.rometools.rome.io.FeedException
feedUrl - rss feed url
IOException - when it cannot connect to the url or the url is malformed
com.rometools.rome.io.FeedException - when it cannot read the feed.
Copyright © 2018 The Apache Software Foundation. All rights reserved.