Closed

Network data stream simulation with time range LDA pattern mining

This project involves the simulation of a SIEM system using Latent Dirichlet Allocation for IoT device streams. It can be implemented in R, Python, C++ or any relevant language that achieves the outcome.

Workflow

Input config > random & pattern generated content streams > stream chunks > LDA parser > output pattern frequency & topics per stream

Data Generation

Input config > random & pattern generated content streams

The generator should be configurable and able to create simulated network data streams. Each stream consists of random filler content interleaved with generated content specified in the config file:

1. stream information

2. string and regex patterns to include in the stream (generator fills the regex with matching values)

3. occurrence frequency (range 0 to 10), the number of generated string and regex patterns to include per minute. The timing does not have to be very sophisticated; the frequencies just need to produce noticeably different rates.
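
One way to approach item 2 is sketched below. Rather than inverting arbitrary regexes, it uses one small sampler per labeled field and checks every sampled value against the regex from the config with re.fullmatch. This is a swapped-in simplification, and the FIELDS registry, the sampler functions and fill_pattern are hypothetical names, not part of the brief.

```python
import random
import re
import string

# Field regexes copied from the example config. The samplers and the FIELDS
# registry are hypothetical helpers introduced for this sketch.
FIELDS = {
    "IP_EXT": (
        r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}",
        lambda rng: ".".join(str(rng.randint(0, 255)) for _ in range(4)),
    ),
    "MSG": (
        r'^#[^ !@#$%^&*(),.?":{}|<>]*$',
        lambda rng: "#" + "".join(rng.choices(string.ascii_lowercase, k=rng.randint(1, 8))),
    ),
    "USER": (
        r"^[a-z0-9_-]{3,15}$",
        lambda rng: "".join(
            rng.choices(string.ascii_lowercase + string.digits + "_-", k=rng.randint(3, 15))
        ),
    ),
}


def fill_pattern(labels, rng):
    """Return 'LABEL: value LABEL: value ...' with one sampled value per label."""
    parts = []
    for label in labels:
        regex, sampler = FIELDS[label]
        value = sampler(rng)
        # Guard against a sampler drifting out of sync with its regex.
        if re.fullmatch(regex, value) is None:
            raise ValueError(f"{label} sampler produced {value!r}, which does not match")
        parts.append(f"{label}: {value}")
    return " ".join(parts)


rng = random.Random(42)  # fixed seed so runs are repeatable
line = fill_pattern(["IP_EXT", "MSG", "USER"], rng)
print(line)
```

A general regex-to-string generator would remove the need for per-field samplers, but for a proof of concept the explicit check keeps the generated values honest against the configured patterns.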

The generator can be started and stopped.
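
A minimal sketch of a start/stop generator, assuming a background thread per stream and a threading.Event as the stop signal. The class name StreamGenerator, the emit callback and the minute parameter are assumptions for illustration; in real use minute would stay at 60 seconds, and it is shortened here only so the demo finishes quickly.

```python
import random
import threading
import time

FILLER = "lorem ipsum dolor sit amet consectetur adipiscing elit".split()


class StreamGenerator:
    """Emits filler text with injected pattern strings until stopped.

    patterns: list of (make_text, per_minute) pairs, where make_text is a
    zero-argument callable returning an already filled pattern string.
    minute: length of one scheduling period in seconds (60 in real use).
    """

    def __init__(self, patterns, emit, minute=60.0, seed=None):
        self.patterns = patterns
        self.emit = emit
        self.minute = minute
        self.rng = random.Random(seed)
        self._stop_event = threading.Event()
        self._thread = None

    @property
    def running(self):
        return self._thread is not None and self._thread.is_alive()

    def start(self):
        if self.running:
            return  # already running
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()

    def _run(self):
        while not self._stop_event.is_set():
            period_start = time.monotonic()
            # Place each pattern's occurrences at random offsets in the period.
            events = sorted(
                ((self.rng.uniform(0, self.minute), make_text)
                 for make_text, per_minute in self.patterns
                 for _ in range(per_minute)),
                key=lambda event: event[0],
            )
            for offset, make_text in events:
                delay = period_start + offset - time.monotonic()
                if self._stop_event.wait(max(0.0, delay)):
                    return  # stop requested while waiting
                filler = " ".join(self.rng.choices(FILLER, k=5))
                self.emit(f"{filler} {make_text()}")
            remaining = period_start + self.minute - time.monotonic()
            if self._stop_event.wait(max(0.0, remaining)):
                return


lines = []
gen = StreamGenerator(
    [(lambda: "MSG: #abc", 2), (lambda: "PAYLOAD: ABC_x1", 5)],
    lines.append, minute=0.05, seed=1,
)
gen.start()
time.sleep(0.3)
gen.stop()
print(len(lines), "lines emitted")
```

Waiting on the event instead of calling time.sleep lets stop() interrupt the generator immediately, even in the middle of a long gap between pattern occurrences.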

Example input configuration for 2 streams in JSON format. The key name "patterns" is illustrative; any name works as long as the generator and parser agree on it.

/ input/[login to view URL]

[
  {
    "name": "endpoint1",
    "ip": "[login to view URL]",
    "port": 345,
    "patterns": [
      {
        "pattern": "IP_EXT: '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}' MSG: ^#[^ !@#$%^&*(),.?\":{}|<>]*$ USER: ^[a-z0-9_-]{3,15}$",
        "frequency": 2
      },
      {
        "pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?\":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$",
        "frequency": 5
      }
    ]
  },
  {
    "name": "syslog1",
    "ip": "[login to view URL]",
    "port": 534,
    "patterns": [
      {
        "pattern": "IP_EXT: '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}' MSG: ^#[^ !@#$%^&*(),.?\":{}|<>]*$ USER: ^[a-z0-9_-]{3,15}$",
        "frequency": 2
      },
      {
        "pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?\":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$",
        "frequency": 5
      }
    ]
  }
]
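
A sketch of how such a config might be loaded and checked, assuming it is laid out as a list of streams that each carry a "patterns" list (an assumed key name). The helper names split_fields and load_config are hypothetical, and the IP address in the embedded sample is a documentation placeholder, not a value from the brief. Inside JSON strings the regex backslash and double quote have to be escaped.

```python
import json
import re

# Hypothetical layout: a list of streams, each with a "patterns" list.
# The IP below is a documentation placeholder, not a value from the brief.
CONFIG_TEXT = r'''
[
  {
    "name": "endpoint1",
    "ip": "192.0.2.10",
    "port": 345,
    "patterns": [
      {"pattern": "PAYLOAD: ^ABC_[^ !@#$%^&*(),.?\":{}|<>]*$ ID: ^[a-z0-9_-]{30,150}$",
       "frequency": 5}
    ]
  }
]
'''

# A field label is an upper-case word followed by ": " at the start of the
# pattern or after whitespace.
LABEL = re.compile(r"(?:^|\s)([A-Z_]+):\s")


def split_fields(pattern):
    """Split 'LABEL: regex LABEL: regex' into [(label, regex), ...]."""
    marks = list(LABEL.finditer(pattern))
    fields = []
    for i, mark in enumerate(marks):
        end = marks[i + 1].start() if i + 1 < len(marks) else len(pattern)
        regex = pattern[mark.end():end].strip().strip("'")
        re.compile(regex)  # fail early on a malformed regex
        fields.append((mark.group(1), regex))
    return fields


def load_config(text):
    """Parse the JSON text, check each frequency and attach the split fields."""
    streams = json.loads(text)
    for stream in streams:
        for entry in stream["patterns"]:
            freq = entry["frequency"]
            if not isinstance(freq, int) or not 0 <= freq <= 10:
                raise ValueError(f"{stream['name']}: frequency {freq!r} is outside 0..10")
            entry["fields"] = split_fields(entry["pattern"])
    return streams


streams = load_config(CONFIG_TEXT)
for label, regex in streams[0]["patterns"][0]["fields"]:
    print(label, "->", regex)
```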

Sample stream chunk.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed euismod eros a lectus porttitor, vitae aliquet magna ullamcorper. Praesent in enim non magna vehicula faucibus. Vestibulum lacinia velit ut dolor aliquet tincidunt. IP_EXT: [login to view URL] MSG: #abyx USER: das-dkjh Ut consectetur hendrerit massa vel tempus. Nulla sit amet libero id felis lacinia accumsan. PAYLOAD: ABC_aS57dasd ID: 42d8ffe6-8a65-416c-ac92-d5826315faa6 In dictum porta magna sed lectus venenatis. Aliquam accumsan molestie augue, sit lectus amet vulputate metus tristique et. Ut a lectus erat elit….
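
To find the injected patterns in a chunk like this, one option is to turn each configured pattern into a single composite regex. The sketch below strips the ^ and $ anchors from each field regex (the values sit in the middle of the text, not on their own line) and joins the labeled fields with whitespace. The function names are hypothetical, the second field of the PAYLOAD pattern is assumed to be labeled ID as in the config, and the IP address in the shortened sample chunk is a made-up documentation address.

```python
import re

# (label, regex) pairs as they appear in the example config.
FIELDS_A = [
    ("IP_EXT", r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}"),
    ("MSG", r'^#[^ !@#$%^&*(),.?":{}|<>]*$'),
    ("USER", r"^[a-z0-9_-]{3,15}$"),
]
FIELDS_B = [
    ("PAYLOAD", r'^ABC_[^ !@#$%^&*(),.?":{}|<>]*$'),
    ("ID", r"^[a-z0-9_-]{30,150}$"),
]


def compile_pattern(fields):
    """Build one regex that matches 'LABEL: value LABEL: value ...' in free text."""
    parts = []
    for label, regex in fields:
        # Drop line anchors; the value must instead end at whitespace or end of text.
        body = regex[1:] if regex.startswith("^") else regex
        body = body[:-1] if body.endswith("$") else body
        parts.append(rf"{label}:\s+(?P<{label}>{body})(?=\s|$)")
    return re.compile(r"\s+".join(parts))


def find_matches(chunk, fields):
    """Return one dict of label -> matched value per occurrence in the chunk."""
    return [m.groupdict() for m in compile_pattern(fields).finditer(chunk)]


# Shortened sample chunk; 203.0.113.7 is a placeholder address.
CHUNK = (
    "Vestibulum lacinia velit ut dolor aliquet tincidunt. "
    "IP_EXT: 203.0.113.7 MSG: #abyx USER: das-dkjh Ut consectetur hendrerit massa. "
    "PAYLOAD: ABC_aS57dasd ID: 42d8ffe6-8a65-416c-ac92-d5826315faa6 In dictum porta magna."
)

found_a = find_matches(CHUNK, FIELDS_A)
found_b = find_matches(CHUNK, FIELDS_B)
print(found_a)
print(found_b)
```

The number of entries returned per pattern is the per-chunk pattern count, so the same call can feed both the matches log and the counts log.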


Stream Parser

stream chunks > LDA parser > output pattern frequency & topics per stream

The streams are read by a parser application, which reads each input stream for a configurable span of time (e.g. 30 seconds) and treats each span as one input chunk. You must use a Latent Dirichlet Allocation package or method to analyze the data and create/append to 3 log files per stream. Each run goes into a new output folder with a timestamp from when the run began.

1. the found matching patterns log (use the input file to identify patterns),

2. the count of the patterns in that timespan log, and

3. up to 10 highest-frequency single-string terms (LDA topics; occurrence > 1, and not in the regex patterns?)
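
The chunking and logging side could be sketched as follows, assuming the stream arrives as (timestamp in seconds, text) records in time order. The function names and the three log file names (<stream>_matches.log, <stream>_counts.log, <stream>_topics.log) are assumptions for illustration; the brief only asks for three logs per stream inside a timestamped run folder.

```python
import tempfile
from collections import Counter
from datetime import datetime
from pathlib import Path


def chunk_by_span(records, span=30.0):
    """Group time-ordered (timestamp, text) records into fixed-length windows.

    Yields (window_start, joined_text). Windows with no records yield "".
    """
    window_start, buffer = None, []
    for ts, text in records:
        if window_start is None:
            window_start = ts
        while ts >= window_start + span:
            yield window_start, " ".join(buffer)
            window_start += span
            buffer = []
        buffer.append(text)
    if buffer:
        yield window_start, " ".join(buffer)


def make_run_dir(base, started=None):
    """Create and return an output folder named after the run's start time."""
    started = started or datetime.now()
    run_dir = Path(base) / started.strftime("%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def append_logs(run_dir, stream, window_start, matches, terms):
    """Append one window's results to the three per-stream logs.

    matches: list of (pattern_label, matched_text); terms: list of strings.
    """
    stamp = f"{window_start:.0f}"
    with open(run_dir / f"{stream}_matches.log", "a", encoding="utf-8") as fh:
        for label, text in matches:
            fh.write(f"{stamp}\t{label}\t{text}\n")
    counts = Counter(label for label, _ in matches)
    with open(run_dir / f"{stream}_counts.log", "a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\t" + ", ".join(f"{k}={v}" for k, v in counts.items()) + "\n")
    with open(run_dir / f"{stream}_topics.log", "a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\t" + ", ".join(terms) + "\n")


records = [(0, "a"), (10, "b"), (31, "c"), (95, "d")]
chunks = list(chunk_by_span(records))
with tempfile.TemporaryDirectory() as base:
    run_dir = make_run_dir(base)
    for window_start, text in chunks:
        append_logs(run_dir, "endpoint1", window_start, [], text.split())
    print(sorted(p.name for p in run_dir.iterdir()))
```

Opening the files in append mode means each 30-second window adds lines to the same three files for the lifetime of the run.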

Attached is a research paper related to the field of study. My aim is to replicate the basic stream generation and pattern matching using LDA. It is just a proof of concept and not for production code. Good use of comments is always welcome!
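
For the LDA step, an existing package would normally be the first choice. Purely to illustrate what such a package computes, here is a small collapsed Gibbs sampler for LDA using only the standard library. It is a sketch under assumptions: each sentence of a chunk is treated as one document, the hyperparameters alpha and beta and the topic count are arbitrary, and "occurrence > 1" is read as a term being assigned to a topic more than once. All function names are hypothetical.

```python
import random
from collections import Counter


def tokenize(text, exclude=frozenset()):
    """Lower-case whitespace tokens with edge punctuation removed."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w and w not in exclude]


def lda_gibbs(docs, n_topics=2, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA. Returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic
    assign = []
    for d, doc in enumerate(docs):                     # random initial topics
        zs = [rng.randrange(n_topics) for _ in doc]
        assign.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]                       # remove current assignment
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [                            # conditional p(topic | rest)
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z                       # record the new assignment
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_terms(topic_word, limit=10):
    """Per topic, up to `limit` terms assigned to it more than once."""
    return [[w for w, c in counts.most_common() if c > 1][:limit] for counts in topic_word]


chunk = ("Sed euismod eros a lectus porttitor. Praesent in enim non magna vehicula. "
         "Aliquam accumsan lectus magna lectus. Ut a lectus erat elit magna.")
docs = [tokenize(sentence) for sentence in chunk.split(".") if sentence.strip()]
topic_word = lda_gibbs(docs)
for k, terms in enumerate(top_terms(topic_word)):
    print("topic", k, terms)
```

Passing the tokens of the matched patterns as exclude to tokenize is one way to keep pattern content out of the topic terms, if that reading of the requirement is the intended one.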

Skills: C++ Programming, R Programming Language, Python


About the Employer:
( 0 reviews ) New York, United States

Project ID: #30617901
