Data Science Conference is the first open Conference dedicated to Data Science in the Balkans. When we started, we wanted to make an impact and bring crucial changes to the Data Science scene in Serbia. We wanted to create a high-level event, and our role models were conferences such as PyData and Strata + Hadoop. Today, the goals of the Conference are to promote Data Science in the Balkans, to foster networking in the community and to help the Data Science ecosystem develop.
"We choose Data Science because it is everywhere around us, affecting our lives on a daily basis." The field of Data Science is evolving into one of the fastest-growing and most in-demand fields in the world. Organizations across industries are looking to make sense of the data they can now collect from new technologies, from predicting the next hot product to determining the risk of an infectious disease outbreak.
The team behind the Conference is the Institute of Contemporary Sciences. You can find out more about our work at www.isn.rs.
This year we divided our speakers into two parallel tracks: Big Data and Machine Learning. The topics covered this year are:
Machine Learning - including, but not limited to: Natural Language Processing, Deep Learning, Neural Networks, Bayesian Non-Parametrics, Topic Models, Probabilistic Programming, Machine Learning Tools in practice, et cetera;
Big Data - including, but not limited to: Big Data Models and Algorithms, Big Data Management, Quality of Big Data Services, Big Data Search, Mining, and Visualization, Big Data Applications, Real-Time Big Data Analytics, Big Data Tools in practice, et cetera.
Speakers for the Conference were selected through a public Call for Papers and in partnership with our sponsors. You can find the speakers we selected through the Call for Papers in the next section. The schedule of the Conference will be announced on the 10th of September.
Last year we organized the first open Conference dedicated to Data Science in the Balkans. Some of the Conference numbers were:
- 500 applications
- 300 attendees
- 2 working days, with 2 parallel tracks
- 24 speakers, 22 talks/workshops
- 580 likes on our Facebook page, average post reach of ~5000 people
- 20 business partners
For more information about the previous Conference, you can check last year's web site.
Meet our speakers:
BI Team Leader at Banca Intesa Beograd
CEO at Institute NIRI Ltd.
CFO & COO at Privredna banka Sarajevo
Data Science Lead at Flight Data Services
Co-Founder at SmartCat.io
Assistant Professor at Faculty of Organizational Sciences
Information Management Specialist at IBM
Machine learning consultant at Institute NIRI Ltd.
CEO at Things Solver
CTO at SV Group
Advisor at RATEL
Head of Content Insights Labs
Senior System Administrator at Pirate Technologies
Data Engineer at Things Solver
BI team leader at Parallel d.o.o.
Business Data Analyst at Telenor Srbija
Research Assistant at NIRI Intelligent Computing
Teaching assistant at Faculty of Civil Engineering, Belgrade
1st Walker at Cloudwalker
Analytics/Big Data Business Development Manager at Comtrade System Integration
Technical specialist at Ibis Instruments
Senior Data Scientist at Microsoft Development Center Serbia
We want to thank you all for applying to Data Science Conference 2.0. The Conference will be held at Hotel Holiday Inn on 11-12 October. You can check our speaker team above and the Conference schedule below. See you at the Conference!
See the schedule of the Conference:
TBA
Marko Mitić bio:
Machine learning is becoming a standard part of everyday operations in corporations across various industries. Companies are now starting to use standardized tools which, in combination with cloud technologies, support model development in increasingly understandable, user-friendly environments, allowing more and more decision making to be based on machine learning algorithms. This lecture will show how the sports and energy industries are leveraging cloud-based ML tools to gain interesting and business-critical insights. Apart from the technical part, the talk will include a discussion of how Microsoft Development Center Serbia is increasingly using data science for the development of cloud services.
Igor Ilić bio:
Igor has worked in science, energy, finance and, now, the IT industry. He holds a PhD in Electrical Engineering and an MSc in Economics. Before joining Microsoft, he gained data science experience as a quantitative analyst at the investment bank Goldman Sachs. He is currently part of the Azure SQL team, building technical and business models related to the performance of customers’ workloads.
In the academic community, future data scientists mainly study data mining methods using open source software such as R or Python. However, pharmaceutical, financial and other big companies use commercial tools such as SAS, SPSS and others. Moving to commercial tools can be difficult and expensive, which is why companies often decide to hire a senior person or someone with experience in commercial data mining tools. So it is legitimate to ask: Is studying commercial data mining tools a privilege for those working in big companies? Is it possible to study commercial data mining tools for free? This presentation will show the development path of a data miner, as well as which SAS services can be used for free while studying. Several characteristic examples will show the different possibilities of SAS OnDemand for Academics.
Vladimir Marković bio:
Vladimir has comprehensive experience in DW/BI design and development, primarily based on Microsoft and SAS BI platforms and products. This experience is further bolstered by years of work on the implementation and development of different kinds of DW/BI solutions and products. During his work in the bank, he gained a broad business background in different fields of BI application, especially analytical customer intelligence, credit risk scoring, credit risk portfolio management and accounting. He is an experienced trainer and presenter. He enjoys sharing his enthusiasm by presenting and promoting DW/BI at courses, user groups, technical events and conferences. Vladimir holds an MSc in Math and Computer Science from the Faculty of Mathematics, University of Belgrade. His areas of interest are dimensional modeling and data mining.
This talk is about data science and statistics applied to flight safety in commercial aviation worldwide. In the introductory part, we will stress the importance of monitoring your flight data and show you some real records coming from flight data recorders (aircraft “black boxes”). We will then explain how the data is recorded, downloaded, analysed, converted to safety events and, finally, validated by experts in the field: flight data analysts. Aggregating data across many flights results in a statistical picture of safety risks in airlines’ operations. However, this valuable tool can turn into a deadly weapon if used negligently, and we’ll support this claim with examples. We are convinced the audience will know about some of these traps, regardless of the industry they come from, but hopefully there will be something valuable to take home, too. In the second part, we say goodbye to the data analyst and the statistician, the two dominant characters of the first part of the presentation. However, a data scientist will emerge from the valuable experience and domain knowledge of the two. This data scientist will walk the audience through three simple but working examples. The first is about how we can improve the accuracy of automated analysis by using historic data and a probabilistic, Bayesian approach. The second is about finding novel safety risks in airlines’ operations by using simple principal component analysis. Lastly, we’ll use a Markov model to detect aircraft which have changed behaviour with respect to the frequency of data downloads, so that we collect as much flight data as possible. We will try to make this chat as interesting and as interactive as we can, and we are looking forward to meeting you at this fun and interesting conference!
Marko Vasiljevski bio:
Marko is a physicist leading the data science department at Flight Data Services. His everyday work is about exploration of all available sources of aviation data, algorithm development, statistics, chatting with people and research. For data exploration and algorithm development he mainly uses open source tools like R, Python and PostgreSQL. The chats are mainly with fellow data scientists, pilots, developers and airlines’ flight safety officers, whilst the research covers reading material on forgotten and novel approaches in machine learning, statistics and programming. Having fun is a very important aspect of his life and work, but he takes aviation safety extremely seriously and is keen to help airlines interpret and make use of their data. At flight safety and data science seminars and conferences, Marko is a speaker and promoter of safe flying, data sharing and data-driven decision making.
Raffaele Rainone bio:
Raffaele joined Flight Data Services as a data scientist, having previously worked at a location analytics company as an RF/DSP developer where, among other R&D tasks, he dealt with Big Data and contributed to developing and improving a WiFi localisation algorithm by combining his knowledge of Python and PostgreSQL with probabilistic tools such as Markov chain Monte Carlo methods. Raff mainly focuses on analysing flight data with the goal of developing machine learning algorithms aimed at improving flight safety. He also spends a great deal of time exploring job parallelisation and distribution using Apache Spark, Dask and Numba. He obtained a BSc in Mathematics from the University of Naples and an MSc in Mathematics from the Universities of Padua and Leiden. In 2014, he was awarded a PhD in Pure Mathematics by the University of Southampton with a thesis in group theory.
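The first example mentioned in the flight safety talk above, combining historic data with a Bayesian approach to improve automated analysis, can be illustrated with a minimal sketch. This is not the speakers' method: the function name and all rates below are hypothetical, chosen only to show how a historical base rate changes the probability that an automatically flagged event is real.

```python
def posterior_event_is_real(prior, true_positive_rate, false_positive_rate):
    """Bayes' rule: probability that a flagged event is real."""
    flagged_and_real = true_positive_rate * prior
    flagged_and_not_real = false_positive_rate * (1.0 - prior)
    return flagged_and_real / (flagged_and_real + flagged_and_not_real)


# Hypothetical numbers, for illustration only.
prior = 0.02  # share of flights with a real event in historic data
tpr = 0.95    # chance the automated analysis flags a real event
fpr = 0.10    # chance it flags a flight with no real event

posterior = posterior_event_is_real(prior, tpr, fpr)
print(round(posterior, 3))
```

With these made-up rates, only about 16% of flagged events would be real, which is one way to see why validation by flight data analysts matters when events are rare.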
TBA
Bojan Sovilj bio:
Bojan is one of the pioneers of the Data Science scene in Serbia. He graduated from the Faculty of Mathematics at the University of Belgrade. His vast experience comes from more than 12 years of work at Mozzart Bet, where he worked on big and complex systems. Currently, he works at his own company, Cloudwalker, in the position of 1st Walker.
Web log analysis is a standard procedure on most sites. As the number of visits grows, it is one of the first practical applications of Big Data systems. The goal of the presentation is to demonstrate, with an example, how to build a system to analyse web logs. I suggest Cloudera CDH as the basic tool, StreamSets as the tool for data collection and, for storage, keeping the data in parallel in two formats: tab-separated files for analysis, and Elasticsearch with a Kibana front end for quick insights and dashboards.
Nikola Krgović bio:
Nikola is a Senior System Administrator with over 15 years of experience. He is passionate about server architectures, Unix systems and data storage.
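The tab-separated format suggested in the web log talk above can be illustrated with a small sketch. It assumes the logs are in the common Apache/NGINX "combined" access log layout, which the abstract does not state. The pattern, field list, function name and sample line are all hypothetical, and the sketch does not involve Cloudera CDH or StreamSets.

```python
import re

# Assumed input: Common Log Format, optionally followed by the
# "combined" referrer and user-agent fields.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)
FIELDS = ["host", "time", "method", "path", "status", "size", "referrer", "agent"]


def log_line_to_tsv(line):
    """Return the selected fields joined by tabs, or None if the line does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    # Missing optional fields are written as "-".
    return "\t".join(match.group(name) or "-" for name in FIELDS)


# Made-up sample line (documentation IP range).
sample = ('203.0.113.7 - - [11/Oct/2016:09:15:02 +0200] '
          '"GET /schedule HTTP/1.1" 200 5123 "-" "Mozilla/5.0"')
row = log_line_to_tsv(sample)
print(row)
```

Returning None for unparseable lines lets a collection pipeline route them to a separate error file instead of silently dropping them.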
The Big Data paradigm is a reality for every modern company. To meet the challenges it brings, Ibis developed a solution that offers efficient storage of, and access to, huge amounts of data. In addition, an analytic layer based on MapReduce and machine learning is implemented. The goal of this presentation is to introduce Ibis Performance Insight, its architecture and ecosystem, as well as its capabilities, functionalities and successful implementations.
Milan Simaković bio:
Milan is a Technical Specialist at Ibis Instruments. He faces problems related to system integration, analytics and Big Data on a daily basis. He is one of the main developers of IPI and is passionate about data analytics, system architecture, Spark, Linux, etc.
Big Data is the reality of modern business: from big companies to small ones, everybody is trying to find their own benefit. Big Data technologies are not meant to replace traditional ones, but to complement them. In this presentation you will hear what Big Data and Data Lake are and which technologies are most popular in the Big Data world. We will also talk about Hadoop and Spark, their benefits, and how they integrate with traditional systems.
Darko Marjanović bio:
Darko is a co-founder and director of the company Things Solver, whose main focus is on Big Data technologies. He is one of the founders of the Data Science community in Serbia. He is mainly engaged in the architecture of Big Data applications and in the collection and preparation of data. His focus is on Hadoop and Spark.
Having domain knowledge and learning and improving your technical skills is the way to go, and it is what is expected of any of us who want to be considered professionals. Ognjen will show you another dimension of business which will help you excel at what you do. He will tell you a story of how you can visualize and communicate your solutions.
Ognjen Zelenbabić bio:
Ognjen is a storyteller, unveiling and communicating the secrets the data is hiding.
The new IBM Data Science Experience, built upon Apache Spark and open source R, allows data scientists to create projects for collaboration with other data scientists and data engineers, sharing notebooks, data sets and data connections, or using RStudio within the same web site. It makes it easy to share notebooks and insights with stakeholders, and provides a community area with curated data sets, sample notebooks and tutorials to learn from and use as a starting point. This session is an overview and demo of the Data Science Experience.
Mladen Jovanovski bio:
Mladen is a Client Technical Specialist at IBM, focused on both traditional data processing and leveraging the potential of Big Data processing. His main areas of expertise are overall solution architecture, data integration, data storage and subsequent data analysis.
TBA
Đorđe Nedeljković bio:
Đorđe is a teaching assistant in the group of courses related to the application of information technology in civil engineering and geodesy (Computer aided design in civil engineering, Object oriented programming, Introduction to programming in Matlab/Visual Basic, Databases in civil engineering). His additional fields of interest include knowledge management and text mining in construction project documents, and Building Information Modeling.
Apache Spark, and its components Spark ML and MLlib in particular, offer a range of possibilities for machine learning on large data sets. Spark contains powerful algorithms for supervised and unsupervised learning, regression, classification and clustering, as well as methods for data transformation and preparation. I will give an overview of these algorithms and methods, with examples of Apache Spark programs that use them.
Petar Zečević bio:
Petar has been working as a Java developer, software architect and IBM software consultant for 15 years. He is the author of "Spark in Action" and the organizer of Apache Spark Zagreb Meetups.
Neural networks are systems modeled on the human brain, consisting of a number of neurons and the connections between them. The network's weights are what make memory possible, i.e. acquiring certain knowledge, and they are modified through an iterative learning process. In the process of learning, weight modifications are made by a learning algorithm, of which back-propagation (gradient descent) is the best known. However, the final result of back-propagation training depends significantly on the initial weight values. A genetic algorithm is a stochastic search tool based on evolutionary principles, which can be used as a learning algorithm without limitations. The scope of genetically trained networks is examined through the problem of credit risk assessment in banking, the research area known as credit scoring. Compared to back-propagation algorithms, experimental results on well-known benchmark problems in this area (Australian and German credit data) show certain advantages of genetically trained networks.
Srđan Mlađenović bio:
Srđan is a Business Development Manager at Comtrade System Integration, in charge of developing data analytics solutions. His areas of expertise are information management, business intelligence (BI) and, specifically, predictive analytics, where he has a strong academic background in machine learning. Among the more than fifty projects he has been involved in, the most notable is the data warehouse and business intelligence project at Telekom Serbia, one of the most data-intensive projects in terms of data volume and the complexity of the required transformations (ETL). His research interests are focused on predictive classification problems applied to churn prediction, credit scoring and other related domains.
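The idea from the abstract above, using a genetic algorithm instead of back-propagation to find network weights, can be sketched on a toy problem. This is an illustrative assumption, not the speaker's experimental setup: the 2-2-1 network, the XOR data, the operators (elitism, tournament selection, uniform crossover, Gaussian mutation) and all parameter values are hypothetical, and no credit scoring data is used.

```python
import math
import random

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_WEIGHTS = 9  # 4 hidden weights + 2 hidden biases + 2 output weights + 1 output bias


def sigmoid(x):
    # tanh form of the logistic function; avoids overflow for large |x|
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def predict(w, x):
    h1 = sigmoid(w[0] * x[0] + w[1] * x[1] + w[4])
    h2 = sigmoid(w[2] * x[0] + w[3] * x[1] + w[5])
    return sigmoid(w[6] * h1 + w[7] * h2 + w[8])


def loss(w):
    """Mean squared error of the network with weights w on the XOR data."""
    return sum((predict(w, x) - y) ** 2 for x, y in XOR) / len(XOR)


def evolve(generations=200, pop_size=40, mutation_scale=0.5, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=loss)
        history.append(loss(population[0]))
        # Elitism: the two best individuals survive unchanged.
        next_gen = [population[0][:], population[1][:]]
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = min(rng.sample(population, 3), key=loss)
            b = min(rng.sample(population, 3), key=loss)
            # Uniform crossover, then Gaussian mutation of some genes.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [g + rng.gauss(0, mutation_scale) if rng.random() < 0.2 else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=loss)
    history.append(loss(best))
    return best, history


best, history = evolve()
print(history[0], history[-1])
```

Because the best individuals are copied unchanged into each new generation, the best loss never gets worse from one generation to the next, and no gradients are needed at any point.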
As a huge number of traditional TV programs and on-demand video streams offered via the Internet are now simultaneously available through hybrid broadcast broadband television, the search for interesting content often turns into a time-consuming task for a viewer. In a situation like this, both providers and viewers would benefit from personalized recommender systems. The choice of neural network architecture and learning algorithm is mainly influenced by users’ privacy concerns and the characteristics of the data collected from user interactions. This session will discuss how to overcome these challenges by using a feedforward neural network trained with a cost-sensitive version of the Extreme Learning Machine (ELM) algorithm, and a sparse ELM autoencoder trained with the fast iterative shrinkage-thresholding algorithm, for cases with and without “dislike” interactions, respectively. A series of tests will show that the proposed solutions improve system performance and consequently increase users’ satisfaction.
Marko Krstić bio:
Marko received both his bachelor's and master's degrees in Electrical and Computer Engineering from the School of Electrical Engineering (ETF), University of Belgrade. He is currently a Ph.D. candidate at the same faculty. Although his interests during his bachelor studies were closely related to telecommunication networks, his research during his master's and Ph.D. studies has mainly focused on recommender systems and machine learning techniques. Three years ago he started to work at the Regulatory Agency for Electronic Communications and Postal Services (RATEL) as an Advisor in the IT department. He holds many certificates, among which he would emphasize the Data Science Associate certificate provided by EMC.
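For readers unfamiliar with the Extreme Learning Machine mentioned in the recommender systems abstract above, here is a minimal generic sketch, assuming NumPy is available. It is not the speaker's model: the hidden layer is random and never trained, and only the output weights are fitted by least squares. The `sample_cost` parameter is one simple, hypothetical way to make the fit cost-sensitive (weighted least squares), and the data is a toy XOR set.

```python
import numpy as np


def train_elm(X, y, n_hidden=20, sample_cost=None, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases, never trained
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    if sample_cost is None:
        sample_cost = np.ones(len(X))
    # Weighted least squares: scale each row by the square root of its cost.
    s = np.sqrt(sample_cost)[:, None]
    beta, *_ = np.linalg.lstsq(H * s, y[:, None] * s, rcond=None)
    return W, b, beta


def predict_elm(model, X):
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta).ravel()


# Toy data (XOR); the costs are hypothetical and weight the positive class more.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
model = train_elm(X, y, sample_cost=np.array([1.0, 2.0, 2.0, 1.0]))
scores = predict_elm(model, X)
print(np.round(scores, 3))
```

Since training reduces to a single linear solve, ELM is fast to fit. With more hidden units than samples, as here, the costs do not change the fit; they start to matter once the training points can no longer all be matched exactly.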
Information extraction can be based on a method of distributional vector space embedding of words and phrases. Embedding assumes that words and phrases are represented as dense real-valued vectors, and it is designed to satisfy the distributional hypothesis: words and phrases that occur in similar contexts tend to have similar meanings, and therefore their vectors should be close to each other in the vector space. In this talk we will show you how we managed to extract phrases using Pointwise Mutual Information and then learned word and phrase vectors, using a set of business articles, job vacancies and employee resumes as the training corpus.
Jelena Milovanović bio:
Jelena is a final-year student of the MSc course at the Department of Computer Science, Faculty of Science and Mathematics, Niš. She passed all exams with top marks and is currently finishing her MSc thesis in the field of machine learning. She is employed at the Research and Development Institute NIRI as a research assistant. Jelena is the recipient of the annual City of Niš award for the best student of the faculty in 2016.
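The Pointwise Mutual Information step mentioned in the abstract above can be sketched for two-word phrases. This is an illustrative assumption rather than the speakers' pipeline: the function name, the `min_count` threshold and the three-sentence toy corpus are hypothetical. PMI is computed as log2(P(x, y) / (P(x) P(y))) from raw unigram and bigram counts.

```python
import math
from collections import Counter


def bigram_pmi(sentences, min_count=2):
    """Return PMI scores for adjacent word pairs seen at least min_count times."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Made-up corpus in the spirit of job vacancies and resumes.
corpus = [
    "senior data scientist with machine learning experience".split(),
    "machine learning engineer for a data team".split(),
    "we need a data engineer with machine learning skills".split(),
]
scores = bigram_pmi(corpus)
for pair, value in scores.items():
    print(pair, round(value, 2))
```

A positive score means the two words occur together more often than their individual frequencies would suggest, which is what makes the pair a phrase candidate. On a realistic corpus the threshold on counts matters, because PMI strongly favours rare pairs.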
Recurrent Neural Networks (RNNs) form a wide class of neural networks in which feedback connections between processing units are allowed. Applications of RNNs range from industrial process identification, modelling and adaptive control to financial time series prediction and classification, audio and video signal processing, and sequence labeling in natural language processing. Echo state recurrent neural networks (ESNs) are arguably one of the most interesting recently proposed learning models in this field, since they have been considered as a possible learning model in biological brains. In this presentation we first establish the connection of ESNs with some previously known recurrent network architectures and then propose a set of online training algorithms, derived from recursive Bayesian joint estimation of RNN states and parameters.
Branimir Todorovic bio:
Branimir is an associate professor at the Computer Science Department, Faculty of Mathematics and Sciences, University of Nis, and Lead Scientist at NIRI, Nis. He received his Doctor of Science degree from the Faculty of Electrical Engineering, University of Belgrade. His research interests include sequential Bayesian training of feedforward and recurrent neural networks, blind source separation and deconvolution, online training of structural classifiers, active and semi-supervised learning algorithms, and natural language processing.
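As background for the echo state network abstract above, here is a minimal sketch of a standard ESN, assuming NumPy is available. It does not implement the online recursive Bayesian training proposed in the talk. It uses the usual textbook setup instead: a fixed random reservoir rescaled to a chosen spectral radius, and a linear readout fitted in batch by ridge regression. All sizes, scales and the sine-wave task are hypothetical.

```python
import numpy as np


def make_reservoir(n_reservoir=50, spectral_radius=0.9, seed=0):
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
    W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W


def run_reservoir(W_in, W, inputs):
    """Drive the fixed reservoir with a scalar input sequence; return all states."""
    state = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        state = np.tanh(W_in * u + W @ state)
        states.append(state)
    return np.array(states)


def fit_readout(states, targets, ridge=1e-6):
    # Ridge regression: only the linear readout is trained.
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)


# Task: predict the next value of a sine wave from the current one.
signal = np.sin(0.2 * np.arange(400))
W_in, W = make_reservoir()
states = run_reservoir(W_in, W, signal[:-1])
washout = 50  # discard the initial transient
targets = signal[1 + washout:]
w_out = fit_readout(states[washout:], targets)
prediction = states[washout:] @ w_out
mse = float(np.mean((prediction - targets) ** 2))
print(mse)
```

The recurrent weights are never updated. That is the property that makes training cheap, and it is also why the readout can be swapped for a recursive, sample-by-sample estimator such as the ones the talk derives.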
In this talk you will get an overview of Apache Spark, a distributed computational framework, and see why it is a logical successor to Hadoop's MapReduce. Petar will describe the main Spark components (Core, SQL, Streaming, GraphX and ML) and show how to use them through short code examples.
Petar Zečević bio:
Petar has been working as a Java developer, software architect and IBM software consultant for 15 years. He is the author of "Spark in Action" and the organizer of Apache Spark Zagreb Meetups.
A department, somewhere in the EU, depends on a steady input of 3000 new textual documents per day, 365 days a year. Documents come from 10 different sources, and each document comes pre-classified into a single category of a large taxonomy. The department is unhappy: the accuracy of incoming document classifications seems to be low. Even after the department puts in an additional 800% FTE throughout the year to manually repair or discard wrongly classified documents, the accuracy still lags behind its targets. NIRI was hired to conduct research and develop an accurate document classifier. The plan was to use NIRI’s classifier to replace the unreliable classes that come with the documents, and thus solve the problem of low accuracy as well as reduce the high cost of 800% FTE. In this talk we will share our experiences: the classification approach used to meet the needs of our client, the challenges in demonstrating progress during the project, and the approach used for the acceptance validation of our classifier.
Marko Smiljanić bio:
Marko graduated from the Faculty of Electronics in Niš, where he worked as an associate in research and lecturing. He then became a science emigrant, earned a PhD in computer science in the Netherlands in 2006, and is an author of ten international scientific publications. After returning to Serbia, he turned to the business world and formed a group for developing software and consulting services for Text Analytics and Data Mining at the company NIRI. Today, besides being CEO of NIRI, Marko is a guest lecturer at faculties in Niš, organizes student internships and is involved in the advanced technologies cluster in Niš.
The main idea of a Data Lake is to expose company data in an agile and flexible way to people within the company, while preserving the safeguards and auditing features required for the company’s critical data. Most projects in this direction start by depositing all of the data in Hadoop, trying to infer a schema on top of the data, and then using the data for analytics via Hive or Spark. This stack is a really good approach for many use cases, as it provides cheap storage of data in files and rich analytics on top. But many pitfalls and problems might show up on this road, which can be easily addressed by extending the toolset. The potential bottlenecks will appear as soon as users arrive and start exploiting the Lake. For all of these reasons, planning and building a Data Lake within an organization requires a strategic approach, in order to build an architecture that can support it.
Miloš Milovanović bio:
Miloš is a co-founder and data engineer at the company Things Solver. He is also one of the founders of the Data Science community in Serbia. His prime focus is on analytics on large amounts of data and on data visualisation.
Back in the day, you had a single machine and could scroll through a single log file to figure out what was going on. In the Big Data world, you need to combine a lot of logs to do the same. Data arrives in huge volumes and at high speed, so picking out the important information and getting rid of noise becomes a real challenge. There is a need for a centralized monitoring platform which will aid the engineers operating the systems and serve the right information at the right time. This talk will help you understand these challenges and give you an idea of which tools and technology stacks are a good fit for monitoring Big Data systems. The focus will be on open source and free solutions. The problem can be separated into two domains, both of which are the subject of this talk: a metrics stack to gather simple metrics in a central place, and a log stack to aggregate logs from different machines in a central place.
Nenad Božić bio:
Nenad is a craftsman with rich software engineering experience, an all-rounder, but when he does backend coding he feels right at home. Striving for knowledge is his main drive, which is why he enjoys learning about new tools and languages, blogging, working on open source and presenting. His current focus is Big Data and the surrounding ecosystem of tools, which is why he co-founded the SmartCat company. He is a strong believer in a balance between good technical skills and soft skills. A family guy, he tries to spend most of his free time with his wife and daughter.
Success stories of the Big Data paradigm and Predictive Analytics in many application areas have led to wide recognition of their high potential impact in areas like healthcare, marketing and finance. However, there is still a large gap between actual and potential data usage because of numerous challenges: high dimensionality, sparsity, data heterogeneity, privacy concerns, the need for collaboration between domain experts and data scientists, demand for highly accurate and interpretable models, and so on. On the other hand, extensive scientific research efforts offer many partial or complete solutions to these challenges. Coordination of research and industry efforts (the fusion of cutting-edge predictive analytics methodologies with commercial or non-commercial products) should lead to better exploitation of the promise of Big Data, better satisfaction of industry needs and new methodological breakthroughs.
Milan Vukićević bio:
Milan is an Associate Professor at the University of Belgrade, Faculty of Organizational Sciences. He is also CEO and one of the founders of a Big Data analytics company for consulting, research and development in the area of Predictive Analytics. He worked as a Visiting Researcher at the Data Analysis and Biomedical Analytics (DABI) Center at Temple University (2014-2015). His work has been published in multiple conferences, journals and book chapters. Milan has also given several invited talks on bioinformatics and healthcare predictive analytics topics.