
="Big Data" -- Data Warehouses=

Data warehouses are a special type of database geared to analytic processing. The data in a data warehouse generally comes from transactional systems. Through a process known as Extract, Transform, and Load (ETL), the transactional data is cleansed and transformed from a stream of raw transactions into a structure that allows very efficient querying and analysis, at the cost of slow updates. As we will discover in class today, there are many benefits to keeping the transactional systems separate from the analytics systems.
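
The three ETL stages can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular ETL tool: the raw booking rows, the field names, and the `fact_orders` table are hypothetical, and an in-memory SQLite database stands in for the warehouse. The transform step repairs two common data integrity problems (duplicate records and missing values) and normalizes inconsistent formatting.

```python
import sqlite3
from datetime import datetime

# Hypothetical raw rows, as they might be extracted from a transactional booking system.
RAW_ORDERS = [
    {"order_id": "1", "pickup_area": " West Bay ", "ordered_at": "2013-03-01 08:15"},
    {"order_id": "2", "pickup_area": "west bay", "ordered_at": "2013-03-01 09:40"},
    {"order_id": "2", "pickup_area": "west bay", "ordered_at": "2013-03-01 09:40"},  # duplicate
    {"order_id": "3", "pickup_area": "", "ordered_at": "2013-03-02 18:05"},          # missing area
    {"order_id": "4", "pickup_area": "Al Sadd", "ordered_at": "2013-03-02 18:30"},
]


def extract(source_rows):
    """Extract: copy rows out of the source system without modifying it."""
    return [dict(row) for row in source_rows]


def transform(rows):
    """Transform: cleanse, de-duplicate, and reshape rows for analysis."""
    seen_ids, clean = set(), []
    for row in rows:
        area = " ".join(row["pickup_area"].split()).title()  # fix spacing and case
        if not area or row["order_id"] in seen_ids:          # drop incomplete rows and duplicates
            continue
        seen_ids.add(row["order_id"])
        ts = datetime.strptime(row["ordered_at"], "%Y-%m-%d %H:%M")
        clean.append((row["order_id"], area, ts.date().isoformat(), ts.hour))
    return clean


def load(rows, conn):
    """Load: write the cleansed rows into a read-optimized warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id TEXT PRIMARY KEY, pickup_area TEXT, order_date TEXT, order_hour INTEGER)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_ORDERS)), warehouse)
query = "SELECT pickup_area, COUNT(*) FROM fact_orders GROUP BY pickup_area ORDER BY pickup_area"
for area, orders in warehouse.execute(query):
    print(area, orders)  # prints "Al Sadd 1" then "West Bay 2"
```

The trade-off described above shows up even at this scale: the cleansed table is easy to group and count, but a correction to a raw order has to flow through the whole pipeline again before it appears in the warehouse.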

By the end of today's class you should be able to:

 * Explain the most important attributes of a data warehouse and how it differs from a transactional database
 * Explain what happens in each of the steps of the Extract, Transform, and Load process and why this process is important
 * List, and provide examples of, many of the most common data integrity problems that are repaired in the ETL process

Prior to class, read chapter 8 of [HBP09]. The reading is available on Blackboard's digital reserves in the 'Course Documents' section, along with further instructions and guidance about the key things to focus on in the reading.

For phase 3 of the Karwa project, the company's management team would like to create a data warehouse that they can use to analyze and improve their operations with a data-driven approach. Briefly answer **one** of the two questions below prior to class:

1) Describe an analysis that Karwa might do on the data they collect in their data warehouse that would help them run their business more effectively. Specify the question that they will ask, briefly describe the data that they will need to collect and integrate to answer that question, and explain how they will perform the analysis. Your response should be no more than one paragraph.

2) Describe a valuable analysis that the Karwa managers could do on the data in their data warehouse to improve their operations if they were able to integrate data from an external source. Briefly describe the analysis, what data they are missing, and where they might be able to get the missing data.

[Hussain Hejji]: I think that a valuable analysis Karwa could do with their data warehouse would focus on their competitors. Although there aren't any official competitors, there are a lot of independent taxi drivers and also a private taxi company called FOX. The analysis would be about how Karwa can get this information about their competitors and why some people choose these services rather than Karwa. I think they could run customer surveys, or simply ask their customers, to get the information they need about their competitors. Karwa can also observe why some people choose these kinds of services over Karwa. I believe one of the reasons may be availability and price; if so, Karwa can reduce their prices or offer more taxis to their customers.

[Layal Al-Alami]: If Karwa is able to forecast demand patterns during weekends and the demand for cabs in specific areas, they will be able to supply the right number of cabs by making more vacant cabs available in those areas. The question they will ask is: How is demand during the weekend affected by the location of the customer? The data they need includes the number of orders they receive on weekdays, the number they receive on weekends, and the number they receive for each location. They can then use analytical IS, or the expertise of their analysts, to forecast the demand patterns. They will need to find the percentage of orders for every area during weekends, compare it to the percentage of orders for every area on weekdays, and determine how significant the increase is.
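
Layal's comparison of area shares can be expressed directly. The sketch below is a minimal Python illustration with invented orders; it assumes a Friday-Saturday weekend (the hypothetical `WEEKEND` set uses `date.weekday()` numbering, where Monday is 0) and reports, for each area, the weekend share of orders minus the weekday share, in percentage points.

```python
from collections import Counter
from datetime import date

# Hypothetical order records: (order date, pickup area).
ORDERS = [
    (date(2013, 3, 1), "West Bay"),    # Friday
    (date(2013, 3, 1), "Souq Waqif"),  # Friday
    (date(2013, 3, 2), "Souq Waqif"),  # Saturday
    (date(2013, 3, 3), "West Bay"),    # Sunday
    (date(2013, 3, 4), "West Bay"),    # Monday
    (date(2013, 3, 4), "Al Sadd"),     # Monday
]

WEEKEND = {4, 5}  # assumption: Friday (4) and Saturday (5)


def shares(counter):
    """Convert raw counts per area into fractions of the total."""
    total = sum(counter.values())
    return {area: n / total for area, n in counter.items()} if total else {}


def share_change(orders, weekend=WEEKEND):
    """Weekend share minus weekday share of orders, in percentage points, per area."""
    weekend_counts, weekday_counts = Counter(), Counter()
    for day, area in orders:
        (weekend_counts if day.weekday() in weekend else weekday_counts)[area] += 1
    wknd, wkdy = shares(weekend_counts), shares(weekday_counts)
    return {
        area: round(100 * (wknd.get(area, 0) - wkdy.get(area, 0)), 1)
        for area in sorted(set(wknd) | set(wkdy))
    }


print(share_change(ORDERS))
```

A positive value means an area takes a larger slice of demand on weekends. The sketch only measures the size of the shift; judging whether it is statistically significant would need a test, such as a chi-squared test on the underlying counts.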

[Walied El Hag Ali]: A question that Karwa needs to answer to run their business more effectively is: "How should it allocate its taxis around the country to match demand?" To answer this question, Karwa will need to gather information about the population of Qatar, such as which areas or cities have higher populations than others. Karwa will also need to collect data about the number of trips made by its taxis relative to the areas they were assigned to. Karwa should perform this analysis by conducting experiments: they can assign a certain number of taxis to a certain city and then observe whether demand is matched or whether they need to send more taxis. With all of this information in its data warehouse, Karwa will be able to overcome the demand mismatch issue and run more effectively.

[Dua'a Althabatah]: I think Karwa must consider and gather data about locations and destinations in Qatar. I have heard from many people that Karwa cabs take so long to arrive at the pickup location that they need to call for a cab an hour or more before they actually need a ride. So Karwa also has to distribute more of its taxis in the places where demand is higher. To do this, they need to conduct research and study the behavior of their customers in terms of where most of the calls/orders come from, and use statistics to figure out where they need to distribute more taxis. By doing this they will have happier customers and a better reputation, and it will also help them run their business more effectively.

[Khadeejah Al-Husseiny]: A question Karwa might ask to help their business run more effectively is "Where are the most popular areas, and how often are they visited?" They can answer this question by collecting data on the number of Karwa users and the locations visited. They can then begin to see patterns for the future, which will help them match demand and supply as well as dispatch taxis more efficiently to popular, populated areas.

[Fatima Al-Khayat]: To run their business more effectively, Karwa needs to answer this question: “How can they match their resources (taxis) with the demand from their customers?” They could answer it by analyzing the data they have collected from previous customer orders. These data must include the number of orders Karwa has received each month and the number of orders that have been confirmed and declined. They also need data about the most frequent pick-up locations and the most visited destinations. This would help in supplying the right number of taxis in the right place at the right time. The analyst would be able to identify whether there is a mismatch between supply and demand, and also to identify why some requests are declined and find solutions. This analysis will help balance the demand for taxis with the number of taxis Karwa has.

[Ahmad Al-Sarraf]: I think Karwa should focus on collecting and integrating data that resolves the inconsistency in the names and numbers of some streets in Qatar. As we saw in the case, customers and Karwa do not always use the same names and numbers for places, so Karwa drivers might have trouble understanding where a customer wants to go. The question Karwa wants to answer is: how can we fix this naming and numbering issue? I think Karwa should use external data, such as information from the Ministry of Municipality and Urban Planning, and add the exact street names and numbers to their data warehouse.
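
Inconsistent place names are exactly the kind of data integrity problem the ETL cleansing step targets. A minimal sketch, assuming a hypothetical canonical list of names (standing in for official data such as the ministry's), that uses Python's standard `difflib.get_close_matches` to map free-text entries onto that list:

```python
import difflib

# Hypothetical canonical names, standing in for an official list obtained from an
# external source such as the Ministry of Municipality and Urban Planning.
CANONICAL_NAMES = ["Al Corniche Street", "Salwa Road", "C Ring Road", "Al Waab Street"]


def normalize(name):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(name.lower().split())


def match_place(raw_name, canonical=CANONICAL_NAMES, cutoff=0.75):
    """Map a free-text place name to the closest canonical name, or None if no match."""
    lookup = {normalize(name): name for name in canonical}
    hits = difflib.get_close_matches(normalize(raw_name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


for entry in ["salwa  rd", "Corniche Street", "Pearl Tower"]:
    print(entry, "->", match_place(entry))
```

Entries that return `None` are the ones worth routing to a person, or to the external reference data, for correction. The 0.75 cutoff is an arbitrary starting point, and fuzzy string matching will not catch names that are spelled completely differently (for example, an old landmark name versus a new official street name).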

[Mohammed Al-Rawahi]: To be more profitable, Karwa should be able to answer this question: Which customers are more profitable than others? To answer it, Karwa should first classify the customers they have a recorded history of into profitable and non-profitable customers. After this part of the analysis, Karwa should focus more on profitable customers. For example, if Karwa receives two orders simultaneously from a profitable and a non-profitable customer and has only one vehicle available, the taxi should be directed to the profitable customer, and Karwa should apologize to the non-profitable customer. By knowing their primary target and properly analyzing their stored data, Karwa can be more profitable.

[Maryam Al-Subaie]: Question 1: Karwa occasionally has a demand mismatch problem in which taxis are available and unoccupied in one part of town while there is large demand for taxis elsewhere, with none available. Karwa should use their data warehouse to find the locations passengers are most frequently picked up from, and then make more taxis available in areas with large demand rather than in areas with little demand. To answer this question, Karwa would need to query the data warehouse to retrieve a list of the most popular pick-up locations and the specific times at which they are in demand (e.g. weekends vs. weekdays). Since data warehouses store information from other databases in a common format, all of the data needed to answer the question should be available and fairly easy to retrieve.
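
Maryam's query maps naturally onto a star schema: a fact table of trips joined to location and time dimensions. The following is a hedged sketch using Python's standard `sqlite3` module; the table names, columns, and rows are invented for illustration and do not describe Karwa's actual warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day_type TEXT, hour INTEGER);
CREATE TABLE fact_trip (
    trip_id INTEGER PRIMARY KEY,
    location_id INTEGER REFERENCES dim_location,  -- pick-up location
    time_id INTEGER REFERENCES dim_time            -- when the pick-up happened
);
INSERT INTO dim_location VALUES (1, 'City Center'), (2, 'Airport'), (3, 'Souq Waqif');
INSERT INTO dim_time VALUES (1, 'weekday', 8), (2, 'weekday', 18), (3, 'weekend', 20);
INSERT INTO fact_trip VALUES (1, 2, 1), (2, 2, 2), (3, 1, 2), (4, 3, 3), (5, 3, 3), (6, 1, 3);
""")

# Count pick-ups per location within each day type, most popular first.
TOP_PICKUPS = """
SELECT t.day_type, l.name, COUNT(*) AS trips
FROM fact_trip f
JOIN dim_location l ON l.location_id = f.location_id
JOIN dim_time t ON t.time_id = f.time_id
GROUP BY t.day_type, l.name
ORDER BY t.day_type, trips DESC, l.name
"""

for row in conn.execute(TOP_PICKUPS):
    print(row)
```

Grouping by `hour` as well would show the specific times of peak demand that the answer mentions; the dimension tables make such regrouping a one-line change to the query.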

[Maryam Al-Thani]: If Karwa were to integrate the information in their data warehouse with data from either Qatar Airways or the Ministry of Interior, their operations would be more effective. Using this external information, Karwa can find out what percentage of arrivals in Qatar come from each nationality on an average day, week, or month. Airports are one of the most important locations that are always in demand for taxis. By analyzing the data, Karwa can ensure that the drivers at the airport speak the language(s) of their passengers. For example, if most people arriving at the airport on a daily basis are from India or Sri Lanka, Karwa can ensure that most drivers at the airport speak their languages. This would help overcome the language barrier between driver and passenger.

[Fatima Abdulla]: I think a valuable analysis that Karwa could run is to gather information about the most congested places in Doha and their peak times. This information is important to Karwa because knowing which streets are jammed, and at what times, will help the drivers: they can change their route, avoid these locations at those times, and find shortcuts or roads that are not busy. I think this information will help them serve more customers in less time. Karwa could use external sources, such as the Ministry of Interior, to obtain this information.

[Al-Jawhra Al-Mana]: I think Karwa should answer the question "What are the most visited places in Qatar?" To answer it, they can refer back to the history of taxi trips they have made and look for the most visited landmarks in Qatar. Karwa needs this information so that they can assign more available taxis to those areas, which will make the time customers spend ordering a taxi and waiting for it to arrive shorter than it usually is.

[Orkhan Rustamzade]: I think that one of the main challenges facing Karwa is punctuality. In my opinion, Karwa should gather information about traffic patterns and customer demand at different times. Roads and trips with a lot of traffic jams should be avoided, and drivers should have a chance to learn shortcuts or other routes. These data can be collected by taxi drivers taking notes about traffic jams. These short notes then have to be integrated into the database and processed by the IT team to mark the places and times with large traffic jams. Moreover, customer demand at specific times can be recorded and matched against the traffic jam information. This will help Karwa be more flexible and satisfy customers.

[Aisha Al-Zaman]: One question that might help Karwa increase their system's effectiveness is: "What is the most effective way to communicate with our customers?" Communication is certainly a very important factor in the system's productivity and efficiency. Karwa can collect data by looking back at the history of how they have communicated with customers and evaluating it. They can also start researching by surveying their potential customers about their communication preferences. In this way, Karwa can base the analysis on data and information that have been researched and tested.

[Noor Al-Mohannadi]: I think Karwa should focus on the challenges they face, such as "How can we overcome the language barriers?" and "How can we solve the inconsistency in naming and numbering streets and landmarks?" The first question can be answered by collecting data on the most requested languages other than the ones the application offers. The second question matters because inconsistent naming can lead to inefficient service; it is therefore important to gather information and update the street names so that Karwa's services become more effective.

[Firas Bata]: To improve their operations, Karwa needs to analyze data about, first, the demographics of Qatar (which areas are the most populated, and so on) and, second, car ownership in Qatar. That is, Karwa needs to understand how to allocate its taxis to areas with high demand (areas with high population density and low car ownership). It is highly unlikely that Karwa has data on car ownership. However, since Karwa is government owned, it could easily request that data from the traffic department.

[Mohammed Hadi Takiddin]: Question 1: I think Karwa should focus on collecting time-related data, since time is very important in their services. First, they should measure the time customers need to request a taxi, which includes logging in to the system, choosing the location, and confirming the order. Second, they should collect the arrival time: how long it takes the taxi to reach the customer. By focusing on these, the company will have enough data to carry out its analysis, improve its services, and understand the risks that might affect them.
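
The two durations described here can be computed from timestamps recorded for each order. A minimal sketch, assuming a hypothetical event log with `opened`, `confirmed`, and `arrived` timestamps per order (the field names and values are invented):

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical per-order events: app opened, order confirmed, taxi arrived.
ORDERS = [
    {"opened": "2013-03-01 08:00:00", "confirmed": "2013-03-01 08:01:30", "arrived": "2013-03-01 08:20:00"},
    {"opened": "2013-03-01 09:10:00", "confirmed": "2013-03-01 09:10:45", "arrived": "2013-03-01 09:25:45"},
]


def minutes_between(start, end):
    """Elapsed minutes between two timestamp strings in FMT."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds() / 60


def timing_summary(orders):
    """Average request time (opened -> confirmed) and arrival time (confirmed -> arrived)."""
    request = [minutes_between(o["opened"], o["confirmed"]) for o in orders]
    arrival = [minutes_between(o["confirmed"], o["arrived"]) for o in orders]
    return {"avg_request_min": mean(request), "avg_arrival_min": mean(arrival)}


print(timing_summary(ORDERS))
```

In a warehouse these durations would typically be stored per order so they can be averaged by area, hour, or day type. Averages hide long waits, so a percentile (for example via `statistics.quantiles`) may be more informative.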

[Dalia Saleh Hassan]: Question 1: For Karwa customers, the most important thing for being satisfied with the service is being able to place an order easily, without facing any problems. Before the system was implemented, communication was one of the most challenging sources of noise the company faced. To measure how successful the system has been, the company should focus on collecting data about the requesting process itself, including the languages that customers used and the languages that taxi drivers used while communicating with them. These data can be crucial for evaluating the implementation of the new system and will help the company make decisions about it.

[M Hammad Abbasi]:

Regarding question 1, Karwa could ask how long it takes taxis to reach their customers by tracking taxi GPS data. Karwa can analyse the locations of customer pickups and the current locations of their drivers and allocate taxis effectively in this manner. It can use information from the government, Google Earth, the traffic police, and other data warehouses to build a composite map, or an ETA for each taxi per location.

[Ayah Abujarbou] Karwa should check their data to see which areas had the most demand for drivers and at what times, and then plot that so they can better allocate their taxis in the future. Given that this is currently one of Karwa's main problems, it should be one of the main analyses they focus on.

[Nijat Ibrahimov]

Karwa obviously has a demand-supply problem: sometimes they have cabs in one part of the city where there is not much demand. The main question here is: How are Karwa's customers spread around the city depending on location and the time of day or week? Analyzing received orders by time of day and location will give Karwa management valuable information about demand and supply across Doha. This analysis will show how Karwa cabs should be distributed across the city, which will improve their profits and customer satisfaction.

[Abdullrahman Al-Muftah] Karwa seems to be dealing with a lot of issues in finding specific locations to pick up its customers. Karwa has recognized that not all pick-up points are known landmarks. To solve this problem, Karwa should answer the following question: "What are some of the places drivers are unable to easily locate?" by collecting this data from Karwa's data warehouse. With this information, Karwa will be able to compile a list of places drivers are unfamiliar with, which will allow these "unfamiliar" destinations to be located easily the next time a customer requests one.

[Hamsa M Al-Massri] Question 1: An analysis that Karwa might do on the data they collect is an analysis of their drivers and the areas they are most familiar with. The question might be: “Which driver is most familiar with a specific area?” They need to collect the history of the drivers, their knowledge, and the different areas in Doha.

[Anas Ali Chaudry]

Question 1: Karwa might collect data about the current locations of the drivers on duty. The question might be: "Where exactly is a driver positioned at a given point in time?" This information can help Karwa assign a customer to a driver by finding the driver nearest to the customer. Karwa will need to analyse the time and location of each driver on duty in this case. This gives Karwa an efficient way to assign a taxi to a customer as soon as possible when the customer makes a reservation through their smartphone.
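
Assigning the nearest driver amounts to a nearest-neighbour lookup over drivers' latest positions. A minimal sketch, assuming a hypothetical snapshot of driver GPS coordinates and availability flags, that uses the haversine formula for straight-line (great-circle) distance:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_driver(customer, drivers):
    """Return (driver_id, distance_km) of the closest available driver, or None."""
    available = {d: pos for d, (pos, free) in drivers.items() if free}
    if not available:
        return None
    best = min(available, key=lambda d: haversine_km(customer, available[d]))
    return best, haversine_km(customer, available[best])


# Hypothetical snapshot: driver id -> ((lat, lon), is_available).
DRIVERS = {
    "D1": ((25.3200, 51.5300), True),
    "D2": ((25.2860, 51.5330), True),
    "D3": ((25.2870, 51.5340), False),  # close by, but already busy
}
customer = (25.2867, 51.5333)
print(nearest_driver(customer, DRIVERS))
```

Straight-line distance ignores roads and traffic, which several answers on this page raise, so a real ETA would need routing data. A linear scan is adequate for a modest fleet; this real-time lookup would also typically run against the operational system, with the warehouse keeping the history for later analysis.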

[Najla Al-Madhadhai]: The people at Karwa should track demand so that they know which areas in Qatar people most often request taxis from. This way, customers will feel more appreciated and Karwa drivers will not waste too much time looking for customers. Karwa can also save time and petrol that would otherwise be wasted searching for customers, by sending taxis to high-demand areas.

[Sara Al-Mannai] Question 1: Karwa can track their customers' regular trips, which will help them station their taxi drivers in places where there is demand for taxis. Compiling this data will reduce the time it takes for people to get a taxi when they plan to order one, since there will already be a taxi in the area. They can collect the information from their new system, which lets them track the map locations of people ordering taxis and keep a record of them.

An analysis that Karwa might do on the data in their data warehouse to run their business more effectively would start by asking: how are they going to allocate their drivers around Qatar, and how are they going to meet their demand? Karwa needs to use data about their demand to figure out the patterns, and then find a way to allocate their drivers to the places that are most popular and most in demand.

[Mashael Al Misnad]

1) By collecting data about the locations of drivers at a given time, Karwa can run their business more effectively, because they can tell their customers how long the driver needs to reach them and dispatch the driver who is nearest to the customer. Question: Where is the driver at a specific time? They would thus need to collect the exact location of the driver, in terms of either coordinates or street, along with the time. This will eventually speed up Karwa's services and make them more effective.